
Optimal Constituent Alignment with Edge Covers for Semantic Projection

Sebastian Padó
Computational Linguistics, Saarland University, Saarbrücken, Germany
pado@coli.uni-sb.de

Mirella Lapata
School of Informatics, University of Edinburgh, Edinburgh, UK
mlap@inf.ed.ac.uk

Abstract

Given a parallel corpus, semantic projection attempts to transfer semantic role annotations from one language to another, typically by exploiting word alignments. In this paper, we present an improved method for obtaining constituent alignments between parallel sentences to guide the role projection task. Our extensions are twofold: (a) we model constituent alignment as minimum weight edge covers in a bipartite graph, which allows us to find a globally optimal solution efficiently; (b) we propose tree pruning as a promising strategy for reducing alignment noise. Experimental results on an English-German parallel corpus demonstrate improvements over state-of-the-art models.

1 Introduction

Recent years have witnessed increased interest in data-driven methods for many natural language processing (NLP) tasks, ranging from part-of-speech tagging, to parsing, and semantic role labelling. The success of these methods is due partly to the availability of large amounts of training data annotated with rich linguistic information. Unfortunately, such resources are largely absent for almost all languages except English. Given the data requirements for supervised learning, and the current paucity of suitable data for many languages, methods for generating annotations (semi-)automatically are becoming increasingly popular.

Annotation projection tackles this problem by leveraging parallel corpora and the high-accuracy tools (e.g., parsers, taggers) available for a few languages. Specifically, through the use of word alignments, annotations are transferred from resource-rich languages onto low density ones. The projection process can be decomposed into three steps: (a) determining the units of projection; these are typically words but can also be chunks or syntactic constituents; (b) inducing alignments between the projection units and projecting annotations along these alignments; (c) reducing the amount of noise in the projected annotations, often due to errors and omissions in the word alignment.

The degree to which analyses are parallel across languages is crucial for the success of projection approaches. A number of recent studies rely on this notion of parallelism and demonstrate that annotations can be adequately projected for parts of speech (Yarowsky and Ngai, 2001; Xi and Hwa, 2005), chunks (Yarowsky and Ngai, 2001), and dependencies (Hwa et al., 2002).

In previous work (Padó and Lapata, 2005) we considered the annotation projection of semantic roles conveyed by sentential constituents such as AGENT, PATIENT, or INSTRUMENT. Semantic roles exhibit a high degree of parallelism across languages (Boas, 2005) and thus appear amenable to projection. Furthermore, corpora labelled with semantic role information can be used to train shallow semantic parsers (Gildea and Jurafsky, 2002), which could in turn benefit applications in need of broad-coverage semantic analysis. Examples include question answering, information extraction, and notably machine translation.

Our experiments concentrated primarily on the first projection step, i.e., establishing the right level of linguistic analysis for effecting projection. We showed that projection schemes based on constituent alignments significantly outperform schemes that rely exclusively on word alignments. A local optimisation strategy was used to find constituent alignments, while relying on a simple filtering technique to handle noise.

The study described here generalises our earlier semantic role projection framework in two important ways.
[Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 1161–1168, Sydney, July 2006. © 2006 Association for Computational Linguistics.]

First, we formalise constituent projection as the search for a minimum weight edge cover in a weighted bipartite graph. This formalisation efficiently yields constituent alignments that are globally optimal. Second, we propose tree pruning as a general noise reduction strategy, which exploits both structural and linguistic information to enable projection. Furthermore, we quantitatively assess the impact of noise on the task by evaluating both on automatic and manual word alignments.

In Section 2, we describe the task of role-semantic projection and the syntax-based framework introduced in Padó and Lapata (2005). Section 3 explains how semantic role projection can be modelled with minimum weight edge covers in bipartite graphs. Section 4 presents our tree pruning strategy. We present our evaluation framework and results in Section 5. A discussion of related and future work concludes the paper.

2 Cross-lingual Semantic Role Projection

Semantic role projection is illustrated in Figure 1 using English and German as the source-target language pair. We assume a FrameNet-style semantic analysis (Fillmore et al., 2003). In this paradigm, the semantics of predicates and their arguments are described in terms of frames, conceptual structures which model prototypical situations. The English sentence Kim promised to be on time in Figure 1 is an instance of the COMMITMENT frame. In this particular example, the frame introduces two roles, i.e., SPEAKER (Kim) and MESSAGE (to be on time). Other possible, though unrealised, roles are ADDRESSEE, MEDIUM, and TOPIC. The COMMITMENT frame can be introduced by promise and several other verbs and nouns such as consent or threat.
We also assume that frame-semantic annotations can be obtained reliably through shallow semantic parsing.¹ Following the assignment of semantic roles on the English side, (imperfect) word alignments are used to infer semantic alignments between constituents (e.g., to be on time is aligned with pünktlich zu kommen), and the role labels are transferred from one language to the other. Note that role projection can only take place if the source predicate (here promised) is word-aligned to a target predicate (here versprach) evoking the same frame; if this is not the case (e.g., in metaphors), projected roles will not be generally appropriate.

[Figure 1: Projection of semantic roles from English to German (word alignments as dotted lines).]

We represent the source and target sentences as sets of linguistic units, U_s and U_t, respectively. The assignment of semantic roles on the source side is a function roles_s : R → 2^{U_s} from roles to sets of source units. Constituent alignments are obtained in two steps. First, a real-valued function sim : U_s × U_t → R estimates pairwise similarities between source and target units. To make our model robust to alignment noise, we use only content words to compute the similarity function. Next, a decision procedure uses the similarity function to determine the set of semantically equivalent, i.e., aligned units A ⊆ U_s × U_t. Once A is known, semantic projection reduces to transferring the semantic roles from the source units onto their aligned target counterparts:

roles_t(r) = {u_t | ∃ u_s ∈ roles_s(r) : (u_s, u_t) ∈ A}

In Padó and Lapata (2005), we evaluated two main parameters within this framework: (a) the choice of linguistic units and (b) methods for computing semantic alignments.

¹ See Carreras and Màrquez (2005) for an overview of recent approaches to semantic parsing.
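The role transfer step above can be sketched in a few lines of Python (a minimal illustration with hypothetical data structures, not the paper's implementation): given the source-side role assignment roles_s and the alignment set A, each role receives all target constituents aligned to some source constituent bearing that role.

```python
# Sketch of semantic role projection along a constituent alignment.
# roles_s: dict mapping each role to the set of source constituents bearing it.
# A: set of (source_constituent, target_constituent) alignment pairs.
def project_roles(roles_s, A):
    return {role: {ut for (us, ut) in A if us in units}
            for role, units in roles_s.items()}
```

For the Figure 1 example, aligning Kim with its German counterpart and to be on time with pünktlich zu kommen transfers SPEAKER and MESSAGE across the sentence pair.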
Our results revealed that constituent-based models outperformed word-based ones by a wide margin (0.65 F-score vs. 0.46), thus demonstrating the importance of bracketing in amending errors and omissions in the automatic word alignment. We also compared two simplistic alignment schemes, backward alignment and forward alignment. The first scheme aligns each target constituent to its most similar source constituent, whereas the second (A_f) aligns each source constituent to its most similar target constituent:

A_f = {(u_s, u_t) | u_t = argmax_{u' ∈ U_t} sim(u_s, u')}

An example constituent alignment obtained from the forward scheme is shown in Figure 2 (left side). The nodes represent constituents in the source and target language and the edges indicate the resulting alignment. Forward alignment generally outperformed backward alignment (0.65 F-score vs. 0.45). Both procedures have a time complexity quadratic in the maximal number of sentence nodes: O(|U_s||U_t|) = O(max(|U_s|, |U_t|)²).

A shortcoming common to both decision procedures is that they are local, i.e., they optimise the alignment for each node independently of all other nodes. Consider again Figure 2. Here, the forward procedure creates alignments for all source nodes, but leaves constituents from the target set unaligned (see target node (1)). Moreover, local alignment methods constitute a rather weak model of semantic equivalence since they allow one target node to correspond to any number of source nodes (see target node (3) in Figure 2, which is aligned to three source nodes). In fact, by allowing any alignment between constituents, the local models can disregard important linguistic information, thus potentially leading to suboptimal results. We investigate this possibility by proposing well-understood global optimisation models which suitably constrain the resulting alignments.

Besides matching constituents reliably, poor word alignments are a major stumbling block for achieving accurate projections.
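The local forward scheme can be sketched as follows (a minimal illustration; the similarity function is passed in as a parameter rather than fixed to the paper's measure). Because each decision is made independently, several source nodes may share one target node while other target nodes stay unaligned — exactly the weaknesses discussed above.

```python
# Local forward alignment: link every source constituent to its most similar
# target constituent, independently of all other alignment decisions.
def forward_alignment(source_units, target_units, sim):
    return {(us, max(target_units, key=lambda ut: sim(us, ut)))
            for us in source_units}
```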
Previous research addresses this problem in a post-processing step, by re-estimating parameter values (Yarowsky and Ngai, 2001), by applying transformation rules (Hwa et al., 2002), by using manually labelled data (Xi and Hwa, 2005), or by relying on linguistic criteria (Padó and Lapata, 2005). In this paper, we present a novel filtering technique based on tree pruning which removes extraneous constituents in a preprocessing stage, thereby disassociating filtering from the alignment computation.

In the remainder of this paper, we present the details of our global optimisation and filtering techniques. We only consider constituent-based models, since these obtained the best performance in our previous study (Padó and Lapata, 2005).

3 Globally Optimal Constituent Alignment

We model constituent alignment as a minimum weight bipartite edge cover problem. A bipartite graph is a graph G = (V, E) whose node set V is partitioned into two nonempty sets V1 and V2 in such a way that every edge in E joins a node in V1 to a node in V2. In a weighted bipartite graph a weight is assigned to each edge. An edge cover is a subgraph of a bipartite graph such that each node is linked to at least one node of the other partition. A minimum weight edge cover is an edge cover with the least possible sum of edge weights.

In our projection application, the two partitions are the sets of source and target sentence constituents, U_s and U_t, respectively. Each source node is connected to all target nodes and each target node to all source nodes; these edges can be thought of as potential constituent alignments. The edge weights, which represent the (dis)similarity between nodes u_s and u_t, are set to 1 − sim(u_s, u_t).² The minimum weight edge cover then represents the alignment with the maximal similarity between source and target constituents. Below, we present details on graph edge covers and a more restricted kind, minimum weight perfect bipartite matchings. We also discuss their computation.
Edge covers Given a bipartite graph G, a minimum weight edge cover A_e can be defined as:

A_e = argmin_{edge cover E} Σ_{(u_s, u_t) ∈ E} (1 − sim(u_s, u_t))

An example edge cover is illustrated in Figure 2 (middle). Edge covers are somewhat more constrained compared to the local model described above: all source and target nodes have to take part in some alignment. We argue that this is desirable in modelling constituent alignment, since important linguistic units will not be ignored. As can be seen, edge covers allow one-to-many alignments which are common when translating from one language to another. For example, an English constituent might be split into several German constituents or alternatively two English constituents might be merged into a single German constituent. In Figure 2, the source nodes (3) and (4) correspond to target node (4). Since each node of either side has to participate in at least one alignment, edge covers cannot account for insertions arising when constituents in the source language have no counterpart in their target language, or vice versa, as is the case for deletions.

[Figure 2: Constituent alignments and role projections resulting from different decision procedures (U_s, U_t: sets of source and target constituents; r1, r2: two semantic roles). Left: local forward alignment; middle: edge cover; right: perfect matching with dummy nodes.]

Weighted perfect bipartite matchings Perfect bipartite matchings are a more constrained version of edge covers, in which each node has exactly one adjacent edge. This restricts constituent alignment to a bijective function: each source constituent is linked to exactly one target constituent, and vice versa.

² The choice of similarity function is discussed in Section 5.
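On toy graphs, the edge-cover definition can be checked directly by exhaustive search. The sketch below enumerates all edge subsets of the complete bipartite graph, which is exponential in the number of edges and purely illustrative; it is not the polynomial algorithm the paper uses, only a faithful rendering of the argmin definition with weights 1 − sim(u_s, u_t).

```python
from itertools import combinations

# Brute-force minimum weight edge cover over the complete bipartite graph on
# (source_units, target_units). An edge cover must touch every node on both
# sides; among all covers we keep the one with the smallest total weight.
def min_weight_edge_cover(source_units, target_units, sim):
    edges = [(us, ut) for us in source_units for ut in target_units]
    best, best_w = None, float("inf")
    for r in range(1, len(edges) + 1):
        for subset in combinations(edges, r):
            if {us for us, _ in subset} != set(source_units):
                continue  # some source node uncovered
            if {ut for _, ut in subset} != set(target_units):
                continue  # some target node uncovered
            w = sum(1 - sim(us, ut) for us, ut in subset)
            if w < best_w:
                best, best_w = set(subset), w
    return best, best_w
```

With three source and two target constituents, the optimal cover naturally aligns two source nodes to one target node, illustrating the one-to-many alignments the text describes.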
Analogously, a minimum weight perfect bipartite matching A_m is a minimum weight edge cover obeying the one-to-one constraint:

A_m = argmin_{matching M} Σ_{(u_s, u_t) ∈ M} (1 − sim(u_s, u_t))

An example of a perfect bipartite matching is given in Figure 2 (right), where each node has exactly one adjacent edge. Note that the target side contains two nodes labelled (d), a shorthand for "dummy" node. Since sentence pairs will often differ in length, the resulting graph partitions will have different sizes as well. In such cases, dummy nodes are introduced in the smaller partition to enable perfect matching. Dummy nodes are assigned a similarity of zero with all other nodes. Alignments to dummy nodes (such as for source nodes (3) and (6)) are ignored during projection.

Perfect matchings are more restrictive models of constituent alignment than edge covers. Being bijective, the resulting alignments cannot model splitting or merging operations at all. Insertions and deletions can be modelled only indirectly by aligning nodes in the larger partition to dummy nodes on the other side (see the source side in Figure 2, where nodes (3) and (6) are aligned to (d)). Section 5 assesses if these modelling limitations impact the quality of the resulting alignments.

Algorithms Minimum weight perfect matchings in bipartite graphs can be computed efficiently in cubic time using algorithms for network optimisation (Fredman and Tarjan, 1987; time O(|U_s|² log |U_s| + |U_s|²|U_t|)) or algorithms for the equivalent linear assignment problem (Jonker and Volgenant, 1987; time O(max(|U_s|, |U_t|)³)). Their complexity is a linear factor slower than the quadratic runtime of the local optimisation methods presented in Section 2.

The computation of (general) edge covers has been investigated by Eiter and Mannila (1997) in the context of distance metrics for point sets. They show that edge covers can be reduced to minimum weight perfect matchings of an auxiliary bipartite graph with two partitions of size |U_s| + |U_t|. This allows the computation of general minimum weight edge covers in time O((|U_s| + |U_t|)³).

4 Filtering via Tree Pruning

We introduce two filtering techniques which effectively remove constituents from source and target trees before alignment takes place. Tree pruning as a preprocessing step is more general and more efficient than our original post-processing filter (Padó and Lapata, 2005), which was embedded into the similarity function. Not only does tree pruning not interfere with the similarity function, but it also reduces the size of the graph, thus speeding up the algorithms discussed in the previous section. We present two instantiations of tree pruning: word-based filtering, which subsumes our earlier method, and argument-based filtering, which eliminates unlikely argument candidates.

Word-based filtering This technique removes terminal nodes from parse trees according to certain linguistic or alignment-based criteria. We apply two word-based filters in our experiments. The first removes non-content words, i.e., all words which are not adjectives, adverbs, verbs, or nouns, from the source and target sentences (Padó and Lapata, 2005). We also use a novel filter which removes all words which remain unaligned in the automatic word alignment. Non-terminal nodes whose terminals are removed by these filters are also pruned.

Argument filtering Previous work in shallow semantic parsing has demonstrated that not all nodes in a tree are equally probable as semantic roles for a given predicate (Xue and Palmer, 2004). In fact, assuming a perfect parse, there is a "set of likely arguments" to which almost all semantic roles should be assigned. This set of likely arguments consists of all constituents which are a child of some ancestor of the predicate, provided that (a) they do not dominate the predicate themselves and (b) there is no sentence boundary between a constituent and its predicate. This definition covers long-distance dependencies such as control constructions for verbs, or support constructions for nouns and adjectives, and can be extended slightly to accommodate coordination. This argument-based filter reduces target trees to a set of likely arguments. In the example in Figure 3, all tree nodes are removed except Kim and pünktlich zu kommen.

[Figure 3: Filtering of unlikely arguments (predicate in boldface, potential arguments in boxes).]

5 Evaluation Set-up

Data For evaluation, we used the parallel corpus³ from our earlier work (Padó and Lapata, 2005). It consists of 1,000 English-German sentence pairs from the Europarl corpus (Koehn, 2005). The sentences were automatically parsed (using Collins's 1997 parser for English and Dubey's 2005 parser for German) and manually annotated with FrameNet-like semantic roles (see Padó and Lapata 2005 for details).

Word alignments were computed with the GIZA++ toolkit (Och and Ney, 2003), using the entire English-German Europarl bitext as training data (20M words). We used the GIZA++ default settings to induce alignments for both directions (source-target, target-source). Following common practice in MT (Koehn et al., 2003), we considered only their intersection (bidirectional alignments are known to exhibit high precision). We also produced manual word alignments for all sentences in our corpus, using the GIZA++ alignments as a starting point and following the Blinker annotation guidelines (Melamed, 1998).

Method and parameter choice The constituent alignment models we present are unsupervised in that they do not require labelled data for inferring correct alignments.
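The likely-argument set from Section 4 (children of ancestors of the predicate that do not themselves dominate it) can be sketched as follows. The dictionary tree encoding and the function name are assumptions for illustration, and the sentence-boundary condition is omitted for brevity.

```python
# Argument-based tree pruning: keep only constituents that are children of
# some ancestor of the predicate and are not on the path of nodes dominating
# the predicate. `children` maps node -> list of children; `parent` maps
# node -> parent (the root has no entry).
def likely_arguments(children, parent, predicate):
    likely = set()
    spine = {predicate}              # predicate plus nodes dominating it
    node = parent.get(predicate)
    while node is not None:
        for child in children.get(node, []):
            if child not in spine:   # exclude nodes dominating the predicate
                likely.add(child)
        spine.add(node)
        node = parent.get(node)
    return likely
```

For a Figure 3-style tree of Kim versprach, pünktlich zu kommen with predicate versprach, this keeps exactly the subject NP and the infinitival clause, mirroring the example in the text.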
Nevertheless, our models have three parameters: (a) the similarity measure for identifying semantically equivalent constituents; (b) the filtering procedure for removing noise in the data (e.g., wrong alignments); and (c) the decision procedure for projection.

We retained the similarity measure introduced in Padó and Lapata (2005), which computes the overlap between a source constituent and its candidate projection, in both directions. Let y(c_s) and y(c_t) denote the yield of a source and target constituent, respectively, and al(T) the union of all word alignments for a token set T:

sim(c_s, c_t) = |y(c_t) ∩ al(y(c_s))| / |y(c_s)| · |y(c_s) ∩ al(y(c_t))| / |y(c_t)|

We examined three filtering procedures (see Section 4): removing non-aligned words (NA), removing non-content words (NC), and removing unlikely arguments (Arg). These were combined with three decision procedures: local forward alignment (Forward), perfect matching (PerfMatch), and edge cover matching (EdgeCover) (see Section 3). We used Jonker and Volgenant's (1987) solver⁴ to compute weighted perfect matchings.

In order to find optimal parameter settings for our models, we split our corpus randomly into a development and test set (both 50% of the data) and examined the parameter space exhaustively on the development set. The performance of the best models was then assessed on the test data. The models had to predict semantic roles for German, using English gold standard roles as input, and were evaluated against German gold standard roles.

³ The corpus can be downloaded from http://www.coli.uni-saarland.de/~pado/projection/.
⁴ The software is available from http://www.magiclogic.com/assignment.html.
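The bidirectional overlap measure can be sketched as follows (a minimal illustration; the encoding of the word alignment as a dict from each token to the set of tokens it aligns with is an assumption, and non-empty yields are assumed).

```python
# Constituent similarity as bidirectional word overlap: the fraction of the
# target yield covered by alignments of the source yield, multiplied by the
# converse fraction. `align` maps a token to the set of its aligned tokens.
def sim(yield_s, yield_t, align):
    al_s = set().union(*(align.get(w, set()) for w in yield_s))  # al(y(c_s))
    al_t = set().union(*(align.get(w, set()) for w in yield_t))  # al(y(c_t))
    return (len(set(yield_t) & al_s) / len(yield_s)
            * len(set(yield_s) & al_t) / len(yield_t))
```

The product form means a constituent pair scores highly only when the overlap is strong in both directions, which penalises candidates that merely contain a few aligned words.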