Paraphrasing and Translation - part 1

Paraphrasing and translation have previously been treated as unconnected natural language processing tasks. Whereas translation represents the preservation of meaning when an idea is rendered in the words of a different language, paraphrasing represents the preservation of meaning when an idea is expressed using different words in the same language.


Pages: 21

Created: 8/29/2018 11:35:05 PM +00:00

File type: PDF

Size: 0.14 M


Paraphrasing and Translation

Chris Callison-Burch

Doctor of Philosophy
Institute for Communicating and Collaborative Systems
School of Informatics
University of Edinburgh
2007

Abstract

Paraphrasing and translation have previously been treated as unconnected natural language processing tasks. Whereas translation represents the preservation of meaning when an idea is rendered in the words in a different language, paraphrasing represents the preservation of meaning when an idea is expressed using different words in the same language. We show that the two are intimately related. The major contributions of this thesis are as follows:

• We define a novel technique for automatically generating paraphrases using bilingual parallel corpora, which are more commonly used as training data for statistical models of translation.
• We show that paraphrases can be used to improve the quality of statistical machine translation by addressing the problem of coverage and introducing a degree of generalization into the models.
• We explore the topic of automatic evaluation of translation quality, and show that the current standard evaluation methodology cannot be guaranteed to correlate with human judgments of translation quality.

Whereas previous data-driven approaches to paraphrasing were dependent upon either data sources which were uncommon, such as multiple translations of the same source text, or language-specific resources such as parsers, our approach is able to harness more widely available parallel corpora and can be applied to any language which has a parallel corpus. The technique was evaluated by replacing phrases with their paraphrases, and asking judges whether the meaning of the original phrase was retained and whether the resulting sentence remained grammatical. Paraphrases extracted from a parallel corpus with manual alignments are judged to be accurate (both meaningful and grammatical) 75% of the time, retaining the meaning of the original phrase 85% of the time.
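
One way to picture paraphrase generation from a bilingual parallel corpus is pivoting: two English phrases aligned to the same foreign phrase are treated as candidate paraphrases. The sketch below is only an illustration under assumptions that the abstract does not state. The function name `paraphrase_probs`, the toy English-German phrase pairs, and the scoring rule p(e2|e1) = sum over f of p(f|e1) * p(e2|f), with relative-frequency estimates, are all chosen here for the example.

```python
from collections import Counter, defaultdict


def paraphrase_probs(phrase_pairs, source_phrase):
    """Score candidate paraphrases of source_phrase by pivoting.

    phrase_pairs: iterable of (english_phrase, foreign_phrase) tuples, assumed
    to have been extracted already from a word-aligned parallel corpus.
    Returns {paraphrase: score}; the scoring rule is an assumption of this sketch.
    """
    foreign_counts = defaultdict(Counter)  # english phrase -> Counter of foreign phrases
    english_counts = defaultdict(Counter)  # foreign phrase -> Counter of english phrases
    for english, foreign in phrase_pairs:
        foreign_counts[english][foreign] += 1
        english_counts[foreign][english] += 1

    pivots = foreign_counts.get(source_phrase, Counter())
    total_source = sum(pivots.values())
    scores = defaultdict(float)
    for foreign, count in pivots.items():
        p_f_given_e1 = count / total_source  # relative frequency of this pivot
        total_foreign = sum(english_counts[foreign].values())
        for candidate, cand_count in english_counts[foreign].items():
            if candidate != source_phrase:  # skip the phrase itself
                scores[candidate] += p_f_given_e1 * cand_count / total_foreign
    return dict(scores)


# Hypothetical toy data, not taken from any real corpus.
toy_pairs = [
    ("under control", "unter kontrolle"),
    ("under control", "unter kontrolle"),
    ("in check", "unter kontrolle"),
    ("under control", "im griff"),
    ("in hand", "im griff"),
]
scores = paraphrase_probs(toy_pairs, "under control")
for phrase, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{phrase}: {score:.3f}")
```

A phrase that never appears in the pairs simply returns an empty dictionary, which mirrors the coverage limits of any corpus-derived model.
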
Using automatic alignments, meaning can be retained at a rate of 70%.

Being a language independent and probabilistic approach allows our method to be easily integrated into statistical machine translation. A paraphrase model derived from parallel corpora other than the one used to train the translation model can be used to increase the coverage of statistical machine translation by adding translations of previously unseen words and phrases. If the translation of a word was not learned, but a translation of a synonymous word has been learned, then the word is paraphrased and its paraphrase is translated. Phrases can be treated similarly. Results show that augmenting a state-of-the-art SMT system with paraphrases in this way leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs, we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.

Acknowledgements

I had the great fortune to be doing research in machine translation at a time when the subject was just beginning to flourish at Edinburgh. When I began my graduate work, I was the only person working on the topic at the university. As I leave, there are five other PhD students, three full-time researchers, and two faculty members all striving towards the same goal. The School of Informatics is undoubtedly the best place in the world to be studying computational linguistics, and the intellectual community here is simply amazing. I am grateful to every member of that community but would like to single out the following people to whom I am especially indebted:

• My PhD supervisor, Miles Osborne, whose data-intensive linguistics class opened my eyes to statistical NLP and played a crucial role in my deciding to stay at Edinburgh for the PhD. His endlessly creative ideas and boundless enthusiasm made our weekly meetings in his office (and at the pub) a true joy.
As much as it is due to any one person, my success at Edinburgh is due to Miles.
• My best friend and business partner, Colin Bannard, without whom I would not have founded Linear B. One of my fondest memories of Edinburgh is sitting in our living room trying to name the company. Linear B was perfect since it allowed us to convey to investors that we use clever methods to decipher foreign languages, while at the same time tacitly acknowledging that it might take us decades to do so.
• Josh Schroeder, who is the primary reason that it did not take decades to achieve all that we did at Linear B. Josh lived in the boxroom in my flat for a year, intrepidly writing code so elegant and easy to maintain that I still use it to this day. Linear B put me in the enviable position of having two full-time programmers working for me during my PhD. The quality and amount of research that I was able to produce as a result far outstripped what I would have been able to do alone.
• Philipp Koehn joined the faculty at Edinburgh after I hounded him to apply and then lobbied the head of the school to allow student input into the hiring decision (a diplomatic means of me getting my way). When Philipp arrived at the university he became the center of gravity for the machine translation group and allowed us to form a coherent whole. He has been a wonderful collaborator and I value the time that I had to work with him.

...
