Bilingual Lexicon Induction Through Unsupervised Machine Translation
2019 · Mikel Artetxe, Gorka Labaka, Eneko Agirre
Abstract
A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on the recent work on unsupervised machine translation. This way, instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase-table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-li
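For orientation, the nearest-neighbor retrieval baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration only, not the authors' code: it assumes the two embedding spaces have already been mapped into a shared cross-lingual space, and the toy vectors, the `cosine` helper, and the `induce_lexicon` function are hypothetical names introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_lexicon(src_emb, tgt_emb):
    """Map each source word to its nearest target word by cosine similarity.

    Both arguments are dicts from word to vector, assumed to live in the
    same (already aligned) cross-lingual space.
    """
    lexicon = {}
    for src_word, src_vec in src_emb.items():
        lexicon[src_word] = max(
            tgt_emb, key=lambda tgt_word: cosine(src_vec, tgt_emb[tgt_word])
        )
    return lexicon


# Toy, hand-made vectors (hypothetical data, for illustration only).
src_emb = {"dog": [0.9, 0.1], "house": [0.1, 0.9]}
tgt_emb = {"perro": [0.8, 0.2], "casa": [0.2, 0.8]}

lexicon = induce_lexicon(src_emb, tgt_emb)
```

The approach proposed in the abstract replaces this direct retrieval step with a longer pipeline (phrase-table, language model, synthetic parallel corpus, word alignment), which this sketch does not attempt to reproduce.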
Related papers
- Aligning Multilingual Word Embeddings For Cross-modal Retrieval Task (2019)
- Massively Multilingual Sentence Embeddings For Zero-shot Cross-lingual Transfer And Beyond (2018)
- Evaluating Multilingual Text Encoders For Unsupervised Cross-lingual Retrieval (2021)
- Margin-based Parallel Corpus Mining With Multilingual Sentence Embeddings (2018)
- CL2CM: Improving Cross-lingual Cross-modal Retrieval Via Cross-lingual Knowledge Transfer (2023)
- On Cross-lingual Retrieval With Multilingual Text Encoders (2021)
- UC2: Universal Cross-lingual Cross-modal Vision-and-language Pre-training (2021)
- Image Search Using Multilingual Texts: A Cross-modal Learning Approach Between Image And Text (2019)