A Reproduction Of Apple’s Bi-directional LSTM Models For Language Identification In Short Strings

Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent · Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop · 2021

Language Identification is the task of identifying a document’s language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model’s performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.
