Can String Kernels Pass The Test Of Time In Native Language Identification?
2017 · Radu Tudor Ionescu, Marius Popescu
Abstract
We describe a machine learning approach for the 2017 shared task on Native Language Identification (NLI). The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from essays or speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided by the shared task organizers. For the learning stage, we choose Kernel Discriminant Analysis (KDA) over Kernel Ridge Regression (KRR), because the former classifier obtains better results than the latter one on the development set. In our previous work, we have used a similar machine learning approach to achieve state-of-the-art NLI results. The goal of this paper is to demonstrate that our shallow and simple approach based on string kernels (with minor improvements) can pass the test of time and reach state-of-the-art performance in the 2017 NLI shared task, despite the recent
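The pipeline the abstract outlines (character p-gram kernels, a combination of several kernels, and a kernel classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the authors' configuration: the p-spectrum kernel as the string kernel, an unweighted sum of normalized kernels as the combination, the p-gram lengths in `P_VALUES`, the regularization value `lam`, and the toy sentences. The classifier shown is Kernel Ridge Regression, the alternative the abstract compares against, not the Kernel Discriminant Analysis the authors selected, and the i-vector kernel is left out.

```python
import math
from collections import Counter

import numpy as np


def pgram_counts(text, p):
    """Count every contiguous character p-gram in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a, b, p):
    """p-spectrum kernel: sum over shared p-grams of the product of their counts."""
    ca, cb = pgram_counts(a, p), pgram_counts(b, p)
    return sum(n * cb[g] for g, n in ca.items() if g in cb)


def normalized_kernel(a, b, p):
    """Cosine-style normalization so that k(x, x) = 1 for any non-trivial text."""
    denom = math.sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom > 0 else 0.0


def combined_kernel(rows, cols, p_values):
    """Sum the normalized kernels over several p-gram lengths (assumed combination rule)."""
    K = np.zeros((len(rows), len(cols)))
    for p in p_values:
        for i, r in enumerate(rows):
            for j, c in enumerate(cols):
                K[i, j] += normalized_kernel(r, c, p)
    return K


def krr_fit(K_train, labels, n_classes, lam=1e-3):
    """One-vs-all Kernel Ridge Regression: solve (K + lam * I) alpha = Y."""
    n = len(labels)
    Y = -np.ones((n, n_classes))
    Y[np.arange(n), labels] = 1.0
    return np.linalg.solve(K_train + lam * np.eye(n), Y)


def krr_predict(K_test, alpha):
    """Pick the class whose regression score is highest."""
    return (K_test @ alpha).argmax(axis=1)


# Hypothetical toy data: two made-up "native language" classes, two essays each.
P_VALUES = (2, 3, 4)  # arbitrary p-gram lengths chosen for the sketch
train_texts = [
    "the informations are usefull",
    "i have many informations",
    "he go to school every days",
    "she go to the market",
]
train_labels = [0, 0, 1, 1]
test_texts = ["they go to the park"]

K_train = combined_kernel(train_texts, train_texts, P_VALUES)
K_test = combined_kernel(test_texts, train_texts, P_VALUES)

alpha = krr_fit(K_train, train_labels, n_classes=2)
train_pred = krr_predict(K_train, alpha)
test_pred = krr_predict(K_test, alpha)
print("train predictions:", train_pred)
print("test prediction:", test_pred)
```

The classifier only ever sees kernel matrices, so a further kernel computed from another representation of the same samples could be added to `K_train` and `K_test` without changing the fitting code.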
Related papers
- Native Language Identification On Text And Speech (2017)
- Unibuckernel Reloaded: First Place In Arabic Dialect Identification For The Second Year In A Row (2018)
- Native Language Identification Using I-vector (2018)
- Native Language Identification Using Stacked Generalization (2017)
- The Relevance Of Text And Speech Features In Automatic Non-native English Accent Identification (2018)
- Kernel Approximation Methods For Speech Recognition (2017)
- Phonetic Temporal Neural Model For Language Identification (2017)
- Enhancing Neural Spoken Language Recognition: An Exploration With Multilingual Datasets (2025)