Abstract
This study investigates the extent to which Mel-Frequency Cepstral Coefficients (MFCCs) capture first-language (L1) transfer in extended second-language (L2) English speech. Speech samples from L1 Mandarin and L1 American English speakers were drawn from the GMU Speech Accent Archive, converted to WAV format, and processed to obtain thirteen MFCCs per speaker. A multi-method analytic framework combining inferential statistics (t-tests, MANOVA, canonical discriminant analysis) and machine learning (random forest classification) identified MFCC-1 (broadband energy), MFCC-2 (first formant region), and MFCC-5 (voicing and fricative energy) as the most discriminative features for distinguishing L1 backgrounds. A reduced-feature model using only these three MFCCs significantly outperformed the full thirteen-feature model, as confirmed by McNemar's test and non-overlapping confidence intervals. The findings empirically support the Perceptual Assimilation Model for L2 (PAM-L2) and the Speech Learning Model (SLM).
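
To illustrate the kind of paired model comparison the abstract refers to, the following Python sketch computes an exact two-sided McNemar test from the predictions of two classifiers on the same speakers. It is a minimal sketch under stated assumptions: the function name `mcnemar_exact`, the toy labels, and the choice of the exact binomial variant are illustrative only, and the abstract does not specify which variant of the test the study used.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (hypothetical helper).

    Returns (b, c, p_value), where b counts items model A classified
    correctly and model B did not, and c counts the reverse.
    """
    # Only discordant pairs (one model right, the other wrong) carry information.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # The models never disagree in correctness, so there is no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data, invented for illustration only (1 = Mandarin L1, 0 = English L1).
y_true = [1, 1, 1, 1, 1, 1]
pred_full = [1, 0, 0, 0, 0, 0]     # hypothetical full-feature model
pred_reduced = [1, 1, 1, 1, 1, 1]  # hypothetical reduced-feature model

b, c, p = mcnemar_exact(y_true, pred_full, pred_reduced)
print(b, c, p)
```

Because the test uses only the cases where the two models disagree in correctness, it suits comparisons in which both models are evaluated on the same held-out speakers.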