MERaLiON-SpeechEncoder: Towards A Speech Foundation Model For Singapore And Beyond
2024 · Muhammad Huzaifah, Geyu Lin, Tianchi Liu, et al.
Abstract
This technical report describes the MERaLiON-SpeechEncoder, a foundation model designed to support a wide range of downstream speech applications. Developed as part of Singapore's National Multimodal Large Language Model Programme, the MERaLiON-SpeechEncoder is tailored to the speech processing needs of Singapore and the surrounding Southeast Asian region. The model currently supports mainly English, including the variety spoken in Singapore. We are actively expanding our datasets to gradually cover other languages in subsequent releases. The MERaLiON-SpeechEncoder was pre-trained from scratch on 200,000 hours of unlabelled speech data using a self-supervised learning approach based on masked language modelling. We describe our training procedure and hyperparameter tuning experiments in detail below. Our evaluation demonstrates improvements on spontaneous and Singapore speech benchmarks for speech recognition, while remaining competitive with other state-of-the-art speech encoder
Related papers
- MERaLiON-AudioLLM: Bridging Audio And Language With Large Language Models (2024)
- Advancing Singlish Understanding: Bridging The Gap With Datasets And Multimodal Models (2025)
- Spoken Language Identification System For English-Mandarin Code-switching Child-directed Speech (2023)
- FireRedASR: Open-source Industrial-grade Mandarin Speech Recognition Models From Encoder-decoder To LLM Integration (2025)
- MERLIon CCS Challenge: A English-Mandarin Code-switching Child-directed Speech Corpus For Language Identification And Diarization (2023)
- A Comprehensive Solution To Connect Speech Encoder And Large Language Model For ASR (2024)
- MMMModal -- Multi-images Multi-audio Multi-turn Multi-modal (2024)
- LESS: Large Language Model Enhanced Semi-supervised Learning For Speech Foundational Models Using In-the-wild Data (2025)