TIMIT
Canonical · 42 papers using it · First seen 2022
The TIMIT dataset contains a set of phonetically rich recordings of American English speech, used to evaluate speech recognition and phonetic transcription systems.
Papers using TIMIT (42)
- Robust Disentangled Variational Speech Representation Learning For Zero-shot Voice Conversion
- Estimation Of Speaker Age And Height From Speech Signal Using Bi-encoder Transformer Mixture Model
- Disentangled Speech Representation Learning Based On Factorized Hierarchical Variational Autoencoder With Self-supervised Objective
- On Speech Pre-emphasis As A Simple And Inexpensive Method To Boost Speech Enhancement
- AISHELL6-whisper: A Chinese Mandarin Audio-visual Whisper Speech Dataset with Speech Recognition Baselines
- Efficient Multi-channel Speech Enhancement With Spherical Harmonics Injection For Directional Encoding
- BEST-STD: Bidirectional Mamba-enhanced Speech Tokenization For Spoken Term Detection
- Unsupervised Speech Recognition With N-skipgram And Positional Unigram Matching
- FlowW2N: Whispered-to-Normal Speech Conversion via Flow-Matching
- Single Channel Blind Dereverberation of Speech Signals
- BFA: Real-time Multilingual Text-to-speech Forced Alignment
- Evaluating the Representation of Vowels in Wav2Vec Feature Extractor: A Layer-Wise Analysis Using MFCCs
- State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition
- Tradition Or Innovation: A Comparison Of Modern ASR Methods For Forced Alignment
- Complex Recurrent Variational Autoencoder With Application To Speech Enhancement
- Phaseperturbation: Speech Data Augmentation Via Phase Perturbation For Automatic Speech Recognition
- Back To Supervision: Boosting Word Boundary Detection Through Frame Classification
- Hierarchical Modeling Of Spatial Cues Via Spherical Harmonics For Multi-channel Speech Enhancement
- PDPCRN: Parallel Dual-path CRN With Bi-directional Inter-branch Interactions For Multi-channel Speech Enhancement
- Time-frequency Network for Robust Speaker Recognition
- Improving Deep Attractor Network by BGRU and GMM for Speech Separation
- Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment
- Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network
- Phoneme Segmentation Using Self-Supervised Speech Models
- EURO: ESPnet Unsupervised ASR Open-source Toolkit
- Enhancing Unsupervised Speech Recognition with Diffusion GANs
- Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling
- Timestamped Embedding-Matching Acoustic-to-Word CTC ASR
- Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
- PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement
- Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement
- Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding
- Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching
- Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation
- PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
- On Speech Pre-emphasis as a Simple and Inexpensive Method to Boost Speech Enhancement
- REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
- Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition
- MaskCycleGAN-based Whisper to Normal Speech Conversion
- Back to Supervision: Boosting Word Boundary Detection through Frame Classification
- BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection
- Quartered Spectral Envelope and 1D-CNN-based Classification of Normally Phonated and Whispered Speech