AISHELL-1
Emerging
73 papers using it
First seen: 2022
Papers using AISHELL-1 (72)
- M2r-whisper: Multi-stage And Multi-scale Retrieval Augmentation For Enhancing Whisper
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition
- Fast-u2++: Fast And Accurate End-to-end Speech Recognition In Joint Ctc/attention Frames
- Streaming Decoder-only Automatic Speech Recognition With Discrete Speech Units: A Pilot Study
- Sscformer: Push The Limit Of Chunk-wise Conformer For Streaming ASR Using Sequentially Sampled Chunks And Chunked Causal Convolution
- Efficientasr: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization
- Enhancing The Unified Streaming And Non-streaming Model With Contrastive Learning
- Bridging Speech And Text: Enhancing ASR With Pinyin-to-character Pre-training In Llms
- Effectiveasr: A Single-step Non-autoregressive Mandarin Speech Recognition Architecture With High Accuracy And Inference Speed
- PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
- Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
- CUSIDE-T: Chunking, Simulating Future And Decoding For Transducer Based Streaming ASR
- Linguistic-enhanced Transformer With CTC Embedding For Speech Recognition
- Retrieval-Augmented Self-Taught Reasoning Model with Adaptive Chain-of-Thought for ASR Named Entity Correction
- IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition
- Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
- End-to-end Speech Recognition with similar length speech and text
- A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR
- Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
- IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing
- CR-CTC: Consistency regularization on CTC for improved speech recognition
- M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
- EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed
- Nextformer: A Convnext Augmented Conformer For End-to-end Speech Recognition
- A CTC Triggered Siamese Network With Spatial-temporal Dropout For Speech Recognition
- Research On An Improved Conformer End-to-end Speech Recognition Model With R-drop Structure
- Hypr: A Comprehensive Study For ASR Hypothesis Revising With A Reference Corpus
- CIF-T: A Novel Cif-based Transducer Architecture For Automatic Speech Recognition
- Zipformer: A faster and better encoder for automatic speech recognition
- Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition
- Improving Mandarin Speech Recogntion with Block-augmented Transformer
- Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation
- Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
- A CTC Triggered Siamese Network with Spatial-Temporal Dropout for Speech Recognition
- Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
- PSVRF: Learning to restore Pitch-Shifted Voice without reference
- A context-aware knowledge transferring strategy for CTC-based ASR
- Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
- SAN: a robust end-to-end ASR model architecture
- Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
- Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition
- SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
- Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
- Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
- Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
- Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
- A Lexical-aware Non-autoregressive Transformer-based ASR Model
- GNCformer Enhanced Self-attention for Automatic Speech Recognition
- Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
- Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
- Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
- TST: Time-Sparse Transducer for Automatic Speech Recognition
- CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
- ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
- HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
- Cross-modal Alignment with Optimal Transport for CTC-based ASR
- Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
- Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
- EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
- Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
- Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
- Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
- CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
- HydraFormer: One Encoder For All Subsampling Rates
- An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition
- Large Language Model Should Understand Pinyin for Chinese ASR Error Correction
- Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
- Deep CLAS: Deep Contextual Listen, Attend and Spell
- Sample adaptive data augmentation with progressive scheduling
- UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
- UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models