cs.CL
50 papers tagged cs.CL (ordered by heat_score)
Papers
- Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference (2026)Sangyun Lee et al.15.03
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research (2026)Dingbang Wu et al.14.25
- Rethinking Memory as Continuously Evolving Connectivity (2026)Jizhan Fang et al.13.31
- Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling (2026)Xinglin Wang et al.13.12
- GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection (2026)Zheng Wu et al.12.92
- Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning (2026)Jiapeng Zhu et al.12.46
- OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration (2026)Xinchen Zhang et al.11.20
- Coding Speech through Vocal Tract Kinematics (2025)Cheol Jun Cho et al.11.19
- QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents (2026)Ye Yuan et al.10.67
- Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders (2026)Yi Jing et al.10.61
- Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement (2026)Dingwei Chen et al.10.27
- ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence (2026)Rui Meng et al.10.05
- Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems (2026)Bingyu Yan et al.9.75
- MobileMoE: Scaling On-Device Mixture of Experts (2026)Yanbei Chen et al.9.24
- Models That Know How Evaluations Are Designed Score Safer (2026)Katharina Deckenbach et al.9.04
- Multi-Agent Causal Discovery Using Large Language Models (2026)Hao Duong Le et al.7.59
- Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization (2026)Anmol Agarwal et al.6.98
- Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models (2026)Yifan Jiang et al.6.98
- CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations (2026)Mike Zhang et al.6.17
- SIA: Self Improving AI with Harness & Weight Updates (2026)Prannay Hebbar et al.5.68
- Real-time Speech Summarization for Medical Conversations (2025)Khai Le-Duc et al.5.24
- Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions (2026)Antonia Karamolegkou et al.5.06
- Advancing Creative Physical Intelligence in Large Multimodal Models (2026)Cheng Qian et al.5.04
- Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases (2026)Dongyoon Hahm et al.5.04
- DEPART: DEcomposing PARiTy across Multilingual LLMs (2026)Manan Uppadhyay et al.4.54
- Framing Matters: Addressing Framing Sensitivity in Decision-Making through Behaviorally-Grounded Value Alignment (2026)Seojin Hwang et al.4.54
- Pruning and Distilling Mixture-of-Experts into Dense Language Models (2026)Junhyuck Kim et al.4.54
- When Helpful Context Leaks: Privacy Risks in Domain-Adapted ASR (2026)Maike Züfle et al.4.54
- Analyzing Quality-Latency-Resource Trade-offs in a Technical Documentation RAG Assistant Using LoRA Adaptation (2026)Evgenii Palnikov et al.4.54
- Why We Need Speech to Evaluate Speech Translation (2026)Maike Züfle et al.4.54
- PrunePath: Towards Highly Structured Sparse Language Models (2026)Zhexuan Gu et al.4.54
- Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning (2026)Yahan Yu et al.4.54
- Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models (2026)Guanzhi Deng et al.4.54
- PubMedCausal: A Span-Level Annotated Corpus for Causal Relation Extraction in Biomedical Text (2026)Ifeoluwa Kunle-John et al.4.54
- FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning (2026)Haihui Pan et al.4.54
- Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts (2026)Prasenjit K Mudi et al.4.54
- AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates (2026)Shaolong Chen et al.4.54
- The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment (2026)Junyu Lu et al.4.54
- Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards (2026)Saurabh Dash et al.4.54
- Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News (2026)Alejandro Buitrago López et al.4.54
- Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents (2026)Zheng Wu et al.4.54
- Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study (2026)Irune Zubiaga et al.4.54
- IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents (2026)Michael Galarnyk et al.4.54
- The Abstraction Gap in Vision-Language Causal Reasoning (2026)Chinh Hoang et al.4.54
- Skill-Conditioned Gated Self-Distillation for LLM Reasoning (2026)Jiazhen Huang et al.4.54
- Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization (2026)Beiduo Chen et al.4.54
- VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading (2026)Jinzhou Wu et al.4.54
- Finding Pareto Trade-offs in Fair and Accurate Detection of Toxic Speech (2025)Soumyajit Gupta et al.4.52
- AgentAtlas: Beyond Outcome Leaderboards for LLM Agents (2026)Parsa Mazaheri et al.3.91
- MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation (2026)Szu-Chi Chen et al.3.87