Awesome Papers

Papers

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference (2026)
Sangyun Lee et al.
15.03
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research (2026)
Dingbang Wu et al.
14.25
Rethinking Memory as Continuously Evolving Connectivity (2026)
Jizhan Fang et al.
13.31
Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling (2026)
Xinglin Wang et al.
13.12
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection (2026)
Zheng Wu et al.
12.92
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning (2026)
Jiapeng Zhu et al.
12.46
OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration (2026)
Xinchen Zhang et al.
11.20
Coding Speech through Vocal Tract Kinematics (2025)
Cheol Jun Cho et al.
11.19
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents (2026)
Ye Yuan et al.
10.67
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders (2026)
Yi Jing et al.
10.61
Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement (2026)
Dingwei Chen et al.
10.27
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence (2026)
Rui Meng et al.
10.05
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems (2026)
Bingyu Yan et al.
9.75
MobileMoE: Scaling On-Device Mixture of Experts (2026)
Yanbei Chen et al.
9.24
Models That Know How Evaluations Are Designed Score Safer (2026)
Katharina Deckenbach et al.
9.04
Multi-Agent Causal Discovery Using Large Language Models (2026)
Hao Duong Le et al.
7.59
Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization (2026)
Anmol Agarwal et al.
6.98
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models (2026)
Yifan Jiang et al.
6.98
CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations (2026)
Mike Zhang et al.
6.17
SIA: Self Improving AI with Harness & Weight Updates (2026)
Prannay Hebbar et al.
5.68
Real-time Speech Summarization for Medical Conversations (2025)
Khai Le-Duc et al.
5.24
Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions (2026)
Antonia Karamolegkou et al.
5.06
Advancing Creative Physical Intelligence in Large Multimodal Models (2026)
Cheng Qian et al.
5.04
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases (2026)
Dongyoon Hahm et al.
5.04
DEPART: DEcomposing PARiTy across Multilingual LLMs (2026)
Manan Uppadhyay et al.
4.54
Framing Matters: Addressing Framing Sensitivity in Decision-Making through Behaviorally-Grounded Value Alignment (2026)
Seojin Hwang et al.
4.54
Pruning and Distilling Mixture-of-Experts into Dense Language Models (2026)
Junhyuck Kim et al.
4.54
When Helpful Context Leaks: Privacy Risks in Domain-Adapted ASR (2026)
Maike Züfle et al.
4.54
Analyzing Quality-Latency-Resource Trade-offs in a Technical Documentation RAG Assistant Using LoRA Adaptation (2026)
Evgenii Palnikov et al.
4.54
Why We Need Speech to Evaluate Speech Translation (2026)
Maike Züfle et al.
4.54
PrunePath: Towards Highly Structured Sparse Language Models (2026)
Zhexuan Gu et al.
4.54
Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning (2026)
Yahan Yu et al.
4.54
Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models (2026)
Guanzhi Deng et al.
4.54
PubMedCausal: A Span-Level Annotated Corpus for Causal Relation Extraction in Biomedical Text (2026)
Ifeoluwa Kunle-John et al.
4.54
FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning (2026)
Haihui Pan et al.
4.54
Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts (2026)
Prasenjit K Mudi et al.
4.54
AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates (2026)
Shaolong Chen et al.
4.54
The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment (2026)
Junyu Lu et al.
4.54
Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards (2026)
Saurabh Dash et al.
4.54
Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News (2026)
Alejandro Buitrago López et al.
4.54
Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents (2026)
Zheng Wu et al.
4.54
Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study (2026)
Irune Zubiaga et al.
4.54
IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents (2026)
Michael Galarnyk et al.
4.54
The Abstraction Gap in Vision-Language Causal Reasoning (2026)
Chinh Hoang et al.
4.54
Skill-Conditioned Gated Self-Distillation for LLM Reasoning (2026)
Jiazhen Huang et al.
4.54
Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization (2026)
Beiduo Chen et al.
4.54
VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading (2026)
Jinzhou Wu et al.
4.54
Finding Pareto Trade-offs in Fair and Accurate Detection of Toxic Speech (2025)
Soumyajit Gupta et al.
4.52
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents (2026)
Parsa Mazaheri et al.
3.91
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation (2026)
Szu-Chi Chen et al.
3.87