Awesome Papers

Papers

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference (2026)
Sangyun Lee et al.
15.03
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research (2026)
Dingbang Wu et al.
14.25
Rethinking Memory as Continuously Evolving Connectivity (2026)
Jizhan Fang et al.
13.31
OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration (2026)
Xinchen Zhang et al.
11.20
Coding Speech through Vocal Tract Kinematics (2025)
Cheol Jun Cho et al.
11.19
Recursive Flow Matching (2026)
Jiahe Huang et al.
11.02
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents (2026)
Ye Yuan et al.
10.67
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders (2026)
Yi Jing et al.
10.61
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models (2026)
Mingze Wang et al.
10.48
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence (2026)
Rui Meng et al.
10.05
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions (2026)
Yuxin Chen et al.
10.05
Channel-wise Vector Quantization (2026)
Wei Song et al.
9.33
MobileMoE: Scaling On-Device Mixture of Experts (2026)
Yanbei Chen et al.
9.24
Models That Know How Evaluations Are Designed Score Safer (2026)
Katharina Deckenbach et al.
9.04
Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows (2026)
Harshada Badave et al.
8.91
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance (2025)
Qijun Gan et al.
8.70
Less is More: Early Stopping Rollout for On-Policy Distillation (2026)
Zhou Ziheng et al.
8.54
Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments (2026)
Yuxin Chen et al.
8.11
Multi-Agent Causal Discovery Using Large Language Models (2026)
Hao Duong Le et al.
7.59
Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation (2026)
Shuhong Zheng et al.
7.39
MEMS and ECM Sensor Technologies for Cardiorespiratory Sound Monitoring - A Comprehensive Review (2025)
Yasaman Torabi et al.
7.16
Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization (2026)
Anmol Agarwal et al.
6.98
On the Push-Based Asynchronous Federated Learning: A Bias-Correction Aggregation Approach (2026)
Jiahui Bai et al.
6.77
Can LLMs Introspect? A Reality Check (2026)
Shashwat Singh et al.
6.17
CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations (2026)
Mike Zhang et al.
6.17
Cross-scale Aligned Supervision for Training GANs (2026)
Sangeek Hyun et al.
6.17
SIA: Self Improving AI with Harness & Weight Updates (2026)
Prannay Hebbar et al.
5.68
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts (2025)
Paige Tuttosi et al.
5.24
Real-time Speech Summarization for Medical Conversations (2025)
Khai Le-Duc et al.
5.24
Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions (2026)
Antonia Karamolegkou et al.
5.06
Preference-Shaped Expected Hypervolume and R2 Improvement: Exact Computation and Monotonicity (2026)
Michael T. M. Emmerich
5.06
Advancing Creative Physical Intelligence in Large Multimodal Models (2026)
Cheng Qian et al.
5.04
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases (2026)
Dongyoon Hahm et al.
5.04
A Sharper Picture of Generalization in Transformers (2026)
Paul Lintilhac et al.
4.54
Variance Reduction for Expectations with Diffusion Teachers (2026)
Jesse Bettencourt et al.
4.54
DEPART: DEcomposing PARiTy across Multilingual LLMs (2026)
Manan Uppadhyay et al.
4.54
Pruning and Distilling Mixture-of-Experts into Dense Language Models (2026)
Junhyuck Kim et al.
4.54
PrunePath: Towards Highly Structured Sparse Language Models (2026)
Zhexuan Gu et al.
4.54
Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning (2026)
Yahan Yu et al.
4.54
Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models (2026)
Guanzhi Deng et al.
4.54
The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment (2026)
Junyu Lu et al.
4.54
Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News (2026)
Alejandro Buitrago López et al.
4.54
Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study (2026)
Irune Zubiaga et al.
4.54
IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents (2026)
Michael Galarnyk et al.
4.54
Skill-Conditioned Gated Self-Distillation for LLM Reasoning (2026)
Jiazhen Huang et al.
4.54
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents (2026)
Parsa Mazaheri et al.
3.91
Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images (2026)
Joséphine Raugel et al.
3.91
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation (2026)
Szu-Chi Chen et al.
3.87
Promoting the Responsible Development of Speech Datasets for Mental Health and Neurological Disorders Research (2025)
Eleonora Mancini et al.
3.58
Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations (2025)
Bulat Khaertdinov et al.
3.58