Qwen
Emerging
11 papers using it
First seen: 2025
Papers using Qwen (11)
- EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation
- ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference
- Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization
- GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
- Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging
- NIRVANA: Structured pruning reimagined for large language models compression
- Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
- TransMLA: Multi-head Latent Attention Is All You Need
- ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
- Accelerating Large Language Model Reasoning via Speculative Search
- LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades