Zhaozhuo Xu
14 papers · 227 citations
Most-cited papers
- KV Cache Is 1 Bit Per Channel: Efficient Large Language Model Inference With Coupled Quantization · 2024 · 75 citations
- KV Cache Compression, But What Must We Give In Return? A Comprehensive Benchmark Of Long Context Capable Approaches · 2024 · 44 citations
- Zeroth-order Fine-tuning Of LLMs With Extreme Sparsity · 2024 · 34 citations
- Nomad-attention: Efficient LLM Inference On CPUs Through Multiply-add-free Attention · 2024 · 21 citations
Topics