CAMformer: Associative Memory is All You Need

Tergel Molom-Ochir, Benjamin F. Morris, Mark Horton, Chiyue Wei, Cong Guo, Brady Taylor, Peter Liu, Shan X. Wang, Deliang Fan, Hai Helen Li, and Yiran Chen · 2025

arXiv:2511.19740

cs.AR cs.LG

Abstract

Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x better energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators, while maintaining near-lossless accuracy.
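To make the associative-memory view of attention concrete, the following is a minimal software-only sketch, not the accelerator's actual design. The abstract does not give the binarization rule, the similarity measure, the filtering group sizes, or the softmax scaling, so everything here is an assumption made for illustration: sign binarization, a match-count score (dimension minus Hamming distance), and the hypothetical names `sign_bits`, `match_count`, `two_stage_top_k`, and `binary_top_k_attention` with their parameters `group_size`, `k_local`, and `k_final`.

```python
import math


def sign_bits(vec):
    """Assumed binarization: 1 where the entry is >= 0, else 0."""
    return [1 if x >= 0 else 0 for x in vec]


def match_count(q_bits, k_bits):
    """Count matching bit positions (dimension minus Hamming distance).
    A CAM would sense this in parallel; here it is a plain loop."""
    return sum(1 for a, b in zip(q_bits, k_bits) if a == b)


def two_stage_top_k(scores, group_size, k_local, k_final):
    """Stage 1: keep the k_local best indices inside each group of keys.
    Stage 2: keep the k_final best among the stage-1 survivors.
    Ties are broken in favor of the lower index (sorted() is stable)."""
    survivors = []
    for start in range(0, len(scores), group_size):
        group = range(start, min(start + group_size, len(scores)))
        survivors.extend(sorted(group, key=lambda i: -scores[i])[:k_local])
    return sorted(survivors, key=lambda i: -scores[i])[:k_final]


def binary_top_k_attention(q, keys, values, group_size=4, k_local=2, k_final=3):
    """Binary scoring, two-stage filtering, then a full-precision
    softmax-weighted sum over only the selected values."""
    q_bits = sign_bits(q)
    scores = [match_count(q_bits, sign_bits(k)) for k in keys]
    selected = two_stage_top_k(scores, group_size, k_local, k_final)

    # Assumed scaling by sqrt(d), borrowed from standard attention.
    d = len(q)
    logits = [scores[i] / math.sqrt(d) for i in selected]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    out = [
        sum(w * values[i][j] for w, i in zip(weights, selected))
        for j in range(len(values[0]))
    ]
    return out, selected, weights


if __name__ == "__main__":
    # Toy data: 8 keys of dimension 4, 8 values of dimension 2.
    q = [0.9, -0.2, 0.4, -0.7]
    keys = [
        [1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0],
        [1.0, 1.0, 1.0, -1.0], [1.0, -1.0, -1.0, -1.0],
        [-1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0],
        [0.5, -0.5, 0.5, -0.5], [-0.5, 0.5, -0.5, 0.5],
    ]
    values = [[float(i), float(-i)] for i in range(8)]
    out, selected, weights = binary_top_k_attention(q, keys, values)
    print("selected keys:", selected)
    print("weights:", weights)
    print("output:", out)
```

In this sketch only the scoring step uses binary operands; the final weighted sum runs at full precision over the few surviving keys, which is one plausible reading of "high-precision contextualization" in the abstract.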
