{"ID":2840758,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13679","arxiv_id":"2511.13679","title":"QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention","abstract":"Deformable transformers deliver state-of-the-art detection but map poorly to hardware due to irregular memory access and low arithmetic intensity. We introduce QUILL, a schedule-aware accelerator that turns deformable attention into cache-friendly, single-pass work. At its core, Distance-based Out-of-Order Querying (DOOQ) orders queries by spatial proximity; the look-ahead drives a region prefetch into an alternate buffer--forming a schedule-aware prefetch loop that overlaps memory and compute. A fused MSDeformAttn engine executes interpolation, Softmax, aggregation, and the final projection (W''m) in one pass without spilling intermediates, while small tensors are kept on-chip and surrounding dense layers run on integrated GEMMs. Implemented as RTL and evaluated end-to-end, QUILL achieves up to 7.29x higher throughput and 47.3x better energy efficiency than an RTX 4090, and exceeds prior accelerators by 3.26-9.82x in throughput and 2.01-6.07x in energy efficiency. With mixed-precision quantization, accuracy tracks FP32 within \u003c=0.9 AP across Deformable and Sparse DETR variants. By converting sparsity into locality--and locality into utilization--QUILL delivers consistent, end-to-end speedups.","short_abstract":"Deformable transformers deliver state-of-the-art detection but map poorly to hardware due to irregular memory access and low arithmetic intensity. We introduce QUILL, a schedule-aware accelerator that turns deformable attention into cache-friendly, single-pass work. At its core, Distance-based Out-of-Order Querying (DO...","url_abs":"https://arxiv.org/abs/2511.13679","url_pdf":"https://arxiv.org/pdf/2511.13679v1","authors":"[\"Hyunwoo Oh\",\"Hanning Chen\",\"Sanggeon Yun\",\"Yang Ni\",\"Wenjun Huang\",\"Tamoghno Das\",\"Suyeon Jang\",\"Mohsen Imani\"]","published":"2025-11-17T18:34:04Z","proceeding":"cs.AR","tasks":"[\"cs.AR\",\"cs.CV\",\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}
