{"ID":2838606,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.17235","arxiv_id":"2511.17235","title":"NX-CGRA: A Programmable Hardware Accelerator for Core Transformer Algorithms on Edge Devices","abstract":"The increasing diversity and complexity of transformer workloads at the edge present significant challenges in balancing performance, energy efficiency, and architectural flexibility. This paper introduces NX-CGRA, a programmable hardware accelerator designed to support a range of transformer inference algorithms, including both linear and non-linear functions. Unlike fixed-function accelerators optimized for narrow use cases, NX-CGRA employs a coarse-grained reconfigurable array (CGRA) architecture with software-driven programmability, enabling efficient execution across varied kernel patterns. The architecture is evaluated using representative benchmarks derived from real-world transformer models, demonstrating high overall efficiency and favorable energy-area tradeoffs across different classes of operations. These results indicate the potential of NX-CGRA as a scalable and adaptable hardware solution for edge transformer deployment under constrained power and silicon budgets.","short_abstract":"The increasing diversity and complexity of transformer workloads at the edge present significant challenges in balancing performance, energy efficiency, and architectural flexibility. This paper introduces NX-CGRA, a programmable hardware accelerator designed to support a range of transformer inference algorithms, incl...","url_abs":"https://arxiv.org/abs/2511.17235","url_pdf":"https://arxiv.org/pdf/2511.17235v1","authors":"[\"Rohit Prasad\"]","published":"2025-11-21T13:26:16Z","proceeding":"cs.AR","tasks":"[\"cs.AR\"]","methods":"[\"Transformer\"]","has_code":false}
