{"ID":2824530,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.06065","arxiv_id":"2601.06065","title":"Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking","abstract":"The need for long-context reasoning has led to alternative neural network architectures besides Transformers and self-attention, a popular model being Hyena, which employs causal 1D-convolutions implemented with FFTs. Long convolutions enable efficient global context mixing, but requirements for intermediate results exceed the 2-3 MB Block RAM capacity of FPGAs. We present a chunked FFT convolution approach enabling 450K length sequence by 450K length filter convolutions on an Alveo U200 FPGA with 2.8 MB BRAM through chunking and overlap-add reconstruction. We find that throughput scales proportionally with chunk size while degrading minimally by 7% for our longest sequences, demonstrating that careful memory management enables deployment of long-context primitives on edge FPGAs without sacrificing performance.","short_abstract":"The need for long-context reasoning has led to alternative neural network architectures besides Transformers and self-attention, a popular model being Hyena, which employs causal 1D-convolutions implemented with FFTs. Long convolutions enable efficient global context mixing, but requirements for intermediate results ex...","url_abs":"https://arxiv.org/abs/2601.06065","url_pdf":"https://arxiv.org/pdf/2601.06065v1","authors":"[\"Peter Wang\",\"Neelesh Gupta\",\"Viktor Prasanna\"]","published":"2025-12-28T00:03:22Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AR\"]","methods":"[\"Transformer\"]","has_code":false}
