{"ID":2845503,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04768","arxiv_id":"2511.04768","title":"FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow","abstract":"As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sparse machine learning models written in PyTorch to fused sparse dataflow graphs for reconfigurable dataflow architectures (RDAs). FuseFlow is the first compiler to support general cross-expression fusion of sparse operations. In addition to fusion across kernels (expressions), FuseFlow also supports optimizations like parallelization, dataflow ordering, and sparsity blocking. It targets a cycle-accurate dataflow simulator for microarchitectural analysis of fusion strategies. We use FuseFlow for design-space exploration across four real-world machine learning applications with sparsity, showing that full fusion (entire cross-expression fusion across all computation in an end-to-end model) is not always optimal for sparse models: fusion granularity depends on the model itself. FuseFlow also provides a heuristic to identify and prune suboptimal configurations. Using FuseFlow, we achieve performance improvements, including a ~2.7x speedup over an unfused baseline for GPT-3 with BigBird block-sparse attention.","short_abstract":"As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. \nWe propose FuseFlow, a compiler that converts sparse machine learning models written in PyTorch to fused sparse dataflow graphs for reconfigurable dataflow architectures (RDAs)....","url_abs":"https://arxiv.org/abs/2511.04768","url_pdf":"https://arxiv.org/pdf/2511.04768v2","authors":"[\"Rubens Lacouture\",\"Nathan Zhang\",\"Ritvik Sharma\",\"Marco Siracusa\",\"Fredrik Kjolstad\",\"Kunle Olukotun\",\"Olivia Hsu\"]","published":"2025-11-06T19:40:20Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AR\",\"cs.PL\"]","methods":"[\"LoRA\"]","has_code":false}
