{"ID":2843099,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07776","arxiv_id":"2511.07776","title":"Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism","abstract":"Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators (SDAs) forces these dynamic behaviors to be implemented statically and/or unoptimized. To address these challenges, we present Streaming Tensor Programs (STeP), a streaming abstraction that enables dynamic tensor workloads to run efficiently on SDAs. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic-shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations, like dynamic tiling, dynamic parallelization, and configuration time-multiplexing, that adapt SDA execution to dynamic behaviors while preserving dataflow efficiency. Using a cycle-approximate simulator on representative LLM layers and a full model with real-world traces, STeP enables: dynamic tiling that breaks the Pareto-optimal frontier from prior work, dynamic parallelization that improves latency by ~2.72x, and configuration time-multiplexing that increases compute utilization by ~2.64x over prior SDA abstractions and their implementations.","short_abstract":"Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators (SDAs) forces these dynamic behaviors to be...","url_abs":"https://arxiv.org/abs/2511.07776","url_pdf":"https://arxiv.org/pdf/2511.07776v2","authors":"[\"Gina Sohn\",\"Genghan Zhang\",\"Konstantin Hossfeld\",\"Jungwoo Kim\",\"Nathan Sobotka\",\"Nathan Zhang\",\"Olivia Hsu\",\"Kunle Olukotun\"]","published":"2025-11-11T02:49:10Z","proceeding":"cs.PL","tasks":"[\"cs.PL\",\"cs.AR\",\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
