{"ID":2841690,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11243","arxiv_id":"2511.11243","title":"Arcee: Differentiable Recurrent State Chain for Generative Vision Modeling with Mamba SSMs","abstract":"State-space models (SSMs), Mamba in particular, are increasingly adopted for long-context sequence modeling, providing linear-time aggregation via an input-dependent, causal selective-scan operation. Along this line, recent \"Mamba-for-vision\" variants largely explore multiple scan orders to relax strict causality for non-sequential signals (e.g., images). Rather than preserving cross-block memory, the conventional formulation of the selective-scan operation in Mamba reinitializes each block's state-space dynamics from zero, discarding the terminal state-space representation (SSR) from the previous block. Arcee, a cross-block recurrent state chain, reuses each block's terminal state-space representation as the initial condition for the next block. Handoff across blocks is constructed as a differentiable boundary map whose Jacobian enables end-to-end gradient flow across terminal boundaries. Key to practicality, Arcee is compatible with all prior \"vision-mamba\" variants, parameter-free, and incurs constant, negligible cost. As a modeling perspective, we view terminal SSR as a mild directional prior induced by a causal pass over the input, rather than an estimator of the non-sequential signal itself. To quantify the impact, for unconditional generation on CelebA-HQ (256$\\times$256) with Flow Matching, Arcee reduces FID$\\downarrow$ from $82.81$ to $15.33$ ($5.4\\times$ lower) on a single scan-order Zigzag Mamba baseline. Efficient CUDA kernels and training code will be released to support rigorous and reproducible research.","short_abstract":"State-space models (SSMs), Mamba in particular, are increasingly adopted for long-context sequence modeling, providing linear-time aggregation via an input-dependent, causal selective-scan operation. Along this line, recent \"Mamba-for-vision\" variants largely explore multiple scan orders to relax strict causality for n...","url_abs":"https://arxiv.org/abs/2511.11243","url_pdf":"https://arxiv.org/pdf/2511.11243v2","authors":"[\"Jitesh Chavan\",\"Rohit Lal\",\"Anand Kamat\",\"Mengjia Xu\"]","published":"2025-11-14T12:44:02Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
