{"ID":2865787,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.20789","arxiv_id":"2509.20789","title":"Aligning Inductive Bias for Data-Efficient Generalization in State Space Models","abstract":"The remarkable success of modern AI has been closely tied to scaling laws, yet the finite supply of high-quality data makes data efficiency--learning more from less--an increasingly important frontier. A model's inductive bias is a critical lever for data efficiency, but foundational sequence models such as State Space Models (SSMs) often rely on fixed, task-agnostic biases. When this fixed prior is misaligned with the underlying structure of a task, the model may require additional samples to overcome its own bias before learning the relevant signal. In this work, we introduce a principled framework for understanding and aligning the inductive bias of linear time-invariant SSMs. We first formalize this bias through an SSM-induced kernel and show theoretically and empirically that its spectrum is governed by the model's frequency response. This characterization motivates Task-Dependent Initialization (TDI), a fast power-spectrum matching method that aligns the initial SSM bias with the task's spectral characteristics before downstream training. Across controlled synthetic experiments, trainable one-layer SSMs, and deep SSMs on diverse real-world benchmarks, TDI can improve data-efficient generalization primarily when task-relevant spectral structure is present and the default SSM bias is spectrally mismatched. Our results provide both a theoretical lens and a practical tool for task-adaptive inductive bias, suggesting a path toward more data-efficient sequence modeling.","short_abstract":"The remarkable success of modern AI has been closely tied to scaling laws, yet the finite supply of high-quality data makes data efficiency--learning more from less--an increasingly important frontier. A model's inductive bias is a critical lever for data efficiency, but foundational sequence models such as State Space...","url_abs":"https://arxiv.org/abs/2509.20789","url_pdf":"https://arxiv.org/pdf/2509.20789v4","authors":"[\"Qiyu Chen\",\"Guozhang Chen\"]","published":"2025-09-25T06:14:44Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
