{"ID":3050092,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T11:27:32.998563389Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04752","arxiv_id":"2606.04752","title":"An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers","abstract":"Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders -- spanning a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding -- on a synthetic benchmark designed to make channel identity informative and on ETTh1 as a real-data check, measured in next-step negative log-likelihood (NLL). The headline is one of practical near-equivalence within a wide \"top tier\": the standard per-channel linear projection (nn.Linear(C, $d_{\\text{model}}$)) matches every alternative in that tier up to small, statistically real but practically modest, differences. Two encoders lose decisively: the shared-scalar baseline, which collapses for information-theoretic reasons we make explicit, and the channel-independent PatchTST-spirit baseline, which underperforms on both benchmarks and overfits universally on the synthetic one. Paired tests resolve two small gaps: projecting the sinusoidal positional encoding through a learned linear layer edges the rest at small $C$, with a direct geometric probe showing the mechanism is positional-channel orthogonalisation; a nonlinear MLP stem edges them at the largest $C$ we test, with the gap shrinking under more training data. The practical recommendation is to use nn.Linear(C, $d_{\\text{model}}$) by default and reach for something more elaborate only when the task at hand gives a real reason to do so. Code and data to reproduce every experiment in this paper are available at https://github.com/OssiLehtinen/channel-encoder-audit","short_abstract":"Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders -- spanning a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-p...","url_abs":"https://arxiv.org/abs/2606.04752","url_pdf":"https://arxiv.org/pdf/2606.04752v1","authors":"[\"Ossi Lehtinen\"]","published":"2026-06-03T11:35:09Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":612779,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-04T02:13:16.786527022Z","DeletedAt":null,"paper_id":3050092,"paper_url":"https://arxiv.org/abs/2606.04752","paper_title":"An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers","repo_url":"https://github.com/OssiLehtinen/channel-encoder-audit","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
