{"ID":3052315,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-06T04:55:27.670236374Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04461","arxiv_id":"2606.04461","title":"ChannelTok: Efficient Flexible-Length Vision Tokenization","abstract":"Leading flexible vision tokenizers achieve SOTA quality at an extreme cost, relying on parameter-heavy backbones and slow, multi-step generative decoders. We depart from this complex, spatial-token paradigm and introduce a simple, lightweight, and fast channel-wise flexible-length tokenizer. Our method treats each latent channel as a visual token, enabling a parameter-efficient CNN-Transformer hybrid backbone. Furthermore, employing a stochastic tail-dropping paradigm during training naturally forces channels to organize by semantic importance. This allows for flexible compression at inference by simply retaining the first $k$ channels, and naturally enables variable-length autoregressive image generation. We validate our approach through extensive experiments on ImageNet, demonstrating consistent quality across diverse token budgets. The results establish a new quality-efficiency frontier: our model achieves state-of-the-art perceptual quality (rFID 2.92) while being $8.6\\times$ faster in decoding and $2.1\\times$ smaller (159M params) than the next-best alternative. Our work establishes channel-wise tokenization as a powerful and practical paradigm for efficient visual representation. Project page: https://channeltok.github.io","short_abstract":"Leading flexible vision tokenizers achieve SOTA quality at an extreme cost, relying on parameter-heavy backbones and slow, multi-step generative decoders. We depart from this complex, spatial-token paradigm and introduce a simple, lightweight, and fast channel-wise flexible-length tokenizer. Our method treats each late...","url_abs":"https://arxiv.org/abs/2606.04461","url_pdf":"https://arxiv.org/pdf/2606.04461v1","authors":"[\"Sukriti Paul\",\"Arpit Bansal\",\"Tom Goldstein\"]","published":"2026-06-03T05:10:51Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\",\"Generative Adversarial Network\",\"Convolutional Neural Network\"]","has_code":false}
