{"ID":2831585,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07331","arxiv_id":"2512.07331","title":"The Inductive Bottleneck: Data-Driven Emergence of Representational Sparsity in Vision Transformers","abstract":"Vision Transformers (ViTs) lack the hierarchical inductive biases inherent to Convolutional Neural Networks (CNNs), theoretically allowing them to maintain high-dimensional representations throughout all layers. However, recent observations suggest ViTs often spontaneously manifest a \"U-shaped\" entropy profile, compressing information in middle layers before expanding it for the final classification. In this work, we demonstrate that this \"Inductive Bottleneck\" is not an architectural artifact, but a data-dependent adaptation. By analyzing the layer-wise Effective Encoding Dimension (EED) of DINO-trained ViTs across datasets of varying compositional complexity (UC Merced, Tiny ImageNet, and CIFAR-100), we show that the depth of the bottleneck correlates strongly with the semantic abstraction required by the task. We find that while texture-heavy datasets preserve high-rank representations throughout, object-centric datasets drive the network to dampen high-frequency information in middle layers, effectively \"learning\" a bottleneck to isolate semantic features.","short_abstract":"Vision Transformers (ViTs) lack the hierarchical inductive biases inherent to Convolutional Neural Networks (CNNs), theoretically allowing them to maintain high-dimensional representations throughout all layers. 
However, recent observations suggest ViTs often spontaneously manifest a \"U-shaped\" entropy profile-compress...","url_abs":"https://arxiv.org/abs/2512.07331","url_pdf":"https://arxiv.org/pdf/2512.07331v1","authors":"[\"Kanishk Awadhiya\"]","published":"2025-12-08T09:18:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Convolutional Neural Network\"]","has_code":false}
