{"ID":2896273,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07995","arxiv_id":"2507.07995","title":"Single-pass Adaptive Image Tokenization for Minimum Program Search","abstract":"According to Algorithmic Information Theory (AIT) -- Intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual representation learning systems use fixed-length representations for all inputs, ignoring variations in complexity or familiarity. Recent adaptive tokenization methods address this by allocating variable-length representations but typically require test-time search over multiple encodings to find the most predictive one. Inspired by Kolmogorov Complexity principles, we propose a single-pass adaptive tokenizer, KARL, which predicts the appropriate number of tokens for an image in a single forward pass, halting once its approximate KC is reached. The token count serves as a proxy for the minimum description length. KARL's training procedure closely resembles the Upside-Down Reinforcement Learning paradigm, as it learns to conditionally predict token halting based on a desired reconstruction quality. KARL matches the performance of recent adaptive tokenizers while operating in a single pass. We present scaling laws for KARL, analyzing the role of encoder/decoder size, continuous vs. discrete tokenization and more. Additionally, we offer a conceptual study drawing an analogy between Adaptive Image Tokenization and Algorithmic Information Theory, examining the predicted image complexity (KC) across axes such as structure vs. noise and in- vs. out-of-distribution familiarity -- revealing alignment with human intuition.","short_abstract":"According to Algorithmic Information Theory (AIT) -- Intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual representation learning systems use fixed-length representations for all inputs, ignorin...","url_abs":"https://arxiv.org/abs/2507.07995","url_pdf":"https://arxiv.org/pdf/2507.07995v1","authors":"[\"Shivam Duggal\",\"Sanghyun Byun\",\"William T. Freeman\",\"Antonio Torralba\",\"Phillip Isola\"]","published":"2025-07-10T17:59:53Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
