{"ID":2854033,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15511","arxiv_id":"2510.15511","title":"Language Models are Injective and Hence Invertible","abstract":"Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.","short_abstract":"Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transform...","url_abs":"https://arxiv.org/abs/2510.15511","url_pdf":"https://arxiv.org/pdf/2510.15511v4","authors":"[\"Giorgos Nikolaou\",\"Tommaso Mencattini\",\"Donato Crisostomi\",\"Andrea Santilli\",\"Yannis Panagakis\",\"Emanuele Rodolà\"]","published":"2025-10-17T10:25:30Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
