{"ID":2850247,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22207","arxiv_id":"2510.22207","title":"The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)","abstract":"Large Language Models (LLMs) can achieve near-optimal lossless compression by acting as powerful probability models. We investigate their use in the lossy domain, where reconstruction fidelity is traded for higher compression ratios. This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec that leverages a Masked Language Model (MLM) as a decompressor. Instead of storing a subset of original tokens, EPC allows the model to predict masked content and stores minimal, rank-based corrections only when the model's top prediction is incorrect. This creates a residual channel that offers continuous rate-distortion control. We compare EPC to a simpler Predictive Masking (PM) baseline and a transform-based Vector Quantisation with a Residual Patch (VQ+RE) approach. Through an evaluation that includes precise bit accounting and rate-distortion analysis, we demonstrate that EPC consistently dominates PM, offering superior fidelity at a significantly lower bit rate by more efficiently utilising the model's intrinsic knowledge.","short_abstract":"Large Language Models (LLMs) can achieve near-optimal lossless compression by acting as powerful probability models. We investigate their use in the lossy domain, where reconstruction fidelity is traded for higher compression ratios. This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec that l...","url_abs":"https://arxiv.org/abs/2510.22207","url_pdf":"https://arxiv.org/pdf/2510.22207v1","authors":"[\"Nnamdi Aghanya\",\"Jun Li\",\"Kewei Wang\"]","published":"2025-10-25T08:18:31Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CL\",\"cs.IT\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
