{"ID":3004771,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:43:53.432517148Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03739","arxiv_id":"2606.03739","title":"Entropy Gate: Entropy Quenching for Near-Lossless Token Compression in LLM Pipelines","abstract":"LLM pipelines waste substantial token budgets on low-information content: repeated context, verbose responses, and redundant boilerplate. We introduce Entropy Gate, a token compression framework applying entropy quenching $-$ a thermodynamic process that progressively freezes out low-energy tokens while preserving semantic fidelity. Each token receives a multi-factor information energy $E(t)$ combining statistical, structural, and positional components. An adaptive quenching schedule $T(τ) = T_0 / (1 + ατ)$ removes tokens whose Boltzmann survival probability $p_i = \\exp(-E_i / kT)$ falls below threshold, with a fidelity gate halting compression when energy-weighted similarity drops below $θ$. We prove token selection by descending $E(t)$ maximizes expected semantic preservation, that quenching produces nested survival sets, and that achievable compression approaches the information-theoretic limit $\\text{CR} \\to 1 - I(P; T)/H(P)$. A Phase 1 heuristic achieves 40-60% compression across five prompt categories while maintaining $S_E \u003e 0.80$, with energy-squared amplification $E \\to E^2$ adding 10-25 percentage points. Context deduplication adds 50-70% savings on repeated blocks. Output-side quenching, motivated by findings that brevity improves accuracy, further reduces response overhead. Combined with external memory, reduction composes multiplicatively to 88-96% for agentic workloads. The framework is stateless, model-agnostic, and deploys as an OpenAI-compatible HTTP proxy.","short_abstract":"LLM pipelines waste substantial token budgets on low-information content: repeated context, verbose responses, and redundant boilerplate. We introduce Entropy Gate, a token compression framework applying entropy quenching $-$ a thermodynamic process that progressively freezes out low-energy tokens while preserving sema...","url_abs":"https://arxiv.org/abs/2606.03739","url_pdf":"https://arxiv.org/pdf/2606.03739v1","authors":"[\"Justice Owusu Agyemang\",\"Jerry John Kponyo\",\"Kwame Opuni-Boachie Obour Agyekum\",\"Francisca Adoma Acheampong\",\"Kwame Agyeman-Prempeh Agyekum\",\"James Dzisi Gadze\"]","published":"2026-06-02T14:55:02Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.IT\"]","methods":"[\"Large Language Model\"]","has_code":false}
