{"ID":2884947,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10018","arxiv_id":"2508.10018","title":"A Rose by Any Other Name Would Smell as Sweet: Categorical Homotopy Theory for Large Language Models","abstract":"Natural language is replete with superficially different statements, such as ``Charles Darwin wrote\" and ``Charles Darwin is the author of\", which carry the same meaning. Large language models (LLMs) should generate the same next-token probabilities in such cases, but usually do not. Empirical workarounds have been explored, such as using k-NN estimates of sentence similarity to produce smoothed estimates. In this paper, we tackle this problem more abstractly, introducing a categorical homotopy framework for LLMs. We introduce an LLM Markov category to represent probability distributions in language generated by an LLM, where the probability of a sentence, such as ``Charles Darwin wrote\" is defined by an arrow in a Markov category. However, this approach runs into difficulties as language is full of equivalent rephrases, and each generates a non-isomorphic arrow in the LLM Markov category. To address this fundamental problem, we use categorical homotopy techniques to capture ``weak equivalences\" in an LLM Markov category. We present a detailed overview of application of categorical homotopy to LLMs, from higher algebraic K-theory to model categories, building on powerful theoretical results developed over the past half a century.","short_abstract":"Natural language is replete with superficially different statements, such as ``Charles Darwin wrote\" and ``Charles Darwin is the author of\", which carry the same meaning. Large language models (LLMs) should generate the same next-token probabilities in such cases, but usually do not. Empirical workarounds have been exp...","url_abs":"https://arxiv.org/abs/2508.10018","url_pdf":"https://arxiv.org/pdf/2508.10018v1","authors":"[\"Sridhar Mahadevan\"]","published":"2025-08-07T00:48:30Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"math.AT\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
