{"ID":2879024,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.17550","arxiv_id":"2508.17550","title":"In-Context Algorithm Emulation in Fixed-Weight Transformers","abstract":"We prove that a minimal Transformer with frozen weights emulates a broad class of algorithms by in-context prompting. We formalize two modes of in-context algorithm emulation. In the task-specific mode, for any continuous function $f: \\mathbb{R} \\to \\mathbb{R}$, we show the existence of a single-head softmax attention layer whose forward pass reproduces functions of the form $f(w^\\top x - y)$ to arbitrary precision. This general template subsumes many popular machine learning algorithms (e.g., gradient descent, linear regression, ridge regression). In the prompt-programmable mode, we prove universality: a single fixed-weight two-layer softmax attention module emulates all algorithms from the task-specific class (i.e., each implementable by a single softmax attention) via only prompting. Our key idea is to construct prompts that encode an algorithm's parameters into token representations, creating sharp dot-product gaps that force the softmax attention to follow the intended computation. This construction requires no feed-forward layers and no parameter updates. All adaptation happens through the prompt alone. Numerical results corroborate our theory. These findings forge a direct link between in-context learning and algorithmic emulation, and offer a simple mechanism for large Transformers to serve as prompt-programmable libraries of algorithms. They illuminate how GPT-style foundation models may swap algorithms via prompts alone, and establish a form of algorithmic universality in modern Transformer models.","short_abstract":"We prove that a minimal Transformer with frozen weights emulates a broad class of algorithms by in-context prompting. We formalize two modes of in-context algorithm emulation. In the task-specific mode, for any continuous function $f: \\mathbb{R} \\to \\mathbb{R}$, we show the existence of a single-head softmax attention...","url_abs":"https://arxiv.org/abs/2508.17550","url_pdf":"https://arxiv.org/pdf/2508.17550v2","authors":"[\"Jerry Yao-Chieh Hu\",\"Hude Liu\",\"Jennifer Yuntong Zhang\",\"Han Liu\"]","published":"2025-08-24T23:20:31Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"stat.ML\"]","methods":"[\"Transformer\"]","has_code":false}
