Transmuting prompts into weights

cs.LG arXiv:2510.08734
View PDF arXiv JSON

Abstract

A growing body of research has demonstrated that the behavior of large language models can be effectively controlled at inference time by directly modifying their internal states, either through vector additions to their activations or through updates to their weight matrices. These techniques, while powerful, are often guided by empirical heuristics, such as deriving steering vectors from the average activations of contrastive prompts. This work provides a theoretical foundation for these interventions, explaining how they emerge from the fundamental computations of the transformer architecture. Building on the recent finding that a prompt's influence can be mathematically mapped to token-dependent implicit weight updates (Dherin et. al, 2025), we derive a principled method for condensing this information into token-independent thought vectors and thought matrices. These constructs provide a theoretical explanation for existing vector-and-matrix-based model editing techniques and offer a direct, computationally-grounded method for transmuting textual input into reusable weight updates.

PDF Viewer