{"ID":2861762,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00368","arxiv_id":"2510.00368","title":"The Transformer Cookbook","abstract":"We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. In particular, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations is for both newcomers seeking an accessible entry point and experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work spanning theoretical research in computational complexity to empirical investigations in architecture design and interpretability.","short_abstract":"We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. In particular, we syn...","url_abs":"https://arxiv.org/abs/2510.00368","url_pdf":"https://arxiv.org/pdf/2510.00368v1","authors":"[\"Andy Yang\",\"Christopher Watson\",\"Anton Xue\",\"Satwik Bhattamishra\",\"Jose Llarena\",\"William Merrill\",\"Emile Dos Santos Ferreira\",\"Anej Svete\",\"David Chiang\"]","published":"2025-10-01T00:25:07Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}
