{"ID":2878281,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.19414","arxiv_id":"2508.19414","title":"Even Heads Fix Odd Errors: Mechanistic Discovery and Surgical Repair in Transformer Attention","abstract":"We present a mechanistic case study of a format-dependent reasoning failure in Llama-3.1-8B-Instruct, where the model incorrectly judges \"9.11\" as larger than \"9.8\" in chat or Q\u0026A formats, but answers correctly in simple format. Through systematic intervention, we discover transformers implement even/odd attention head specialization: even indexed heads handle numerical comparison, while odd heads serve incompatible functions. The bug requires exactly 8 even heads at Layer 10 for perfect repair. Any combination of 8+ even heads succeeds, while 7 or fewer completely fails, revealing sharp computational thresholds with perfect redundancy among the 16 even heads. SAE analysis reveals the mechanism: format representations separate (10% feature overlap at Layer 7), then re-entangle with different weightings (80% feature overlap at Layer 10), with specific features showing 1.5x amplification in failing formats. We achieve perfect repair using only 25% of attention heads and identify a 60% pattern replacement threshold, demonstrating that apparent full-module requirements hide sophisticated substructure with implications for interpretability and efficiency. All of our code is available at https://github.com/gussand/surgeon.","short_abstract":"We present a mechanistic case study of a format-dependent reasoning failure in Llama-3.1-8B-Instruct, where the model incorrectly judges \"9.11\" as larger than \"9.8\" in chat or Q\u0026A formats, but answers correctly in simple format. Through systematic intervention, we discover transformers implement even/odd attention head...","url_abs":"https://arxiv.org/abs/2508.19414","url_pdf":"https://arxiv.org/pdf/2508.19414v1","authors":"[\"Gustavo Sandoval\"]","published":"2025-08-26T20:33:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":610471,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2878281,"paper_url":"https://arxiv.org/abs/2508.19414","paper_title":"Even Heads Fix Odd Errors: Mechanistic Discovery and Surgical Repair in Transformer Attention","repo_url":"https://github.com/gussand/surgeon","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
