{"ID":2828218,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.15938","arxiv_id":"2512.15938","title":"SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks","abstract":"Deep neural networks achieve impressive performance but remain difficult to interpret and control. We present SALVE (Sparse Autoencoder-Latent Vector Editing), a unified \"discover, validate, and control\" framework that bridges mechanistic interpretability and model editing. Using an $\\ell_1$-regularized autoencoder, we learn a sparse, model-native feature basis without supervision. We validate these features with Grad-FAM, a feature-level saliency mapping method that visually grounds latent features in input data. Leveraging the autoencoder's structure, we perform precise and permanent weight-space interventions, enabling continuous modulation of both class-defining and cross-class features. We further derive a critical suppression threshold, $α_{crit}$, quantifying each class's reliance on its dominant feature, supporting fine-grained robustness diagnostics. Our approach is validated on both convolutional (ResNet-18) and transformer-based (ViT-B/16) models, demonstrating consistent, interpretable control over their behavior. This work contributes a principled methodology for turning feature discovery into actionable model edits, advancing the development of transparent and controllable AI systems.","short_abstract":"Deep neural networks achieve impressive performance but remain difficult to interpret and control. We present SALVE (Sparse Autoencoder-Latent Vector Editing), a unified \"discover, validate, and control\" framework that bridges mechanistic interpretability and model editing. Using an $\\ell_1$-regularized autoencoder, we...","url_abs":"https://arxiv.org/abs/2512.15938","url_pdf":"https://arxiv.org/pdf/2512.15938v2","authors":"[\"Vegard Flovik\"]","published":"2025-12-17T20:06:03Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}
