{"ID":2852160,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18462","arxiv_id":"2510.18462","title":"DePass: Unified Feature Attributing by Simple Decomposed Forward Pass","abstract":"Attributing the behavior of Transformer models to internal computations is a central challenge in mechanistic interpretability. We introduce DePass, a unified framework for feature attribution based on a single decomposed forward pass. DePass decomposes hidden states into customized additive components, then propagates them with attention scores and MLP's activations fixed. It achieves faithful, fine-grained attribution without requiring auxiliary training. We validate DePass across token-level, model component-level, and subspace-level attribution tasks, demonstrating its effectiveness and fidelity. Our experiments highlight its potential to attribute information flow between arbitrary components of a Transformer model. We hope DePass serves as a foundational tool for broader applications in interpretability.","short_abstract":"Attributing the behavior of Transformer models to internal computations is a central challenge in mechanistic interpretability. We introduce DePass, a unified framework for feature attribution based on a single decomposed forward pass. DePass decomposes hidden states into customized additive components, then propagates...","url_abs":"https://arxiv.org/abs/2510.18462","url_pdf":"https://arxiv.org/pdf/2510.18462v2","authors":"[\"Xiangyu Hong\",\"Che Jiang\",\"Kai Tian\",\"Biqing Qi\",\"Youbang Sun\",\"Ning Ding\",\"Bowen Zhou\"]","published":"2025-10-21T09:36:12Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Transformer\"]","has_code":false}
