{"ID":2880381,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16663","arxiv_id":"2508.16663","title":"The Loupe: A Plug-and-Play Attention Module for Amplifying Discriminative Features in Vision Transformers","abstract":"Fine-Grained Visual Classification (FGVC) requires models to focus on subtle, task-relevant regions rather than broad object context. We present The Loupe, a lightweight plug-and-play spatial gating module for hierarchical Vision Transformers. The module is inserted at an intermediate feature stage, predicts a single-channel spatial mask with a small CNN, and uses that mask to reweight feature activations during end-to-end training with a cross-entropy objective and an l1 sparsity term. On CUB-200-2011, The Loupe improves Swin-Base from 88.36% to 91.72% and Swin-Tiny from 85.14% to 88.61%, with under 0.1% additional parameters. Ablations show that the improvement depends on the insertion point and the sparsity regularizer, suggesting that controlled spatial gating is more effective than naive multi-scale masking in this setting. Qualitative results indicate that the learned masks often align with discriminative bird parts, although the module is not a substitute for part-level supervision and can fail under occlusion or fine-grained intra-part differences.","short_abstract":"Fine-Grained Visual Classification (FGVC) requires models to focus on subtle, task-relevant regions rather than broad object context. We present The Loupe, a lightweight plug-and-play spatial gating module for hierarchical Vision Transformers. The module is inserted at an intermediate feature stage, predicts a single-c...","url_abs":"https://arxiv.org/abs/2508.16663","url_pdf":"https://arxiv.org/pdf/2508.16663v2","authors":"[\"Naren Sengodan\"]","published":"2025-08-20T19:07:21Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Convolutional Neural Network\"]","has_code":false}
