{"ID":2837448,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.18925","arxiv_id":"2511.18925","title":"LookSharp: Attention Entropy Minimization for Test-Time Adaptation","abstract":"Test-time adaptation (TTA) updates models during inference to reduce error on distribution shifts. While entropy minimization over the output distribution has proven effective as a TTA loss, we study using the intermediate distributions computed by transformers in the attention mechanism. We propose LookSharp, which minimizes the entropy of CLS-to-patch attention in the final layer as a novel TTA objective, encouraging the model to maintain focused attention on shifted data. We demonstrate that attention entropy minimization improves robustness on ImageNet-C. We also show that it is complementary to output entropy minimization and maintains performance on clean data.","short_abstract":"Test-time adaptation (TTA) updates models during inference to reduce error on distribution shifts. While entropy minimization over the output distribution has proven effective as a TTA loss, we study using the intermediate distributions computed by transformers in the attention mechanism. We propose LookSharp, which mi...","url_abs":"https://arxiv.org/abs/2511.18925","url_pdf":"https://arxiv.org/pdf/2511.18925v3","authors":"[\"Yash Mali\",\"Evan Shelhamer\"]","published":"2025-11-24T09:32:01Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}