{"ID":2883871,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.08183","arxiv_id":"2508.08183","title":"THAT: Token-wise High-frequency Augmentation Transformer for Hyperspectral Pansharpening","abstract":"Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. However, their effectiveness is often limited by redundant token representations and a lack of multi-scale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abundance sparsity) and spatial priors (e.g., non-local similarity), which are critical for accurate reconstruction. From a spectral-spatial perspective, Vision Transformers (ViTs) face two major limitations: they struggle to preserve high-frequency components--such as material edges and texture transitions--and suffer from attention dispersion across redundant tokens. These issues stem from the global self-attention mechanism, which tends to dilute high-frequency signals and overlook localized details. To address these challenges, we propose the Token-wise High-frequency Augmentation Transformer (THAT), a novel framework designed to enhance hyperspectral pansharpening through improved high-frequency feature representation and token selection. Specifically, THAT introduces: (1) Pivotal Token Selective Attention (PTSA) to prioritize informative tokens and suppress redundancy; (2) a Multi-level Variance-aware Feed-forward Network (MVFN) to enhance high-frequency detail learning. Experiments on standard benchmarks show that THAT achieves state-of-the-art performance with improved reconstruction quality and efficiency. The source code is available at https://github.com/kailuo93/THAT.","short_abstract":"Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. 
However, their effectiveness is often limited by redundant token representations and a lack of multi-scale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abun...","url_abs":"https://arxiv.org/abs/2508.08183","url_pdf":"https://arxiv.org/pdf/2508.08183v1","authors":"[\"Hongkun Jin\",\"Hongcheng Jiang\",\"Zejun Zhang\",\"Yuan Zhang\",\"Jia Fu\",\"Tingfeng Li\",\"Kai Luo\"]","published":"2025-08-11T17:03:10Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"eess.IV\"]","methods":"[\"Vision Transformer\",\"Transformer\"]","has_code":true,"code_links":[{"ID":611030,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2883871,"paper_url":"https://arxiv.org/abs/2508.08183","paper_title":"THAT: Token-wise High-frequency Augmentation Transformer for Hyperspectral Pansharpening","repo_url":"https://github.com/kailuo93/THAT","is_official":false,"mentioned_in_paper":true,"mentioned_in_github":true,"github_stars":0}]}