{"ID":2831863,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.08048","arxiv_id":"2512.08048","title":"Family Matters: A Systematic Study of Spatial vs. Frequency Masking for Continual Test-Time Adaptation","abstract":"Recent continual test-time adaptation (CTTA) methods adopt masked image modeling to stabilize learning under distribution shift, yet each treats its masking family $F$ as a fixed design choice and innovates exclusively along the selection strategy $S$, leaving the family axis underexplored. We present a systematic empirical study that isolates this axis. Using a controlled CTTA instantiation -- Mask to Adapt (M2A) -- that fixes $S=random$ and standard losses, we vary only $F$ across spatial (patch, pixel) and frequency (all-band, low-band, high-band) families while keeping every other component identical. The study's contributions are the design guidance it extracts for the CTTA settings we evaluated: (1)~\\emph{the masking family determines whether adaptation compounds useful structure or compounds errors} -- on patch-tokenized architectures, spatial masking accumulates stable representations over long streams while frequency masking collapses catastrophically. We characterize this instability through a \\emph{structural-preservation} account, where spatial coherence maintains the broad-spectrum redundancy needed to avoid terminally overlapping with a corruption's spectral signature; (2)~\\emph{the optimal family depends on architecture-task alignment} -- on CNNs, whose overlapping receptive fields dilute patch occlusion, the family gap vanishes, whereas on fine-grained tasks with global cues and large-capacity ViTs, frequency masking becomes competitive. In confounded system-level comparisons -- where baselines also differ in losses and auxiliary components -- M2A's random selection performs comparably to heuristic strategies, though we treat this observation as suggestive context rather than a controlled quantification of $S$'s relative importance.","short_abstract":"Recent continual test-time adaptation (CTTA) methods adopt masked image modeling to stabilize learning under distribution shift, yet each treats its masking family $F$ as a fixed design choice and innovates exclusively along the selection strategy $S$, leaving the family axis underexplored. We present a systematic empi...","url_abs":"https://arxiv.org/abs/2512.08048","url_pdf":"https://arxiv.org/pdf/2512.08048v2","authors":"[\"Chandler Timm C. Doloriel\",\"Yunbei Zhang\",\"Yeonguk Yu\",\"Taki Hasan Rafi\",\"Muhammad salman siddiqui\",\"Tor Kristian Stevik\",\"Habib Ullah\",\"Fadi Al Machot\",\"Kristian Hovde Liland\"]","published":"2025-12-08T21:16:44Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Convolutional Neural Network\"]","has_code":false}