{"ID":2826395,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19693","arxiv_id":"2512.19693","title":"The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding","abstract":"Deep representations across modalities are inherently intertwined. In this paper, we systematically analyze the spectral characteristics of various semantic and pixel encoders. Interestingly, our study uncovers a highly inspiring and rarely explored correspondence between an encoder's feature spectrum and its functional role: semantic encoders primarily capture low-frequency components that encode abstract meaning, whereas pixel encoders additionally retain high-frequency information that conveys fine-grained detail. This heuristic finding offers a unifying perspective that ties encoder behavior to its underlying spectral structure. We define it as the Prism Hypothesis, where each data modality can be viewed as a projection of the natural world onto a shared feature spectrum, just like the prism. Building on this insight, we propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details via an innovative frequency-band modulator, enabling their seamless coexistence. Extensive experiments demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity within a single latent space, achieving state-of-the-art performance. Moreover, we show that UAE can be directly applied to pixel-space modeling, significantly improving both FID and IS over the vanilla JIT baseline. Our code is available at: https://github.com/WeichenFan/UAE.","short_abstract":"Deep representations across modalities are inherently intertwined. In this paper, we systematically analyze the spectral characteristics of various semantic and pixel encoders. 
Interestingly, our study uncovers a highly inspiring and rarely explored correspondence between an encoder's feature spectrum and its functiona...","url_abs":"https://arxiv.org/abs/2512.19693","url_pdf":"https://arxiv.org/pdf/2512.19693v5","authors":"[\"Weichen Fan\",\"Haiwen Diao\",\"Quan Wang\",\"Dahua Lin\",\"Ziwei Liu\"]","published":"2025-12-22T18:59:57Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":605741,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2826395,"paper_url":"https://arxiv.org/abs/2512.19693","paper_title":"The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding","repo_url":"https://github.com/WeichenFan/UAE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
