{"ID":2899326,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.01939","arxiv_id":"2507.01939","title":"SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars","abstract":"In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. As a proof of concept, SpecCLIP involves pre-training on two spectral types--LAMOST low-resolution and Gaia XP--followed by contrastive alignment using the CLIP (Contrastive Language-Image Pre-training) framework, adapted to associate spectra from different instruments. This alignment is complemented by auxiliary decoders that preserve spectrum-specific information and enable translation (prediction) between spectral types, with the former achieved by maximizing mutual information between embeddings and input spectra. The result is a cross-spectrum framework enabling intrinsic calibration and flexible applications across instruments. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination. SpecCLIP also enhances the accuracy and precision of parameter estimates benchmarked against external survey data. Additionally, its similarity search and cross-spectrum prediction capabilities offer potential for anomaly detection. Our results suggest that contrastively trained foundation models enriched with spectrum-aware decoders can advance precision stellar spectroscopy. Our code SpecCLIP is publicly available at https://github.com/Xiaosheng-Zhao/SpecCLIP","short_abstract":"In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to...","url_abs":"https://arxiv.org/abs/2507.01939","url_pdf":"https://arxiv.org/pdf/2507.01939v4","authors":"[\"Xiaosheng Zhao\",\"Yang Huang\",\"Guirong Xue\",\"Xiao Kong\",\"Jifeng Liu\",\"Xiaoyu Tang\",\"Timothy C. Beers\",\"Yuan-Sen Ting\",\"A-Li Luo\"]","published":"2025-07-02T17:49:52Z","proceeding":"astro-ph.IM","tasks":"[\"astro-ph.IM\",\"astro-ph.SR\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612472,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2899326,"paper_url":"https://arxiv.org/abs/2507.01939","paper_title":"SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars","repo_url":"https://github.com/Xiaosheng-Zhao/SpecCLIP","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
