{"ID":2886929,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.08441","arxiv_id":"2508.08441","title":"SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data","abstract":"Automated molecular structure elucidation remains challenging, as existing approaches often depend on pre-compiled databases or restrict themselves to single spectroscopic modalities. Here we introduce SpectraLLM, a large language model that performs end-to-end structure prediction by reasoning over one or multiple spectra. Unlike conventional spectrum-to-structure pipelines, SpectraLLM represents both continuous (IR, Raman, UV-Vis, NMR) and discrete (MS) modalities in a shared language space, enabling it to capture substructural patterns that are complementary across different spectral types. We pretrain and fine-tune the model on small-molecule domains and evaluate it on four public benchmark datasets. SpectraLLM achieves state-of-the-art performance, substantially surpassing single-modality baselines. Moreover, it demonstrates strong robustness in unimodal settings and further improves prediction accuracy when jointly reasoning over diverse spectra, establishing a scalable paradigm for language-based spectroscopic analysis. Code is available at https://github.com/OPilgrim/SpectraLLM.","short_abstract":"Automated molecular structure elucidation remains challenging, as existing approaches often depend on pre-compiled databases or restrict themselves to single spectroscopic modalities. Here we introduce SpectraLLM, a large language model that performs end-to-end structure prediction by reasoning over one or multiple spe...","url_abs":"https://arxiv.org/abs/2508.08441","url_pdf":"https://arxiv.org/pdf/2508.08441v3","authors":"[\"Yunyue Su\",\"Jiahui Chen\",\"Zao Jiang\",\"Zhenyi Zhong\",\"Liang Wang\",\"Qiang Liu\",\"Zhaoxiang Zhang\"]","published":"2025-08-04T13:33:38Z","proceeding":"q-bio.QM","tasks":"[\"q-bio.QM\",\"cs.CE\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611384,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886929,"paper_url":"https://arxiv.org/abs/2508.08441","paper_title":"SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data","repo_url":"https://github.com/OPilgrim/SpectraLLM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
