{"ID":2888608,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.00920","arxiv_id":"2508.00920","title":"Uni-Mol3: A Multi-Molecular Foundation Model for Advancing Organic Reaction Modeling","abstract":"Organic reaction, the foundation of modern chemical industry, is crucial for new material development and drug discovery. However, deciphering reaction mechanisms and modeling multi-molecular relationships remain formidable challenges due to the complexity of molecular dynamics. While several state-of-the-art models like Uni-Mol2 have revolutionized single-molecular representation learning, their extension to multi-molecular systems, where chemical reactions inherently occur, has been underexplored. This paper introduces Uni-Mol3, a novel deep learning framework that employs a hierarchical pipeline for multi-molecular reaction modeling. At its core, Uni-Mol3 adopts a multi-scale molecular tokenizer (Mol-Tokenizer) that encodes 3D structures of molecules and other features into discrete tokens, creating a 3D-aware molecular language. The framework innovatively combines two pre-training stages: molecular pre-training to learn the molecular grammars and reaction pre-training to capture fundamental reaction principles, forming a progressive learning paradigm from single- to multi-molecular systems. With prompt-aware downstream fine-tuning, Uni-Mol3 demonstrates exceptional performance in diverse organic reaction tasks and supports multi-task prediction with strong generalizability. Experimental results across 10 datasets spanning 4 downstream tasks show that Uni-Mol3 outperforms existing methods, validating its effectiveness in modeling complex organic reactions. This work not only ushers in an alternative paradigm for multi-molecular computational modeling but also charts a course for intelligent organic reaction by bridging molecular representation with reaction mechanism understanding.","short_abstract":"Organic reaction, the foundation of modern chemical industry, is crucial for new material development and drug discovery. However, deciphering reaction mechanisms and modeling multi-molecular relationships remain formidable challenges due to the complexity of molecular dynamics. While several state-of-the-art models li...","url_abs":"https://arxiv.org/abs/2508.00920","url_pdf":"https://arxiv.org/pdf/2508.00920v2","authors":"[\"Lirong Wu\",\"Junjie Wang\",\"Zhifeng Gao\",\"Xiaohong Ji\",\"Rong Zhu\",\"Xinyu Li\",\"Linfeng Zhang\",\"Guolin Ke\",\"Weinan E\"]","published":"2025-07-30T02:38:52Z","proceeding":"physics.chem-ph","tasks":"[\"physics.chem-ph\",\"cs.LG\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
