{"ID":2894694,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09982","arxiv_id":"2507.09982","title":"TextOmics-Guided Diffusion for Hit-like Molecular Generation","abstract":"Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. To bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions. TextOmics provides a heterogeneous dataset that facilitates molecular generation through representations alignment. Built upon this foundation, we propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules. ToDi leverages two encoders (OmicsEn and TextEn) to capture multi-level biological and semantic associations, and develops conditional diffusion (DiffGen) for controllable generation. Extensive experiments confirm the effectiveness of TextOmics and demonstrate ToDi outperforms existing state-of-the-art approaches, while also showcasing remarkable potential in zero-shot therapeutic molecular generation. Sources are available at: https://github.com/hala-ToDi.","short_abstract":"Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. \nTo bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one c...","url_abs":"https://arxiv.org/abs/2507.09982","url_pdf":"https://arxiv.org/pdf/2507.09982v1","authors":"[\"Hang Yuan\",\"Chen Li\",\"Wenjun Ma\",\"Yuncheng Jiang\"]","published":"2025-07-14T06:56:37Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Diffusion Model\"]","has_code":false}
