{"ID":2857731,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09784","arxiv_id":"2510.09784","title":"Combined Representation and Generation with Diffusive State Predictive Information Bottleneck","abstract":"Generative modeling becomes increasingly data-intensive in high-dimensional spaces. In molecular science, where data collection is expensive and important events are rare, compression to lower-dimensional manifolds is especially important for various downstream tasks, including generation. We combine a time-lagged information bottleneck designed to characterize molecular important representations and a diffusion model in one joint training objective. The resulting protocol, which we term Diffusive State Predictive Information Bottleneck (D-SPIB), enables the balancing of representation learning and generation aims in one flexible architecture. Additionally, the model is capable of combining temperature information from different molecular simulation trajectories to learn a coherent and useful internal representation of thermodynamics. We benchmark D-SPIB on multiple molecular tasks and showcase its potential for exploring physical conditions outside the training set.","short_abstract":"Generative modeling becomes increasingly data-intensive in high-dimensional spaces. In molecular science, where data collection is expensive and important events are rare, compression to lower-dimensional manifolds is especially important for various downstream tasks, including generation. We combine a time-lagged info...","url_abs":"https://arxiv.org/abs/2510.09784","url_pdf":"https://arxiv.org/pdf/2510.09784v1","authors":"[\"Richard John\",\"Yunrui Qiu\",\"Lukas Herron\",\"Pratyush Tiwary\"]","published":"2025-10-10T18:46:21Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cond-mat.stat-mech\",\"q-bio.QM\"]","methods":"[\"Diffusion Model\"]","has_code":false}
