Latent thermodynamic flows: unified representation learning and generative modeling of temperature-dependent behaviors from limited data
Abstract
Accurate characterization of equilibrium distributions in complex molecular systems, and their dependence on environmental factors such as temperature, is crucial for understanding thermodynamic properties and transition mechanisms. However, obtaining converged sampling of these high-dimensional distributions using approaches like molecular dynamics simulations often incurs prohibitive computational costs. In addition, the absence of informative low-dimensional representations for these distributions hampers interpretability and many downstream analyses. Recent advances in generative AI, particularly flow-based models, show promise for efficiently modeling molecular equilibrium distributions; yet, without tailored representation learning, their generative performance on high-dimensional distributions remains limited and difficult to interpret. In this work, we present Latent Thermodynamic Flows (LaTF), an end-to-end framework that integrates representation learning with generative modeling. LaTF unifies the State Predictive Information Bottleneck with Normalizing Flows to simultaneously learn low-dimensional representations (i.e., collective variables), classify metastable states, and generate equilibrium distributions at temperatures beyond those in the training data. Jointly optimizing representation learning and generative modeling allows each component to enhance the other, making optimal use of costly simulation data to accurately reproduce the system's equilibrium behaviors over a meaningful latent representation that captures its slow, essential degrees of freedom. We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and a cluster of Lennard-Jones particles, with thorough evaluations and benchmarking using multiple metrics and extensive simulations.
Moreover, we apply LaTF to an RNA tetraloop system, where, despite using simulation data from only two temperatures, LaTF reconstructs the temperature-dependent structural ensemble and melting behavior, consistent with experimental and prior extensive computational results.
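
As a toy illustration of how a flow-based model can represent an equilibrium distribution at several temperatures, the sketch below evaluates the change-of-variables density of a one-dimensional affine flow and compares it with the exact Boltzmann density of a harmonic potential U(x) = k x^2 / 2. It is not the LaTF architecture: the temperature-scaled Gaussian prior, the function names, and the parameters `shift`, `scale`, and `k` are assumptions made for this example, and units are chosen so that k_B = 1.

```python
import math


def gaussian_logpdf(z, var):
    """Log density of a zero-mean Gaussian with variance `var`."""
    return -0.5 * (math.log(2.0 * math.pi * var) + z * z / var)


def flow_logpdf(x, temperature, shift=0.0, scale=1.0):
    """Log density of x under the affine flow x = shift + scale * z.

    Assumption for this sketch: the prior is z ~ N(0, temperature),
    i.e. the prior variance is scaled by the temperature.
    """
    z = (x - shift) / scale              # inverse map x -> z
    log_det = -math.log(abs(scale))      # log |dz/dx| for an affine map
    return gaussian_logpdf(z, temperature) + log_det


def boltzmann_logpdf(x, temperature, k=1.0):
    """Exact log density for U(x) = k x^2 / 2 at the given temperature."""
    return gaussian_logpdf(x, temperature / k)


# With scale = 1 / sqrt(k), a single flow matches the harmonic Boltzmann
# density at every temperature, including ones not used to choose `scale`.
k = 4.0
scale = 1.0 / math.sqrt(k)
for temperature in (0.5, 1.0, 2.0):
    for x in (-1.0, 0.0, 0.3):
        diff = flow_logpdf(x, temperature, scale=scale) - boltzmann_logpdf(x, temperature, k=k)
        assert abs(diff) < 1e-12
```

The harmonic case is exactly solvable, which is why one fixed map works at all temperatures here; for realistic molecular systems, the map and the latent representation have to be learned from simulation data.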
