Coupled fragment-based generative modeling with stochastic interpolants
Abstract
Fragment-based drug design (FBDD) has become a key approach in structure-based drug discovery, allowing researchers to grow molecular fragments systematically into potent ligands. Although recent generative AI models, such as diffusion-based approaches, show great potential for designing new molecules, applying them to fragment-based methods is hampered by mismatches between training and inference procedures and by computational cost. In this work, we develop a generative model based on stochastic interpolants, a framework that unifies the diffusion and flow matching paradigms, and train it to generate fragments conditioned on molecular substructures. Our experiments show that models trained with explicit fragment-based conditioning substantially outperform unconditional models adapted to fragment completion tasks. We compare diffusion and flow matching models built on identical backbone architectures and find that flow matching converges better and produces higher-quality 3D molecular poses with lower strain energies, while requiring fewer sampling steps. We evaluate our method on standard benchmark datasets and examine different fragmentation strategies, finding that the choice of fragmentation algorithm has an important effect on model performance. In a detailed case study on an internal PLK3 inhibitor structure, we demonstrate that our approach generates new fragments whose computed docking scores and binding energy estimates are competitive with those of tested internal Pfizer compounds, while also exploring regions of chemical space beyond existing fragment libraries. These findings establish flow matching within the stochastic interpolants framework as a promising approach for fragment-based drug design, offering both improved computational efficiency and better molecular quality for structure-based optimization.
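
For orientation, the core idea behind flow matching can be illustrated with a minimal sketch. It assumes the simplest linear interpolant, x_t = (1 - t) x_0 + t x_1, whose time derivative x_1 - x_0 serves as the regression target for a velocity network. The abstract does not state which interpolant, network, or sampler the authors use, so every name below is hypothetical, and an oracle velocity stands in for a trained model.

```python
import numpy as np

def linear_interpolant(x0, x1, t):
    """Point on the straight path from x0 (at t=0) to x1 (at t=1)."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """Time derivative of the linear interpolant (assumed regression target)."""
    return x1 - x0

def euler_integrate(x0, velocity_fn, n_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(5, 3))  # noise sample: 5 hypothetical atoms in 3D
x1 = rng.normal(size=(5, 3))  # random stand-in for target fragment coordinates

# Oracle velocity in place of a trained network (illustration only).
def oracle_velocity(x, t):
    return velocity_target(x0, x1)

x_mid = linear_interpolant(x0, x1, 0.5)
x_end = euler_integrate(x0, oracle_velocity, n_steps=10)
```

With this oracle the velocity field is constant along each path, so the Euler integration reaches x1 up to floating-point error for any step count. A learned field is only approximately straight, and the number of steps it needs in practice is an empirical matter.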
