A Diverse and Chemically Relevant Solvation Model Benchmark Set with Flexible Molecules and Conformer Ensembles
Abstract
We introduce FlexiSol, a flexible solvation benchmark set with molecule ensembles. FlexiSol is the first of its kind to combine structurally and functionally complex, highly flexible solutes with exhaustive conformational sampling for systematic testing of solvation models. The dataset contains 824 experimental solvation energy and partition ratio data points (1551 unique molecule-solvent pairs) at standard-state conditions, focusing on drug-like, medium-to-large flexible molecules (up to 141 atoms), with over 25000 theoretical conformer/tautomer geometries across all phases. The set is publicly available, and data points were selected to minimize overlap with existing sets. Using this benchmark, we evaluate a broad spectrum of popular implicit solvation approaches, including physics-based (quantum-chemical and semiempirical) and data-driven models. We find that partition ratios are generally computed more accurately than solvation energies, likely due to partial error cancellation, yet most models still systematically underestimate strongly stabilizing interactions while overestimating weaker ones in both solvation energies and partition ratios. Additionally, we investigate the impact of three key ingredients: the conformational ensemble, the geometry choice (phase-specific vs. single-phase), and the underlying electronic energy method. Full Boltzmann-weighted ensembles and lowest-energy conformers alone yield very similar accuracy, although both require conformational sampling, whereas random single-conformer selection degrades performance, especially for larger and more flexible systems. Geometry relaxation and the level of electronic structure theory both influence results; however, the magnitude and sometimes the direction of these effects vary by method, because fortuitous error cancellation sometimes masks underlying deficiencies in the models.
As a complement to existing data sets, FlexiSol will enable more systematic development and evaluation of solvation models.
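The two conformer treatments compared in the abstract (full Boltzmann-weighted ensembles versus lowest-energy conformers) and the link between solvation energies and partition ratios can be illustrated with a minimal sketch. It is not the authors' workflow: the function names, the kcal/mol units, the 298.15 K default, and the example energies are all assumptions made for illustration, and standard-state corrections are ignored.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant in kcal/(mol K)


def ensemble_free_energy(energies, temperature=298.15):
    """Boltzmann-weighted free energy of a conformer ensemble.

    G_ens = -RT ln sum_i exp(-G_i / RT), evaluated relative to the
    lowest conformer for numerical stability. Energies in kcal/mol.
    """
    rt = R_KCAL * temperature
    g_min = min(energies)
    partition_sum = sum(math.exp(-(g - g_min) / rt) for g in energies)
    return g_min - rt * math.log(partition_sum)


def solvation_free_energy(gas, solution, temperature=298.15, mode="boltzmann"):
    """Solvation free energy from per-conformer free energies in each phase.

    mode="boltzmann": difference of ensemble free energies.
    mode="lowest":    difference of the lowest conformer in each phase.
    """
    if mode == "boltzmann":
        return (ensemble_free_energy(solution, temperature)
                - ensemble_free_energy(gas, temperature))
    if mode == "lowest":
        return min(solution) - min(gas)
    raise ValueError(f"unknown mode: {mode!r}")


def log_partition_ratio(dg_solv_water, dg_solv_organic, temperature=298.15):
    """log10 of the organic/water partition ratio from two solvation energies.

    Assumes both energies refer to the same standard state, so that
    log P = (dG_water - dG_organic) / (RT ln 10).
    """
    rt = R_KCAL * temperature
    return (dg_solv_water - dg_solv_organic) / (rt * math.log(10.0))


# Hypothetical conformer free energies in kcal/mol (invented for illustration).
gas_conformers = [0.0, 0.5, 1.2]
water_conformers = [-6.0, -5.8, -4.9]

dg_boltzmann = solvation_free_energy(gas_conformers, water_conformers)
dg_lowest = solvation_free_energy(gas_conformers, water_conformers, mode="lowest")
print(f"Boltzmann-weighted: {dg_boltzmann:.2f} kcal/mol")
print(f"Lowest conformer:   {dg_lowest:.2f} kcal/mol")
```

Both modes need the conformers of each phase to be known before either can be evaluated, which is the sense in which both require conformational sampling. Because a partition ratio depends only on the difference of two solvation energies, errors common to both solvents cancel in `log_partition_ratio`.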
- This article is part of the themed collection: 15th anniversary: Chemical Science community collection