Open Access Article
This Open Access Article is licensed under a Creative Commons Attribution-Non Commercial 3.0 Unported Licence

A diverse and chemically relevant solvation model benchmark set with flexible molecules and conformer ensembles

Lukas Wittmann, Christian Erik Selzer and Stefan Grimme*
Mulliken Center for Theoretical Chemistry, Beringstraße 4, D-53115 Bonn, Germany. E-mail: grimme@uni-bonn.de

Received 21st August 2025, Accepted 10th October 2025

First published on 13th October 2025


Abstract

We introduce FlexiSol – a solvation benchmark set with flexible molecules and ensembles. FlexiSol is the first of its kind to combine structurally and functionally complex, highly flexible solutes with exhaustive conformational sampling for systematic testing of solvation models. The dataset contains 824 experimental solvation energy and partition ratio data points (1551 unique molecule–solvent pairs) at standard-state conditions, focusing on drug-like, medium-to-large flexible molecules (up to 141 atoms), with over 25 000 theoretical conformer/tautomer geometries across all phases. The set is publicly available and data points were selected in order to have minimal overlap with existing sets. Using this benchmark, we evaluate a broad spectrum of popular implicit solvation approaches, including physics-based (quantum–chemical and semiempirical) and data-driven models. We find that partition ratios are generally computed more accurately than solvation energies, likely due to partial error cancellation, yet most models still systematically underestimate strongly stabilizing interactions while overestimating weaker ones in both solvation energies and partition ratios. Additionally, we investigate the impact of three key ingredients: conformational ensemble, geometry choice (phase-specific vs. single-phase), and underlying electronic structure method. We find that full Boltzmann-weighted ensembles or just the lowest-energy conformers yield very similar accuracy – although both still require conformational sampling – whereas random single-conformer selection degrades performance, especially for larger and flexible systems. Geometry relaxation and the level of electronic structure theory both influence results; however, the magnitude and sometimes direction of these effects can vary by method, as fortuitous error cancellation sometimes masks underlying deficiencies present in the models.
As a complement to existing data sets, FlexiSol will enable more systematic development and evaluation of solvation models.


1 Introduction

Everyday observations like dissolving sugar in coffee or the low solubility of creatine in water1 illustrate the ubiquitous nature of solvation – which refers to any stabilizing interaction between a solute and its solvent2 and affects most areas of chemistry, biology and materials science.3,4 Over the past decades, theoretical chemistry has become an indispensable partner to experiment, offering both conceptual frameworks for interpreting chemical phenomena and predictive tools that guide new investigations.5,6 Especially in the field of Green Chemistry,7 theory helps to identify and optimize reaction pathways, predict solvent and catalyst effects in silico, and guide the design of sustainable processes with reduced waste and energy use.8–10 Besides sustainability aspects, the behavior of emerging environmental pollutants, such as per- and polyfluorinated alkyl substances (PFAS),11–13 can be modeled under climate change, predicting their compartmental distribution, degradation, and bioavailability.14–17 A major challenge, however, is the scarcity of experimental data,18,19 making predictive calculations not just valuable but essential for assessing their behavior in solution – a need recognized in guidelines for more efficient risk assessment by the Organisation for Economic Co-operation and Development (OECD).12,20–22 Beyond macroscopic equilibria, solvation also shapes molecular properties measured by spectroscopic methods. Solvent-induced shifts in infra-red (IR) absorption and nuclear magnetic resonance (NMR) chemical shifts23–26 can be large. This often results from a significant, non-negligible change in molecular geometry upon solvation,27–29 that can change the preferred tautomeric or conformational state, e.g., amino acids are neutral in the gas phase but zwitterionic in water, and capsaicin adopts an open form in aqueous solution versus a folded form in methanol.30–32

There are three main approaches to model solute–solvent interactions. The most direct approach is explicit solvation, where solvent molecules are included in the calculation. Commonly using free-energy perturbation33,34 or thermodynamic integration35,36 with dynamic simulations, these methods require exhaustive sampling of the configurational space and are thus computationally demanding, typically only feasible with classical methods like force fields. Together with integral-equation theories like the Reference Interaction Site Model (RISM) and its extensions (EC-RISM, RISM-SCF-cSED),37,38 they are often grouped as statistical methods. The so-called implicit approach can be seen as a simplification in which the solvent is approximated as some form of continuum, drastically reducing the computational cost and enhancing applicability.39,40 Most implicit solvation models are based on a quantum-mechanical (QM) treatment of the solute, while the continuum is often treated classically.41 The Poisson (or Poisson–Boltzmann) equation solved around a molecular cavity is often used as the basis in so-called polarizable continuum models (PCM).42 More sophisticated implicit solvation models describe additional solvent–solute interactions, e.g., hydrogen bonding, dispersion interactions and cavitation energy. Popular approaches use a surface-area-dependent term43 or empirical statistical mechanical frameworks.44,45 Hybrid strategies, such as cluster-continuum (microsolvation) approaches, combine a few explicit solvent molecules with an implicit bulk description.46–48 While often successful, they are difficult to apply, computationally demanding, and usually require expert intervention,49–51 although several automated microsolvation workflows are available.52,53 The third and most empirical approach is the descriptor-based approach. These models rely on correlations between known descriptors and target properties, replacing explicit physical components with mappings learned from large experimental or quantum-mechanical databases. Examples are quantitative structure–activity and structure–property relationship (QSAR, QSPR) models like UNIFAC54 and OPERA.55 These approaches are complemented by machine-learning (ML) models. Recent ML models include QM-GNNIS for solvation energies,56 QupKake for pKa,57 MF-LOGP for octanol–water partition ratios,58 and FASTSOLV for solubility prediction.59

Generally, with increasing model empiricism, more data is required for parameterization and testing. This means that ML-based approaches, and even empirical QM models, depend on particularly extensive datasets. However, not only the quantity of the data is of importance – equally the quality and diversity, since models trained on bad or incomplete datasets can inherit these shortcomings.60 For solution phase properties (e.g., solvation energies) there is no practicable ab initio method or protocol to obtain accurate, theoretical reference values – unlike for general electronic structure theory.61–63 Because of this, experimental values have to be taken as reference. Popular data sets include the Minnesota Solvation Database (MNSOL),64 which contains around 3000 experimental data points spanning 92 solvents with roughly 800 unique molecules, and the FreeSolv database,65 which contains data for around 650 molecules in aqueous solution; however, only 250 of those are not already contained in MNSOL.66 There are other databases that only contain the name or atom connectivity of the molecules (e.g., via the Simplified Molecular Input Line Entry System, SMILES), but lack the geometric information required by QC methods. These are, for example, the SOLV@TUM database or the large collection of acid dissociation constants of IUPAC.67,68

Despite their great utility, the present sets have limitations. They predominantly feature small molecules, lacking diversity in larger, more complex molecular structures and motifs (see Fig. 1b), and often contain lower amounts of unique molecules compared to their number of data points. For example, half of the around 3000 data points in MNSOL originate from just 54 unique molecules in a large number of different solvents (Fig. 1a), making it chemically relatively homogeneous. Additionally, the present QM-ready sets most often only provide a single gas phase structure per unique molecule (e.g., MNSOL). Without proper minimum-energy geometries for each phase, the solvent-induced geometric and conformational changes are not accounted for and can thus introduce systematic biases. Moreover, as many models are parameterized on these databases, the limited availability of independent sets leaves only a small fraction of data for testing, which may reduce the ability to fully assess model robustness or identify potential overfitting.69,70


Fig. 1 Distributions of solvent occurrence per solute and solute sizes in the MNSOL dataset. (a) Histogram of the number of unique solvents for each unique solute, with the red line marking the median. Ethanol (shown) is the most common solute, available in 65 different solvents (right tail). (b) Histogram of the number of atoms per solute molecule, with the red line indicating the mean. Ethyl stearate (shown) is the largest solute with 62 atoms.

To address these limitations, we have compiled a diverse data set of 824 experimental solvation energies and partition ratios (1551 unique molecule–solvent pairs) for drug-like, medium-to-large, flexible molecules (mainly in the 30–80 atom range, up to 141), including their conformational ensembles with over 25 000 conformers and tautomers in total. The complete benchmark, along with all coordinates, computed energies, and reference values, is freely available for use and extension. This set not only extends beyond the size and structural diversity of existing collections, but also allows us to systematically dissect three critical factors in their influence on model accuracy: conformational sampling, geometry choice (phase-specific versus single-phase), and underlying electronic structure method. By doing so, we aim to provide a rigorous foundation for testing modern methods and guiding the development of more robust, efficient solvation-prediction protocols.

2 Background

To place our work in context, we briefly review the theoretical foundations of solvation modeling and the quantities involved, outline the common computational approach, and introduce the most popular solvation models, focusing on the implicit approach.

2.1 Thermodynamic quantities

Gibbs energies are the direct bridge between molecular calculations and real-world observables like solubilities, partition ratios, and equilibrium constants, which dictate concentrations, yields, and distribution of compounds under realistic conditions. For a given experimental equilibrium, the equation
 
$$\Delta G = -RT \ln K, \tag{1}$$
states the relationship between the equilibrium constant K and the difference in Gibbs energy ΔG for the respective process at temperature T, with R being the ideal gas constant. In principle, any solvation-dependent experimental quantity can be used to evaluate a model's performance; however, solvation energies and partition ratios depend predominantly on the solvation description, making them especially suitable for testing theoretical approaches. The partition ratio describes the equilibrium of a substance A between two phases α and β
$$\mathrm{A}(\alpha) \rightleftharpoons \mathrm{A}(\beta)$$
which can be expressed as the ratio of concentrations in the respective phases as
 
image file: d5sc06406f-t2.tif(2)

The phase transfer Gibbs energy ΔtrG can be obtained by the difference of Gibbs energies of the compound in each phase.

 
$$\Delta_{\mathrm{tr}}G^{\alpha\to\beta}_{A} = G^{(\beta)}_{A} - G^{(\alpha)}_{A} \tag{3}$$

Partition ratios (often also called partition constants) give the partitioning of a substance between two immiscible solution phases, whereas Henry's law constants (HLCs) denote the partitioning of a substance between a solution phase and the gas phase.71 Both partition ratios and HLCs can be used to obtain the respective Gibbs energy difference. In this work, we will describe the air–solvent partitioning via the respective solvation energy (ΔsolvG) and the solvent–solvent partitioning via the partition ratio (log Kα/β). Although it is recommended to include temperature and solvent in the notation, we will simplify this since only standard conditions (298.15 K and 1 atm) at infinite dilution are used.

Solvent–solvent partition ratios are most commonly measured by the shake-flask method, in which a solute is equilibrated between two immiscible solvents and concentrations in each phase are determined – often by UV-vis spectroscopy or 1H-NMR.22,72 For highly hydrophobic compounds that can form stable emulsions, slow-stirring techniques can offer improved reliability.73 Additional measurement approaches such as reversed-phase high-performance liquid chromatography (HPLC) and generator-column methods have been employed to expand coverage across diverse solute–solvent systems.74,75 Henry's law constants are often obtained either by static equilibrium experiments, where gas and liquid phase concentrations are measured in closed cells once equilibrium is reached, or by dynamic gas-purge approaches,76,77 in which a purge gas or bubble column achieves thermodynamic equilibrium and the decay of the gas phase solute concentration is monitored over time.78,79 Static methods provide direct equilibrium measurements, while dynamic methods can be more suitable for volatile or low-solubility compounds. The accuracy of such methods is discussed in more detail in Sec. 4.1.

2.2 Ensemble-averaged Gibbs energies

To calculate Henry's Law constants or partition ratios (via eqn (1) and (3)), the Gibbs energy of the respective substance of interest in the respective phases α and β has to be known. The Gibbs energy of a unique structure i of substance A in a solvent (or phase) α is obtained from
 
$$G^{(\alpha)}_{A,i} = E_{\mathrm{el},A,i} + G^{(\alpha)}_{\mathrm{trv},A,i} + \Delta_{\mathrm{solv}}G^{(\alpha)}_{A,i}, \tag{4}$$
where Eel,A,i is the gas phase total electronic energy as obtained, e.g., by Density Functional Theory (DFT). Gtrv,A,i is the ro-vibrational Gibbs energy contribution at finite temperature, which includes temperature-dependent entropic as well as enthalpic terms, both arising from vibrational, rotational, and translational contributions.80 Due to the associated computational cost and the fact that the change of the ro-vibrational Gibbs energy contribution is generally absorbed into the solvation model itself, or rather into its parametrization (see Sec. 2.3 and ref. 43 and 81), we will not compute this term explicitly. Finally, a solvation energy component ΔsolvG(α)A,i is needed, which will be detailed in Sec. 2.3.

In addition to these main contributions, non-rigid molecules often have multiple relevant conformers contributing to the total Gibbs energy of a given chemical species.82 This firstly requires the exploration of the respective potential energy surface for finding relevant conformers, and secondly, the calculation of all aforementioned contributions for each conformer. To obtain the Gibbs energy of the substance, the Gibbs energies of all ensemble members i need to be Boltzmann-weighted via

 
$$\langle G \rangle_{A} = \frac{\sum_i G_{A,i}\, e^{-\beta G_{A,i}}}{\sum_i e^{-\beta G_{A,i}}} \tag{5}$$
where β = (kBT)−1 with kB being the Boltzmann constant. This is shown schematically in Fig. 2. The ensemble-averaged Gibbs energy of A will be denoted as GA for the rest of this work.
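The Boltzmann weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the ensemble average is the population-weighted mean of the per-conformer Gibbs energies (consistent with the conformational entropy being omitted later in this section), uses molar energies in kcal mol−1 so that the molar gas constant replaces kB, and all function names and numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal mol^-1 K^-1


def boltzmann_weights(gibbs, temperature=298.15):
    """Normalized Boltzmann populations for per-conformer Gibbs energies (kcal/mol)."""
    beta = 1.0 / (R_KCAL * temperature)
    g_min = min(gibbs)  # shift by the minimum to avoid overflow in exp()
    factors = [math.exp(-beta * (g - g_min)) for g in gibbs]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(gibbs, temperature=298.15):
    """Population-weighted mean Gibbs energy of an ensemble (assumed form of the average)."""
    weights = boltzmann_weights(gibbs, temperature)
    return sum(w * g for w, g in zip(weights, gibbs))


def transfer_energy(gibbs_alpha, gibbs_beta, temperature=298.15):
    """Transfer Gibbs energy alpha -> beta as the difference of ensemble averages."""
    return ensemble_average(gibbs_beta, temperature) - ensemble_average(gibbs_alpha, temperature)


# Hypothetical per-conformer Gibbs energies (kcal/mol) in two phases
g_alpha = [0.0, 0.5, 2.0]
g_beta = [-1.0, -0.2, 1.5]
delta_tr = transfer_energy(g_alpha, g_beta)
```

Tautomers can be passed in the same list as conformers, since they enter the sum as additional ensemble members.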


Fig. 2 Scheme illustrating how conformational ensembles are used to compute the solute transfer Gibbs energy between two phases. All conformers (and tautomers) within a defined energy window are considered, and their per-conformer Gibbs energies GA,i are Boltzmann weighted to yield the ensemble-averaged Gibbs energy of A for each phase α and β. The phase-transfer Gibbs energy ΔtrGα→βA is obtained as the difference between these ensemble averages. Shown is the example of three conformers of flupentixol in octanol.

Depending on the composition of a molecule, multiple tautomeric states are possible.83 They are included by extending the Boltzmann sum to cover all relevant tautomers, treating them as additional ensemble members i together with their associated GA,i.

In principle, the Gibbs energy of a conformer ensemble additionally includes a conformational Gibbs energy part Gconv that stems from the conformational entropy (−TSconv), as a result of mixing multiple populated conformers.84 For the gas phase, this is already a challenging problem due to the required very extensive exploration of the potential energy surface. For the solution phase, the limitations of the implicit solvation model make accurate determination of the conformational entropy computationally not feasible.84,85 Therefore, we omit determining these contributions in our study.
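To give a feeling for the size of the omitted term, the sketch below evaluates the ideal mixing (Gibbs–Shannon) entropy of a set of conformer populations. This is an assumption about the functional form made for illustration only; the schemes cited above may define the conformational entropy differently, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal mol^-1 K^-1


def conformational_gibbs(populations, temperature=298.15):
    """Return -T*S_conf (kcal/mol) with S_conf = -R * sum(p_i * ln p_i)."""
    s_conf = -R_KCAL * sum(p * math.log(p) for p in populations if p > 0.0)
    return -temperature * s_conf


# Two equally populated conformers: -RT ln 2
g_conf_two = conformational_gibbs([0.5, 0.5])
```

For a single populated conformer the term vanishes, and it grows in magnitude as more conformers become comparably populated.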

2.3 Accounting for solvation effects

To compute the Gibbs energy of each conformer, the solvation energy ΔsolvG is needed, which refers to the change in Gibbs energy when a molecule is transferred from vacuum (the gas phase) to a solvent α.86 One common decomposition of contributions in the implicit-solvation framework is given in eqn (6).
 
image file: d5sc06406f-t5.tif(6)
Here, GES is the electrostatic term, GNES the non-electrostatic term, GN the geometry-relaxation contribution, and image file: d5sc06406f-t6.tif is the standard-state correction term. In the following paragraphs, these components are explained in more detail.

The electrostatic contribution GES (often referred to as polarization) captures the stabilization arising from the mutual polarization of the solute's electronic density and the surrounding dielectric continuum. In implicit models, this term effectively represents the electrostatic solute–solvent interactions that would be described explicitly in a molecular simulation by Coulomb interactions. GES is obtained by solving the Poisson or Poisson–Boltzmann equation using methods such as IEF-PCM,87 COSMO or CPCM.88,89

The non-electrostatic term GNES accounts for, e.g., cavity formation around the solute, solvent restructuring, and London dispersion interactions. Especially in less polar solvents, an electrostatics-only description is not sufficient, making the inclusion of non-electrostatic effects necessary. Popular models include the surface-area-based SMx family of solvation methods, with SM12 or SMD,43,90 or the variants of the Miertus–Scrocco–Tomasi (MST) approach.42,91–94 Another popular method is COSMO-RS,44,45,95 which uses a semi-empirical statistical-thermodynamic framework. Also noteworthy are the less empirical composite method for implicit representation of solvent (CMIRS),96–99 and newer approaches like the spherically averaged liquid susceptibility ansatz (SaLSA)100 or the charge-asymmetric non-locally determined local-electric solvation model (CANDLE).101 The growing interest in larger structures has also led to the need for more efficient solvation models. The ALPB model,81 CPCM-X,102 and the easy solvation estimation (ESE) models103 belong to this category. Current implicit models do not explicitly take into account the entropy penalty upon solvation, resulting from the loss of translational, rotational, and often also low-frequency vibrational degrees of freedom, while rigorous explicit approaches (e.g., via MD) are prohibitively expensive. This omission is in fact why a separate thermostatistical correction for solution phase properties is most often left out: the entropy change is already fitted into the empirical parameterization of the solvation models, although several schemes have been proposed in the literature that would enable an approximate explicit calculation.84,104,105

The geometry (or nuclear relaxation) contribution is independent of the solvation model itself, but depends on the used procedure. This contribution is given by

 
$$G^{(\alpha)}_{\mathrm{N}} = E^{(\alpha)}_{\mathrm{el}} - E^{(\mathrm{gas})}_{\mathrm{el}}, \tag{7}$$
and describes the change in electronic energy when a molecule distorts from its gas phase to its solution phase equilibrium geometry. A common, but crude, approximation in many solvation models or approaches43,81 is setting G(α)N = 0 as it simplifies the calculation procedure (cf. Sec. 4.2.2). This constitutes a systematic approximation whose associated error is typically absorbed or compensated for during the parameterization of the respective solvation model against experimental references.

To obtain the standard-state corrected solvation energy ΔsolvG°, denoted as ΔsolvG from now on, a standard state correction factor image file: d5sc06406f-t7.tif is needed.106 It accounts for the standard state mismatch between an ideal gas at 1 atm (gas phase) and an ideal solution at 1 mol L−1 (solution phase), ensuring both phases are referenced to 1 mol L−1 concentration. It is given by

 
image file: d5sc06406f-u4.tif (8)
where c = 1 mol L−1 is the standard concentration and p = 1 atm is the standard pressure. At 298.15 K, the correction amounts to 1.89 kcal mol−1, assuming ideal gas behavior.106
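The quoted value of 1.89 kcal mol−1 can be checked numerically. The sketch below assumes the usual ideal-gas form of the correction, RT ln(cRT/p), which converts a 1 atm gas-phase reference into a 1 mol L−1 reference; the function name and the explicit gas-constant values are illustrative choices, not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3    # gas constant in kcal mol^-1 K^-1
R_L_ATM = 0.082057366   # gas constant in L atm mol^-1 K^-1


def standard_state_correction(temperature=298.15, conc=1.0, pressure=1.0):
    """Ideal-gas correction from `pressure` (atm) to `conc` (mol/L), in kcal/mol."""
    # c*R*T/p is dimensionless: the ratio of the two reference concentrations
    ratio = conc * R_L_ATM * temperature / pressure
    return R_KCAL * temperature * math.log(ratio)


correction = standard_state_correction()  # about 1.89 kcal/mol at 298.15 K
```

At 298.15 K an ideal gas at 1 atm occupies about 24.5 L mol−1, so the logarithm is ln(24.5) and the correction is positive.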

Unlike physics-based continuum models, ML approaches bypass the decomposition into physical contributions and instead directly predict the overall solvation energy (i.e., both G(α)ES and G(α)NES) from molecular representations.

3 Methods

In this section, the workflow for the generation of the conformer ensembles, the computational methods, the tested solvation models and the data curation is outlined.

3.1 Ensemble generation

The procedure described in the following serves to generate phase-specific conformational ensembles and is shown in Fig. 3. The computational details are given in Sec. 3.3. Initial geometries were created by converting the respective SMILES identifiers to three-dimensional geometries.107 These were subsequently optimized using an efficient tight-binding method. Tautomeric states were automatically screened using alternating protonation and deprotonation cycles, as implemented in the CREST program package.108,109 The obtained ensemble of tautomers was subsequently cleaned using MolBar,110 removing not only redundant conformers but also artifacts generated during tautomerization, which often correspond to entirely different compounds. In a second clean-up step, high-lying (i.e., >12.0 kcal mol−1) tautomers were sorted out using a low-cost DFT method (PBE-D4/def2-SV(P)). For each of the remaining tautomers, a conformer search was carried out using GOAT111 (energy window of 6.0 kcal mol−1). The last step of the conformer generation process is the screening and subsequent optimization, where the ensemble is further refined by ensemble geometry optimization as implemented in CENSO82 using ORCA (final energy window of 4.0 kcal mol−1).112
Fig. 3 Illustration of the workflow used to generate the structure ensemble for a given substance in a given phase. The software/tool used (black) and the level of theory (gray) are given as text next to the individual steps.
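The successive energy windows in this workflow (12.0, 6.0 and 4.0 kcal mol−1) all amount to the same filtering operation, sketched below. The sketch only illustrates the thresholding logic; it does not call CREST, GOAT or CENSO, and the function name, structure labels and energies are hypothetical.

```python
TAUTOMER_WINDOW = 12.0   # kcal/mol, tautomer pre-screening
SEARCH_WINDOW = 6.0      # kcal/mol, conformer search
FINAL_WINDOW = 4.0       # kcal/mol, final refined ensemble


def apply_energy_window(energies, window):
    """Keep structures lying within `window` kcal/mol of the lowest-energy member."""
    e_min = min(energies.values())
    return {name: e for name, e in energies.items() if e - e_min <= window}


# Hypothetical tautomer energies (kcal/mol) at the pre-screening level
tautomers = {"taut_a": 0.0, "taut_b": 3.5, "taut_c": 14.2}
kept_tautomers = apply_energy_window(tautomers, TAUTOMER_WINDOW)

# Hypothetical conformer energies (kcal/mol) after the final optimization
conformers = {"conf_1": -2.1, "conf_2": 0.4, "conf_3": 2.3}
final_ensemble = apply_energy_window(conformers, FINAL_WINDOW)
```

Because the window is applied relative to the lowest member at each level of theory, a structure can drop out at a later stage even if it passed an earlier, wider window.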

For the conformer searches and geometry optimizations in solution, the ALPB solvation model81 was employed for xTB/CREST calculations, while for any DFT calculations, the CPCM model89,113 was used. We also tested the more sophisticated SMD model. However, owing to possible convergence difficulties114 and our observation that geometries from CPCM and SMD differ only marginally (see SI, Sec. C.1), we employ CPCM throughout this work.

3.2 Solvation models

We grouped the models into three categories according to their underlying electronic structure method: (a) quantum mechanical models requiring DFT (or similar level of QM), (b) semiempirical models based on efficient approximate electronic structure methods, and (c) machine-learning models predicting ΔsolvG directly from molecular representations. Table 1 summarizes the tested models, their included contributions, and required molecule input type. A more detailed description can be found in the SI, Sec. A.1. In the main manuscript, we present only the 10 most widely used and representative solvation models. The results for all other methods can be found in SI, Sec. E and F. All ΔsolvG and log Kα/β values correspond to standard-state conditions (298.15 K, 1 atm) at infinite dilution.
Table 1 Overview of all tested solvation models, showing the required input (3D coordinates or SMILES) and explicitly computed energy terms (electrostatic, ES; non-electrostatic, NES), with brief descriptions and references. Details can be found in the SI, Sec. A.1

| Type | Model | Input | ES | NES | Description | Ref. |
|---|---|---|---|---|---|---|
| Quantum mechanical (QM) | CPCM | 3D | | | Conductor-like PCM (electrostatics only) | 89 |
| | SMD | 3D | | | SMx model with empirical cavity-dispersion (CDS) term | 43 |
| | openCOSMO-RS | 3D | | | Open-source variant of COSMO-RS | 115 |
| | COSMO-RS | 3D | | | COSMO-based screening model for real solvents | 44 and 45 |
| | D-COSMO-RS (a) | 3D | | | Direct, self-consistent variation of COSMO-RS | 116–118 |
| Semiempirical (SQM) | ALPB | 3D | | | Analytically linearized Poisson–Boltzmann model | 81 |
| | ddCOSMO | 3D | | | Domain-decomposition formulation of COSMO/CPCM | 102, 119 and 120 |
| | CPCM-X | 3D | | | ddCOSMO with post-processing (similar to COSMO-RS & SMD) | 102 |
| | Solv (b) | 3D | | | Non-iterative COSMO-like electrostatics with NES | 121 |
| | uESE (b) | 3D | | | Non-iterative COSMO-like electrostatics with NES | 103 |
| | ESE-PM7 (b) | 3D | | | Cheaper, PM7 charges-based variant of uESE | 122 |
| Machine learning (ML) | DirectML | SMILES | | | Directed message-passing neural network (NN) | 123 |
| | CIGIN | SMILES | | | Chemically interpretable graph interaction network | 124 |
| | ESE-EE-DNN | 3D | | | Dense NN with empirical charges, electrostatics, and ML NES | 125 |
| | ESE-GB-DNN | 3D | | | Simplified electrostatics version of ESE-EE-DNN | 126 |

(a) D-COSMO-RS results are shown without the combinatorial contribution and do not include the data points in hexadecane because no parameterization is available. Results with the combinatorial contribution can be found in the supplementary information.
(b) Input charges for solv, uESE, and ESE-PM7 are computed using g-xTB Mulliken charges.127


For the self-consistently calculated solvation energies (i.e., with CPCM, SMD, D-COSMO-RS, ALPB and ddCOSMO), the solvation energy is obtained via

 
$$\Delta_{\mathrm{solv}}G^{(\alpha)}_{A,i} = G^{(\alpha)}_{A,i} - E_{\mathrm{el},A,i}, \tag{9}$$
where the G(α)A,i is the Gibbs energy of the conformer i of molecule A in phase α and Eel,A,i is the respective gas phase energy evaluated on the same geometry.

Only solvation models capable of predicting ΔsolvG for all solvents in our benchmark were included; models limited to specific solvents (e.g., water-only models) were excluded. Although we intended to evaluate additional ML- and QSAR-based tools, many proved to be property-specific (e.g., QupKake for pKa, IFSQSAR128 and Vega129 for other physicochemical properties), lacked publicly accessible or maintained code, or were insufficiently documented to support a reproducible integration.

3.3 Computational details

Quantum chemical calculations were performed with xTB 6.6.1, Turbomole 7.7.1,130–132 and ORCA 6.0.1.112,133,134 If not stated otherwise, default settings were applied for all calculations. LibXC was used for some of the functionals.135 All quantum chemical calculations use matching def/def2 effective small core potentials (ECPs) for heavy elements with Z > 36.136,137 For ORCA related calculations, matching general-purpose auxiliary basis sets are constructed on the fly using AutoAux138 and the RIJCOSX139–141 approximation was used. Turbomole calculations were done using RI-J.142,143

For the workflows, Open Babel 3.1.0,144 CREST 3.0.2,109,145 GOAT,111 smi2xyz,107 a development version of CENSO 2.0,82 and MolBar 1.1.0 were used.110 The tautomerization protocol,108 as well as the conformational sampling was done using GFN2-xTB with ALPB.81,146 For pre-screening, PBE-D4/def2-SV(P)147–151 is used. Final geometries were obtained using the r2SCAN-3c composite method,152,153 combined with CPCM for any non-gas phase structures.89 Calculations with the new general extended tight-binding method g-xTB use the publicly available development version 1.0.0.127 Hybrid DFT calculations utilize ωB97M-V (ref. 154) with the def2-TZVPPD (aTZ) basis with ORCA.136,151

COSMO-RS 16.01 is calculated using Turbomole 7.7.1 with COSMOtherm C30-1601.155,156 COSMO-RS uses BP86/def-TZVP by default.157–159 D-COSMO-RS solvation contributions were calculated via Turbomole 7.7.1 using BP86/def-TZVP. Solvation contributions of the CPCM and SMD43 models are obtained using r2SCAN-3c, and openCOSMO-RS115 uses (by default) BP86/def2-TZVPD in ORCA. Solvation contributions of the GBSA, ALPB,81 and CPCM-X102 models were calculated using GFN2-xTB within xTB.160 CPCM-X was calculated in xTB 6.7.1 due to availability. ddCOSMO was calculated using ddX161 combined with GFN2-xTB within tblite. Solvation contributions of uESE 1.2,103,162 ESE-PM7 1.2, and solv 1.0 (with 500/150 grid setting for electrostatics and non-electrostatics, respectively),121 are calculated using Mulliken charges obtained with g-xTB. Solvation contributions from the ESE-GB-DNN model (version: September 2023)163 and the ESE-EE-DNN model (version: June 2024)125 were obtained using their respective published implementations. Solvation contributions of DirectML 0.0.3 and CIGIN (version: August 2020) were obtained using the SMILES strings of the solvent and solute.123

3.4 Data curation

Curating a truly novel solvation test set required painstaking effort to identify high-quality experimental data that had not already been used by MNSOL, FreeSolv, or SOLV@TUM. We used the following criteria to select suitable candidates: first, we focus only on the elements H, C, N, O, F, Cl, Br, I, S, P, and Si. On the one hand, some of the tested solvation models are only parametrized for these elements, and on the other hand, the available number of experimental data points for, e.g., (transition) metals and heavy elements is very small. Secondly, the focus was set on drug-like, medium- to large-sized (30–80 atoms) molecules that contain, for example, heteroatom-rich scaffolds, zwitterionic moieties, macrocycles, and halogens. The inclusion of molecules that contain networks of hydrogen-bond donors and acceptors and nonstandard ring sizes further increases the complexity and diversity. We prioritized compounds of biological, pharmaceutical, or environmental importance, such as per- and polyfluoroalkyl substances, active pharmaceutical ingredients, and natural products, to guarantee that FlexiSol addresses a highly relevant chemical space. Thirdly, we deliberately included conformationally flexible molecules or molecules with multiple protonation/tautomeric states, to investigate the importance of including the corresponding chemical space. We limited our set to solvents with abundant, high-quality data and broad practical relevance: primarily water and 1-octanol, which together account for the majority of published solvation energies and partition ratios for medium-to-large organic molecules. Sources were curated through an extensive literature study, where the primary literature was consulted when possible. Care was taken to ensure that every data point was explicitly labeled as experimental.
All experimental solvation energies in this work are given at standard-state conditions (298.15 K, 1 atm) in the molar reference state, meaning that ΔsolvG is the Gibbs energy for transferring a solute molecule from the ideal gas phase at 1 mol L−1 into an ideal solution at the same solute concentration.

3.5 Analysis

For the analysis of the mean or average errors, a 3σ criterion is applied: for each model, values that deviate by more than three standard deviations from the mean are considered outliers and removed from the statistics. This is done to mitigate the impact of outliers that would otherwise dominate the statistics and make the interpretation difficult. Cases where models (or their underlying method) failed to converge or delivered obviously faulty results were also removed from the statistics of the respective model. In the final set, this occurred only for openCOSMO-RS and exclusively for three specific conformers of three different compounds, detailed in SI, Sec. C.7.
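A minimal sketch of such a 3σ filter is shown below. It assumes a single pass over the signed errors of one model and the population standard deviation; whether the actual analysis uses the sample standard deviation or iterates the filter is not stated here, so both choices are assumptions, as are the function name and the example data.

```python
import statistics


def three_sigma_filter(errors):
    """Drop errors deviating by more than 3 standard deviations from the mean (single pass)."""
    mean = statistics.fmean(errors)
    sd = statistics.pstdev(errors)  # population SD; an assumption of this sketch
    return [e for e in errors if abs(e - mean) <= 3.0 * sd]


# Hypothetical signed errors (kcal/mol): twenty small deviations and one gross outlier
errors = [0.1] * 20 + [50.0]
filtered = three_sigma_filter(errors)
```

Note that a single large outlier inflates the standard deviation itself, so the filter only removes points that stand out even against this inflated spread.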

The error of all respective methods will be analyzed using the mean absolute error (MAE), standard deviation (SD) and root mean squared error (RMSE). For definitions of all used statistical measures in this work, see SI, Sec. B. The solvation energies will be discussed in kcal mol−1, while the partition ratios are given in log units, where one log unit is equivalent to 1.36 kcal mol−1 according to eqn (1).
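For orientation, the statistical measures and the log-unit conversion can be written down compactly. The sketch below uses common textbook definitions; the exact definitions used in this work are those of SI, Sec. B, and may differ in detail (e.g., n versus n − 1 in the standard deviation). All names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15            # temperature in K

# One log unit of a partition ratio expressed as a free energy, RT ln(10).
KCAL_PER_LOG_UNIT = R_KCAL * T * math.log(10.0)


def error_statistics(computed, reference):
    """Return mean signed error, MAE, SD and RMSE of computed vs. reference values.

    Assumption: SD is the population standard deviation of the signed errors.
    """
    errors = [c - r for c, r in zip(computed, reference)]
    n = len(errors)
    mse = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    sd = math.sqrt(sum((e - mse) ** 2 for e in errors) / n)
    return {"MSE": mse, "MAE": mae, "SD": sd, "RMSE": rmse}


print(f"1 log unit = {KCAL_PER_LOG_UNIT:.2f} kcal/mol")
```

With these definitions, RMSE² equals the squared mean signed error plus SD², so an RMSE noticeably larger than the SD signals a systematic shift.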

To discuss the influence of different factors on the computed results (Sec. 4.2.1–4.2.3), we will use relative errors, i.e., the change in root mean square error (RMSE) relative to a baseline approach that will be introduced in Sec. 4.2. This is done by varying one variable at a time (ceteris paribus). An improvement in error is given as a negative change in RMSE, while a worsening is indicated by a positive change. The RMSE is used as the measure because it combines the information carried by the MAE and the SD. For the discussion of the relative errors, the 3σ criterion is not applied in order to avoid inconsistencies.

Due to the large range of experimental reference values in FlexiSol, absolute errors alone can be misleading when comparing methods across the entire set. Therefore, relative errors with respect to the experimental reference (in %) are additionally reported in the SI, Sec. C.5. We also performed an analysis in which deviations that are smaller than the reported experimental uncertainties were counted as zero error – explicitly taking into account the experimental uncertainty. These can be found in SI, Sec. C.4. Both the relative error analysis and the explicit treatment of the experimental uncertainties yielded trends essentially identical to those from the absolute errors. As these additional analyses did not alter the overall conclusions, we proceed with the main analysis based solely on absolute errors.
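The uncertainty-aware analysis can be illustrated with a short sketch. It assumes that a deviation within the experimental uncertainty is set to zero and that larger deviations are kept in full; whether the analysis in SI, Sec. C.4 instead subtracts the uncertainty from larger deviations is not stated here, so this choice is an assumption. The function name is hypothetical.

```python
def error_beyond_uncertainty(computed, reference, uncertainty):
    """Signed error, counted as zero if it lies within the experimental uncertainty.

    Assumption: deviations larger than the uncertainty are kept unchanged.
    """
    deviation = computed - reference
    if abs(deviation) <= uncertainty:
        return 0.0
    return deviation


# Hypothetical values in kcal/mol.
print(error_beyond_uncertainty(-5.0, -5.4, 0.6))  # within the uncertainty
print(error_beyond_uncertainty(-3.0, -5.4, 0.6))  # outside the uncertainty
```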

4 Results and discussion

4.1 Database

Absolute solvation energy data (or the respective Henry's law constants) was taken from ref. 18, 68, 123, and 164–173. Partition ratio reference data was collected from ref. 18, 45, 123, 167, 172, and 174–179. The SI provides a detailed list of reference values and their sources (SI, Sec. G), together with the curated experimental data, optimized geometries, and raw energies for every structure and method.

FlexiSol contains 824 experimental values: 530 solvation energies and 294 partition ratios, covering 734 unique molecules in 10 solvents (1551 molecule–solvent pairs). Including all conformer/tautomer ensembles, the set contains over 25 000 geometries. A chord diagram of the solvent distribution is shown in Fig. 5. Molecules range from 11 to 141 atoms (mean 42, Fig. 4) with up to 25 rotatable bonds (SI, Fig. D1a); for comparison, MNSOL averages 15 atoms (Fig. 4). Of the heavy atoms, 30% are non-carbon, mainly nitrogen or oxygen (21%) and halogens (8%), with the remainder being sulfur, phosphorus, and silicon (2%). The solvation energies cover a range of 32.5 kcal mol−1 (from −27.7 to 4.9 kcal mol−1), with a mean absolute reference energy of 11.1 kcal mol−1 (MNSOL, for comparison, has 4.5 kcal mol−1). The partition ratios cover a range of 14.2 log units (from −4.3 to 9.9 log units), with a mean absolute reference of 3.3 log units.


Fig. 4 Histogram of the solute molecule sizes in the FlexiSol and MNSOL datasets, with vertical lines indicating the respective means. The largest compound of FlexiSol, tylosin (141 atoms), and five other representative molecules from the set are shown.

Fig. 5 Chord diagram showing the solvent frequency and the composition of data points in FlexiSol. Gas-solvent ribbons represent solvation energies (ΔsolvG) and solvent–solvent ribbons represent partition ratios (log Kα/β). The arc size reflects how often each solvent appears in the dataset, while ribbon thickness indicates the number of data points for each combination.

Because experimental uncertainties are rarely stated for the data points used, we rely on an educated estimate. MNSOL states an experimental uncertainty of around 0.2 kcal mol−1,64 and FreeSolv states, where available, an uncertainty per data point, ranging from 0.0 to 1.9 kcal mol−1, with a default value of around 0.5 kcal mol−1 if no literature value is given.65 This is in good agreement with the literature, where often an uncertainty of around 0.3 to 0.5 kcal mol−1 is stated.180–182 For partition ratios, experimental repeatability is around ±0.3 log units according to OECD,183 with inter-method differences of up to ±0.5 log units.22 Additionally, the experimental determination of larger (and strongly polar or nonpolar) molecules is generally more difficult, e.g., due to poor solubility.184–187 For the mentioned reasons, we estimate the uncertainty for solvation energies to be around ±0.6 kcal mol−1 and for the partition ratios to be around ±0.5 log units.

While this benchmark set was constructed to reflect chemically relevant and challenging solutes, some inherent biases and limitations remain: (i) The majority of reference data refers to water and octanol, which dominate both the solvation energies and the partition ratios (see Fig. 5). (ii) The chemical space is skewed toward molecules composed mainly of C, H, N, O, halogens, and to a lesser extent S, P, and Si. (iii) The set contains only neutral solutes. While there are works presenting partition coefficients188 or solvation energies64,66,189–191 for ionic substances that could in principle be used as reference values, these values are less reliable because of the larger experimental errors associated with them. (iv) The focus on drug-like, medium-sized molecules means that small molecules and inorganic/organometallic systems are not present. (v) The set only contains data points at standard conditions. (vi) The set contains only data for infinite dilution and thus no concentration/activity effects. Accordingly, our conclusions are restricted to neutral, flexible and polyfunctional drug-like organic molecules in mainly water or 1-octanol at standard conditions.

4.2 Benchmark results

In this section, we assess how well the computed values of the models agree with the experimental references for the FlexiSol set. We start with our baseline approach – phase-specific conformational ensembles, Boltzmann weighting, and r2SCAN-3c electronic energies – and use this as the reference point for all further comparisons. This approach has proven robust at moderate cost in many prior works.82,192,193 Next, we examine the impact of (i), ensemble sampling, Sec. 4.2.1, (ii), geometry choice, Sec. 4.2.2, and (iii), the electronic structure method for the description of nuclear relaxation, Sec. 4.2.3. Here, we start with a detailed investigation of the baseline approach. The mean absolute error (MAE) and standard deviation (SD) for the whole set are shown in Fig. 6 for each solvation method.
Fig. 6 Mean absolute error (solid bars) and standard deviation (hatched bars) are shown for solvation energies (top, ΔsolvG in kcal mol−1) and partition ratios (bottom, log Kα/β in log units, with the secondary right axis in kcal mol−1). Results are given for the most popular solvation models; additional models can be found in the SI.

We find that the computed solvation energies show larger errors on our benchmark set than those reported in prior studies, as seen for, e.g., MNSOL. We attribute this to the generally larger and more challenging solutes in our set, the fact that many models (e.g., SMD) were parameterized on MNSOL, and FlexiSol's substantially wider range of ΔsolvG and log Kα/β values. For solvation energies, we find the DFT-based methods to deliver overall the best results, with COSMO-RS leading the category with a mean absolute error of 2.0 kcal mol−1. The SQM-based models yield a slightly worse result with CPCM-X being the best performer with an MAE of 2.7 kcal mol−1. The machine-learning models yield accuracy similar to the DFT-based models, demonstrated by DirectML's MAE of 2.2 kcal mol−1, but generally show a greater difference between the individual models.

Calculated partition ratios generally agree better with experiment compared to solvation energies, likely because solvent-independent, substance-specific errors partially cancel out in the ratio. An example is decachlorobiphenyl, where the computed free solvation energies in octanol and water both show larger errors (error of QM-based methods around 4.0 kcal mol−1) compared to the respective partition ratio (error of 1.0 log units).194 For the DFT-based methods, we find SMD and COSMO-RS to both provide a mean absolute error of 1.0 log units, followed by the SQM-based models with CPCM-X at an MAE of 1.8 log units. The ML-based models show wider variability, with ESE-GB-DNN performing worst and DirectML performing best overall with an MAE of 0.7 log units.
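The error cancellation described above follows directly from how a partition ratio is obtained from two solvation energies. The following sketch illustrates this with hypothetical numbers; the sign and phase convention (phase a relative to phase b) is an assumption made for the example and should be checked against eqn (1), and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15            # temperature in K
RT_LN10 = R_KCAL * T * math.log(10.0)


def log_partition_ratio(dg_solv_a, dg_solv_b):
    """log K for partitioning into phase a relative to phase b (assumed convention).

    A more negative solvation energy in phase a gives a more positive log K.
    """
    return -(dg_solv_a - dg_solv_b) / RT_LN10


# Hypothetical solvation energies (kcal/mol) in octanol and water.
exact = log_partition_ratio(-10.0, -4.0)
# The same solute-specific error of +4.0 kcal/mol in both phases cancels exactly.
shifted = log_partition_ratio(-10.0 + 4.0, -4.0 + 4.0)
print(f"{exact:.2f} {shifted:.2f}")
```

In practice the cancellation is only partial, because the error of a model is rarely identical in both solvents.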

Most models show a systematic error in their computed solvation energies overestimating small ones and underestimating larger ones, as shown in Fig. 7 (see SI, Sec. D.2.1 for all methods). This trend is very similar for most methods with a slope between −0.1 and −0.2 kcal mol−1 per kcal mol−1 of reference ΔsolvG. Partition ratios show the same systematic error: the models underestimate the affinity for the favored phase, i.e., negative errors for very positive partition ratios and positive errors for very negative partition ratios (SI, Sec. D.2.2). Systematic errors can also be found for some specific functional groups or structural motifs (SI, Sec. D.3). Systems containing primary and secondary heteroatoms are systematically overestimated in terms of their solvation energy, whereas the opposite holds for tertiary heteroatoms. For heteroatom–heteroatom bonds, partition ratios are found to be generally underestimated, favoring the polar phase. This highlights the difficulty in describing strong solvent–solute interactions like hydrogen bonding – often a major difficulty for implicit solvation models.39,41 Because our dataset is dominated by water and octanol, and contains far fewer data in other solvents, broad conclusions about solvent dependence are inherently limited. Nonetheless, we observe that the MAE values are worse for octanol compared to water, by about 0.3 kcal mol−1 for most methods, with SMD and CIGIN both worsening most by over 0.6 kcal mol−1. In the next paragraphs, the three model categories will be discussed in more detail.
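The systematic trend can be quantified by regressing the signed error on the reference value, as done for the lines in Fig. 7. The sketch below is a plain least-squares fit on hypothetical data; it is not the authors' analysis script, and the function name is illustrative.

```python
def error_trend(reference, computed):
    """Least-squares slope and intercept of (computed - reference) vs. reference."""
    errors = [c - r for c, r in zip(computed, reference)]
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, errors))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical model that recovers only 85% of each reference solvation energy.
reference = [-20.0, -10.0, 0.0]
computed = [0.85 * x for x in reference]
slope, intercept = error_trend(reference, computed)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A negative slope means that strongly negative (strongly stabilizing) reference values come out with positive errors, i.e., the stabilization is underestimated.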


Fig. 7 Error in computed solvation energy (ΔsolvG) versus experimental ΔsolvG for three representative models: COSMO-RS (QM, blue circles), ALPB (SQM, yellow squares), and CIGIN (ML, gray triangles). Solid lines show linear regression lines of the errors, indicating each model's systematic underestimation of large (positive) and overestimation of small (negative) values. All values are given in kcal mol−1.

Among the DFT-based models (SMD, COSMO-RS, and openCOSMO-RS), absolute solvation energies are computed best by COSMO-RS (MAE 2.0 kcal mol−1), closely followed by its open-source variant openCOSMO-RS (MAE 2.2 kcal mol−1) and SMD (MAE 2.5 kcal mol−1). For partition ratios, COSMO-RS and SMD perform very similarly (MAE 1.0 log units), with openCOSMO-RS showing a systematic shift (i.e., SD < RMSE) with a mean signed error of 1.8 log units, thus yielding a worse agreement (MAE 2.0 log units) with experiment. One of the largest average overestimations is found for polyhydroxy compounds (6.0 kcal mol−1 on average), more specifically sugar-type alcohols like sorbitol, adonitol, mannitol, or galactitol. COSMO-RS in particular overestimates these by more than 10.0 kcal mol−1, whereas SMD, for example, does so by only 4.5 kcal mol−1. SMD, however, struggles more with solvation energies of very lipophilic substances, underestimating molecules such as decafluorobiphenyl by 7.0 kcal mol−1. Organic-aqueous partition ratios for heteroatom-dominant solutes (especially nitrogen-containing) are underestimated by both COSMO-RS and SMD, e.g., for substances like cytidine diphosphate (COSMO-RS, error of −5.6 log units) and azimsulfuron (SMD, error of −5.4 log units). Such systematic errors have been noted already in, e.g., ref. 195–198.

The SQM-based approaches deliver robust results, with MAEs of around 2.7 kcal mol−1 for ΔsolvG and 1.8 log units for log Kα/β, with the best performer being CPCM-X. Most of the SQM-based models struggle with halogen-dominated substances like chlorothalonil (error of −13.0 kcal mol−1) or very oxygen- and nitrogen-rich interactions in polar solvents like benzo-18-crown-6 in water (error of 10.0 kcal mol−1) on FlexiSol. For partition ratios, the SQM-based models seem to overestimate the affinity for the organic (octanol) phase relative to the aqueous one for large, polycyclic, and heteroatom-rich substances like tacrolimus (with an average error of 10.0 log units). ALPB underestimates the octanol–water partitioning for polyfluorinated substances like perfluorooctanesulfonic acid (with an error of −9.0 log units).

The ML approaches deliver overall good results. Besides the low MAEs, however, the number of outliers according to the 3σ criterion, which assumes a Gaussian error distribution, is generally slightly larger than for the other model classes. As ML models do not always follow a normal distribution of errors,199 the application of this criterion results in the removal of a larger number of outliers (about five more compared to the QM/SQM-based models, see SI, Sec. F). CIGIN and DirectML are found to show larger errors (i.e., ± 10 kcal mol−1) for the solvation energies of cyclic heteroatom-containing xenobiotics, like flumioxazin, milbemycin A3, or trimethoprim. The excellent performance for DirectML cannot be matched by any other tested ML-based model.

4.2.1 Ensemble. Because sampling the conformational space is computationally demanding (cf. Sec. 4.3), we analyze the influence of the conformational ensemble on the results by testing two simplified approaches: (a), using only the lowest-energy conformer for each phase, thereby avoiding higher-energy conformers and requiring one final single-point and one solvation model evaluation, and (b), one random conformer in each phase, simulating the absence of conformational sampling by mimicking the outcome if one initial geometry is used for the optimization in both phases. Both approaches are illustrated schematically in Fig. 8, and their impact relative to the baseline is shown in Table 2.
Fig. 8 Schematic of the two approximate ways to calculate solvation energies with respect to the selection of the used conformers. The baseline approach uses all conformers (and tautomers) within the chosen energy window. Per-conformer Gibbs energies GA,i are Boltzmann-weighted to yield the ensemble average for each phase (shown in red, and in Fig. 2). (a) Lowest-energy conformer: only the single lowest-energy structure is used in both phases. (b) Random conformer: a single conformer irand is chosen at random and used for the ΔG. This aims to simulate the error introduced by the absence of conformational sampling.
Table 2 Change in RMSE (relative to the baseline) for solvation energies (kcal mol−1) and partition ratios (log units) upon using different approaches to obtain the final Gibbs energy in each phase. A negative change in RMSE indicates an improvement, a positive change a worsening. Low. denotes using only the lowest-energy conformer per phase; Rand. denotes using a single random conformer per phase
Method         ΔsolvG Low.   ΔsolvG Rand.   log Kα/β Low.   log Kα/β Rand.
SMD            −0.0          0.3            −0.0            0.3
openCOSMO-RS   −0.1          0.2            0.0             0.2
COSMO-RS       −0.0          0.3            0.0             0.4
D-COSMO-RS     −0.1          0.2            −0.0            0.4
ALPB           −0.0          0.3            −0.0            0.4
CPCM-X         −0.0          0.2            0.0             0.5
ESE-PM7        −0.1          0.1            0.0             0.3
DirectML       0.0           0.3            −0.0            1.1
CIGIN          −0.0          0.7            0.0             0.5
ESE-GB-DNN     0.0           0.4            −0.0            0.2


We find that using only the lowest-energy conformer instead of the full Boltzmann-weighted ensemble yields overall very similar results for both solvation energies and partition ratios. The largest deviations are about ± 0.7 kcal mol−1 for solvation energies, as seen for example in simvastatin (octanol, error improves) and in 15-crown-5 (water, error worsens). In cases where there is a noticeable change, many conformers lie close in energy, so their combined Boltzmann weights differ significantly from that of the single lowest conformer. For instance, simvastatin has 41 conformers in the gas phase within 4.0 kcal mol−1, five of which have Boltzmann population above 5%, while the lowest conformer accounts for only 17%. Since the typical model errors on the full set (MAE ∼2.0 kcal mol−1) exceed the magnitude of the changes, such improvements are within the typical error of the models and likely reflect error-compensation effects. Therefore, the use of a single conformer in scenarios with many conformers that are energetically close together can lead to small errors (∼0.7 kcal mol−1), while the results remain unchanged when no (near-)degenerate conformers are present.
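The difference between the full ensemble and the lowest-conformer approach can be made concrete with a short sketch. It assumes that the ensemble value is the population-weighted average of the per-conformer Gibbs energies at 298.15 K; the exact expression used in this work is defined elsewhere in the paper and may contain additional terms. Function names and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15            # temperature in K


def boltzmann_populations(gibbs_energies):
    """Boltzmann populations for per-conformer Gibbs energies in kcal/mol."""
    g_min = min(gibbs_energies)  # shift for numerical stability
    weights = [math.exp(-(g - g_min) / (R_KCAL * T)) for g in gibbs_energies]
    total = sum(weights)
    return [w / total for w in weights]


def ensemble_gibbs_energy(gibbs_energies):
    """Population-weighted average Gibbs energy (assumed ensemble definition)."""
    pops = boltzmann_populations(gibbs_energies)
    return sum(p * g for p, g in zip(pops, gibbs_energies))


def lowest_conformer_gibbs_energy(gibbs_energies):
    """Gibbs energy of the single lowest-lying conformer."""
    return min(gibbs_energies)


# Hypothetical relative conformer energies in kcal/mol.
near_degenerate = [0.0, 0.2, 0.4]
well_separated = [0.0, 4.0]
for energies in (near_degenerate, well_separated):
    diff = ensemble_gibbs_energy(energies) - lowest_conformer_gibbs_energy(energies)
    print(f"{diff:.3f}")
```

For the well-separated case, the two approaches agree to within about 0.01 kcal mol−1, whereas closely spaced conformers shift the ensemble value noticeably, in line with the discussion above.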

For the random conformer approach, the accuracy deteriorates on average by over 0.2 kcal mol−1 for solvation energies and 0.4 log units for partition ratios compared to the baseline. It is important to emphasize that our test still favors lower-energy conformers: the random conformer was drawn from the already optimized ensemble, which only contains conformers within the defined 4.0 kcal mol−1 window. This also restricts the random conformer selection to that window, and means that the degradation observed here is a lower bound to “real-world” calculation without conformer sampling. In a true one-geometry workflow, for example, optimizing a single arbitrary starting structure (e.g., output of a 1D-to-3D conversion) in each phase without prior conformer screening, larger deviations can occur, particularly for large and flexible molecules or systems with multiple low-energy conformers. These findings reinforce that a tautomer and conformational analysis step is recommended for accurate solvation and partition calculations, regardless of the underlying solvation model. In a recent study, lysergic acid diethylamide (LSD) was shown to exhibit several tautomeric forms differing by up to 12 kcal mol−1 relative to the commonly depicted tautomer (e.g., the SMILES-derived form); only a comprehensive QM tautomer search identified the dominant species and thus yielded accurate solvation free energies.200
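The reason why the random-conformer test is only a lower bound can be seen from a sketch of the selection step: the draw is restricted to conformers that already lie within the energy window. The code below is illustrative only; the function names, the seeded generator, and the example energies are assumptions.

```python
import random


def conformers_in_window(gibbs_energies, window=4.0):
    """Keep conformers within `window` kcal/mol of the lowest one."""
    g_min = min(gibbs_energies)
    return [g for g in gibbs_energies if g - g_min <= window]


def random_conformer_energy(gibbs_energies, window=4.0, rng=None):
    """Pick one conformer at random from the energy window (uniform draw)."""
    if rng is None:
        rng = random.Random()
    return rng.choice(conformers_in_window(gibbs_energies, window))


# Hypothetical relative Gibbs energies in kcal/mol; the last one lies outside the window.
energies = [0.0, 0.5, 3.9, 6.0]
rng = random.Random(0)  # fixed seed for reproducibility
print(random_conformer_energy(energies, rng=rng))
```

Because the conformer at 6.0 kcal mol−1 can never be drawn, the maximum error of this test is bounded by the window, whereas an unscreened starting structure has no such bound.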

4.2.2 Geometry. Beyond conformational effects, we explore how nuclear relaxation affects computed solvation energies and the agreement with experiment. Before discussing the respective approaches and their influence on the computed values, we will investigate the geometry change and general magnitude of the nuclear relaxation energy on the FlexiSol set when transferring a solute from gas to solution phase.

Firstly, we generally investigate the structural changes associated with bringing a solute into solution. On our set, we find final structure root mean square deviations (RMSD) between the solution and gas phase geometry from 0.0 to 4.9 Å, using the lowest-energy conformer in each phase as ranked by r2SCAN-3c with the SMD solvation model. This RMSD is linearly correlated with the number of atoms and the number of rotatable bonds in the molecule (SI, Fig. D1a and b). The nuclear relaxation contribution associated with the change in geometry upon solvation averages 0.6 kcal mol−1 on FlexiSol. Of those nuclear relaxation contributions, around 300 are larger than 1.0 kcal mol−1 and around 120 larger than 2.0 kcal mol−1 (SI, Fig. D1c). One example of a very high nuclear relaxation contribution with 13.0 kcal mol−1 is cytidine diphosphate (CDP), shown in Fig. 9. This large contribution results from the opening of intramolecular hydrogen bonds. This means that, for flexible and highly functionalized molecules, explicitly including the nuclear relaxation contribution is essential for a good description of solvation energies.


Fig. 9 The top structure is the gas phase geometry; the lower (shaded background) ones are the solution phase geometries. Vertical arrows give method-specific GN values (kcal mol−1). The horizontal arrow between solution phase structures shows ΔGN. (a) Nuclear relaxation for cytidine diphosphate (CDP) in octanol and water. (b) Error cancellation in ΔGN for partition-ratio calculations of cyclotetramethylene tetranitramine in trichloromethane (TCM) and toluene.

Additionally, we investigated the geometry effects by comparing two different approaches to our baseline (phase-specific geometries): (a), the gas phase geometry approach, using gas phase geometries for both phases, assuming unchanged geometry upon solvation, and (b), the solution phase approach, using only solution phase geometries, which also omits the nuclear-relaxation component. By systematically comparing these three protocols, we assess the sensitivity of the results to nuclear relaxation effects. Importantly, model-computed solvation energies depend explicitly on the supplied geometry. Thus, evaluating the same solvation model on a gas-phase versus a solution-phase optimized structure does not simply omit the nuclear relaxation contribution but also changes the computed solvation term itself, making the total result non-linear with respect to GN. The resulting change in RMSE compared to the baseline is reported in Table 3.
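The three geometry protocols can be contrasted in a short sketch. It assumes that the nuclear relaxation term GN is the gas-phase energy difference between the solution-phase and gas-phase geometries and that thermostatistical contributions are left out; these definitions, the class, and all numbers are assumptions made for illustration and are not taken from the paper's equations.

```python
from dataclasses import dataclass


@dataclass
class SoluteEnergies:
    """Energies in kcal/mol for one solute in one solvent (hypothetical container)."""
    e_gas_at_gas_geom: float     # gas-phase energy at the gas-phase geometry
    e_gas_at_sol_geom: float     # gas-phase energy at the solution-phase geometry
    dg_model_at_gas_geom: float  # solvation-model term on the gas-phase geometry
    dg_model_at_sol_geom: float  # solvation-model term on the solution-phase geometry


def nuclear_relaxation(s):
    """Assumed G_N: cost of distorting the gas-phase structure to the solution one."""
    return s.e_gas_at_sol_geom - s.e_gas_at_gas_geom


def dg_phase_specific(s):
    """Baseline: solvation term on the solution geometry plus nuclear relaxation."""
    return s.dg_model_at_sol_geom + nuclear_relaxation(s)


def dg_gas_geometry(s):
    """Protocol (a): gas-phase geometry in both phases, no relaxation term."""
    return s.dg_model_at_gas_geom


def dg_solution_geometry(s):
    """Protocol (b): solution-phase geometry in both phases, no relaxation term."""
    return s.dg_model_at_sol_geom


example = SoluteEnergies(0.0, 2.3, -12.0, -16.0)  # hypothetical values
print(dg_phase_specific(example), dg_gas_geometry(example), dg_solution_geometry(example))
```

The example shows why the protocols do not differ by GN alone: the solvation term itself changes between the two geometries, here by 4.0 kcal mol−1.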

Table 3 Change in RMSE (relative to the baseline) for solvation energies (kcal mol−1) and partition ratios (log units) upon using different geometry approaches. A negative change in RMSE indicates an improvement, a positive change a worsening. Gas ph. denotes using only the gas phase geometry; sol. ph. denotes using the solution phase geometry
Method         ΔsolvG Gas ph.   ΔsolvG Sol. ph.   log Kα/β Gas ph.
SMD            0.0              0.1               −0.1
openCOSMO-RS   −0.0             −0.1              0.4
COSMO-RS       −0.0             −0.0              0.1
D-COSMO-RS     0.0              0.1               0.4
ALPB           −0.1             0.2               −0.0
CPCM-X         −0.1             0.1               −0.1
ESE-PM7        0.1              −0.0              −0.1
DirectML       0.2              0.2               −0.3
CIGIN          −0.3             −0.3              −0.3
ESE-GB-DNN     −0.1             −0.0              −0.5


Compared to the prior section, no general trends are observed; only methods within a category show a similar tendency: QM-based methods benefit only partially from using solvent-specific geometries, and SQM- and ML-based methods give a more inhomogeneous picture, often improving noticeably (< −0.3 kcal mol−1 and < −0.3 log units) when gas-phase structures are used for computing solvation energies and partition ratios.

For the QM-based methods, solvation energies are overall unchanged, while partition ratios worsen by about 0.3 log units when using gas phase geometries only. This effect is most pronounced for openCOSMO-RS and D-COSMO-RS, which rely on phase-specific geometries (worsening by ∼0.4 log units), whereas SMD and COSMO-RS are only marginally affected (±0.1 kcal mol−1 and 0.1 log units). The largest deteriorations occur for larger, highly functionalized molecules, where nuclear relaxation contributions are substantial. For example, the solvation energy of penoxsulam in water worsens by 4.8 kcal mol−1 across the QM-based models. Of this, 2.3 kcal mol−1 is due to the neglected nuclear relaxation, and the remaining error stems from using an inappropriate geometry, which causes the solvation term itself to be poorly described.

For the SQM-based models, the trends are more heterogeneous: solvation energies improve with gas phase geometries and worsen with solution-phase geometries. Solvation energies and partition ratios are improved by around 0.1 kcal mol−1 and 0.1 log units, respectively, upon using the gas phase geometry; solvation energies are worsened by 0.1 kcal mol−1 when using solution phase geometries. Due to the generally larger errors in these models, the influence of the geometry used is small overall, even though the computed values change noticeably (on average by 0.3 kcal mol−1 and 0.3 log units for ΔsolvG and log Kα/β, respectively). For the machine learning models, the respective change is highly method dependent: computed solvation energies with DirectML worsen upon using either only gas or solution phase geometries, while CIGIN improves. Partition ratios are improved in all cases upon using the gas phase geometry. For the partition ratios, both SQM and ML-based methods improve by around 0.3 log units.

These trends are opposite to the QM-based trends, which we attribute to the parameterization strategy and model design: machine-learning models in particular are meant to predict the total solvation energy, including the nuclear contribution.123 This means that using phase-specific geometries for these models (i.e., including the geometry relaxation additionally) will double-count this effect, thus resulting in worse results. The same reasoning applies to the semiempirical models, which were for the most part also parameterized on gas phase geometries and thus already account for this effect implicitly.81,102,126 It is noteworthy that SMD was parameterized on gas phase structures only (MNSOL database), which may explain why SMD does not significantly benefit from using phase-specific geometries.43

In this context, we note that in order to obtain solution phase geometries efficiently (i.e., perform geometry optimizations), the analytical nuclear gradient for a method has to be available and implemented. This is unavailable for most of the tested methods: only CPCM, ddCPCM, SMD, D-COSMO-RS, and ALPB allow for geometry optimizations.

4.2.3 Electronic energy. As seen in the previous section, the nuclear contribution GN is often non-negligible, especially for larger, more complex solutes. Here, we test whether evaluating the underlying electronic energies at a lower- or higher-level method affects the results. Importantly, we used the same gas- and solution-phase geometries as in the previous sections – no new optimizations were performed. Only the single-point electronic energies were recomputed at different levels of theory to assess their impact on GN. Besides the baseline r2SCAN-3c, we test: (a) the efficient GFN2-xTB method, (b) its much improved successor g-xTB, and (c) a high-level hybrid DFT functional, ωB97M-V, combined with a large triple-ζ basis set, as our high-accuracy method. We selected these methods to span a reasonable accuracy-cost range for GN. GFN2-xTB and g-xTB provide very low-cost electronic energies, routinely used for screening and sampling purposes.127 r2SCAN-3c delivers accurate relative conformer energies and geometries for main-group organic systems at a moderate cost, with consistent performance across large benchmarks, making it a robust baseline for our set.152,201–203 ωB97M-V with a large augmented triple-ζ basis is a range-separated hybrid, chosen as our high-accuracy method because it has been shown to perform exceptionally well, often rivaling double-hybrid functionals.154,204 Table 4 shows the change in error relative to the baseline when substituting its electronic energy with these alternatives.
Table 4 Change in RMSE (relative to the baseline) for solvation energies (kcal mol−1) and partition ratios (log units) upon using different levels of theory for the underlying electronic energy to calculate GN. A negative change in RMSE indicates an improvement, a positive change a worsening
Method         ΔsolvG GFN2-xTB   ΔsolvG g-xTB   ΔsolvG ωB97M-V   log Kα/β GFN2-xTB   log Kα/β g-xTB   log Kα/β ωB97M-V
SMD            0.0               0.0            −0.1             0.1                 −0.0             0.1
openCOSMO-RS   −0.0              0.0            −0.0             0.1                 0.1              0.0
COSMO-RS       0.3               0.2            0.0              0.0                 0.0              0.0
D-COSMO-RS     0.0               0.1            0.0              0.1                 0.0              −0.0
ALPB           −0.0              −0.0           −0.1             0.1                 0.0              −0.1
CPCM-X         −0.1              −0.1           −0.1             0.1                 0.0              0.0
ESE-PM7        0.1               0.1            0.1              0.1                 0.1              −0.0
DirectML       −0.1              −0.1           −0.1             0.1                 0.1              0.1
CIGIN          0.3               0.3            −0.1             −0.0                0.0              −0.1
ESE-GB-DNN     0.1               −0.0           −0.1             −0.5                −0.3             0.0


We generally find only small changes in the RMSE for solvation energies and partition ratios (≤0.1 kcal mol−1, ≤0.1 log units), with partition ratios being less affected. QM-based methods tend to slightly worsen at lower levels of theory, while some ML models show a more variable result and small apparent improvements – likely due to error-cancellation effects. The smaller changes observed for partition ratios arise from cancellation of errors in GN between the two individual phases. This is observed for, e.g., cyclotetramethylene tetranitramine (shown in Fig. 9b), where the method-specific error is present for both phases (chloroform and toluene), canceling perfectly for the ΔGN between the phases and thus in the computation of the partition ratio.

Moving from r2SCAN-3c to the much more expensive ωB97M-V produces negligible improvements. In most cases, the changes in computed values are well below 0.5 kcal mol−1, with only 15 cases changing by more than 1 kcal mol−1. The most notable improvements are found for the resulting solvation energy of difethialone in water or that of bensulide in octanol, which both improve on average by 1.5 kcal mol−1 across the tested models. Other well-performing hybrids such as PBE0 (ref. 147 and 205) or B3LYP,206,207 combined with D4 or MBD,208,209 perform similarly well on various benchmark sets61,210 and are expected to yield results comparable to r2SCAN-3c and ωB97M-V on this set. While GFN2-xTB yields slightly higher overall errors, this degradation is driven by only a small number of structures. Notable cases include fructose, penoxsulam, and 15-crown-5 (shown in SI, Fig. C3), which worsen by about 3.0, 5.0, and 8.0 kcal mol−1, respectively, compared to r2SCAN-3c. Such trends mirror errors observed in relative conformational-energy benchmarks127,146 like the GLUCOSE205 (ref. 211) or UPU46 (ref. 212) sets. Using the much improved g-xTB method already eliminates most issues seen with GFN2-xTB and shows good performance for the computation of GN.

4.3 Timings

To give practical context, we report wall-clock timings for the approaches described above (see Sec. 4.2.1). The Gibbs energy of diclosulam in water was chosen as a representative test case: diclosulam's molecular size, conformational flexibility, and the measured timings for the steps closely match the medians across FlexiSol (distributions shown in SI, Sec. C.6.1). Timings are shown in Fig. 10. Computations were done on 48 cores of an Intel Xeon Platinum 8468 (Sapphire Rapids) CPU. To obtain a final solvation energy, the same workflow must be run twice, i.e., once per phase (for the gas phase run no solvation model calculation is required). Detailed timings for the solvation models and their corresponding underlying method can be found in SI, Sec. C.6.2.
Fig. 10 Timings (wall-clock) in minutes to obtain a single Gibbs energy for diclosulam in water on 48 cores of an Intel Xeon Platinum 8468 CPU. Bars decompose total cost into conformer search (blue), screening (yellow), geometry optimization (gray) and final solvation evaluation (red); panels a–d compare the full ensemble (a), lowest-conformer (b), random-conformer (c) and screening-only protocols (d), respectively (annotated nconv values show retained conformers). The inset shows the six final optimized conformers of (a).

Overall, the full baseline (a) approach takes around 1 h. The initial conformer search takes around 9 min (∼15%) and yields 76 initial conformers. The subsequent screening takes 6 min (∼10%) and reduces the number of relevant conformers to 20. The most expensive step, the optimization, takes around 42 min (∼74%) and yields 6 final conformers within the specified 4 kcal mol−1 window. The final solvation model evaluation (COSMO-RS) takes less than one minute (<1%), making it a negligible contribution. Using only the lowest-energy conformer per phase dramatically reduces optimization cost, because only one geometry optimization is required per phase. The actual speedup depends on how many conformers survive the screening step (and thus how many optimizations would otherwise be needed); for diclosulam this delivers roughly a 5× reduction in wall time for the solvation energy, while for very flexible cases (e.g., ledipasvir, ∼935 initial conformers) the savings can reach orders of magnitude. Accelerated variants of the CENSO workflow have been proposed for such reduced-cost protocols and can be adopted where appropriate.213,214 Skipping ensemble generation entirely (c), thereby simulating the common one-geometry practice by selecting a single random conformer, moderately reduces the computational time. While this eliminates the ensemble-generation and screening steps, it still requires at least one full geometry optimization per phase, so wall-time savings are limited compared to (b). For rapid, low-cost screening, it may be useful to combine the random-conformer approach with a cheaper optimization method such as GFN2-xTB or g-xTB – forming a possible efficient screening protocol (d). This strategy can reduce the computational cost of solvation energies and partition ratios by orders of magnitude compared to the full ensemble approach (a). However, we did not systematically test or benchmark such an SQM/ML-based screening workflow.

5 Conclusion and outlook

In this work, we present FlexiSol, a comprehensive solvation model benchmark set with experimental reference data, consisting of 824 data points and 1551 unique molecule–solvent pairs. With the inclusion of the most relevant conformers and tautomers for each respective phase, this totals over 25 000 geometries. The set focuses on drug-like molecules with between 30 and 80 atoms, with a mean of 42 atoms and a largest molecule of 141 atoms, far surpassing common sets in that regard. Additionally, focus is placed on relevant and more difficult structures that are underrepresented in existing benchmarks: larger, flexible, and polyfunctional molecules. The benchmark, including all geometries and energies, is publicly available. In addition to creating the benchmark itself, we test a wide range of popular solvation models and approaches and attempt to isolate the effects of ensemble sampling, geometry choice, and underlying electronic energy.

5.1 Overall findings

Errors on FlexiSol exceed those previously seen on MNSOL or FreeSolv, reflecting the larger and more complex nature of the solutes and the wider range of ΔsolvG and log Kα/β values. The QM-based models (COSMO-RS, SMD) show the most consistent results, with the best performing MAE of 2.0 kcal mol−1 for ΔsolvG and 1.0 log units for log Kα/β. SQM-based methods perform slightly worse, with an MAE of 2.8 kcal mol−1 and 1.7 log units, respectively. The best performing ML method (DirectML) nearly matches the QM methods for solvation energies (MAE of 2.2 kcal mol−1) and yields the best result for partition ratios with an MAE of 0.7 log units. Most models systematically overestimate weak stabilization and underestimate strong stabilization, particularly for heteroatom-rich/H-bonding motifs, and errors increase in less polar solvents (octanol vs. water by around 0.3 kcal mol−1 on average).
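To put the two error scales quoted here side by side, it helps to convert between free energies and log units. The following is a minimal sketch, assuming T = 298.15 K and that both solvation free energies refer to the same standard state. The function names and the sign convention (positive values mean the solute favors the destination phase) are illustrative assumptions and are not taken from the FlexiSol tooling.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def kcal_per_log_unit(temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to one log10 unit: RT ln(10)."""
    return R_KCAL * temperature_k * math.log(10.0)


def log_partition_ratio(dg_solv_from: float, dg_solv_into: float,
                        temperature_k: float = 298.15) -> float:
    """log10 partition ratio for transfer between two phases.

    dg_solv_from / dg_solv_into are solvation free energies (kcal/mol) in the
    source and destination phase. Assumed convention: a positive result means
    the solute prefers the destination phase.
    """
    return (dg_solv_from - dg_solv_into) / kcal_per_log_unit(temperature_k)


if __name__ == "__main__":
    # Express a free-energy MAE of 2.0 kcal/mol on the log-unit scale.
    print(2.0 / kcal_per_log_unit())
```

Under these assumptions one log unit corresponds to about 1.36 kcal mol−1, so an MAE of 1.0 log units is equivalent to roughly 1.4 kcal mol−1 in the free-energy difference between the two phases.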

5.2 Ensemble

Using the phase-specific lowest-energy conformers reproduces the result obtained with the full Boltzmann ensemble with minor deviations, the largest changes being ∼±0.7 kcal mol−1 for solvation energies. Choosing a single random conformer per phase significantly degrades accuracy, by > 0.3 kcal mol−1 (ΔsolvG) and > 0.3 log units (log Kα/β) on average, showing that the conformational analysis step is essential. A simplified protocol using only the lowest conformer in each phase can reduce costs at minimal accuracy loss.
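The difference between the two ensemble treatments can be illustrated with a short sketch. It assumes that the ensemble value in each phase is the Boltzmann-weighted average of the conformer Gibbs energies at 298.15 K; the exact weighting used to build the benchmark is not restated in this section, so this choice, the function names, and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(energies, temperature_k=298.15):
    """Normalized Boltzmann populations for conformer energies in kcal/mol."""
    rt = R_KCAL * temperature_k
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(energies, temperature_k=298.15):
    """Boltzmann-weighted average energy of a conformer ensemble."""
    weights = boltzmann_weights(energies, temperature_k)
    return sum(w * e for w, e in zip(weights, energies))


def solvation_energy(gas, solution, mode="ensemble", temperature_k=298.15):
    """Solvation energy from per-phase conformer energies (kcal/mol).

    mode="ensemble": difference of Boltzmann-weighted averages (assumed scheme).
    mode="lowest":   difference of the lowest conformer in each phase.
    """
    if mode == "ensemble":
        return (ensemble_average(solution, temperature_k)
                - ensemble_average(gas, temperature_k))
    if mode == "lowest":
        return min(solution) - min(gas)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    gas = [0.0, 0.5, 2.0]           # hypothetical gas-phase conformer energies
    solution = [-10.0, -9.8, -7.0]  # hypothetical solution-phase energies
    print(solvation_energy(gas, solution, "ensemble"))
    print(solvation_energy(gas, solution, "lowest"))
```

Both modes need the conformer lists for each phase as input, which is why conformational sampling remains necessary even for the lowest-conformer protocol.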

5.3 Geometry (nuclear relaxation)

Nuclear relaxation contributions GN average ∼0.6 kcal mol−1 on our set (with > 2.0 kcal mol−1 in flexible/H-bonding cases). However, this does not directly translate into shifts in ΔsolvG, because of solvation-model uncertainties and different model parameterization choices. For some QM-based methods (openCOSMO-RS and D-COSMO-RS), using non-phase-specific geometries consistently worsens the result, e.g., for partition ratios by ∼0.3 log units, whereas SMD and COSMO-RS are hardly affected. SQM/ML-based models show an improvement when only gas phase geometries are used, because GN is partly absorbed into the empirical parameterization. Omitting the nuclear relaxation contributions by only using gas phase structures approximately halves the computational cost.
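The role of GN can be made concrete with a small sketch. It assumes a common decomposition in which GN is the gas-phase electronic energy penalty for moving from the gas-phase geometry to the solution-phase geometry, added to the solvation-model contribution evaluated at the solution-phase geometry. This decomposition, the function names, and the numbers are assumptions for illustration and may differ in detail from the definitions used earlier in the paper.

```python
def nuclear_relaxation(e_gas_at_gas_geom: float,
                       e_gas_at_solution_geom: float) -> float:
    """Assumed G_N: gas-phase energy at the solution geometry minus that at the gas geometry."""
    return e_gas_at_solution_geom - e_gas_at_gas_geom


def solvation_energy(e_gas_at_gas_geom: float,
                     e_gas_at_solution_geom: float,
                     dg_model_at_solution_geom: float) -> float:
    """Phase-specific solvation energy: G_N plus the solvation-model term (kcal/mol)."""
    g_n = nuclear_relaxation(e_gas_at_gas_geom, e_gas_at_solution_geom)
    return g_n + dg_model_at_solution_geom


def solvation_energy_gas_only(dg_model_at_gas_geom: float) -> float:
    """Single-geometry shortcut: G_N is zero by construction, only the model term remains."""
    return dg_model_at_gas_geom


if __name__ == "__main__":
    # Hypothetical values in kcal/mol, relative to the gas-phase minimum.
    print(solvation_energy(0.0, 0.6, -8.5))
    print(solvation_energy_gas_only(-7.9))
```

In the gas-only variant no solution-phase optimization is needed, which is consistent with the roughly halved cost noted in this subsection.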

5.4 Electronic energy

Changing the level of the underlying electronic structure method for the description of the nuclear relaxation changes results only slightly (typically < 1.0 kcal mol−1); partition ratios are even less sensitive due to possible cancellation of GN across phases. Upgrading from r2SCAN-3c to the much more costly ωB97M-V/aTZ method produces little practical gain, while fast tight-binding approaches (notably g-xTB) typically provide an adequate and computationally efficient description of GN.

5.5 Recommended protocols

In line with our findings, we recommend two practical protocols. (i) Physics-based, high-rigor approach: employ a QM-based solvation model such as COSMO-RS or SMD together with conformer/tautomer sampling and phase-specific geometries. Whether the full Boltzmann-weighted ensemble or just the lowest-energy conformer per phase is used makes little difference to the final accuracy, but conformer sampling is essential. Such a protocol, combined with r2SCAN-3c single-point energies for the electronic component, yields robust accuracy on FlexiSol (MAE 2.0 kcal mol−1 for ΔsolvG; 1.0 log units for log Kα/β) and is preferred for heteroatom-rich/H-bonding systems or other cases expected to be complicated. (ii) Fast semiempirical or ML-based screening: apply a modern ML- or SQM-based solvation model (DirectML or CPCM-X) together with a single gas phase geometry. This yields MAEs of 2.2 to 2.8 kcal mol−1 for ΔsolvG and 0.7 to 1.8 log units for log Kα/β at a fraction of the cost, especially when a lower-cost method is used for the optimization.

5.6 Outlook and future directions

Finally, we outline options for extending the scope and utility of future solvation benchmark sets. An important step is the extension beyond neutral drug-like molecules to inorganic and transition metal compounds,165,215 proteins and macromolecular fragments,216 and ions.188,217,218 Complementing this expansion of chemical space, broader solvent coverage and the inclusion of additional properties such as acid dissociation constants,68 solubilities,219 and vapor pressures220 would be very beneficial. Consistency-focused curation, i.e., thermodynamic-cycle checks, cross-source reconciliation, and uncertainty propagation, can improve the reliability of experimental references.221–223 In addition to improvements to the set itself, an extension to explicit solvation approaches alongside implicit models, using classical force fields like GAFF224 and GFN-FF225,226 or ML interatomic potentials like UMA,227 SO3LR,228 or PaiNN,229 could give additional insights into the computational modeling of solvation.

Author contributions

Conceptualization: LW, SG; data curation: LW, CES; formal analysis: LW, CES; funding acquisition: LW, SG; investigation: LW, CES; methodology: LW, CES; project administration: LW, SG; resources: SG; software: LW; supervision: LW, SG; validation: LW, CES; visualization: LW, CES; writing (original draft): LW, CES; writing (review) & editing: LW, CES, SG.

Conflicts of interest

There are no conflicts of interest to disclose.

Data availability

All raw data for FlexiSol are provided as supporting information (SI) with this work, including geometries, all computed energies for all conformers/tautomers, computed solvation energies and partition ratios for each model and approach, and experimental reference values with their bibliographic sources. These data, together with a tool for easy evaluation of methods on FlexiSol, can also be found on GitHub: https://github.com/grimme-lab/flexisol. The supplementary information includes a PDF with all supplementary figures (systematic-error trends, solvent-specific performance), extended tables, and additional discussion. See DOI: https://doi.org/10.1039/d5sc06406f.

Acknowledgements

L. W. gratefully acknowledges the support of the Stiftung Stipendien-Fonds des Verbandes der Chemischen Industrie e. V. through its Kekulé Fellowship. The authors gratefully acknowledge the granted access to the Marvin cluster hosted by the University of Bonn and the help and support of the HPC/A Lab. The authors acknowledge Dr Uwe Huniar for providing parameter files for D-COSMO-RS. The authors thank Thomas Gasevic, Tim K. Schramm, and Christoph Plett for fruitful discussions. Thomas Gasevic and Tim K. Schramm are additionally acknowledged for proofreading our manuscript.

Notes and references

  1. R.-M. Dannenfelser and S. H. Yalkowsky, Sci. Total Environ., 1991, 109–110, 625–628.
  2. P. Muller, Pure Appl. Chem., 1994, 66, 1077–1184.
  3. G. Dyrda, E. Boniewska-Bernacka, D. Man, K. Barchiewicz and R. Słota, Mol. Biol. Rep., 2019, 46, 3225–3232.
  4. T. Gholami, H. Seifi, E. A. Dawi, M. Pirsaheb, S. Seifi, A. M. Aljeboree, A.-H. M. Hamoody, U. S. Altimari, M. Ahmed Abass and M. Salavati-Niasari, Mater. Sci. Eng. B, 2024, 304, 117370.
  5. W. Thiel, Angew. Chem., Int. Ed., 2011, 50, 9216–9217.
  6. M. Barbatti, Pure Appl. Chem., 2025, 97(9), 1115–1134.
  7. M. Poliakoff, J. M. Fitzpatrick, T. R. Farren and P. T. Anastas, Science, 2002, 297, 807–810.
  8. M. G. Quesne, F. Silveri, N. H. De Leeuw and C. R. A. Catlow, Front. Chem., 2019, 7, 182.
  9. G. N. Simm, A. C. Vaucher and M. Reiher, J. Phys. Chem. A, 2019, 123, 385–399.
  10. O. Engkvist, P.-O. Norrby, N. Selmi, Y.-h. Lam, Z. Peng, E. C. Sherer, W. Amberg, T. Erhard and L. A. Smyth, Drug Discovery Today, 2018, 23, 1203–1218.
  11. C. M. A. Eichler and J. C. Little, Environ. Sci.: Processes Impacts, 2020, 22, 500–511.
  12. B. B. de Souza and J. Meegoda, Sci. Total Environ., 2024, 926, 171738.
  13. E. Panieri, K. Baralic, D. Djukic-Cosic, A. Buha Djordjevic and L. Saso, Toxics, 2022, 10, 44.
  14. F. Haque and C. Fan, iScience, 2023, 26, 107649.
  15. D. S. Aga, F. Samara, L. Dronjak, S. Kanan, M. M. Mortula and L. Vahapoglu, ACS ES&T Water, 2024, 4, 2785–2788.
  16. V. Hatje, M. Sarin, S. G. Sander, D. Omanović, P. Ramachandran, C. Völker, R. O. Barra and A. Tagliabue, Front. Mar. Sci., 2022, 9, 936109.
  17. T. Salthammer, J. Zhao, A. Schieweck, E. Uhde, T. Hussein, F. Antretter, H. Künzel, M. Pazold, J. Radon and W. Birmili, Indoor Air, 2022, 32, e13039.
  18. S. Endo, J. Hammer and S. Matsuzawa, Environ. Sci. Technol., 2023, 57, 8406–8413.
  19. I. Abusallout, C. Holton, J. Wang and D. Hanigan, J. Hazard. Mater. Lett., 2022, 3, 100070.
  20. M. Mudlaff, A. Sosnowska, L. Gorb, N. Bulawska, K. Jagiello and T. Puzyn, Environ. Int., 2024, 185, 108568.
  21. Q. Xiang, G. Shan, W. Wu, H. Jin and L. Zhu, Environ. Pollut., 2018, 242, 1283–1290.
  22. Test No. 117: Partition Coefficient (n-Octanol/Water), HPLC Method, OECD, https://www.oecd.org/en/publications/test-no-117-partition-coefficient-n-octanol-water-hplc-method_9789264069824-en.html.
  23. Y. Li, H. Zhang and Q. Liu, Spectrochim. Acta, Part A, 2012, 86, 51–55.
  24. A. Allerhand and P. v. R. Schleyer, J. Am. Chem. Soc., 1963, 85, 371–380.
  25. A. D. Buckingham, T. Schaefer and W. G. Schneider, J. Chem. Phys., 1960, 32, 1227–1233.
  26. P. Laszlo, Prog. Nucl. Magn. Reson. Spectrosc., 1967, 3, 231–402.
  27. E. Jonas, S. Kuhn and N. Schlörer, Magn. Reson. Chem., 2022, 60, 1021–1031.
  28. A. García Alejo, N. De Silva, Y. Liu, T. L. Windus and M. Pérez García, Solvent Extr. Ion Exch., 2023, 41, 241–251.
  29. Y. Y. Rusakov, Y. A. Nikurashina and I. L. Rusakova, J. Chem. Phys., 2024, 160, 084109.
  30. E. Antoniou, C. F. Buitrago, M. Tsianou and P. Alexandridis, Carbohydr. Polym., 2010, 79, 380–390.
  31. K.-J. Liu and J. L. Parsons, Macromolecules, 1969, 2, 529–533.
  32. N. D. Kambaine, D. M. Shadrack and S. A. Vuai, J. Mol. Liq., 2022, 345, 117794.
  33. S. Wan, R. H. Stote and M. Karplus, J. Chem. Phys., 2004, 121, 9539–9548.
  34. W. L. Jorgensen, J. F. Blake and J. Buckner, Chem. Phys., 1989, 129, 193–200.
  35. T. P. Straatsma and H. J. C. Berendsen, J. Chem. Phys., 1988, 89, 5876–5886.
  36. G. Duarte Ramos Matos, D. Y. Kyu, H. H. Loeffler, J. D. Chodera, M. R. Shirts and D. L. Mobley, J. Chem. Eng. Data, 2017, 62, 1559–1569.
  37. T. Kloss, J. Heil and S. M. Kast, J. Phys. Chem. B, 2008, 112, 4337–4343.
  38. K. Imamura, D. Yokogawa and H. Sato, J. Chem. Phys., 2024, 160, 050901.
  39. A. V. Marenich, C. J. Cramer and D. G. Truhlar, J. Chem. Theory Comput., 2008, 4, 877–887.
  40. S. Decherchi, M. Masetti, I. Vyalov and W. Rocchia, Eur. J. Med. Chem., 2015, 27–42.
  41. J. M. Herbert, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2021, 11, e1519.
  42. S. Miertuš, E. Scrocco and J. Tomasi, Chem. Phys., 1981, 55, 117–129.
  43. A. V. Marenich, C. J. Cramer and D. G. Truhlar, J. Phys. Chem. B, 2009, 113, 6378–6396.
  44. A. Klamt, J. Phys. Chem., 1995, 99, 2224–2235.
  45. A. Klamt, V. Jonas, T. Bürger and J. C. W. Lohrenz, J. Phys. Chem. A, 1998, 102, 5074–5085.
  46. J. R. Pliego, J. Phys. Chem. A, 2024, 128, 6440–6449.
  47. S. C. L. Kamerlin, M. Haranczyk and A. Warshel, ChemPhysChem, 2009, 10, 1125–1134.
  48. J. R. Pliego and J. M. Riveros, J. Phys. Chem. A, 2001, 105, 7241–7247.
  49. S. A. Katsyuba, S. Spicher, T. P. Gerasimova and S. Grimme, J. Phys. Chem. B, 2020, 124, 6664–6670.
  50. S. N. Steinmann, P. Sautet and C. Michel, Phys. Chem. Chem. Phys., 2016, 18, 31850–31861.
  51. E. Tang, D. Di Tommaso and N. H. De Leeuw, Phys. Chem. Chem. Phys., 2010, 12, 13804.
  52. M. Steiner, T. Holzknecht, M. Schauperl and M. Podewitz, Molecules, 2021, 26, 1793.
  53. P. L. Türtscher and M. Reiher, J. Chem. Theory Comput., 2025, 21, 5571–5587.
  54. A. Fredenslund, R. L. Jones and J. M. Prausnitz, AIChE J., 1975, 21, 1086–1099.
  55. K. Mansouri, C. M. Grulke, R. S. Judson and A. J. Williams, J. Cheminf., 2018, 10, 10.
  56. P. Katzberger, F. Pultar and S. Riniker, J. Chem. Theory Comput., 2025, 21, 7450–7459.
  57. O. D. Abarbanel and G. R. Hutchison, J. Chem. Theory Comput., 2024, 20, 6946–6956.
  58. D. H. Kenney, R. C. Paffenroth, M. T. Timko and A. R. Teixeira, J. Cheminf., 2023, 15, 9.
  59. L. Attia, J. W. Burns, P. S. Doyle and W. H. Green, Nat. Commun., 2025, 16, 7497.
  60. K. Mansouri, C. M. Grulke, A. M. Richard, R. S. Judson and A. J. Williams, SAR QSAR Environ. Res., 2016, 27, 911–937.
  61. L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, A. Najibi and S. Grimme, Phys. Chem. Chem. Phys., 2017, 19, 32184–32215.
  62. J. Řezáč, J. Chem. Theory Comput., 2020, 16, 6305–6316.
  63. N. Mardirossian and M. Head-Gordon, Mol. Phys., 2017, 115, 2315–2372.
  64. A. V. Marenich, C. P. Kelly, J. D. Thompson, G. D. Hawkins, C. C. Chambers, D. J. Giesen, P. Winget, C. J. Cramer and D. G. Truhlar, Minnesota Solvation Database (MNSOL) Version 2012, 2020.
  65. D. L. Mobley and J. P. Guthrie, J. Comput.-Aided Mol. Des., 2014, 28, 711–720.
  66. C. Plett, M. Stahn, M. Bursch, J.-M. Mewes and S. Grimme, J. Phys. Chem. Lett., 2024, 15, 2462–2469.
  67. C. Hille, S. Ringe, M. Deimel, C. Kunkel, W. E. Acree, K. Reuter and H. Oberhofer, Solv@TUM v 1.0, 2018, https://mediatum.ub.tum.de/1452571?v=1.
  68. J. Zheng and L.-J. Olivier, IUPAC/Dissociation-Constants: v2.3b, 2025, https://zenodo.org/records/15375522.
  69. X. Ying, J. Phys.: Conf. Ser., 2019, 1168, 022022.
  70. D. M. Hawkins, J. Chem. Inf. Comput. Sci., 2004, 44, 1–12.
  71. R. Sander, W. E. Acree, A. De Visscher, S. E. Schwartz and T. J. Wallington, Pure Appl. Chem., 2022, 94, 71–85.
  72. H. Cumming and C. Rücker, ACS Omega, 2017, 2, 6244–6249.
  73. J. De Bruijn, F. Busser, W. Seinen and J. Hermens, Environ. Toxicol. Chem., 1989, 8, 499–512.
  74. B. McDuffie, Chemosphere, 1981, 10, 73–83.
  75. H. Devoe, M. Miller and S. Wasik, J. Res. Natl. Bur. Stand., 1981, 86, 361.
  76. J. Staudinger and P. V. Roberts, Crit. Rev. Environ. Sci. Technol., 1996, 26, 205–297.
  77. S.-H. Lee, S. Mukherjee, B. Brewer, R. Ryan, H. Yu and M. Gangoda, J. Chem. Educ., 2013, 90, 495–499.
  78. J. Kames, S. Schweighoefer and U. Schurath, J. Atmos. Chem., 1991, 12, 169–180.
  79. H. S. S. Ip, X. H. H. Huang and J. Z. Yu, Geophys. Res. Lett., 2009, 36, 2008GL036212.
  80. M. Bursch, J.-M. Mewes, A. Hansen and S. Grimme, Angew. Chem., Int. Ed., 2022, 61, e202205735.
  81. S. Ehlert, M. Stahn, S. Spicher and S. Grimme, J. Chem. Theory Comput., 2021, 17, 4250–4261.
  82. S. Grimme, F. Bohle, A. Hansen, P. Pracht, S. Spicher and M. Stahn, J. Phys. Chem. A, 2021, 125, 4039–4054.
  83. Y. C. Martin, J. Comput.-Aided Mol. Des., 2009, 23, 693–704.
  84. D. Suárez and N. Díaz, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2015, 5, 1–26.
  85. J. Gorges, S. Grimme, A. Hansen and P. Pracht, Phys. Chem. Chem. Phys., 2022, 24, 12249–12259.
  86. IUPAC – Solvation Energy (ST07102), https://goldbook.iupac.org/terms/view/ST07102.
  87. E. Cancès, B. Mennucci and J. Tomasi, J. Chem. Phys., 1997, 107, 3032–3041.
  88. A. Klamt and G. Schüürmann, J. Chem. Soc., Perkin Trans. 2, 1993, 799–805.
  89. V. Barone and M. Cossi, J. Phys. Chem. A, 1998, 102, 1995–2001.
  90. A. V. Marenich, C. J. Cramer and D. G. Truhlar, J. Chem. Theory Comput., 2013, 9, 609–620.
  91. S. Miertuš and J. Tomasi, Chem. Phys., 1982, 65, 239–245.
  92. J. Tomasi and M. Persico, Chem. Rev., 1994, 94, 2027–2094.
  93. C. Colominas, F. J. Luque, J. Teixidó and M. Orozco, Chem. Phys., 1999, 240, 253–264.
  94. R. D. Cunha, S. Romero-Téllez, F. Lipparini, F. J. Luque and C. Curutchet, J. Comput. Chem., 2025, 46, e70027.
  95. A. Klamt, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2018, 8, e1338.
  96. A. Pomogaeva, D. W. Thompson and D. M. Chipman, Chem. Phys. Lett., 2011, 511, 161–165.
  97. A. Pomogaeva and D. M. Chipman, J. Chem. Theory Comput., 2011, 7, 3952–3960.
  98. A. Pomogaeva and D. M. Chipman, J. Phys. Chem. A, 2013, 117, 5812–5820.
  99. Z.-Q. You and J. M. Herbert, J. Chem. Theory Comput., 2016, 12, 4338–4346.
  100. R. Sundararaman, K. A. Schwarz, K. Letchworth-Weaver and T. A. Arias, J. Chem. Phys., 2015, 142, 054102.
  101. R. Sundararaman and W. A. Goddard III, J. Chem. Phys., 2015, 142, 064107.
  102. M. Stahn, S. Ehlert and S. Grimme, J. Phys. Chem. A, 2023, 127, 7036–7043.
  103. S. F. Vyboishchikov and A. A. Voityuk, J. Comput. Chem., 2021, 42, 1184–1194.
  104. O. J. Conquest, T. Roman, A. Marianov, A. Kochubei, Y. Jiang and C. Stampfl, J. Chem. Theory Comput., 2021, 17, 7753–7771.
  105. D. L. Mobley, K. A. Dill and J. D. Chodera, J. Phys. Chem. B, 2008, 112, 938–946.
  106. V. Radtke, D. Stoica, I. Leito, F. Camões, I. Krossing, B. Anes, M. Roziková, L. Deleebeeck, S. Veltzé, T. Näykki, F. Bastkowski, A. Heering, N. Dániel, R. Quendera, L. Liv, E. Uysal and N. Lawrence, Pure Appl. Chem., 2021, 93, 1049–1060.
  107. GitHub - hoelzerC/Smi2xyz: Conversion of SMILES to Xyz, https://github.com/hoelzerC/smi2xyz.
  108. P. Pracht, C. A. Bauer and S. Grimme, J. Comput. Chem., 2017, 38, 2618–2631.
  109. P. Pracht, S. Grimme, C. Bannwarth, F. Bohle, S. Ehlert, G. Feldmann, J. Gorges, M. Müller, T. Neudecker, C. Plett, S. Spicher, P. Steinbach, P. A. Wesołowski and F. Zeller, J. Chem. Phys., 2024, 160, 114110.
  110. N. Van Staalduinen and C. Bannwarth, Digital Discovery, 2024, 3, 2298–2319.
  111. B. De Souza, Angew. Chem., Int. Ed., 2025, 64, e202500393.
  112. F. Neese, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2025, 15, e70019.
  113. M. Garcia-Ratés and F. Neese, J. Comput. Chem., 2020, 41, 922–939.
  114. A. Kovács, Symmetry, 2024, 16, 1668.
  115. T. Gerlach, S. Müller, A. G. De Castilla and I. Smirnova, Fluid Phase Equilib., 2022, 560, 113472.
  116. S. Sinnecker, A. Rajendran, A. Klamt, M. Diedenhofen and F. Neese, J. Phys. Chem. A, 2006, 110, 2235–2245.
  117. M. Renz, M. Kess, M. Diedenhofen, A. Klamt and M. Kaupp, J. Chem. Theory Comput., 2012, 8, 4189–4203.
  118. A. Klamt and M. Diedenhofen, J. Phys. Chem. A, 2015, 119, 5439–5445.
  119. M. Nottoli, R. Nifosì, B. Mennucci and F. Lipparini, J. Chem. Theory Comput., 2021, 17, 5661–5672.
  120. F. Lipparini, G. Scalmani, L. Lagardère, B. Stamm, E. Cancès, Y. Maday, J.-P. Piquemal, M. J. Frisch and B. Mennucci, J. Chem. Phys., 2014, 141, 184108.
  121. Y. Minenkov, J. Chem. Theory Comput., 2023, 19, 5221–5230.
  122. S. F. Vyboishchikov and A. A. Voityuk, J. Chem. Inf. Model., 2021, 61, 4544–4553.
  123. Y. Chung, F. H. Vermeire, H. Wu, P. J. Walker, M. H. Abraham and W. H. Green, J. Chem. Inf. Model., 2022, 62, 433–446.
  124. Y. Pathak, S. Mehta and U. D. Priyakumar, J. Chem. Inf. Model., 2021, 61, 689–698.
  125. S. F. Vyboishchikov, J. Chem. Inf. Model., 2023, 63, 6283–6292.
  126. S. F. Vyboishchikov, J. Comput. Chem., 2025, 46, e70104.
  127. T. Froitzheim, M. Müller, A. Hansen and S. Grimme, G-xTB: A General-Purpose Extended Tight-Binding Electronic Structure Method For the Elements H to Lr (Z=1–103), 2025.
  128. GitHub – Tnbrowncontam/Ifsqsar, https://github.com/tnbrowncontam/ifsqsar.
  129. A. Pedretti, A. Mazzolari, S. Gervasoni, L. Fumagalli and G. Vistoli, Bioinformatics, 2021, 37, 1174–1175.
  130. R. Ahlrichs, M. Bär, M. Häser, H. Horn and C. Kölmel, Chem. Phys. Lett., 1989, 162, 165–169.
  131. S. G. Balasubramani, G. P. Chen, S. Coriani, M. Diedenhofen, M. S. Frank, Y. J. Franzke, F. Furche, R. Grotjahn, M. E. Harding, C. Hättig, A. Hellweg, B. Helmich-Paris, C. Holzer, U. Huniar, M. Kaupp, A. Marefat Khah, S. Karbalaei Khani, T. Müller, F. Mack, B. D. Nguyen, S. M. Parker, E. Perlt, D. Rappoport, K. Reiter, S. Roy, M. Rückert, G. Schmitz, M. Sierka, E. Tapavicza, D. P. Tew, C. Van Wüllen, V. K. Voora, F. Weigend, A. Wodyński and J. M. Yu, J. Chem. Phys., 2020, 152, 184107.
  132. Y. J. Franzke, C. Holzer, J. H. Andersen, T. Begušić, F. Bruder, S. Coriani, F. Della Sala, E. Fabiano, D. A. Fedotov, S. Fürst, S. Gillhuber, R. Grotjahn, M. Kaupp, M. Kehry, M. Krstić, F. Mack, S. Majumdar, B. D. Nguyen, S. M. Parker, F. Pauly, A. Pausch, E. Perlt, G. S. Phun, A. Rajabi, D. Rappoport, B. Samal, T. Schrader, M. Sharma, E. Tapavicza, R. S. Treß, V. Voora, A. Wodyński, J. M. Yu, B. Zerulla, F. Furche, C. Hättig, M. Sierka, D. P. Tew and F. Weigend, J. Chem. Theory Comput., 2023, 19, 6859–6890.
  133. F. Neese, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2012, 2, 73–78.
  134. F. Neese, F. Wennmohs, U. Becker and C. Riplinger, J. Chem. Phys., 2020, 152, 224108.
  135. S. Lehtola, C. Steigemann, M. J. Oliveira and M. A. Marques, SoftwareX, 2018, 7, 1–5.
  136. D. Andrae, U. Häußermann, M. Dolg, H. Stoll and H. Preuß, Theor. Chim. Acta, 1990, 77, 123–141.
  137. K. A. Peterson, D. Figgen, E. Goll, H. Stoll and M. Dolg, J. Chem. Phys., 2003, 119, 11113–11123.
  138. G. L. Stoychev, A. A. Auer and F. Neese, J. Chem. Theory Comput., 2017, 13, 554–562.
  139. F. Neese, F. Wennmohs, A. Hansen and U. Becker, Chem. Phys., 2009, 356, 98–109.
  140. R. Izsák and F. Neese, J. Chem. Phys., 2011, 135, 144105.
  141. B. Helmich-Paris, B. De Souza, F. Neese and R. Izsák, J. Chem. Phys., 2021, 155, 104109.
  142. K. Eichkorn, O. Treutler, H. Öhm, M. Häser and R. Ahlrichs, Chem. Phys. Lett., 1995, 240, 283–290.
  143. F. Weigend, Phys. Chem. Chem. Phys., 2006, 8, 1057.
  144. N. M. O'Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch and G. R. Hutchison, J. Cheminf., 2011, 3, 33.
  145. P. Pracht, F. Bohle and S. Grimme, Phys. Chem. Chem. Phys., 2020, 22, 7169–7192.
  146. C. Bannwarth, S. Ehlert and S. Grimme, J. Chem. Theory Comput., 2019, 15, 1652–1671.
  147. K. Burke, M. Ernzerhof and J. P. Perdew, Chem. Phys. Lett., 1997, 265, 115–120.
  148. E. Caldeweyher, C. Bannwarth and S. Grimme, J. Chem. Phys., 2017, 147, 034112.
  149. E. Caldeweyher, S. Ehlert, A. Hansen, H. Neugebauer, S. Spicher, C. Bannwarth and S. Grimme, J. Chem. Phys., 2019, 150, 154122.
  150. L. Wittmann, I. Gordiy, M. Friede, B. Helmich-Paris, S. Grimme, A. Hansen and M. Bursch, Phys. Chem. Chem. Phys., 2024, 26, 21379–21394.
  151. F. Weigend and R. Ahlrichs, Phys. Chem. Chem. Phys., 2005, 7, 3297.
  152. S. Grimme, A. Hansen, S. Ehlert and J. M. Mewes, J. Chem. Phys., 2021, 154, 064103.
  153. T. Gasevic, J. B. Stückrath, S. Grimme and M. Bursch, J. Phys. Chem. A, 2022, 126, 3826–3838.
  154. N. Mardirossian and M. Head-Gordon, J. Chem. Phys., 2016, 144, 214110.
  155. F. Eckert and A. Klamt, AIChE J., 2002, 48, 369–385.
  156. COSMOlogic GmbH & Co. KG, COSMOtherm, Version C3.0, Release 17.01, 2017.
  157. J. P. Perdew and W. Yue, Phys. Rev. B:Condens. Matter Mater. Phys., 1986, 33, 8800–8802.
  158. A. Schäfer, H. Horn and R. Ahlrichs, J. Chem. Phys., 1992, 97, 2571–2577.
  159. K. Eichkorn, F. Weigend, O. Treutler and R. Ahlrichs, Theor. Chem. Acc., 1997, 97, 119–124.
  160. A. Katbashev, M. Stahn, T. Rose, V. Alizadeh, M. Friede, C. Plett, P. Steinbach and S. Ehlert, J. Phys. Chem. A, 2025, 129, 2667–2682.
  161. M. Nottoli, M. F. Herbst, A. Mikhalev, A. Jha, F. Lipparini and B. Stamm, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2024, 14, e1726.
  162. A. A. Voityuk and S. F. Vyboishchikov, Phys. Chem. Chem. Phys., 2020, 22, 14591–14598.
  163. S. F. Vyboishchikov, J. Chem. Theory Comput., 2023, 19, 8340–8350.
  164. S. Baskaran, Y. D. Lei and F. Wania, J. Phys. Chem. Ref. Data, 2021, 50, 043101.
  165. R. Sander, Atmos. Chem. Phys., 2023, 23, 10901–12440.
  166. F. H. Vermeire and W. H. Green, Chem. Eng. J., 2021, 418, 129307.
  167. L. M. Grubbs, M. Saifullah, N. E. De La Rosa, S. Ye, S. S. Achi, W. E. Acree and M. H. Abraham, Fluid Phase Equilib., 2010, 298, 48–53.
  168. G. Bronner, K. Fenner and K.-U. Goss, Fluid Phase Equilib., 2010, 299, 207–215.
  169. S. Lee, K.-H. Cho, C. J. Lee, G. E. Kim, C. H. Na, Y. In and K. T. No, J. Chem. Inf. Model., 2011, 51, 105–114.
  170. S. H. Hilal, S. N. Ayyampalayam and L. A. Carreira, Environ. Sci. Technol., 2008, 42, 9231–9236.
  171. R.-U. Ebert, R. Kühne and G. Schüürmann, Environ. Sci. Technol., 2023, 57, 976–984.
  172. R. Naef and W. E. Acree, Liquids, 2024, 4, 231–260.
  173. S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, L. Zaslavsky, J. Zhang and E. E. Bolton, Nucleic Acids Res., 2024, 53, D1516–D1525.
  174. C. F. Poole, J. Chromatogr. A, 2023, 1706, 464213.
  175. M. Nendza, V. Kosfeld and C. Schlechtriem, Environ. Sci. Eur., 2025, 37, 44.
  176. C. Knox, M. Wilson, C. M. Klinger, M. Franklin, E. Oler, A. Wilson, A. Pon, J. Cox, N. E. L. Chin, S. A. Strawbridge, M. Garcia-Patino, R. Kruger, A. Sivakumaran, S. Sanford, R. Doshi, N. Khetarpal, O. Fatokun, D. Doucet, A. Zubkowski, D. Y. Rayat, H. Jackson, K. Harford, A. Anjum, M. Zakir, F. Wang, S. Tian, B. Lee, J. Liigand, H. Peters, R. Q. R. Wang, T. Nguyen, D. So, M. Sharp, R. da Silva, C. Gabriel, J. Scantlebury, M. Jasinski, D. Ackerman, T. Jewison, T. Sajed, V. Gautam and D. S. Wishart, Nucleic Acids Res., 2024, 52, D1265–D1275.
  177. D. S. Wishart, A. Guo, E. Oler, F. Wang, A. Anjum, H. Peters, R. Dizon, Z. Sayeeda, S. Tian, B. L. Lee, M. Berjanskii, R. Mah, M. Yamamoto, J. Jovel, C. Torres-Calzada, M. Hiebert-Giesbrecht, V. W. Lui, D. Varshavi, D. Varshavi, D. Allen, D. Arndt, N. Khetarpal, A. Sivakumaran, K. Harford, S. Sanford, K. Yee, X. Cao, Z. Budinski, J. Liigand, L. Zhang, J. Zheng, R. Mandal, N. Karu, M. Dambrova, H. B. Schiöth, R. Greiner and V. Gautam, Nucleic Acids Res., 2022, 50, D622–D631.
  178. Y. Liang, D. T. Kuo, H. E. Allen and D. M. Di Toro, Chemosphere, 2016, 161, 429–437.
  179. W. J. Zamora, A. Viayna, S. Pinheiro, C. Curutchet, L. Bisbal, R. Ruiz, C. Ràfols and F. J. Luque, Phys. Chem. Chem. Phys., 2023, 25, 17952–17965.
  180. R. Wolfenden, L. Andersson, P. M. Cullis and C. C. Southgate, Biochemistry, 1981, 20, 849–855.
  181. J. Hine and P. K. Mookerjee, J. Org. Chem., 1975, 40, 292–298.
  182. J. P. Guthrie, J. Phys. Chem. B, 2009, 113, 4501–4507.
  183. Test No. 107: Partition Coefficient (n-octanol/water): Shake Flask Method, 1995, https://www.oecd.org/en/publications/test-no-107-partition-coefficient-n-octanol-water-shake-flask-method_9789264069626-en.html.
  184. G. Ermondi, M. Vallaro, G. Goetz, M. Shalaeva and G. Caron, Future Drug Discovery, 2019, 1, FDD10.
  185. E. Fritschka and G. Sadowski, Mol. Pharm., 2025, 22(8), 4930–4939.
  186. M. Işık, D. Levorse, D. L. Mobley, T. Rhodes and J. D. Chodera, J. Comput.-Aided Mol. Des., 2020, 34, 405–420.
  187. G. Koch, A. Engstrom, J. Taechalertpaisarn, J. Faris, S. Ono, M. R. Naylor and R. S. Lokey, J. Med. Chem., 2024, 67, 19612–19622.
  188. Y. H. Zhao and M. H. Abraham, J. Org. Chem., 2005, 70, 2633–2640.
  189. B. Case and R. Parsons, Trans. Faraday Soc., 1967, 63, 1224.
  190. C. P. Kelly, C. J. Cramer and D. G. Truhlar, J. Phys. Chem. B, 2006, 110, 16066–16081.
  191. T. Nevolianis, J. W. Zheng, S. Müller, M. Baumann, S. Tshepelevitsh, I. Kaljurand, I. Leito, I. Smirnova, W. H. Green and K. Leonhard, J. Am. Chem. Soc., 2025, 147(34), 30626–30646.
  192. M. Stahn, S. Grimme, T. Salthammer, U. Hohm and W.-U. Palm, Environ. Sci.: Processes Impacts, 2022, 24, 2153–2166.
  193. P. Pracht and S. Grimme, J. Phys. Chem. A, 2021, 125, 5681–5692.
  194. S. Xu and B. Kropscott, Anal. Chem., 2012, 84, 1948–1955.
  195. A. Klamt, F. Eckert, M. Diedenhofen and M. E. Beck, J. Phys. Chem. A, 2003, 107, 9380–9386.
  196. O. Andreussi, N. G. Hörmann, F. Nattino, G. Fisicaro, S. Goedecker and N. Marzari, J. Chem. Theory Comput., 2019, 15, 1996–2009.
  197. J. Zhang, H. Zhang, T. Wu, Q. Wang and D. van der Spoel, J. Chem. Theory Comput., 2017, 13, 1034–1043.
  198. F. Šebesta, Ž. Sovová and J. V. Burda, J. Phys. Chem. B, 2024, 128, 1627–1637.
  199. P. Pernot, B. Huang and A. Savin, Mach. Learn.: Sci. Technol., 2020, 1, 035011.
  200. L. Wittmann, T. Salthammer and U. Hohm, Environ. Sci.: Processes Impacts, 2025, DOI: 10.1039/D5EM00524H.
  201. C. Plett, S. Grimme and A. Hansen, J. Comput. Chem., 2024, 45, 419–429.
  202. S. Ehlert, S. Grimme and A. Hansen, J. Phys. Chem. A, 2022, 126, 3521–3535.
  203. C. Plett, S. Grimme and A. Hansen, J. Chem. Theory Comput., 2024, 20(18), 8329–8339.
  204. L. Wittmann, H. Neugebauer, S. Grimme and M. Bursch, J. Chem. Phys., 2023, 159, 224103.
  205. C. Adamo and V. Barone, J. Chem. Phys., 1999, 110, 6158–6170.
  206. A. D. Becke, Phys. Rev. A:At., Mol., Opt. Phys., 1988, 38, 3098.
  207. C. Lee, W. Yang and R. G. Parr, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785–789.
  208. A. Tkatchenko, R. A. DiStasio, R. Car and M. Scheffler, Phys. Rev. Lett., 2012, 108, 236402.
  209. A. Ambrosetti, A. M. Reilly, R. A. DiStasio Jr and A. Tkatchenko, J. Chem. Phys., 2014, 140, 18A508.
  210. M. Puleva, L. Medrano Sandonas, B. D. Lőrincz, J. Charry, D. M. Rogers, P. R. Nagy and A. Tkatchenko, Nat. Commun., 2025, 16, 8583.
  211. M. Marianski, A. Supady, T. Ingram, M. Schneider and C. Baldauf, J. Chem. Theory Comput., 2016, 12, 6157–6168.
  212. H. Kruse, A. Mladek, K. Gkionis, A. Hansen, S. Grimme and J. Sponer, J. Chem. Theory Comput., 2015, 11, 4972–4991.
  213. B. B. Mészáros, K. Kubicskó, D. D. Németh and J. Daru, J. Chem. Theory Comput., 2024, 20, 7385–7392.
  214. H. Mun, W. Lorpaiboon and J. Ho, J. Phys. Chem. A, 2024, 128, 4391–4400.
  215. J. R. Pliego, J. Mol. Liq., 2022, 359, 119368.
  216. N. Ancona, A. Bastola and E. Alexov, J. Comput. Biophys. Chem., 2023, 22, 515–524.
  217. T. Nevolianis, M. Baumann, N. Viswanathan, W. A. Kopp and K. Leonhard, Fluid Phase Equilib., 2023, 571, 113801.
  218. H. Zhang, V. Juraskova and F. Duarte, Nat. Commun., 2024, 15, 6114 CrossRef CAS PubMed.
  219. Solubility Data Series – IUPAC, International Union of Pure and Applied Chemistry, https://iupac.org/what-we-do/databases/solubility-data-series/.
  220. H. Marques and S. Müller, Fluid Phase Equilib., 2025, 592, 114335 CrossRef CAS.
  221. J. Ferraz-Caetano, F. Teixeira and M. N. D. S. Cordeiro, J. Chem. Inf. Model., 2024, 64, 2250–2262 CrossRef CAS PubMed.
  222. A. A. Toropov, A. P. Toropova, A. Roncaglioni, E. Benfenati, D. Leszczynska and J. Leszczynski, Molecules, 2023, 28, 7231 CrossRef CAS PubMed.
  223. D. Zhang, S. Xia and Y. Zhang, J. Chem. Inf. Model., 2022, 62, 1840–1848 CrossRef CAS PubMed.
  224. J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, J. Comput. Chem., 2004, 25, 1157–1174 CrossRef CAS PubMed.
  225. S. Spicher and S. Grimme, Angew. Chem., 2020, 132, 15795–15803 CrossRef.
  226. S. Grimme and T. Rose, Z. Naturforsch. B, 2024, 79, 191–200.
  227. B. M. Wood, M. Dzamba, X. Fu, M. Gao, M. Shuaibi, L. Barroso-Luque, K. Abdelmaqsoud, V. Gharakhanyan, J. R. Kitchin, D. S. Levine, K. Michel, A. Sriram, T. Cohen, A. Das, A. Rizvi, S. J. Sahoo, Z. W. Ulissi and C. L. Zitnick, UMA: A Family of Universal Models for Atoms, 2025.
  228. A. Kabylda, J. T. Frank, S. Suárez-Dou, A. Khabibrakhmanov, L. Medrano Sandonas, O. T. Unke, S. Chmiela, K.-R. Müller and A. Tkatchenko, J. Am. Chem. Soc., 2025, 147, 33723–33734.
  229. K. T. Schütt, O. T. Unke and M. Gastegger, Equivariant message passing for the prediction of tensorial properties and molecular spectra, arXiv, 2021, preprint, arXiv:2102.03150[cs], DOI:10.48550/arXiv.2102.03150, http://arxiv.org/abs/2102.03150.

Footnote

These authors contributed equally to this work.

This journal is © The Royal Society of Chemistry 2025