Generation and benchmarking of a diverse reaction database of quantum mechanical liquid-phase activation Gibbs free energies
Abstract
Many chemical reactions occur in the liquid phase, making the accurate prediction of the liquid-phase activation Gibbs free energy, $\Delta^{\ne} G^{\circ,\mathrm{L}}$, crucial for numerous applications. Quantum mechanical (QM) methods with implicit solvation models offer a valuable route to $\Delta^{\ne} G^{\circ,\mathrm{L}}$ prediction, although they are computationally demanding at high levels of theory and for larger systems. Data-driven surrogate models can address this issue but require extensive training and test datasets. Here we present the liquid-phase reaction energy database (LiPRED-2025), a QM reaction database containing 4,513 $\Delta^{\ne} G^{\circ,\mathrm{L}}$ values for 28 diverse chemical reactions computed in various solvents at 298.15 K. The reactions were chosen for their sensitivity to solvent effects and the availability of experimental data. The SMD model is employed to calculate solvation contributions to $\Delta^{\ne} G^{\circ,\mathrm{L}}$ because it can account for the effect of the solvent on the geometries of the reactants and transition states and is suitable for charged species. The database contains $\Delta^{\ne} G^{\circ,\mathrm{L}}$ values obtained from seven calculation methods, including the thermodynamic cycle method, the direct method, and their variants. A benchmarking study on a subset of the database shows that the best methods achieve a mean absolute error of 2.89 kcal mol\textsuperscript{-1} in absolute $\Delta^{\ne} G^{\circ,\mathrm{L}}$ and 1.00 kcal mol\textsuperscript{-1} in relative $\Delta^{\ne} G^{\circ,\mathrm{L}}$. Using a higher level of theory to calculate $\Delta^{\ne} G^{\circ,\mathrm{L}}$ improves the relative values but not the absolute ones.
These results provide valuable insights into the choice of methods and levels of theory appropriate for calculating $\Delta^{\ne} G^{\circ,\mathrm{L}}$, while the database can serve for training and testing surrogate models.
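The distinction between errors in absolute and relative $\Delta^{\ne} G^{\circ,\mathrm{L}}$ can be illustrated with a minimal Python sketch. It assumes that "relative" means the value in each solvent minus the value in a chosen reference solvent for the same reaction. That definition, the function names, and all numbers below are hypothetical and are not taken from LiPRED-2025.

```python
from statistics import mean

# Hypothetical activation Gibbs free energies (kcal/mol) for one reaction
# in three solvents. Illustrative values only, not database entries.
calculated = {"water": 24.1, "methanol": 22.0, "acetonitrile": 20.3}
experimental = {"water": 21.0, "methanol": 19.5, "acetonitrile": 17.0}


def mae_absolute(calc, expt):
    """Mean absolute error of the raw calculated values against experiment."""
    return mean(abs(calc[s] - expt[s]) for s in calc)


def mae_relative(calc, expt, reference):
    """Mean absolute error after subtracting the reference-solvent value.

    Assumption: relative values are differences with respect to one
    reference solvent, so the reference itself is excluded from the mean.
    """
    others = [s for s in calc if s != reference]
    return mean(
        abs((calc[s] - calc[reference]) - (expt[s] - expt[reference]))
        for s in others
    )


abs_err = mae_absolute(calculated, experimental)
rel_err = mae_relative(calculated, experimental, reference="water")
print(f"MAE (absolute): {abs_err:.2f} kcal/mol")
print(f"MAE (relative): {rel_err:.2f} kcal/mol")
```

In this toy example the calculated values are offset from experiment by a similar amount in every solvent, so most of the error cancels when differences with respect to the reference solvent are taken, which is one way the relative error can end up smaller than the absolute one.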