The euroSAMPL1 pKa blind prediction and reproducible research data management challenge

Abstract

The development and testing of methods in computational chemistry for the prediction of physicochemical properties is by now a mature form of scientific research, with methods ranging from molecular mechanics simulations through quantum calculations to empirical and machine learning models. Blind prediction challenges for these properties are regularly organized to allow researchers from academia and industry to test their methods in a fair and unbiased manner. At the same time, research data management (RDM) is still not used as extensively as it could be in the development and application of such models, especially in academia. In particular, the FAIR principles (Findable, Accessible, Interoperable, Reusable) can serve as guidelines for good RDM, but many models, the data used to train them, and the data they generate fall short of one or more of these principles. The goal of the first euroSAMPL pKa blind prediction challenge was to promote and help develop good RDM standards for computational chemistry. To this end, the challenge was designed to rank the models not only by their predictive performance but also by their adherence to the FAIR principles, as assessed through cross-evaluation by the participants themselves. Here we present an analysis of the blind prediction quality in terms of statistical metrics, and of the cross-evaluation in terms of a newly defined “FAIRscore”. The results suggest that multiple methods can predict the pKa to within chemical accuracy, and also that “consensus” predictions constructed from multiple, independent methods may outperform each individual prediction. Finally, we discuss the state of research data management in computational chemistry and develop suggestions for future improvements.
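The observation that a consensus of independent methods may beat each individual prediction rests on error cancellation. The following is a minimal sketch under stated assumptions, not the challenge's evaluation code: the consensus is assumed to be a simple per-compound arithmetic mean (the actual consensus construction is not described here), the pKa values are invented toy numbers rather than euroSAMPL1 data, and the names `rmse`, `consensus`, `method_A` and `method_B` are hypothetical.

```python
import math
from statistics import mean

# Hypothetical toy data for illustration only: invented pKa values for four
# compounds, NOT euroSAMPL1 challenge data.
reference = [4.0, 7.0, 9.0, 10.0]  # stand-in "experimental" pKa values
predictions = {
    "method_A": [4.8, 6.5, 9.6, 9.3],   # hypothetical method
    "method_B": [3.4, 7.7, 8.6, 10.5],  # hypothetical method
}


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference lengths differ")
    return math.sqrt(mean((p - r) ** 2 for p, r in zip(pred, ref)))


def consensus(preds):
    """Assumed consensus rule: per-compound arithmetic mean over all methods.

    `preds` maps a method name to a list of pKa values; every list must be
    ordered by the same compound index.
    """
    return [mean(values) for values in zip(*preds.values())]


if __name__ == "__main__":
    for name, pred in predictions.items():
        print(f"{name}: RMSE = {rmse(pred, reference):.3f}")
    cons = consensus(predictions)
    print(f"consensus: RMSE = {rmse(cons, reference):.3f}")
```

Because the RMSE is a scaled Euclidean norm of the error vector, the triangle inequality guarantees that a mean consensus never has a larger RMSE than the average RMSE of its contributing methods. Beating the best single method, as in this toy case, additionally requires that the methods' errors partly cancel, which is more likely when the methods are genuinely independent.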

Article information

Article type
Paper
Submitted
15 Apr 2025
Accepted
11 Jul 2025
First published
02 Sep 2025
This article is Open Access
Creative Commons BY license

Phys. Chem. Chem. Phys., 2025, Advance Article

N. Tielker, M. Lim, P. Kibies, J. Gretz, B. Hein-Janke, C. Chodun, R. A. Mata, P. Czodrowski and S. M. Kast, Phys. Chem. Chem. Phys., 2025, Advance Article, DOI: 10.1039/D5CP01448D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.