A fragment based approach towards curating, comparing and developing machine learning models applied in photochemistry

Raúl Pérez-Soto; Mihai V. Popescu; Sabari Kumar; Leticia A. Gomes; Changyeob Lee; Elijah Shore; Steven A. Lopez; Robert S. Paton; Seonah Kim

doi:10.1039/D5SC05615B

Raúl Pérez-Soto,†^a Mihai V. Popescu,†^a Sabari Kumar,†^a Leticia A. Gomes,^b Changyeob Lee,^a Elijah Shore,^a Steven A. Lopez,*^b Robert S. Paton*^a and Seonah Kim*^a

Author affiliations

* Corresponding authors

^a Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
E-mail: robert.paton@colostate.edu, seonah.kim@colostate.edu

^b Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
E-mail: s.lopez@northeastern.edu

Abstract

The development of graph neural networks for predicting molecular properties has garnered significant attention, as it enables the correlation of quickly computable atomic and bond descriptors with overall molecular properties. With the rising interest in photochemistry and photocatalysis as sustainable alternatives to thermal reactions, curating virtual databases of computed photophysical properties for training machine learning models has become popular. Unfortunately, current efforts fail to account for exciton localization on different chromophores of the same molecule, leading to potentially large prediction errors. Here we describe a molecular fragmentation strategy that can be used to overcome this limitation, while also providing a way to compare structural diversity between different libraries. Using a newly generated database of 46 432 adiabatic S₀–T₁ energy gaps (ALFAST-DB), we compare its diversity with two datasets from the literature and demonstrate that a fragment-based delta learning approach improves model generalizability while achieving accuracies comparable to those of traditional message-passing graph neural network (MPGNN) architectures.
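
The general idea of delta learning mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the baseline or the model, so the sketch assumes (hypothetically) that the baseline is the lowest S₀–T₁ gap among a molecule's fragments, and it substitutes a one-variable least-squares fit for the learned correction. The function names, the scalar descriptor, and all numbers are invented for illustration.

```python
# Illustrative sketch only: names, descriptor, and data are hypothetical.
from statistics import mean


def baseline_gap(fragment_gaps):
    """Assumed baseline: the lowest S0-T1 gap (eV) among the fragments,
    i.e. the triplet is taken to localize on that chromophore."""
    return min(fragment_gaps)


def fit_delta(features, residuals):
    """Least-squares fit of residual ~ slope * feature + intercept.
    Stands in for whatever model learns the correction."""
    fx, fy = mean(features), mean(residuals)
    sxx = sum((x - fx) ** 2 for x in features)
    sxy = sum((x - fx) * (y - fy) for x, y in zip(features, residuals))
    slope = sxy / sxx
    return slope, fy - slope * fx


def predict(fragment_gaps, feature, slope, intercept):
    """Delta-learning prediction: baseline plus learned correction."""
    return baseline_gap(fragment_gaps) + slope * feature + intercept


# Toy training set: (fragment gaps in eV, scalar descriptor, reference gap in eV)
train = [
    ([3.6, 2.9], 0.0, 2.90),
    ([3.1, 3.4], 1.0, 3.00),
    ([2.5, 3.9], 2.0, 2.30),
]

# The model is trained on the residual (reference minus baseline),
# not on the reference value itself.
features = [x for _, x, _ in train]
residuals = [ref - baseline_gap(frags) for frags, _, ref in train]
slope, intercept = fit_delta(features, residuals)

estimate = predict([3.0, 2.7], 1.5, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimate:.3f} eV")
```

The design point the sketch tries to capture is that the model only has to learn a small correction on top of a physically motivated fragment estimate, rather than the full molecular property.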

This article is part of the themed collection: #MyFirstChemSci 2025


Article information

DOI: https://doi.org/10.1039/D5SC05615B
Article type: Edge Article
Submitted: 26 Jul 2025
Accepted: 11 Oct 2025
First published: 15 Oct 2025
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2025, Advance Article

R. Pérez-Soto, M. V. Popescu, S. Kumar, L. A. Gomes, C. Lee, E. Shore, S. A. Lopez, R. S. Paton and S. Kim, Chem. Sci., 2025, Advance Article , DOI: 10.1039/D5SC05615B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
