 Open Access Article
 Open Access Article
      
        
          
            Rashid R. 
            Valiev
          
        
       *a, 
      
        
          
            Rinat T. 
            Nasibullin
          
        
      a, 
      
        
          
            Hilda 
            Sandström
          
        
      b, 
      
        
          
            Patrick 
            Rinke
*a, 
      
        
          
            Rinat T. 
            Nasibullin
          
        
      a, 
      
        
          
            Hilda 
            Sandström
          
        
      b, 
      
        
          
            Patrick 
            Rinke
          
        
       bcde, 
      
        
          
            Kai 
            Puolamäki
bcde, 
      
        
          
            Kai 
            Puolamäki
          
        
       f and 
      
        
          
            Theo 
            Kurten
          
        
      *a
f and 
      
        
          
            Theo 
            Kurten
          
        
      *a
      
aDepartment of Chemistry, University of Helsinki, P.O. Box 55 (A.I. Virtanens plats 1), FIN-00014, Finland. E-mail: valievrashid@gmail.com; theo.kurten@helsinki.fi
      
bDepartment of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Finland
      
cPhysics Department, TUM School of Natural Sciences, Technical University of Munich, 85748 Garching, Germany
      
dAtomistic Modelling Center, Munich Data Science Institute, Technical University of Munich, 85748 Garching, Germany
      
eMunich Center for Machine Learning (MCML), 80538 Munich, Germany
      
fDepartment of Computer Science, University of Helsinki, P.O. Box 68 (Pehr Kalms gata 5), FIN-00014, Finland
    
First published on 27th June 2025
Peroxy radicals (RO2) are ubiquitous intermediates in many oxidation processes, especially in the atmospheric gas phase. The recombination reaction of two peroxy radicals (RO2 + R′O2) has been demonstrated to lead, via several steps, to a triplet complex of two alkoxy radicals: 3(RO˙⋯R′O˙). The different product channels of RO2 + R′O2 reactions thus correspond to different reactions of this triplet complex. Of particular interest to atmospheric chemistry is the intersystem crossing (ISC) to the singlet state, which enables the recombination of the two radicals to an ROOR′ peroxide with considerably lower volatility than the original precursors. These peroxides are believed to be key contributors to the formation of secondary organic aerosol (SOA) particles, which in turn contribute to both air pollution and radiative forcing uncertainties. Developing reliable computational models for, e.g., RO2 + R′O2 branching ratios requires accurate estimates of the ISC rate constants, which can currently be obtained only from computationally expensive quantum chemistry calculations. By contrast, machine learning (ML) methods offer a faster alternative for estimating ISC rate constants. In the present work, we create a dataset with 98![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 082 conformations of radical pairs and their corresponding rate constants. We apply three ML models—random forest (RF), CatBoost (CB), and a neural network (NN)—to predict ISC rate constants from triplet to singlet states. Specifically, the models predict kISC(T1 → Si) for i = 1–4 and the cumulative kISC(T1 → Sn), in alkoxy radical pairs, using only molecular geometry descriptors as inputs. All ML models achieved a mean absolute error (MAE) on our test set within one order of magnitude and a coefficient of determination R2 > 0.82 for all rate constants. Overall, the ML prediction matches the quantum chemical calculations within 1–2 orders of magnitude, providing a fast and scalable alternative for quantum chemical methods for ISC rate estimation.
082 conformations of radical pairs and their corresponding rate constants. We apply three ML models—random forest (RF), CatBoost (CB), and a neural network (NN)—to predict ISC rate constants from triplet to singlet states. Specifically, the models predict kISC(T1 → Si) for i = 1–4 and the cumulative kISC(T1 → Sn), in alkoxy radical pairs, using only molecular geometry descriptors as inputs. All ML models achieved a mean absolute error (MAE) on our test set within one order of magnitude and a coefficient of determination R2 > 0.82 for all rate constants. Overall, the ML prediction matches the quantum chemical calculations within 1–2 orders of magnitude, providing a fast and scalable alternative for quantum chemical methods for ISC rate estimation.
Recently, RO2 reactions with other RO2 have been identified as sources of very efficiently aerosol-forming accretion products: compounds with more carbon atoms than in the original reactant.4 Computational investigations of RO2 + R′O2 reactions have revealed that the mechanism for accretion product formation involves an intersystem crossing (ISC; spin–flip).5–17 While ISCs and other surface hoppings between electronic states are known to play important roles in many fields of chemistry and physics, their involvement in thermal (non-photochemical) atmospheric gas-phase reactions was somewhat surprising. Specifically, RO2 + R′O2 reactions inevitably lead to a metastable tetroxide (RO4R′) intermediate,5–13 which then decomposes into two alkoxy radicals (RO and R′O) and molecular oxygen (O2).9–17 For the reaction to be thermodynamically feasible, the O2 must be formed in its triplet ground state. Spin conservation then dictates that the alkoxy radical pairs are also coupled as a triplet. For all but the smallest (R![[double bond, length as m-dash]](https://www.rsc.org/images/entities/char_e001.gif) R′
R′![[double bond, length as m-dash]](https://www.rsc.org/images/entities/char_e001.gif) CH3) system,9 the RO⋯R′O interaction is stronger than the RO⋯O2 interaction, indicating that the O2 molecule will likely be ejected from the reaction system, leaving a 3(RO⋯R′O) complex, where the upper index 3 indicates the multiplicity. The reaction routes of this complex correspond to different product channels of the overall RO2 + R′O2 reaction. For example, dissociation leads to the radical propagating (RO + R′O) channel, a hydrogen shift (H-shift) between the two moieties leads to the molecular (alcohol + carbonyl) channel, while an ISC to the singlet state permits subsequent recombination, leading to peroxide (ROOR′) accretion products. We note that ISCs do not guarantee the formation of a peroxide, as both H-shifts and dissociation are possible also in the singlet state. Furthermore, sufficiently complex (functionalized) alkoxy radicals can undergo a variety of unimolecular reactions within the 3(RO⋯R′O) cluster, potentially leading to an even larger array of possible products.5,13,14 These reactions can also be either preceded or followed by an ISC. Overall, predictions of branching ratios of RO2 + R′O2 reactions, and thus their contribution to, for example, SOA formation, are currently limited by the available methods for estimating ISC rates.
CH3) system,9 the RO⋯R′O interaction is stronger than the RO⋯O2 interaction, indicating that the O2 molecule will likely be ejected from the reaction system, leaving a 3(RO⋯R′O) complex, where the upper index 3 indicates the multiplicity. The reaction routes of this complex correspond to different product channels of the overall RO2 + R′O2 reaction. For example, dissociation leads to the radical propagating (RO + R′O) channel, a hydrogen shift (H-shift) between the two moieties leads to the molecular (alcohol + carbonyl) channel, while an ISC to the singlet state permits subsequent recombination, leading to peroxide (ROOR′) accretion products. We note that ISCs do not guarantee the formation of a peroxide, as both H-shifts and dissociation are possible also in the singlet state. Furthermore, sufficiently complex (functionalized) alkoxy radicals can undergo a variety of unimolecular reactions within the 3(RO⋯R′O) cluster, potentially leading to an even larger array of possible products.5,13,14 These reactions can also be either preceded or followed by an ISC. Overall, predictions of branching ratios of RO2 + R′O2 reactions, and thus their contribution to, for example, SOA formation, are currently limited by the available methods for estimating ISC rates.
In general, the ISC rate constant (kISC) between two electronic states (denoted here Si and Tf) depends on two parameters: (1) the spin–orbital coupled interaction matrix elements (SOCME) between the states (〈ψ(Si)|ĤSO|ψ(Tf)〉); (2) the Franck–Condon (FC) factor or overlap between the nuclear wavefunctions of the states.18–21 The FC factor also depends on the energy gap between the states, which becomes especially important in cases where the final state is higher in energy than the initial state. Both SOCME and the FC factor may depend on the relative orientation of the radical pair. As 3(RO⋯R′O) complexes relevant to atmospheric chemistry are held together by non-covalent interactions that are individually relatively weak, thermal motion will allow them to explore a large variety of orientations. Thus, kISC needs to be computed as an ensemble average over all molecular conformations and relative orientations of the radical pair.
A simplified model of ISC rates can be obtained for radical pairs that are characterized by a small energy gap between the S1 and T1 states,18–24 resulting in a dependence primarily on SOCME rather than FC factors. Moreover, in these pairs, each spin is localized on a single atom, making SOCME sensitive to the distance and angular alignment between p-type atomic orbitals (AOs) on these atoms.6,18–24 This effect was first explained in the pioneering work of Salem and Rowland.23 Subsequent studies have further investigated SOCME's dependence on structural factors, proposing analytical solutions for simplified cases with the fitting parameters.2,18–24 However, a comprehensive universal analytical solution that can be applied to radical pairs with any kind of substituents remains elusive.
Quantum chemical calculations have a broad applicability and offer more accurate estimates of ISC rate constants within 1–2 orders of magnitude.6 For example, the kISC was calculated from the T1 state to the S1 state for the first time for minimum-energy geometries of several alkoxy radical pairs by Valiev et al.6 The computed rate displays strong variation depending on the structures and the relative orientation of the two radicals. Also, it was shown that the formation of ROOR′ is determined by the total (overall) ISC process, which is more complicated than just the ISC between S1 and T1 states, because there are four low-energy singlet excited states with strong SOCME. Thus, the total kISC is the sum of kISC (T1 → S1), kISC (T1 → S2), kISC (T1 → S3) and kISC (T1 → S4) as seen in Fig. 1. Note that after the transition into S2, S3, and S4 states, the internal conversion (IC) process occurs at a high rate, and the final state of the process is still S1.6 Thus, the transition from T1 to S1 can take place either in one step (ISC directly to the S1 state) or through two steps (involving an ISC to an intermediate state – S2, S3, or S4 – followed by IC to S1). However, obtaining these ISC rate constants is computationally expensive, and not feasible for the vast number of systems of interest to atmospheric chemistry.10
Here, we explore, if and how machine learning (ML)24–26 can be used as a tool to accelerate ISC rate constant estimations and to gain chemical insight into the relationship between the radical pair geometry and ISC formation rate. In particular, we apply ML to relate the structural aspects of alkoxy radical pair conformations, to the ISC rate constants for the alkoxy radical clusters.
![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 conformations for each cluster and optimize them. Generation and geometrical optimization of clusters was performed using the semi-empirical GNF-xTB level of theory27 in the ABCluster program.28,29 All structural optimizations were performed with the system in only the triplet state, and not the singlet. We filtered our final set of cluster conformation ensembles to remove duplicates based if the radius of gyration (Rg) and electronic energy (E) criteria less than 0.01 Å and 0.001 Hartree respectively.6 These criteria are used to distinguish between unique structures and remove duplicates.6 After removing duplicates, our dataset consists of 98
000 conformations for each cluster and optimize them. Generation and geometrical optimization of clusters was performed using the semi-empirical GNF-xTB level of theory27 in the ABCluster program.28,29 All structural optimizations were performed with the system in only the triplet state, and not the singlet. We filtered our final set of cluster conformation ensembles to remove duplicates based if the radius of gyration (Rg) and electronic energy (E) criteria less than 0.01 Å and 0.001 Hartree respectively.6 These criteria are used to distinguish between unique structures and remove duplicates.6 After removing duplicates, our dataset consists of 98![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 082 radical pair cluster conformations.
082 radical pair cluster conformations.
        We focus on computing and predicting kISC(T1 → Si), where i = 1–4, and the total cumulative kISC. All five rate constants associated with each radical pair conformation were computed with quantum chemistry. Note that the four singlet and four triplet electronic excited states of RO˙⋯˙OR′ are nearly degenerate (within 10![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 cm−1). In this case, the correct description of energy gaps between them can be obtained using a multireference level of theory accounting for both static and dynamic electronic correlations.6,30 Therefore, we applied the extended quasi-degenerate 2nd-order multireference perturbation theory (XMC-QDPT2).30 These calculations were carried out using the Firefly software.31 Like in our previous work,6 we chose an active space consisting of 6 electrons in 4 p-type MOs, for the complete active space self-consistent field (CASSCF) with the stage average over four singlet and four triplet electronic states. The included orbitals are shown in Fig. 2 for a 3(MeO)2 cluster. The electronic states T1–T4 and S1–S4 are formed exclusively by electron transitions between the 2p orbitals localized on the oxygen atoms, where the electron spin is also localized. This electronic configuration can be understood within the framework of the topicity of radicals theory proposed by Minaev.16 For example, the CH3O radical and other alkoxy radicals are classified as bitopic radicals, featuring two radical centers (2px and 2py) with an energy splitting of 0.40 eV.22,32 Consequently, they generate four pairs of singlet and triplet states with similar energy in radical pairs.
000 cm−1). In this case, the correct description of energy gaps between them can be obtained using a multireference level of theory accounting for both static and dynamic electronic correlations.6,30 Therefore, we applied the extended quasi-degenerate 2nd-order multireference perturbation theory (XMC-QDPT2).30 These calculations were carried out using the Firefly software.31 Like in our previous work,6 we chose an active space consisting of 6 electrons in 4 p-type MOs, for the complete active space self-consistent field (CASSCF) with the stage average over four singlet and four triplet electronic states. The included orbitals are shown in Fig. 2 for a 3(MeO)2 cluster. The electronic states T1–T4 and S1–S4 are formed exclusively by electron transitions between the 2p orbitals localized on the oxygen atoms, where the electron spin is also localized. This electronic configuration can be understood within the framework of the topicity of radicals theory proposed by Minaev.16 For example, the CH3O radical and other alkoxy radicals are classified as bitopic radicals, featuring two radical centers (2px and 2py) with an energy splitting of 0.40 eV.22,32 Consequently, they generate four pairs of singlet and triplet states with similar energy in radical pairs.
|  | ||
| Fig. 2 Orbitals corresponding to the active space used for kISC rate calculation in state-averaged CASSCF (6,4)/6-311++G(d,p) MOs for a 3(MeO)2 cluster. | ||
The SOCME between S1–S4 and T1–T4 were calculated using the CASSCF method, but with the XMC-QDPT2/6-311++G(d,p) energies as the zero-order energies within perturbation theory. GAMESS-US33 was used for the CASSCF calculations.
The ISC rate constant calculation from T1 to the S1, S2, S3, and S4 electronic states was conducted using the method described in ref. 6 and 34. More details can be found in ref. 6, but here we give a short description. The main expression is
| kISC = 1.6 × 109〈i|ĤSO|f〉2FC, | (1) | 
|  | (2) | 
![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 082 clusters consists of ten different cluster types in the following proportions: 7297 (7.96%) structures for 3(MeO)2, 7812 (7.96%) for 3(MeO⋯OAce), 8428 (8.59%) for 3(MeO⋯OEt), 14
082 clusters consists of ten different cluster types in the following proportions: 7297 (7.96%) structures for 3(MeO)2, 7812 (7.96%) for 3(MeO⋯OAce), 8428 (8.59%) for 3(MeO⋯OEt), 14![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 281 (14.56%) for 3(MeO⋯HOBuO), 6230 (6.35%) for 3(OAce)2, 8313 (8.48%) for 3(OAce⋯OEt), 14
281 (14.56%) for 3(MeO⋯HOBuO), 6230 (6.35%) for 3(OAce)2, 8313 (8.48%) for 3(OAce⋯OEt), 14![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 945 (15.24%) for 3(OAce⋯HOBuO), 5805 (5.92%) for 3(OEt)2, 12
945 (15.24%) for 3(OAce⋯HOBuO), 5805 (5.92%) for 3(OEt)2, 12![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 457 (12.70%) for 3(OEt⋯HOBuO), 12
457 (12.70%) for 3(OEt⋯HOBuO), 12![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 514 (12.76%) for 3(HOBuO)2.
514 (12.76%) for 3(HOBuO)2.
        We compared the relative size of the four kISC(T1 → Si) to identify those with largest impact on the overall kISC. which we from here on refer to as kISC(T1 → Sn). The number of structures in the full dataset where the T1 → S1 process dominates over other ISC processes is 10![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 049 (10.25%); for T1 → S2, it is 85
049 (10.25%); for T1 → S2, it is 85![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 163 (86.83%); for T1 → S3, it is 2866 (2.92%); and for T1 → S4, only 3 structures (0.00%). Thus, the rate constant for the T1 → Sn process is primarily determined by the T1 → S1 and T1 → S2 transitions.
163 (86.83%); for T1 → S3, it is 2866 (2.92%); and for T1 → S4, only 3 structures (0.00%). Thus, the rate constant for the T1 → Sn process is primarily determined by the T1 → S1 and T1 → S2 transitions.
The rate constant log histograms for T1 → S1, T1 → S2 and T1 → Sn transitions are shown in Fig. 3. We observe that the distribution of the T1 → Sn rate constant order repeats the distribution of the T1 → S2 rate constant order. This observation is expected since the number of structures where the T1 → S2 process dominates over other ISC processes is larger than T1 → S1. We note that there is a slight difference between the shapes of the two distributions near zero values. This is because structures where kISC(T1 → S2) is zero may have a nonzero kISC(T1 → S1), and as a result, the cumulative rate constant is mostly nonzero, as seen in the graph (Panel c).
|  | ||
| Fig. 3 Histograms of the logarithm of the calculated rate constant (in units of s−1) for T1 → S1 (a), T1 → S2 (b) and T1 → Sn (c) transitions. | ||
As shown in Scheme 1, even for the simplest radical pair without large substituents (CH3O˙⋯CH3O˙), there is a clear yet quite complex dependence of electronic properties (SOCME and the singlet–triplet energy gap) on the C–O˙⋯C–O˙ angle, with the angular dependence also varying with the O˙⋯O˙ distance. This demonstrates that the electronic parameters governing kISC are sensitive to key structural features of the radical pairs, hence kISC depends on those same geometric parameters. Altogether, this again confirms the existence of a nontrivial relationship between kISC and structural displacements, that cannot be reduced to simple functions like sine, cosine, or exponential functions, as suggested in some prior works.2,18–24 At the same time, this dependence guarantees that machine-learning methods can be applied effectively, provided that a sufficiently large and diverse dataset is available.
We trained and tested our models using two types of structural descriptors. The first is a custom descriptor we developed, which combines selected angles and distances within atoms in a cluster. We selected atoms with the localization of MOs responsible for T1 → S1, T1 → S2, T1 → S3 and T1 → S4. The second descriptor is the many body tensor representation (MBTR), computed using DScribe v.2.1.0.38,39 MBTR is a large structural descriptor that captures three-dimensional structures comprehensively. Compared to our custom descriptor, MBTR is computationally intensive and less interpretable. Here, we include MBTR to benchmark the performance of our custom descriptor. Below, we provide further details on the construction of both our custom descriptor and MBTR.
Our optimal custom descriptor consists of 53 structural features, which were constructed considering the atoms denoted in Fig. 4 for one radical pair. The following atoms were selected: spin-carrying oxygen atoms Ou1 and Ou2, the carbon atoms connected to them (C1 and C2), the atoms connected to C1 and C2 – carbon A1, A2, B1, B2, and hydrogens H1 and H2, as well as oxygens Op1 and Op2 without spins and the hydrogens attached to them (H12 and H22). The resulting selected features include 31 distances, 12 angles, 3 torsional angles, 4 minimum atomic distances between radical pair atoms, and 2 hydrogen bond counts. The first hydrogen bond count only includes the hydrogen bonds formed with Ou1 and Ou2, while the second hydrogen bond count represents the total number of hydrogen bonds in the cluster. Certain geometric features were not applicable to all clusters due to missing atom types. In those cases, the missing features were then assigned −1.
|  | ||
| Fig. 4 Enumeration of atoms used to create custom geometric features. Here is one example configuration of 3(HOBuO)2. The molecular descriptors are also given in Tables 4 and 5. | ||
Our second descriptor, MBTR, encodes the geometric properties of molecular structures through a continuous representation of pairwise and angular relationships between all atoms of a molecule at different many-body levels. For level 1, atomic species are represented by Gaussians corresponding to atomic numbers along a designated axis. In level 2, inverse distances between atom pairs are represented by Gaussians along a distance axis, reflecting the geometric structure of the atomic system. Level 3 encodes angular information by Gaussians on an axis corresponding to the cosine of angles between atom triples. To reduce memory requirements, all numerical values in the MBTR computations were converted to float 32 data types.
In this study, we employed only the level 2 and level 3 MBTR kernels to create a concatenated descriptor. For level 2, we used inverse distances within an interval of [0, 1.5] (unit Å−1), while for level 3, we represented interatomic angles by their cosine values over an interval of [−1, 1]. No additional weighting was applied to either kernel. We optimized two MBTR hyperparameters with random search: the smoothing parameter (σ), which affects Gaussian localization, and the number of discretization points (n), which determines resolution. σ was sampled uniformly between 0.005 and 0.015, and n between 40 and 100, for both kernels independently.
![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 20 into training and testing sets, using five different random states, which are defined in the code provided in the ESI.†
20 into training and testing sets, using five different random states, which are defined in the code provided in the ESI.†
        For model performance evaluation, we used a set of metrics including RMSE, the coefficient of determination (R2) and the mean absolute error (MAE). Model performance was evaluated by averaging across five runs and calculating the standard deviations of all performance metrics. All computed average metrics for each model and each predicted feature are provided in the Tables S1–S6 (ESI†). The predicted vs. actual plots presented in the article are based on the identical train-test splits with the random seed number is 339![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 087.
087.
For each train/test split corresponding to a specific random seed and for each model using the custom descriptor set, hyperparameter optimization was performed using grid search cross-validation on the training set with 3 folds, repeated 3 times. Instead of selecting the hyperparameters that yielded the best metrics, one of the configurations with a MAE differing by no more than 0.01 from the best-performing set was chosen for further analysis. This approach ensures greater model robustness and prevents overfitting to a specific dataset split while maintaining nearly optimal performance, which is particularly important for future applications of the model to molecules that were not included in the training dataset. The hyperparameters that produced the best metrics, as well as those with performance close to the best, along with the corresponding metrics for the selected configuration, are presented in Tables S1–S3 (ESI†) for the RF, NN and CB models, respectively. In contrast, for the MBTR-based model, hyperparameter optimization was conducted using randomized search combined with k-fold cross-validation using 3 folds, sampling 10 parameter configurations with the negative mean squared error as the loss function. The corresponding code is provided in the ESI.†
The first model is a RF regressor from scikit-learn version 1.2.0.45 We used RF with both MBTR and our custom descriptor to evaluate the relative performance of the custom descriptor. For the custom descriptor, hyperparameter optimization of the RF model was performed for the number of estimators, tested at 1000, 1250, and 1500, and tree depth, tested at 10, 20, and 30. The optimal values were 1250 estimators and a tree depth of 30. The dependence of the evaluation metrics on hyperparameters is presented in Table S1 (ESI†). For MBTR the RF hyperparameters—number of estimators (50 or 130) and maximum tree depth (10 or 20)—were optimized using random search (MBTR) combined with k-fold cross-validation. The optimal parameters, a tree depth of 20 and 130 estimators, were determined through this process. During model training with MBTR, the random search sampled 10 parameter configurations within the specified intervals, using negative mean squared error (−MSE) as the loss function.
The second model is a NN constructed using Keras.46 The network architecture and hyperparameters were optimized for one to four hidden layers, each containing 50, 100, 150, or 200 neurons, with ReLU activation functions and MSE loss function. The output neuron uses a linear activation function to predict the target variable. The batch size was varied between 16 and 32, the number of epochs was either 50 or 100 and learning rate was 0.001, 0.01 and 0.1. The final selection of hyperparameters is two hidden layers each containing 200 neurons with batch size 32, epochs 100 and learning rate 0.001. The hyperparameter optimization results for NN are summarized in Table S2 (ESI†).
The third model is CB regressor,47 which is designed for GPU training, enabling faster results. The optimal hyperparameters for CB were found within the following values: the number of trees 1000, 1250 and 1500, the learning rate from 0.05 to 0.25, and the tree depth from 8 to 10. The final optimized values were 1250 trees, a learning rate of 0.1, and a tree depth of 9. The hyperparameter optimization results for CB are summarized in Table S3 (ESI†).
Also, CB was chosen for feature selection as it provided the best performance metrics among the tested models. Feature selection in combination with the CB model was performed using the recursive feature elimination (RFE)48 algorithm implemented in the scikit-learn library45 for each of considered ISC rate constants. This method was applied exclusively to the custom descriptor set. The RFE process was conducted within a 3-fold cross-validation repeated 3 times, ensuring robust feature ranking. Instead of a single RFE run, feature rankings were obtained across all cross-validation folds, and the final ranking score for each descriptor was determined by averaging the rankings from all iterations.
Following feature ranking, an iterative descriptor evaluation was performed to determine the optimal subset of descriptors to predict any of the considered ISC rate constants. The descriptors were ranked based on their RFE scores, and models were trained sequentially with an increasing number of descriptors, starting from the highest-ranked feature. Performance was assessed with cross-validation performed using 3-fold repeated 3 times. The final optimal descriptor set was obtained by eliminating descriptors that were not informative across all considered ISC rate constants, ensuring a more generalizable and efficient feature representation. The final descriptor rankings are listed in Tables S4–S8 (ESI†) for prediction of T1 → Sn, T1 → S1, T1 → S2, T1 → S3, T1 → S4 correspondingly. The corresponding implementation details and code are provided in the ESI.†
| T1 → S1 | T1 → S2 | T1 → S3 | T1 → S4 | T1 → Sn | |
|---|---|---|---|---|---|
| MSE | 0.99 ± 0.010 | 0.47 ± 0.015 | 0.80 ± 0.017 | 1.46 ± 0.023 | 0.38 ± 0.009 | 
| RMSE | 0.99 ± 0.005 | 0.69 ± 0.011 | 0.89 ± 0.010 | 1.21 ± 0.010 | 0.62 ± 0.007 | 
| MAE | 0.76 ± 0.004 | 0.46 ± 0.004 | 0.58 ± 0.005 | 0.84 ± 0.007 | 0.42 ± 0.004 | 
| R 2 | 0.89 ± 0.002 | 0.83 ± 0.007 | 0.91 ± 0.002 | 0.77 ± 0.003 | 0.83 ± 0.006 | 
| T1 → S1 | T1 → S2 | T1 → S3 | T1 → S4 | T1 → Sn | |
|---|---|---|---|---|---|
| MSE | 0.89 ± 0.008 | 0.38 ± 0.010 | 0.63 ± 0.009 | 1.04 ± 0.010 | 0.31 ± 0.005 | 
| RMSE | 0.95 ± 0.004 | 0.62 ± 0.008 | 0.79 ± 0.006 | 1.02 ± 0.005 | 0.56 ± 0.004 | 
| MAE | 0.72 ± 0.003 | 0.42 ± 0.003 | 0.54 ± 0.003 | 0.73 ± 0.003 | 0.38 ± 0.002 | 
| R 2 | 0.90 ± 0.001 | 0.87 ± 0.005 | 0.93 ± 0.002 | 0.84 ± 0.001 | 0.86 ± 0.004 | 
| T1 → S1 | T1 → S2 | T1 → S3 | T1 → S4 | T1 → Sn | |
|---|---|---|---|---|---|
| MSE | 1.11 ± 0.013 | 0.52 ± 0.018 | 0.91 ± 0.019 | 1.52 ± 0.028 | 0.42 ± 0.005 | 
| RMSE | 1.05 ± 0.006 | 0.72 ± 0.013 | 0.96 ± 0.010 | 1.23 ± 0.011 | 0.65 ± 0.004 | 
| MAE | 0.80 ± 0.002 | 0.49 ± 0.007 | 0.65 ± 0.014 | 0.87 ± 0.008 | 0.45 ± 0.003 | 
| R 2 | 0.88 ± 0.002 | 0.82 ± 0.009 | 0.90 ± 0.002 | 0.76 ± 0.003 | 0.81 ± 0.004 | 
The RF model has R2 values of 0.89 and 0.82 for kISC(T1 → S1) and total kISC(T1 → Sn), respectively. The best results are obtained with CB, where the mean MAE across the different runs is 0.90 and 0.86 for kISC(T1 → S1) and total kISC(T1 → Sn). Also, CB predicts some zero values correctly, unlike RF. This result can be explained by the fact that CB belongs to the family of gradient boosting models. Such models can better adapt to sparse data due to their flexibility and the ability to fine-tune their loss functions. We note that the NN model is also accurate (mean R2 = 0.81–0.88) for rate constant predictions, although it is slightly less accurate than CB (mean R2 = 0.86–0.90).
In this study, we assess the error of our ML models based on their ability to predict quantum chemistry-derived rate constants. This level of accuracy is comparable to that typically observed in quantum chemical calculations when compared to experimental data.49,50 The degree of precision is generally sufficient for predicting chemical reaction yields or photophysical processes,49 at least in cases where the rates of competing processes are substantially different.
We now consider the ISC from T1 to S2, S3, and S4. In Fig. 5, Tables 1–3, the best predictions for kISC(T1 → S2), kISC(T1 → S3), and kISC(T1 → S4) are achieved with CB, where the mean R2 is 0.87, 0.93 and 0.84, respectively. The prediction of zero values for all rate constants is particularly problematic, as it contributes to the large mean errors in overall predictive accuracy. Additionally, the endothermic nature of the electronic transitions from T1 to S2, S3, and S4 makes their rate constants highly sensitive to temperature and the energy gap, in contrast to kISC(T1 → S1).
| k ISC(T1 → S1) | k ISC(T1 → S2) | k ISC(T1 → S3) | k ISC(T1 → S4) | ||||
|---|---|---|---|---|---|---|---|
| Parameter | R 2 | Parameter | R 2 | Parameter | R 2 | Parameter | R 2 | 
| C1–C2 | 0.663 | ∠(B1,C1,Ou1) | 0.008 | ∠(A1,C1,Ou1) | 0.240 | B1–C1 | 0.134 | 
| min_dist | 0.755 | ∠(B2,C2,Ou2) | 0.031 | ∠(B2,C2,Ou2) | 0.459 | ∠(B2,C2,Ou2) | 0.289 | 
| Ou1–Ou2 | 0.802 | ∠(A1,C1,Ou1) | 0.066 | ∠(A2,C2,Ou2) | 0.608 | ∠(A2,C2,Ou2) | 0.394 | 
| min(C–C) | 0.817 | ∠(A1,C1,B1) | 0.077 | ∠(B1,C1,Ou1) | 0.667 | B2–C2 | 0.470 | 
| C1–Ou2 | 0.838 | ∠(A2,C2,B2) | 0.086 | C2–Ou2 | 0.725 | min_dist | 0.533 | 
| C2–Ou1 | 0.846 | C2–Ou2 | 0.091 | ∠(Ou1,C1,H11) | 0.768 | ∠(B1,C1,Ou1) | 0.577 | 
| B1–Ou2 | 0.855 | Ou1–Op1 | 0.103 | C1–Ou1 | 0.818 | ∠(A1,C1,Ou1) | 0.635 | 
| B2–Ou1 | 0.861 | ∠(A2,C2,H12) | 0.127 | ∠(A2,C2,H12) | 0.831 | ∠(A1,C1,B1) | 0.667 | 
| C1–Ou1 | 0.142 | min_dist | 0.850 | ∠(Ou1,C1,H11) | 0.683 | ||
| ∠(Ou2,C2,H12) | 0.397 | min(O–O) | 0.881 | Ou1–Ou2 | 0.719 | ||
| ∠(Ou1,C1,H11) | 0.500 | min(O–O) | 0.729 | ||||
| ∠(A1,C1,H11) | 0.559 | ∠(Ou2,C2,H12) | 0.744 | ||||
| min(O–O) | 0.663 | min(H–H) | 0.759 | ||||
| ∠(A2,C2,Ou2) | 0.692 | C1–C2 | 0.772 | ||||
| A2–C2 | 0.690 | Ou1–Op1 | 0.803 | ||||
| min_dist | 0.711 | Ou2–Op2 | 0.812 | ||||
| A1–C1 | 0.711 | ||||||
| Ou2–Op2 | 0.712 | ||||||
| k ISC(T1 → Sn) | |
|---|---|
| Parameter | R 2 | 
| ∠(A1,C1,Ou1) | 0.132 | 
| ∠(B2,C2,Ou2) | 0.402 | 
| ∠(B1,C1,Ou1) | 0.526 | 
| C1–Ou1 | 0.583 | 
| B1–C1 | 0.628 | 
| ∠(A2,C2,Ou2) | 0.693 | 
| C2–Ou2 | 0.715 | 
| ∠(A1,C1,B1) | 0.743 | 
| ∠(A2,C2,B2) | 0.764 | 
| Ou1–H21 | 0.787 | 
| ∠(B1,C1,H11) | 0.795 | 
| ∠(B2,C2,H12) | 0.803 | 
| B2–C2 | 0.808 | 
As shown in Table 4, the primary descriptors with a strong correlation for kISC(T1 → S1) are the internuclear distances between atoms associated with different radicals. Overall, the most significant general correlated parameter is the internuclear distance between radicals. Interestingly, the angles between C–O˙ bonds with spins do not play an important role here. At the same time, achieving R2 = 0.861 requires only 8 descriptors. In contrast, for kISC(T1 → S2), achieving R2 = 0.7112 requires 18 descriptors, most of which are intramolecular distances within one radical. Similarly, for kISC(T1 → S3) and kISC(T1 → S4), intramolecular distances serve as key molecular descriptors. For kISC(T1 → Sn), the main descriptors are almost the same as for kISC(T1 → S2), with the addition of the distance between the radicals.
Previous studies have shown that kISC(T1 → S1) mainly depends on the distance between radicals and is only slightly influenced by angles.24 Our ML results confirm this observation. However, the overall ISC rate kISC(T1 → Sn) is actually dominated by kISC(T1 → S2), which depends on the relative orientation of the two radicals in a more complex manner. Furthermore, the rate constants kISC(T1 → S2), kISC(T1 → S3), and kISC(T1 → S4) correspond to non-spontaneous (reverse) ISCs, i.e., the final state is higher in energy than the initial state. It is worth noting that calculating reverse ISC rate constants is challenging even for quantum chemical methods,49 as they strongly depend on the energy gap via the Boltzmann factor. The typical accuracy of quantum chemical calculations for energy gaps, around 0.1–0.3 eV, is often insufficient for rate constant calculations within one order of magnitude, resulting in deviations of 2–3 orders of magnitude.49,50 Therefore, the ML models give results comparable to quantum chemical calculations but much faster. Although we investigated and confirmed this relationship for alkoxy radical pairs, given the fundamental nature of this dependence, it should also hold for any radical pairs – with the caveat that specific descriptors may be needed to describe ISCs to higher singlet states (e.g. S2⋯S4).
Although our ML models do not provide a comprehensive analytical relationship between the ISC rate constant and molecular structure, they can be used to identify important structural features for rate constant prediction and thus improve our understanding of the ISC process. The ML model can also serve as a rapid estimation tool for kISC, without the need for quantum chemical calculations, with its generalizability constrained only by the scope of its training data.
While the present model achieving average order-of-magnitude accuracy will already be extremely useful for atmospheric modelling purposes, we anticipate that ongoing ML method development, for example within the SPAINN51 framework where molecular dynamics is used for data generation, will allow even greater predictive accuracy.
| Footnote | 
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5cp01101a | 
| This journal is © the Owner Societies 2025 |