Density functional theory and machine learning for electrochemical square-scheme prediction: an application to quinone-type molecules relevant to redox flow batteries

Proton–electron transfer (PET) reactions are common in chemistry and crucial in energy storage applications. How electrons and protons are involved, and which mechanism dominates, depends strongly on the molecule and the pH. Quantum chemical methods can be used to assess redox potential (Ered.) and acidity constant (pKa) values, but the computations are time consuming. In this work, supervised machine learning (ML) models are used to predict PET reactions and analyze molecular space. The data for ML were created by density functional theory (DFT) calculations. Random forest regression models are trained and tested on a dataset that we created. The dataset contains more than 8200 quinone-type organic molecules that each underwent two proton and two electron transfer reactions. Both structural and chemical descriptors are used. The HOMO of the reactant and the LUMO of the product participating in the oxidation reaction appeared to be strongly associated with Ered. Trained models using a SMILES-based structural descriptor can efficiently predict the pKa and Ered. with a mean absolute error of less than 1 and 66 mV, respectively. Good prediction accuracy of R² > 0.76 and R² > 0.90 was also obtained on the external test set for Ered. and pKa, respectively. This hybrid DFT-ML study can be applied to speed up the screening of quinone-type molecules for energy storage and other applications.


Data distribution
Table S1: Detailed descriptive statistics: mean, standard deviation (µ), minimum value (x min), maximum value (x max), lower quartile (Q1), median, and upper quartile (Q3), which together provide detailed information on the distribution of the data. There are 8214 samples (compounds) in the dataset that underwent different reactions. Ered., pKa, and E0 are the calculated target variables.
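The statistics named in the Table S1 caption can be computed along the following lines. This is a minimal sketch using only the Python standard library; the function name `describe`, the input values, the use of the sample standard deviation, and the "inclusive" quartile convention are all assumptions made for illustration, since the caption does not state which conventions were used.

```python
import statistics

def describe(values):
    """Return the summary statistics listed in the Table S1 caption."""
    # Quartile convention is an assumption; other conventions give slightly
    # different Q1/Q3 on small samples.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumed)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical values for illustration only, NOT taken from the dataset.
example_values = [2.0, 4.0, 6.0, 8.0, 10.0]
summary = describe(example_values)
```

Applied to each target column (Ered., pKa, E0) in turn, a routine of this kind would yield one row of such a table per target.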

Importance of MOs of reactants in training models
The importance of various feature variables extracted from reactants in training models can be seen in Figure S3. A SHAP value (impact on model output) of 0 for a feature corresponds to the average prediction using all the other possible combinations of features except for the feature of interest. For instance, a SHAP value of 0 for E SpinUp HOMO corresponds to the average prediction of models having different combinations of features (excluding E SpinUp HOMO). A SHAP value of 1 for a feature in Figure S3(a) means that the value of that feature increases the model's output by 1. Our results show that E SpinUp HOMO is the most important feature for training the models that predicted Ered. and E0. Indeed, those orbitals placed at the edge of the [...]

In Figure S4, the features used in (a) and (b) were extracted from the reactants involved in the ET and PET reactions, respectively, while the features employed in (c) and (d) were extracted from the products resulting from the ET and PET processes, respectively.
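To make the reading of SHAP values described in this section concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is an illustration only: the two-feature linear `toy_model`, the `background` rows, and the instance `x` are hypothetical, and this is not the procedure or the library used in this work (practical SHAP implementations rely on much faster model-specific algorithms).

```python
from itertools import combinations
from math import factorial

def coalition_value(model, x, background, subset):
    """Average model output when the features in `subset` are fixed to the
    values in x and the remaining features are taken from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(model, x, background, set(subset) | {i})
                without_i = coalition_value(model, x, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

# Hypothetical toy model and data, for illustration only.
def toy_model(features):
    return 2.0 * features[0] + 1.0 * features[1]

background = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
x = [1.5, 2.0]

baseline = coalition_value(toy_model, x, background, set())  # average prediction
phi = shapley_values(toy_model, x, background)
prediction = toy_model(x)
```

In this toy case the first feature receives a Shapley value of 1 (it raises the output by 1 relative to the average prediction), while the second feature, which sits at its background mean, receives 0. The baseline plus the sum of the Shapley values reproduces the model prediction.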

Correlation between key MOs and target variables
Figure S4 shows the correlation between the most crucial features (MOs) and the targets they help predict, Ered. and E0. The reactant's HOMO and the product's LUMO participating in the oxidation reactions are inversely correlated with Ered. and E0.
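An inverse correlation of this kind can be quantified with a Pearson coefficient, as sketched below. The helper `pearson_r` and both lists of numbers are hypothetical and serve only to show the calculation; they are not values from the dataset, and the text does not state which correlation measure, if any, was computed.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values for illustration only, NOT results from this work.
homo_energies_ev = [-6.0, -5.5, -5.0, -4.5]  # made-up orbital energies (eV)
potentials_v = [1.2, 0.9, 0.7, 0.3]          # made-up potentials (V)

r = pearson_r(homo_energies_ev, potentials_v)
```

A coefficient close to -1 indicates that the potential falls almost linearly as the orbital energy rises, which is the sense of "inversely correlated" used here.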

Figure S1: List of compounds accompanied by the 2D chemical structure depiction. Compounds are numbered from 1 to 15.

Figure S2: Comparing the energy of the spin-up and spin-down channels for (a) HOMOs of reactants participating in oxidation (ET) reactions, (b) HOMOs of reactants participating in PET reactions, (c) LUMOs of products produced by oxidation (ET) reactions, and (d) LUMOs of products produced by PET reactions.

Figure S2 (c) and (d) illustrate similar plots for LUMOs of products from ET and PET reactions, demonstrating that orbitals in the spin-down channel serve as LUMO.

Figure S3: SHAP summary plot for elucidating the global feature influences on the (a) Ered., (b) pKa, and (c) E0 trained models. Feature E x y indicates the energy of x orbital in y spin state. The baseline, positioned at zero, equals an average target value in each case. The SHAP value (impact on model output) indicates the impact of feature missingness on the model prediction. The importance of the feature increases from bottom to top, e.g., E SpinUp HOMO and E SpinUp LUMO have the most and the least importance, respectively, in Ered. prediction. In each plot, thousands of individual points from the training dataset are plotted, with a higher value being more pink/yellow/red, and a lower value being more cyan/purple/blue. This is depicted by the feature value bar on the right of each plot.

Figure S4: Relationship between key features evaluated by SHAP and the target variables: (a) E SpinUp HOMO vs. Ered., (b) E SpinUp HOMO vs. E0, (c) E SpinDown LUMO vs. Ered., and (d) E SpinDown LUMO vs. E0.

Figure S5: SHAP summary plot for elucidating the global feature influences on the (a) Ered., (b) pKa, and (c) E0 trained models. Feature E x y indicates the energy of x orbital in y spin state. The baseline, positioned at zero, equals an average target value in each case. The SHAP value (impact on model output) indicates the impact of feature missingness on the model prediction. The importance of the feature increases from bottom to top. In each plot, thousands of individual points from the training dataset are plotted, with a higher value being more pink/yellow/red, and a lower value being more cyan/purple/blue. This is depicted by the feature value bar on the right of each plot.

Figure S5 illustrates how the target variables are related to the features of products. When the products' attributes are taken into account, Ered. and E0 are inversely related to E SpinDown LUMO.

Figure S7: Square representation for 2,2PEAQ-H2 oxidation reactions. The structure of 2,2PEAQ is shown at the top. The horizontal direction indicates an ET reaction; the numbers are oxidation potentials in V. The vertical direction indicates PT (acid/base reaction constant); the numbers are pKa values, which are unitless. The diagonal direction indicates the proton-coupled electron transfer reduction potential (V). The numbers in parentheses were predicted by ML models with ECFPs as descriptors.
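The three directions of such a square are linked by a thermodynamic cycle, so the diagonal value can be cross-checked against the adjacent horizontal and vertical edges. The sketch below applies the standard Nernst-type relation for a single one-electron, one-proton square at pH 0. It is not stated here that the diagonal values in the figure were obtained this way; the function names, the numerical inputs, the temperature of 298.15 K, and the choice to write every potential as a reduction potential are assumptions made for illustration.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1
FARADAY = 96485.33212       # C mol^-1

def volts_per_pka_unit(temperature_k=298.15):
    """RT ln(10) / F, about 0.059 V per pKa unit at 298.15 K."""
    return GAS_CONSTANT * temperature_k * math.log(10) / FARADAY

def deprotonated_couple_potential(e_protonated, pka_oxidized, pka_reduced,
                                  temperature_k=298.15):
    """Reduction potential of the deprotonated couple, obtained by closing
    the cycle around the square (the sum of free energies must be zero)."""
    k = volts_per_pka_unit(temperature_k)
    return e_protonated + k * (pka_oxidized - pka_reduced)

def pcet_potential(e_et, pka, temperature_k=298.15):
    """Diagonal (1 e-, 1 H+) reduction potential at pH 0 from one ET edge
    and the PT edge that completes the path to the same corner."""
    return e_et + volts_per_pka_unit(temperature_k) * pka

# Hypothetical inputs for illustration only, NOT values from Figure S7.
e_protonated = 1.00   # V, protonated couple (oxidized/reduced)
pka_oxidized = -2.0   # pKa of the oxidized, protonated species
pka_reduced = 8.0     # pKa of the reduced, protonated species

e_deprotonated = deprotonated_couple_potential(e_protonated, pka_oxidized, pka_reduced)

# Both routes across the square must give the same diagonal potential.
diagonal_via_protonated = pcet_potential(e_protonated, pka_oxidized)
diagonal_via_deprotonated = pcet_potential(e_deprotonated, pka_reduced)
```

A check of this kind is useful when ET potentials and pKa values are predicted independently, because independent predictions are not guaranteed to close the cycle.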

Figure S7 shows the scheme-of-squares representation of 2,2PEAQ species undergoing the reaction below: