Theoretical study on the analyzability of modified convex regression for radical reactions
Received 14th October 2025, Accepted 3rd January 2026
First published on 6th January 2026
Abstract
Analyzing data and extracting meaningful insights are essential tasks across various research fields. To analyze acrylate and methacrylate radical reaction data, we propose a modified convex clustering (regression) method, in which representative points are selected directly from the training data to describe the dataset. Although machine learning (ML) models are often regarded as black boxes, making their predictions difficult to interpret, the (modified) convex clustering approach allows for straightforward analysis of model behavior. This study emphasizes the importance of selecting representative points to enhance the interpretability and transparency of ML models. We demonstrate that radical reaction energy barriers can be effectively described and predicted based on the contributions of similar reactions. The simplicity and transparency of the modified convex clustering (regression) method enable in-depth analysis of physicochemical data.
1. Introduction
Machine learning (ML) approaches, combined with large-scale molecular databases, have gained considerable prominence across various chemical fields, including functional molecules, drug discovery, and materials science.1–21 However, despite its advantages, ML presents certain drawbacks. For instance, it is often difficult to comprehend how ML models operate, largely due to the complexity and lack of transparency of their prediction processes. Predictions based on correlations rather than causal relationships can further hinder interpretability. Although ML models excel at capturing multidimensional and nonlinear correlations, such complexity often exceeds human intuitive understanding, making it challenging to grasp underlying mechanisms. This lack of interpretability has limited the broader application of ML, particularly in extracting meaningful insights from physicochemical data. Although explainable artificial intelligence techniques—such as permutation importance and SHAP (SHapley Additive exPlanations)—are sometimes employed, the inherent complexity of ML models continues to obscure a comprehensive understanding of their behavior.22–27 To address this issue, we focus on algorithms that facilitate interpretability. In particular, we highlight the convex clustering algorithm, which offers several advantageous features, such as sophisticated data clustering, soft assignment, and direct representative point selection from training data.28
In a convex clustering algorithm, the distributions of all clusters (or classes) are defined a priori by a single shared parameter, resulting in uniform distribution sizes. In this study, we slightly modify the algorithm by relaxing this constraint: the distribution size of each cluster is governed by its own parameter, which is automatically determined during the model training process. The modified clustering process is categorized as a soft assignment method, where each data point can partially belong to multiple classes, with probabilistic ratios representing its degree of membership. Further, we employ the modified clustering method to perform regression on radical reaction data for acrylate (ACR) and methacrylate (MA). Radical reactions involving ACR and MA are widely used in synthesizing various acrylic polymers, such as plastics, adhesives, paints, medical materials, and fibers.29–31 To theoretically elucidate radical reaction mechanisms, transition state (TS) analyses based on density functional theory (DFT) are indispensable. DFT-based TS calculations have provided insights into reaction processes,32,33 and ML has been recently employed to predict complex features such as energy barriers and regioselectivity in radical reactions.34,35 We develop ML models to predict the energy barriers of ACR and MA radical reactions using a dataset derived from DFT calculations, and demonstrate that the number of representative points governs both the resolution of data reproduction and the prediction accuracy. This study highlights the simplicity of the convex clustering (regression) approach, particularly in selecting representative points from training data, as a key factor in enhancing the interpretability and transparency of ML models for analyzing physicochemical data.
The remainder of this article is organized as follows. In Section 2, we describe the modified convex clustering algorithm. Section 3 presents the fundamental behavior of the algorithm based on a simple dataset as well as its application to ACR and MA radical reaction data. Finally, Section 4 summarizes this study.
2. Methods and computational conditions
2.1. Convex clustering method
Convex clustering is an unsupervised ML algorithm designed to assign data points to clusters. It is theoretically grounded in the Gaussian mixture model (GMM) and is classified as a soft assignment method, wherein each data point is probabilistically associated with multiple clusters.36,37 Despite this probabilistic framework, a definitive cluster assignment can be made by selecting the cluster with the highest associated probability. Meanwhile, hard assignment algorithms, such as K-means clustering, assign each data point to only one cluster.36,37 We present a modified K-means clustering algorithm (K-near) in Section S1 of the SI as a representative hard assignment approach. Although the modified convex clustering and K-near methods represent soft and hard assignment techniques, respectively, both select representative points from the training data to describe the underlying data structure or class distribution.
First, we describe the GMM to highlight the distinctive characteristics of the convex clustering method. We consider a dataset X = {x1, x2, …, xN} comprising N data points, where each data point is represented by a d-dimensional vector. We assume that each data point is generated from a single class through probabilistically independent sampling (trials), although the specific class from which each point originates is unknown. Under these assumptions, the probability of observing the dataset, denoted as pθ(x1, …, xN), can be expressed as follows:

$$ p_\theta(x_1, \ldots, x_N) = \prod_{k=1}^{N} p_\theta(x_k), \quad (1.1) $$

$$ p_\theta(x_k) = \sum_{i=1}^{C} \pi_i \, p_{\theta_i}(x_k \mid \Pi_i), \quad (1.2) $$

$$ \sum_{i=1}^{C} \pi_i = 1, \quad (1.3) $$

where θ represents the parameters of the probabilistic model, pθ(xk) denotes the probability of observing a data point xk, pθi(xk|Πi) denotes the conditional probability of observing the data point xk given that the class Πi was selected under the parameter θi, πi denotes the probability (or proportion) associated with class Πi, and C denotes the total number of classes. Eqn (1.3) indicates that the sum of all class proportions equals 1.0. Based on the assumption of probabilistic independence, the probability pθ(x1, …, xN) can be expressed as the product of the probabilities for each data point (eqn (1.1)). In the GMM framework, we assume a mean vector μi and a covariance matrix Σi for class Πi with respect to θi. Accordingly, pθi(xk|Πi) can be computed as follows:

$$ p_{\theta_i}(x_k \mid \Pi_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left[-\frac{1}{2}(x_k - \mu_i)^{\mathrm{T}} \Sigma_i^{-1}(x_k - \mu_i)\right]. \quad (2) $$

Given the number of classes C, the parameters πi, μi, and Σi can be determined using the expectation–maximization (EM) algorithm by maximizing the log-likelihood log pθ(x1, …, xN).36,37 Notably, the number of classes C is a predefined parameter (hyperparameter) in the GMM. Meanwhile, the convex clustering method allows for automatic determination of C, although the covariance matrix Σi must be specified in advance.
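For concreteness, the class-conditional density of eqn (2) can be evaluated directly with NumPy. The following is a minimal sketch of our own (the function name `gaussian_density` is illustrative, not taken from the study's code):

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Evaluate the multivariate Gaussian density of eqn (2) at a point x."""
    d = x.shape[0]
    diff = x - mu
    norm = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(cov) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)
```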
In the convex clustering algorithm, the representative point μi is selected from a point xi in the training dataset, and Σi is assumed to be a simple diagonal matrix, as follows:

$$ \mu_i = x_i, \quad (3.1) $$

$$ \Sigma_i = \sigma^2 I_d, \quad (3.2) $$

where Id is the d-dimensional identity matrix. From eqn (3.2), all classes share the same distribution, which is controlled by the hyperparameter σ. Under these conditions, the relationship pθi(xk|Πi) = pxi,σ(xk|Πi) = fi,σ(xk) can be derived using the following equation:

$$ f_{i,\sigma}(x_k) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|x_k - x_i\|^2}{2\sigma^2}\right). \quad (4) $$

Therefore, the log-likelihood of the convex clustering method can be explicitly expressed as follows:

$$ \log p_\theta(x_1, \ldots, x_N) = \sum_{k=1}^{N} \log\left[\sum_{i=1}^{N} \pi_i\, f_{i,\sigma}(x_k)\right]. \quad (5) $$

The EM algorithm provides a self-consistent procedure to determine πi. The log-likelihood (eqn (5)) increases monotonically and converges through an iterative loop in which the values of πi(next) at each step are updated based on the current πi values as follows:

$$ \pi_i^{(\mathrm{next})} = \frac{1}{N} \sum_{k=1}^{N} P(\Pi_i \mid x_k), \quad (6.1) $$

$$ P(\Pi_i \mid x) = \frac{\pi_i\, f_{i,\sigma}(x)}{\sum_{l} \pi_l\, f_{l,\sigma}(x)}, \quad (6.2) $$

where P(Πi|x) denotes the probability of class Πi given that the data point x was observed. In the self-consistent loop, several data points acquire very small πi values. Points with πi values below a certain threshold are excluded from the set of representative candidates; thus, they no longer contribute to the clustering process. To exclude such data points, their πi values are set to zero. As a result, the number of representative points gradually decreases during the self-consistent loop. The points remaining after convergence are used as the final representatives for clustering. Notably, the number of remaining points, which corresponds to the number of clusters, depends on the hyperparameter σ. For example, when a small value is assigned to σ, a larger number of data points tend to survive the self-consistent procedure.
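The self-consistent loop of eqns (4)–(6) can be sketched compactly in NumPy. The snippet below is our own illustration, assuming a uniform initialization πi = 1/N over all training points and an illustrative threshold `eps` for discarding candidates; none of these names or defaults come from the study's implementation:

```python
import numpy as np

def convex_clustering(X, sigma, eps=1e-6, tol=1e-8, max_iter=1000):
    """Self-consistent update of the class proportions pi_i (eqns (4)-(6)).
    Every training point starts as a representative candidate."""
    N, d = X.shape
    # f[i, k] = f_{i,sigma}(x_k), the shared-width Gaussian kernel of eqn (4)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    f = (2.0 * np.pi * sigma**2) ** (-d / 2.0) * np.exp(-sq / (2.0 * sigma**2))
    pi = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        resp = pi[:, None] * f
        resp /= resp.sum(axis=0, keepdims=True)   # P(Pi_i | x_k), eqn (6.2)
        pi_next = resp.mean(axis=1)               # eqn (6.1)
        pi_next[pi_next < eps] = 0.0              # drop negligible candidates
        converged = np.abs(pi_next - pi).max() < tol
        pi = pi_next
        if converged:
            break
    return pi  # nonzero entries mark the surviving representative points
```

The nonzero πi remaining after convergence identify the representative points, and their count (the number of clusters) varies with σ as described above.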
2.2. Modified convex clustering method
In the convex clustering method, the hyperparameter σ uniformly defines the distributions of all clusters, as seen in eqn (3.2). In this study, we modify this distributional assumption as follows:

$$ \Sigma_i = \sigma_i^2 I_d. \quad (7) $$

Eqn (7) allows each class to have its own distribution corresponding to its specific σi. Thus, the conditional probability pθi(xk|Πi) can be expanded as follows:

$$ f_{i,\sigma_i}(x_k) = \frac{1}{(2\pi\sigma_i^2)^{d/2}} \exp\!\left(-\frac{\|x_k - x_i\|^2}{2\sigma_i^2}\right), \quad (8) $$

where pθi(xk|Πi) = pxi,σi(xk|Πi) = fi,σi(xk). Therefore, the conditional probability Pi(x) is slightly modified as follows:

$$ P_i(x) = \frac{\pi_i\, f_{i,\sigma_i}(x)}{\sum_{l} \pi_l\, f_{l,\sigma_l}(x)}. \quad (9) $$

To determine the σi value, we examine the following condition:

$$ \sigma_i = \frac{1}{2}\min_{l \neq i} L_{il}, \quad (10) $$

where Lil denotes the distance between representative points xi and xl. Accordingly, σi is set to half the distance to the nearest class. In addition, σi is constrained to lie between a minimum threshold σmin and a maximum threshold σmax. The number (or granularity) of classes can be controlled by adjusting these threshold values.

Besides the modification given by eqn (7), we performed a purification process to further reduce the number of clusters (i.e., representative data points; Fig. 1a). In the convex clustering method, representative candidates are typically removed based on a threshold applied to πi. However, some redundant points may remain in the cluster representation even after the self-consistent procedure. To eliminate such redundant points, we introduce a purification process based on the condition $P_j(x_j) < \max_{i \neq j} P_i(x_j)$. When this condition is satisfied, even the data point xj itself does not yield the maximum probability for its associated class Πj. Such classes contribute only marginally to the clustering and can be removed without significantly affecting the overall model behavior. We apply this purification condition to simplify the clustering model by removing redundant representative points.
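Under the same assumptions as the previous sketch, the per-class widths of eqn (10) and the purification test can be read as follows. This is a single-pass sketch of our own; the helper names are illustrative, and `class_probabilities` simply implements eqn (9):

```python
import numpy as np

def class_sigmas(reps, sigma_min, sigma_max):
    """Eqn (10): sigma_i is half the distance to the nearest other
    representative, constrained to [sigma_min, sigma_max]."""
    dist = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    return np.clip(dist.min(axis=1) / 2.0, sigma_min, sigma_max)

def class_probabilities(x, reps, pi, sigmas):
    """Eqn (9): P_i(x), class probabilities with per-class widths sigma_i."""
    d = reps.shape[1]
    sq = ((x - reps) ** 2).sum(axis=1)
    f = (2.0 * np.pi * sigmas**2) ** (-d / 2.0) * np.exp(-sq / (2.0 * sigmas**2))
    w = pi * f
    return w / w.sum()

def purify(reps, pi, sigmas):
    """Keep representative x_j only if its own class Pi_j is the most
    probable class at x_j (a single-pass reading of the purification)."""
    keep = [j for j, x in enumerate(reps)
            if np.argmax(class_probabilities(x, reps, pi, sigmas)) == j]
    return np.array(keep)
```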
Fig. 1 Computational flow of the modified convex clustering method.
2.3. Regression based on convex clustering
The (modified) convex clustering method discussed in the previous section can be easily extended to perform regression. In regression tasks, we typically consider two types of data, explanatory variables (feature vectors) X and target values Y, as follows:

$$ X = \{x_1, x_2, \ldots, x_N\}, \quad (11.1) $$

$$ Y = \{y_1, y_2, \ldots, y_N\}. \quad (11.2) $$

As the first step in constructing a regression model based on the (modified) convex clustering method, a clustering model Mclustering(x) is built using only the explanatory (feature) data X. For a given point x, a set of probabilities over representative classes can be obtained using eqn (9) as follows:

$$ M_{\mathrm{clustering}}(x) \rightarrow \{P_1(x), P_2(x), \ldots, P_N(x)\}. \quad (12) $$

Here, the relation $\sum_i P_i(x) = 1$ holds. Based on these probabilities, we can predict a target value ypredict for the data point x as follows:

$$ y_{\mathrm{predict}} = \sum_{i} P_i(x)\, y_i. \quad (13) $$
Regression based on convex clustering is simple, making it easy to analyze the behavior of the prediction process. We discuss such an analysis in relation to radical reactions in Section 3.
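The whole prediction step is then essentially a one-liner; the sketch below reuses the illustrative `class_probabilities` helper from Section 2.2:

```python
def predict(x, reps, pi, sigmas, y_reps):
    """Eqn (13): probability-weighted sum of the representatives' targets."""
    probs = class_probabilities(x, reps, pi, sigmas)  # eqn (9)
    return float(probs @ y_reps)
```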
Here, we discuss the similarities and differences between the (modified) convex clustering method and the k-nearest neighbor (k-NN) algorithm. The k-NN algorithm predicts an unknown data point by selecting the k nearest neighbors from the training dataset and determining the outcome based on this local information. For classification tasks, the class is assigned by majority voting among the neighbors, whereas for regression tasks, the prediction typically involves averaging the neighbors’ values or applying distance-based weighting. The k-NN method usually requires storing all training data and computing distances to every data point during prediction, which increases computational cost and model complexity as the dataset grows. In contrast, the convex clustering method represents the dataset using representative points selected from the training data, modeled as a mixture of multiple Gaussian distributions. Only these representative points, along with their mixing ratios, are retained in the model and used during prediction, thereby reducing complexity. Although the convex regression approach differs theoretically from the non-parametric k-NN method, both share the characteristic of directly leveraging training data for prediction, contributing to an intuitive understanding of the prediction process.
2.4. Computational conditions
In this study, we analyzed chemical data related to polymer radical reactions using ML. The radical reaction data were generated through DFT calculations. The structures of the reactants, products, and transition states were optimized using the B3LYP functional with Grimme's empirical dispersion correction and the 6-31+G* basis set (B3LYP-D3/6-31+G*).38–40 All DFT calculations were performed using the Gaussian16 software package.41 For convenience, a summary of the dataset is provided in Section S2 of the SI, and more detailed descriptions can be found in the literature.6 We implemented the modified convex clustering (regression) method in Python.42 The machine learning analysis based on the random forest (n_estimators = 30), kernel ridge (alpha = 1.0), and k-NN algorithms36 was carried out using the scikit-learn library (version 1.5.1).43
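For reference, the scikit-learn baselines with the settings stated above can be instantiated as follows; the kernel choices follow Section 3.2, and all other hyperparameters are left at their library defaults (our assumption):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.neighbors import KNeighborsRegressor

# Baseline regressors with the settings stated in the text.
baselines = {
    "random forest": RandomForestRegressor(n_estimators=30),
    "kernel ridge (linear)": KernelRidge(alpha=1.0, kernel="linear"),
    "kernel ridge (RBF)": KernelRidge(alpha=1.0, kernel="rbf"),
    "k-NN (k = 5)": KNeighborsRegressor(n_neighbors=5),
}
```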
3. Results and discussion
3.1. Basic behavior of the modified convex clustering method
We analyze the behavior of the modified convex clustering method using a simple two-dimensional dataset with 1000 points (Fig. 2a). The points were randomly generated from five normal distributions with means at (1, 6), (5, 2), (6, 9), (8, 5), and (10, 8) and standard deviations of 0.9, 0.8, 1.0, 1.1, and 1.2, respectively. We applied the modified convex clustering method to this dataset, employing a maximum threshold value of σmax = 3. Clustering results with σmin = 1.0, 0.8, and 0.5 are shown in Fig. 2b–d, respectively. In these figures, star markers indicate representative points, and orange dotted circles indicate the class distributions (σi). For σmin = 1.0, five representative points are obtained from the convex clustering method (Fig. 2b), located at (0.91, 5.86), (4.94, 2.10), (6.08, 9.03), (8.23, 4.97), and (10.417, 8.30). These representative points are members of the training dataset in Fig. 2a. When smaller σmin values are used, data points are partitioned into more narrowly defined classes: σmin = 0.8 and 0.5 result in 7 and 12 classes, respectively. In the modified convex clustering method, the number of clusters is automatically determined based on σmin. This hyperparameter sets the lower bound for the class distribution; thus, smaller values lead to finer-grained clustering. In other words, a smaller σmin value allows the dataset to be covered by more compact clusters, whereas a larger σmin value results in a coarser representation of the data. Thus, we can control the density (granularity) of the clusters by adjusting σmin.
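A dataset of this kind can be generated, for example, as follows; the even split of 200 points per component and the random seed are our assumptions, since only the total count, means, and standard deviations are stated above:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen for reproducibility (our choice)
means = np.array([[1, 6], [5, 2], [6, 9], [8, 5], [10, 8]], dtype=float)
stds = np.array([0.9, 0.8, 1.0, 1.1, 1.2])
# 1000 points in total; 200 per component is an assumed even split.
X = np.vstack([rng.normal(m, s, size=(200, 2)) for m, s in zip(means, stds)])
```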
Fig. 2 Clustering results for a simple dataset based on the modified convex clustering algorithm. Panel (a) depicts the data generated from five normal distributions prior to clustering, while panels (b), (c), and (d) present the clustering results obtained using σmin = 1.0, 0.8, and 0.5, respectively. In the (modified) convex clustering method, the number of clusters is automatically determined by the hyperparameter σmin. As σmin decreases, the number of clusters increases.
3.2. Chemical reaction data analysis based on the convex regression method
In this section, we analyze ACR and MA radical reactions (Fig. 3a) using the modified convex regression method. Previously, we presented radical reactions involving various combinations of ACRs and MAs (Fig. 3b) and calculated their reaction barriers using the DFT method.6 This chemical dataset was analyzed using the modified convex clustering approach. The reaction energy barriers (ΔETS) for radical reactions were predicted using the convex regression method based on eqn (13), with the product–reactant energy difference (ΔERP) and a dummy parameter (DP(X)) as explanatory variables (Fig. 3c). Here, the dummy parameter DP(m) indicates whether a monomer in the radical reaction X˙ + Y → XY˙ is an ACR or an MA, where m specifies X or Y: DP(X) = 0 indicates X = ACR, and DP(X) = 1 indicates X = MA. The values are also summarized in the SI. We standardized the input features to train the ML models. Table 1 presents the regression results obtained by varying the hyperparameter σmin. To evaluate the performance of the regression models, we used 5-fold cross-validation to calculate the mean absolute error (MAE);36 the table also includes the coefficient of determination (R²). Notably, smaller MAE values can be obtained when the training dataset is used directly as input. Fig. 4 compares the DFT-calculated reaction energy barriers with the predictions of the regression model for σmin = 0.0075. The results confirm that the regression model tends to perform better when smaller values of σmin are used. As discussed in the previous section, σmin controls the density of clusters (or classes) covering the data, with the number of clusters increasing as the value decreases; hence, the regression model can more accurately predict reaction barriers. For example, MAE values of 0.39 and 0.52 kcal mol−1 were obtained for σmin = 0.015 and 0.03, respectively. However, there is no established method for determining the optimal value of this parameter in convex clustering. Therefore, similar to standard hyperparameter tuning, the parameter needs to be adjusted and the model's behavior evaluated to identify a suitable value. According to the results in Table 1, the regression performance tends to level off as the σmin value decreases, indicating that the parameter should be chosen with consideration for model complexity to avoid overfitting. Developing a more systematic approach to selecting σmin in convex clustering may be an important topic for future research.
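A sketch of this evaluation protocol (features standardized on each training fold, 5-fold cross-validated MAE) is given below. The shuffling, the seed, and the scikit-learn-style `fit`/`predict` interface for the model are our assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

def cv_mae(model, X, y, n_splits=5):
    """5-fold cross-validated mean absolute error with standardized features."""
    maes = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        scaler = StandardScaler().fit(X[train])
        model.fit(scaler.transform(X[train]), y[train])
        pred = model.predict(scaler.transform(X[test]))
        maes.append(np.abs(pred - y[test]).mean())
    return float(np.mean(maes))
```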
Fig. 3 (a) Scheme of the radical reaction between ACR and MA. (b) Reactant monomers of acrylic acid, ACR, methacrylic acid, and MA. We categorize acrylic and methacrylic acids as ACR and MA, respectively. (c) Energy diagram for the radical reaction X˙ + Y → XY˙, where X˙ represents the radical monomer.
Table 1 Predictive performance of the convex regression method for reaction energy barriers, based on cross-validation

| σmin   | MAE [kcal mol−1] | R²   |
|--------|------------------|------|
| 0.04   | 0.58             | 0.67 |
| 0.03   | 0.52             | 0.74 |
| 0.02   | 0.49             | 0.80 |
| 0.015  | 0.39             | 0.86 |
| 0.0075 | 0.30             | 0.93 |
Fig. 4 Comparison between DFT calculations and ML predictions for reaction energy barriers [kcal mol−1].
Here, we compare the results obtained from convex clustering regression with kernel ridge, random forest, and k-NN algorithms. For kernel ridge regression, the MAE and R2 were 0.44 kcal mol−1 and 0.83, respectively, when using a linear kernel. With a radial basis function kernel, the MAE and R2 improved to 0.36 kcal mol−1 and 0.89. The random forest method achieved an MAE of 0.36 kcal mol−1 and an R2 of 0.88. For the k-NN method with 3, 5, and 7 neighbors, the MAEs were 0.31, 0.30, and 0.33 kcal mol−1, and the corresponding R2 values were 0.92, 0.91, and 0.89. In comparison, the convex regression method with σmin = 0.0075 yielded an MAE of 0.30 kcal mol−1 and an R2 of 0.93. These results indicate that the convex regression method provides predictive performance comparable to, and in some cases slightly better than, other machine learning approaches for the dataset analyzed in this study.
In the modified convex regression method, representative points are selected from the training dataset and stored as internal variables (or states) within the model. Fig. 5 shows several representative points with relatively large πi values, stored in the prediction model for σmin = 0.0075, using a chemical reaction representation. ML models often behave like black boxes, making it difficult to interpret how predictions are made. However, in the modified convex regression method, predictions are based on representative points that originate from the training dataset. This characteristic allows for straightforward analysis of model behavior. For instance, to understand the prediction of reaction energy barriers, we can examine the contribution of each representative point. Fig. 6 shows the contributions for several predictions. For example, the DFT-based TS calculation yielded a barrier of 6.55 kcal mol−1 for the radical reaction between methyl MA (compound 4 in Fig. 3b) and γ-butyrolactone MA (compound 8 in Fig. 3b), whereas the convex regression model predicted a barrier of 6.39 kcal mol−1. This prediction was primarily influenced by two reactions: the radical reaction between methyl MA (compound 4) and methacrylic acid (compound 2) contributed 76.4%, and the reaction between t-butyl MA (compound 6) and methyl MA (compound 4) contributed 23.4% (Fig. 6a). Similarly, for the reaction between ethyl-cyclohexyl ACR (compound 9) and ethyl-cyclohexyl MA (compound 10), the DFT calculation yielded a barrier of 4.40 kcal mol−1, whereas the ML model predicted 4.84 kcal mol−1. In this case, the reaction between ethyl-cyclohexyl ACR (compound 9) and methyl MA (compound 4) contributed 93.8%, and the reaction between γ-butyrolactone ACR (compound 7) and methacrylic acid (compound 2) contributed 3.89% (Fig. 6b). Thus, the convex regression method enables intuitive interpretation of ML predictions by analyzing the contributions of representative points.
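In code, this contribution analysis amounts to sorting the class probabilities of eqn (9). The hypothetical helper below (our own naming) reuses `class_probabilities` from the sketch in Section 2.2:

```python
import numpy as np

def top_contributions(x, reps, pi, sigmas, y_reps, n_top=3):
    """Rank representative reactions by their weight P_i(x) in eqn (13)."""
    probs = class_probabilities(x, reps, pi, sigmas)
    order = np.argsort(probs)[::-1][:n_top]
    # (index of representative reaction, weight, associated barrier)
    return [(int(i), float(probs[i]), float(y_reps[i])) for i in order]
```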
Fig. 5 Representative data stored in the convex regression model. We show some representative points with relatively large πi values using a chemical reaction representation. Panels (a), (b), and (c) depict the representative reactions between the following pairs of compounds: (5 and 3), (6 and 8), and (9 and 4), respectively. The explicit chemical structures of these compounds are provided in Fig. 3(b).
Fig. 6 Contributions from representative points stored in ML models to predict energy barriers. Panels (a)–(f) depict the reactions between the following pairs of compounds: (4 and 8), (9 and 10), (8 and 10), (5 and 5), (9 and 10), and (1 and 5), respectively. The explicit chemical structures of these compounds are provided in Fig. 3(b). The reactions enclosed within the dashed boxes represent the representative points and their corresponding weights that were used for machine learning predictions.
In these reaction predictions, the machine learning model clearly focuses on similarities among reactant monomers. For example, methyl MA (compound 4) in Fig. 6a and ethyl-cyclohexyl ACR (compound 9) in Fig. 6b appear both in the target reactions and in the reactions that contributed most to the prediction. The model also assigned importance to whether the monomer belongs to MA or ACR when making predictions, as shown in Fig. 6. This distinction is critical in radical reactions because it determines the stability of the reactant and product radicals. For reactions involving relatively large side chains, the model tends to reference reactions with similarly large side chains, as these can influence radical behavior through steric effects. The machine learning model appears to account for these effects as well, which aligns with chemical intuition. However, it is worth noting that machine learning using the convex clustering method does not predict reactions based on chemical understanding or causal relationships as researchers do, but rather relies solely on the similarity of reaction data. Nevertheless, analyzing the model's behavior in this way may help researchers extract chemical insights from the data. Improving prediction transparency could facilitate uncovering insights from machine learning analyses of chemical data.
The convex clustering method selects representative points along with their class proportions, making the model simpler and more efficient. A simple prediction process also helps improve understanding of the model's behavior. As shown in eqn (12) and (13), even when the dataset is small and an unseen data point is far from the representative points, the regression process remains influenced by the nearest representatives. This helps prevent extreme predictions and ensures stable behavior. At the same time, as with other machine learning methods, improving predictive performance requires expanding the training dataset. In addition, similar to distance-based models such as the GMM and k-NN, feature selection is critical for convex clustering. Dimensionality reduction and careful feature selection are essential for building effective models with this approach. To address this challenge, we are currently investigating a method that combines feature refinement with convex clustering. The results of this research will be reported elsewhere.
The complexity of ML algorithms makes it difficult for humans to understand their predictive processes. To alleviate this opacity, we focused on the selection process of representative points. In the convex clustering approach, representative points are selected from the training dataset and used to make predictions. By analyzing the contributions of these representative points, we can gain insight into a model's behavior. Selecting representative points directly from the training dataset therefore plays a valuable role in enhancing the transparency and interpretability of ML models. The convex regression method considered here may serve as a guideline for developing ML algorithms with improved analyzability. In particular, chemical datasets often contain rich information, and incorporating this information directly into ML models can further enhance their interpretability and transparency.
4. Summary
In this study, we explored a modified convex clustering (regression) method, in which representative points used to describe classes are selected directly from the training dataset. In this approach, each class is associated with a parameter that defines its size (distribution), allowing for a flexible data representation. We demonstrated that data are described more coarsely when the number of representative points is smaller. Conversely, increasing the number of classes (clusters) enables a more fine-grained representation, which can enhance the predictive performance of ML models. However, this increased granularity induces greater model complexity. The number of representative points provides a means to control both model granularity and complexity. We applied the modified method to ACR and MA radical reaction data and constructed ML models to predict reaction energy barriers. Our results showed that prediction accuracy improves with the number of representative points. We also analyzed the prediction process by examining the contributions of individual representative points, where the energy barrier is estimated as a weighted sum of contributions from radical reactions. The model's behavior can be easily interpreted because the representative points are selected from the training dataset and directly used in predictions. We concluded that selecting representative points from the training dataset is a useful strategy for improving the interpretability and transparency of ML models. The simplicity and analyzability of the modified convex clustering (regression) method make it a promising tool for deeper investigation of chemical and scientific data.
Conflicts of interest
There are no conflicts of interest to declare.
Data availability
The data supporting this article have been included as part of the supplementary information (SI), which describes (S1) the K-near clustering method and (S2) the radical reaction dataset in a separate PDF file. See DOI: https://doi.org/10.1039/d5cp03946k.
Acknowledgements
This work was partly supported by Grants-in-Aid for Scientific Research (KAKENHI) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Grant Numbers 21H00026 and 22K05038, and by MEXT through the "Program for Promoting Researches on the Supercomputer Fugaku" (JPMXP1020230318). This study used computational resources from the Supercomputer Center, the Institute for Solid State Physics, the University of Tokyo, and the Research Center for Computational Science, Okazaki, Japan.
References
- M. Nakata and T. Shimazaki, J. Chem. Inf. Model., 2017, 57, 1300–1308.
- M. Nakata, T. Shimazaki, M. Hashimoto and T. Maeda, J. Chem. Inf. Model., 2020, 60, 5891–5899.
- T. Shimazaki and M. Tachikawa, ACS Omega, 2022, 7, 10372–10381.
- T. Shimazaki and M. Tachikawa, Chem. Phys. Lett., 2023, 829, 140744.
- T. Shimazaki and M. Tachikawa, Chem. Phys. Lett., 2025, 861, 141830.
- M. Takagi, T. Shimazaki, O. Kobayashi, T. Ishimoto and M. Tachikawa, Phys. Chem. Chem. Phys., 2025, 27, 1772–1777.
- W. A. Warr, Mol. Inf., 2014, 33, 469–476.
- J. Schmidt, M. R. G. Marques, S. Botti and M. A. L. Marques, npj Comput. Mater., 2019, 5, 83.
- J. A. Keith, V. Vassilev-Galindo, B. Q. Cheng, S. Chmiela, M. Gastegger, K. R. Mueller and A. Tkatchenko, Chem. Rev., 2021, 121, 9816–9872.
- K. Jorner, A. Tomberg, C. Bauer, C. Sköld and P. O. Norrby, Nat. Rev. Chem., 2021, 5, 240–255.
- A. Iskandarov, T. Tada, S. Iimura and H. Hosono, Acta Mater., 2022, 230, 117825.
- R. X. Wang, X. L. Fang, Y. P. Lu and S. M. Wang, J. Med. Chem., 2004, 47, 2977–2980.
- J. J. Irwin, T. Sterling, M. M. Mysinger, E. S. Bolstad and R. G. Coleman, J. Chem. Inf. Model., 2012, 52, 1757–1768.
- A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder and K. A. Persson, APL Mater., 2013, 1, 011002.
- S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Y. Han, J. E. He, S. Q. He, B. A. Shoemaker, J. Y. Wang, B. Yu, J. Zhang and S. H. Bryant, Nucleic Acids Res., 2016, 44, D1202–D1213.
- R. Jose and S. Ramakrishna, Appl. Mater. Today, 2018, 10, 127–132.
- C. Draxl and M. Scheffler, MRS Bull., 2018, 43, 676–682.
- J. S. Smith, O. Isayev and A. E. Roitberg, Sci. Data, 2017, 4, 170193.
- C. Isert, K. Atz, J. Jimenez-Luna and G. Schneider, Sci. Data, 2022, 9, 273.
- L. C. Yang, X. Li, S. Q. Zhang and X. Hong, Org. Chem. Front., 2021, 8, 6187–6195.
- J. E. Alfonso-Ramos, R. M. Neeser and T. Stuyver, Digit. Discovery, 2024, 3, 919–931.
- A. Adadi and M. Berrada, IEEE Access, 2018, 6, 52138–52160.
- D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf and G. Z. Yang, Sci. Rob., 2019, 4, 7120.
- P. Linardatos, V. Papastefanopoulos and S. Kotsiantis, Entropy, 2021, 23, 18.
- S. M. Lundberg, B. Nair, M. S. Vavilala, M. Horibe, M. J. Eisses, T. Adams, D. E. Liston, D. K. W. Low, S. F. Newman, J. Kim and S. I. Lee, Nat. Biomed. Eng., 2018, 2, 749–760.
- S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal and S. I. Lee, Nat. Mach. Intell., 2020, 2, 56–67.
- L. Breiman, Mach. Learn., 2001, 45, 5–32.
- D. Lashkari and P. Golland, Adv. Neural Inf. Process. Syst., 2007, 20, 3181.
- U. Ali, K. J. B. Abd Karim and N. A. Buang, Polym. Rev., 2015, 55, 678–705.
- S. C. Ligon, K. Seidler, C. Gorsche, M. Griesser, N. Moszner and R. Liska, J. Polym. Sci., Part A: Polym. Chem., 2016, 54, 394–406.
- N. Ballard and J. M. Asua, Prog. Polym. Sci., 2018, 79, 40–60.
- A. Debuigne, C. Michaux, C. Jérôme, R. Jérôme, R. Poli and C. Detrembleur, Chem. – Eur. J., 2008, 14, 7623–7637.
- I. Degirmenci, V. Aviyente, V. Van Speybroeck and M. Waroquier, Macromolecules, 2009, 42, 3033–3041.
- X. Li, S. Q. Zhang, L. C. Xu and X. Hong, Angew. Chem., Int. Ed., 2020, 59, 13253–13259.
- K. A. Spiekermann, X. R. Dong, A. Menon, W. H. Green, M. Pfeifle, F. Sandfort, O. Welz and M. Bergeler, J. Phys. Chem. A, 2024, 128, 8384–8403.
- T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, 2nd edn, 2009.
- C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
- A. D. Becke, J. Chem. Phys., 1993, 98, 5648.
- C. T. Lee, W. T. Yang and R. G. Parr, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785–789.
- S. Grimme, J. Antony, S. Ehrlich and H. Krieg, J. Chem. Phys., 2010, 132, 154104.
- M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman and D. J. Fox, Gaussian 16, Revision A.03, Gaussian, Inc., Wallingford, CT, 2016.
- G. van Rossum and F. L. Drake, Python 3 Reference Manual, CreateSpace, Scotts Valley, 2009.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and E. Duchesnay, J. Mach. Learn. Res., 2011, 12, 2825–2830.