Open Access Article
Yuan Wang,ab Heng Liu,c Yusuke Hashimoto,d Kazuyuki Iwase,a Hao Li*c and Takaaki Tomai*ad
a Institute of Multidisciplinary Research for Advanced Materials, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai, 980-8577, Japan. E-mail: takaaki.tomai.e6@tohoku.ac.jp
b Graduate School of Engineering, Tohoku University, 6-6-11 Aramaki-aza Aoba, Aoba-ku, Sendai, 980-8579, Japan
c Advanced Institute for Materials Research (WPI-AIMR), Tohoku University, Sendai, 980-8577, Japan
d Frontier Research Institute for Interdisciplinary Sciences, Tohoku University, Sendai, 980-8577, Japan
First published on 20th June 2026
Precise control over the particle size of metal–organic frameworks (MOFs) is pivotal for optimizing their performance in catalysis, separation, and drug delivery. However, conventional synthetic strategies largely depend on empirical trial-and-error, which lacks predictive power and fails to decode the complex interplay between nucleation and growth kinetics. Herein, we report a materials process informatics framework for the predictive control of MOF particle size, using zeolitic imidazolate framework-8 (ZIF-8) as a representative model system. A comprehensive database was constructed through systematic curation from the literature, with seven process descriptors employed as input features. Multiple machine-learning algorithms were benchmarked, among which the Categorical Boosting (CB) model achieved the best predictive performance after hyperparameter optimization, with a coefficient of determination (R2) of 0.90 on the test set. Furthermore, SHapley Additive exPlanations (SHAP) analysis identified the precursor concentration ratio and reaction time as the most influential parameters governing particle size. Experimental validation using an automated synthesis platform showed excellent agreement between predicted and measured particle sizes, confirming the model's robustness and predictive reliability. Overall, the proposed framework enables intelligent synthesis optimization and targeted experimental design, thereby providing a practical route toward controllable MOF synthesis. This work demonstrates how materials process informatics can shift MOF particle-size engineering from empirical optimization toward data-driven design, offering a broadly applicable strategy for advanced materials synthesis.
Zeolitic imidazolate framework-8 (ZIF-8), constructed from zinc ions (Zn2+) and 2-methylimidazole (2-HmIm) linkers, has emerged as one of the most representative MOF systems because of its remarkable chemical and thermal stability.8 Importantly, the performance of ZIF-8 is governed by complex structure–property relationships, in which particle size serves as a critical parameter. In general, smaller particles are favourable for catalysis and drug delivery, as they can increase surface accessibility and shorten diffusion distances.9–11 However, when ZIF-8 is used as a sacrificial template or precursor, excessively small particles may become structurally vulnerable during pyrolysis, thereby compromising the integrity and durability of the derived catalytic materials.12,13 In membrane-based separation, particle size also strongly affects interfacial compatibility and defect formation, both of which are closely related to separation performance.14,15 Therefore, precise control of ZIF-8 particle size during synthesis is essential for optimizing performance across diverse functional domains.
The synthesis of ZIF-8 is governed by a multidimensional process space defined by precursor concentration, ligand-to-metal ratio, reaction time, temperature, and other synthesis variables.16–19 Despite this complexity, current strategies for particle size control remain largely empirical, typically relying on iterative trial-and-error adjustment of synthesis conditions based on literature protocols or prior experience.20 Such approaches are not only time-consuming and resource-intensive, but also make it difficult to capture complex parameter interactions and non-linear dependencies that govern particle formation. Consequently, the precise and reproducible synthesis of ZIF-8 with target particle sizes remains a significant challenge.
Materials informatics has emerged as an important paradigm in modern materials science by accelerating the discovery and optimization of functional materials through the integration of data science and domain knowledge. Within this context, machine learning (ML) has become a particularly powerful approach because it enables predictive models to be constructed directly from experimental and computational datasets.21,22 In synthesis research, this concept further extends to materials process informatics, where multidimensional process variables are correlated with target material characteristics to guide rational process design.23,24 This paradigm has shifted materials research from trial-and-error experiments toward data-driven strategies, particularly in nanomaterial synthesis.25,26 Recent studies have demonstrated the potential of machine learning for predicting particle size from synthesis parameters. Liu et al. demonstrated that random forest models could accurately predict the phase and classify particle sizes into broad categories such as nano, sub-micron, and micron, while the model failed to predict the exact particle size.27 Pellegrino et al. developed a stacking ensemble model to estimate the size of ZnO nanoparticles, but the relatively small dataset used for training may limit the robustness of the resulting model.28 Zhang et al. further applied ML regression to Eu-MOF synthesis, but the sparsity of the dataset (27 samples) restricted the analysis mainly to the dominant variable and hindered the identification of more complex multivariate effects.29 In addition, machine learning has also been applied to the synthesis analysis of ZIF-8.30,31 Allegretto et al. proposed a unified roadmap for ZIF-8 nucleation and growth by analyzing how synthetic variables influence particle size and morphology under water- and methanol-based conditions, providing valuable insights into the synthesis mechanism of ZIF-8.31 Despite these advances, the development of more accurate predictive frameworks for experimentally guided particle size control across multidimensional synthesis conditions remains an important direction.
In this study, we develop a materials process informatics framework to predict the particle size of ZIF-8 as a function of synthesis parameters. Multiple regression algorithms are systematically benchmarked to determine the optimal predictive model. Model interpretability is established using SHapley Additive exPlanations (SHAP) to quantify the contribution of each process parameter and identify the key descriptors governing particle size. The predictive capability of the resulting model is further evaluated through independent synthesis experiments conducted with an automated synthesis system. By integrating predictive modelling with feature-importance analysis, this study achieves accurate size prediction and provides meaningful insight into the process–size relationship, thereby clarifying how key synthesis parameters govern particle size evolution. Overall, these results demonstrate the potential of machine learning to move MOF synthesis beyond empirical optimization toward more rational and data-guided process design.
000 rpm for 4 minutes to collect the solid product. The obtained precipitate was washed three times with methanol to remove unreacted precursors and then dried at 60 °C for 12 hours to yield the final ZIF-8 powder. The detailed process parameters for these experiments are summarized in Table 1. To enhance experimental precision and minimize deviations caused by human error, an automated synthesis system was employed. The detailed equipment specifications and photographs of the system are provided in Fig. S1. The particle size of these samples was measured using ImageJ software.
| Sample | CZn (mol L−1) | C2-HmIm/CZn | Solvent amount (mL) | Reaction time (min) | Stirring | Solvent | Temperature (°C) | Predicted size (nm) | Actual size (nm) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.40 | 8 | 20 | 1440 | T | Methanol | 25 | 365 | 290 |
| 2 | 0.13 | 4 | 60 | 1440 | T | Methanol | 25 | 236 | 210 |
| 3 | 0.40 | 16 | 20 | 30 | T | Methanol | 25 | 184 | 204 |
| 4 | 0.13 | 8 | 60 | 120 | F | Methanol | 25 | 130 | 141 |
| 5 | 0.08 | 40 | 100 | 60 | T | Water | 25 | 344 | 292 |
| 6 | 0.20 | 16 | 140 | 720 | Initial | Methanol | 25 | 252 | 228 |
| 7 | 0.05 | 38 | 100 | 1440 | T | Water | 25 | 1372 | 1382 |
| 8 | 0.08 | 16 | 100 | 30 | T | Methanol | 25 | 78 | 61 |
Seven synthesis parameters were used as input features, including the zinc precursor concentration (CZn), ligand-to-metal concentration ratio (C2-HmIm/CZn), solvent type (water or methanol), solvent volume, reaction time, reaction temperature, and stirring conditions (continuous stirring, denoted as T; stirring only during the initial mixing step, denoted as Initial; and quiescent synthesis, denoted as F). These parameters were selected based on the fundamental mechanism of particle formation, which involves nucleation and crystal growth. In particular, CZn and C2-HmIm/CZn govern the degree of supersaturation and therefore strongly influence nucleation behaviour and the formation of primary nuclei. Solvent type also plays an important role because its physicochemical properties, such as dielectric constant, dipole moment, and van der Waals volume, affect solubility and surface energy, thereby modulating nucleation behavior.32 During the subsequent growth stage, particle size is further influenced by supersaturation, which determines the supply of building units and the rate of diffusion-limited growth, whereas reaction time governs the extent of growth and possible coarsening associated with Ostwald ripening.33 Temperature exerts a coupled effect on both thermodynamic and kinetic factors, including solubility, interfacial energy, diffusion, and attachment rate, thereby exerting a strong influence on particle size.34 In addition, solvent volume and stirring conditions were included because they influence mixing uniformity and mass transport, both of which are important for achieving homogeneous crystal growth.35
Before model training, the database was randomly divided into training and testing subsets using an 80:20 split ratio with a fixed random state of 60. Continuous variables were standardized to have a mean of 0 and a standard deviation of 1, thereby eliminating scale disparities among descriptors. Except for the CB model, categorical variables were converted into numerical features using one-hot encoding to ensure compatibility with machine learning algorithms. Hyperparameter optimization was conducted by a randomized search of the parameter space with 100 iterations combined with 5-fold cross-validation (CV).38 Model performance was evaluated using three standard regression metrics: the coefficient of determination (R2), root-mean-square error (RMSE), and mean absolute error (MAE).39
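The preprocessing steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' pipeline: the helper names (`train_test_split`, `fit_scaler`, `transform`) are hypothetical, the rows simply reuse four columns of the Table 1 validation conditions as example input, and fitting the scaling statistics on the training subset only is an assumption made here.

```python
import random
from statistics import mean, pstdev

# Rows reuse the Table 1 validation conditions purely as example input;
# the curated literature database itself is not reproduced here.
samples = [
    {"c_zn": 0.40, "time_min": 1440, "solvent": "Methanol", "stirring": "T"},
    {"c_zn": 0.13, "time_min": 1440, "solvent": "Methanol", "stirring": "T"},
    {"c_zn": 0.40, "time_min": 30, "solvent": "Methanol", "stirring": "T"},
    {"c_zn": 0.13, "time_min": 120, "solvent": "Methanol", "stirring": "F"},
    {"c_zn": 0.08, "time_min": 60, "solvent": "Water", "stirring": "T"},
    {"c_zn": 0.20, "time_min": 720, "solvent": "Methanol", "stirring": "Initial"},
    {"c_zn": 0.05, "time_min": 1440, "solvent": "Water", "stirring": "T"},
    {"c_zn": 0.08, "time_min": 30, "solvent": "Methanol", "stirring": "T"},
]


def train_test_split(rows, test_fraction=0.2, seed=60):
    """Shuffle a copy of the rows with a fixed seed, then split 80:20."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(rows, keys):
    """Return {key: (mean, std)} computed from the given rows only."""
    stats = {}
    for key in keys:
        values = [row[key] for row in rows]
        # Guard against constant columns (assumption: leave them unscaled).
        stats[key] = (mean(values), pstdev(values) or 1.0)
    return stats


def transform(rows, stats, categories):
    """Standardize continuous columns and one-hot encode categorical ones."""
    vectors = []
    for row in rows:
        vec = [(row[key] - mu) / sigma for key, (mu, sigma) in stats.items()]
        for key, levels in categories.items():
            vec.extend(1.0 if row[key] == level else 0.0 for level in levels)
        vectors.append(vec)
    return vectors


train_rows, test_rows = train_test_split(samples)
stats = fit_scaler(train_rows, ["c_zn", "time_min"])
categories = {"solvent": ["Methanol", "Water"], "stirring": ["T", "Initial", "F"]}
x_train = transform(train_rows, stats, categories)
x_test = transform(test_rows, stats, categories)
print(len(x_train), len(x_test), len(x_train[0]))
```

Each feature vector holds two standardized continuous values followed by the one-hot solvent and stirring indicators. For a model with native categorical handling, such as CB in this study, the one-hot step would be skipped and the raw category labels passed through instead.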
Fig. 2 presents the distributions of particle size and the seven synthesis descriptors. The dataset covers a relatively broad synthesis domain, with particle size spanning a wide range, indicating substantial variability in the reported ZIF-8 crystals under different preparation conditions. Pronounced right-skewed distributions are observed for CZn, C2-HmIm/CZn, and solvent volume, suggesting that most reported syntheses were conducted within relatively limited parameter ranges. Reaction temperature is strongly centred around 25 °C, reflecting the predominance of room-temperature synthesis in the literature. In contrast, reaction time exhibits a bimodal distribution, with one cluster at short durations on the order of tens of minutes and another extending to long durations exceeding 1000 min. For the categorical descriptors, methanol is used more frequently than water, and continuous stirring is much more common than either initial stirring only or quiescent synthesis. Collectively, these results indicate that the dataset spans a diverse but unevenly sampled synthesis space, reflecting the intrinsic heterogeneity of literature-derived experimental data. To prepare the dataset for machine learning, all continuous variables were standardized to remove differences in scale, whereas categorical variables were converted into numerical features through one-hot encoding.
To further elucidate the relationships among numerical process parameters and their association with particle size, a Spearman correlation matrix was constructed (Fig. 3a). Strong correlations between input variables (|ρ| > 0.8) are generally undesirable because they may introduce redundancy and reduce model robustness. In this database, most descriptors show only weak pairwise correlations (|ρ| < 0.3), indicating minimal collinearity among the input variables. With respect to particle size, reaction time shows the highest positive correlation, followed by CZn and solvent volume, whereas temperature exhibits almost no monotonic correlation with particle size. These weak correlations indicate that particle size cannot be explained by any single descriptor alone, highlighting the complexity of the underlying synthesis–size relationship. The categorical effects are further visualized in Fig. 3b and c. Water-based syntheses tend to yield larger particles with a broader size distribution than methanol-based systems. In addition, continuously stirred systems show a wider spread in particle size than quiescent or initially stirred conditions. Overall, these results highlight the importance of machine learning for capturing the coupled effects of multiple synthesis parameters on particle size.
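For readers unfamiliar with the Spearman coefficient, the following sketch shows how it can be computed as the Pearson correlation of rank-transformed data, with tied values sharing their average rank. The function names are hypothetical, and the example input reuses the reaction times and measured sizes of the eight Table 1 validation samples only; it does not reproduce the correlations of the full database shown in Fig. 3a.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Reaction time (min) and measured size (nm) of the Table 1 validation samples.
reaction_time = [1440, 1440, 30, 120, 60, 720, 1440, 30]
actual_size = [290, 210, 204, 141, 292, 228, 1382, 61]
rho = spearman(reaction_time, actual_size)
print(f"Spearman rho (time vs size, 8 validation samples): {rho:.2f}")
```

Because it operates on ranks, the coefficient measures monotonic rather than strictly linear association, which suits skewed descriptors such as reaction time.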
Hyperparameter optimization was performed using a randomized search strategy within predefined parameter spaces, combined with 5-fold cross-validation to ensure robust model training and reliable parameter selection. The optimal hyperparameters obtained for each model are summarized in Table S2. Following optimization, the final models were retrained on the full training dataset and evaluated on an independent hold-out test set to assess their predictive performance. As shown in Fig. 4d–f, all three ensemble models demonstrated strong predictive performance, with predicted particle sizes closely matching the experimental values. Among these, the CB model delivered the highest accuracy, achieving R2 values of 0.93 for the training set and 0.90 for the testing set, thereby establishing CB as the most accurate and reliable algorithm for predicting ZIF-8 particle size.
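The search procedure described above can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: a one-dimensional k-nearest-neighbour regressor stands in for the ensemble models, the data are synthetic, the single hyperparameter is the neighbour count `k`, and only 20 candidates are drawn (the study reports 100 iterations). The structure is what matters: draw random candidates, score each by the mean RMSE over 5 folds, and keep the best.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def k_fold_indices(n, n_folds, rng):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_rmse(x, y, k, folds):
    """Mean RMSE over the folds, holding each fold out once."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        tr_x = [x[i] for i in range(len(x)) if i not in held]
        tr_y = [y[i] for i in range(len(y)) if i not in held]
        sq = [(knn_predict(tr_x, tr_y, x[i], k) - y[i]) ** 2 for i in held_out]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / len(scores)


def randomized_search(x, y, k_choices, n_iter, n_folds=5, seed=0):
    """Sample n_iter candidates at random and return the best (k, CV RMSE)."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(x), n_folds, rng)
    best_k, best_score = None, float("inf")
    for _ in range(n_iter):
        k = rng.choice(k_choices)
        score = cv_rmse(x, y, k, folds)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Synthetic training data (hypothetical): a noisy quadratic trend.
noise = random.Random(1)
x_train = [i / 10 for i in range(50)]
y_train = [xi ** 2 + noise.gauss(0, 0.5) for xi in x_train]

k_choices = list(range(1, 16))
best_k, best_score = randomized_search(x_train, y_train, k_choices, n_iter=20)
print(f"best k = {best_k}, mean 5-fold RMSE = {best_score:.3f}")
```

After the search, the selected configuration is refit on the full training subset and scored once on the hold-out test set, so that the test data play no role in hyperparameter selection.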
To further elucidate the effects of individual synthesis parameters, SHAP dependence plots derived from the optimized CB model were analyzed. SHAP values quantify how each feature value shifts the prediction relative to the baseline, defined as the average model output. Positive SHAP values indicate a tendency toward larger predicted particle sizes, whereas negative values correspond to smaller predicted sizes. As shown in Fig. 5c, the solvent type exhibits a pronounced and systematic effect on the predicted particle size. Methanol is predominantly associated with negative SHAP values, indicating a tendency toward smaller particles, whereas water yields mainly positive SHAP values, corresponding to larger predicted sizes. This difference is consistent with the distinct physicochemical properties of the two solvents. The lower polarity of methanol can reduce precursor solubility and promote higher supersaturation, thereby favouring rapid nucleation and suppressing subsequent crystal growth.46 In contrast, the higher polarity and stronger coordination ability of water can stabilize Zn2+ species in solution, moderate supersaturation, and shift the system toward more growth-dominated regimes.47 On this basis, the effects of the main synthesis descriptors were further interpreted with explicit consideration of solvent type.
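The baseline-plus-contributions reading of SHAP values can be made concrete with an exact Shapley computation on a toy model. The sketch below is not the trained CB model and does not use the SHAP library: `toy_model`, its coefficients, and the background rows are all hypothetical. Features absent from a coalition are filled in from the background rows, which is one common convention and is assumed here.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical size model in nm; NOT the trained CB model."""
    ratio, time_min, is_water = features
    return (300.0 - 4.0 * ratio + 0.1 * time_min
            + 150.0 * is_water + 0.05 * time_min * is_water)


def coalition_value(model, x, background, present):
    """Mean output when features in `present` come from x, the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in present else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        present = set()
        previous = coalition_value(model, x, background, present)
        for j in order:
            present.add(j)
            current = coalition_value(model, x, background, present)
            phi[j] += current - previous
            previous = current
    return [p / len(orders) for p in phi]


# Hypothetical background rows: (ligand-to-metal ratio, time in min, water flag).
background = [[8, 1440, 0], [4, 1440, 0], [16, 30, 0], [40, 60, 1]]
x = [38, 1440, 1]  # hypothetical query point

baseline = coalition_value(toy_model, x, background, set())
phi = shapley_values(toy_model, x, background)
print("baseline:", baseline)
print("contributions:", phi)
print("baseline + sum:", baseline + sum(phi), "model output:", toy_model(x))
```

The final line illustrates the additivity property: the baseline (the mean output over the background rows) plus the per-feature contributions recovers the model output for the query point. Enumerating all orderings is feasible only for a handful of features; tree-based explainers rely on faster algorithms to obtain the same kind of attribution.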
In comparison, stirring conditions show a much weaker effect on the model output (Fig. 5d). Although slight differences are observed among the three stirring modes, their interpretation remains limited because the literature data generally do not provide sufficiently detailed information on stirring speed or intensity. A more comprehensive assessment of stirring effects would require controlled experiments with well-defined mixing parameters, which will be examined in future work.
Among the continuous descriptors, the precursor-related parameters are the most influential. As shown in Fig. 5e, the distribution of C2-HmIm/CZn is strongly separated by solvent type, with methanol-based samples concentrated mainly in the low-ratio region and water-based samples extending to much higher values. The inset highlights the low-ratio range that contains most methanol-derived data points. Despite this disparity, both solvents display a similar overall trend, in which increasing C2-HmIm/CZn shifts the SHAP value toward more negative contributions, indicating smaller predicted particle sizes. This observation is consistent with classical nucleation–growth theory, wherein low linker-to-metal ratios suppress nucleation and facilitate the formation of larger crystals, whereas high ratios generate abundant nuclei that compete for available precursors and ultimately yield smaller particles.48,49
Reaction time also exhibits an overall positive dependence (Fig. 5f), with longer synthesis durations corresponding to more positive SHAP values. This trend is consistent with Ostwald ripening, wherein smaller crystallites dissolve and redeposit onto larger ones, leading to progressive coarsening until thermodynamic equilibrium is reached.50
A generally positive trend is observed for CZn (Fig. 5g). At concentrations below 0.1 mol L−1, the SHAP values fluctuate around zero, likely because particle size in this regime is strongly influenced by other synthesis variables. As CZn increases, the SHAP values become increasingly positive, indicating a tendency toward larger predicted particle sizes. High precursor concentrations increase supersaturation, which accelerates both primary nucleation and crystal growth. Moreover, the resulting high particle density enhances inter-particle collisional growth, and the rapid coalescence of primary clusters and secondary nuclei also leads to the formation of larger particles.51
Solvent volume (Fig. 5h) exhibits a nonmonotonic and relatively dispersed effect on the SHAP values. Most data points are distributed around zero, suggesting that the influence of solvent volume is not systematic but likely coupled with other synthesis parameters. In large-volume systems, extended diffusion distances allow local concentration gradients to persist, and the resulting spatial inhomogeneity creates microenvironments that promote uneven nucleation and localized aggregation. Solvent volume is therefore expected to affect mainly the particle size distribution rather than the mean particle size.
Reaction temperature (Fig. 5i) does not exhibit a clear trend in SHAP values, indicating the absence of systematic temperature dependence within the present dataset. Most data points are concentrated in the near-ambient temperature range and show SHAP values close to zero, although a few higher-temperature samples display positive deviations. This is likely due to the narrow temperature range reported in the literature, as most syntheses were conducted under near-ambient conditions.
Overall, SHAP analysis provides a physically meaningful interpretation of the factors controlling ZIF-8 particle size. Specifically, precursor-related descriptors, particularly C2-HmIm/CZn and CZn, together with reaction time, make the dominant contributions to the model output, whereas solvent-related variables provide additional modulation. Notably, the SHAP dependence plots of the RF and XGB models shown in Fig. S5 and S6 exhibit similar trends for the major descriptors, indicating that the identified parameter effects are consistent across different models. This mechanistic insight enables more rational synthesis design, in which target particle sizes can be approached through systematic adjustment of the key process variables identified by the feature-importance analysis.
Fig. S8 shows the XRD patterns of the prepared samples. All the diffraction peaks of the samples are consistent with the standard PDF card of ZIF-8 (JCPDS 00-062-1030), indicating the successful synthesis of ZIF-8 and the well-preserved framework structure. The SEM images and corresponding particle-size distributions of all samples are provided in Fig. S9. A representative SEM image is shown in Fig. 6a, where the obtained ZIF-8 particles exhibit a well-defined polyhedral morphology. For statistical reliability, particle sizes were determined from more than 100 individually measured particles for each sample, and the average value was taken as the experimental particle size. As shown in Fig. 6b, the predicted particle sizes agree closely with the experimentally measured values over a broad size range.
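As a worked illustration of the three regression metrics, the snippet below applies them to the predicted and measured sizes listed in Table 1. The function name is hypothetical, and the resulting numbers are computed here from the tabulated values; they are not figures reported by the authors.

```python
from math import sqrt

# Predicted and measured particle sizes (nm) for samples 1-8 in Table 1.
predicted = [365, 236, 184, 130, 344, 252, 1372, 78]
actual = [290, 210, 204, 141, 292, 228, 1382, 61]


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired true and predicted values."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


r2, rmse, mae = regression_metrics(actual, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.1f} nm, MAE = {mae:.1f} nm")
# prints: R2 = 0.992, RMSE = 36.2 nm, MAE = 29.4 nm
```

With only eight points, sample 7 (1382 nm) accounts for most of the total variance, so the R2 value on this set should be read alongside the absolute errors rather than on its own.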
The observed particle size differences among the validation samples are also consistent with the SHAP analysis of the dominant synthesis descriptors and provide practical guidelines for directional particle size control. For synthesizing larger ZIF-8 particles, conditions with positive SHAP contributions should be selected, particularly water-based synthesis environments, longer reaction times, and suitable C2-HmIm/CZn values. For example, sample 7 exhibited the largest particle size because of the combined positive SHAP contributions of its key synthesis descriptors. As shown in Fig. 3, the water-based synthesis environment generally contributes more positively to particle growth than methanol-based systems. In addition, the long reaction time of 1440 min corresponds to a strongly positive SHAP region, indicating prolonged crystal growth after nucleation. The C2-HmIm/CZn value of 38 is also located within a positive SHAP region under the water-based environment. These factors collectively promote the formation of large ZIF-8 particles. In contrast, smaller particles are favoured by methanol-based synthesis and short reaction time, as observed for sample 8. In addition, sample 3 exhibited a larger particle size than sample 8, which can be mainly associated with its higher CZn. A higher CZn may increase the concentration of primary nuclei and enhance the probability of inter-particle collision, aggregation, and secondary particle formation, ultimately contributing to larger particle sizes. Therefore, reducing CZn, together with using methanol-based synthesis and limiting prolonged growth, can be an effective strategy for obtaining smaller ZIF-8 particles within the investigated synthesis space. These results further confirm the robustness of the proposed framework and demonstrate its capability to reliably guide the synthesis of ZIF-8 with targeted particle sizes under diverse reaction conditions.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6sc03212e.
This journal is © The Royal Society of Chemistry 2026