Materials process informatics-assisted precise particle size control of metal–organic frameworks

Yuan Wang; Heng Liu; Yusuke Hashimoto; Kazuyuki Iwase; Hao Li; Takaaki Tomai

doi:10.1039/D6SC03212E

Open Access Article

This Open Access Article is licensed under a
Creative Commons Attribution 3.0 Unported Licence

DOI: 10.1039/D6SC03212E (Edge Article) Chem. Sci., 2026, Advance Article

Yuan Wang^ab, Heng Liu^c, Yusuke Hashimoto^d, Kazuyuki Iwase^a, Hao Li*^c and Takaaki Tomai*^ad
^aInstitute of Multidisciplinary Research for Advanced Materials, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai, 980-8577, Japan. E-mail: takaaki.tomai.e6@tohoku.ac.jp
^bGraduate School of Engineering, Tohoku University, 6-6-11 Aramaki-aza Aoba, Aoba-ku, Sendai, 980-8579, Japan
^cAdvanced Institute for Materials Research (WPI-AIMR), Tohoku University, Sendai, 980-8577, Japan
^dFrontier Research Institute for Interdisciplinary Sciences, Tohoku University, Sendai, 980-8577, Japan

Received 17th April 2026, Accepted 18th June 2026

First published on 20th June 2026

Abstract

Precise control over the particle size of metal–organic frameworks (MOFs) is pivotal for optimizing their performance in catalysis, separation, and drug delivery. However, conventional synthetic strategies largely depend on empirical trial-and-error, which lacks predictive power and fails to decode the complex interplay between nucleation and growth kinetics. Herein, we report a materials process informatics framework for the predictive control of MOF particle size, using zeolitic imidazolate framework-8 (ZIF-8) as a representative model system. A comprehensive database was constructed through systematic curation from the literature, with seven process descriptors employed as input features. Multiple machine-learning algorithms were benchmarked, among which the Categorical Boosting (CB) model achieved the best predictive performance after hyperparameter optimization, with a coefficient of determination (R²) of 0.90 on the test set. Furthermore, SHapley Additive exPlanations (SHAP) analysis identified the precursor concentration ratio and reaction time as the most influential parameters governing particle size. Experimental validation using an automated synthesis platform showed excellent agreement between predicted and measured particle sizes, confirming the model's robustness and predictive reliability. Overall, the proposed framework enables intelligent synthesis optimization and targeted experimental design, thereby providing a practical route toward controllable MOF synthesis. This work demonstrates how materials process informatics can shift MOF particle-size engineering from empirical optimization toward data-driven design, offering a broadly applicable strategy for advanced materials synthesis.

Introduction

Metal–organic frameworks (MOFs), a class of crystalline porous materials assembled from metal nodes (ions or clusters) and organic linkers, have attracted considerable attention due to their tunable structures and versatile functionalities.¹ By varying the metal centers and linker motifs, MOFs with diverse topologies and well-defined pore environments can be systematically designed and synthesized.^2,3 As a result, MOFs have shown considerable promise in a wide range of applications, including gas storage and separation, heterogeneous catalysis, and biomedical fields such as drug delivery and therapy.^4–7

Zeolitic imidazolate framework-8 (ZIF-8), constructed from zinc ions (Zn²⁺) and 2-methylimidazole (2-HmIm) linkers, has emerged as one of the most representative MOF systems because of its remarkable chemical and thermal stability.⁸ Importantly, the performance of ZIF-8 is governed by complex structure–property relationships, in which particle size serves as a critical parameter. In general, smaller particles are favourable for catalysis and drug delivery, as they can increase surface accessibility and shorten diffusion distances.^9–11 However, when ZIF-8 is used as a sacrificial template or precursor, excessively small particles may become structurally vulnerable during pyrolysis, thereby compromising the integrity and durability of the derived catalytic materials.^12,13 In membrane-based separation, particle size also strongly affects interfacial compatibility and defect formation, both of which are closely related to separation performance.^14,15 Therefore, precise control of ZIF-8 particle size during synthesis is essential for optimizing performance across diverse functional domains.

The synthesis of ZIF-8 is governed by a multidimensional process space defined by precursor concentration, ligand-to-metal ratio, reaction time, temperature, and other synthesis variables.^16–19 Despite this complexity, current strategies for particle size control remain largely empirical, typically relying on iterative trial-and-error adjustment of synthesis conditions based on literature protocols or prior experience.²⁰ Such approaches are not only time-consuming and resource-intensive, but also make it difficult to capture complex parameter interactions and non-linear dependencies that govern particle formation. Consequently, the precise and reproducible synthesis of ZIF-8 with target particle sizes remains a significant challenge.

Materials informatics has emerged as an important paradigm in modern materials science by accelerating the discovery and optimization of functional materials through the integration of data science and domain knowledge. Within this context, machine learning (ML) has become a particularly powerful approach because it enables predictive models to be constructed directly from experimental and computational datasets.^21,22 In synthesis research, this concept further extends to materials process informatics, where multidimensional process variables are correlated with target material characteristics to guide rational process design.^23,24 This paradigm has shifted materials research from trial-and-error experiments toward data-driven strategies, particularly in nanomaterial synthesis.^25,26 Recent studies have demonstrated the potential of machine learning for predicting particle size from synthesis parameters. Liu et al. showed that random forest models could accurately predict the phase and classify particle sizes into broad categories such as nano, sub-micron, and micron, but could not predict the exact particle size.²⁷ Pellegrino et al. developed a stacking ensemble model to estimate the size of ZnO nanoparticles, but the relatively small dataset used for training may limit the robustness of the resulting model.²⁸ Zhang et al. further applied ML regression to Eu-MOF synthesis, but the sparsity of the dataset (27 samples) restricted the analysis mainly to the dominant variable and hindered the identification of more complex multivariate effects.²⁹ Machine learning has also been applied to the synthesis analysis of ZIF-8.^30,31 Allegretto et al. proposed a unified roadmap for ZIF-8 nucleation and growth by analyzing how synthetic variables influence particle size and morphology under water- and methanol-based conditions, providing valuable insights into the synthesis mechanism of ZIF-8.³¹ Despite these advances, the development of more accurate predictive frameworks for experimentally guided particle size control across multidimensional synthesis conditions remains an important direction.

In this study, we develop a materials process informatics framework to predict the particle size of ZIF-8 as a function of synthesis parameters. Multiple regression algorithms are systematically benchmarked to determine the optimal predictive model. Model interpretability is established using SHapley Additive exPlanations (SHAP) to quantify the contribution of each process parameter and identify the key descriptors governing particle size. The predictive capability of the resulting model is further evaluated through independent synthesis experiments conducted with an automated synthesis system. By integrating predictive modelling with feature-importance analysis, this study achieves accurate size prediction and provides meaningful insight into the process–size relationship, thereby clarifying how key synthesis parameters govern particle size evolution. Overall, these results demonstrate the potential of machine learning to move MOF synthesis beyond empirical optimization toward more rational and data-guided process design.

Experimental and methods

Chemicals and materials

All the reagents and solvents were used without further purification. Zinc nitrate hexahydrate (Zn(NO₃)₂·6H₂O, 99%), 2-methylimidazole (C₄H₆N₂, 98%), and methanol (CH₃OH) were purchased from FUJIFILM Wako Pure Chemical Corporation.

Synthesis of ZIF-8 for model validation

To evaluate model accuracy, a series of independent experiments were conducted. Specifically, 2-methylimidazole and Zn(NO₃)₂·6H₂O were separately dissolved in equal volumes of methanol or water under continuous magnetic stirring to prepare solution A and solution B, respectively. Solution A was subsequently added to solution B using an automated synthesis system to generate a homogeneous mixture; the automated system was employed to enhance experimental precision and minimize deviations caused by human error. After the specified reaction time, the resulting suspension was centrifuged at 10 000 rpm for 4 minutes to collect the solid product. The obtained precipitate was washed three times with methanol to remove unreacted precursors and then dried at 60 °C for 12 hours to yield the final ZIF-8 powder. The detailed process parameters for these experiments are summarized in Table 1. The detailed equipment specifications and photographs of the system are provided in Fig. S1. The particle size of these samples was measured using ImageJ software.

Table 1 Summary of experimental conditions for model validation

Sample	C_Zn (mol L⁻¹)	C_2-HmIm/C_Zn	Solvent amount (mL)	Reaction time (min)	Stirring	Solvent	Temperature (°C)	Predicted size (nm)	Actual size (nm)
1	0.40	8	20	1440	T	Methanol	25	365	290
2	0.13	4	60	1440	T	Methanol	25	236	210
3	0.40	16	20	30	T	Methanol	25	184	204
4	0.13	8	60	120	F	Methanol	25	130	141
5	0.08	40	100	60	T	Water	25	344	292
6	0.20	16	140	720	Initial	Methanol	25	252	228
7	0.05	38	100	1440	T	Water	25	1372	1382
8	0.08	16	100	30	T	Methanol	25	78	61

Characterization

The crystallinity and phase of ZIF-8 were characterized by X-ray diffraction (XRD) using a Rigaku SmartLab 9MTP diffractometer equipped with a Cu Kα radiation source (λ = 1.5406 Å). The morphologies and particle sizes of the synthesized ZIF-8 were examined using a scanning electron microscope (SEM, JSM-7800F).

Database creation

The dataset used in this study was established through a systematic literature survey of the Web of Science database using the keyword “ZIF-8 size”. An initial search returned 2392 records, of which 2304 remained after excluding review articles. Titles and abstracts were then manually screened to identify studies explicitly reporting the synthesis of ZIF-8. To ensure data consistency, only studies employing Zn(NO₃)₂ and 2-methylimidazole (2-HmIm) as precursors were retained. The target variable was average particle size, which was determined from scanning electron microscopy (SEM) or transmission electron microscopy (TEM) images.

Seven synthesis parameters were used as input features, including the zinc precursor concentration (C_Zn), ligand-to-metal concentration ratio (C_2-HmIm/C_Zn), solvent type (water or methanol), solvent volume, reaction time, reaction temperature, and stirring conditions (continuous stirring, denoted as T; stirring only during the initial mixing step, denoted as Initial; and quiescent synthesis, denoted as F). These parameters were selected based on the fundamental mechanism of particle formation, which involves nucleation and crystal growth. In particular, C_Zn and C_2-HmIm/C_Zn govern the degree of supersaturation and therefore strongly influence nucleation behaviour and the formation of primary nuclei. Solvent type also plays an important role because its physicochemical properties, such as dielectric constant, dipole moment, and van der Waals volume, affect solubility and surface energy, thereby modulating nucleation behavior.³² During the subsequent growth stage, particle size is further influenced by supersaturation, which determines the supply of building units and the rate of diffusion-limited growth, whereas reaction time governs the extent of growth and possible coarsening associated with Ostwald ripening.³³ Temperature exerts a coupled effect on both thermodynamic and kinetic factors, including solubility, interfacial energy, diffusion, and attachment rate, thereby exerting a strong influence on particle size.³⁴ In addition, solvent volume and stirring conditions were included because they influence mixing uniformity and mass transport, both of which are important for achieving homogeneous crystal growth.³⁵

ML algorithms and evaluation methods

A comprehensive benchmarking study was performed to evaluate the predictive performance of seven representative machine learning (ML) algorithms. The models included Random Forest (RF), eXtreme Gradient Boosting (XGB), Categorical Boosting (CB), Light Gradient Boosting Machine (LGBM), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP), covering ensemble approaches, kernel-based methods, instance-based learning, and neural networks. The XGB and CB models were implemented using the xgboost and catboost libraries,^35,36 respectively, whereas the remaining algorithms were developed with the scikit-learn package (version 1.4.2) in Python.³⁷

Before model training, the database was randomly divided into training and testing subsets using an 80:20 split ratio with a fixed random state of 60. Continuous variables were standardized to have a mean of 0 and a standard deviation of 1, thereby eliminating scale disparities among descriptors. Except for the CB model, categorical variables were converted into numerical features using one-hot encoding to ensure compatibility with machine learning algorithms. Hyperparameter optimization was conducted by a randomized search of the parameter space with 100 iterations combined with 5-fold cross-validation (CV).³⁸ Model performance was evaluated using three standard regression metrics: the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute error (MAE).³⁹
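The preprocessing and tuning steps described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the column names and the search space are hypothetical, a random forest stands in for the CB model to avoid an extra dependency, and the number of search iterations is reduced from the 100 used in the paper to keep the run short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the curated database (NOT the real data).
rng = np.random.default_rng(0)
n = 274
X = pd.DataFrame({
    "c_zn": rng.uniform(0.01, 0.5, n),
    "ratio": rng.uniform(1, 70, n),
    "volume_ml": rng.uniform(10, 200, n),
    "time_min": rng.uniform(5, 1440, n),
    "temp_c": rng.uniform(20, 60, n),
    "solvent": rng.choice(["methanol", "water"], n),
    "stirring": rng.choice(["T", "Initial", "F"], n),
})
# Arbitrary toy target so that the pipeline has something to fit.
y = 50 + 0.2 * X["time_min"] + 300 * (X["solvent"] == "water") + rng.normal(0, 20, n)

numeric = ["c_zn", "ratio", "volume_ml", "time_min", "temp_c"]
categorical = ["solvent", "stirring"]

# 80:20 split with the fixed random state quoted in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=60
)

# Standardize continuous descriptors and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("reg", RandomForestRegressor(random_state=0)),
])

# Hypothetical search space; the paper reports its tuned values in Table S2.
param_distributions = {
    "reg__n_estimators": [50, 100, 150],
    "reg__max_depth": [None, 4, 8, 12],
    "reg__min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(
    model, param_distributions, n_iter=10, cv=5, scoring="r2", random_state=0
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the hold-out test set.
y_pred = search.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = mean_absolute_error(y_test, y_pred)
print(f"R2={r2:.2f}  RMSE={rmse:.1f}  MAE={mae:.1f}")
```

Placing the scaler and encoder inside the pipeline means they are refitted on the training folds of each CV split, which avoids information leaking from the validation folds. Whether the original study fitted the scaler before or after splitting is not stated, so this is a design choice of the sketch.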

Results and discussion

Workflow

Fig. 1 illustrates the overall workflow of the materials process informatics framework for size control of ZIF-8 particles. A literature-curated database was first established through systematic screening and data extraction, from which key synthesis parameters were defined as input descriptors, and particle size was used as the target. The resulting dataset was subsequently subjected to feature engineering, model training, and performance evaluation to identify an appropriate regression model for particle-size prediction. Feature-importance analysis was then performed to quantify the relative contributions of individual synthesis parameters. Finally, the model was experimentally validated through an automated synthesis platform, and the newly generated data provide a basis for iterative refinement of the predictive framework.


	Fig. 1 Proposed workflow of materials process informatics-assisted size control of MOFs.

Database creation

After extraction, 274 valid data points were obtained from 130 articles covering 17 years from 2009 to 2025. The literature sources included in the database are summarized in Table S1. As shown in Fig. S2, these sources form an interconnected citation network, reflecting the cumulative development of ZIF-8 synthesis research, in which many later studies were derived from or built upon previously reported protocols. The color scale further shows that these publications span multiple years, reflecting the temporal distribution of the literature considered in this work. In addition, a digital MOF platform (DigMOF, https://www.digmof.org/) was developed to integrate the curated dataset, enabling dynamic data visualization, precise literature tracking, and efficient data management. The corresponding user interface is shown in Fig. S3.

Fig. 2 presents the distributions of particle size and the seven synthesis descriptors. The dataset covers a relatively broad synthesis domain, with particle size spanning a wide range, indicating substantial variability in the reported ZIF-8 crystals under different preparation conditions. Pronounced right-skewed distributions are observed for C_Zn, C_2-HmIm/C_Zn, and solvent volume, suggesting that most reported syntheses were conducted within relatively limited parameter ranges. Reaction temperature is strongly centred around 25 °C, reflecting the predominance of room-temperature synthesis in the literature. In contrast, reaction time exhibits a bimodal distribution, with one cluster at short durations on the order of tens of minutes and another extending to long durations exceeding 1000 min. For the categorical descriptors, methanol is used more frequently than water, and continuous stirring is much more common than either initial stirring only or quiescent synthesis. Collectively, these results indicate that the dataset spans a diverse but unevenly sampled synthesis space, reflecting the intrinsic heterogeneity of literature-derived experimental data. To prepare the dataset for machine learning, all continuous variables were standardized to remove differences in scale, whereas categorical variables were converted into numerical features through one-hot encoding, except for the CB model, which handles categorical features natively.


	Fig. 2 Data distributions of synthesis parameters and particle size in the database: (a) particle size, (b) C_2-HmIm/C_Zn, (c) C_Zn, (d) solvent amount, (e) temperature, (f) reaction time, (g) solvent type, and (h) stirring conditions.

To further elucidate the relationships among numerical process parameters and their association with particle size, a Spearman correlation matrix was constructed (Fig. 3a). Strong correlations between input variables (|ρ| > 0.8) are generally undesirable because they may introduce redundancy and reduce model robustness. In this database, most descriptors show only weak pairwise correlations (|ρ| < 0.3), indicating limited collinearity among the input variables. With respect to particle size, reaction time shows the highest positive correlation, followed by C_Zn and solvent volume, whereas temperature exhibits almost no monotonic correlation with particle size. These weak correlations indicate that particle size cannot be explained by any single descriptor alone, highlighting the complexity of the underlying synthesis–size relationship. The categorical effects are further visualized in Fig. 3b and c. Water-based syntheses tend to yield larger particles with a broader size distribution than methanol-based systems. In addition, continuously stirred systems show a wider spread in particle size than quiescent or initially stirred conditions. Overall, these results highlight the importance of machine learning for capturing the coupled effects of multiple synthesis parameters on particle size.
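Because the Spearman coefficient is simply the Pearson coefficient computed on ranks, it measures monotonic rather than linear association. The following standard-library sketch illustrates this with hypothetical reaction-time and size values (not taken from the database); the function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: size grows monotonically but non-linearly with time.
time_min = [10, 30, 60, 120, 720, 1440]
size_nm = [40, 55, 90, 95, 300, 1200]

rho = spearman(time_min, size_nm)
r = pearson(time_min, size_nm)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

For this toy series the ranks agree perfectly, so ρ equals 1 even though the relationship is not linear, which is why a low |ρ| in Fig. 3a indicates the absence of a monotonic trend rather than the absence of any dependence.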


	Fig. 3 (a) Spearman correlation coefficients between pairs of numerical synthesis parameters and between each parameter and the target particle size; (b and c) violin plots showing the influence of solvent type and stirring conditions on particle size.

Model selection and evaluation

Seven supervised learning algorithms with default hyperparameters were first benchmarked to identify suitable models for ZIF-8 particle-size prediction. The ensemble tree-based models were included because of their ability to capture nonlinear feature interactions and their strong robustness for relatively small datasets.^36,40–42 SVR and KNN were included as representatives of kernel-based and instance-based learning paradigms, respectively, while MLP was selected as a prototypical neural network model capable of learning complex nonlinear mappings.^43–45 As shown in Fig. 4a–c, clear differences in predictive performance were observed among the tested models. Overall, the ensemble tree-based algorithms exhibited the strongest performance. Among them, the CB model achieved the best overall accuracy, with the highest R² value (0.79) and the lowest RMSE and MAE. RF and XGB also showed strong predictive capability, whereas KNN, MLP, and SVR performed substantially worse. Collectively, these results identify ensemble learning methods, particularly CB, RF, and XGB, as the most suitable candidates for subsequent hyperparameter optimization and feature-importance analysis.
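A default-hyperparameter benchmark of this kind can be sketched with scikit-learn's `cross_validate`, reporting the mean and standard deviation of each metric over five folds. The example is illustrative only: the data are synthetic, only three of the seven model families are included (the boosting libraries are omitted to keep the dependencies minimal), and the fold seed is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression problem with an interaction term (NOT the real data).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 5))
y = 100 * X[:, 0] * X[:, 1] + 50 * (X[:, 2] > 0.5) + rng.normal(0, 2, 200)

# Default hyperparameters throughout; scale-sensitive models get a scaler.
models = {
    "RF": RandomForestRegressor(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scoring = {
    "r2": "r2",
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
}

summary = {}
for name, estimator in models.items():
    res = cross_validate(estimator, X, y, cv=cv, scoring=scoring)
    # scikit-learn returns negated errors, so flip the sign of the means.
    summary[name] = {
        "r2": (res["test_r2"].mean(), res["test_r2"].std()),
        "rmse": (-res["test_rmse"].mean(), res["test_rmse"].std()),
        "mae": (-res["test_mae"].mean(), res["test_mae"].std()),
    }

for name, metrics in summary.items():
    print(name, {k: f"{m:.2f} +/- {s:.2f}" for k, (m, s) in metrics.items()})
```

The standard deviations correspond to the error bars of a fold-wise comparison such as the one in Fig. 4a–c.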


	Fig. 4 Performance comparison of machine learning models evaluated by 5-fold cross-validation in terms of (a) R², (b) RMSE, and (c) MAE, with error bars indicating the standard deviation across the five folds. The performance of the hyperparameter-optimized models on the training and hold-out test sets is further presented for (d) CB, (e) RF, and (f) XGB models.

Hyperparameter optimization was performed using a randomized search strategy within predefined parameter spaces, combined with 5-fold cross-validation to ensure robust model training and reliable parameter selection. The optimal hyperparameters obtained for each model are summarized in Table S2. Following optimization, the final models were retrained on the full training dataset and evaluated on an independent hold-out test set to assess their predictive performance. As shown in Fig. 4d–f, all three ensemble models demonstrated strong predictive performance, with predicted particle sizes closely matching the experimental values. Among these, the CB model delivered the highest accuracy, achieving R² values of 0.93 for the training set and 0.90 for the testing set, thereby establishing CB as the most accurate and reliable algorithm for predicting ZIF-8 particle size.

Importance analysis

To interpret the optimized models and identify the dominant synthesis parameters governing particle size, SHapley Additive exPlanations (SHAP) analysis was performed. SHAP quantifies the contribution of each input descriptor to the model output and further reveals whether a given feature value drives the prediction toward larger or smaller particle sizes. Firstly, the SHAP summary plot presents the overall distribution of feature contributions across the dataset, with each point corresponding to an individual sample. The accompanying bar chart provides a quantitative ranking of feature importance based on the relative contributions derived from normalized mean absolute SHAP values. As shown in Fig. 5a, b and S4, the three optimized ensemble models exhibit highly consistent importance rankings, indicating that the identified descriptor effects are robust across different models. In all cases, C_2-HmIm/C_Zn emerged as the most influential parameter, exhibiting the highest relative contribution of around 25%. Reaction time was identified as the second most important factor, followed by C_Zn. Solvent volume and solvent type have moderate contributions, whereas stirring conditions and temperature contribute much less to the final prediction. The relatively low SHAP importance of temperature is likely attributed to the narrow temperature range represented in the literature dataset, with most syntheses being conducted near room temperature.
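The relative contributions quoted above follow from a simple normalization: the mean absolute SHAP value of each feature is divided by the sum over all features. A minimal standard-library sketch is given here; the SHAP matrix is hypothetical and far smaller than a real one, which in practice would be computed with a dedicated SHAP implementation.

```python
def relative_importance(shap_matrix, names):
    """Rank features by normalized mean absolute SHAP value.

    shap_matrix: list of rows (one per sample), one SHAP value per feature.
    Returns (name, share) pairs sorted from most to least important.
    """
    n_samples = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(len(names))
    ]
    total = sum(mean_abs)
    shares = [m / total for m in mean_abs]
    return sorted(zip(names, shares), key=lambda pair: pair[1], reverse=True)

# Hypothetical SHAP values (nm) for four samples and three descriptors.
names = ["ratio", "time", "temperature"]
shap_matrix = [
    [-40.0, 30.0, 2.0],
    [60.0, -20.0, -1.0],
    [-50.0, 25.0, 0.0],
    [30.0, -15.0, 1.0],
]

ranking = relative_importance(shap_matrix, names)
for name, share in ranking:
    print(f"{name:12s} {100 * share:5.1f}%")
```

Because absolute values are averaged, a feature that pushes predictions strongly in both directions ranks highly even if its signed contributions cancel on average.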


	Fig. 5 SHAP analysis of the CB model, including (a) the global SHAP summary plot, (b) the SHAP-based feature importance ranking of all features, and SHAP dependence plots for (c) solvent type, (d) stirring conditions, (e) C_2-HmIm/C_Zn, (f) reaction time, (g) C_Zn, (h) solvent volume, and (i) temperature. Points are colored by solvent type (blue: methanol; orange: water). The inset in (e) provides a magnified view of the low-ratio region (0–20).

To further elucidate the effects of individual synthesis parameters, SHAP dependence plots derived from the optimized CB model were analyzed. SHAP values quantify how each feature value shifts the prediction relative to the baseline, defined as the average model output. Positive SHAP values indicate a tendency toward larger predicted particle sizes, whereas negative values correspond to smaller predicted sizes. As shown in Fig. 5c, the solvent type exhibits a pronounced and systematic effect on the predicted particle size. Methanol is predominantly associated with negative SHAP values, indicating a tendency toward smaller particles, whereas water yields mainly positive SHAP values, corresponding to larger predicted sizes. This difference is consistent with the distinct physicochemical properties of the two solvents. The lower polarity of methanol can reduce precursor solubility and promote higher supersaturation, thereby favouring rapid nucleation and suppressing subsequent crystal growth.⁴⁶ In contrast, the higher polarity and stronger coordination ability of water can stabilize Zn²⁺ species in solution, moderate supersaturation, and shift the system toward more growth-dominated regimes.⁴⁷ On this basis, the effects of the main synthesis descriptors were further interpreted with explicit consideration of solvent type.
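The baseline and sign conventions described above can be made concrete by computing exact Shapley values for a small toy model. The sketch below is a brute-force enumeration over feature coalitions and is not the tree-specific algorithm used for ensemble models in practice; the model function, its coefficients, and the background samples are all hypothetical.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    """Hypothetical size model (nm) with x = (ratio, time_min, is_water)."""
    ratio, time_min, is_water = x
    return (200.0 - 2.0 * ratio + 0.1 * time_min
            + 150.0 * is_water + 0.05 * time_min * is_water)

# Hypothetical background samples that define the baseline (average output).
background = [
    (4.0, 30.0, 0.0),
    (8.0, 120.0, 0.0),
    (16.0, 1440.0, 0.0),
    (40.0, 60.0, 1.0),
]

def coalition_value(model, x, subset, background):
    """Mean prediction with features in `subset` fixed to x, others from background."""
    total = 0.0
    for b in background:
        mixed = tuple(x[j] if j in subset else b[j] for j in range(len(x)))
        total += model(mixed)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(model, x, set(subset) | {i}, background)
                without_i = coalition_value(model, x, set(subset), background)
                phi[i] += weight * (with_i - without_i)
    return phi

x = (38.0, 1440.0, 1.0)  # high ratio, long reaction time, water
baseline = coalition_value(toy_model, x, set(), background)
phi = shapley_values(toy_model, x, background)
print("baseline:", baseline)
print("SHAP values (ratio, time, water):", phi)
print("baseline + sum:", baseline + sum(phi), " model:", toy_model(x))
```

The contributions sum to the difference between the prediction and the baseline. In this toy model the high ratio receives a negative value (smaller predicted size), whereas the long reaction time and the water solvent receive positive values.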

In comparison, stirring conditions show a much weaker effect on the model output (Fig. 5d). Although slight differences are observed among the three stirring modes, their interpretation remains limited because the literature data generally do not provide sufficiently detailed information on stirring speed or intensity. A more comprehensive assessment of stirring effects would require controlled experiments with well-defined mixing parameters, which will be examined in future work.

Among the continuous descriptors, the precursor-related parameters are the most influential. As shown in Fig. 5e, the distribution of C_2-HmIm/C_Zn is strongly separated by solvent type, with methanol-based samples concentrated mainly in the low-ratio region and water-based samples extending to much higher values. The inset highlights the low-ratio range that contains most methanol-derived data points. Despite this disparity, both solvents display a similar overall trend, in which increasing C_2-HmIm/C_Zn shifts the SHAP value toward more negative contributions, indicating smaller predicted particle sizes. This observation is consistent with classical nucleation–growth theory, wherein low linker-to-metal ratios suppress nucleation and facilitate the formation of larger crystals, whereas high ratios generate abundant nuclei that compete for available precursors and ultimately yield smaller particles.^48,49

Reaction time, in contrast, exhibits an overall positive dependence (Fig. 5f), with longer synthesis durations corresponding to more positive SHAP values. This trend is consistent with Ostwald ripening, wherein smaller crystallites dissolve and redeposit onto larger ones, leading to progressive coarsening until thermodynamic equilibrium is reached.⁵⁰

A generally positive trend is observed for C_Zn (Fig. 5g). At concentrations below 0.1 mol L⁻¹, the SHAP values fluctuate around zero, likely because particle size in this regime is strongly influenced by other synthesis variables. As C_Zn increases, the SHAP values become increasingly positive, indicating a tendency toward larger predicted particle sizes. High precursor concentrations accelerate both primary nucleation and crystal growth through increased supersaturation. Moreover, the resulting high particle density enhances inter-particle collisional growth, and the rapid coalescence of primary clusters and secondary nuclei also leads to the formation of larger particles.⁵¹

Solvent volume (Fig. 5h) exhibits a nonmonotonic and relatively dispersed effect on the SHAP values. Most data points are distributed around zero, suggesting that the influence of solvent volume is not systematic but is likely coupled with other synthesis parameters. In large-volume systems, extended diffusion distances allow local concentration gradients to persist, and the resulting spatial inhomogeneity creates microenvironments that promote uneven nucleation and localized aggregation. Solvent volume therefore appears to affect mainly the particle size distribution rather than the mean particle size.

Reaction temperature (Fig. 5i) does not exhibit a clear trend in SHAP values, indicating the absence of systematic temperature dependence within the present dataset. Most data points are concentrated in the near-ambient temperature range and show SHAP values close to zero, although a few higher-temperature samples display positive deviations. This is likely due to the narrow temperature range reported in the literature, as most syntheses were conducted under near-ambient conditions.

Overall, SHAP analysis reveals a physically meaningful interpretation of the factors controlling ZIF-8 particle size. In detail, precursor-related descriptors, particularly C_2-HmIm/C_Zn and C_Zn, together with reaction time, make the dominant contributions to the model output, whereas solvent-related variables provide additional modulation. Notably, the SHAP dependence plots of the RF and XGB models shown in Fig. S5 and S6 exhibit similar trends for the major descriptors, indicating that the identified parameter effects are consistent across different models. This mechanistic insight enables more rational synthesis design, in which target particle sizes can be approached through systematic adjustment of the key process variables identified by the feature-importance analysis.

Experimental validation

To further evaluate the predictive reliability of the developed framework, eight independent validation experiments were carried out under synthesis conditions that were not included in the original database. Given that tree-based algorithms such as the CB model generally have limited extrapolation capability outside the descriptor domain represented in the database, the validation experiments were intentionally designed within the experimentally accessible synthesis parameter space covered by the literature-derived dataset. Meanwhile, the reaction temperature was fixed at 25 °C to focus on the effects of the more influential synthesis variables while minimizing uncertainty associated with the sparsely represented temperature conditions in the dataset. To further clarify the relationship between the validation samples and the original database, principal component analysis (PCA) was performed. As shown in Fig. S7, all eight validation samples are located within the main descriptor space covered by the literature-derived dataset, rather than in regions far outside the distribution of the dataset, indicating that the validation experiments mainly assess the interpolative reliability of the model within the investigated synthesis space. Although the validation is mainly interpolative, the curated dataset spans a relatively broad range of synthesis conditions, allowing the model reliability to be evaluated across a broad synthesis space. The detailed experimental conditions are summarized in Table 1.
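A PCA-based domain check of this kind can be sketched with NumPy alone. The example below is an illustration under assumptions: the descriptor matrix is synthetic, the in-domain validation points are constructed as midpoints of database rows, one point is deliberately placed outside the domain, and a bounding box in the first two principal components is used as a crude proxy for "within the descriptor space".

```python
import numpy as np

# Synthetic stand-in for the numerical descriptors of the database.
rng = np.random.default_rng(2)
database = rng.normal(size=(274, 5))

# Standardize with database statistics, then obtain principal axes via SVD.
mean = database.mean(axis=0)
std = database.std(axis=0)
Z = (database - mean) / std
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
components = Vt[:2]                      # first two principal axes
db_scores = Z @ components.T             # database projected onto PC1/PC2
explained = s[:2] ** 2 / np.sum(s ** 2)  # explained variance ratio

# Eight hypothetical in-domain samples: midpoints between pairs of database rows.
val_in = 0.5 * (Z[:8] + Z[8:16])
# One deliberately out-of-domain sample: twice the most extreme row along PC1.
val_out = 2.0 * Z[np.argmax(db_scores[:, 0])]
val_Z = np.vstack([val_in, val_out])
val_scores = val_Z @ components.T

# Crude applicability check: inside the PC1/PC2 bounding box of the database.
lo, hi = db_scores.min(axis=0), db_scores.max(axis=0)
inside = np.all((val_scores >= lo) & (val_scores <= hi), axis=1)
print("explained variance (PC1, PC2):", np.round(explained, 3))
print("inside database domain:", inside)
```

New samples must be scaled with the database mean and standard deviation before projection. A bounding box is a loose criterion, so denser checks such as distance to the nearest database points would be a natural refinement.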

Fig. S8 shows the XRD patterns of the prepared samples. All the diffraction peaks of the samples are consistent with the standard PDF card of ZIF-8 (JCPDS 00-062-1030), indicating the successful synthesis of ZIF-8 and the well-preserved framework structure. The SEM images and corresponding particle-size distributions of all samples are provided in Fig. S9. A representative SEM image is shown in Fig. 6a, where the obtained ZIF-8 particles exhibit a well-defined polyhedral morphology. For statistical reliability, particle sizes were determined from more than 100 individually measured particles for each sample, and the average value was taken as the experimental particle size. As shown in Fig. 6b, the predicted particle sizes agree closely with the experimentally measured values over a broad size range.


Fig. 6 Experimental validation of the predictive model. (a) SEM image of a representative ZIF-8 sample, with the corresponding particle-size distribution shown in the inset, obtained from statistical analysis of more than 100 particles. (b) Comparison between the predicted and experimentally measured particle sizes; the dashed line represents perfect agreement.
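Agreement in a parity plot such as Fig. 6b can be quantified with standard error metrics evaluated against the line of perfect agreement. The sketch below is an illustration under stated assumptions: the helper `parity_metrics` is hypothetical, and the measured and predicted sizes are placeholder numbers, not the values from the eight validation experiments.

```python
import math


def parity_metrics(measured, predicted):
    """Return R2, RMSE and MAE of predictions relative to the y = x line."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_y = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(y - p) for y, p in zip(measured, predicted)) / n,
    }


# Placeholder particle sizes in nm (not data from the paper).
measured_nm = [100.0, 200.0, 300.0, 400.0]
predicted_nm = [110.0, 190.0, 310.0, 390.0]
metrics = parity_metrics(measured_nm, predicted_nm)
print(metrics)
```

Here R² is computed from the residuals to the identity line rather than from a fitted regression line, so a systematic offset between prediction and measurement lowers the score instead of being absorbed by the fit.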

The observed particle size differences among the validation samples are also consistent with the SHAP analysis of the dominant synthesis descriptors and provide practical guidelines for directional particle size control. To obtain larger ZIF-8 particles, conditions with positive SHAP contributions should be selected, particularly water-based synthesis environments, longer reaction times, and suitable C_2-HmIm/C_Zn values. For example, sample 7 exhibited the largest particle size because of the combined positive SHAP contributions of its key synthesis descriptors. As shown in Fig. 3, the water-based synthesis environment generally contributes more positively to particle growth than methanol-based systems. In addition, the long reaction time of 1440 min corresponds to a strongly positive SHAP region, indicating prolonged crystal growth after nucleation. The C_2-HmIm/C_Zn value of 38 is also located within a positive SHAP region under the water-based environment. These factors collectively promote the formation of large ZIF-8 particles. In contrast, smaller particles are favoured by methanol-based synthesis and short reaction times, as observed for sample 8. Sample 3 exhibited a larger particle size than sample 8, which can be attributed mainly to its higher C_Zn. A higher C_Zn may increase the concentration of primary nuclei and enhance the probability of inter-particle collision, aggregation, and secondary particle formation, ultimately contributing to larger particle sizes. Therefore, reducing C_Zn, together with using methanol-based synthesis and limiting prolonged growth, can be an effective strategy for obtaining smaller ZIF-8 particles within the investigated synthesis space. These results support the robustness of the proposed framework and demonstrate its capability to guide the synthesis of ZIF-8 with targeted particle sizes across the investigated reaction conditions.
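The way positive and negative contributions are read in the discussion above can be illustrated with an exact Shapley attribution on a toy model. This is a sketch under strong assumptions: `toy_size_model` and its coefficients are invented for illustration and are not the trained CB model, and the attribution is computed against a single baseline condition by enumerating feature orderings, whereas SHAP analyses of tree models typically rely on specialized algorithms and a background dataset.

```python
from itertools import permutations


def toy_size_model(c):
    """Invented particle-size response (nm); NOT the trained CB model."""
    return (100.0 + 150.0 * c["water"] + 0.1 * c["time_min"]
            + 0.05 * c["water"] * c["time_min"] + 1.0 * c["ratio"])


def shapley_values(model, x, baseline):
    """Exact Shapley values of `x` relative to `baseline` via all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]  # switch this feature from baseline to its actual value
            value = model(current)
            totals[name] += value - previous  # marginal contribution in this ordering
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical baseline (methanol, short time) and sample (water, long time).
baseline = {"water": 0, "time_min": 60.0, "ratio": 10.0}
sample = {"water": 1, "time_min": 1440.0, "ratio": 38.0}

phi = shapley_values(toy_size_model, sample, baseline)
print(phi)
print(sum(phi.values()), toy_size_model(sample) - toy_size_model(baseline))
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that allows each descriptor's value to be read as pushing the predicted size up or down. In this toy case all three contributions are positive, analogous to the situation described for sample 7.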

Conclusions

In conclusion, this work establishes a materials process informatics framework for the predictive control of ZIF-8 particle size. Among the benchmarked algorithms, the optimized CB model achieved the best performance, with an R² of 0.90 on the test set. Interpretable SHAP analysis further identified the precursor concentration ratio and reaction time as the dominant parameters governing particle size. These findings provide insight into the particle formation mechanism by clarifying how process parameters affect nucleation and crystal growth. Experimental validation under synthesis conditions that were not included in the database, but that lie within the descriptor space it covers, confirmed the framework's predictive accuracy and robustness. However, the model remains limited in resolving temperature-dependent effects because of the narrow temperature variation represented in the current dataset. In addition, incomplete reporting of stirring speed and mixing intensity in the literature limits quantitative assessment of hydrodynamic influences on particle size distribution. Future work will focus on expanding the accessible synthesis space through automated and systematically designed experiments, while incorporating underrepresented process variables to improve model generalizability and predictive resolution. Overall, the framework developed here is readily transferable to other MOFs and related porous materials and highlights the broader potential of data-driven approaches to move synthesis research from empirical optimization toward predictive design.

Author contributions

Yuan Wang: writing – original draft, writing – review & editing, methodology, investigation, conceptualization. Heng Liu: writing – review & editing, methodology. Yusuke Hashimoto: writing – review & editing, methodology. Kazuyuki Iwase: writing – review & editing. Hao Li: supervision, writing – review & editing. Takaaki Tomai: supervision, writing – review & editing, resources.

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The database supporting this article has been uploaded to a digital MOF platform (DigMOF, https://www.digmof.org/). The curated dataset and source code supporting this article are available at https://github.com/digmof/MOF_size_prediction.

Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6sc03212e.

Acknowledgements

This work was supported by grants from the Japan Society for the Promotion of Science (JSPS), KAKENHI (Grant Numbers JP23K23113, JP25K01737, and JP24K23069) and JST SPRING (Grant Number JPMJSP2114).

Notes and references

J. L. C. Rowsell and O. M. Yaghi, Microporous Mesoporous Mater., 2004, 73, 3–14.
H. Furukawa, K. E. Cordova, M. O'Keeffe and O. M. Yaghi, Science, 2013, 341, 1230444.
N. Sun, S. S. A. Shah, Z. Lin, Y.-Z. Zheng, L. Jiao and H.-L. Jiang, Chem. Rev., 2025, 125, 2703–2792.
K. Heinz, S. M. J. Rogge, A. Kalytta-Mewes, D. Volkmer and H. Bunzen, Inorg. Chem. Front., 2023, 10, 4763–4772.
Y. Pan, T. Li, G. Lestari and Z. Lai, J. Membr. Sci., 2012, 390–391, 93–98.
C.-H. Kuo, Y. Tang, L.-Y. Chou, B. T. Sneed, C. N. Brodsky, Z. Zhao and C.-K. Tsung, J. Am. Chem. Soc., 2012, 134, 14345–14348.
T. Van Tran, H. Huy Dang, H. Nguyen, N. T. Thao Nguyen, D. Hai Nguyen and T. T. Thanh Nguyen, Nanoscale Adv., 2025, 7, 3941–3960.
K. S. Park, Z. Ni, A. P. Côté, J. Y. Choi, R. Huang, F. J. Uribe-Romo, H. K. Chae, M. O'Keeffe and O. M. Yaghi, Proc. Natl. Acad. Sci. U. S. A., 2006, 103, 10186–10191.
M. d. J. Velásquez-Hernández, R. Ricco, F. Carraro, F. T. Limpoco, M. Linares-Moreau, E. Leitner, H. Wiltsche, J. Rattenberger, H. Schröttner, P. Frühwirt, E. M. Stadler, G. Gescheidt, H. Amenitsch, C. J. Doonan and P. Falcaro, CrystEngComm, 2019, 21, 4538–4544.
O. M. Linder-Patton, T. J. de Prinse, S. Furukawa, S. G. Bell, K. Sumida, C. J. Doonan and C. J. Sumby, CrystEngComm, 2018, 20, 4926–4934.
L. Zhou, H. Li, D. Wang, W. Jiang, Y. Wu, L. Shang, C. Guo, C. Liu and B. Ren, Electroanalysis, 2023, 35, e202200158.
X. X. Wang, D. A. Cullen, Y.-T. Pan, S. Hwang, M. Wang, Z. Feng, J. Wang, M. H. Engelhard, H. Zhang, Y. He, Y. Shao, D. Su, K. L. More, J. S. Spendelow and G. Wu, Adv. Mater., 2018, 30, 1706758.
V. Armel, J. Hannauer and F. Jaouen, Catalysts, 2015, 5, 1333–1351.
M. J. C. Ordoñez, K. J. Balkus, J. P. Ferraris and I. H. Musselman, J. Membr. Sci., 2010, 361, 28–37.
M. Vinoba, M. Bhagiyalakshmi, Y. Alqaheem, A. A. Alomair, A. Pérez and M. S. Rana, Sep. Purif. Technol., 2017, 188, 431–450.
A. Schejn, L. Balan, V. Falk, L. Aranda, G. Medjahdi and R. Schneider, CrystEngComm, 2014, 16, 4493–4500.
J. Cravillon, S. Münzer, S.-J. Lohmeier, A. Feldhoff, K. Huber and M. Wiebcke, Chem. Mater., 2009, 21, 1410–1412.
S. Tanaka, K. Kida, M. Okita, Y. Ito and Y. Miyake, Chem. Lett., 2012, 41, 1337–1339.
S. R. Venna, J. B. Jasinski and M. A. Carreon, J. Am. Chem. Soc., 2010, 132, 18030–18033.
D. Kim, J. Park, J. Park, J. Jang, M. Han, S. Lim, D. Y. Ryu, J. You, W. Zhu, Y. Yamauchi and J. Kim, Small Methods, 2024, 8, 2400236.
M. I. Jordan and T. M. Mitchell, Science, 2015, 349, 255–260.
K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev and A. Walsh, Nature, 2018, 559, 547–555.
Z. Shi, W. Yang, X. Deng, C. Cai, Y. Yan, H. Liang, Z. Liu and Z. Qiao, Mol. Syst. Des. Eng., 2020, 5, 725–742.
M. Fernandez, P. G. Boyd, T. D. Daff, M. Z. Aghaji and T. K. Woo, J. Phys. Chem. Lett., 2014, 5, 3056–3060.
S. Alayou, M. Mengesha and G. Tizazu, Measurement, 2025, 253, 117785.
L. Fan, H. Yu, Y. He, J. Guo, G. Min and J. Wang, ACS Appl. Nano Mater., 2025, 8, 2682–2692.
J. Liu, Z. Zhang, X. Li, M. Zong, Y. Wang, S. Wang, P. Chen, Z. Wan, L. Liu, Y. Liang, W. Wang, S. Wang, X. Guo, E. G. Saldanha, K. M. Rosso and X. Zhang, Chem. Eng. J., 2023, 473, 145216.
F. Pellegrino, R. Isopescu, L. Pellutiè, F. Sordello, A. M. Rossi, E. Ortel, G. Martra, V.-D. Hodoroaba and V. Maurino, Sci. Rep., 2020, 10, 18910.
Q. Zhang, H. Liang, Y. Tao, J. Yang, B. Tang, R. Li, Y. Ma, L. Ji, X. Jiang and S. Li, Small Methods, 2022, 6, 2200208.
Y. Du, C. Sanchez and D. Du, Mater. Today Commun., 2025, 42, 111177.
J. A. Allegretto, D. Onna, S. A. Bilmes, O. Azzaroni and M. Rafti, Chem. Mater., 2024, 36, 5814–5825.
E. L. Bustamante, J. L. Fernández and J. M. Zamaro, J. Colloid Interface Sci., 2014, 424, 37–43.
P. W. Voorhees, J. Stat. Phys., 1985, 38, 231–252.
W. Xuan, R. Ramachandran, C. Zhao and F. Wang, J. Solid State Electrochem., 2018, 22, 3873–3881.
L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush and A. Gulin, arXiv, 2019, preprint, arXiv:1706.09516, DOI:10.48550/arXiv.1706.09516.
T. Chen and C. Guestrin, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, California, USA, 2016, pp. 785–794.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and É. Duchesnay, J. Mach. Learn. Res., 2011, 12, 2825–2830.
J. Bergstra and Y. Bengio, J. Mach. Learn. Res., 2012, 13, 281–305.
T. Chai and R. R. Draxler, Geosci. Model Dev., 2014, 7, 1247–1250.
L. Breiman, Mach. Learn., 2001, 45, 5–32.
G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye and T.-Y. Liu, Adv. Neural Inf. Process. Syst., 2017, 30.
L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush and A. Gulin, Adv. Neural Inf. Process. Syst., 2018, 31.
A. J. Smola and B. Schölkopf, Stat. Comput., 2004, 14, 199–222.
K. Hornik, M. Stinchcombe and H. White, Neural Networks, 1989, 2, 359–366.
T. Cover and P. Hart, IEEE Trans. Inf. Theory, 1967, 13, 21–27.
M. Malekmohammadi, S. Fatemi, M. Razavian and A. Nouralishahi, Solid State Sci., 2019, 91, 108–112.
A. A. Tezerjani, R. Halladj and S. Askari, RSC Adv., 2021, 11, 19914–19923.
Y. Pan, Y. Liu, G. Zeng, L. Zhao and Z. Lai, Chem. Commun., 2011, 47, 2071–2073.
J. Cravillon, R. Nayuk, S. Springer, A. Feldhoff, K. Huber and M. Wiebcke, Chem. Mater., 2011, 23, 2130–2141.
S. R. Venna, J. B. Jasinski and M. A. Carreon, J. Am. Chem. Soc., 2010, 132, 18030–18033.
J. A. Dirksen and T. A. Ring, Chem. Eng. Sci., 1991, 46, 2389–2427.
