Open Access Article
Shi Feng,a Jing Liu,c Zhuxin Li*c and Shengyang Tao*bc
aChina Nuclear Power Engineering Co., Ltd, Beijing, 100142, China
bState Key Laboratory of Fine Chemicals, Frontier Science Center for Smart Materials, Dalian University of Technology, Dalian, 116024, China. E-mail: taosy@dlut.edu.cn
cDalian Key Laboratory of Intelligent Chemistry, CR Belt and Road Joint Laboratory on Intelligent Chemistry and Advanced Materials of Liaoning Province, School of Chemistry, Dalian University of Technology, Dalian, 116024, China
First published on 26th May 2026
Precise size control of sodium alginate (SA) microspheres prepared by spray-precipitation is difficult because the process parameters are strongly coupled. We propose an intelligent framework that integrates adaptive Bayesian optimization (BO) with microfluidic spraying to maximize the yield of 90–110 µm microspheres. The framework uses a Gaussian process regression model initialized by Latin hypercube sampling. Initializing with 10 prior data points achieved convergence in 12 iterations, 29.4% fewer than the 17 iterations required with 5 priors. Under the optimal conditions, the predicted target droplet proportion (18.60%) closely matched the measured value (18.34%), and 18.75% of the SA gel microspheres fell within the target size range after curing. SHAP analysis further indicated that gas and liquid pressures dominate the size distribution and that multi-feature compensation explains the coexistence of multiple local optima. This study provides an efficient, low-cost strategy for customizing polymer microspheres and a machine-learning approach for optimizing complex multiphase flows.
Among various microsphere fabrication technologies, the spray-precipitation technique stands out as a core process for synthesizing gel microspheres owing to its operational simplicity, excellent continuous processing capability, and facile industrial scale-up.13–15 However, this method faces a critical challenge in practical applications: the rapid, precise, and tailored fabrication of microspheres with specific target sizes. The final microsphere size in the spray-crosslinking process is directly dictated by the droplet size formed during instantaneous atomization of the precursor solution, as well as by the uniformity of its spatial distribution.16–18 The atomization and breakup of droplets are jointly governed by multiple variables, including the physical properties of the precursor (e.g., viscosity and surface tension, which are concentration-dependent),19 operating parameters (gas and liquid pressures),20 and spray distance.21 Given the highly complex, nonlinear coupling among these variables, traditional one-factor-at-a-time (OFAT) methods or orthogonal experimental designs often require extensive, tedious trial-and-error experiments. Such approaches are not only time-consuming, labor-intensive, and costly, but also generally fail to pinpoint the global optimum within a vast parameter space.
To overcome this bottleneck, data-driven machine learning (ML) strategies have been increasingly integrated into the parameter optimization of chemical engineering processes and microfluidic systems in recent years.22–25 For high-cost, small-sample experimental optimization involving multiphase flows, Bayesian optimization (BO) offers clear advantages over traditional grid search or random search.26–28 BO employs a probabilistic surrogate model (such as Gaussian process regression, GPR)29 to approximate the objective function and uses an acquisition function to balance exploration of uncharted regions with exploitation of known high-reward areas. This adaptive learn-and-predict mechanism can approach the global optimum with few experimental evaluations, which makes it well suited to the black-box parameter-regulation problems inherent in microsphere fabrication.
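The surrogate-plus-acquisition idea described above can be illustrated with a minimal sketch. It assumes a zero-mean Gaussian process with a fixed squared-exponential kernel and an upper-confidence-bound (UCB) acquisition function on a one-dimensional toy problem; the kernel length scale, the `kappa` weight, and the toy data are illustrative assumptions, not values from this work.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between the rows of a and b (assumed length scale)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP with unit prior variance."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    solved = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.einsum("ij,ji->i", k_qt, solved)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def upper_confidence_bound(mean, std, kappa=2.0):
    """UCB acquisition: high mean favours exploitation, high std favours exploration."""
    return mean + kappa * std

# Toy data: three evaluated settings of one normalized parameter (hypothetical).
x_train = np.array([[0.1], [0.4], [0.9]])
y_train = np.array([0.2, 0.8, 0.3])
x_query = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

mean, std = gp_posterior(x_train, y_train, x_query)
next_x = float(x_query[np.argmax(upper_confidence_bound(mean, std))][0])
print(f"next setting to evaluate: {next_x:.2f}")
```

The predictive standard deviation collapses near already-measured settings and grows between them, so the acquisition function is drawn toward regions that are either promising or poorly explored.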
Based on the above background, this study proposes an intelligent spray process framework integrating adaptive Bayesian optimization (BO) with Gaussian process regression (GPR), aimed at achieving the rapid and precise tailored fabrication of SA microspheres with target sizes (Fig. 1). Utilizing a custom-built micro-spraying system equipped with a real-time laser particle size analyzer, we selected SA concentration, spray distance, gas pressure, and liquid pressure as independent variables. The optimization objective was to maximize the yield proportion of droplets within a specific size range (90–110 µm). During this process, Latin hypercube sampling (LHS) was employed to acquire highly representative prior data, and the impact of varying prior data sizes on the model's convergence efficiency and iteration cost was thoroughly investigated. Furthermore, to address the persistent black-box nature of machine learning models, this study introduced the game-theoretic SHapley Additive exPlanations (SHAP) interpretability model. By extracting the SHAP values of each parameter, we quantitatively elucidated the dominant roles and nonlinear synergistic mechanisms of physical properties and gas/liquid pressures in governing the microsphere size distribution. Consequently, the multiple local optima predicted by the model were scientifically interpreted from the dual perspectives of physics and chemical engineering. This work not only successfully overcomes the chronic limitation of traditional spray-precipitation methods in achieving precise size control, but also establishes a highly interpretable, dual-driven “AI + Experiment” paradigm at the methodological level for the efficient and cost-effective synthesis of smart materials.
The experimental apparatus included a multi-channel microfluidic pressure pump (OB1 MK3+, Elveflow, France) coupled with an oil-free air compressor (OTS-550, Taizhou Outes Industry and Trade Co., Ltd) to precisely control the flow rates of the gas and liquid phases. A spray laser particle size analyzer (DP-02, Zhuhai OMEC Instruments Co., Ltd) was utilized for the real-time online monitoring of the droplet size distribution. Microscopic morphological characterization was performed using an inverted biological microscope (ECLIPSE Ts2, Nikon, Japan). Furthermore, the solution preparation, processing, lyophilization, and weighing procedures of the samples involved the use of a heating magnetic stirrer (C-MAG HS4, IKA, Germany), an ultrasonic cleaner (KQ2200, Kunshan Ultrasonic Instruments Co., Ltd), a freeze dryer (Scientz-10NIA, Ningbo Scientz Biotechnology Co., Ltd), and a precision electronic balance (ME204E, Mettler Toledo, Switzerland), respectively. Additionally, the internal-mixing micro-spraying nozzle device,30 which constitutes the core of the experimental setup, was custom-designed and fabricated by our research group. The gas and liquid outputs are controlled by a multi-channel ElveFlow pressure controller. One channel is driven by the gas pressure to control the liquid flow, while the other channel serves as the gas source, facilitating the simultaneous control of the material flow and the gas output volume. The spray device uses an internally mixed three-stream air atomization nozzle (the diameter of the liquid outlet of the nozzle is 1 mm, the diameter of the gas outlet is 1.4 mm, and the gas outlet is 0.3 mm higher than the liquid outlet), which can form a uniform shear force compared to traditional two-stream nozzles, resulting in a better shear effect between gas and liquid phases. 
At the nozzle outlet, there is a significant relative speed between the gas and liquid phases, generating a large shear force, which causes the precursor to be atomized and dispersed into extremely fine droplets. To prevent long-term corrosion by organic solvents, the nozzle is made of stainless steel. The collection device uses a Petri dish made of polytetrafluoroethylene (PTFE) dual-phobic surface material, which reduces issues such as microsphere breakage and agglomeration.
The atomized droplets travelled downward over a specific spray distance (d) and traversed the detection optical path of the spray laser particle size analyzer. Based on Mie scattering theory, the detector captured the variations in laser scattering angle caused by droplets of different sizes (larger particles yield smaller scattering angles). This enabled real-time online calculation and output of the droplets' volume median diameter (D50) and size distribution range. Subsequently, the droplets fell into the underlying CaCl2 receiving bath, where the carboxyl groups on the SA molecular chains instantaneously underwent ionic crosslinking with Ca2+ in solution. The droplets were allowed to cure statically in the receiving bath for 5 min, yielding stable SA gel microspheres. After collection and washing, the actual size distribution of the cured microspheres was further characterized.
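The two quantities tracked throughout this work, D50 and the proportion of droplets between 90 and 110 µm, can both be derived from a binned volume distribution. The following is a minimal sketch under the assumption that the analyzer output is available as bin edges and volume fractions and that the volume is spread uniformly within each bin; the histogram values are hypothetical, not measured data.

```python
def volume_median_diameter(bin_edges, volume_fractions):
    """D50 by linear interpolation of the cumulative volume distribution."""
    total = sum(volume_fractions)
    cumulative = 0.0
    for lo, hi, frac in zip(bin_edges[:-1], bin_edges[1:], volume_fractions):
        share = frac / total
        if share > 0 and cumulative + share >= 0.5:
            return lo + (0.5 - cumulative) / share * (hi - lo)
        cumulative += share
    return bin_edges[-1]

def fraction_in_range(bin_edges, volume_fractions, lower=90.0, upper=110.0):
    """Volume fraction between lower and upper, assuming uniform volume within each bin."""
    total = sum(volume_fractions)
    inside = 0.0
    for lo, hi, frac in zip(bin_edges[:-1], bin_edges[1:], volume_fractions):
        overlap = max(0.0, min(hi, upper) - max(lo, lower))
        inside += frac * overlap / (hi - lo)
    return inside / total

# Hypothetical histogram: diameters in micrometres, volume in percent.
edges = [50, 70, 90, 110, 130, 150]
volumes = [10, 30, 20, 25, 15]

d50 = volume_median_diameter(edges, volumes)
target = fraction_in_range(edges, volumes)
print(f"D50 = {d50:.1f} um, target proportion = {100 * target:.1f}%")
```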
The full parameter space comprised 371,280 data points (28 × 13 × 34 × 30). The optimization objective was to maximize the proportion of droplets in the 90–110 µm size range (y2), while simultaneously monitoring the volume median diameter (D50; y1).
To ensure uniform distribution and diversity of the initial data across the multi-dimensional space, Latin hypercube sampling (LHS) was employed to generate 5 and 10 sets of experimental parameter combinations as prior datasets. Spray experiments were then conducted to obtain the actual droplet proportions, which were subsequently used as starting points for the iterations. Given the characteristics of the small-sample dataset, Gaussian process regression (GPR) was selected as the surrogate model for the Bayesian optimization.
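A Latin hypercube design of this kind can be sketched in a few lines. The example below assumes four continuous variables and takes the bounds from the extreme values that appear in the tables of this paper, since the bounds are not stated explicitly in this section; the variable names and the random seed are illustrative.

```python
import math
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order for each dimension
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Assumed bounds, read from the smallest and largest values listed in the tables.
BOUNDS = {
    "SA_wt_pct": (1.0, 2.8),
    "d_cm": (2.0, 15.0),
    "GP_mbar": (300.0, 2000.0),
    "LP_mbar": (500.0, 2000.0),
}

def scale_to_bounds(unit_point, bounds):
    """Map a point from the unit hypercube onto the physical parameter ranges."""
    return {
        name: round(lo + u * (hi - lo), 2)
        for u, (name, (lo, hi)) in zip(unit_point, bounds.items())
    }

rng = random.Random(0)
unit_points = latin_hypercube(10, len(BOUNDS), rng)
priors = [scale_to_bounds(p, BOUNDS) for p in unit_points]

# Size of the full factorial grid quoted in the text.
grid_size = math.prod([28, 13, 34, 30])
print(grid_size, priors[0])
```

Each variable range is split into as many strata as there are samples, and every stratum is used exactly once, which is what keeps a design of only 5 or 10 points spread across the whole space.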
Upon completion of the model's self-optimization, the system entered a closed-loop, iterative validation phase. First, the partitioned test set was used to evaluate the model's generalization performance, reporting the mean squared error (MSE), coefficient of determination (R²), and mean absolute error (MAE). Subsequently, the algorithm generated continuous combinations of independent variables within the predefined parameter boundaries and used the current best model to predict the target variable values along with their standard deviations. By screening globally for variable combinations that satisfied the boundary conditions, the parameter combination that maximized the predicted value of the target variable (i.e., the yield proportion of 90–110 µm droplets) was identified, and its predicted value, standard deviation, and optimal kernel function were output. This predicted optimal parameter set was then applied in actual spraying experiments for validation. If the error between the actual proportions measured in three consecutive experiments and the predicted value remained within a minimal range, the process optimization was deemed to have converged, and the iteration was terminated. If the convergence criteria were not met, the newly acquired experimental data and their corresponding actual droplet proportions were added to the training dataset, prompting the model to proceed to the next round of active learning. Finally, for the fully optimized model, the game-theoretic SHAP interpretability model was introduced to quantitatively evaluate the marginal contribution of each process feature to the microsphere proportion, thereby revealing the nonlinear coupling mechanisms among the parameters.
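The stopping rule can be sketched as a sliding-window check. The numerical tolerance is not given in the text, so the value of 1.0 percentage point used here is purely an assumption, as is the reading of "three consecutive experiments" as the three most recent entries of the run. The measured and predicted pairs are entries 17 to 22 of the run initiated with 5 sets of prior data.

```python
def has_converged(history, window=3, tolerance=1.0):
    """True if the last `window` (measured, predicted) pairs all agree within `tolerance`.

    `tolerance` is in percentage points and is an assumed value, not one from the paper.
    """
    if len(history) < window:
        return False
    return all(abs(measured - predicted) <= tolerance
               for measured, predicted in history[-window:])

# (measured %, predicted %) for entries 17-22 of the 5-prior run.
history = [
    (15.45, 23.86),
    (15.51, 21.17),
    (20.34, 20.34),
    (17.51, 18.42),
    (20.31, 20.34),
    (18.35, 17.47),
]

print(has_converged(history[:3]))  # entries 17-19 only
print(has_converged(history))      # all entries up to 22
```

If the check fails, the newly measured point is appended to the training data and the surrogate model is refitted before the next prediction.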
A parameter space of 371,280 data points was constructed. To efficiently locate the optimum within this multi-dimensional space, a Bayesian optimization algorithm based on an adaptive strategy was employed, with its operational workflow illustrated in Fig. 2.
This strategy overcomes the limitations of traditional fixed-direction searches. It constructs a surrogate model via GPR and adaptively traverses the Radial Basis Function (RBF), Matern, Rational Quadratic, and Dot Product kernels during each iteration. Consequently, it dynamically matches the optimal kernel based on the minimum mean squared error (MSE) criterion. Simultaneously, the algorithm optimizes the noise level parameter over a log-uniform distribution and the maximum number of restarts over an integer range, thereby effectively preventing the model from being trapped in local minima while ensuring the positive definiteness of the kernel matrix. Such an adaptive learning mechanism enables the model to continuously refine its estimation of the hyperparameter space based on prior data, laying a robust algorithmic foundation for subsequent highly efficient parameter optimization.
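The kernel-traversal step might look like the following sketch. It assumes scikit-learn, whose kernel classes share the names listed above; the software used in this work is not stated here, so this is an illustration rather than the authors' implementation. For brevity the noise level (`alpha`) and the number of optimizer restarts are fixed arguments instead of being searched, and the data are synthetic.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern, RationalQuadratic
from sklearn.metrics import mean_squared_error

warnings.filterwarnings("ignore", category=ConvergenceWarning)

def select_kernel(x_train, y_train, x_val, y_val, alpha=1e-2, n_restarts=2, seed=0):
    """Fit one GPR per candidate kernel and return (name, mse, model) with the lowest MSE."""
    candidates = {
        "RBF": RBF(),
        "Matern": Matern(nu=2.5),
        "RationalQuadratic": RationalQuadratic(),
        "DotProduct": DotProduct(),
    }
    best = None
    for name, kernel in candidates.items():
        model = GaussianProcessRegressor(
            kernel=kernel,
            alpha=alpha,                      # noise level added to the kernel diagonal
            n_restarts_optimizer=n_restarts,  # extra starts for the hyperparameter search
            normalize_y=True,
            random_state=seed,
        )
        try:
            model.fit(x_train, y_train)
        except np.linalg.LinAlgError:
            continue  # skip kernels whose matrix is not numerically positive definite
        mse = mean_squared_error(y_val, model.predict(x_val))
        if best is None or mse < best[1]:
            best = (name, mse, model)
    return best

# Synthetic stand-in for four normalized process variables (not experimental data).
rng = np.random.default_rng(0)
x = rng.random((16, 4))
y = np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 2]

best_name, best_mse, best_model = select_kernel(x[:12], y[:12], x[12:], y[12:])
print(best_name, round(best_mse, 4))
```

Repeating this selection at every iteration lets the surrogate switch kernels as new measurements arrive, which is the adaptive behaviour described above.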
Latin hypercube sampling was used to draw samples evenly from the 371,280 independent variable combinations formed by SA concentration, spray distance (d), gas pressure (GP), and liquid pressure (LP), thereby avoiding the over-concentration of sample points or the omission of critical regions. Based on this method, 5 and 10 sets of experimental parameter combinations were selected for actual spray measurements. The measured proportions of droplets in the 90–110 µm range served as prior data to initiate the optimization (Tables 1 and 2).
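Table 1 Experimental parameters and droplet proportions in the 90–110 µm range for the optimization initiated with 5 sets of prior data (MV: measured value; PV: predicted value)

| Entry | SA/wt% | d/cm | GP/mbar | LP/mbar | MV (%) | PV (%) |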
|---|---|---|---|---|---|---|
| 1 | 1.42 | 14 | 1250 | 1450 | 10.65 | — |
| 2 | 2.69 | 10 | 1400 | 1550 | 15.97 | — |
| 3 | 1.74 | 2 | 400 | 1750 | 9.42 | — |
| 4 | 1.32 | 13 | 1050 | 500 | 12.38 | — |
| 5 | 2.16 | 3 | 600 | 1900 | 17.33 | — |
| 6 | 2.60 | 2 | 350 | 600 | 0.00 | 20.50 |
| 7 | 2.20 | 3 | 600 | 1900 | 0.00 | 17.30 |
| 8 | 2.80 | 15 | 1850 | 2000 | 0.00 | 43.47 |
| 9 | 1.00 | 5 | 350 | 1250 | 9.92 | 16.00 |
| 10 | 2.80 | 2 | 950 | 1950 | 20.39 | 25.54 |
| 11 | 1.00 | 3 | 300 | 1450 | 1.89 | 0.00 |
| 12 | 2.80 | 2 | 450 | 500 | 23.91 | 43.94 |
| 13 | 2.70 | 15 | 1700 | 700 | 14.74 | 11.68 |
| 14 | 1.00 | 8 | 1200 | 800 | 12.61 | 14.76 |
| 15 | 1.00 | 2 | 2000 | 500 | 14.25 | 24.25 |
| 16 | 1.00 | 2 | 2000 | 500 | 14.69 | 14.00 |
| 17 | 2.80 | 2 | 450 | 500 | 15.45 | 23.86 |
| 18 | 2.80 | 2 | 450 | 500 | 15.51 | 21.17 |
| 19 | 2.80 | 2 | 950 | 1950 | 20.34 | 20.34 |
| 20 | 2.80 | 2 | 650 | 500 | 17.51 | 18.42 |
| 21 | 2.80 | 2 | 950 | 1950 | 20.31 | 20.34 |
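| 22 | 2.80 | 2 | 650 | 500 | 18.35 | 17.47 |

Table 2 Experimental parameters and droplet proportions in the 90–110 µm range for the optimization initiated with 10 sets of prior data (MV: measured value; PV: predicted value)

| Entry | SA/wt% | d/cm | GP/mbar | LP/mbar | MV (%) | PV (%) |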
|---|---|---|---|---|---|---|
| 1 | 2.38 | 4 | 400 | 2000 | 10.60 | — |
| 2 | 2.59 | 13 | 750 | 1550 | 0.00 | — |
| 3 | 2.69 | 7 | 1150 | 1250 | 18.85 | — |
| 4 | 1.74 | 15 | 950 | 500 | 15.59 | — |
| 5 | 1.00 | 11 | 600 | 1450 | 15.32 | — |
| 6 | 2.27 | 12 | 1100 | 1650 | 0.00 | — |
| 7 | 2.16 | 6 | 1500 | 1850 | 18.95 | — |
| 8 | 1.11 | 5 | 1300 | 900 | 14.85 | — |
| 9 | 1.21 | 14 | 1400 | 1100 | 13.16 | — |
| 10 | 1.42 | 10 | 1550 | 1900 | 13.48 | — |
| 11 | 1.00 | 2 | 1200 | 650 | 17.94 | 23.64 |
| 12 | 1.00 | 11 | 600 | 1450 | 12.87 | 15.30 |
| 13 | 1.00 | 2 | 750 | 1250 | 10.68 | 27.04 |
| 14 | 1.20 | 2 | 2000 | 500 | 0.00 | 26.71 |
| 15 | 1.00 | 2 | 550 | 500 | 8.76 | 16.40 |
| 16 | 1.00 | 2 | 450 | 1500 | 3.28 | 13.64 |
| 17 | 2.70 | 15 | 300 | 1550 | 18.71 | 24.87 |
| 18 | 2.70 | 15 | 350 | 1750 | 17.59 | 22.21 |
| 19 | 2.20 | 6 | 1500 | 1850 | 18.86 | 18.90 |
| 20 | 2.80 | 2 | 800 | 1650 | 11.96 | 17.62 |
| 21 | 2.70 | 9 | 800 | 2000 | 18.45 | 18.75 |
| 22 | 2.70 | 15 | 350 | 1750 | 18.34 | 18.60 |
Comparing the optimization trajectories for the two prior data sizes shows that the optimization initiated with 5 prior data points required 17 iterations to complete 22 experiments. Its predicted droplet proportion was 17.47%, in reasonable agreement with the measured value of 18.35%. In contrast, with 10 prior data points, 22 experiments were completed in only 12 iterations, and the predicted value of 18.60% closely matched the measured value of 18.34%. Although both approaches ultimately completed 22 experiments, starting from 10 prior data points reduced the number of iterations from 17 to 12, a reduction of 29.4%. Given that a single microfluidic spray experiment and the subsequent particle size measurement are relatively rapid (about 3 min), reducing the number of iterations directly decreases the overall optimization time. These results indicate that the size of the prior dataset affects the quality of the subsequent optimization. Sufficient prior data reduces the number of optimization iterations and yields more reasonable optimal variable combinations. Conversely, insufficient prior data leads to more iterations and is prone to local stagnation, in which the optimized parameters cluster at boundary values.
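The iteration counts and the quoted reduction follow directly from the experiment totals, as the short check below shows. The function name is illustrative.

```python
def iterations_needed(total_experiments, n_priors):
    """Each iteration adds one experiment on top of the prior dataset."""
    return total_experiments - n_priors

iters_with_5 = iterations_needed(22, 5)
iters_with_10 = iterations_needed(22, 10)
reduction_pct = 100 * (iters_with_5 - iters_with_10) / iters_with_5

print(iters_with_5, iters_with_10, round(reduction_pct, 1))  # 17 12 29.4
```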
Furthermore, a detailed analysis of the algorithm's convergence process is crucial for verifying the model's reliability. Convergence here means that the predicted values gradually approach the measured values as the number of iterations increases. As illustrated in Fig. 3, the model-predicted droplet yield proportion fluctuated markedly, both above and below the measured values, during the optimization period. These fluctuations reflect the exploratory behaviour of Bayesian optimization within the uncharted parameter space.32 Even if a single prediction coincidentally agrees with the measured value, it may merely indicate entrapment in a local extremum. Effective convergence is indicated only when the predicted values consistently and stably approximate the measured values across multiple iterations. Regardless of whether 5 or 10 prior data points were used, the model ultimately converged stably as the iterations progressed. In conjunction with Fig. 4, it can also be observed that as the number of experiments increases, the volume median diameter D50 of the micro-sprayed droplets gradually converges toward the expected 90–110 µm range. Concurrently, the yield of particles within the target range steadily increases, consistent with the process optimization objective.
Fig. 3 Actual and GPR-predicted droplet proportions versus experimental iterations for optimizations initiated with (a) five and (b) ten sets of prior data.
Fig. 4 Scatter plots of actual droplet proportion and D50 versus iteration number for optimizations initiated with (a) five and (b) ten sets of prior data.
To further validate the model's predictions and their practical applicability, droplets generated under the predicted optimal atomization conditions were collected in the calcium chloride receiving bath to crosslink and cure the SA microspheres. As illustrated in Fig. 5, when the experiment was conducted with the parameter combination derived from the 5 prior data points, the yield of cured gel microspheres in the 90–110 µm range reached 19.67%, corresponding to an original droplet proportion of 18.35%. With the parameters derived from the 10 prior data points, this microsphere proportion was 18.75%, corresponding to an original droplet proportion of 18.34%. Compared with the measured size distribution of the original droplets, the yield proportion of target-sized microspheres therefore deviated slightly after curing. This is primarily attributed to two physicochemical mechanisms. First, when the droplets fall into the receiving bath, the calcium ions in the solution instantaneously undergo ionic crosslinking with the carboxyl groups on the SA molecular chains. This induces conformational shrinkage of the polymer chains, thereby decreasing the overall size of the crosslinked gel microspheres relative to the original droplets. Second, during static curing, minor water evaporation from the system also affects the final size of the gel microspheres. Statistical analysis of the volume median diameter (D50) before and after crosslinking gave an average diameter shrinkage of approximately 4.5% to 6.0% in our experiments. In future practical production, introducing this average shrinkage rate (e.g., ∼5%) as a calibration coefficient into the objective function of the Bayesian model should further refine the predictive accuracy for the final product and narrow the gap between liquid droplet generation and solid microsphere formation. A small discrepancy between the microsphere and droplet proportions within the target size range is therefore expected. This further supports the view that the constructed Bayesian model can provide a reliable and flexible predictive benchmark for the tailored fabrication of target-sized microspheres.
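One possible reading of the proposed shrinkage calibration is to shift the droplet size window so that the cured microspheres, rather than the droplets, land in the 90–110 µm range. The sketch below assumes a single uniform linear shrinkage factor applied to every droplet; the function name is hypothetical, and the way the correction would actually enter the objective function is not specified in the text.

```python
def droplet_window_for_target(target_low, target_high, shrinkage):
    """Droplet diameter window that maps onto the target window after uniform shrinkage.

    Assumes cured diameter = droplet diameter * (1 - shrinkage).
    """
    factor = 1.0 - shrinkage
    return target_low / factor, target_high / factor

low, high = droplet_window_for_target(90.0, 110.0, shrinkage=0.05)
print(f"optimize droplets in {low:.1f}-{high:.1f} um")  # 94.7-115.8 um

# Bounds of the reported shrinkage range, 4.5% to 6.0%.
for s in (0.045, 0.060):
    lo_s, hi_s = droplet_window_for_target(90.0, 110.0, s)
    print(f"shrinkage {100 * s:.1f}%: {lo_s:.1f}-{hi_s:.1f} um")
```

Under this assumption, the objective would count droplets in the shifted window rather than in the nominal 90–110 µm window.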
The analysis reveals significant differences in the impact of varying prior data sizes on the model's training depth. For the 5 sets of prior data, prior to optimization (Fig. 6a and b), the scarcity of experimental data resulted in poor model generalization capability, and the mechanisms by which the four features influence the droplet proportion could not be clearly extracted. However, after completing the active learning iterations, the distribution range of SHAP values for the liquid pressure (LP) feature expanded significantly, making it a dominant factor governing the droplet proportion. This machine learning-derived conclusion is consistent with the established physical principle that gas and liquid pressures are the most prominent external driving forces for fluid shear atomization.20 Conversely, in the analysis of the 10 prior data sets (Fig. 6c and d), the spray distance (d) exerted a strong negative effect in some experiments; as the spray distance increased (indicated by the red, high-feature values), the predicted droplet proportion decreased significantly. From an aerodynamic perspective, this occurs because the droplet swarm undergoes severe dispersion and secondary spatial evolution over the extended flight distance, which broadens the initially concentrated particle size distribution. In contrast, the SHAP values for the SA concentration feature consistently clustered around zero after optimization, indicating that its marginal impact on the predicted target proportion is relatively minor within the current parameter window.
A comprehensive analysis of Fig. 6 further reveals a key global pattern: the extremely high values of each feature (represented by red points) tend to cluster at both ends of the SHAP value axis and therefore have a large absolute impact on the predicted values. From a data science perspective, this observation is consistent with the coexistence of multiple optimal parameter combinations discussed in Section 3.3. Owing to strong coupling among multiple variables, when a specific feature (such as SA concentration or spray distance d) reaches an extreme high value that may adversely affect the target particle size, the model can identify a compensatory pathway. Specifically, by substantially adjusting the gas pressure (GP) and liquid pressure (LP) to appropriate levels, the system rebalances the kinetic energy of fluid breakup against viscous dissipation. In this way, the yield proportion of droplets within the 90–110 µm range can still be maximized even under extreme conditions. Ultimately, this SHAP-based feature interpretation gives the optimization model physicochemical interpretability and provides a scientific rationale for the multi-dimensional regulation of complex multiphase flow processes.
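The attribution principle behind SHAP can be made concrete with an exact Shapley value calculation on a small four-feature model. This is a minimal sketch, not the SHAP package itself: the response function and its coefficients are hypothetical, absent features are replaced by a single assumed baseline point, and all feature subsets are enumerated, which is feasible only because there are four features.

```python
from itertools import combinations
from math import factorial

FEATURES = ("SA", "d", "GP", "LP")

def toy_model(x):
    """Hypothetical response with a GP x LP interaction; coefficients are illustrative."""
    return (10.0 + 0.5 * x["SA"] - 0.2 * x["d"] + 0.004 * x["GP"]
            + 0.003 * x["LP"] - 0.000002 * x["GP"] * x["LP"])

def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi

# Instance: entry 22 of the 10-prior run; baseline: an assumed mid-range setting.
instance = {"SA": 2.7, "d": 15, "GP": 350, "LP": 1750}
baseline = {"SA": 1.9, "d": 8, "GP": 1100, "LP": 1250}

phi = shapley_values(toy_model, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the GP and LP values share the effect of their interaction term, which is the kind of coupled behaviour the compensation argument above relies on.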
This journal is © The Royal Society of Chemistry 2026