Designing the ultrasonic treatment of nanoparticle-dispersions via machine learning

Christina Glaubitz; Barbara Rothen-Rutishauser; Marco Lattuada; Sandor Balog; Alke Petri-Fink

doi:10.1039/D2NR03240F

This Open Access Article is licensed under a Creative Commons Attribution-Non Commercial 3.0 Unported Licence

DOI: 10.1039/D2NR03240F (Paper) Nanoscale, 2022, 14, 12940-12950

Christina Glaubitz ^a, Barbara Rothen-Rutishauser ^a, Marco Lattuada ^b, Sandor Balog *^a and Alke Petri-Fink *^ab
^aAdolphe Merkle Institute, University of Fribourg, Chemin des Verdiers 4, 1700 Fribourg, Switzerland. E-mail: sandor.balog@unifr.ch; alke.fink@unifr.ch
^bChemistry Department, University of Fribourg, Chemin du Musée 9, 1700 Fribourg, Switzerland

Received 12th June 2022, Accepted 13th August 2022

First published on 15th August 2022

Abstract

Ultrasonication is a widely used and standardized method to redisperse nanopowders in liquids and to homogenize nanoparticle dispersions. One goal of sonication is to disrupt agglomerates without changing the intrinsic physicochemical properties of the primary particles. The outcome of sonication, however, is most of the time uncertain, and quantitative models have been beyond reach. The magnitude of this problem is considerable owing to the fact that the efficiency of sonication depends not only on the parameters of the actual device, but also on the physicochemical properties of the particle dispersion itself. As a consequence, sonication suffers from poor reproducibility. To tackle this problem, we propose to involve machine learning. By focusing on four nanoparticle types in aqueous dispersions, we combine supervised machine learning and dynamic light scattering to analyze the aggregate size after sonication, and demonstrate the potential to improve considerably the design and reproducibility of sonication experiments.

Engineered nanoparticles (ENPs) have been used in commercial products, such as foodstuff, cosmetics and personal care products, for over two decades.^1–3 Typical materials are zinc oxide, titania, ceria, and silica.^4,5 Consumers may find titania ENPs in toothpaste,⁶ and zinc oxide in sunscreen.⁷ Such particles are usually produced and sold in large quantities, and thus stored in a dry powdered form (nanopowder). When stored as powders, particles can form agglomerates, that is, single particles clumping together owing to attractive interparticle interactions.⁸ Given that the usefulness of ENPs is due to their specific and tunable physicochemical properties, which are mediated by the vast surface-to-volume ratio, powders have to be redispersed in a liquid before they are supplied to manufacturing processes. To redisperse powders, ultrasonication is the most frequently used method.^9,10 Indeed, sonication, acting via fluid cavitation, is preferred over ball milling or high shear mixing due to its greater efficiency.^11,12 Furthermore, ultrasonication is applied not only in industrial labs but in academic research labs as well, for preparing ENPs in toxicological and environmental studies.^11,13,14 If overdone, however, ultrasonication may considerably alter the physicochemical properties of the particles.^11,12,15,16 This effect has been observed especially for mass-produced metal oxide nanopowders, such as silica, titania or alumina, that are synthesized in the high temperature vapor phase.^12,15,16 For example, when it comes to hazard assessment addressing environmental or biological systems, any physicochemical change of the primary particles induced by sonication is undesired.¹¹ In such cases, the reduction of the primary particle size may strongly influence the pathway of cellular internalization,¹⁷ cytotoxicity,¹⁸ and biodistribution,¹⁹ and thus, the overall fate in the environmental milieu.²⁰ At the same time, compared to the primary particles, agglomerates may render particle dosimetry characterization more difficult,^21–23 and may also exhibit different hazards.²⁴

Dominantly, two approaches have been followed in the sonication of ENPs. On the one hand, a generic standard operating procedure (SOP) is carried out,^25–28 which enforces uniformity in the experimental settings but comes with the disadvantage of non-adjustable experimental parameters and the risk of unwanted changes in the particles, such as size reduction, shift in zeta-potential, or even the appearance of sonication-induced agglomerates.^11,29 On the other hand, custom-made protocols dedicated to given materials and given experimental conditions may be developed.^6,14,30–33 This is, in essence, the more attractive option, offering customization as well as the prospect of standardization. There is a caveat though: the parameter space is vast, and classical methods of optimization become severely limited by time and cost.^14,30–34

To get around this bottleneck, we approached the problem via machine learning (ML). ML is data-driven and known for its ability to surpass human performance in various situations by capturing non-intuitive, nonlinear and multivariate relationships. Using supervised ML, we built an algorithmic model that captures characteristic features of the complex relationships between (a) the outcome of redispersion and deagglomeration, and (b) the combinations of sonication parameters (such as particle concentration, dispersion volume, sonicator type, duration of sonication, and sonication power) and selected physicochemical properties of the particles (such as size, zeta-potential, isoelectric point, surface coating and material type). We use a gradient boosted decision tree algorithm and focus exclusively on four ENP types, for which we expect a similar response to ultrasonication. We characterize the outcome by dynamic light scattering (DLS), via the intensity-weighted Z-average hydrodynamic size (diameter) and the polydispersity index (PDI) of the particles. According to the fundamental principles of DLS, these two are the most reproducible parameters one can determine from DLS experiments.³⁵

Supervised ML aims at establishing a quantitative relationship between features (inputs) and labels (outputs). The outputs of sonication were the Z-average hydrodynamic size and the PDI, and the inputs were the sonication parameters and particle properties. Our data-driven ML approach was carried out in two consecutive studies (Fig. 1). In the first part of the study, we addressed well-defined (but not strictly monodisperse) model ENPs, which were synthesized, processed, and characterized in our own laboratory. For this, we synthesized aqueous dispersions of non-crystalline silicon dioxide nanoparticles (SiO₂ ENPs), commonly known as colloidal silica. The SiO₂ ENPs were synthesized with nominal diameters of roughly 40 nm, 70 nm, and 100 nm, respectively, using a co-condensation reaction adapted from Stöber et al.³⁶ Depending on the desired particle size, different relative amounts of ethanol, ammonia and water (MilliQ) were mixed and heated to 40 °C (ESI, Table SI1†). After stirring the mixture for 1 h at constant temperature for equilibration, tetraethyl orthosilicate was added, and the mixture was stirred for another 8 h. After the mixture had cooled down to room temperature, the SiO₂ ENPs were purified by centrifugation (Thermo Scientific, 5000g, 5, 10, 15 and 20 min, five cycles in total). To mimic the extensive degree of agglomeration in powders, the ENPs were agglomerated by changing the ionic strength,³⁷ that is, by adding magnesium chloride hexahydrate (50 g L⁻¹) to the aqueous dispersion. Agglomerated samples were purified again by dialysis and were resuspended in water before undergoing different sonication processes.


	Fig. 1 Workflow of our studies. In the first part of the study, experimental data are generated by characterizing agglomerated dispersions of model SiO₂ ENPs, before and after ultrasonication, using different, systematically combined sonication parameters. These data are used for supervising an ML algorithm, where the sonication parameters and particle sizes are fed into the algorithm as features (inputs), and the DLS results (Z-average and PDI) as the labels to be predicted (outputs). By mapping and approximating the functional relationships between labels and features, the goal of the ML model is to predict the outcome of ultrasonication in terms of DLS analysis. In the second part of the study, meta-analysis is performed on data mined from peer-reviewed publications addressing the ultrasonication of particle systems. We focused on oxides, namely ZnO, SiO₂, CeO₂ and TiO₂, given their evident presence in consumer products.^38,39

The parameter combinations were obtained via a generalized fractional factorial experimental design, which aims at extracting as much information as possible from the smallest possible number of experiments while still resolving the relationship between the parameters, their values, and the experimental outcome.⁴⁰ In this design of experiments we considered the primary particle size, particle concentration, dispersion volume, and duration and power of sonication, all of which have been found to affect the degree of deagglomeration of particle agglomerates.^11,12,41–44 We designed the sonication experiments using the pyDOE2-library,⁴⁵ and used a reduction factor of seven, which reduced the number of experiments from 108 (full factorial design) to 14, with every experiment carried out in triplicate. Each experiment combined one of two or three levels of particle concentration, water volume, sonicator type (probe vs. bath), duration of sonication, and effective energy and energy density of sonication (Table 1).
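To give a feeling for the size of the design space, the full factorial grid can be enumerated in a few lines. This is only a sketch: the volume, concentration, sonicator and duration levels are read from Table 1, whereas the two power levels ("low", "high") are an assumption made here for illustration, and the fractional selection performed by pyDOE2 is not reproduced.

```python
from itertools import product

# Factor levels. Volumes, concentrations, sonicator types and durations are
# taken from Table 1; the two power levels are an assumed placeholder.
levels = {
    "volume_mL": [1, 5, 10],
    "concentration_mg_per_mL": [1, 5, 10],
    "sonicator": ["probe", "bath"],
    "duration_min": [1, 20, 45],
    "power": ["low", "high"],
}

# Every combination of one level per factor (full factorial design).
full_factorial = [dict(zip(levels, combo)) for combo in product(*levels.values())]
n_full = len(full_factorial)

reduction = 7  # reduction factor quoted in the text
print(n_full)              # size of the full factorial grid
print(n_full / reduction)  # rough size of the reduced design
```

Under these assumed levels the grid contains 108 combinations, and dividing by the reduction factor gives a value of the same order as the 14 experiments reported.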

Table 1 The experimental design space of ultrasonication. These sonication parameter combinations were used for all three particle sizes. These parameters are features in the ML model. In the second study (meta-analysis), four additional features (particle properties) were included: isoelectric point, zeta-potential, material type, and surface coating

	Sample vol. (mL)	Particle conc. (mg mL⁻¹)	Type	Energy (J)	Duration (min)	Energy density (J mL⁻¹)
1	1	1	Probe	179	1	179
2	1	1	Probe	17256	20	17256
3	1	1	Bath	2025	45	2025
4	1	5	Probe	3576	20	3576
5	1	5	Probe	38826	45	38826
6	1	10	Probe	8046	45	8046
7	5	1	Probe	3576	20	715
8	5	1	Probe	38826	45	7765
9	5	5	Probe	8046	45	1609
10	5	10	Bath	194	1	39
11	10	1	Probe	8046	45	805
12	10	5	Bath	194	1	19
13	10	5	Bath	45	1	5
14	10	10	Bath	3880	20	388
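The last column of Table 1 follows from the two columns before it: the energy density is the delivered energy divided by the sample volume. A minimal check on a few rows copied from the table (the function name is chosen here for illustration):

```python
# (sample volume in mL, delivered energy in J, tabulated energy density in J/mL)
# Rows 1, 7, 10 and 11 of Table 1.
rows = {
    1: (1, 179, 179),
    7: (5, 3576, 715),
    10: (5, 194, 39),
    11: (10, 8046, 805),
}


def energy_density(energy_J, volume_mL):
    """Energy delivered per unit sample volume, in J/mL."""
    return energy_J / volume_mL


for idx, (volume, energy, tabulated) in rows.items():
    computed = energy_density(energy, volume)
    print(idx, round(computed), tabulated)
```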

The ultrasonication devices were a bath sonicator and a horn-probe sonicator (Elmasonic P 60 H, ELMA, and Branson SFX550 Sonifier equipped with a standard 13 mm diameter disruptor horn, Branson Ultrasonics Corp.). The sonication power and the corresponding energy release were calibrated by calorimetric measurements described elsewhere,²⁸ and the details of calibration are presented in the ESI (Tables SI3 and 4†). After sonication, the particles were characterized by DLS (Malvern Panalytical, Zetasizer Nano-ZS) at room temperature (25 °C). For DLS analysis, the given sample volumes (500, 100, and 50 μL) were diluted with 1 mL water (MilliQ) depending on the particle concentration (1, 5, and 10 g L⁻¹). Dilution was necessary to minimize any potential bias owing to multiple light scattering and to collective diffusion affecting Brownian dynamics.⁴⁶ Each diluted triplicate was then measured three times, and the auto-correlation functions were analyzed by the method of cumulants to determine the so-called Z-average and PDI.^35,47 Typical examples of the DLS field auto-correlation functions, and representative TEM micrographs of the agglomerates, are shown in the ESI (Fig. SI1, 2 and 4†). The Z-average and PDI are both functions of the intensity-weighted particle size distribution, which is affected by the parameters of sonication. Loosely speaking, the smaller the Z-average, the stronger the impact of sonication on the agglomerates. The overall goal of sonication is to redisperse (that is, disintegrate) agglomerates without altering the properties of the primary particles.
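For orientation, the textbook second-order cumulant relations behind these two quantities can be written as follows. The symbols are not defined in the text above and follow the usual DLS convention (mean decay rate \bar{\Gamma}, second cumulant \mu_2, scattering vector q, refractive index n, laser wavelength \lambda, scattering angle \theta, solvent viscosity \eta); this is a sketch of the standard relations, not a statement of the exact instrument settings used here.

```latex
\ln g_1(\tau) \simeq -\bar{\Gamma}\,\tau + \frac{\mu_2}{2}\,\tau^{2},
\qquad
\bar{\Gamma} = D\,q^{2},
\qquad
q = \frac{4\pi n}{\lambda}\,\sin\!\left(\frac{\theta}{2}\right)

d_{Z} = \frac{k_{\mathrm{B}} T}{3\pi\,\eta\,D},
\qquad
\mathrm{PDI} = \frac{\mu_2}{\bar{\Gamma}^{2}}
```

Here g_1 is the field auto-correlation function, D the apparent diffusion coefficient, and d_Z the Z-average hydrodynamic diameter obtained from the Stokes-Einstein relation.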

For this straightforward goal, it is essential to closely approximate the unknown but existing functional relationship between the inputs (parameters of sonication) and the corresponding outputs (Z-average and PDI). This task is called multivariate regression analysis, and it is used for predicting and forecasting. The functional relationship is nevertheless complex, and reliable and transferable quantitative regression models are not available yet, to the best of our knowledge. This is the point where we invoke supervised machine learning. The model we implemented is based on a gradient boosted decision tree (GBDT) algorithm using the XGBoost-library.⁴⁸ The benefit of using GBDT is that it offers good efficiency and flexibility while being relatively fast and relatively easy to implement as well as to interpret.^48–51 GBDTs are also well suited to limited datasets owing to their robustness in comparison with, e.g., a deep-learning model.⁵² Additionally, they enable a straightforward ranking of the feature importance. This offers insights into the decision-making of the model, and ranks the importance of the underlying physical processes during ultrasonication. The structure of decision trees is composed of nodes and branches, where branches make ‘one-way’ connections between the nodes. Besides the root node (the very first node) there are two elementary types of nodes: the first type is non-leaf nodes, which are internal crossroad junctions of the decision route in the tree, representing either an attribute (e.g., “ultrasonication via bath sonicator”, “ultrasonication via horn sonicator”) or a question (e.g., “is the particle size under 50 nm?”, “is the released energy density over 15 J mL⁻¹?”). The second type is leaf nodes at the end of the decision-making process, which offer the prediction of the label. To improve the predictive accuracy of decision trees, gradient boosting can be deployed, in which trees are trained in consecutive learning cycles. 
In these cycles, the final prediction of a tree is tested on a measured data point. If the tree fails to predict the target, another tree is built on this error until a tree is trained where the prediction and the measured value overlap. With this, the final tree model can map, generalize and compare rules from an ensemble of specific and individual observations. A set of observations forms a dataset, which is described by two main attributes: features (inputs: parameters of sonication and particle properties) and labels (outputs: Z-average and PDI).
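The idea of building each new tree on the error of the previous ones can be sketched with one-split trees ("stumps") in plain Python. This is a deliberately simplified illustration of gradient boosting for squared error, not the XGBoost implementation used in the study (which adds regularization, deeper trees and second-order information); the toy data and all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_trees=20, learning_rate=0.3):
    """Fit stumps sequentially, each one on the residuals of the current ensemble."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps, mse_history = [], []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, xi)
                       for p, xi in zip(predictions, x)]
        mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y))
    return base, stumps, mse_history


# Hypothetical toy data: log10 energy density (J/mL) vs. log10 Z-average (nm).
log_energy_density = [0.7, 1.3, 1.6, 2.3, 2.9, 3.2, 3.6, 3.9, 4.2, 4.6]
log_z_average = [2.9, 2.8, 2.8, 2.6, 2.5, 2.3, 2.2, 2.1, 2.1, 2.0]

base, stumps, mse_history = boost(log_energy_density, log_z_average)
print(mse_history[0], mse_history[-1])  # training error after the first and last tree
```

The training error never increases from one cycle to the next, which is why a separate validation set is needed to decide when to stop adding trees.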

Supervised ML has three main phases: training, validation, and testing. During training, the ML model is given known features and corresponding labels to find a relationship between the two. Over a certain number of training rounds, the predictive accuracy of the model is improved by altering the tree structure and the decision rules. To achieve this improvement, the predictive accuracy of the model has to be tested after every training step, which is called validation. Briefly, a part of the training data is withheld during training and used to test the success of the trained model. If the prediction for the validation data is incorrect, the model starts another learning round with an adjusted tree. In the testing phase, the ability to generalize and approximate these relationships is tested by quantifying the agreement between the ML prediction and so-far unseen data. To train, validate, and test the model, feature values must be formatted. First, categorical feature values (for example, the sonicator type: horn vs. bath) were transformed into numerical values by One-Hot encoding, using the Scikit-learn library in Python.⁵³ Encoding creates a ‘feature vector’ for each category of the parameter and fills it with either 1 or 0 to encode the presence or absence of the feature.^51,54,55 Second, feature values were power-transformed to approximate a standard normal distribution.^51,56,57 To define training and test data, we used random sampling and stratified splitting. Stratification was based on the average values of the labels (Z-average or PDI), and as a result of stratification, the training and test sets were balanced, in the sense that both were representative of the population of the observations we had at hand. Stratification forces the model to learn on the full range of label values, and thus, it promotes higher prediction quality. The training set and test set were non-intersecting, that is, they had no common element. 
This was important to prevent so-called data leakage, which is the simultaneous occurrence of data with identical feature–label combinations in the training as well as in the test set.⁵¹ Therefore, the experimental triplicates, which share identical features, were never split: any triplicate went as a whole either into the training or into the test set. We used 70% of the data in the training sets and 30% in the test sets. To train our ML model, the algorithm was presented with the features and the corresponding labels of the training set. During training, the model was repeatedly tested on a small set, which is referred to as the validation set. In our case, 20% of the training set was allocated to the validation set. The validation set was withheld from the actual training rounds, but it was used to test the prediction quality after single training steps. This was necessary to tune the hyperparameters of the model.⁵⁸ To optimize the values of the hyperparameters (ESI,† machine learning terms), we used the tree-structured Parzen estimator algorithm (maximum 200 trials) implemented in the Optuna library.⁵⁹ At the end of supervision, the quality of learning, that is, the corresponding prediction accuracy, was evaluated on the test set.⁵¹ The quality of prediction was quantified by the R² score, which is the coefficient of determination. R² is closely related to the square of the Pearson (linear) correlation coefficient (the two coincide for a least-squares linear fit), and quantifies the agreement between the experimentally measured values and the ML-predicted values. By definition R² = explained variation/total variation, and it typically takes values between 0 (no agreement) and 1 (perfect agreement). The structure of our ML approach is summarized in Fig. 2. The comparison between experimental values and values predicted by our ML model for the colloidal silica particles is shown in Fig. 3.
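The splitting rules described above (stratify on the label, keep replicates together, roughly 70/30) can be sketched in plain Python as follows. This is one possible way to implement such a split under simple assumptions, not the code used in the study; the function name, the number of strata and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def grouped_stratified_split(records, test_fraction=0.3, n_strata=3, seed=0):
    """Split (group_id, label) records so that replicates of a group stay together.

    Groups are ranked by their mean label and cut into strata; roughly
    `test_fraction` of the groups in every stratum is sent to the test set.
    """
    by_group = defaultdict(list)
    for group_id, label in records:
        by_group[group_id].append(label)

    ranked = sorted(by_group, key=lambda g: sum(by_group[g]) / len(by_group[g]))
    size = -(-len(ranked) // n_strata)  # ceiling division
    strata = [ranked[i:i + size] for i in range(0, len(ranked), size)]

    rng = random.Random(seed)
    held_out = set()
    for stratum in strata:
        n_test = max(1, round(test_fraction * len(stratum)))
        held_out.update(rng.sample(stratum, n_test))

    train = [r for r in records if r[0] not in held_out]
    test = [r for r in records if r[0] in held_out]
    return train, test


# Hypothetical data: 12 experiments in triplicate, label = log10 Z-average.
noise = random.Random(1)
records = [(exp_id, 2.0 + 0.08 * exp_id + noise.gauss(0, 0.02))
           for exp_id in range(12) for _ in range(3)]

train, test = grouped_stratified_split(records)
train_groups = {g for g, _ in train}
test_groups = {g for g, _ in test}
print(len(train), len(test), train_groups & test_groups)
```

Because whole groups are assigned to one side, no triplicate contributes to both sets, which is the leakage safeguard described in the text.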


	Fig. 2 The main units of our supervised machine learning. (a) Definition of data subsets we used for training, validating and testing the ML model. Data splitting is done randomly, but in a stratified manner. The goal of stratification is to obtain balanced subsets wherein the data population is represented equally. (b) Depiction of the algorithm of decision trees, where decisions are taken at nodes and propagate to the next nodes via branches. The validation set is used for hyperparameter optimization, which is achieved by adjusting the number of nodes and branches of the trees. Once the training has finished, the best model is ‘blind-tested’ with data unseen during training. (c) If the model passes this evaluation step, it may be useful for interpreting the relationship between inputs (features) and outputs (labels) by ranking the importance of the features in the ML model prediction.


	Fig. 3 Parity plots of log₁₀ Z-average and log₁₀ PDI values of the SiO₂ ENPs synthesized, processed, and characterized in our lab. Data in gray color indicate ten independent training and testing rounds, while the data in turquoise/red color highlight the most successful training (88 data points) and testing (38 data points). The two models for Z-average and PDI show an R² of 0.76 and 0.75, respectively. These values correspond to a linear correlation coefficient of about 0.87. The dashed black lines indicate perfect predictions (R² = 1). A distribution of the R² score for the test set of 100 newly randomly seeded and trained models can be found in Fig. SI7.†
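The two scores quoted in the caption can be computed from a parity plot as sketched below. The measured and predicted values are hypothetical placeholders, and the helper names are chosen for illustration. Note that for an arbitrary model the coefficient of determination (1 minus residual over total variation) cannot exceed the squared Pearson coefficient, and the two coincide only when the predictions lie on the least-squares line through the data.

```python
import math


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - residual variation / total variation."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    """Pearson (linear) correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical log10 Z-average values (measured vs. ML-predicted).
measured = [2.05, 2.15, 2.30, 2.45, 2.60, 2.75, 2.90]
predicted = [2.10, 2.12, 2.35, 2.40, 2.55, 2.80, 2.78]

r2 = r2_score(measured, predicted)
r = pearson_r(measured, predicted)
print(r2, r ** 2)

# Square roots of the R² values quoted in the caption.
print(round(math.sqrt(0.76), 2), round(math.sqrt(0.75), 2))
```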

To test the ability of our ML model to extrapolate, we predicted labels whose features were not from the interval of the training set. For this, we synthesized and characterized a new batch of particles (approx. 80 nm SiO₂ ENP) and constructed a new experimental design (Table 2) with new parameter levels. Apart from one instance (experiment 2), the agreement between experimental and predicted triplicates is very good, with predicted and measured values agreeing within their stated uncertainties, but the model struggles with accurately predicting larger Z-average values. First, this is in part due to the fact that agglomerates are very heterogeneous in size, and the degree of heterogeneity scales with size. Therefore, the larger the mean aggregate size, the broader the size distribution, and thus, the larger the expected noise.⁶⁰ Second, the uncertainty of bath sonication is larger than that of probe sonication, and the noise in the corresponding data points is larger. This indicates some challenges in the reproducibility of bath sonication, likely due to variations in the experimental conditions, such as water and room temperature, relative humidity, bath volume and the vertical and horizontal position of the sonicated vessel.^11,61

Table 2 The performance of our ML model on the sonication experiment of 80 nm diameter SiO₂ particles. The ML model was neither trained nor validated nor tested on these particles beforehand

	Parameters of sonication						Result
	Sample vol. (mL)	Particle conc. (mg mL⁻¹)	Type	Amplitude (%)	Duration (min)	Energy density (J mL⁻¹)	Predicted Z-ave (nm)	Measured Z-ave (nm)	Predicted PDI	Measured PDI
1	2	2	Bath	50	5	258	337 ± 67	324 ± 61	0.38 ± 0.12	0.30 ± 0.23
2	7.5	2	Bath	50	30	298	239 ± 22	796 ± 26	0.21 ± 0.10	0.50 ± 0.13
3	7.5	2	Probe	20	5	322	124 ± 17	141 ± 2	0.08 ± 0.04	0.10 ± 0.01
4	7.5	7.5	Bath	50	5	69	325 ± 40	331 ± 33	0.44 ± 0.25	0.31 ± 0.18
5	2	7.5	Probe	20	30	3878	150 ± 39	112 ± 5	0.12 ± 0.11	0.05 ± 0.01

After successfully constructing the ML model and predicting the outcome of ultrasonicating the silica particles, in the second part of this study we applied our ML approach to the meta-analysis of published and peer-reviewed work on the sonication and DLS characterization of oxide particles, such as ZnO, CeO₂ and TiO₂. Following the guidelines of Field and Gillett,⁶² we compiled a set of 203 data points collected from 12 peer-reviewed articles.^12,30,31,43,63–70 Articles relevant to the project were searched online, using combinations of the keywords “ultrasonication”, “nanoparticles”, and “oxide”. Compared to the laboratory study, the number of features (Table 1) could be increased by adding particle properties like zeta-potential in water, surface hydrophobicity/hydrophilicity, and isoelectric point. Owing to the larger dataset, the ML model performed better (Fig. 4), while the structure of supervising the machine learning algorithm was very similar to the lab-based study.


	Fig. 4 Parity plots of log₁₀ Z-average and log₁₀ PDI values of oxide ENPs synthesized, processed and characterized elsewhere (meta-analysis). Data in gray color indicate the progress of ten independent training and testing rounds, and data in turquoise/red color show the best models. We had a total of 383 data points for training and 289 data points for testing, and compared to the lab-based ML analysis, we achieved better performance (R² = 0.82 and 0.84 for the Z-average and PDI, which correspond to a linear correlation coefficient higher than 0.9). The dashed black lines indicate perfect predictions (R² = 1). A distribution of the R² score for the test set of 100 newly randomly seeded and trained models can be found in Fig. SI7.†

Nevertheless, similar to the lab-based study, the accuracy is lower at large Z-average values and for bath-sonicated samples. As the final evaluation of the performance of our ML model, we tested two commercially available ENPs synthesized on a large scale (Aeroxide TiO₂ P 25 and Aerosil SiO₂ 200, both by Evonik Operations GmbH) following the experimental design detailed in Table 2. The values predicted by the ML model and the values measured by DLS are listed in Table 3.

Table 3 The performance of our ML model on the sonication of commercialized large-scale produced ENPs. The ML model was neither trained nor validated nor tested on these parameter combinations beforehand

	Aeroxide (TiO₂)				Aerosil (SiO₂)
	Predicted size (nm)	Measured size (nm)	Predicted PDI	Measured PDI	Predicted size (nm)	Measured size (nm)	Predicted PDI	Measured PDI
1	289 ± 14	302 ± 15	0.40 ± 0.09	0.42 ± 0.08	501 ± 41	476 ± 51	0.71 ± 0.34	0.64 ± 0.24
2	261 ± 20	922 ± 63	0.23 ± 0.17	0.61 ± 0.17	171 ± 53	309 ± 39	0.24 ± 0.08	0.57 ± 0.15
3	371 ± 29	352 ± 13	0.34 ± 0.10	0.38 ± 0.07	181 ± 17	173 ± 8	0.09 ± 0.08	0.13 ± 0.02
4	275 ± 36	285 ± 51	0.38 ± 0.09	0.40 ± 0.09	339 ± 87	306 ± 8	0.66 ± 0.22	0.57 ± 0.19
5	143 ± 12	144 ± 26	0.22 ± 0.09	0.19 ± 0.07	114 ± 17	123 ± 2	0.29 ± 0.10	0.34 ± 0.02
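A quick way to read Table 3 is to compute the relative deviation between predicted and measured size for every experiment. The sketch below uses the mean sizes copied from the table (uncertainties omitted); the 25% cut-off used to flag a row is an arbitrary, hypothetical choice.

```python
# Mean sizes (nm) from Table 3, experiments 1 to 5.
table3_sizes_nm = {
    "Aeroxide TiO2": {"predicted": [289, 261, 371, 275, 143],
                      "measured": [302, 922, 352, 285, 144]},
    "Aerosil SiO2": {"predicted": [501, 171, 181, 339, 114],
                     "measured": [476, 309, 173, 306, 123]},
}

THRESHOLD = 0.25  # hypothetical cut-off for flagging a poor prediction


def relative_deviation(predicted, measured):
    """Absolute deviation of the prediction relative to the measured value."""
    return abs(predicted - measured) / measured


flagged = {}
for material, sizes in table3_sizes_nm.items():
    deviations = [relative_deviation(p, m)
                  for p, m in zip(sizes["predicted"], sizes["measured"])]
    flagged[material] = [i + 1 for i, d in enumerate(deviations) if d > THRESHOLD]
    print(material, [f"{100 * d:.0f}%" for d in deviations])

print(flagged)
```

With this cut-off only experiment 2 is flagged for both materials, which in the design of Table 2 is the 30 min bath treatment, in line with the lower accuracy noted for bath-sonicated samples and large Z-average values.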

Next, we were interested in identifying the relative importance of the individual ultrasonication parameters and particle properties that we used as inputs for the outcome of the predictions. With this, we were also able to see which parameters are the most decisive, in the eyes of our model, in the process of particle ultrasonication, giving us insights into the underlying physical process of ultrasonication. To obtain these insights, we performed a so-called feature importance analysis (FIA). Feature importance analysis strongly supports the interpretation of ML predictions.⁷¹ FIA, in essence, assigns a score to each feature used in the ML model, based on its relative importance in predicting the label values. The higher the score, the greater the influence of the feature. Hence, if one wants to optimize ultrasonication experiments in the future, it will be time-saving to start by tweaking the most influential parameter and then to follow the importance hierarchy. Our FIA is based on the so-called Shapley values.⁷² Shapley values were invented in the field of cooperative games, and loosely speaking, they are intended to establish a basis of merit-based payoff, by quantifying the marginal contributions of the players of a team in a given game and the associated reward.⁷³ In a cooperative game, the success of each player depends not only on what they do, but also on how the players cooperate. Accordingly, the most useful team-player gets the highest share of the reward. Calculating Shapley values is computationally very intensive, and thus, we computed them with the method introduced by Lundberg and Lee (SHapley Additive exPlanations, SHAP).⁷⁴ According to the SHAP values, in our ML model trained on the silica ENPs synthesized and processed in our laboratory, the sonicator type has the highest influence on the deagglomeration success and the obtained PDI (Fig. 5a and b). 
This most likely reflects the fact that the available range of sonication energy depends on the type of sonication. The comparison of SHAP values between the models for PDI and Z-average shows only a marginal difference. In the meta-analysis, it is interesting to see that in the ML model, the isoelectric point, the zeta-potential, and the surface coating show only a low influence on the ultrasonication process: their cumulative share is less than 15% (Fig. 5c and d). Apart from the energy-related quantities, the material type and particle size are, however, important features of the model. Their influence may be interpreted by using colloidal science: the amplitude of the van der Waals forces that bind the particle agglomerates depends on particle size and particle material,^75,76 and thus the model picks up their important role in the binding energy of the particle agglomerates and their influence on cluster deagglomeration.
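The cooperative-game picture behind Shapley values can be made concrete with an exact calculation for a three-player game. The coalition payoffs below are invented for illustration and the feature names are placeholders; SHAP, as used in the study, relies on efficient approximations tailored to tree models rather than on this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical payoff (e.g. explained variation) of every coalition of features.
payoff = {
    frozenset(): 0.0,
    frozenset({"sonicator_type"}): 0.40,
    frozenset({"energy_density"}): 0.30,
    frozenset({"particle_size"}): 0.10,
    frozenset({"sonicator_type", "energy_density"}): 0.60,
    frozenset({"sonicator_type", "particle_size"}): 0.50,
    frozenset({"energy_density", "particle_size"}): 0.45,
    frozenset({"sonicator_type", "energy_density", "particle_size"}): 0.80,
}

players = ["sonicator_type", "energy_density", "particle_size"]
phi = shapley_values(players, payoff.__getitem__)

# Normalized shares, analogous to the slices of a pie chart.
total = sum(phi.values())
shares = {p: v / total for p, v in phi.items()}
print(phi)
print(shares)
```

The individual values add up to the payoff of the full coalition, which is what makes a normalized, pie-chart style presentation of feature importance possible. The number of orderings grows factorially with the number of features, which illustrates why exact evaluation quickly becomes impractical.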


	Fig. 5 Feature importance analysis by normalized Shapley values. The more important the feature in our ML model, the higher the share in the pie chart. Parameters relating to the sonication process are presented in a cold (blue/green) color palette, and parameters relating to particle physicochemical properties are shown in hot (red/yellow) color palette. (a) Analysis of Z-average for model SiO₂ NPs synthesized, processed and characterized in our lab. (b) Analysis of PDI for model SiO₂ NPs. (c) Analysis of Z-average for oxide NPs synthesized, processed and characterized elsewhere. (d) Analysis of PDI for oxide NPs.

Last but not least, we acknowledge that our study has limits, which point out potential subjects for future studies. First, while DLS is one of the most frequently used methods in the characterization of particle dispersions, it is an ensemble technique that is very sensitive to outliers, which may lead to bias, and thus requires carefully prepared and reproducible samples. Therefore, while DLS has its own merits, other in situ, and perhaps more robust, characterization techniques (such as particle tracking analysis, Taylor dispersion analysis, small-angle scattering and diffraction methods) may serve the purpose equally well, if not better. Our choice of experimental characterization technique was due to the widespread use and accessibility of DLS, and therefore the largest number of published data points. Second, our ML model was developed on so-called horn and bath ultrasonicators, but cup horn sonication, also a frequently used device type, is not addressed in this study due to the lack of published data. Third, sonication may benefit from the use of dispersing agents, but we do not address their presence and role here. Fourth, in this study, we concentrated on a given class of ENPs with somewhat similar physicochemical properties, but other highly relevant materials, such as iron oxides, aluminum oxides, quantum dots, carbon nanotubes, or even particle mixtures were not addressed. Additionally, the model might show lower predictability for data points outside the range the model was trained on, e.g., micro-sized particles or different particle morphologies like sheets or wires. Fifth, while our analyses are sound, we are able to offer only a hierarchy of importance and degree of association of the features to interpret the prediction of our ML model. Therefore, a detailed mechanistic understanding of the model and a causal inference are missing. 
To describe the ML model in quantitative detail in terms of cause and effect (by, for example, closed-form analytic algebraic expressions) is beyond our current capacity. Explainable and fully transparent ML is an active field of debate,^77–81 which, however, concerns not only us, but any ML model for which the available information content (data) is limited.

As a final note for outlook, we co-published a web-based application with a graphical user interface (https://sonipredict.herokuapp.com/) where we offer quantitative guidance for designing sonication processes. While the application in its current form is based on the ML approach presented here, by creating a larger data bank we hope to extend the model to cover an increasing number of parameters and experimental scenarios of greater complexity, such as different dispersants and material types and sizes. The ML model can be greatly improved by incorporating new data, as any ML model learns best on data of high quality and of high volume, and we therefore call on the community to support us and send their own results on ultrasonicated nanoparticle dispersions. To collect more data, we also co-published a new database (https://tineglaubitz-sonidb-app-8z3bkw.streamlitapp.com/) where researchers can submit their data points. With a collective effort we aim at improving ML analyses and promoting reproducibility in this impactful field.

Abbreviations

DLS	Dynamic light scattering
ENP	Engineered nanoparticle
GBDT	Gradient boosted decision tree
ML	Machine learning
PDI	Polydispersity index

Author contributions

CG conceived the idea, synthesized the particles, developed the experimental design, performed and analyzed the experiments, and built and trained the ML model. SB supervised CG in project management and data analysis. CG and SB wrote the manuscript with contributions from AF, BRR, and ML. AF, BRR, and ML jointly supervised CG during the experiments.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

The authors are grateful for the support of Dr Patricia Taladriz-Blanco, Laetitia Haeni, Liliane Ackermann Hirschi, Dr Miroslava Nedyalkova and Dr Christoph Geers. The authors are grateful for the financial support of the Adolphe Merkle Foundation, the University of Fribourg, and the Swiss National Science Foundation through the National Centre of Competence in Research Bio-Inspired Materials and the grant SNF no. 200020_184635.

References

B. Swain, J. R. Park, K.-S. Park and C. G. Lee, Synthesis of cosmetic grade TiO2-SiO2 core-shell powder from mechanically milled TiO2 nanopowder for commercial mass production, Mater. Sci. Eng., C, 2019, 95, 95–103.
S. Polat, E. Ağçam, B. Dündar and A. Akyildiz, Nanoparticles in Food Packaging: Opportunities and Challenges, in Health and Safety Aspects of Food Processing Technologies, Springer International Publishing, 2019, pp. 577–611.
S. Raj, U. Sumod, S. Jose and M. Sabitha, Nanotechnology in cosmetics: Opportunities and challenges, J. Pharm. BioAllied Sci., 2012, 4(3), 186–193.
R. Kessler, Engineered Nanoparticles in Consumer Products: Understanding a New Ingredient, Environ. Health Perspect., 2011, 119(3), A120–A125.
J. Dahle and Y. Arai, Environmental Geochemistry of Cerium: Applications and Toxicology of Cerium Oxide Nanoparticles, Int. J. Environ. Res. Public Health, 2015, 12(2), 1253–1278.
F. Al-Salman, A. Ali Redha, H. Al-Shaikh, L. Hazeem and S. Taha, Toxicity evaluation of TiO2 nanoparticles embedded in toothpaste products, GSC Biol. Pharm. Sci., 2020, 12(1), 102–115.
B. Gulson, M. McCall, M. Korsch, L. Gomez, P. Casey, Y. Oytam, A. Taylor, M. McCulloch, J. Trotter, L. Kinsley and G. Greenoak, Small Amounts of Zinc from Zinc Oxide Particles in Sunscreens Applied Outdoors Are Absorbed through Human Skin, Toxicol. Sci., 2010, 118(1), 140–149.
S. Shrestha, B. Wang and P. Dutta, Nanoparticle processing: Understanding and controlling aggregation, Adv. Colloid Interface Sci., 2020, 279, 102162.
H. Feng, G. Barbosa-Canovas and J. Weiss, Ultrasound Technologies for Food and Bioprocessing, Springer, 2011.
F. Chemat, Zill-e-Huma and M. K. Khan, Applications of ultrasound in food technology: Processing, preservation and extraction, Ultrason. Sonochem., 2011, 18(4), 813–835.
J. S. Taurozzi, V. A. Hackley and M. R. Wiesner, Ultrasonic dispersion of nanoparticles for environmental, health and safety assessment–issues and recommendations, Nanotoxicology, 2011, 5(4), 711–729.
N. Mandzy, E. Grulke and T. Druffel, Breakage of TiO2 agglomerates in electrostatically stabilized aqueous dispersions, Powder Technol., 2005, 160(2), 121–126.
P. Laux, C. Riebeling, A. M. Booth, J. D. Brain, J. Brunner, C. Cerrillo, O. Creutzenberg, I. Estrela-Lopis, T. Gebel, G. Johanson, H. Jungnickel, H. Kock, J. Tentschert, A. Tlili, A. Schäffer, A. J. A. M. Sips, R. A. Yokel and A. Luch, Challenges in characterizing the environmental fate and effects of carbon nanotubes and inorganic nanomaterials in aquatic systems, Environ. Sci.: Nano, 2018, 5(1), 48–63.
J. M. Cohen, J. Beltran-Huarac, G. Pyrgiotakis and P. Demokritou, Effective delivery of sonication energy to fast settling and agglomerating nanomaterial suspensions for cellular studies: Implications for stability, particle kinetics, dosimetry and toxicity, NanoImpact, 2018, 10, 81–86.
M. Aoki, T. A. Ring and J. S. Haggerty, Analysis and Modeling of the Ultrasonic Dispersion Technique, Adv. Ceram. Mater., 1987, 2(3A), 209–212.
O. Vasylkiv and Y. Sakka, Synthesis and Colloidal Processing of Zirconia Nanopowder, J. Am. Ceram. Soc., 2001, 84(11), 2489–2494.
J. Rejman, V. Oberle, I. S. Zuhorn and D. Hoekstra, Size-dependent internalization of particles via the pathways of clathrin- and caveolae-mediated endocytosis, Biochem. J., 2004, 377(1), 159–169.
Y. Pan, S. Neuss, A. Leifert, M. Fischler, F. Wen, U. Simon, G. Schmid, W. Brandau and W. Jahnen-Dechent, Size-Dependent Cytotoxicity of Gold Nanoparticles, Small, 2007, 3(11), 1941–1949.
S.-D. Li and L. Huang, Pharmacokinetics and Biodistribution of Nanoparticles, Mol. Pharm., 2008, 5(4), 496–504.
T. K. Darlington, A. M. Neigh, M. T. Spencer, O. T. Nguyen and S. J. Oldenburg, Nanoparticle characteristics affecting environmental fate and transport through soil, Environ. Toxicol. Chem., 2009, 28(6), 1191.
V. Hirsch, C. Kinnear, L. Rodriguez-Lorenzo, C. A. Monnier, B. Rothen-Rutishauser, S. Balog and A. Petri-Fink, In vitro dosimetry of agglomerates, Nanoscale, 2014, 6(13), 7325–7331.
J. M. Cohen, J. G. Teeguarden and P. Demokritou, An integrated approach for the in vitro dosimetry of engineered nanomaterials, Part. Fibre Toxicol., 2014, 11(1), 20.
L. Rodriguez-Lorenzo, B. Rothen-Rutishauser, A. Petri-Fink and S. Balog, Nanoparticle Polydispersity Can Strongly Affect In Vitro Dose, Part. Part. Syst. Charact., 2015, 32(3), 321–333.
S. Murugadoss, F. Brassinne, N. Sebaihi, J. Petry, S. M. Cokic, K. L. Van Landuyt, L. Godderis, J. Mast, D. Lison, P. H. Hoet and S. van den Brule, Agglomeration of titanium dioxide nanoparticles increases toxicological responses in vitro and in vivo, Part. Fibre Toxicol., 2020, 17(1), 10.
Preparing suspensions of nanoscale metal oxides for biological testing, Version 2.3, NanoCare, 2007.
PROSPEcT, Protocol for Nanoparticle Dispersion, 2010.
Preparing suspensions of nanomaterials in serum-containing medium, Version 2.0, NanoGEM, 2011.
D. Gilliland, Standardised dispersion protocols for high priority materials groups, NanoDefine Technical Report D2.3, Wageningen, 2016.
S. Pradhan, J. Hedberg, E. Blomberg, S. Wold and I. O. Wallinder, Effect of sonication on particle dispersion, administered dose and metal release of non-functionalized, non-inert metal nanoparticles, J. Nanopart. Res., 2016, 18(9), 285.
I. Kaur, L.-J. Ellis, I. Romer, R. Tantra, M. Carriere, S. Allard, M. Mayne-L’Hermite, C. Minelli, W. Unger, A. Potthoff, S. Rades and E. Valsami-Jones, Dispersion of Nanomaterials in Aqueous Media: Towards Protocol Optimization, J. Visualized Exp., 2017, 130, 56074.
J. M. Cohen and P. Demokritou, Dosimetry for In Vitro Nanotoxicology: Too Complicated to Consider, Too Important to Ignore, in Nanoparticles in the Lung, CRC Press, 2015, pp. 267–293.
R. Grall and S. Chevillard, SOP for Dispersion, NANoREG, 2017.
B. Hellack and C. Nickel, Dispersion Protocol for Nanoparticle Suspension by Cup Horn Sonication, Version 1.1, nanOxiMet, 2016.
K. A. Jensen, The NANOGENOTOX standard operational procedure for preparing batch dispersions for in vitro and in vivo toxicological studies, Version 1.2, NANOGENOTOX, 2018.
ISO 22412:2017, Particle size analysis: Dynamic light scattering (DLS), International Organization for Standardization, 2017.
W. Stöber, A. Fink and E. Bohn, Controlled growth of monodisperse silica spheres in the micron size range, J. Colloid Interface Sci., 1968, 26(1), 62–69.
C. Pfeiffer, C. Rehbock, D. Hühn, C. Carrillo-Carrion, D. J. de Aberasturi, V. Merk, S. Barcikowski and W. J. Parak, Interaction of colloidal nanoparticles with their local environment: the (ionic) nanoenvironment around nanoparticles is different from bulk and determines the physico-chemical properties of the nanoparticles, J. R. Soc., Interface, 2014, 11(96), 20130931.
The appropriateness of existing methodologies to assess the potential risks associated with engineered and adventitious products of nanotechnologies, SCENIHR/002/05, Scientific Committee on Emerging and Newly Identified Health Risks (SCENIHR), 2006.
S. Raj, U. Sumod, S. Jose and M. Sabitha, Nanotechnology in cosmetics: Opportunities and challenges, J. Pharm. BioAllied Sci., 2012, 4(3), 186.
I. Surowiec, L. Vikström, G. Hector, E. Johansson, C. Vikström and J. Trygg, Generalized Subset Designs in Analytical Chemistry, Anal. Chem., 2017, 89(12), 6491–6497.
I. M. Mahbubul, R. Saidur, M. A. Amalina, E. B. Elcioglu and T. Okutucu-Ozyurt, Effective ultrasonication process for better colloidal dispersion of nanofluid, Ultrason. Sonochem., 2015, 26, 361–369.
R. R. R. Marín, F. Babick, G.-G. Lindner, M. Wiemann and M. Stintz, Effects of Sample Preparation on Particle Size Distributions of Different Types of Silica in Suspensions, Nanomaterials, 2008, 8(7), 454–472.
T. Meißner, K. Oelschlägel and A. Potthoff, Dispersion of nanomaterials used in toxicological studies: a comparison of sonication approaches demonstrated on TiO2 P25, J. Nanopart. Res., 2014, 16, 2228.
R. Tantra, A. Sikora, N. B. Hartmann, J. R. Sintes and K. N. Robinson, Comparison of the effects of different protocols on the particle size distribution of TiO2 dispersions, Particuology, 2015, 19, 35–44.
R. Sjögren and D. Svensson, https://github.com/clicumu/pyDOE2, 2018.
R. Finsy, Particle sizing by quasi-elastic light scattering, Adv. Colloid Interface Sci., 1994, 52, 79–143.
D. E. Koppel, Analysis of Macromolecular Polydispersity in Intensity Correlation Spectroscopy: The Method of Cumulants, J. Chem. Phys., 1972, 57(11), 4814–4820.
T. Chen and C. Guestrin, XGBoost: A Scalable Tree Boosting System, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2016, pp. 785–794.
J. H. Friedman, Greedy function approximation: A gradient boosting machine, Ann. Stat., 2001, 29(5), 1189–1232.
B. Kim, R. Khanna and O. Koyejo, Examples are not enough, learn to criticize! criticism for interpretability, in Proceedings of the 30th International Conference on Neural Information Processing Systems, Curran Associates Inc., Barcelona, Spain, 2016, pp. 2288–2296.
K. M. Jablonka, D. Ongari, S. M. Moosavi and B. Smit, Big-Data Science in Porous Materials: Materials Genomics and Machine Learning, Chem. Rev., 2020, 120(16), 8066–8129.
J. Jiang, R. Wang, M. Wang, K. Gao, D. D. Nguyen and G.-W. Wei, Boosting Tree-Assisted Multitask Deep Learning for Small Scientific Datasets, J. Chem. Inf. Model., 2020, 60(3), 1235–1244.
L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. Vanderplas, A. Joly, B. Holt and G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project, 2013, arXiv:1309.0238. https://ui.adsabs.harvard.edu/abs/2013arXiv1309.0238B (accessed September 01, 2013).
P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson and K. J. Millman, SciPy 1.0: fundamental algorithms for scientific computing in Python, Nat. Methods, 2020, 17, 261–272.
D. Harris and S. Harris, Digital design and computer architecture, Morgan Kaufmann, San Francisco, California, 2013.
I.-K. Yeo and R. A. Johnson, A New Family of Power Transformations to Improve Normality or Symmetry, Biometrika, 2000, 87(4), 954–959.
P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, A. Vijaykumar, A. P. Bardelli, A. Rothberg, A. Hilboll, A. Kloeckner, A. Scopatz, A. Lee, A. Rokem, C. N. Woods, C. Fulton, C. Masson, C. Häggström, C. Fitzgerald, D. A. Nicholson, D. R. Hagen, D. V. Pasechnik, E. Olivetti, E. Martin, E. Wieser, F. Silva, F. Lenders, F. Wilhelm, G. Young, G. A. Price, G.-L. Ingold, G. E. Allen, G. R. Lee, H. Audren, I. Probst, J. P. Dietrich, J. Silterra, J. T. Webber, J. Slavič, J. Nothman, J. Buchner, J. Kulick, J. L. Schönberger, J. V. de Miranda Cardoso, J. Reimer, J. Harrington, J. L. C. Rodríguez, J. Nunez-Iglesias, J. Kuczynski, K. Tritz, M. Thoma, M. Newville, M. Kümmerer, M. Bolingbroke, M. Tartre, M. Pak, N. J. Smith, N. Nowaczyk, N. Shebanov, O. Pavlyk, P. A. Brodtkorb, P. Lee, R. T. McGibbon, R. Feldbauer, S. Lewis, S. Tygier, S. Sievert, S. Vigna, S. Peterson, S. More, T. Pudlik, T. Oshima, T. J. Pingel, T. P. Robitaille, T. Spura, T. R. Jones, T. Cera, T. Leslie, T. Zito, T. Krauss, U. Upadhyay, Y. O. Halchenko, Y. Vázquez-Baeza and SciPy 1.0 Contributors, SciPy 1.0: fundamental algorithms for scientific computing in Python, Nat. Methods, 2020, 17(3), 261–272.
J. Bergstra, B. Komer, C. Eliasmith, D. Yamins and D. D. Cox, Hyperopt: a Python library for model selection and hyperparameter optimization, Comput. Sci. Discovery, 2015, 8, 014008.
T. Akiba, S. Sano, T. Yanase, T. Ohta and M. Koyama, Optuna: A Next-generation Hyperparameter Optimization Framework, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
S. K. Friedlander, Smoke, Dust, and Haze: Fundamentals of Aerosol Dynamics, Oxford University Press, 2nd edn, 2000.
C. C. Nascentes, M. Korn, C. S. Sousa and M. A. Z. Arruda, Use of Ultrasonic Baths for Analytical Applications: A New Approach for Optimisation Conditions, J. Braz. Chem. Soc., 2001, 2(1), 57–63.
A. P. Field and R. Gillett, How to do a meta-analysis, Br. J. Math. Stat. Psychol., 2010, 63(3), 665–694.
J. Bacon, C. Rielly and N. G. Özcan-Taşkin, Breakup of Silica Nanoparticle Clusters Using Ultrasonication, presented at the 16th European Conference on Mixing (Mixing 16), Toulouse, France, 9–12 September 2018.
J. Beltran-Huarac, Z. Zhang, G. Pyrgiotakis, G. DeLoid, N. Vaze, S. M. Hussain and P. Demokritou, Development of reference metal and metal oxide engineered nanomaterials for nanotoxicology research using high throughput and precision flame spray synthesis approaches, NanoImpact, 2018, 10, 26–37.
P. Bihari, M. Vippola, S. Schultes, M. Praetner, A. G. Khandoga, C. A. Reichel, C. Coester, T. Tuomi, M. Rehberg and F. Krombach, Optimized dispersion of nanoparticles for biological in vitro and in vivo studies, Part. Fibre Toxicol., 2008, 5, 1–14.
J. Cohen, G. Deloid, G. Pyrgiotakis and P. Demokritou, Interactions of engineered nanomaterials in physiological media and implications for in vitro dosimetry, Nanotoxicology, 2013, 4, 417–431.
K. A. Jensen, Y. Kembouche, E. Christiansen, N. R. Jacobsen, H. Wallin, C. Guiot, O. Spalla and O. Witschger, Towards a method for detecting the potential genotoxicity of nanomaterials: Final protocol for producing suitable manufactured nanomaterial exposure media, NANOGENOTOX, 2010.
I. M. Mahbubul, E. B. Elcioglu, R. Saidur and M. A. Amalina, Optimization of ultrasonication period for better dispersion and stability of TiO2−water nanofluid, Ultrason. Sonochem., 2017, 37, 360–367.
R. Marín, F. Babick and M. Stintz, Ultrasonic dispersion of nanostructured materials with probe sonication−practical aspects of sample preparation, Powder Technol., 2017, 318, 451–458.
J. S. Taurozzi, V. A. Hackley and M. R. Wiesner, A standardised approach for the dispersion of titanium dioxide nanoparticles in biological media, Nanotoxicology, 2011, 7(4), 389–401.
R. Roscher, B. Bohn, M. F. Duarte and J. Garcke, Explainable Machine Learning for Scientific Insights and Discoveries, IEEE Access, 2020, 8, 42200–42216.
S. Lipovetsky and M. Conklin, Analysis of regression in game theory approach, Appl. Stoch. Models Bus. Ind., 2001, 17(4), 319–330.
L. S. Shapley, A Value for n-Person Games, in Contributions to the Theory of Games (AM-28), ed. H. W. Kuhn and A. W. Tucker, Princeton University Press, 1953, vol. II, pp. 307–318.
S. M. Lundberg and S.-I. Lee, A unified approach to interpreting model predictions, in Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Long Beach, California, USA, 2017, pp. 4768–4777.
J. Chen and A. Anandarajah, van der Waals Attraction between Spherical Particles, J. Colloid Interface Sci., 1996, 180(2), 519–523.
H. C. Hamaker, The London–van der Waals attraction between spherical particles, Physica, 1937, 4(10), 1058–1072.
L. Breiman, Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author), Stat. Sci., 2001, 16(3), 199–231.
D. R. Cox, [Statistical Modeling: The Two Cultures]: Comment, Stat. Sci., 2001, 16(3), 216–218.
E. Bareinboim and J. Pearl, Causal inference and the data-fusion problem, Proc. Natl. Acad. Sci. U. S. A., 2016, 113(27), 7345–7352.
Z. C. Lipton, The mythos of model interpretability, Commun. ACM, 2018, 61(10), 36–43.
C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nat. Mach. Intell., 2019, 1(5), 206–215.

Footnote

† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2nr03240f
