Anna Rybińska-Fryca a, Alicja Mikolajczyk ab and Tomasz Puzyn *ab
a QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland. E-mail: t.puzyn@qsarlab.com
b University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
First published on 13th October 2020
A significant number of experimental studies are supported by computational methods such as quantitative structure–activity relationship modeling of nanoparticles (Nano-QSAR). This is especially so in research focused on design and synthesis of new, safer nanomaterials using safe-by-design concepts. However, Nano-QSAR has a number of important limitations. For example, it is not clear which descriptors that describe the nanoparticle physicochemical and structural properties are essential and can be adjusted to alter the target properties. This limitation can be overcome with the use of the Structure–Activity Prediction Network (SAPNet) presented in this paper. There are three main phases of building the SAPNet. First, information about the structural characterization of a nanomaterial, its physical and chemical properties and toxicity is compiled. Then, the most relevant properties (intrinsic/extrinsic) likely to influence the ENM toxicity are identified by developing “meta-models”. Finally, these “meta-models” describing the dependencies between the most relevant properties of the ENMs and their adverse biological properties are developed. In this way, the network is built layer by layer from the endpoint (e.g. toxicity or other properties of interest) to descriptors that describe the particle structure. Therefore, SAPNets go beyond the current standards and provide sufficient information on what structural features should be altered to obtain a material with desired properties.
The search for the structure–property and/or structure–activity relationships can be significantly supported by computational techniques (e.g. quantitative structure–activity/property relationship (QSAR/QSPR) modeling). The application of the QSAR helps in reducing the expenses and the number of necessary experiments. Moreover, it provides quantitative description of the relationships, in the form of mathematical equations. Thus, the developer is able to calculate to what extent the property/toxicity would increase/decrease in response to the considered structural change.
The idea of QSAR modeling for nanomaterials (Nano-QSAR) was introduced by Puzyn et al. in 2009.2 Since then, it has been used to improve the existing models, to develop descriptors for nanomaterials, and to support experimental studies. The most important directions of further developments of Nano-QSAR have been discussed in a joint “EU-US Nanoinformatics 2030 Roadmap” and such EU Horizon 2020 projects as NanoSolveIT (http://www.nanosolveit.eu) and NanoInformaTIX (http://www.nanoinformatix.eu) that work on the introduction of an innovative Integrated Test and Evaluation Approach (IATA) for environmental health and nanomaterial safety and promoting the use of Nano-QSAR methods as a part of the safe-by-design approach.3
To the best of our knowledge, there is a limited number of Nano-QSAR/QSPR models that directly express toxicity or a selected property as a function of the structural features (descriptors expressing structural attributes). Roy et al. presented a nano-QSTR approach based on periodic table based descriptors.4 The authors developed a series of linear regression models for predicting the toxicity of heterogeneous TiO2-based NPs towards the Chinese hamster ovary cell line. However, only one model was based on a descriptor that directly describes the structure (amount of silver metal). The other variables (electrochemical equivalent, 2nd ionization potential, covalent radius, and thermal conductivity) can be considered as properties that are consequences of the structure. Another example is the model for the prediction of zeta potential based on grouping of the NMs according to their nearest neighbors developed by Varsou et al.5 The model utilizes three variables: the type of the core (metal oxide or pure metal), the main elongation expressing the lengthening of the particle, and the pH where the zeta potential was measured.5 There are also models based on quasi-SMILES, which are character-based representations derived from traditional SMILES.6 They can encode the structural features, the physicochemical properties, and the exposure conditions such as cell lines.7–11 Another way to take into account factors such as changes in chemical compositions, assay organisms, or exposure time is to apply the QSAR-perturbation approach.12,13 All of the mentioned models are very useful tools. However, they do not provide specific information on the dependency between the structural features and properties of ENMs that subsequently influences the biological activity.
The majority of contributions predict the biological activity of ENMs from intrinsic physico-chemical properties (i.e. features independent of the environment that characterize the nanoparticle as an effect of having a given structure).14–16 The intrinsic properties applied as toxicity predictors allow us to explain the toxicity mode of action, but do not provide knowledge about the influence of the structural features on the toxic effect. As a consequence, such models are insufficient from a “designer” point of view (i.e. a designer does not know how to modify the structure to obtain the expected effect). On the other hand, nanotoxicologists often point out that the influence of the environment (e.g. solvent and pH) on the toxicity is of high importance. Therefore, it might be impossible to predict the toxicity directly from the structure. Instead, the toxicity can be predicted from the system-dependent and/or intrinsic properties, and such properties can in turn be linked to the purely structural features.
Here, we propose to replace the traditional Nano-QSAR modeling practice with the use of Structure–Activity Prediction Networks (SAPNets) – an approach that effectively links the description of ENMs’ structure with their toxicity through a series of layers built from nodes that correspond to predictive “meta-models” developed with machine learning techniques as well as Artificial Intelligence (AI) (Fig. 1).
The second step is the identification of relevant properties directly influencing the ENMs’ toxicity by developing “meta-models”. However, these models should not be limited to the Nano-QSAR or more precisely quantitative property–activity relationship (Nano-QPAR) approach. One should consider other methods like read-across, knowledge-based decision rules, models based on omics data, and other types of models based on AI and machine learning techniques.
The subsequent steps are to build the next layers of “meta-models” that describe the relationships between the selected properties of the studied ENMs and their structural features (descriptors that express structural attributes). In this way, the network is built layer by layer from the endpoint (e.g. toxicity or other properties of interest) to the structure. It is worth noting that SAPNets can be used not only for predicting adverse effects (toxicity) but also for optimizing the desired properties of the newly designed ENMs, such as photocatalytic activity, higher electrical conductivity, etc.
Although the development of structure–activity prediction networks requires extensive knowledge and experience in computational nanotoxicology, SAPNets may be further easily applied for predicting the toxicity and properties of ENMs by non-specialists. This is because they are based on descriptors that are understandable for non-specialists (e.g. size, shape, aspect-ratio, and type of coating) and do not require additional computational calculations – the user provides only the values of such descriptors and then the predictions are made. Moreover, the user can see precisely how the modification of the descriptor values (e.g. size) will influence the predicted endpoint.
We do believe that the proposed new approach responds to the needs of the scientific community focused on research on nanomaterials. Here, we present three case studies in which the structure–activity prediction networks methodology is used to estimate the photocatalytic activity as well as the biological activity of ENMs.
$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{misclassification} = \frac{FP + FN}{TP + TN + FP + FN}$$
TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative, respectively. The borders of the applicability domain (AD) of the model are defined by the minimal and maximal values of the descriptors characterizing the samples from the training set. Moreover, the developed model should be applied only to TiO2-based nanophotocatalysts synthesized in the presence of ionic liquids according to the protocol described in the source publication.17
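As an illustration only, the four statistics and the range-based applicability domain described above can be sketched in Python (the analyses in the source study were carried out in R). The function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the four statistics defined above from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / total,
    }


def in_applicability_domain(sample: Sequence[float],
                            training_set: Sequence[Sequence[float]]) -> bool:
    """Return True if every descriptor lies within the training-set min/max range."""
    for j, value in enumerate(sample):
        column = [row[j] for row in training_set]
        if not (min(column) <= value <= max(column)):
            return False
    return True


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=10, tn=2, fp=2, fn=1)

# Hypothetical two-descriptor training set and two query samples.
training = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
inside = in_applicability_domain([2.5, 15.0], training)
outside = in_applicability_domain([3.5, 15.0], training)
```

A sample that falls outside the min/max box on any single descriptor is treated as outside the AD, which is the strictest reading of the definition given above.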
All of the analyses in the case study were carried out using packages available in the R statistical program (R version 3.6.2).
The last node of the network defines the relationship between the photocatalytic activity of the TiO2-IL semiconductors and experimentally measured properties. Photocatalytic degradation of phenol (τOH) was determined in a model reaction of the compound decomposition in an aqueous solution under visible irradiation and expressed on a continuous scale in the range from 0 to 100%. Since phenol degradation in the case of typical TiO2-based semiconductors is around 30%, the samples were assigned to one of two categories: high photoactivity or low photoactivity (degradation below 35%).18,19 Then, the dataset was divided into training and validation sets. Finally, several machine learning techniques were used to find a property from which the level of photoactivity can be estimated. The best model was obtained by using logistic regression with one property (intensity of photoluminescence at 398 nm) as an independent variable (predictor). The quality of the model was assessed with accuracy, sensitivity, specificity, precision, and misclassification error, calculated for the training and validation sets.20 Details can be found in Table 1 and the Materials and Methods section.
Model type: logistic regression
Intercept and coefficient × predictor: 4.32(±2.10) − 0.051(±0.03)(PL398)

| | Calibration | Internal validation | External validation |
|---|---|---|---|
| Accuracy | 0.80 | 0.80 | 0.83 |
| Sensitivity | 0.91 | – | 1 |
| Specificity | 0.5 | – | 0.5 |
| Precision | 0.83 | – | 0.83 |
| Misclassification error | 0.20 | 0.20 | 0.17 |
| AUC | 0.91 | – | 0.75 |

PL398: emission intensity at 398 nm (photoluminescence signal)
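A minimal sketch of how the reported logistic regression could be evaluated for a new sample is given below. Only the point estimates of the intercept and coefficient are used. Two things are assumptions, not statements from the source: that the modeled probability refers to the high photoactivity class, and that a cut-off of 0.5 is used for the class assignment. The function names are hypothetical.

```python
import math

# Point estimates reported for the logistic regression model.
INTERCEPT = 4.32
SLOPE_PL398 = -0.051


def predicted_probability(pl398: float) -> float:
    """Logistic transform of the linear predictor 4.32 - 0.051 * PL398."""
    linear_predictor = INTERCEPT + SLOPE_PL398 * pl398
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def classify(pl398: float, threshold: float = 0.5) -> str:
    """ASSUMPTION: the probability refers to the 'high' class; cut-off 0.5."""
    return "high" if predicted_probability(pl398) >= threshold else "low"
```

Under these assumptions the predicted class switches where the linear predictor equals zero, i.e. at PL398 = 4.32/0.051, and a stronger photoluminescence signal lowers the predicted probability.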
In the second step, we sought a dependency between the property used in the first model and the structure of the nanomaterials (meta-model). Each sample is described by surface area, the amount of nitrogen and carbon atoms, ionic liquid decomposition rate (ΔIL), molar ratio and the type of cations and anions. We used Principal Component Analysis (PCA) to visualize the TiO2-IL samples in the multidimensional space of predictors describing the structure of nanomaterials (Fig. 3). The first two principal components explain 59.66% of the variance. Based on the loading values (the coefficients of the linear combinations of the original variables that construct the PCs), we were able to estimate how much each predictor contributes to a particular principal component. The main contributors are surface area and the amount of nitrogen and carbon atoms for PC1, and molar ratio and the type of ions for PC2 (Table 4S in the ESI†). To identify possible hidden patterns and relationships, the information about photoluminescence intensity at 398 nm was added to the PCA plot. Unfortunately, visual analysis did not reveal groups of samples that are similar in terms of both experimental characterization and the property. We can assume that the available experimental characterization is not sufficient and should be extended to find relevant variables.
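To make the notions of loadings and explained variance concrete, the sketch below extracts the first principal component of a small synthetic data set with plain power iteration. It is not the procedure used in the study: the data are invented, the autoscaling step is an assumption (the source does not state how the variables were pre-processed), and only PC1 is computed.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Centre each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance matrix of already-centred data."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov: Matrix, iterations: int = 1000) -> Tuple[List[float], float]:
    """Power iteration: PC1 loadings and the share of variance PC1 explains."""
    p = len(cov)
    v = [float(j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# Synthetic three-descriptor data set (invented values, five samples).
toy = [[1.0, 2.1, 5.0], [2.0, 3.9, 3.0], [3.0, 6.2, 4.0],
       [4.0, 7.8, 1.0], [5.0, 10.1, 2.0]]
loadings, explained = first_component(covariance(standardize(toy)))
```

The entries of `loadings` play the role of the loading values discussed above: the larger the absolute value, the more the corresponding descriptor contributes to the component.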
Nevertheless, the presented case study highlights one of the most crucial issues related to the Nano-QSAR method – the comprehensive characterization of physical and chemical properties of nanoparticles. Building a model based on structure information that can be easily interpreted requires access to a suitably prepared data set. Each experimental study that provides publicly available data, subsequently used for model development, should meet the standards of findability, accessibility, interoperability, and reusability (FAIR idea).21,22 Thus, collecting additional data from various sources and filling the gaps in developed datasets will be possible. Moreover, the presented SAPNet includes a meta-model that links phenol degradation with experimentally measured photoluminescence. Hence, the predictive potential of the network is limited. This issue can be overcome by adding a node dedicated to the theoretical simulation of signal intensity at a certain wavelength.
In 2019, Mikolajczyk et al.29 used a set of 29 TiO2-based nanomaterials modified with Au, Ag, Pt and Pd nanoclusters (Memix-TiO2) to develop a model capable of predicting toxicity towards the CHO-K1 cell line (epithelial cells obtained from the Chinese hamster ovary, ATCC® CCL-61™).29 The published Nano-QSAR model utilized only one descriptor, which represents additive electronegativity (χmix):
$$\mathrm{pEC}_{50} = 6.37(\pm 0.07) + 0.56(\pm 0.02) \times \chi_{mix}$$
It may seem that the calculation of this particular predictor requires experience in the Nano-QSAR methodology and computational chemistry. Nevertheless, it is strictly related to the designed and synthesized nanostructure. The type and concentration of a certain metal in the mixture (%molMe) as well as electronegativity of a particular metal (χMe) are necessary to estimate the additive electronegativity:
$$\chi_{mix} = \%mol_{Me_1} \times \chi_{Me_1} + \cdots + \%mol_{Me_n} \times \chi_{Me_n}$$
Therefore, information about the structure can be used to calculate a property that is relevant for a selected biological activity. However, to emphasize these relations, the developed model should be presented according to the SAPNet scheme (Fig. 4).
Fig. 4 Example of presenting a model according to the SAPNet scheme; χMe is the electronegativity of a particular metal.
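The two-layer scheme of this case study (composition to additive electronegativity, additive electronegativity to pEC50) can be sketched as follows. Several points are assumptions and are not stated in the text above: the use of Pauling electronegativities, the units in which the metal content is expressed, and whether χmix enters the published model raw or scaled. The composition in the example is hypothetical.

```python
from typing import Dict

# ASSUMPTION: Pauling electronegativities; the scale used in the source
# model is not stated in this text.
PAULING_ELECTRONEGATIVITY = {"Au": 2.54, "Ag": 1.93, "Pt": 2.28, "Pd": 2.20}


def additive_electronegativity(mol_percent: Dict[str, float],
                               electronegativity: Dict[str, float]) -> float:
    """chi_mix = sum over metals of (%mol of metal) x (chi of metal)."""
    return sum(amount * electronegativity[metal]
               for metal, amount in mol_percent.items())


def predict_pec50(chi_mix: float) -> float:
    """Point prediction from the one-descriptor model (coefficients only)."""
    return 6.37 + 0.56 * chi_mix


# Hypothetical composition, used only to show how the layers are chained.
composition = {"Au": 0.1, "Pd": 0.5}
chi_mix = additive_electronegativity(composition, PAULING_ELECTRONEGATIVITY)
pec50 = predict_pec50(chi_mix)
```

Changing an entry of `composition` and re-running the two functions shows directly how a change in the designed structure moves the predicted endpoint, which is the kind of question the SAPNet scheme is meant to answer.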
In 2015 Mikolajczyk et al.32 published a Nano-QSPR model for predicting the zeta potential (ζ) of metal oxide nanoparticles (NPs) in a medium. The authors collected the experimental values of zeta potentials of selected 15 MeOx NPs. Every nanoparticle was described by 11 image-based descriptors and 17 properties calculated with quantum-mechanical methods (at the level of semi-empirical theory). The best combination of the most relevant input variables was selected with use of the Genetic Algorithm (GA). Finally, the authors used multiple linear regression and obtained the following model:
$$\zeta = -11.26 - 4.46\,\psi - 2.39\,\varepsilon_{HOMO}/n_{Me}$$
Another model was presented by Toropov et al. in 2018.33 In this case, the dataset contained 87 data points of zeta potential measurements in aqueous solutions (ζH2O) for nanomaterials made of silica and metal oxides of various sizes. Each nanoparticle was described by a modified version of the simplified molecular input line entry system (quasi-SMILES) that represents all available information on the structure.34 Moreover, the nominal sizes of the NPs, as well as their sizes in water, were used as input variables. The Nano-QSPR models were constructed by using the Monte Carlo approach.
The values of zeta potential in water (ζH2O) obtained for different nanoparticles with the mentioned model can be used as the input to the model proposed by Wyrzykowska et al.:
$$\zeta_{KCl} = 3.98 + 21.68\,\zeta_{H_2O} + 7.88\,PN$$
Fig. 5 Example of incorporation of two existing models for predicting zeta potential in different environmental conditions developed by Toropov et al.33 and Wyrzykowska et al.35 into the SAPNet scheme.
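The chaining of the two zeta potential models can be sketched as a composition of functions. The upstream quasi-SMILES model is not reproduced here; it is replaced by a hypothetical lookup table, and the structure label and all input values are invented. PN is passed through as named in the equation above, and any scaling of the inputs required by the published model is not considered.

```python
from typing import Callable, Dict


def zeta_kcl(zeta_h2o: float, pn: float) -> float:
    """Downstream node: the linear model quoted above for zeta potential in KCl."""
    return 3.98 + 21.68 * zeta_h2o + 7.88 * pn


def make_network(upstream: Callable[[str], float],
                 downstream: Callable[[float, float], float]
                 ) -> Callable[[str, float], float]:
    """Connect two nodes: the upstream output becomes the downstream input."""
    def predict(structure: str, pn: float) -> float:
        zeta_h2o = upstream(structure)
        return downstream(zeta_h2o, pn)
    return predict


# HYPOTHETICAL stand-in for the upstream quasi-SMILES model: a lookup table
# with an invented structure label and an invented value.
_FAKE_UPSTREAM: Dict[str, float] = {"hypothetical-NP": -0.5}


def upstream_stub(structure: str) -> float:
    return _FAKE_UPSTREAM[structure]


network = make_network(upstream_stub, zeta_kcl)
prediction = network("hypothetical-NP", 1.0)
```

Because each node is an ordinary callable, a node can be swapped for a different model type (read-across, decision rules, another regression) without changing the rest of the network.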
The concept of the SAPNets can be described as a “series of mutually dependent predictive models”. A similar approach can be noticed in the case of toxicity-toxicity relationship studies (QTTR). For example, Roy et al. developed several QSTR models for both rat and mouse oral toxicity of carbamate derivatives. Then, the QTTR models were developed by taking each of the predicted responses as independent variables.39 Therefore, the specific information about the structure (expressed by descriptors) can be used to estimate the toxicity towards a particular organism. Then, the predicted value can be used as an input in the QTTR model to fill the data gaps for another species. The approach was used in the cases of various types of chemicals including ionic liquids and metal oxide nanoparticles.40,41
An important aspect of using networks of predictive models is the possible propagation of uncertainty along the network. This is a well-known phenomenon in statistics42 and may affect SAPNets, in which the output variables from one model are used as input variables to the next one. The importance of uncertainty propagation for the endpoint prediction should be further investigated.
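One simple way to explore this effect is a Monte Carlo simulation through two chained nodes, sketched below. All coefficients and error magnitudes are hypothetical, both nodes are linear, and the errors are assumed independent and Gaussian; none of this is taken from the case studies. For this special case the result can be checked analytically: with slope b in the second node, the endpoint variance is b²σ₁² + σ₂².

```python
import random
import statistics
from typing import Tuple


def propagate(n_draws: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo propagation of prediction error through two linear nodes."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        # Node 1 (structure -> property): hypothetical prediction 1.0,
        # hypothetical prediction error with standard deviation 0.2.
        prop = 1.0 + rng.gauss(0.0, 0.2)
        # Node 2 (property -> endpoint): hypothetical intercept 0.5 and
        # slope 2.0, plus its own error with standard deviation 0.3.
        endpoint = 0.5 + 2.0 * prop + rng.gauss(0.0, 0.3)
        outputs.append(endpoint)
    return statistics.mean(outputs), statistics.stdev(outputs)


mean_endpoint, sd_endpoint = propagate()
```

With these numbers the analytical standard deviation is sqrt((2.0 × 0.2)² + 0.3²) = 0.5, larger than the error of either node alone, which illustrates why the effect deserves attention when several layers are stacked.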
Among the challenges of building classic Nano-QSAR models, one is especially often raised by authors: the limited number of observations in the data set. The issue can be overcome by the development of new modeling algorithms dedicated to small training sets. In 2017, Gajewicz et al. presented the read-across algorithm (Nano-QRA) that can be used to fill data gaps in a quantitative manner.43 The predictions are based on interpolation and extrapolation approaches: one-point-slope, two-point formula, or the equation of a plane passing through three points to predict a particular activity for an unknown chemical(s). Such prediction models as Nano-QRA can be easily implemented as nodes in SAPNets.
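The geometric formulas named above can be illustrated generically. The sketch below implements a two-point linear formula and the plane through three points; it is not the published Nano-QRA implementation, the function names are hypothetical, and the selection of the reference nanomaterials is left out entirely.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]  # (descriptor 1, descriptor 2, activity)


def two_point_prediction(x_a: float, y_a: float,
                         x_b: float, y_b: float, x_new: float) -> float:
    """Line through two known (descriptor, activity) points, evaluated at x_new."""
    if x_a == x_b:
        raise ValueError("the two reference points share the same descriptor value")
    slope = (y_b - y_a) / (x_b - x_a)
    return y_a + slope * (x_new - x_a)


def plane_prediction(p1: Point3, p2: Point3, p3: Point3,
                     x: float, y: float) -> float:
    """Plane through three known points, evaluated at descriptors (x, y)."""
    ux, uy, uz = p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]
    vx, vy, vz = p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2]
    # Normal vector of the plane: cross product of the two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("reference points are collinear in descriptor space")
    return p1[2] - (nx * (x - p1[0]) + ny * (y - p1[1])) / nz


# Hypothetical reference points, used only to check the arithmetic.
interpolated = two_point_prediction(1.0, 2.0, 3.0, 6.0, 2.0)
extrapolated = two_point_prediction(1.0, 2.0, 3.0, 6.0, 4.0)
on_plane = plane_prediction((0.0, 0.0, 1.0), (1.0, 0.0, 3.0),
                            (0.0, 1.0, 4.0), 2.0, 2.0)
```

The same function covers interpolation (query between the reference points) and extrapolation (query outside them); the latter should be treated with more caution.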
As mentioned, the development of nanoinformatics tools should be in line with published guidelines and standards established by the scientific community; the outcomes of projects focused on regulation are especially significant in this respect. A recent contribution by Giusti et al.36 summarizes existing approaches for nanomaterial grouping and provides a new approach with further recommendations. The authors suggest that risk assessment should be based on three crucial questions: “what they are”, “where they go” and “what they do”. Thus, one should gather information about (i) physico-chemical characterization of pristine materials (as synthesized); (ii) changes of properties in various conditions (system-dependent properties), toxicokinetics, and fate; (iii) physical hazards, human toxicity, and ecotoxicity. The presented “NanoReg2 approach” requires a more detailed description of nanoforms. However, it does not clearly distinguish between the descriptors of ENMs (which describe the structure/morphology) and their properties (which result from the structure). Thus, a precise analysis of the relationship between the structure of ENMs and their properties may be challenging. Building structure–activity prediction networks that include various methods, such as the presented NanoReg2 approach, could be a way to extract all crucial information and point out dependencies between the structure of a NM and its properties and biological activity.
Finally, it is worth mentioning that the additional benefits of using structure–activity prediction networks are more effective incorporation of the computational safety assessment at the stage of design of new materials and better collaboration between experts focused on different aspects of research on nanomaterials. This is especially important in the context of extending the SAPNet with layers related to the synthesis stage.
Prediction of phenol degradation efficiency with the SAPNet concept and generation of the data required to perform the first case study were funded by the National Science Centre within program SONATA 8 (grant entitled: “Influence of the ionic liquid structure on interactions with TiO2 particles in ionic liquid assisted hydrothermal synthesis”), contract No. 2014/15/D/ST5/0274.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0nr05220e
This journal is © The Royal Society of Chemistry 2020 |