Structure-activity prediction networks (SAPNets): a step beyond Nano-QSAR for effective implementation of the safe-by-design concept

A significant number of experimental studies are supported by computational methods such as quantitative structure-activity relationship modeling of nanoparticles (Nano-QSAR). This is especially so in research focused on the design and synthesis of new, safer nanomaterials using safe-by-design concepts. However, Nano-QSAR has a number of important limitations. For example, it is not clear which descriptors that describe the nanoparticle physicochemical and structural properties are essential and can be adjusted to alter the target properties. This limitation can be overcome with the use of the Structure-Activity Prediction Network (SAPNet) presented in this paper. There are three main phases of building the SAPNet. First, information about the structural characterization of a nanomaterial, its physical and chemical properties and toxicity is compiled. Then, the most relevant properties (intrinsic/extrinsic) likely to influence the ENM toxicity are identified by developing "meta-models". Finally, these "meta-models" describing the dependencies between the most relevant properties of the ENMs and their adverse biological properties are developed. In this way, the network is built layer by layer from the endpoint (e.g. toxicity or other properties of interest) to descriptors that describe the particle structure. Therefore, SAPNets go beyond the current standards and provide sufficient information on what structural features should be altered to obtain a material with desired properties.


Introduction
Precise manipulation and control of the structure of matter on the nanoscale brings an opportunity to design engineered nanomaterials (ENMs) that are safe for humans and the environment while also being maximally efficient in the required application. This concept is called 'safe-by-design'. 1 The crucial aspect of 'safe-by-design' is gathering knowledge on the relationships between the nanomaterials' structure, physicochemical properties and toxicity. With this knowledge, the designer knows precisely how to modify the structure to get the expected change in the activity.
The search for structure-property and/or structure-activity relationships can be significantly supported by computational techniques (e.g. quantitative structure-activity/property relationship (QSAR/QSPR) modeling). Applying QSAR helps to reduce the expenses and the number of necessary experiments. Moreover, it provides a quantitative description of the relationships in the form of mathematical equations. Thus, the developer is able to calculate to what extent the property/toxicity would increase or decrease in response to the considered structural change.
The idea of QSAR modeling for nanomaterials (Nano-QSAR) was introduced by Puzyn et al. in 2009. 2 Since then, the approach has been refined through improved models and new descriptors for nanomaterials, and it has been used to support experimental studies. The most important directions of further developments of Nano-QSAR have been discussed in the joint "EU-US Nanoinformatics 2030 Roadmap" and in EU Horizon 2020 projects such as NanoSolveIT (http://www.nanosolveit.eu) and NanoInformaTIX (http://www.nanoinformatix.eu), which work on introducing innovative Integrated Approaches to Testing and Assessment (IATA) for environmental health and nanomaterial safety and on promoting the use of Nano-QSAR methods as a part of the safe-by-design approach. 3
To the best of our knowledge, only a limited number of Nano-QSAR/QSPR models directly express toxicity or a selected property as a function of the structural features (descriptors expressing structural attributes). Roy et al. presented a nano-QSTR approach based on periodic-table-based descriptors. 4 The authors developed a series of linear regression models for predicting the toxicity of heterogeneous TiO2-based NPs towards the Chinese hamster ovary cell line. However, only one model was based on a descriptor that directly describes the structure (amount of silver metal). The other variables (electrochemical equivalent, 2nd ionization potential, covalent radius, and thermal conductivity) can be considered as properties that are consequences of the structure. Another example is the model for the prediction of zeta potential based on grouping of the NMs according to their nearest neighbors, developed by Varsou et al. 5 The model utilizes three variables: the type of the core (metal oxide or pure metal), the main elongation expressing the lengthening of the particle, and the pH at which the zeta potential was measured. 5 There are also models based on quasi-SMILES, which are character-based representations derived from traditional SMILES. 6 They can encode the structural features, the physicochemical properties, and the exposure conditions such as cell lines. [7][8][9][10][11] Another way to take into account factors such as changes in chemical compositions, assay organisms, or exposure time is to apply the QSAR-perturbation approach. 12,13 All of the mentioned models are very useful tools. However, they do not provide specific information on the dependency between the structural features and properties of ENMs that subsequently influences the biological activity.
The majority of contributions present predictions of ENMs' biological activity based on intrinsic physicochemical properties (i.e. features independent of the environment that characterize the nanoparticle as an effect of having a given structure). [14][15][16] The intrinsic properties applied as toxicity predictors allow us to explain the toxicity mode of action, but do not provide knowledge about the influence of the structural features on the toxic effect. As a consequence, such models are insufficient from a "designer" point of view (i.e. a designer does not know how to modify the structure to obtain the expected effect). On the other hand, in many discussions nanotoxicologists point out that the influence of the environment (e.g. solvent and pH) on the toxicity is of high importance. Therefore, it might be impossible to predict the toxicity directly from the structure. Instead, the toxicity can be predicted from the system-dependent and/or intrinsic properties, and such properties can in turn be linked to the purely structural features.
Here, we propose to replace the traditional Nano-QSAR modeling practice with the use of Structure-Activity Prediction Networks (SAPNets), an approach that effectively links the description of ENMs' structure with their toxicity through a series of layers built from nodes that correspond to predictive "meta-models" developed with machine learning techniques as well as Artificial Intelligence (AI) (Fig. 1).

Idea of structure-activity prediction networks (SAPNets)
The first step of building a network is to gather information about structural characterization of a nanomaterial, its physical and chemical properties and toxicity. In the case of chemically diverse ENMs, their properties can be derived from experimental as well as computational studies. The advantage of the theoretical/computational approach is the possibility of retrieving characterization for a larger number of ENMs than in the case of experimental studies.
The second step is the identification of relevant properties directly influencing the ENMs' toxicity by developing "meta-models". However, these models should not be limited to the Nano-QSAR or, more precisely, quantitative property-activity relationship (Nano-QPAR) approach. One should consider other methods like read-across, knowledge-based decision rules, models based on omics data, and other types of models based on AI and machine learning techniques.
The subsequent steps are to build the next layers of "meta-models" that describe the relationships between the selected properties of the studied ENMs and their structural features (descriptors that express structural attributes). In this way, the network is built layer by layer from the endpoint (e.g. toxicity or other properties of interest) to the structure. It is worth noting that SAPNets can be used not only for predicting adverse effects (toxicity) but also for optimizing the desired properties of the newly designed ENMs, such as photocatalytic activity, higher electrical conductivity, etc.
Fig. 1 The schematic representation of the structure-activity prediction networks for modeling of ENMs' toxic activity; x: descriptor that describes either structural attributes (D) or properties (P) of ENMs, M: meta-model (meta-models of the 1st layer are indicated in green, and meta-models of the 2nd layer are indicated in blue).
Although the development of structure-activity prediction networks requires extensive knowledge and experience in computational nanotoxicology, SAPNets may then be easily applied by non-specialists for predicting the toxicity and properties of ENMs. This is because they are based on descriptors that are understandable for non-specialists (e.g. size, shape, aspect ratio, and type of coating) and do not require additional calculations: the user provides only the values of such descriptors and then the predictions are made. Moreover, the user can see precisely how the modification of the descriptor values (e.g. size) will influence the predicted endpoint.
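The layered idea can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function `run_network`, the descriptor name `size_nm`, the property name `P1` and all coefficients are hypothetical and chosen only for illustration. Note that the network is built from the endpoint towards the structure, whereas a prediction is evaluated in the opposite direction, from structural descriptors to the endpoint.

```python
from typing import Callable, Dict, List

# A meta-model maps a dictionary of named inputs to one output value.
MetaModel = Callable[[Dict[str, float]], float]


def run_network(layers: List[Dict[str, MetaModel]],
                descriptors: Dict[str, float]) -> Dict[str, float]:
    """Evaluate the layers in order, from structure towards the endpoint."""
    values = dict(descriptors)
    for layer in layers:
        # Each meta-model in a layer can use every value computed so far.
        outputs = {name: model(values) for name, model in layer.items()}
        values.update(outputs)
    return values


# Hypothetical meta-models with made-up coefficients (illustration only).
layers = [
    {"P1": lambda v: 2.0 + 0.5 * v["size_nm"]},    # structure -> property
    {"toxicity": lambda v: 1.0 + 0.1 * v["P1"]},   # property -> endpoint
]

small = run_network(layers, {"size_nm": 20.0})
large = run_network(layers, {"size_nm": 40.0})
# Difference in the predicted endpoint caused by changing the size only.
print(large["toxicity"] - small["toxicity"])
```

Because every node is an ordinary function of named inputs, a non-specialist user only has to supply descriptor values, and the effect of changing one descriptor can be read off by running the network twice.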
We believe that the proposed approach responds to the needs of the scientific community focused on research on nanomaterials. Here, we present three case studies in which the structure-activity prediction network methodology is used to estimate the photocatalytic activity as well as the biological activity of ENMs.

Materials and methods
Case study details: predicting the properties of ENMs with SAPNets with the example of the phenol degradation efficiency
The presented case study was based on publicly available data. 17 To ensure the reliability of the model, we split the studied data into the training (T, 15 samples) and external validation (V, 6 samples) sets. First, the samples were sorted by increasing endpoint value (percentage of phenol degradation, τOH). Then, the samples with the highest and the lowest values were arbitrarily assigned to the training set. The remaining samples were randomly assigned between the training and validation sets. Therefore, the points from the validation set were distributed within the range of the endpoint of the training set. Details can be found in Table 1S in the ESI. † The training set was used to find an equation that links the properties of samples with phenol degradation (τOH) by titanium dioxide. The considered properties were the efficiency of charge carrier trapping, migration, and transfer expressed as the intensity of photoluminescence, and the band gap reduction described through UV-Vis absorption spectra (Tables 2S and 3S in the ESI †). The relationship between the categorical dependent variable (τOH) and the independent variables was described by a logistic regression model. The stepwise selection approach was used to find an optimal combination of descriptors. The quality of the model was determined on the basis of parameters such as accuracy, sensitivity, specificity, precision, and misclassification error, calculated for both the training and validation sets. In this way, we were able to verify the goodness-of-fit of the model and its ability to predict the endpoint value for external samples. Additionally, internal validation of the model (leave-one-out cross-validation) was performed to measure its robustness. Moreover, the developed model should be applied only to TiO2-based nanophotocatalysts synthesized in the presence of ionic liquids according to the protocol described in the source publication. 17 All of the analyses in the case study were carried out using packages available in the R statistical program (R version 3.6.2).
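The splitting procedure described above can be sketched as follows. The analyses in this work were carried out in R; the Python function below is only an illustrative approximation, and the function name `split_by_endpoint`, the random seed and the endpoint values are assumptions made for the example (they are not the data from Table 1S).

```python
import random
from typing import List, Sequence, Tuple


def split_by_endpoint(endpoint: Sequence[float], n_validation: int,
                      seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (training_indices, validation_indices).

    The samples with the lowest and the highest endpoint value always go
    to the training set; the validation samples are drawn at random from
    the remaining ones, so they lie within the training endpoint range.
    """
    order = sorted(range(len(endpoint)), key=lambda i: endpoint[i])
    interior = order[1:-1]  # everything except the two extremes
    if n_validation > len(interior):
        raise ValueError("not enough interior samples for the validation set")
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    validation = sorted(rng.sample(interior, n_validation))
    chosen = set(validation)
    training = [i for i in range(len(endpoint)) if i not in chosen]
    return training, validation


# Made-up endpoint values for 21 samples (illustration only).
tau = [5.0 + 4.0 * i for i in range(21)]
train_idx, valid_idx = split_by_endpoint(tau, n_validation=6)
print(len(train_idx), len(valid_idx))
```

With 21 samples and six validation samples, the sketch reproduces the 15/6 proportion used in the case study.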

Examples of implementation of the SAPNet methodology
Predicting the properties of ENMs with SAPNets with the example of the phenol degradation efficiency
According to the proposed workflow, the developed SAPNet should ultimately relate the structure of a nanomaterial to the selected endpoint. Here, we present an example of how the photodegradation of phenol (τOH) by titanium dioxide can be determined in line with the SAPNet methodology (Fig. 2). In this case study, for developing SAPNets, we used experimental data from our previous publication. 17 The analyzed dataset contains experimental characterization of 23 samples of titanium dioxide synthesized in the presence of ionic liquids (IL) and information about the photocatalytic degradation of phenol for each sample (Table 1S in the ESI †). Since the values of photoluminescence (PL) were unavailable for two samples, the dataset used in this case was reduced to 21 observations.
The last node of the network defines the relationship between the photocatalytic activity of the TiO2-IL semiconductors and experimentally measured properties. Photocatalytic degradation of phenol (τOH) was determined in a model reaction of the compound decomposition in an aqueous solution under visible irradiation and expressed on a continuous scale in the range from 0 to 100%. Since phenol degradation in the case of typical TiO2-based semiconductors is around 30%, the samples were assigned to one of two categories: high photoactivity or low photoactivity (percentage of degradation less than 35%). 18,19 Then, the dataset was divided into training and validation sets. Finally, several machine learning techniques were used to find a property from which the level of photoactivity can be estimated. The best model was obtained by using logistic regression with one property (intensity of photoluminescence at 398 nm) as an independent variable (predictor). The quality of the model was determined on the basis of parameters such as accuracy, sensitivity, specificity, precision, and misclassification error, calculated for the training and validation sets. 20 Details can be found in Table 1 and the Materials and Methods section.
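The categorization and the quality parameters named above can be written out explicitly. The sketch below uses the standard confusion-matrix definitions; the helper names and the example values are hypothetical and are not taken from Table 1.

```python
import math
from typing import Dict, Sequence


def classify_activity(degradation_percent: float,
                      threshold: float = 35.0) -> int:
    """Return 0 for low photoactivity (below the threshold), otherwise 1."""
    return 0 if degradation_percent < threshold else 1


def _ratio(numerator: int, denominator: int) -> float:
    # Undefined ratios (empty class) are reported as NaN instead of failing.
    return numerator / denominator if denominator else math.nan


def classification_metrics(observed: Sequence[int],
                           predicted: Sequence[int]) -> Dict[str, float]:
    """Compute the quality parameters from observed and predicted classes."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    n = len(pairs)
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "misclassification_error": _ratio(fp + fn, n),
    }


# Made-up degradation percentages and model predictions (illustration only).
degradation = [62.0, 48.0, 35.0, 71.0, 12.0, 30.0]
observed = [classify_activity(value) for value in degradation]
predicted = [1, 1, 1, 0, 0, 1]
metrics = classification_metrics(observed, predicted)
print(metrics)
```

The same function can be applied separately to the training and validation sets, which is how goodness-of-fit and external predictivity are compared.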
In the second step, we sought a dependency between the property used in the first model and the structure of the nanomaterials (meta-model). Each sample is described by surface area, the amount of nitrogen and carbon atoms, ionic liquid decomposition rate (ΔIL), molar ratio and the type of cations and anions. We decided to use Principal Component Analysis (PCA) to visualize the TiO2-IL samples in the multidimensional space of predictors describing the structure of nanomaterials (Fig. 3). The first two principal components explain 59.66% of the variance. Based on the loading values (coefficients of the linear combination of the original variables that construct the PCs), we were able to estimate how much each predictor contributes to a particular principal component. The main contributors are surface area and the amount of nitrogen and carbon atoms for PC1, and molar ratio and the type of ions for PC2 (Table 4S in the ESI †). To identify possible hidden patterns and relationships, the information about photoluminescence intensity at 398 nm was added to the PCA plot. Unfortunately, visual analysis did not reveal groups of samples that are similar in terms of both experimental characterization and the property. We can assume that the available experimental characterization is not sufficient and should be extended to find relevant variables.
Nevertheless, the presented case study highlights one of the most crucial issues related to the Nano-QSAR method: the comprehensive characterization of physical and chemical properties of nanoparticles. Building an easily interpretable model based on structure information requires access to a suitably prepared data set. Each experimental study that provides publicly available data, subsequently used for model development, should meet the standards of findability, accessibility, interoperability, and reusability (FAIR principles). 21,22 This makes it possible to collect additional data from various sources and to fill the gaps in developed datasets. Moreover, the presented SAPNet includes a meta-model that links phenol degradation with experimentally measured photoluminescence. Hence, a sample has to be synthesized and measured before a prediction can be made, which limits the predictive potential of the network. This issue can be overcome by adding a node dedicated to the theoretical simulation of signal intensity at a certain wavelength.

Predicting the toxicity of ENMs in line with the SAPNet methodology
The modern approach to the introduction of new ENMs should involve synthesis targeted to the specific structure, morphology, and physical and chemical properties, as well as a safety assessment procedure. The evaluation of possible adverse outcomes toward humans and the environment can be supported by computational methods such as Nano-QSAR. Unfortunately, the majority of available models are based on predictors that require knowledge and experience in computational chemistry. For example, one of the first Nano-QSAR models links the cytotoxicity effect towards E. coli with theoretical predictors for 17 metal oxide nanoparticles derived from molecular models that were optimized at the semiempirical level of theory (PM6). More generally, the theoretical predictors can be derived from calculations conducted at different levels of theory, i.e. (i) electronic level; (ii) atomistic level; (iii) mesoscopic level; or (iv) continuum level. 14,23 The theoretical predictors have been widely applied in the development of different types of Nano-QSAR models. [24][25][26][27][28] Unfortunately, their application in toxicity prediction requires time-consuming calculations and specialized knowledge.
In 2019, Mikolajczyk et al. 29 used a set of 29 TiO2-based nanomaterials modified with Au, Ag, Pt and Pd nanoclusters (Memix-TiO2) to develop a model capable of predicting toxicity towards the CHO-K1 cell line (epithelial cells obtained from the Chinese hamster ovary, ATCC® CCL-61™). The published Nano-QSAR model utilized only one descriptor, which represents additive electronegativity (χmix):
$$\mathrm{pEC}_{50} = 6.37(\pm 0.07) + 0.56(\pm 0.02) \times \chi_{mix}$$
It may seem that the calculation of this particular predictor requires experience in the Nano-QSAR methodology and computational chemistry. Nevertheless, it is strictly related to the designed and synthesized nanostructure.
The type and concentration of a certain metal in the mixture (%mol Me ) as well as electronegativity of a particular metal (χ Me ) are necessary to estimate the additive electronegativity: Therefore, information about the structure can be used to calculate a property that is relevant for a selected biological activity. However, to emphasize these relations, the developed model should be presented according to the SAPNet scheme (Fig. 4).
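As a hedged illustration of how a designer could use the published one-descriptor model, the sketch below evaluates pEC50 for a given value of χmix using the coefficients 6.37 and 0.56 quoted above. The value of χmix is taken as an input here rather than computed from the metal composition. The sketch assumes that χmix is supplied on the same scale as in the original model and that pEC50 follows the usual definition, pEC50 = -log10(EC50); the function names are hypothetical.

```python
INTERCEPT = 6.37  # intercept of the published model
SLOPE = 0.56      # coefficient of the additive electronegativity


def predict_pec50(chi_mix: float) -> float:
    """Predicted pEC50 for a given additive electronegativity."""
    return INTERCEPT + SLOPE * chi_mix


def ec50_fold_change(delta_chi_mix: float) -> float:
    """Multiplicative change of EC50 caused by a change in chi_mix.

    Assumes pEC50 = -log10(EC50), so an increase of pEC50 by SLOPE * delta
    corresponds to EC50 multiplied by 10 ** (-SLOPE * delta).
    """
    return 10.0 ** (-SLOPE * delta_chi_mix)


# Hypothetical descriptor values for two candidate materials.
for chi in (0.0, 1.0):
    print(chi, predict_pec50(chi))
print(ec50_fold_change(1.0))
```

Because the slope is positive, a composition with a higher χmix is predicted to have a higher pEC50, that is, a lower EC50 and therefore higher toxicity in this assay.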

Incorporation of the existing models into SAPNets
The presented approach is not limited to building new predictive models for nanomaterials. It can be used to combine already published models into a network that will be useful in the context of designing new advanced materials to be "tailored" for specific needs. One of the most important characteristics of a nanoparticle is its stability in different media. The ability to form agglomerates is one of the physical factors that affect the environmental fate and behavior of ENMs. 30,31 For example, agglomeration can change the sedimentation process; therefore, it indirectly influences the effective doses responsible for potential toxicity towards living organisms. 30,31 The agglomerate formation is linked with surface charge; however, the surface charge cannot be measured directly. Hence, the zeta potential (ζ) in a given medium is a commonly used parameter to express the surface charge.
In 2015, Mikolajczyk et al. 32 published a Nano-QSPR model for predicting the zeta potential (ζ) of metal oxide nanoparticles (NPs) in a medium. The authors collected the experimental values of zeta potentials of 15 selected MeOx NPs. Every nanoparticle was described by 11 image-based descriptors and 17 properties calculated with quantum-mechanical methods (at the semi-empirical level of theory). The best combination of the most relevant input variables was selected with the use of the Genetic Algorithm (GA). Finally, the authors used multiple linear regression and obtained the following model:
$$\zeta = -11.26 - 4.46\,\psi - 2.39\,\varepsilon_{HOMO}/n_{Me}$$
where ψ is the spherical size of nanoparticles derived from analysis of Transmission Electron Microscopy (TEM) images and εHOMO/nMe is the energy of the highest occupied molecular orbital per metal atom calculated at the semiempirical level of theory (PM6 method). The developed model utilizes the information about the structure of NPs. However, experience in TEM image analysis and theoretical chemistry is still necessary to use the model to predict the ζ of new nanoparticles.
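The published equation can be turned into a small function that serves as one node of a network. This is only a sketch: the coefficients are those quoted above, but the units and any scaling of the descriptors used in the original model are not restated here, so the input values below are hypothetical placeholders and the function name is an assumption.

```python
def predict_zeta(psi: float, e_homo_per_metal: float) -> float:
    """Zeta potential from the two-descriptor linear model quoted above.

    psi              -- spherical size descriptor derived from TEM images
    e_homo_per_metal -- HOMO energy per metal atom (PM6 level)
    Both inputs must be on the scale used when the model was fitted.
    """
    return -11.26 - 4.46 * psi - 2.39 * e_homo_per_metal


# Hypothetical descriptor values for two candidate nanoparticles.
candidates = {
    "NP-A": (1.0, -0.5),
    "NP-B": (2.0, -0.5),
}
for name, (psi, e_homo) in candidates.items():
    print(name, predict_zeta(psi, e_homo))
```

Both coefficients are negative, so within the applicability domain of the model an increase in either descriptor shifts the predicted ζ towards more negative values.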
Another model was presented by Toropov et al. in 2018. 33 In this case, the dataset contained 87 data points of zeta potential measurements in aqueous solutions (ζH2O) for nanomaterials made of silica and metal oxides having various sizes. Each nanoparticle was described by a modified version of the simplified molecular input line entry system (quasi-SMILES) that represents all available information on the structure. 34 Moreover, the nominal sizes of NPs, as well as sizes in water, were used as input variables. Nano-QSPR models were constructed by using the Monte Carlo approach.
The values of zeta potential in water (ζH2O) for different nanoparticles obtained by the application of the mentioned model can be used as the input to the model proposed by Wyrzykowska et al.: where ζKCl is the zeta potential in potassium chloride; ζH2O is the zeta potential in water and PN is the periodic number that reflects the number of electron shells in the metal of the oxide. 35 Based on the presented equation, we are able to calculate the zeta potential in the ionized environment (here in KCl, ζKCl). The described models create a network in which the first layer consists of a meta-model linking the structure with the zeta potential in aqueous solution (ζH2O), whereas the second layer consists of a meta-model that estimates the zeta potential in the ionized environment (ζKCl) by using the output from the first layer. Thus, the presented SAPNet takes into account the changes in the environment in which the nanoparticle is located. It is an excellent example of how to exploit the potential of tools distributed in the public space (Fig. 5).

Discussion and conclusions
The three presented case studies illustrate the high potential of SAPNets to serve as a valuable tool for developing new nanoparticles in line with the idea of safe-by-design. This is because the user obtains precise information on which structural features of the studied nanoparticles should be modified to acquire the material with desired properties and/or low toxicity. In addition, the design process can be done fully virtually, without the necessity to synthesize the particle first and measure the experimental predictors formerly used as input variables to Nano-QSAR models. Moreover, in the case of predicting toxicity, SAPNets may include a series of meta-models that allow considering the influence of the environment (conditions such as different pH values, solvents, etc.) on the nanomaterials' properties and behavior. This was an important limitation of simple Nano-QSARs in cases where toxicity may be related to system-dependent properties. [36][37][38]
The concept of the SAPNets can be described as a "series of mutually dependent predictive models". A similar approach can be seen in quantitative toxicity-toxicity relationship (QTTR) studies. For example, Roy et al. developed several QSTR models for both rat and mouse oral toxicity of carbamate derivatives. Then, the QTTR models were developed by taking each of the predicted responses as independent variables. 39 Therefore, the specific information about the structure (expressed by descriptors) can be used to estimate the toxicity towards a particular organism. Then, the predicted value can be used as an input in the QTTR model to fill the data gaps for another species. The approach was used in the cases of various types of chemicals including ionic liquids and metal oxide nanoparticles. 40,41
An important aspect of using networks of predictive models is the possible propagation of uncertainty along the network. This is a well-known phenomenon in statistics 42 and may affect SAPNets, in which the output variables from one model are used as input variables to the next one. The importance of the uncertainty propagation in the endpoint prediction should be further investigated.
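To make the concern concrete, the sketch below applies textbook first-order error propagation to two chained linear meta-models of the form y = a + b·x. It is an illustration under strong assumptions: the intercept, slope and input are treated as uncorrelated, and all numerical values are made up. It is not an analysis of the models discussed in this paper.

```python
import math
from typing import Tuple


def propagate_linear(x: float, sd_x: float,
                     a: float, sd_a: float,
                     b: float, sd_b: float) -> Tuple[float, float]:
    """First-order propagation of standard deviations through y = a + b * x.

    Assumes that a, b and x are uncorrelated, which is rarely exactly true
    for fitted regression coefficients; the result is a rough estimate.
    """
    y = a + b * x
    variance = sd_a ** 2 + (x * sd_b) ** 2 + (b * sd_x) ** 2
    return y, math.sqrt(variance)


# Layer 1: structural descriptor -> property (hypothetical coefficients).
prop, sd_prop = propagate_linear(x=2.0, sd_x=0.0,
                                 a=1.0, sd_a=0.1, b=0.5, sd_b=0.05)

# Layer 2: property -> endpoint, fed with the uncertain output of layer 1.
endpoint, sd_endpoint = propagate_linear(x=prop, sd_x=sd_prop,
                                         a=6.0, sd_a=0.1, b=0.5, sd_b=0.02)

# The same layer 2 evaluated as if its input were known exactly.
_, sd_exact_input = propagate_linear(x=prop, sd_x=0.0,
                                     a=6.0, sd_a=0.1, b=0.5, sd_b=0.02)

print(endpoint, sd_endpoint, sd_exact_input)
```

In this toy example the endpoint uncertainty is larger when the second layer receives a predicted, rather than exactly known, input, which is the effect that would need to be quantified for a real SAPNet.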
Among the challenges of building classic Nano-QSAR models, one is raised especially often by authors: the limited number of observations in the data set. The issue can be overcome by the development of new modeling algorithms dedicated to small training sets. In 2017, Gajewicz et al. presented the read-across algorithm (Nano-QRA) that can be used to fill data gaps in a quantitative manner. 43 The predictions are based on interpolation and extrapolation approaches: the one-point-slope formula, the two-point formula, or the equation of a plane passing through three points, used to predict a particular activity for an unknown chemical(s). Prediction models such as Nano-QRA can be easily implemented as nodes in SAPNets.
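The three geometric formulas named above can be sketched in a few lines. The functions below are generic textbook versions (a line through one point with a known slope, a line through two points, and a plane through three points); they are not claimed to reproduce the Nano-QRA implementation, and the function names and example data are hypothetical.

```python
from typing import Tuple

Point2 = Tuple[float, float]          # (descriptor, activity)
Point3 = Tuple[float, float, float]   # (descriptor 1, descriptor 2, activity)


def one_point_slope(x: float, point: Point2, slope: float) -> float:
    """Activity on the line through one known point with a given slope."""
    x1, y1 = point
    return y1 + slope * (x - x1)


def two_point(x: float, p1: Point2, p2: Point2) -> float:
    """Activity on the line through two known points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different descriptor values")
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def plane_through_three_points(x: float, y: float,
                               p1: Point3, p2: Point3, p3: Point3) -> float:
    """Activity on the plane passing through three known points."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1
    # Normal vector of the plane (cross product of the two edge vectors).
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("points are collinear in the descriptor plane")
    return z1 - (nx * (x - x1) + ny * (y - y1)) / nz


# Made-up source compounds (illustration only).
print(two_point(2.0, (1.0, 2.0), (3.0, 6.0)))    # interpolation
print(two_point(4.0, (1.0, 2.0), (3.0, 6.0)))    # extrapolation
print(plane_through_three_points(2.0, 2.0,
                                 (0.0, 0.0, 1.0),
                                 (1.0, 0.0, 3.0),
                                 (0.0, 1.0, 4.0)))
```

Each function takes only descriptor values and a handful of known data points, so it can be wrapped as a node in a network when too few observations are available to fit a regression model.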
As mentioned, the development of nanoinformatics tools should be in line with published guidelines and standards established by the scientific community; the outcomes of projects focused on regulation are especially significant in this matter. A recent contribution by Giusti et al. 36 summarizes existing approaches for nanomaterial grouping and provides a new approach with further recommendations. The authors suggest that risk assessment should be based on three crucial questions: "what they are", "where they go" and "what they do". Thus, one should gather information about (i) physicochemical characterization of pristine materials (as synthesized); (ii) changes of properties in various conditions (system-dependent properties), toxicokinetics, and fate; (iii) physical hazards, human toxicity, and ecotoxicity. The presented "NanoReg2 approach" requires a more detailed description of nanoforms. However, it does not clearly distinguish between descriptors (that describe the structure/morphology) of ENMs and properties (that result from the structure). Thus, a precise analysis of the relationship between the structure of ENMs and their properties may be challenging. Building structure-activity prediction networks that include various methods, such as the presented NanoReg2 approach, could be a way to extract all crucial information and point out dependencies between the structure of a NM and its properties and biological activity.
Finally, it is worth mentioning that the additional benefits of using structure-activity prediction networks are more effective incorporation of the computational safety assessment at the stage of design of new materials and better collaboration between experts focused on different aspects of research on nanomaterials. This is especially important in the context of extending the SAPNet with layers related to the synthesis stage.

Conflicts of interest
There are no conflicts to declare.