E. Conterosito, M. Milanesio, L. Palin and V. Gianotti*
Università del Piemonte Orientale “A. Avogadro” Dipartimento di Scienze e Innovazione Tecnologica, Viale T. Michel 11, I-15121 Alessandria, Italy. E-mail: valentina.gianotti@uniupo.it
First published on 8th November 2016
The Liquid Assisted Grinding (LAG) method for the fast and facile preparation of organic-intercalated Layered Double Hydroxide (LDH) nanocomposites, which allows the production of low-cost, stable and efficient functional materials, is here employed to rationalize the features of the organic compounds that most readily undergo intercalation. The LAG method was exploited to determine, in a short time, which molecules can be successfully intercalated into LDH. A straightforward rationalization of the intercalation yields was not possible, since no individual feature (such as bulkiness or pKa) could alone describe the intercalation behaviour of the whole set of molecules. Therefore, Principal Component Analysis (PCA), together with the use of molecular descriptors to classify the molecules, was borrowed from the chemometric approach widely used in analytical chemistry and applied successfully, for the first time, to a novel area of materials science. A set of molecular descriptors was chosen to cover different features of the molecules (physicochemical, topological, geometrical, etc.) and then screened by statistical methods to understand which descriptors affected the intercalation yield. PCA then allowed us to highlight the presence of various mechanisms involved in LAG intercalation and to separate the samples along PC3 as a function of yield. Finally, the classification tree method allowed us to understand the various intercalation mechanisms and to classify the molecules into groups related to their yield. These groups can be used to estimate the expected yield as a function of the molecular descriptors. The molecules most apt to LAG have medium–low molecular weight, high flexibility and low refractivity. Conversely, large and hydrophobic molecules and, surprisingly, small but rigid molecules have a low success rate in LAG intercalation.
The behaviour of this last class of molecules, which should in principle be easily intercalated by LAG but was identified by the present study as a difficult case, was then tested using two molecules, and the prediction of the chemometric study was confirmed.
A total of 17 molecules were tested: seven were successfully intercalated, six were partially intercalated, and in four cases the intercalation did not succeed. The reasons why some molecules can be easily intercalated while LAG is unfeasible for others needed to be analysed in more detail, as none of the individual features (such as bulkiness or pKa) could alone explain the intercalation behaviour of the whole set of molecules. In fact, the nature and characteristics of the chemical interactions between organic species and mineral surfaces and interlayers at the molecular scale are quite complex and depend upon the characteristics of the guest molecules, layer composition, synthesis conditions and hydration.31 Moreover, the intercalation reaction itself is affected by a number of variables.32 Given the complexity of the information obtained from the intercalation experiments, trends are very difficult to find, so we looked for a method to seek patterns in the data. Principal Component Analysis (PCA) is a pattern recognition method used to obtain an effective representation of the system under investigation with fewer variables (called Principal Components, PCs) than in the original data.33,34 The loadings (weights of the original variables along the PCs) and the scores (projections of the samples onto the space spanned by the PCs) allow the identification of relationships between the variables and of groups of samples, respectively. Although chemometric methods are widely used in analytical,35,36 organic and pharmaceutical chemistry37 and “materials informatics” has recently been proposed and its use encouraged,38–40 there are still few examples of the application of multivariate methods to materials science in the literature.
These are mainly devoted to the handling of large amounts of data to seek structure/property relationships,41–45 to material optimization,46 to the prediction of crystal structures47 and to crystallographic data analysis.48,49 For instance, in the kinetic analysis of in situ XRPD data collected during a solid-state reaction,50 PCA proved to be a valid and efficient alternative to the Fourier analysis method known as Phase Sensitive Detection (PSD).51
We present here the results obtained by applying multivariate analysis to the yields of LAG intercalation, in order to rationalize LAG efficiency as a function of the features of the organic molecules. The aim of the present work is to derive an “index of intercalability” that can be calculated a priori for each molecule to estimate the likelihood of a successful intercalation. Moreover, it gives a hint on how to tailor the solution used for the intercalation to achieve the best yield, or to make the intercalation possible, before applying other strategies, including multivariate approaches.52
Guests A,53 B, C, VG1-C2, VG1-C8,54 and VG1-C10 (ref. 55) were synthesized in our laboratory as described in the literature (Fig. 1). The other guests (also shown in Fig. 1) are commercially available and were purchased from Sigma Aldrich (Milano, Italy).
All the intercalations were obtained by adding 5 ml of a 2:1 mixture of 0.5 M NaOH and EtOH per gram of LDH and either grinding the blend ex situ in a mortar or mixing it in situ in a capillary, as described in ref. 23. The samples intercalated in situ in a capillary were given the suffix “_c”, as detailed in Fig. 3. Our preliminary tests showed that the duration of the grinding does not affect the result: grinding mostly serves to homogenize the mixture and to favour contact between the reactants. The grinding time was nevertheless standardized to avoid degradation of crystallinity. The amount of liquid employed was optimized in previous experiments,20,23,52 since we found that an excessive amount of liquid has a negative impact on the yield, probably because it induces excessive swelling of the layers. The amount and kind of impurities generated with the LAG method are similar to those obtained by ionic exchange in solution. The final LDH product contains sodium nitrate impurities, derived from the removal of nitrate from the layers, and possibly an excess of non-intercalated organic compound. These impurities can be easily removed by washing the LDH with water to remove the soluble salts and, if needed, with another solvent better suited to removing any residual non-intercalated organic guest. Static X-ray measurements were performed on a ThermoARL XTRA powder diffractometer equipped with a solid-state Peltier-cooled detector. All powder diffraction patterns were measured in continuous mode using the following conditions: 2θ angular range 2–50°; tube power 45 kV and 40 mA; step size 0.02° 2θ.
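The liquid dosage above scales linearly with the amount of LDH. The short Python sketch below makes the arithmetic explicit; it assumes that the 2:1 ratio is by volume and refers to 0.5 M NaOH : EtOH, and the function name `lag_liquid` is hypothetical.

```python
def lag_liquid(ldh_mass_g, ml_per_g=5.0, naoh_molarity=0.5,
               naoh_parts=2, etoh_parts=1):
    """Scale the LAG liquid to a given LDH mass (hypothetical helper).

    ASSUMPTION: the 2:1 ratio is by volume, 0.5 M NaOH(aq) : EtOH.
    Returns (total_ml, naoh_ml, etoh_ml, naoh_mmol).
    """
    total_ml = ml_per_g * ldh_mass_g
    naoh_ml = total_ml * naoh_parts / (naoh_parts + etoh_parts)
    etoh_ml = total_ml - naoh_ml
    naoh_mmol = naoh_molarity * naoh_ml  # mol/L x mL = mmol
    return total_ml, naoh_ml, etoh_ml, naoh_mmol


# Example: 0.2 g of nitrate LDH (hypothetical batch size)
total, naoh, etoh, mmol = lag_liquid(0.2)
print(f"{total:.2f} mL total: {naoh:.2f} mL 0.5 M NaOH + {etoh:.2f} mL EtOH "
      f"({mmol:.2f} mmol NaOH)")
```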
Fig. 2 Results of the data mining procedure to determine the significant (black) and discarded (red) variables.
TOPAS56 Academic 4.1 was used to calculate the yield from the XRPD patterns of the samples, using a single-peak fitting approach to obtain the peak areas. All the XRPD patterns were already presented in previous works by some of us,20,23,52,53,57,58 except those of LDH_COUM,59 LDH_FLUR ex situ, LDH_TIAP ex situ, LDH_PABA and LDH_PTA. The XRPD patterns and fits of the samples not previously reported in the literature are given in the ESI (Fig. SI2–SI4, SI9 and SI10†). The reaction yield (reported in Fig. 3) was calculated from the ratio between the area of the basal peak of the starting nitrate LDH and that of the hybrid compound, as described by Toson et al.52
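The yield calculation rests on basal-peak areas. The sketch below is a deliberately simplified stand-in for the TOPAS single-peak fitting: it integrates a background-subtracted peak with the trapezoidal rule instead of fitting a profile, and it assumes that the yield is the fraction of the total basal-peak area due to the hybrid phase. The exact definition used in this work is the one of Toson et al.52; the function names are hypothetical and the peak positions and intensities are synthetic.

```python
import math


def peak_area(two_theta, intensity, lo, hi):
    """Net area of one peak between lo and hi (degrees 2-theta).

    A straight background line is drawn between the first and last points
    inside the window and subtracted; the remainder is integrated with the
    trapezoidal rule.
    """
    pts = [(x, y) for x, y in zip(two_theta, intensity) if lo <= x <= hi]
    if len(pts) < 2:
        raise ValueError("the window must contain at least two points")
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    slope = (y1 - y0) / (x1 - x0)
    net = [(x, y - (y0 + slope * (x - x0))) for x, y in pts]
    return sum(0.5 * (ya + yb) * (xb - xa)
               for (xa, ya), (xb, yb) in zip(net, net[1:]))


def intercalation_yield(area_hybrid, area_nitrate):
    """ASSUMPTION: yield = hybrid basal-peak area / total basal-peak area."""
    return area_hybrid / (area_hybrid + area_nitrate)


def gaussian(x, centre, height, sigma):
    return height * math.exp(-0.5 * ((x - centre) / sigma) ** 2)


# Synthetic pattern, 2-50 deg 2-theta, step 0.02 deg (hypothetical peaks):
# hybrid basal peak at 4 deg, residual nitrate-LDH basal peak at 10 deg.
tt = [2.0 + 0.02 * i for i in range(2401)]
counts = [50.0 + gaussian(x, 4.0, 300.0, 0.1) + gaussian(x, 10.0, 100.0, 0.1)
          for x in tt]
a_hybrid = peak_area(tt, counts, 3.0, 5.0)
a_nitrate = peak_area(tt, counts, 9.0, 11.0)
print(f"yield = {intercalation_yield(a_hybrid, a_nitrate):.3f}")
```

A profile fit (as in TOPAS) is more robust than raw integration when peaks overlap or the background is curved; the trapezoidal version is used here only because it is short and dependency-free.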
Marvin60 calculator plugins, running on an Intel Core i7 vPro processor, were employed to calculate the values of those molecular descriptors that, unlike charge and molecular weight, cannot be evaluated directly from the chemical structure of the compounds. All molecules were drawn in Marvin Sketch60 version 15.6.29.0, and the calculations were carried out on the lowest-energy conformer, obtained with the MMFF94 molecular mechanics force field as implemented in the ChemAxon plugin cxcalc.60 For the calculation of the lowest-energy conformer, a strict limit was imposed together with the pre-hydrogenize and hyperfine options.
All statistical treatments, Principal Component Analysis (PCA), classification tree and graphical representations were carried out by Statistica61 version 8 and Microsoft Excel (Microsoft Corporation, USA).
log P is the octanol/water partition coefficient, which is used in QSAR analysis and rational drug design as a measure of molecular hydrophobicity. The calculation method is based on the publication of Viswanadhan et al.63 By default, log P is calculated at the pH at which molecules are neutral, or at their isoelectric point if they are zwitterions. We also performed a calculation forcing the pH to 13.3 (log P pH 13.3), corresponding to the reaction pH. Both calculations were run in the presence of electrolytes (Cl− 0.25 mol L−1 and Na+/K+ 0.25 mol L−1) to better approximate the reaction conditions. log D is the octanol/water distribution coefficient and was calculated at pH values from 5 to 13.3 to account for acidic, neutral and basic environments.
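The difference between log P and log D can be made concrete with the textbook relation for a monoprotic acid, assuming that only the neutral species partitions into octanol: log D = log P − log10(1 + 10^(pH − pKa)). The Marvin plugin used here may treat ionic species and the added electrolytes differently, so this Python sketch only illustrates the trend with pH; the function name and the acid's log P and pKa are hypothetical.

```python
import math


def log_d_monoprotic_acid(log_p, pka, ph):
    """log D of a monoprotic acid, assuming only the neutral form partitions.

    log D = log P - log10(1 + 10**(pH - pKa))
    """
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))


# Hypothetical carboxylic acid: log P = 3.5, pKa = 4.5
for ph in (5.0, 7.0, 13.3):
    print(f"pH {ph:>4}: log D = {log_d_monoprotic_acid(3.5, 4.5, ph):6.2f}")
```

Under this approximation log D equals log P well below the pKa and falls by one unit per pH unit well above it, which is why log D values at pH 5 and pH 7 can differ markedly for carboxylic acids.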
The variance explained by the obtained PCs is reported in the scree plot, which displays the eigenvalues associated with each component (or factor) in descending order versus the component number (Fig. 4). The first Principal Component (PC1) explains approximately 74% of the total variance contained in the original data set, PC2 15%, PC3 4% and PC4 3%. Since the first four PCs together explained about 97% of the total variance, the subsequent PCs were considered not statistically significant. Fig. 5a reports the score plot of PC1 versus PC2, which graphically represents how the tested molecules are located in the new system of coordinates (the PCs) and possibly how they are grouped by the new variables. The samples are coloured according to their intercalation yield: red for compounds intercalated with low yield, yellow for medium yield and green for high yield. Some partial groupings of the samples can be observed, and the corresponding loading plot (Fig. 5b) makes it possible to identify which variables are responsible for them. The loading plot reports the projection of each variable on the new reference axes (PC1 and PC2 in this case). From the loadings of the variables (Fig. 5b), the yield emerges as a significant variable on both PC1 and PC2. In more detail, on PC1 the yield has quite a high weight and is anti-correlated with all the other variables, which lie on the opposite side of the PC1 axis. On PC2, the yield shows large positive weights together with log P at pH 13.3 and log D at pH 5 and 7. Anion charge and ring number are anti-correlated with these variables, since they show high negative weights. On PC2 the geometrical descriptors describing steric hindrance and molecular dimensions (labelled in blue in Fig. 5b) are grouped near zero and therefore have low weights. The two topological descriptors, ring number and rotatable bonds (labelled in red), have medium weights and are anti-correlated.
Indeed, rotatable bonds give flexibility to the molecule, whereas aromatic rings are rigid. Finally, physicochemical descriptors such as log P, log D and anion charge have the highest weights on PC2. Unfortunately, PC1 and PC2 alone are not able to group the samples by yield, although some partial separation can already be identified. The best separated group (Fig. 5a) collects the NSAID drugs (IBU, FLUR, TIAP, KET) except INDO (which has, in fact, a rather different and bulky structure) and is placed in the top right part of the graph, i.e. at positive values of PC1 and PC2. Comparison with Fig. 5b shows that its position on PC1 reflects a high yield and low values of all the other variables, which lie on the opposite side of the PC1 axis.
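The PCA itself was run in Statistica; the numpy sketch below shows the kind of computation that lies behind the scree, score and loading plots. It assumes that each variable (descriptors and yield) is autoscaled before the analysis, which the text does not state, and diagonalizes the resulting correlation matrix. The function names, the random data and the 95% cut-off in `n_components` are hypothetical; the text retained four PCs covering about 97% of the variance.

```python
import numpy as np


def pca_autoscaled(X):
    """PCA of a samples x variables matrix after autoscaling.

    Returns (scores, loadings, explained), with components sorted by
    decreasing eigenvalue of the correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # needs no constant columns
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Z @ loadings                            # sample coordinates on the PCs
    explained = eigvals / eigvals.sum()              # scree-plot fractions
    return scores, loadings, explained


def n_components(explained, target=0.95):
    """Smallest number of PCs whose cumulative explained variance >= target."""
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, target) + 1)


# Hypothetical data: 17 molecules x 6 variables (random, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(17, 6))
scores, loadings, explained = pca_autoscaled(X)
print(np.round(explained, 3), n_components(explained))
```

Columns of `loadings` give the variable weights plotted in the loading plots, and columns of `scores` give the sample coordinates plotted in the score plots.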
On PC2 the NSAIDs again lie on the same side as the yield, but this time other variables (log P pH 13.3, log D pH 7 and log D pH 5) also contribute to their location. Moreover, two variables lie on the opposite side (at high negative values of PC2), indicating that NSAIDs are characterized by low values of ring number and anion charge. The samples intercalated with lower yields are instead scattered. The VG series, grouped on the left side of Fig. 5a, is of particular interest. These compounds are all large, rather bulky molecules with many rings and two deprotonated groups. Two compounds of this series, VG1-C8 and VG1-C10, are not intercalated, while the similar compound VG1-C2 is intercalated with high yield but is isolated from the other well-intercalated compounds. A similar behaviour is observed for compound A. Conversely, judging by molecular volume, number of charges and number of rings, PFBS, 2-NSA and COUM should be easy to intercalate while SDS should be more difficult, but the experiments showed that this is not the case. Therefore, the analysis of the first two PCs makes it evident that different mechanisms are involved in the intercalation process. In addition, some variables could have non-linear synergistic or antagonistic effects. The yield information is not predominant: the first two PCs do not separate the samples by yield and, as expected, account mostly for the structural information, which is the predominant information in the data set. It is therefore important to examine the subsequent PCs, which collect the residual information without the overlap of the large, but not informative, structural data. All the score and loading plots from the PCA are reported in the ESI (Fig. SI5–SI8†) for completeness. Here we describe in detail the results obtained for PC3. As shown in Fig. 6a, which reports the score plot of PC3 versus PC2, this component separates the samples according to their intercalation yield.
Three groups corresponding to different intercalation yields are well defined. The samples with high yield are all placed at negative values of PC3, while the non-intercalated compounds are all placed on the opposite side (high positive values of PC3). Fig. 6b reports the projection of the variables on the two new axes, PC2 and PC3. The loading plot indicates that log P pH 13.3, log D pH 5 and log D pH 7 are the variables that drive good intercalation of the molecules. In detail, since the yield lies on the opposite side of PC3 from these variables, a high yield can be obtained when the molecules have low values of log P pH 13.3, log D pH 5 and log D pH 7. Moreover, ASA pH 13.3 and 7.4, volume and avg. mol. pol. also affect the intercalation yield. In particular, since they lie on the same side of PC3 as the yield, the yield is expected to increase as these variables increase. The loading plot also shows that the descriptors related to charge and polarizability (log P, log D, avg. mol. pol. and anion ch.) are the most influential in separating the scores, as they are the most distant from the origin of the axes and thus have high weights on PC2 and/or PC3. In the score plot of Fig. 6a only one sample falls outside its group: molecule C, the only compound with anion ch. = 3 and a negative log P pH 13.3. The difference in intercalation yield between VG1-C2 and the other two squaraines, VG1-C8 and VG1-C10, appears to be due, besides the number of rotatable bonds, to the different log P pH 13.3, which is strongly negative for VG1-C2 and positive for VG1-C8 and VG1-C10. Affinity towards water seems in this case to be the driving force that allowed the intercalation of VG1-C2, which has shorter aliphatic chains (thus lower log D), and hence better affinity with water and lower molecular weight than VG1-C8 and VG1-C10.
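The reading rule applied to Fig. 6b (variables on the same side of a PC as the yield are expected to rise with it, those on the opposite side to fall) can be written as a small Python helper. This is a qualitative sketch: a loading on a single PC ignores the other components, the 0.3 cut-off for "near the origin" is an arbitrary choice, the function name is hypothetical, and the PC3 loadings in the example are invented to mimic the pattern described above.

```python
def split_by_ref_side(loadings, ref="yield", min_abs=0.3):
    """Group variables by the sign of their loading relative to `ref` on one PC.

    Variables with |loading| < min_abs are treated as lying near the origin.
    Returns (same_side, opposite_side, near_zero) in input order.
    """
    ref_positive = loadings[ref] >= 0
    same, opposite, near_zero = [], [], []
    for name, weight in loadings.items():
        if name == ref:
            continue
        if abs(weight) < min_abs:
            near_zero.append(name)
        elif (weight >= 0) == ref_positive:
            same.append(name)
        else:
            opposite.append(name)
    return same, opposite, near_zero


# Invented PC3 loadings, shaped like the pattern described for Fig. 6b
pc3 = {"yield": -0.6, "log P pH 13.3": 0.7, "log D pH 5": 0.5,
       "log D pH 7": 0.55, "volume": -0.4, "ASA pH 13.3": -0.35,
       "rot. bonds": 0.1}
same, opposite, near = split_by_ref_side(pc3)
print("rise with yield:", same)
print("fall with yield:", opposite)
print("near origin:", near)
```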
Therefore, the features that seem to drive the intercalation yield are principally the partition coefficient log P pH 13.3 and the number of charges. Moieties with two negative charges must have a negative log P pH 13.3 (i.e. be hydrophilic) in order to enter the layers. In particular, molecules with two COOH groups that can be deprotonated are generally more difficult to intercalate, probably because the molecule has to “fit” the distances imposed by the charge distribution of the layers, which is quite dense. Carbonate, for example, is doubly charged and very stable when intercalated because it is a small anion and can interact with both layers; if the doubly charged anion is a large molecule, however, it is much more difficult to accommodate the two charges so that they match the layer charges. In the case of VG1-C2, accommodation in the layers is probably allowed by the fact that the molecule, although large, is quite flat, with the charges disposed symmetrically at its two ends, so that it can interact with the top and bottom layers. Smaller, less flat and/or less symmetrical molecules with multiple charges appear to be more difficult to intercalate.
With the information gathered from PCA, we were able to point out that different mechanisms are involved and to obtain a first grouping of the samples based on the mechanism involved in the intercalation. However, it is still difficult to discriminate how the variables involved in the different mechanisms affect the yield, and hence to predict it. To obtain a model that could predict the yield we performed a multivariate regression, but no model reached an agreement factor better than 60%. Therefore, this approach was abandoned.
Finally, we performed a data mining procedure using general classification tree models, in order to better understand how many mechanisms are present and which variables affect each mechanism. The result of the tree classification is shown in Fig. 7. Each level of the tree indicates the classification variable and the threshold value. The boxes at the end of the branches report the expected yield and, for the sake of clarity, the names of the samples falling in that branch. The same colour scheme was employed to highlight the yield values. The tree first separates the molecules according to molecular weight, and the light molecules are then separated according to the number of rotatable bonds.
The light flexible molecules are then separated according to the refractivity index and then by log D pH 7. The other branches can be read in the same way in Fig. 7, leading to six groups, whose ID numbers are indicated in the boxes in Fig. 7.
Starting from the left of the tree, group 4 contains light (mol. weight ≤ 282) and rigid (rot. bonds ≤ 2) molecules that gave low to medium yields. On inspection, this group turned out to contain the small molecules that, surprisingly, did not give good results. The next two groups (8 and 9) instead contain the molecules that achieved the highest yields. These molecules are all light, flexible and with low refractivity. A final distinction is made between those with log D pH 7 below or above 1.56, giving a predicted yield of 0.92 and 0.99, respectively. If the refractivity is instead high, as in the case of KET, the yield is lower. On the other side of the tree, the molecules with a mol. weight higher than 282 g mol−1 are separated into two groups according to their log P pH 13.3. In group 11, the molecules with log P pH 13.3 higher than −0.5 gave very low yields. In group 10, instead, the molecules with lower log P pH 13.3 gave mixed results. This group contains three molecules with more than one negative charge: one with three charges and 0% yield (C), and two with two charges and medium/high yield (A and VG1-C2).
We concluded that multi-charged molecules might give a significant yield only if log P pH 13.3 is lower than −0.5. However, the case of C suggests that this class of molecules must be analysed case by case, because their high density of negative charges, when packed in a layer, hardly fits the fixed arrangement of positive charges in the LDH layers.
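The branches of the tree described above can be walked with nested conditions. The Python sketch below encodes only the thresholds quoted in the text (mol. weight 282, rot. bonds 2, log D pH 7 1.56, log P pH 13.3 −0.5). The refractivity threshold is not given in the text and must be read from Fig. 7, so it is left as a parameter; which of groups 8 and 9 corresponds to which log D branch, and how values exactly at a threshold are assigned, are also assumptions. The function name `classify_lag` is hypothetical. Applied to the Table 1 descriptors, both PABA and PTA fall in group 4.

```python
def classify_lag(mol_weight, rot_bonds, refractivity, log_d_ph7, log_p_ph13_3,
                 refract_threshold=None):
    """Walk the classification tree as described in the text (sketch).

    Returns (group, expected_yield); expected_yield is numeric only where the
    text quotes a value, otherwise a short qualitative label.
    """
    if mol_weight <= 282:                      # light molecules
        if rot_bonds <= 2:                     # rigid
            return "4", "low to medium"
        if refract_threshold is None:
            raise ValueError("refractivity threshold must be read from Fig. 7")
        if refractivity > refract_threshold:
            return "light, flexible, high refractivity", "lower (e.g. KET)"
        # ASSUMPTION: a value of exactly 1.56 goes to the lower branch
        if log_d_ph7 <= 1.56:
            return "8 or 9", 0.92
        return "8 or 9", 0.99
    if log_p_ph13_3 > -0.5:                    # heavy, not hydrophilic at pH 13.3
        return "11", "very low"
    return "10", "mixed"                       # heavy, hydrophilic at pH 13.3


# Descriptors of PABA and PTA from Table 1:
# (mol. weight, rot. bonds, refract., log D pH 7, log P pH 13.3)
for name, desc in {"PABA": (138.15, 1, 38.01, -1.8, -2.34),
                   "PTA": (136.15, 1, 38.36, -0.84, -0.99)}.items():
    print(name, classify_lag(*desc))
```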
As a proof of principle, and to confirm the unexpected results, the method was used to predict the feasibility of LAG intercalation for two molecules that, given their features, should fall in group 4 of Fig. 7. This group collects small molecules, which are in principle easier to intercalate than bulky ones, since they require only limited swelling of the LDH layers and are easy to pack. Conversely, following the classification tree according to their features (summarized in Table 1), a low yield is expected. p-Aminobenzoic acid (PABA) and p-toluic acid (PTA) were chosen since PABA is a small molecule easily intercalated into LDH by standard methods,65 while PTA, to our knowledge, has never been intercalated into LDH (Fig. 8).
| | Refract. | Mol. weight (g mol−1) | Volume | Ring num. | Max. proj. area | Min. proj. size | Rot. bonds |
|---|---|---|---|---|---|---|---|
| PABA | 38.01 | 138.15 | 120.03 | 1 | 49.17 | 3.39 | 1 |
| PTA | 38.36 | 136.15 | 125.39 | 1 | 49.55 | 3.53 | 1 |

| | Anion char. | Avg. mol. pol. | ASA pH 7.4 | ASA pH 13.3 | log P pH 13.3 | log D pH 5 | log D pH 7 |
|---|---|---|---|---|---|---|---|
| PABA | 1 | 13.81 | 286.15 | 286.15 | −2.34 | 0.35 | −1.8 |
| PTA | 1 | 14.42 | 318.85 | 318.85 | −0.99 | 1.22 | −0.84 |
As predicted by the classification tree, intercalation of PABA and PTA with the standard recipe gave low yields of 6% and 4%, respectively (see XRPD patterns and fits in Fig. SI9 and SI10†). Moreover, the peaks of crystalline PABA and PTA are still visible in the XRPD patterns (Fig. SI9 and SI10†), indicating poor solubility or recrystallization outside the layers.
PCA does not group together all the molecules that can be intercalated into LDHs by LAG with good yield, suggesting that there is not a single mechanism involved in the intercalation. NSAID drugs are grouped in the score plot, suggesting a common reaction mechanism, while the other molecules are spread out, suggesting different mechanisms related to their different chemical features. Surprisingly, pKa turned out not to be important in defining the intercalation yield, probably because all molecules are deprotonated under the basic conditions used. Conversely, the variables log P pH 13.3, log D pH 5 and log D pH 7, related to good solubility in water and easy diffusion, drive good intercalation of the molecules. According to the classification tree (Fig. 7), the molecules were grouped into six groups showing different chemical features and different intercalation yields. These groups can be used to estimate the expected yield as a function of the molecular descriptors. The molecules most apt to be intercalated by LAG have medium–low molecular weight, high flexibility and low refractivity. Conversely, large and hydrophobic molecules and, surprisingly, small but rigid molecules have a low success rate in LAG intercalation.
The predictive ability of the method was successfully tested.
The chemometry-assisted LAG approach, applied in this article to LDH intercalation, can be easily extended to other solid-state preparations.
Moreover, with a dedicated selection of molecular descriptors, the proposed procedure could also become a useful tool for synthesis design and optimization in materials science.
Footnote
† Electronic supplementary information (ESI) available: Complete dataset, XRPD patterns and fit, complete PCA analysis. See DOI: 10.1039/c6ra17769g
This journal is © The Royal Society of Chemistry 2016