Supramolecular cages as differential sensors for dicarboxylate anions: guest length sensing using principal component analysis of ESI-MS and 1H-NMR raw data

A differential sensor based on cages discriminates guests according to their length.


Introduction
Differential sensing has recently emerged as an analytical method capable of replacing traditional techniques based on direct molecular recognition. [1][2][3][4][5][6][7][8][9] This approach takes inspiration from the mammalian olfactory sense, which can discriminate between complex odorant mixtures without the need for highly specialized peripheral receptors. [10][11] In broad terms, differential sensing employs a collection of low-selectivity receptors which together give a characteristic signal for the analyte present in solution. The large number of low-selectivity sensors is the key to overcoming the problem of complicated matrices. [12][13][14][15][16][17][18] The discrimination made by the receptors is achieved through a characteristic "fingerprint" related to each sensor-analyte system. This characteristic pattern is usually represented by an ensemble of parameters which are not easily described via simple calibration methods. To give an easy-to-read interpretation of the collected data, this sensing methodology has been coupled with statistical analysis techniques, such as discriminant analysis and principal component analysis (PCA), which are used intensively in many fields of academia and industry. [19][20][21][22][23][24][25]
Among the possible chemical approaches, the capability of dynamic covalent libraries (DCLs) to respond to external signals has been widely used for the recognition and signalling of chemical stimuli. [26][27][28] In particular, it is possible to employ complex systems characterized by multiple equilibria, which the presence of an analyte can perturb toward a particular product distribution. In this context, we have recently developed a new class of supramolecular structures, [29][30][31][32][33] in particular molecular cages, which have been synthesized using imine dynamic covalent chemistry (DCC). These cages are obtained by self-assembly of modified tris(2-pyridylmethyl)amine (TPMA) complexes and different diamine linkers. [34][35][36] Variation of the diamine linkers has allowed us to create a library of cages which show different selectivities toward a series of dicarboxylates (Fig. 1). ESI-MS studies, combined with 1H NMR, revealed that the binding energies correlate the dimension of the cage with the length of the dicarboxylate (viz. the diamine linker length defines the distance between the two metal centres and, as a direct consequence, the preferred length of the included guest). For example, the ethylenediamine linkers E in cage Cn@E-E-E direct the system toward the preferential inclusion of adipate C6 as the best guest (Fig. 1a). On the other hand, the longer m-xylylenediamine linkers X in cage Cn@X-X-X lead to preferential binding of sebacate C10 (Fig. 1c). [34]
The peculiar capability of these cages to differentiate dicarboxylates by their length, combined with the dynamic nature of their formation, makes this molecular system an ideal candidate for the development of a differential sensing array for dicarboxylate guests. Moreover, inspired by the systems reported by Anslyn [1][3][14] and Alfonso, [37][38][39][40][41][42][43] we tested whether the differential sensing technique works with raw data extracted directly from ESI-MS or 1H NMR analysis. This approach led to a system capable of discriminating dicarboxylate guests in the full range between C5 and C14, and to unexpected results from the use of 1H NMR. These results add a further functional property to the many that supramolecular cages have shown so far. [44][45][46][47][48][49][50][51][52][53]

Results and discussion

Development of a differential sensing array using ESI-MS data
In order to develop a differential sensing system, we set up a series of experiments in which the DCL is allowed to form different cages incorporating different linkers (viz. diamines) in the presence of a single guest (viz. dicarboxylate).
In other words, rather than focusing on the binding selectivity of a series of guests towards a single cage, we investigated the formation of a cross-reactive array of multiple cages towards a single guest. In detail, the DCL consists of a solution containing one equivalent of a dicarboxylate guest (from the series C5 to C14), two equivalents of complex 1 and a mixture of the selected diamines E, P, and X. Under the experimental conditions, 3 equivalents of each diamine were added. In addition, to compensate for the amine excess, p-anisaldehyde was added to the reaction mixture (Fig. 2a).
In a typical experiment, the dynamic system explores all possible combinations of binding between the dicarboxylate under study and all possible molecular cages, and it equilibrates thermodynamically toward the most stable distribution of inclusion cages. 72 hours after mixing, the reaction mixture, diluted to a concentration appropriate for the MS technique, was injected into the ESI ion source. The typical MS trace displayed a series of m/z peaks corresponding to the different inclusion cages. For example, Fig. 2b reports the spectrum for the DCL experiment performed with C8 as the guest, in which the doubly charged peaks of all ten possible cages are present. This experiment was extended to the series of guests ranging from C5 to C14; the cage distribution is different for every guest, as shown in Fig. 3a, where the ESI-MS spectrum of each DCL experiment is reported (Fig. S1 †).
Fig. 1 Selectivity profiles for cages (a) Cn@E-E-E, (b) Cn@P-P-P and (c) Cn@X-X-X among the guest series ranging from C4 to C14 (I_Cn/ΣI_Cn represents the relative intensity of the monoisotopic peak of each inclusion species among the guest series). The counterion is perchlorate for the cage metals and triethylammonium for the carboxylate guests.
At rst glance, it can be reckoned that moving from C 5 towards longer guests results in an increase of the presence of cages containing the longest diamine in the series m-xylylenediamine X. A attening toward a similar distribution can also be noted in the case of longer guests (C 11 -C 14 ). In other words, the DCL system responds similarly to longer dicarboxylate.
In order to gather more information, we employed principal component analysis (PCA), an unsupervised technique used to reduce the dimensionality of the data space. As the data source, we extracted the relative intensity of the monoisotopic peak of each inclusion cage for each dicarboxylate guest (Table S1 and Fig. S2 in the ESI †). This chemometric tool allowed us to generate a new set of variables, named principal components (PCs), that explain the variance of the system. Each DCL experiment was repeated four times for each guest to evaluate the robustness of the analytical method (see Section S3.1 in the ESI †).
In the PCA of our DCL system, the different dicarboxylates are effectively separated, allowing discrimination based on chain length (Fig. 3b). The first two principal components, PC1 and PC2, which together account for almost 95% of the total variance, arrange the data along an arch that follows the length of the guests. Although PC2 accounts for 9.56% of the variance and is not easily associated with a chemical property of the system, the arrangement of the data in the score plot can be interpreted in terms of the "horseshoe effect", which is typical of a unimodal distribution. [54][55] These results are also explained by the loading plot (Fig. S6 in the ESI †), which shows that positive values of PC1 correspond to the formation of cages containing the longer linker X, whereas negative values indicate that mainly cages containing linker E are formed. However, the system has a strong tendency to promote the formation of mixed cages with the three different linkers, Cn@E-P-X, owing to the stoichiometric ratio of diamines in the DCL system.
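The PCA step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the intensity matrix is synthetic (a made-up Gaussian preference of each cage for one guest length, with multiplicative noise), the helper name `pca` is ours, and the PCA is computed by singular value decomposition of the mean-centred matrix rather than with the software used for the actual analysis.

```python
import numpy as np

def pca(X, n_components=2):
    """Mean-centre X (rows = experiments, columns = variables) and run PCA via SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # coordinates in PC space
    loadings = Vt[:n_components]                      # weight of each variable per PC
    explained = s ** 2 / np.sum(s ** 2)               # fraction of variance per PC
    return scores, loadings, explained[:n_components]

# Hypothetical data: 10 guests (C5..C14) x 4 replicates, 10 cage intensities each.
rng = np.random.default_rng(0)
guest_lengths = np.arange(5, 15)
preferred = np.linspace(5, 14, 10)      # assumed optimum guest length of each cage
rows, labels = [], []
for n in guest_lengths:
    for _ in range(4):
        intensity = np.exp(-0.5 * ((n - preferred) / 2.0) ** 2)
        intensity *= 1.0 + rng.normal(0.0, 0.02, size=preferred.size)
        rows.append(intensity / intensity.sum())      # relative intensity I / sum(I)
        labels.append(n)

X = np.array(rows)
scores, loadings, explained = pca(X)
print("variance explained by PC1 and PC2:", explained.round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` for such a unimodal response typically gives an arch-shaped arrangement of the guests, which is the "horseshoe effect" mentioned in the text.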
The considerations made directly for the ESI-MS data reflect the results obtained with the PCA; while short chains are well distributed in the first part of the arch, a smaller distinction is observed for longer guests.
However, extracting the monoisotopic peak intensities already reduces the information contained in the data. For this reason, we decided to perform the PCA on the normalized raw ESI-MS data (Section S3.2, ESI †). [56] In this case, 126 data points per spectrum were fed to the software for each guest, with four replicates. This reduces data handling by the operator and retains more information, which could lead to better separation of the different guests. As expected, the results obtained by performing the PCA on the extracted monoisotopic peak intensities (Fig. 3b) and on the raw ESI-MS data (Fig. 3c) are in agreement in terms of the distribution of the different guests. However, the PCA obtained from the raw data (i) increases the separation between the guests and (ii) gives tighter clustering of the repeated analyses. In other words, the additional information present in the raw data improves the reproducibility of the measurement and leads to a higher sensitivity of the method.
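As a rough sketch of how the raw-data matrix could be assembled before PCA, the snippet below stacks four replicate spectra of 126 points for each of the ten guests and normalizes every spectrum. The normalization scheme (division by the total intensity) is an assumption made for illustration, since the procedure actually used is described in the ESI; the spectra are random numbers and the function names are hypothetical.

```python
import numpy as np

def normalize_total_intensity(spectrum):
    """Scale a spectrum so that its intensities sum to 1 (assumed normalization)."""
    spectrum = np.asarray(spectrum, dtype=float)
    total = spectrum.sum()
    if total <= 0:
        raise ValueError("spectrum has no positive total intensity")
    return spectrum / total

def build_data_matrix(spectra_by_guest):
    """Stack normalized replicate spectra row-wise and keep one label per row."""
    rows, labels = [], []
    for guest, replicates in spectra_by_guest.items():
        for spectrum in replicates:
            rows.append(normalize_total_intensity(spectrum))
            labels.append(guest)
    return np.vstack(rows), labels

# Hypothetical raw traces: 126 intensity points, four replicates per guest.
rng = np.random.default_rng(1)
spectra_by_guest = {
    f"C{n}": [rng.random(126) * 1e5 for _ in range(4)] for n in range(5, 15)
}
X_raw, labels = build_data_matrix(spectra_by_guest)
print(X_raw.shape)  # (40, 126)
```

Each row of `X_raw` is then one observation for the PCA, so the operator never has to pick individual peaks by hand.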

Differential sensing with 1H NMR
PCA of the ESI-MS data confirmed the discrimination capabilities of the cage sensor array. In addition, the PCA performed on the raw data provides more information about the system, allowing better discrimination of the guest length.
However, the sensing capabilities were already noticeable from a "first glance" inspection of the ESI-MS trace. For this reason, we decided to investigate the response of our cross-reactive array by 1H NMR spectroscopy, a technique that in principle should display very similar spectra for the cages formed.
To explore this possibility, we tested our sensing system in DMSO-d6 using the same experimental conditions described previously.
As expected, the one-dimensional 1H NMR spectra recorded after 72 hours for the ten different experiments are similar, and the differences are difficult to interpret since each reaction mixture contains at least ten different cages (Fig. 4 and S3 in the ESI †).
Using the raw 1H NMR data, we first performed a PCA over the whole spectral region related to the cage signals (3.8-9.5 ppm). However, the whole spectra did not give any valuable information about the included guest (Fig. S7 †). Therefore, we decided to focus our analysis on four different spectral windows which showed characteristic signal variations (Fig. 4).
The selected regions cover, respectively, the signals of the α-protons of the pyridine (8.9-9.3 ppm, green region in Fig. 4), the imine protons of the cages (8.3-8.6 ppm, violet region) and two regions of the cage's aromatic protons related to the phenyl rings (7.9-8.0 ppm, red region; 7.7-7.9 ppm, blue region) (see ESI † Section S3.2). [57] The PCA performed on the data between 7.7 ppm and 7.9 ppm (Fig. 4d) shows that the system can discriminate between guests of different lengths in the PC1/PC2 plane, which accounts for 77% of the variance. However, only the guests between C5 and C10 are well differentiated. A distinctive discrimination arises from the PCA performed between 8.9 and 9.3 ppm, where a clear distinction between guests with odd and even alkyl chain lengths is observed by plotting PC1 vs. PC2 (Fig. 4c). This result is in agreement with previously characterized cages, which show a lower chemical shift of the α-protons of the pyridine in the presence of even-length guests. [35] In contrast, when the PCA was performed in the region between 7.9 ppm and 8.0 ppm, the length of the guests was discriminated and the PCA displays the above-mentioned "horseshoe effect" (Fig. 4a). A similar result was obtained considering the signals related to the imine protons of the cage (Fig. 4b).
To summarize, from the analysis of 1H NMR spectra whose differences are hard to discern by eye, it is possible to gather information for the development of a sensor array. However, the spectral region must be chosen carefully. In particular, the large amount of numerical information contained in the whole spectra does not translate into an increase in the discrimination capability of the sensor array, while this capability is achievable when specific spectral regions are considered.
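The window-by-window analysis can be illustrated with the following sketch, which cuts each of the four ppm regions out of a spectral matrix and runs a separate PCA on it. The ppm limits are those quoted above; everything else (the 0.001 ppm grid, the random "spectra", the helper names `select_window` and `pca_scores`, inclusive window boundaries) is a hypothetical stand-in and not the processing used for Fig. 4.

```python
import numpy as np

# ppm limits taken from the text; the dictionary keys are just descriptive labels.
WINDOWS = {
    "pyridine alpha-H": (8.9, 9.3),
    "imine": (8.3, 8.6),
    "aromatic 7.9-8.0": (7.9, 8.0),
    "aromatic 7.7-7.9": (7.7, 7.9),
}

def select_window(ppm, spectra, low, high):
    """Return the columns of `spectra` whose chemical shift lies in [low, high]."""
    ppm = np.asarray(ppm)
    mask = (ppm >= low) & (ppm <= high)
    return np.asarray(spectra)[:, mask]

def pca_scores(X, n_components=2):
    """PCA by SVD of the mean-centred matrix; returns scores and explained variance."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

# Hypothetical data: 40 spectra (10 guests x 4 replicates) on a 3.8-9.5 ppm axis.
rng = np.random.default_rng(2)
ppm = np.linspace(9.5, 3.8, 5701)
spectra = rng.random((40, ppm.size))

for name, (low, high) in WINDOWS.items():
    block = select_window(ppm, spectra, low, high)
    scores, explained = pca_scores(block)
    print(f"{name}: {block.shape[1]} points, PC1+PC2 = {explained.sum():.2f}")
```

Running the PCA on each block separately, rather than on the full 3.8-9.5 ppm matrix, keeps uninformative regions from dominating the variance.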
Projection of an unknown sample in the PCA space
In order to validate the recognition system, three unknown samples were chosen to span the whole range of dicarboxylate lengths (C5, C8 and C13) and analyzed by MS and 1H NMR. The corresponding experimental data were projected onto the component space and compared to the original data using a prediction script (see Section S3.3 in the ESI †). Interestingly, the unknown samples lie close, in the vector space, to the previously measured samples. MS is able to predict the guest length, and NMR confirms its capability to predict both the length and the odd-even character of the carboxylates (Fig. S12-S14 in the ESI †).
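A minimal sketch of such a projection is given below. It is not the prediction script of the ESI: the training data are synthetic, the function names are ours, and the assignment rule (nearest class centroid in the PC1/PC2 plane) is an assumption chosen for simplicity. The essential point is that the unknown is centred with the training mean and multiplied by the training loadings, so the PCA model itself is never refitted.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Return the column means and the first PCA loading vectors of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(samples, mean, loadings):
    """Centre with the training mean, then project onto the training loadings."""
    return (np.asarray(samples, dtype=float) - mean) @ loadings.T

def nearest_centroid(score, train_scores, labels):
    """Assign the label whose mean training score is closest (assumed rule)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([train_scores[labels == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - score, axis=1)
    return int(classes[np.argmin(distances)])

def cage_response(n, preferred, width=2.0):
    """Hypothetical relative cage intensities for a guest with n carbon atoms."""
    g = np.exp(-0.5 * ((n - preferred) / width) ** 2)
    return g / g.sum()

# Synthetic training set: guests C5..C14, four replicates each, 2% noise.
rng = np.random.default_rng(3)
preferred = np.linspace(5, 14, 10)
labels = np.repeat(np.arange(5, 15), 4)
X_train = np.array([cage_response(n, preferred) for n in labels])
X_train *= 1.0 + rng.normal(0.0, 0.02, size=X_train.shape)

mean, loadings = fit_pca(X_train)
train_scores = project(X_train, mean, loadings)

# "Unknown" samples for C5, C8 and C13, projected without refitting the model.
predictions = []
for n in (5, 8, 13):
    score = project(cage_response(n, preferred), mean, loadings)
    predictions.append(nearest_centroid(score, train_scores, labels))
print(predictions)
```

With real data, the distance to the nearest centroid could also be reported, so that a sample falling far from every cluster is flagged rather than forced into a class.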

Conclusions
In this work, a cross-reactive array of multiple cages for the differential sensing of guest length was developed. To this end, a series of experiments involving a mixture of three different linkers was performed and analyzed by ESI-MS and 1H NMR. The data obtained from the resulting spectra were used to build data matrices, which were statistically analyzed through PCA. The resulting scores for the ESI-MS spectra show that the system was able to discriminate guests according to their length. In particular, the array was able to efficiently distinguish all the guests in the full range from C5 to C14 using either the monoisotopic peaks of the cages formed or the raw ESI-MS data as the input for the analysis. The PCA of the 1H NMR spectra was able to distinguish odd and even guests, therefore providing information on structural features of the guests. In addition, the prediction of unknown guests within the PCA space was evaluated, and the results allow us to extend the developed methodology to the evaluation of unknown samples. More importantly, the developed methodology, which extends chemometric analysis to two techniques less studied in combination with PCA, highlights both the advantages of using raw data and the precautions it requires.

Conflicts of interest
There are no conicts to declare.