F. Azimi and M. H. Fatemi*
Department of Analytical Chemistry, Faculty of Science, University of Mazandaran, Babolsar, Iran. E-mail: mhfatemi@umz.ac.ir; Fax: +98-11-35302350; Tel: +98-11-35302395
First published on 8th November 2016
Multivariate curve resolution with alternating least squares optimization (MCR-ALS), a soft modeling approach based on factor analysis, was applied to obtain a thorough gas chromatography-mass spectrometry (GC-MS) fingerprint analysis of the volatile chemical constituents in Iranian Citrus aurantium L. peel. This technique for two-dimensional data was intended to resolve overlapping and/or embedded GC-MS peaks into the pure chromatogram and mass spectrum of each chemical constituent, overcoming fundamental chromatographic problems that occur during GC-MS analysis of the Iranian C. aurantium peel fingerprint, such as spectral background, baseline offset, and noise. The chromatographic fingerprints of Iranian C. aurantium peel were first segmented into appropriate chromatographic regions, and MCR-ALS was then used to obtain the pure response profiles of the chemical constituents in each segment, as well as their relative concentrations. Retention indices, together with the mass spectral profiles of the pure chemical constituents, were used for qualitative identification by matching against standard spectra through MS library searching; an overall volume integration (OVI) technique was used for the semi-quantitative analysis (to obtain the relative concentrations of the chemical constituents). With the help of the proposed methodology, GC-MS analysis of C. aurantium L. peel extended the number of identified constituents with concentrations higher than 0.01% from 45 to 82. The lack of fit (LOF), percent of variance explained (R2) under the optimum conditions, and reverse match factor (RMF) were used to assess the MCR-ALS solutions. The LOF values of the MCR-ALS models were lower than 12.0% for all segment matrices, with RMF values in the range 713–977 and R2 values higher than 97%. The major constituents of Iranian C. aurantium L. peel were found to be limonene (72.89%), β-myrcene (9.06%), α-pinene (4.74%) and β-pinene (3.44%). It is concluded that coupling GC-MS measurements with the multivariate curve resolution-alternating least squares method is an effective and powerful strategy for solving current problems in GC-MS analysis and for obtaining the required analytical selectivity in complex natural products.
Chemical fingerprints obtained by hyphenated chromatographic techniques offer a qualitative and integrated profile of all data points related to multiple characteristic constituents for the purpose of identifying complex samples.16,17 Among the different chromatographic fingerprinting techniques, gas chromatography-mass spectrometry (GC-MS), as one of the so-called second order separation methods, is one of the most promising and widespread analytical techniques for the determination of volatile constituents of complex real sample extracts.17–19 However, in GC-MS analysis of highly complex samples, due to the existence of various challenging problems, such as spectral background, baseline drift, different types of noise, and some co-elution (overlapped and/or embedded) peaks, obtaining selective information, even under the best experimental conditions, is not possible.19 These challenging issues may originate from the experimental conditions, the variability of GC-MS systems, such as chromatographic and/or detection systems, and the complexity of natural samples.20 Among these problems, co-elution (peak overlap) is perhaps one of the most important and most observed difficulties in chromatographic analysis, which occurs mainly due to the inadequate selectivity of chromatographic columns, peak capacity, and the need for faster chromatographic analysis. The existence of these problems can reduce the similarity indices (SIs) obtained from a direct search in an MS database and, therefore, correct identification of the constituent cannot be achieved.21,22
The chemical composition of C. aurantium L. peel has been studied by different researchers.11,12,15,23,24 The major constituents found in many studies were limonene, β-myrcene, β-pinene, α-pinene, β-ocimene, linalool, γ-terpinene, sabinene, and octanal. Sarrou et al. investigated the volatile constituents and antioxidant activity of the peel, flowers and leaf oils of Citrus aurantium L. growing in Greece based on GC-MS measurements.11 They reported 26, 20, and 16 constituents, in the flowers, peel, and leaf oils of Citrus aurantium L., respectively. They found that peel oil is composed mainly of monoterpene hydrocarbons (98.30%), mainly limonene, while oxygenated monoterpenes are the dominant compounds of leaf and flower oils. Babazadeh Darjazi reported 29 volatile constituents, including 18 oxygenated terpenes and 11 non-oxygenated terpenes, followed by GC-FID and GC-MS analysis.23 Moreover, Lota et al. reported that 39 components of peel oil and 47 components of leaf oil were identified by mass spectrometry/retention indices (GC-MS analysis), and the major components (≤0.05%) by 13C-NMR spectroscopy.15 Their chemical composition was investigated by capillary GC, GC-MS, and 13C-NMR. The most recent studies analyzed desired peel samples, as a complex matrix, through a comparison of Kovats retention indices (RI), their retention times (RT), and mass spectra of authentic samples or literature data, in order to identify their constituents. Identification of the constituents of C. aurantium L. peel sample by GC-MS was the subject of several investigations, but because of the aforementioned problems with this technique, only a small number of chemical constituents were identified. On the other hand, the minor constituents are responsible for the specific odor and flavor, and they may also have important therapeutic value. Additional information is present in GC-MS chromatograms that cannot be extracted merely by using specialized chromatographic technologies. 
As a result, the qualitative and quantitative results obtained were not reliable or satisfactory. To overcome these problems, extraction of the required information about the constituents in complex matrices has become possible by means of chemometric resolution methods.
Over the past few decades, different multivariate curve resolution (MCR) techniques have matured to improve the sensitivity and resolve separation issues encountered in GC-MS analysis. These mathematical resolution techniques, by applying a bilinear model, attempt to exploit the chromatographic and spectral differences between all the constituents present in a particularly complex mixture (even if their differences are very small), in order to acquire more informative data from the chemical analysis, both in the spectral identification (qualitative information) and chromatographic separation (quantitative information) of the constituents.21,25,26 Many iterative and non-iterative curve resolution approaches have been offered that can all be classified as multi-component factorization of multivariate data techniques.26 The advantages of applying MCR methods to determine the pure contribution of each constituent involved in the system are obtaining an increased resolution power with reduced chemical use, cost, and time, allowing a better understanding of the chromatographic process.18,26 In this study, the volatile constituents of Iranian C. aurantium L. peel were extracted using a static headspace (SHS) technique and were analyzed by GC-MS under suitable conditions to achieve informative chromatographic fingerprints. Multivariate curve resolution based on alternating least squares optimization (MCR-ALS), along with other chemometric methods, was applied to provide chemically meaningful quantitative and qualitative profiles of the pure constituents present in an Iranian C. aurantium L. peel chromatographic data set. The advantages of MCR-ALS over other MCR methods suggested in the literature are that the application of different constraints during the optimization process and generalization to high-complexity data sets is quite simple.27,28 Parastar et al. 
have proposed extended MCR-ALS and multivariate clustering methods to improve the analysis of GC-MS fingerprints of secondary metabolites in eighteen citrus samples (including eight lemon (C. limon), five orange (C. sinensis), three mandarin (Citrus reticulata), and two grapefruit (Citrus paradisi)) through a multi-set augmented data structure.29 The constituents reported in that work were the main, commonly identified metabolites of the eighteen citrus samples with noticeable relative concentrations, which were relevant for subsequent principal component analysis (PCA) and k-nearest neighbor (KNN) cluster analysis. Finally, they used a supervised counter-propagation artificial neural network (CPANN) method to characterize the chemical markers (chemotypes) responsible for the differentiation of the four clusters determined by PCA and KNN. To the best of our knowledge, the constituents of C. aurantium L. peel extract have not so far been identified using a combination of GC-MS and chemometric resolution methods. Therefore, the main aim of the present work is the identification and semi-quantitative analysis of the pure contributions of the individual constituents present in an Iranian C. aurantium L. peel sample. Inspection of the results confirmed the potential of MCR-ALS as a highly effective strategy, saving work and time in overcoming chromatographic challenges in the GC-MS fingerprints of volatile chemical constituents in a given sample.
$$\mathbf{X}_{m\times n} = \mathbf{C}_{m\times k}\,\mathbf{S}^{\mathrm{T}}_{k\times n} + \mathbf{E}_{m\times n} \tag{1}$$
To obtain physically meaningful and chemically interpretable solutions, and to limit the number of possible solutions that maximize the data variance explained by the different constituents, a series of constraints is applied to C and ST at each iteration, such as non-negativity46,47 for the C and ST matrices, unimodality48 for C, spectral normalization,18,20 and selectivity.49
The convergence criterion can be a difference in fit improvement in two successive iterations, such that if this relative difference in fit is less than a threshold value (predefined cut-off value, usually 0.1%; of course, depending on the step optimization, this may be modified by the user),32 the optimization is finished, or sometimes a preset number of iterations may be applied as the stop criterion.17 In most applications, in MCR-ALS, one opts to normalize the spectral modes in the columns of matrix S to equal length, without changes in data fitting, in order to avoid scale and intensity ambiguities during the ALS optimization algorithm, and so that the retention time profiles can be directly inferred as relative concentrations.40,50 The final MCR-ALS solutions are a set of pure chromatographic and spectral profiles for each instrumental response signal, and the quality of these factors is related to the model fit. The most common methods for this assessment are statistical parameters, such as percent of lack of fit (LOF) and percent of variance explained (R2) under the optimum conditions (see ESI† for definitions of these parameters). Also, other parameters related to assessing the resolution results and identifying chemical constituents for two-dimensional GC-MS data are the match factor (MF), or reverse match factor (RMF), and also the comparison of GC retention indices (RIs).
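The alternating scheme described above can be illustrated with a minimal sketch of the bilinear model in eqn (1). This is not the implementation used in this work: the function name `mcr_als` and its parameters are hypothetical, negative values are simply clipped to zero as a crude stand-in for a proper non-negative least squares step, and the unimodality and selectivity constraints are omitted. The default tolerance of 1e-3 corresponds to the 0.1% relative change in fit mentioned in the text.

```python
import numpy as np

def mcr_als(X, C0, max_iter=200, tol=1e-3):
    """Illustrative MCR-ALS loop for X ~ C @ S.T (hypothetical helper).

    X   : (m, n) data matrix (elution times x m/z channels)
    C0  : (m, k) initial estimate of the elution profiles
    tol : relative change in the residual sum of squares used as the
          stop criterion (1e-3 corresponds to 0.1 %)
    """
    C = np.clip(C0, 0.0, None)
    prev_ssr = None
    for iteration in range(1, max_iter + 1):
        # Spectral step: least-squares solution of C @ St = X, negatives clipped
        St = np.linalg.lstsq(C, X, rcond=None)[0]
        St = np.clip(St, 0.0, None)
        # Normalize every spectrum (row of St) to unit length
        norms = np.linalg.norm(St, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        St = St / norms
        # Concentration step: least-squares solution of St.T @ C.T = X.T
        C = np.linalg.lstsq(St.T, X.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)
        # Stop when the fit no longer improves by more than tol (relative)
        ssr = np.sum((X - C @ St) ** 2)
        if prev_ssr is not None and abs(prev_ssr - ssr) <= tol * prev_ssr:
            break
        prev_ssr = ssr
    return C, St.T, iteration
```

Because the spectra are normalized to unit length, the intensity information ends up in the columns of `C`, which is why the elution profiles can be read as relative concentrations. In practice `C0` would come from an initial-estimate method such as EFA or SIMPLISMA rather than from a random guess.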
The LOF and R2 quantities allow a simple comparison between different methods and models describing the same data set.32,42 For data with a very low noise level, the LOF offers more distinction between similar models, but for high-noise data and a greater number of chemical components, R2 is preferred, as a larger unexplained model variance means that the data fitting is not good. Both LOF and R2 values should be judged against the expected level of experimental noise.27,44 The match factor (MF) and reverse match factor (RMF) reflect the likelihood that the resolved mass spectral fragmentation pattern and a reference spectrum of a standard in the NIST mass spectral search program, or a mass spectrum from the literature, arose from the same compound. However, when studying complex systems with significant background noise and non-selective separation (the co-elution problem) on the GC column, RMF is preferred, because it only matches the abundance values of the resolved mass spectrum that it has in common with each reference spectrum, rather than performing a peak-by-peak match as in MF.51 RMF is defined as the normalized dot product, with square root scaling, of the MCR-ALS resolved mass spectrum and the NIST library standard mass spectrum, without including the elements of the resolved mass spectrum that are not present in the library spectrum.29,51 In addition, to improve the identification of the constituents, the temperature-programmed retention index (RI) proposed by Van den Dool and Kratz22,52 as a quasi-linear equation is calculated for all the constituents as follows:
$$RI_x = 100n + 100\,\frac{t_{Rx} - t_{Rn}}{t_{R(n+1)} - t_{Rn}} \tag{2}$$
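As a worked illustration of eqn (2), the short function below computes the retention index from three retention times. It assumes the standard reading of the Van den Dool and Kratz formula, in which n is the carbon number of the n-alkane eluting just before the constituent and n + 1 that of the n-alkane eluting just after it. The function name and the retention times in the example are made up for illustration.

```python
def retention_index(t_x, t_n, t_n1, n):
    """Temperature-programmed retention index following eqn (2).

    t_x  : retention time of the constituent
    t_n  : retention time of the n-alkane with n carbons (elutes before)
    t_n1 : retention time of the n-alkane with n + 1 carbons (elutes after)
    """
    if not t_n <= t_x <= t_n1:
        raise ValueError("t_x must lie between the bracketing n-alkanes")
    return 100 * n + 100 * (t_x - t_n) / (t_n1 - t_n)

# Hypothetical times: a peak at 16.0 min between decane (15.0 min)
# and undecane (17.0 min) lies halfway, so RI = 1050.
print(retention_index(16.0, 15.0, 17.0, 10))  # 1050.0
```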
(1) Data preprocessing: two-way data obtained from GC-MS can suffer from a number of fundamental non-informative defects, such as baseline drift, spectral background, a low S/N ratio (heteroscedastic and homoscedastic noise), and many others. The occurrence of these artifacts in both the chromatographic and spectrometric dimensions can result in the appearance of more components in each segment matrix than are actually expected. Therefore, to increase the efficiency of MCR techniques in the accurate identification of minor chemical constituents, and to resolve embedded or overlapped peaks, both the artifacts and the number of chemical components in a segment matrix should be kept at a limited level.42,44 To achieve this goal, some data sets must be preprocessed. Because GC-MS data obtained in full scan mode contain many noise channels, removing these channels results in faster computation. In the present work, a morphological score proposed by Shen et al.53 is applied to distinguish the signal channels from the noise channels based on their frequency difference. This method was used to decrease the homoscedastic noise in the two-way data: the channels with morphological scores below the noise limit were deleted. In addition, a Savitzky–Golay smoothing filter,54 which relies on local least-squares polynomial fitting, was used to transform heteroscedastic noise into homoscedastic noise and so reduce the heteroscedastic noise in each data matrix. For a quantitative assessment of the system under study, the baseline must be stable during analysis. Furthermore, the existence of a spectral background in the MS dimension can affect the identification of the constituents in the system.26 Finally, simultaneous background and baseline correction was performed on the chromatographic data using a congruence analysis method and least-squares fitting developed by Liang et al.29,53,55 Each pre-processed data matrix was then scaled to a maximum signal intensity of 1.0.38
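The smoothing and scaling steps can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard tabulated weights of a five-point, second-order Savitzky–Golay smoothing filter (the settings quoted later for peak cluster B), applies them along the elution axis of each m/z channel, and pads the ends of the chromatogram by repeating the edge values, which is a choice made here and not taken from the source. The morphological score and the baseline/background correction are not reproduced, and the function name is hypothetical.

```python
import numpy as np

# Standard five-point quadratic Savitzky-Golay smoothing weights
SG5_WEIGHTS = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0

def smooth_and_scale(X):
    """Smooth every m/z channel along the elution axis, then scale to max 1.

    X : (m, n) matrix, rows = elution times, columns = m/z channels
    """
    X = np.asarray(X, dtype=float)
    # Repeat the first and last rows twice so the output keeps m rows
    padded = np.pad(X, ((2, 2), (0, 0)), mode="edge")
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, SG5_WEIGHTS, mode="valid"), 0, padded
    )
    # Scale the whole matrix so that its largest intensity equals 1.0
    peak = smoothed.max()
    return smoothed / peak if peak > 0 else smoothed
```

A second-order polynomial passes through this filter unchanged away from the edges, so narrow but smooth chromatographic peaks are distorted far less than they would be by a plain moving average of the same width.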
(2) Determination of the number of chemical constituents: most methods applied to estimating chemical rank, and finding the direction of the relevant sources of variation in a bilinear data set, are based on PCA or singular value decomposition (SVD). However, because of the accumulation of noise in complex systems analyzed by GC-MS, achieving a true rank for the full data matrix using PCA is difficult. In addition in the presence of high levels of embedded noise, the vector spaces determined by PCA and SVD are incorrect.31,41,56,57 In order to achieve more reliable results, another category of methods based on finding a group of the purest variables (columns or rows), giving the most dissimilar column and row profiles of the data matrix with the reference profiles, was introduced. The simple-to-use interactive self-modeling mixture analysis (SIMPLISMA), orthogonal projection approach (OPA) and simplified Borgen method (SBM) belong to this category.41,58
In the present work, the number of constituents was preliminarily estimated by means of PCA or by the SVD algorithm.31,58 Also, for complex chromatographic regions, to ensure the correct number of chemical constituents, a morphological score technique was employed, using the OPA and SBM methods for key variable selection.20,41,57 In this technique, after noise elimination in the mass channels in each segment matrix, based on a morphological factor criterion, the number of constituents was estimated using the same technique (morphological score).59 Furthermore, subspace comparison as a purity-based method was also used to confirm the number of constituents present in each submatrix.60
On the other hand, local rank information offered by the evolving factor analysis (EFA) and fixed-size moving window-evolving factor analysis (FSMW-EFA) procedures was applied to obtain a more accurate estimate of the number of chemical constituents and to assess the peak purity of the bilinear data.61 For a more detailed discussion of factor analysis methods, refer to de Juan.62,63 Finally, the change in the lack of fit (LOF) of the MCR-ALS model when using fewer or more constituents was applied as a criterion to confirm the chemical rank.
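The idea behind FSMW-EFA can be sketched in a few lines: a window of fixed size slides along the elution axis, and the singular values of each sub-matrix are recorded. The function names, the default window size, and the way the noise level is supplied are assumptions made for illustration; in practice the noise level is read from the curves that lie together at the bottom of the (usually logarithmic) plot.

```python
import numpy as np

def fsmw_efa(X, window=5, n_values=4):
    """Singular values of each fixed-size window sliding along the rows of X.

    Returns an array of shape (m - window + 1, n_values); column j traces
    the (j + 1)-th largest singular value along the elution axis.
    """
    m = X.shape[0]
    rows = []
    for start in range(m - window + 1):
        s = np.linalg.svd(X[start:start + window], compute_uv=False)
        rows.append(s[:n_values])
    return np.array(rows)

def local_rank(singular_values, noise_level):
    """Number of singular-value curves above the noise level in each window."""
    return (singular_values > noise_level).sum(axis=1)
```

Windows in which `local_rank` returns 1 correspond to selective (pure) regions, while values of 2 or more flag co-elution.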
(3) Initial estimate and chemometric resolution: the preliminary information given by an exploratory analysis during the chemical and local determination process can be used to set up good initial estimates for either concentration or response profiles.31 It is important to note that the use of random and irrational initial guesses during ALS optimization can cause this algorithm to be stuck in local minima instead of global minima, and ultimately result in insufficient curve resolution, since a poor initial estimate does not obey the imposed constraints during the optimization and, as a result, does not provide a profile with clearly definable chemical meanings while saving computational time.43 Hence, methods that can be applied for process-like data based on their evolutionary nature, such as EFA62 and general methods, which can work irrespectively of the presence or absence of a sequential elution structure in the concentration direction, with the aim of direct selection of the uncorrelated variables of the analyzed data matrix, such as OPA and SIMPLISMA,58 were used for initial estimates in this work. Depending on the segment matrix conditions under analysis, one of two methods works better. In the present work, MCR-ALS decomposition was implemented with the introduction of additional knowledge of the natural properties of the constituent profiles through the application of the appropriate constraints of non-negativity in both elution and spectral directions to force the profiles to be equal to or greater than zero, unimodality in the elution direction to allow the presence of only one maximum per peak, and spectral normalization to have a length of one (which means the concentration profiles are not normalized), to improve the recovery of meaningful solutions. Finally, by continuing the iterative method, with successive estimation of C or ST alternately under the mentioned constraints with an increasing number of cycles, convergence was tested. 
When in two consecutive cycles, the relative differences between the sums of the squares of residuals (SSR) are below a preset selected value, convergence is then fulfilled and the optimization is finished.33
(4) Evaluation of the reliability of resolution results: the statistical terms obtained for each chromatographic segment during the implementation of the MCR-ALS technique, such as the lack of fit (LOF in % for ALS optimization, PCA) and the percent of explained variance (R2) at the optimum, were used to evaluate the resolution results and to choose the best MCR-ALS model. The best fit between the experimental and resolved matrices is achieved when the LOF is close to 0% and R2 is close to 100%.27
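The definitions of LOF and R2 are given in the ESI and are not reproduced here. The sketch below therefore assumes the forms commonly used in the MCR-ALS literature, in which both quantities are computed from the residual and total sums of squares; the function name is hypothetical. With these definitions, an LOF of 4.59% corresponds to an R2 of about 99.79%, which is close to the pair of values reported for peak cluster C.

```python
import numpy as np

def lof_and_r2(X, C, S):
    """Percent lack of fit and percent explained variance (assumed definitions).

    X : (m, n) data matrix; C : (m, k) elution profiles; S : (n, k) spectra
    """
    residuals = X - C @ S.T
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum(X ** 2)
    lof = 100.0 * np.sqrt(ss_res / ss_tot)
    r2 = 100.0 * (1.0 - ss_res / ss_tot)
    return lof, r2
```

Passing the raw data as `X` would give the LOF with respect to the experimental data, whereas passing the PCA-reproduced data with the same number of components would give the LOF with respect to PCA; this reading of the two reported LOF values is an assumption.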
(5) Qualitative and semi-quantitative analysis: after finding the resolved mass spectral profile for each constituent, the constituents have been identified by similarity match in reverse mode (RMF) using the standard spectra in the NIST MS database. The identification and assessment results were more precisely confirmed with the help of RIs.22,52,64 On the other hand, quantitative analysis of the constituents was performed using the overall volume integration (OVI) algorithm.65,66 This is based on summation of the integrated peak areas at every m/z point for each constituent, and allows the relative amount of the constituents in the whole TIC to be obtained. The OVI algorithm is preferred over the total peak area integration, since all mass spectral points are considered in the calculation. These steps were applied for all of the segment matrices in this work.
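A minimal sketch of the OVI idea is given below, assuming a uniform scan spacing so that summing a profile over time stands in for peak integration. The contribution of constituent k to the data is the outer product of its elution profile and its spectrum, so the volume under that surface is the sum of the profile multiplied by the sum of the spectrum. The function name is hypothetical, and the example normalizes within a single segment, whereas the text normalizes over the whole TIC.

```python
import numpy as np

def ovi_relative_percent(C, S):
    """Relative amount (%) of each resolved constituent by volume integration.

    C : (m, k) resolved elution profiles; S : (n, k) resolved mass spectra
    """
    # Sum over all times and all m/z channels of c_k * s_k^T
    volumes = C.sum(axis=0) * S.sum(axis=0)
    return 100.0 * volumes / volumes.sum()

# Toy example: volumes are 2 * 3 = 6 and 2 * 1 = 2, i.e. 75 % and 25 %
C = np.array([[1.0, 0.0], [1.0, 2.0]])
S = np.array([[1.0, 1.0], [2.0, 0.0]])
print(ovi_relative_percent(C, S))
```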
For this purpose, the whole TIC was segmented to 53 chromatographic regions by zero component regions along elution of the volatiles of Iranian C. aurantium peel extract. Some of these segment matrices were single-component. Peaks associated with these constituents could be simply identified and quantified using NIST MS library searching and peak integration in MSD Chemstation software. However, to give more reliable results, single-constituent peak clusters were pretreated using the MCR-ALS method. The results obtained were much better than those achieved with ChemStation with respect to the reverse match factor (RMF) and percentage of each constituent. Accordingly, two-dimensional data obtained from all of the segment matrices was extracted from MSD Chemstation software and was then changed to ASCII format, which is compatible with the MATLAB environment. Each data matrix gives a peak cluster. Therefore, there were 53 peak clusters in the TIC of Iranian C. aurantium L. peel extract, of which the individual peaks were analyzed by MCR-ALS according to the proposed strategy in Section 3. In order to demonstrate the efficiency of the applied chemometric resolution method, three problematic regions were selected and marked as peak clusters A (13.25–14.36 min), B (15.97–16.11 min) and C (21.55–21.73 min), as examples, with their TICs to obtain a better visualization of the enlarged detailed pattern of the GC-MS fingerprint. Here, the figures and results for peak clusters B and C are shown, and also the figures and results for peak cluster A are available in the ESI† section.
The exported data matrices of peak clusters B and C are displayed in Fig. 2a and b with sizes of (27 × 151) and (31 × 151), respectively. These specific peak clusters were selected with the aim of showing the performance and then evaluating the application of the MCR-ALS method in multi-component systems by changing the degree of overlapping and the presence of different amounts of other chromatographic challenges.
Considering region B in Fig. 1, which is also displayed in Fig. 2a, this appears to be a mixed system of two co-eluted constituents. Furthermore, a direct library search for peak cluster B showed only two constituents, namely 1,6-octadien-3-ol, 3,7-dimethyl- (β-linalool) (C10H18O) and nonanal (C9H18O). However, after MCR-ALS analysis, different results were achieved. First, as described in the proposed strategy in Section 3, baseline/background correction and noise reduction were performed on the peak cluster. Congruence analysis and the least squares fitting method introduced by Liang and Kvalheim were applied for baseline/background correction. In this method, the information needed to build a univariate linear regression of the baseline against retention time is obtained from local rank analysis of the zero-component regions, and the baseline is then corrected. In addition, homoscedastic noise was reduced by the morphological score method, and a five-point Savitzky–Golay filter with a polynomial order of 2 was applied for heteroscedastic noise reduction. All of these methods can be easily performed with the MCRC software, a chemometric tool for the analysis of two-dimensional chromatographic data.31 The morphological score and subspace comparison were then used to determine the number of chemical constituents in the pre-processed peak clusters. Fig. 3a shows the results of the morphological score method for peak cluster B. In this figure, the purest constituents were selected by the orthogonal projection approach (OPA), and their morphological scores were plotted against the number of constituents. The morphological score for the noise level was estimated by means of an F-test55,59 and is displayed as a dotted line. Based on the morphological score plot in Fig. 3a, there are three constituents in peak cluster B. 
This is found by counting the number of significant variables with morphological scores above the noise level. Moreover, subspace comparison methods are used to confirm morphological score method results. Subspace comparison analyzes the key factors similarly to the morphological score method, through the comparison of two subspaces, each of which is determined using a suitable procedure, such as OPA, SIMPLISMA, and PCA, by applying a set of orthonormal vectors in order to select factors.60 These look for the most pure or the most dominant variables. PCA extracts the most dominant linear combination of the actual variable subject to an orthogonality constraint; OPA considers dominant variables among the actual variables in the data matrix, and SIMPLISMA places more importance on purity than on dominance.67
The results of the PCA-SIMPLISMA subspace plots in Fig. 3b confirm the presence of three constituents. In this method, the number of key factors, or constituents, is taken as the largest value of k (the number of columns of the n × k subspace matrices) for which D(k) and sin²(θk) are equal and close to zero. In addition, a local rank map, obtained by the EFA and FSMW-EFA methods acting as a data microscope, was used to obtain a more accurate estimate of the number of chemical constituents, based on local information on the elution patterns of the constituents in each region. In contrast with the EFA method, which performs PCA on a window of increasing size, the FSMW-EFA method performs PCA on a moving window of fixed size. Because it performs the local analysis on small elution windows, the FSMW-EFA method gives more reliable results than the EFA method when detecting impurities or minor compounds under a main peak.65,66 In the FSMW-EFA plot, the noise level is indicated by the eigenvalue curves that have similar numerical values and lie together at the bottom. Eigenvalue curves exceeding the noise level indicate the emergence of new constituents. The FSMW-EFA plot with a fixed window size of 5 for peak cluster B showed three eigenvalue curves above the noise level within this region, which indicated that peak cluster B is definitely not a two-constituent cluster. In this plot, the constituents present in cluster B are labeled by their elution order. From the FSMW-EFA plot in Fig. 4, one can conclude that regions 1 and 3, having one curve above the noise level, are pure regions of the first and third constituents; regions 1 + 2, 1 + 2 + 3, and 2 + 3, containing two or more curves above the noise level, are the overlapping regions of the first and second constituents, of the first, second and third constituents, and of the second and third constituents, respectively. 
According to the results obtained from the FSMW-EFA analysis, there are no selective regions for the second constituent. Therefore, the MCR-ALS method is a feasible way of resolving such a peak cluster. Peak cluster B was then resolved by the MCR-ALS method, with the initial estimate of the concentration profiles calculated by the EFA method, under the non-negativity, unimodality, and spectral normalization constraints during ALS optimization. The resolved elution profiles of the three constituents in this chromatographic region are displayed in Fig. 5, and their corresponding resolved mass spectra, together with the standard spectrum of each constituent from the NIST/Wiley MS database, are shown in Fig. 6.
The lack of fit (LOF, in %, for PCA and exp) and the explained variance (R2, in %) at the optimum were 0.39, 1 and 99.98, respectively. The LOF and R2 values for the optimum MCR-ALS model were satisfactory given the noise level in this region. Each constituent was then identified by matching the resolved spectral profile with the mass spectra stored in the NIST mass database, and confirmed by comparison of the retention indices. The results showed that β-linalool (C10H18O), dodecane (C12H26), and nonanal (C9H18O) were identified in peak cluster B, with reverse match factor (RMF) values of 977, 869, and 930, respectively. Reliable LOF and R2 values, together with quite high spectral matches for the identified constituents despite their heavy overlap, support the identity of the resolved constituents in this chromatographic region. From the resolved chromatographic peaks in Fig. 5, it is clear that the peak of the second constituent in this cluster is embedded in the peak of the first constituent and is present at a very low level over the whole region. For these reasons, the second constituent was not found in a direct NIST/Wiley MS library search.
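The reverse match factors quoted above can be illustrated with a simplified score that follows the verbal definition given earlier: a normalized dot product with square root scaling, ignoring every peak of the resolved spectrum that is absent from the library spectrum. This is a sketch only. The scaling to a maximum of 999 is an assumption chosen to match the range of the reported values, and the score computed by the NIST search program may include further weighting that is not reproduced here. The function name is hypothetical.

```python
import numpy as np

def reverse_match_factor(resolved, library, scale=999):
    """Simplified reverse match between a resolved and a library spectrum.

    Both spectra are intensity vectors on the same m/z grid. Channels with
    zero intensity in the library spectrum are excluded from the comparison.
    """
    resolved = np.asarray(resolved, dtype=float)
    library = np.asarray(library, dtype=float)
    mask = library > 0
    # Square root scaling of the intensities kept in the comparison
    u = np.sqrt(np.clip(resolved[mask], 0.0, None))
    v = np.sqrt(library[mask])
    denom = np.dot(u, u) * np.dot(v, v)
    if denom == 0:
        return 0.0
    return scale * np.dot(u, v) ** 2 / denom
```

Because of the mask, an extra peak in the resolved spectrum (for example from residual background) does not lower the score, which is the property that makes the reverse match more tolerant of co-elution than the forward match.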
Likewise, Fig. 2b shows the TIC curve of peak cluster C, which seems to represent the co-eluted region of two constituents. Also, the mass spectra of different parts of peak cluster C indicate that there could be more than two constituents, or severe noise. Only two constituents, namely acetic acid, decyl ester (C12H24O2) and dodecanal (C12H24O), can be directly matched in the NIST MS library.
However, after MCR-ALS analysis, different results with more information were achieved for this peak cluster. Firstly, the background and noise were removed using the applied methods for peak cluster B. Then, in a preliminary inspection, morphological score and subspace comparison methods were used for determination of chemical rank. The results of these methods on preprocessed peak cluster C are illustrated in the ESI section (Fig. S6a and b†). It can be seen that region C is a four-constituent system.
Also, as with peak cluster B, EFA and FSMW-EFA were applied to verify the rank estimate and check the peak purity of peak cluster C. The rank map obtained using the FSMW-EFA method with a fixed window size of 6 on peak cluster C is presented in the ESI section (Fig. S7†). This figure shows four eigenvalue curves above the noise level within this region, which indicates that peak cluster C is definitely not a two-constituent region. According to this figure, regions 1 and 4 are selective regions of the first and fourth pure constituents, and regions 1 + 2, 1 + 2 + 3, 2 + 3 + 4, and 3 + 4 are the overlapping regions of constituents 1 and 2; 1, 2 and 3; 2, 3 and 4; and 3 and 4, respectively. The FSMW-EFA results show that peak cluster C is much more complex than peak cluster B, since there are no selective regions for some of the constituents (2 and 3). Using this prior knowledge of the constituents, MCR-ALS analysis was run with an initial SIMPLISMA estimate of the spectral profiles for peak cluster C to start the ALS optimization under the non-negativity, unimodality, and spectral normalization constraints.
The resolved MCR-ALS chromatographic profiles of peak cluster C and their corresponding resolved mass spectra, together with a standard spectrum of each constituent from the NIST/Wiley MS database, are presented in the ESI (Fig. S8c and S9†).
The LOF values (PCA and exp) and the R2 value of the optimized MCR-ALS model for peak cluster C were 2.35%, 4.59%, and 99.78%, respectively. The similarity between the resolved and standard mass spectra, together with the comparison of their retention indices, demonstrated the reliability of the resolution method and confirmed the presence of two newly detected constituents in region C, cyclohexane, 1-ethenyl-1-methyl-2,4-bis(1-methylethenyl)- (C15H24) and benzene, 1,2-dimethoxy-4-(2-propenyl)- (eugenol methyl ether, C11H14O2), in addition to the constituents previously detected by the MSD ChemStation software.
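The fit statistics can be made concrete with a short sketch. It assumes the definitions commonly used for MCR-ALS, namely LOF (%) = 100·sqrt(Σe²/Σd²) and R2 (%) = 100·(1 − Σe²/Σd²), where e are the residuals and d the elements of the reference matrix: the raw data for LOF (exp), or the PCA-reproduced data with the same number of components for LOF (PCA). The function names are illustrative.

```python
import numpy as np

def lack_of_fit(D_ref, D_model):
    """Percent lack of fit of D_model with respect to D_ref."""
    E = D_ref - D_model
    return 100.0 * np.sqrt((E ** 2).sum() / (D_ref ** 2).sum())

def explained_variance(D_ref, D_model):
    """Percent of variance in D_ref explained by D_model."""
    E = D_ref - D_model
    return 100.0 * (1.0 - (E ** 2).sum() / (D_ref ** 2).sum())

def pca_reproduction(D, n_components):
    """Rank-limited reconstruction of D, used as the reference for LOF (PCA)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]

# Small illustrative check: a model that underestimates every element by 10%
D = np.arange(12.0).reshape(3, 4) + 1.0
lof = lack_of_fit(D, 0.9 * D)
r2 = explained_variance(D, 0.9 * D)
```

Under these definitions R2 = 100 − LOF²/100, so an LOF (exp) of 4.59% corresponds to an R2 of about 99.79%, which is close to the reported 99.78%.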
The RMF values for the four constituents identified in region C after resolution by the proposed strategy were 936, 969, 862, and 850. According to the resolved chromatographic peaks in Fig. S8c,† the four constituent peaks of this region overlap significantly, and because of the background noise in the TIC of this peak cluster, some constituents could not be identified directly. After applying the MCR-ALS method, the number of identified constituents in peak cluster C increased from 2 (direct mass spectral search) to 4 (ESI, Fig. S8c†), with satisfactory statistical parameters given the noise level in this region, and with high RMF values. All chromatographic segment matrices obtained from the TIC of the studied sample were resolved in a similar way, and the elution and mass spectral profiles of each volatile constituent in the sample extract were extracted. Table 1 lists the qualitative analysis results for the volatile constituents of C. aurantium L. peel: the chemical name and formula, retention time (RT), retention index (RI), RMF, and relative concentration (%) of each constituent. Direct analysis of the two-dimensional GC-MS data revealed only 45 constituents in the Iranian C. aurantium L. peel extract; after resolving all peak clusters with the chemometric methods, the number of identified constituents increased to 82, accounting for 95.93% of the total relative content, with LOF (exp) values lower than 12% and R2 values higher than 97% for the MCR-ALS models of all peak clusters.
No. | RT (min) | RI | Chemical name | Molecular formula | RMF | Percentage (%) (sum: 95.93%) |
---|---|---|---|---|---|---|
1 | 14.644 | 1050.6 | Limonene | C10H16 | 948 | 72.895 |
2 | 13.616 | 1006.2 | β-Myrcene | C10H16 | 923 | 9.060 |
3 | 12.351 | 988.8 | α-Pinene | C10H16 | 911 | 4.740 |
4 | 13.443 | 1002.3 | β-Pinene | C10H16 | 944 | 3.440 |
5 | 13.310 | 1028.0 | Sabinene | C10H16 | 960 | 1.230 |
6 | 12.380 | 991.4 | α-Thujene | C10H16 | 953 | 1.058 |
7 | 16.036 | 1076.0 | β-Linalool | C10H18O | 977 | 1.009 |
8 | 13.853 | 959.4 | Octanal | C8H16O | 975 | 0.346 |
9 | 18.075 | 1139.8 | Decanal | C10H20O | 939 | 0.281 |
10 | 14.038 | 1023.1 | α-Phellandrene | C10H16 | 915 | 0.231 |
11 | 16.088 | 1053.3 | Nonanal | C9H18O | 930 | 0.110 |
12 | 17.976 | 1121.8 | α-Terpineol | C10H18O | 923 | 0.103 |
13 | 19.449 | 1422.4 | Dodecane, 2,6,11-trimethyl | C15H32 | 877 | 0.093 |
14 | 21.267 | 1479.8 | β-Caryophyllene | C15H24 | 953 | 0.092 |
15 | 16.342 | 1129.8 | Cyclohexane, 2-ethenyl-1,1-dimethyl-3-methylene- | C11H18 | 850 | 0.075 |
16 | 16.053 | 1171.1 | Dodecane | C12H26 | 869 | 0.072 |
17 | 18.769 | 1389.9 | β-Cubebene | C15H24 | 853 | 0.064 |
18 | 24.144 | 1570.9 | (±)-trans-Nerolidol | C15H26O | 929 | 0.053 |
19 | 18.300 | 1118.1 | p-Menth-1-en-4-ol | C10H18O | 886 | 0.049 |
20 | 18.967 | 1149.5 | cis-Carveol | C10H16O | 882 | 0.047 |
21 | 15.580 | 1067.6 | 2-Furanmethanol, 5-ethenyltetrahydro-α,α,5-trimethyl-, trans- | C10H18O2 | 929 | 0.046 |
22 | 15.193 | 1151.8 | Undecane, 2-methyl- | C12H26 | 851 | 0.044 |
23 | 21.465 | 1380.9 | Tetradecane | C14H30 | 970 | 0.042 |
24 | 17.607 | 1047.3 | Benzene, (isocyanomethyl)- | C8H7N | 856 | 0.038 |
25 | 12.421 | 996.1 | Camphene | C10H16 | 958 | 0.036 |
26 | 18.981 | 1236.4 | Acetic acid linalool ester | C12H20O2 | 889 | 0.034 |
27 | 16.619 | 1086.4 | 2,4,6-Octatriene, 2,6-dimethyl-, (E,Z)- | C10H16 | 947 | 0.034 |
28 | 15.378 | 1010.1 | 1-Octanol | C8H18O | 923 | 0.034 |
29 | 24.809 | 1552.5 | Tetradecanal | C14H28O | 933 | 0.034 |
30 | 15.920 | 1073.9 | Terpinolene | C10H16 | 907 | 0.033 |
31 | 15.337 | 1063.3 | γ-Terpinene | C10H16 | 950 | 0.030 |
32 | 8.880 | 706.2 | Hexanal | C6H12O | 902 | 0.029 |
33 | 20.367 | 1599.9 | Hexadecane | C16H34 | 876 | 0.028 |
34 | 22.829 | 1201.7 | Perilla alcohol | C10H16O | 818 | 0.026 |
35 | 18.993 | 1164.1 | (+)-Carvone | C10H14O | 817 | 0.026 |
36 | 21.650 | 1347.3 | Dodecanal | C12H24O | 969 | 0.024 |
37 | 18.853 | 1128.1 | Carveol, dihydro | C10H18O | 818 | 0.024 |
38 | 5.952 | 589.5 | Butanal, 3-methyl- | C5H10O | 908 | 0.023 |
39 | 10.382 | 790.6 | 3-Hexen-1-ol | C6H12O | 817 | 0.021 |
40 | 10.232 | 784.4 | 2-Hexenal, (E)- | C6H10O | 943 | 0.021 |
41 | 5.250 | 567.04 | 3-Buten-2-ol, 2-methyl | C5H10O | 895 | 0.020 |
42 | 25.438 | 1554.1 | Benzophenone | C13H10O | 939 | 0.012 |
43 | 6.328 | 628.9 | 1-Penten-3-ol | C5H10O | 970 | 0.019 |
44 | 21.182 | 1325.1 | Geraniol acetate | C12H20O2 | 955 | 0.018 |
45 | 11.329 | 881.6 | 1,3,5,7-Cyclooctatetraene | C8H8 | 825 | 0.018 |
46 | 19.880 | 1091.3 | (E)-2-Nonenal | C9H16O | 728 | 0.018 |
47 | 18.514 | 1131.9 | p-Menth-1-en-9-al | C10H16O | 763 | 0.018 |
48 | 21.592 | 1338.8 | Acetic acid, decyl ester | C12H24O2 | 936 | 0.017 |
49 | 13.131 | 959.7 | Benzaldehyde | C7H6O | 889 | 0.017 |
50 | 10.624 | 778.8 | 1-Hexanol | C6H14O | 866 | 0.015 |
51 | 20.408 | 1315.0 | Cyclohexene, 4-isopropenyl-1-methoxymethoxymethyl- | C12H20O2 | 766 | 0.014 |
52 | 12.802 | 1016.2 | Ocimene | C10H16 | 768 | 0.014 |
53 | 7.390 | 612.1 | 1-Butanol, 2-methyl- | C5H12O | 872 | 0.014 |
54 | 21.453 | 1559.0 | 7-Hexadecenal, (Z)- | C16H30O | 847 | 0.014 |
55 | 22.319 | 1513.0 | γ-Elemene | C15H24 | 907 | 0.014 |
56 | 15.880 | 1083.1 | cis-Linalool oxide | C10H18O2 | 920 | 0.013 |
57 | 9.96 | 808.4 | 2-Hexen-1-ol, (E)- | C6H12O | 768 | 0.013 |
58 | 19.928 | 1254.4 | Undecanal | C11H22O | 913 | 0.012 |
59 | 7.292 | 713.7 | Oxirane, 2-(1,1-dimethylethyl)-3-methyl- | C7H14O | 897 | 0.012 |
60 | 11.589 | 894.6 | Ethanol, 2-butoxy- | C6H14O2 | 933 | 0.012 |
61 | 16.573 | 1095.9 | trans-p-Mentha-2,8-dienol | C10H16O | 901 | 0.012 |
62 | 21.661 | 1492.2 | Cyclohexane, 1-ethenyl-1-methyl-2,4-bis(1-methylethenyl)- | C15H24 | 862 | 0.011 |
63 | 17.516 | 1103.4 | Terpineol, cis-β- | C10H18O | 811 | 0.011 |
64 | 16.700 | 1094.3 | trans-3-Caren-2-ol | C10H16O | 861 | 0.011 |
65 | 16.920 | 1103.6 | Limonene oxide, trans- | C10H16O | 916 | 0.011 |
66 | 24.785 | 1654.1 | Pentadecanal | C15H30O | 850 | 0.011 |
67 | 16.521 | 1041.9 | Octanoic acid, methyl ester | C9H18O2 | 869 | 0.011 |
68 | 17.387 | 1101.0 | β-Citronellal | C10H18O | 818 | 0.011 |
69 | 20.211 | 1466.4 | Pentadecane | C15H32 | 883 | 0.010 |
70 | 19.518 | 1154.4 | p-Mentha-1,8-dien-3-one, (+)- | C10H14O | 794 | 0.010 |
71 | 19.760 | 1164.4 | cis-Geraniol | C10H18O | 822 | 0.010 |
72 | 20.843 | 1352.7 | 2-Dodecenal | C12H22O | 852 | 0.010 |
73 | 19.588 | 1161.2 | Perilla aldehyde | C10H14O | 854 | 0.010 |
74 | 11.271 | 885.8 | Cardene | C8H8 | 898 | 0.010 |
75 | 16.440 | 1048.3 | Methyl caprylate | C9H18O2 | 819 | 0.010 |
76 | 21.615 | 1321.3 | Eugenol methyl ether | C11H14O2 | 850 | 0.010 |
77 | 22.822 | 1622.1 | α-Humulene | C15H24 | 907 | 0.010 |
78 | 19.270 | 1157.7 | 1-Decanol | C10H22O | 838 | 0.010 |
79 | 19.328 | 1137.0 | α-Citral | C10H16O | 818 | 0.010 |
80 | 15.296 | 1154.1 | Undecane, 4-methyl- | C12H26 | 847 | 0.010 |
81 | 15.578 | 1178.4 | Bicyclo[6.1.0]nonane, 9-(1-methylethylidene)- | C12H20 | 818 | 0.010 |
82 | 14.512 | 1047.8 | Isoterpinolene | C10H16 | 886 | 0.010 |
However, after MCR-ALS was used to overcome some chromatographic artifacts and resolve the TIC curve in Fig. 2b into the pure chromatographic and mass spectral profiles of four constituents, and overall volume integration was applied, much more accurate and reliable results were obtained for these overlapped peaks. As can be seen, the peak areas of the original constituents were reduced after preprocessing and MCR-ALS resolution, because eliminating the baseline and noise yields pure chromatographic profiles.
By using the overall volume integration (OVI) technique to achieve the total relative content of each constituent from its two-way response, as integration based on the TIC, quantitative results were calculated for each of the 82 identified constituents, as shown in Table 1. It should be mentioned that the results obtained with this technique are not real, absolute concentrations; only information about the relative composition of each constituent in the whole TIC can be acquired.22,68,69 Exact quantitative results for each constituent can therefore be obtained only when real standards for all of the constituents are available. Accordingly, percentages were calculated after internal normalization of all the resolved peak areas. Table 1 lists the relative concentrations of the constituents identified using the chemometric methods.
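The semi-quantitative step can be sketched as follows, under the assumption that the overall volume of a constituent is the sum of all elements of its resolved two-way response (the outer product of its elution profile and mass spectrum) and that relative percentages follow from internal normalization over all resolved constituents. The function names and the small matrices are illustrative and do not reproduce any values in Table 1.

```python
import numpy as np

def ovi_volumes(C, S):
    """Overall volume of each constituent's two-way response c_i s_i^T.

    C is (n_scans, n_constituents); S is (n_constituents, n_mz).
    The sum over an outer product equals the product of the two sums.
    """
    return C.sum(axis=0) * S.sum(axis=1)

def relative_percent(volumes):
    """Internal normalization: each volume as a percentage of the total."""
    volumes = np.asarray(volumes, dtype=float)
    return 100.0 * volumes / volumes.sum()

# Illustrative resolved profiles for two constituents
C = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [1.0, 2.0]])
S = np.array([[1.0, 1.0],
              [2.0, 0.0]])

volumes = ovi_volumes(C, S)          # one volume per constituent
percent = relative_percent(volumes)  # relative content in percent
```

Because the percentages are normalized to the total resolved signal, they express relative composition only and would change if additional constituents were resolved.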
In addition, for the sake of comparison, Table 2 presents the five most abundant constituents of C. aurantium L. peel from different countries.11,15,24,70 Table 2 shows that limonene is the main constituent of C. aurantium L. peel in all countries. Moreover, the compositions of C. aurantium L. peel from Iran and Greece are very similar, with four constituents in common among the five most abundant. Considering the results in Table 2, Iranian C. aurantium L. peel could be a rich source of limonene and β-myrcene. According to the results in Tables 1 and 2, monoterpene hydrocarbons have the highest content among the volatile constituents extracted from C. aurantium L. peel, followed, in order of content, by sesquiterpenes, oxygenated compounds, alcohols, and esters.
Reference | Constituent 1 | Constituent 2 | Constituent 3 | Constituent 4 | Constituent 5 |
---|---|---|---|---|---|
Present work | Limonene (72.895%) | β-Myrcene (9.06%) | α-Pinene (4.74%) | β-Pinene (3.44%) | Sabinene (1.23%) |
11 | Limonene (94.67%) | Myrcene (2.00%) | Linalool (0.76%) | β-Pinene (0.62%) | α-Pinene (0.53%) |
15 | Limonene (92.70%) | Myrcene (1.60%) | α-Pinene (1.50%) | Linalool (1.10%) | β-Bisabolene (0.40%) |
24 | Limonene (65.80%) | Myrcene (2.90%) | Linalyl acetate (1.80%) | β-Pinene (1.80%) | α-Terpinene (0.8%) |
70 | Limonene (90.90%) | β-Myrcene (1.90%) | β-Myrcene (1.51%) | α-Terpinene (1.22%) | Linalool (0.93%) |
Parastar et al.29 reported 37 commonly identified metabolites in eighteen citrus peel samples, with their relative concentrations, in order to perform clustering and classification analysis along with MCR-ALS analysis. The main constituents identified in the C. aurantium L. peel samples studied in this work are similar to those in other citrus peel samples. The goal of this work was the complete and detailed visualization of the volatile chemical constituents present in C. aurantium L. peel, to show the efficiency of the MCR technique in accurately identifying a large number of minor constituents with contents lower than 0.05%, which are difficult to identify and quantify by GC-MS analysis alone.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c6ra18871k
This journal is © The Royal Society of Chemistry 2016