Nikolai Kuhnert*a, Rakesh Jaiswal a, Pinkie Eravuchira b, Rasha M. El-Abassy b, Bernd von der Kammer b and Arnulf Materny b
aChemistry, Jacobs University Bremen, Campus Ring 1, 28759, Bremen, Germany. E-mail: n.kuhnert@jacobs-university.de; Fax: +49 421 200 3229; Tel: +49 421 200 3120
bChemical Physics, Jacobs University Bremen, Campus Ring 1, 28759, Bremen, Germany. E-mail: a.materny@jacobs-university
First published on 15th November 2010
In this contribution we have analysed aqueous methanolic extracts of a total of 38 green coffee bean samples, which vary in coffee variety and processing conditions, by LC-ESI-TOF-MS. The LC-MS data have been analysed by principal component analysis (PCA) with different PCA processing parameters, using an unsupervised non-targeted approach as well as a knowledge-based targeted approach. Furthermore, different normalisation and scaling algorithms have been applied to the PCA dataset. The scope and limitation of the various PCA parameters are discussed with respect to the ability to differentiate between samples of different groups, including different coffee varieties (Arabica or Robusta coffee) or different processing parameters, and with respect to the information content of the PCA analysis on a molecular level. We could show that, while distinction between different groups of samples can be successfully carried out independent of the PCA parameters employed, the identification of molecular markers rationalising the differentiation between sample groups varies significantly between PCA parameters and requires careful choice as well as critical evaluation.
To understand parameters such as sensory properties, beneficial health effects, shelf-life or any other desirable or undesirable property of a food, a detailed knowledge of its composition, and therefore its chemistry, is required; this is foremost a problem of analytical chemistry.
Many foods are also important commercial commodities, with companies striving to maximise their profit through innovative technologies. Since the original food, e.g. a plant, cannot be patented, profits must be achieved by developing food processing techniques that result in a clear benefit to the consumer. For patent applications this benefit must usually be linked to a molecular parameter that is unique to the new process if compared to unprocessed samples. The identification of such unique molecular markers in a large set of samples, if compared to another large set of samples, all of which may contain thousands of chemical entities, therefore forms a major challenge for analytical chemistry.
In the last decade statistical methods aimed at data reduction have become the method of choice to undertake such a Herculean task, with multivariate statistical methods, in particular principal component analysis (PCA), becoming increasingly popular.3
The main philosophy of PCA is to reduce a large dataset obtained from a large number of samples, using a selected spectroscopic method, in order to extract the most important variations between the samples with minimal loss of information. These variations are termed principal components of the samples, whereby each principal component is by definition orthogonal to all others. Ideally, variations between sample groups can be identified through the variation of a spectroscopic parameter linked to a set of unique marker molecules.
PCA is mainly employed as an unsupervised pattern recognition technique providing visualisation of a multivariate dataset, thereby revealing trends, observations and outliers. This visualisation is achieved by the transformation of variables into a covariance based coordinate system with the principal components as axes, thereby creating a two-dimensional representation termed a score plot, from which a grouping or pattern of sample groups can be extracted. Next to the score plot, a so-called loadings plot provides information about the origin of the variance; in the ideal case of LC-MS data, each point is a retention time (RT)–m/z pair revealing the molecular origin of the variances.
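The relation between score plot and loadings plot can be illustrated with a minimal sketch. It assumes a mean-centred data matrix (rows are samples, columns are buckets) decomposed by singular value decomposition; the function name `pca` and the toy data are hypothetical and are not taken from the software used in this work.

```python
import numpy as np

def pca(X, n_components=2):
    """Mean-centre X (samples x variables) and return scores, loadings
    and the fraction of variance explained by each retained component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # variable contributions
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical toy data: 6 samples x 4 buckets, the first three samples
# carry a much higher intensity in bucket 0 than the last three.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(6, 4))
X[:3, 0] += 5.0

scores, loadings, explained = pca(X)
```

In this toy case the two sample groups separate along PC1 in `scores`, and the row of `loadings` belonging to bucket 0 lies far from the origin, identifying it as the variable responsible for the grouping.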
The success story of PCA started with Nicholson's work using high resolution NMR data to identify disease related biomarkers from urine or plasma samples.4 Using PCA, NMR data of large patient groups could be successfully compared and unique biomarkers for certain diseases identified.4
PCA using a wide variety of analytical techniques, including NMR, IR, Raman spectroscopy, HPLC, GC, or GC-MS, has ever since been employed as an established statistical method in other areas of research including metabolomics, food analysis and medical research. Methods used vary in their practicability and information content. For example, IR or Raman analysis provides a reliable result that allows distinction between samples within minutes, using rapid measurements, no sample preparation and portable, inexpensive instrumentation. However, distinction between samples is frequently based on peaks corresponding not to an individual molecular marker but rather to a large group or family of molecules present in the sample. Techniques like MS or NMR, in contrast, require extensive sample preparation and costly, sophisticated equipment, but provide satisfactory information on the structures of individual markers present in the sample.
Due to the particular complexity of food, all of the techniques mentioned above have severe limitations with respect to type of materials amenable to investigation, resolution, sensitivity and information provided at the molecular level. Liquid chromatography coupled to mass spectrometry (LC-MS) appears to be the ideal method for PCA analysis of food material. Chromatographic separation, resulting in some degree of resolution, is coupled to high resolution MS providing a high level of sensitivity along with unsurpassed resolution. Due to the molecular formula information and fragmentation data available, a multitude of molecular information on individual distinct structures contained in food and responsible for variations between samples can be extracted from the analysis. Because of the large amount of data contained in any LC-MS experiment, in particular if using tandem MS or high resolution MS, PCA analysis of LC-MS datasets is rare; commercial software packages sufficiently powerful to carry out such PCA analysis have become available only recently, with few examples published.5,6
To our knowledge no example on PCA analysis of a food material has been published yet using LC-MS data.
The aim of this contribution is to carry out a variety of PCA analysis using LC-MS data on a selected food material in order to critically evaluate the results and describe the scope and limitation of this procedure at its current level.
As a food material we have chosen green coffee bean samples for the following three reasons. Firstly, we have in our research group acquired an intimate knowledge of the secondary metabolite profile and phytochemistry of this material, having over the last years identified around 100 different secondary metabolites in the green coffee bean, the large majority being chlorogenic acids.7–9 This moderate number of secondary metabolites additionally ensures that the large majority of signals in the LC-MS datasets can be reliably assigned to well characterised compounds. Secondly, coffee is an important commercial commodity, indeed after water and black tea the third most consumed beverage on this planet, with an annual production of 4.5 Mt and a market value in excess of 5 billion US$ for the raw material alone. Thirdly, green coffee beans are produced in two varieties, Coffea arabica and Coffea canephora (otherwise known as Robusta coffee), whose distinction and adulteration form an important problem for the coffee industry. It should, however, be noted that distinction of intact green coffee beans by visual inspection is rather straightforward due to significant morphological differences between Robusta and Arabica coffee beans. Only in the case of processed coffee, either solubilised, roasted or ground, is a distinction based on chemical composition required, to which the methods presented here can be applied.
Supposedly high quality coffee blends typically consist of 100% Arabica coffee beans. Lower quality, cheaper blends may have some proportion of Robusta beans, or they may consist entirely of Robusta. Arabica beans allegedly produce a superior taste in the cup, being more flavourful and complex than their Robusta counterparts. Robusta beans in contrast tend to produce a less watery and more bitter brew, with a musty flavour and more body. Obviously, this difference in sensory properties could be related to the individual phytochemical profile of the two coffee varieties and could be characterised by PCA.
Metabolomics and phytochemical profiling using PCA based methods have been frequently applied to the problem of distinguishing green Arabica from Robusta coffee beans. Briandet and Downey et al. have used IR and NIR spectroscopy to study the differences between the two varieties.10,11 NIR has been further used by Esteban-Diez and Lyman to distinguish Arabica from Robusta green coffee beans.12,13 Wang et al. could show that as well Kona coffee could be distinguished from other varieties using FTIR spectroscopy.14 In all of this work, distinction between varieties was possible due to PCA analysis, however, due to the nature of the spectroscopic technique used, only spectroscopic bands corresponding to groups of compounds rather than individual phytochemical constituents could be identified. Rubayiza and Meurens15 could show using Raman spectroscopy that levels of the terpene kahweol and lipid content allow distinction between Arabica and Robusta green coffee beans. Materny and co-workers have demonstrated that Raman microscopy can be employed directly on a single green coffee bean to allow distinction between these two varieties, based on signals corresponding to lipids and chlorogenic acids.16 Valdenebro et al. could show that the geographic origin of green coffee beans can be identified using sterol profiles analysed by GC-MS.17 Korhonova et al. found using GC-MS based PCA that differences in volatile fractions exist between Arabica and Robusta beans.18 Mendonca and Alonso Salces were able to distinguish Arabica and Robusta green coffee beans based on PCA data using HPLC analysis of chlorogenic acid profiles.19
In order to address all of these questions we have analysed a series of aqueous methanolic extracts of 38 different green coffee bean samples by high resolution LC-ESI-TOF-MS in the negative ion mode. For the extraction process we used an extraction method optimised with respect to previous work,1 using a mild Soxhlet method followed by protein removal with Carrez reagent and subsequent freeze drying to yield bright yellow to orange powders. A total of 38 commercial green coffee bean samples, 10 Robusta samples and 28 Arabica samples of different geographic origins, were extracted. LC-MS conditions used were as described earlier.20 In addition to the high resolution mass measurements we carried out LC-ESI-tandem-MS measurements using an ion trap mass spectrometer, in order to assign individual compounds not only on the basis of retention time and high resolution m/z value, but also on the basis of fragmentation data for correct structure assignment. Similar to previous work, around 50–100 well resolved chromatographic peaks could be identified in each chromatogram and the peaks assigned to individual distinct compounds, the majority being chlorogenic acids. A list of selected compounds identified is given in Table 1 and structures are given in the ESI†. A typical chromatogram of a Robusta sample is shown in Fig. 1.
No. | Name | Mol. formula | Theor. m/z (M−H) | Exp. m/z (M−H) | Error (ppm) |
---|---|---|---|---|---|
1 | 3-O-Caffeoylquinic acid | C16H18O9 | 353.0878 | 353.0881 | −0.7 |
2 | 4-O-Caffeoylquinic acid | C16H18O9 | 353.0878 | 353.0884 | −1.6 |
3 | 5-O-Caffeoylquinic acid | C16H18O9 | 353.0878 | 353.0892 | −3.9 |
4 | 3-O-Feruloylquinic acid | C17H20O9 | 367.1035 | 367.1047 | −3.4 |
5 | 4-O-Feruloylquinic acid | C17H20O9 | 367.1035 | 367.1038 | −0.8 |
6 | 5-O-Feruloylquinic acid | C17H20O9 | 367.1035 | 367.1045 | −2.9 |
7 | 3-O-p-Coumaroylquinic acid | C16H18O8 | 337.0929 | 337.0931 | −0.5 |
8 | 4-O-p-Coumaroylquinic acid | C16H18O8 | 337.0929 | 337.0921 | 2.4 |
9 | 5-O-p-Coumaroylquinic acid | C16H18O8 | 337.0929 | 337.0921 | 2.4 |
10 | 3-O-Dimethoxycinnamoylquinic acid | C18H22O9 | 381.1191 | 381.1202 | −2.8 |
11 | 4-O-Dimethoxycinnamoylquinic acid | C18H22O9 | 381.1191 | 381.1191 | −2.5 |
12 | 5-O-Dimethoxycinnamoylquinic acid | C18H22O9 | 381.1191 | 381.1202 | −2.8 |
13 | 3-O-Sinapoylquinic acid | C18H22O10 | 397.1140 | 397.1125 | 3.8 |
14 | 4-O-Sinapoylquinic acid | C18H22O10 | 397.1140 | 397.1150 | −2.5 |
15 | 5-O-Sinapoylquinic acid | C18H22O10 | 397.1140 | 397.1140 | −4.9 |
16 | 3,4-Di-O-caffeoylquinic acid | C25H24O12 | 515.1195 | 515.1190 | 1.0 |
17 | 3,5-Di-O-caffeoylquinic acid | C25H24O12 | 515.1195 | 515.1172 | 4.5 |
18 | 4,5-Di-O-caffeoylquinic acid | C25H24O12 | 515.1195 | 515.1170 | 4.9 |
19 | 3,4-Di-O-feruloylquinic acid | C27H28O12 | 543.1508 | 543.1512 | −0.8 |
20 | 3,5-Di-O-feruloylquinic acid | C27H28O12 | 543.1508 | 543.1514 | −1.1 |
21 | 4,5-Di-O-feruloylquinic acid | C27H28O12 | 543.1508 | 543.1539 | −3.4 |
25 | 3-O-Feruloyl-4-O-caffeoylquinic acid | C26H26O12 | 529.1351 | 529.1343 | 1.7 |
26 | 3-O-Caffeoyl-4-O-feruloylquinic acid | C26H26O12 | 529.1351 | 529.1351 | −0.1 |
27 | 3-O-Feruloyl-5-O-caffeoylquinic acid | C26H26O12 | 529.1351 | 529.1373 | −4.0 |
28 | 3-O-Caffeoyl-5-O-feruloylquinic acid | C26H26O12 | 529.1351 | 529.1367 | −3.0 |
29 | 4-O-Feruloyl-5-O-caffeoylquinic acid | C26H26O12 | 529.1351 | 529.1351 | 0.1 |
30 | 4-O-Caffeoyl-5-O-feruloylquinic acid | C26H26O12 | 529.1351 | 529.1349 | 0.5 |
31 | 3-O-Dimethoxycinnamoyl-4-O-caffeoylquinic acid | C27H28O12 | 543.1508 | 543.1488 | 3.6 |
32 | 3-O-Dimethoxycinnamoyl-5-O-caffeoylquinic acid | C27H28O12 | 543.1508 | 543.1491 | 3.1 |
33 | 4-O-Dimethoxycinnamoyl-5-O-caffeoylquinic acid | C27H28O12 | 543.1508 | 543.1526 | −3.4 |
34 | 3-O-Dimethoxycinnamoyl-4-O-feruloylquinic acid | C27H28O12 | 543.1508 | 543.1508 | −4.1 |
35 | 3-O-Dimethoxycinnamoyl-5-O-feruloylquinic acid | C27H28O12 | 543.1508 | 543.1515 | −1.4 |
36 | 4-O-Dimethoxycinnamoyl-5-O-feruloylquinic acid | C27H28O12 | 543.1508 | 543.1525 | −3.1 |
37 | 3-O-p-Coumaroyl-4-O-caffeoylquinic acid | C25H24O11 | 499.1246 | 499.1227 | 3.7 |
38 | 3-O-Caffeoyl-4-O-p-coumaroylquinic acid | C25H24O11 | 499.1246 | 499.1247 | −0.2 |
39 | 3-O-p-Coumaroyl-5-O-caffeoylquinic acid | C25H24O11 | 499.1246 | 499.1248 | −0.5 |
40 | 3-O-Caffeoyl-5-O-p-coumaroylquinic acid | C25H24O11 | 499.1246 | 499.1247 | −0.2 |
41 | 4-O-Caffeoyl-5-O-p-coumaroylquinic acid | C25H24O11 | 499.1246 | 499.1246 | −4.9 |
42 | 4-O-p-Coumaroyl-5-O-caffeoylquinic acid | C25H24O11 | 499.1246 | 499.1249 | −0.6 |
43 | 3-O-p-Coumaroyl-4-O-feruloylquinic acid | C26H26O11 | 513.1402 | 513.1389 | 2.6 |
44 | 3-O-p-Coumaroyl-5-O-feruloylquinic acid | C26H26O11 | 513.1402 | 513.1141 | −2.9 |
45 | 4-O-p-Coumaroyl-5-O-feruloylquinic acid | C26H26O11 | 513.1402 | 513.1406 | −0.7 |
49 | 3-O-Sinapoyl-5-O-caffeoylquinic acid | C27H28O13 | 559.1457 | 559.1481 | −4.2 |
50 | 3-O-Sinapoyl-4-O-caffeoylquinic acid | C27H28O13 | 559.1457 | 559.1472 | −2.6 |
51 | 3-O-(3,5-Dihydroxy-4-methoxy)cinnamoyl-4-O-feruloylquinic acid | C27H28O13 | 559.1457 | 559.1458 | −0.2 |
52 | 4-O-Sinapoyl-3-O-caffeoylquinic acid | C27H28O13 | 559.1457 | 559.1457 | 0.9 |
53 | 3-O-Sinapoyl-5-O-feruloylquinic acid | C28H30O13 | 573.1614 | 573.1641 | −4.7 |
54 | 4-O-Sinapoyl-5-O-feruloylquinic acid | C28H30O13 | 573.1614 | 573.1599 | −2.5 |
55 | 4-O-Sinapoyl-3-O-feruloylquinic acid | C28H30O13 | 573.1614 | 573.1634 | −3.5 |
56 | 4-O-Trimethoxycinnamoyl-5-O-caffeoylquinic acid | C28H30O13 | 573.1614 | 573.1611 | 0.4 |
57 | 3-O-Trimethoxycinnamoyl-5-O-caffeoylquinic acid | C28H30O13 | 573.1614 | 573.1623 | −1.7 |
58 | 3-O-Trimethoxycinnamoyl-5-O-feruloylquinic acid | C29H32O13 | 587.1770 | 587.1748 | 3.8 |
59 | 3-O-Trimethoxycinnamoyl-4-O-feruloylquinic acid | C29H32O13 | 587.1770 | 587.1766 | 0.7 |
60 | 4-O-Trimethoxycinnamoyl-5-O-feruloylquinic acid | C29H32O13 | 587.1770 | 587.1764 | 1.0 |
61 | 3-O-Dimethoxycinnamoyl-4-O-feruloyl-5-O-caffeoylquinic acid | C37H36O15 | 719.1981 | 719.2001 | −2.7 |
62 | 3,4,5-Tri-O-caffeoylquinic acid | C34H29O15 | 677.1512 | 677.1522 | −3.5 |
63 | 3,5-Di-O-caffeoyl-4-O-feruloylquinic acid | C35H31O15 | 691.1668 | 691.1647 | 3.1 |
64 | 3-O-Feruloyl-4,5-di-O-caffeoylquinic acid | C35H31O15 | 691.1668 | 691.1711 | −6.2* |
65 | 3,4-Di-O-caffeoyl-5-O-feruloylquinic acid | C35H31O15 | 691.1668 | 691.1647 | 3.1 |
66 | 3-O-Caffeoyl-4,5-di-O-feruloylquinic acid | C36H33O15 | 705.1825 | 705.1851 | −3.8 |
67 | 3,4-Di-O-feruloyl-5-O-caffeoylquinic acid | C36H33O15 | 705.1825 | 705.1833 | −1.1 |
68 | 3,4-Di-O-caffeoyl-5-O-sinapoylquinic acid | C36H33O16 | 721.1774 | 721.1795 | −2.9 |
69 | 3-O-Sinapoyl-4,5-di-O-caffeoylquinic acid | C36H33O16 | 721.1774 | 721.1766 | 1.1 |
Fig. 1 Representative chromatogram of green coffee extract of sample no. 33 (Tanzania Robusta): (a) TIC in negative ion mode and (b) UV-VIS chromatogram monitored at 320 nm.
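The theoretical m/z values and mass errors of the kind listed in Table 1 can be reproduced with a short calculation. The sketch below assumes standard monoisotopic masses for C, H and O and computes the error as theoretical minus experimental value relative to the theoretical value, a sign convention inferred from the table rather than stated in the text; the function names are hypothetical.

```python
# Monoisotopic masses in u (standard values)
MASS_C = 12.0
MASS_H = 1.00782503
MASS_O = 15.99491462
MASS_PROTON = 1.00727646   # hydrogen atom minus one electron

def deprotonated_mz(n_c, n_h, n_o):
    """m/z of the [M-H]- ion for a neutral formula CxHyOz."""
    neutral = n_c * MASS_C + n_h * MASS_H + n_o * MASS_O
    return neutral - MASS_PROTON

def ppm_error(theoretical, experimental):
    """Relative mass error in ppm (assumed sign: theoretical - experimental)."""
    return (theoretical - experimental) / theoretical * 1e6

# Caffeoylquinic acid, C16H18O9
mz_cqa = round(deprotonated_mz(16, 18, 9), 4)   # 353.0878
err = ppm_error(353.0878, 353.0881)             # about -0.85 ppm
```

The value obtained from the rounded four-decimal masses differs slightly from the tabulated error, which was presumably calculated from unrounded instrument values.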
A final PCA parameter worth discussing comprises normalisation and scaling routines. van den Berg has discussed this important parameter in detail, and in the course of the discussion we will return to this point on several occasions.21 In general, normalisation can be carried out in various ways, with a “sum of all bucket intensity normalisation” being commonly employed. Here, the MS intensities of all buckets are summed for each sample and these sums are used to normalise the bucket values. Details of the normalisation and scaling employed are given in the Method section.
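A minimal sketch of such a sum normalisation is given below. It assumes a bucket table with one row per sample and one column per bucket, and that each bucket value is divided by the summed intensity of its own sample; this reading of the procedure and the function name `sum_normalise` are assumptions, not a description of the commercial software.

```python
import numpy as np

def sum_normalise(bucket_table):
    """Divide every bucket intensity by the total intensity of its sample."""
    X = np.asarray(bucket_table, dtype=float)
    totals = X.sum(axis=1, keepdims=True)     # one total per sample (row)
    if np.any(totals == 0):
        raise ValueError("a sample with zero total intensity cannot be normalised")
    return X / totals

# Hypothetical bucket table: 2 samples x 3 buckets
table = [[10.0, 30.0, 60.0],
         [1.0, 1.0, 2.0]]
normalised = sum_normalise(table)
```

After this step every sample sums to one, so that differences in overall signal intensity between injections no longer dominate the variance.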
Typically, a non-targeted (unsupervised) PCA analysis is carried out, in which the full dataset is processed. Once the principal components are calculated, an inspection of the various PCA score plots allows identification of groups of samples. By inspection of the characteristics of each individual data point in the groups in the score plot a conclusion can be drawn with respect to the nature of these groups. Data points can thus be labelled according to the groups identified.
Subsequently, a non-targeted PCA analysis of all samples was carried out using rectangular bucketing and sum over bucket normalisation. An inspection of the PC1 versus PC2 score plot shows that two groups of samples can be readily distinguished (see Fig. 2). The two groups correspond to the Arabica and Robusta samples.
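The idea of rectangular bucketing can be sketched as follows, assuming that it amounts to summing all intensities that fall into the same rectangle of a fixed retention time and m/z grid. The bucket widths (60 s and 1 m/z unit) and the peak list are hypothetical illustration values and are not the settings used in this work.

```python
def rectangular_buckets(peaks, rt_width=60.0, mz_width=1.0):
    """Sum intensities of (rt, mz, intensity) peaks on a fixed RT x m/z grid.

    Returns a dict mapping (rt_index, mz_index) to the summed intensity.
    """
    table = {}
    for rt, mz, intensity in peaks:
        key = (int(rt // rt_width), int(mz // mz_width))
        table[key] = table.get(key, 0.0) + float(intensity)
    return table

# Hypothetical peak list: (RT in s, m/z, intensity)
peaks = [
    (1095.0, 353.0881, 100.0),
    (1100.0, 353.0890, 50.0),    # falls into the same rectangle as the first peak
    (1665.0, 367.1045, 30.0),
]
buckets = rectangular_buckets(peaks)
```

Each key of `buckets` then becomes one variable (column) of the data matrix submitted to PCA.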
Fig. 2 Score (left) and loading (right) plot of PCA analysis using regular bucketing, Robusta samples as triangles and Arabica samples as circles. Numbers in loading plot are assigned in Table 3.
In a second non-targeted PCA analysis the same dataset was analysed using advanced bucketing under identical normalisation methods. The score plot again allows differentiation between Arabica and Robusta beans (see Fig. 3). The second approach has two advantages. Firstly, differentiation in the score plot is slightly better and secondly, the PCA dataset produced is considerably smaller, requiring less computing power. In addition, some redundant information, here in particular dimeric adducts of ions, could be removed from the dataset. As a further difference between the two analyses it becomes apparent that for regular bucketing distinction between Arabica and Robusta samples is achieved in PC1, whereas for the molecular feature routine distinction is achieved in PC2. An influence plot of the molecular feature PCA analysis with no data points present in the upper right quadrant of the plot is shown in the ESI†, indicating a close distance of all data points to the model, which is required in a high quality analysis.
Fig. 3 Score and loading plot of PCA analysis using molecular feature advanced bucketing, Robusta samples as triangles and Arabica samples as circles. Numbers in loading plot are assigned in Table 4.
On closer inspection it becomes obvious that three groups of samples should actually be recognised in the score plot. The Arabica samples cluster in two distinct groups, termed A1 and A2, in both analyses. The A2 group (see Table 2) contains a larger proportion of Arabica coffees grown in Central America if compared to the A1 group. Otherwise, no obvious criteria explaining the nature of the differences between samples can be given at present.
Sample no. | Origin/type | Arabica/Robusta | Group |
---|---|---|---|
1 | Tanzania | Arabica | A1 |
2 | Guatemala SHG | Arabica | A1 |
3 | Peru Bio | Arabica | A1 |
4 | Nicaragua Maragogype | Arabica | A1 |
5 | Kenya AA | Arabica | A1 |
6 | Athiopien Wild Forest Bio | Arabica | A1 |
7 | Athiopien Yivgachette | Arabica | A1 |
8 | Athiopien Mokka Sidamo 2 | Arabica | A1 |
9 | Reizaow | Arabica | A1 |
10 | Coffeein free | Arabica | A1 |
11 | Costarica 2 | Arabica | A1 |
12 | Brasilien 1 | Arabica | A1 |
13 | Brasilien 2 | Arabica | A1 |
14 | Maragogype | Arabica | A2 |
15 | Malawi Pamwamba | Arabica | A2 |
16 | Panama Boquete | Arabica | A2 |
17 | Kenia 1 | Arabica | A2 |
18 | Honduras Bio | Arabica | A2 |
19 | Kameruls | Arabica | A2 |
20 | Nicaragua Mataglpa | Arabica | A2 |
21 | Costarica 1 | Arabica | A2 |
22 | Columbia Exulso | Arabica | A2 |
23 | Papua Neuguinea | Arabica | A3 |
24 | Athiopien Mokka Sidamo 1 | Arabica | A3 |
25 | Costarica 3 | Arabica | A3 |
26 | Ethiopien | Arabica | A3 |
27 | Indian Perl Mountain | Arabica | A3 |
28 | Brazilien Santos | Arabica | A3 |
29 | Indian 1 | Robusta | R |
30 | India Cherry AB | Robusta | R |
31 | Uganda | Robusta | R |
32 | India Parchment | Robusta | R |
33 | Tanzania | Robusta | R |
34 | Indonesia 1 | Robusta | R |
35 | Togo 1 | Robusta | R |
36 | Cameron | Robusta | R |
37 | Indonesia 2 | Robusta | R |
38 | India Cherry A | Robusta | R |
A differentiation between samples from different geographic origins or growth conditions was not possible according to any score plots in any set of principal components. Examples of such plots are given in the ESI†.
PCA of LC-MS data therefore allows distinction between different coffee varieties. The next question that requires addressing is, what information was provided by the loading plot. Within the loading plot each data point corresponds to a RT–m/z pair, which is responsible for the observed variations and whose distance from the centre of the plot defines its influence on the sample grouping. From the RTs, m/z values and their corresponding tandem MS data in a separate chromatographic run, structures can be unambiguously assigned to individual data points in the loading plot.
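How the distance of a data point from the centre of the loading plot can be used to rank RT–m/z pairs is sketched below. The loading values are hypothetical illustration numbers and not results of this work, and the function name `rank_by_influence` is likewise an assumption.

```python
import numpy as np

def rank_by_influence(loadings, labels, top=3):
    """Rank variables by their Euclidean distance from the origin
    of the loading plot (larger distance = larger influence)."""
    loadings = np.asarray(loadings, dtype=float)
    distance = np.linalg.norm(loadings, axis=1)
    order = np.argsort(distance)[::-1][:top]
    return [(labels[i], float(distance[i])) for i in order]

# Hypothetical loadings on PC1 and PC2 for three RT (s) / m/z pairs
labels = [(1095, 353), (1665, 367), (257, 683)]
loadings = [[0.80, 0.10],
            [0.30, -0.40],
            [0.05, 0.02]]
ranking = rank_by_influence(loadings, labels)
```

The pairs at the top of `ranking` are the candidates to be assigned by retention time, accurate mass and tandem MS data.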
A careful inspection of both loading plots (Fig. 2 and 3) reveals that variances between the Robusta and Arabica coffee samples are exclusively a result of differences in concentrations of regioisomeric monocaffeoyl and monoferuloylquinic acids (chlorogenic acids). It should be noted that any set of LC-MS data only provides information on relative amounts of compounds present and not on absolute concentrations, for which calibration with authentic reference materials is required. Compound assignment is given in Table 3.22,23 From inspection of the bucket statistics in individual chromatograms it follows that concentrations for all of these compounds are higher in Robusta beans if compared to Arabica beans. This observation was already reported earlier by Materny and co-workers.16
Interestingly, the loading plot shows some data points with ions at m/z 353, whose tandem MS spectra identify them as previously not assigned caffeoylquinic acids, originating presumably from minor diastereoisomers of quinic acid. Additionally, the data provided here allow a rationalisation of PCA results obtained earlier by comparing different coffee varieties using low resolution spectroscopic techniques such as Raman, NIR or IR spectroscopy, clearly revealing the nature and relative concentrations of individual molecules present in the coffee samples.
Two samples of steam treated Arabica coffee were compared by PCA analysis of LC-MS data with 20 non-processed Arabica samples. Fig. 4 shows the score and loading plot clearly indicating that processed samples can be readily distinguished from non-processed samples in the PC2 dimension. The loading plot again indicates a substantial change in the chlorogenic acid profile, in particular variances in monocaffeoylquinic acids (see Table 4).27,28
Fig. 4 Score and loading plot of PCA analysis using molecular feature advanced bucketing of processed steamed Arabica samples as circles and unprocessed Arabica samples as triangles.
Interestingly, the loading plot shows two data points corresponding to ions at m/z 335 (Table 5), which have been assigned on the basis of their retention time and fragmentation pattern as caffeoylquinic acid lactones, reported earlier by Farah et al. in roasted coffee.29
No. | RT/s | m/z [M−H] | Compound |
---|---|---|---|
1 | 1095 | 353 | 3-CQA 1 |
2 | 1665 | 367 | 5-FQA 6 |
3 | 1095 | 707 | 3CQA dimer 1 |
4 | 1425 | 353 | 4-CQA 2 |
5 | 2415 | 515 | 4,5-DiCQA 5 |
6 | 1620 | 335 | 3-CAL 70 |
7 | 1695 | 353 | Unknown CQA |
8 | 257 | 683 | Unassigned |
9 | 255 | 341 | Unassigned |
10 | 1980 | 335 | 4-CAL 71 |
11 | 1605 | 367 | 3-FQA 4 |
12 | 1065 | 367 | Unknown FQA |
The results of this PCA analysis are shown in Fig. 5. Again, from the PC1 versus PC2 score plot it can be seen that Robusta and Arabica samples can be readily distinguished. The Arabica samples this time fall into three groups A1, A2 and A3, with the A2 group containing the same samples as in the previous analysis. The loading plot reveals differences in concentration of five dicaffeoylquinic acids and three caffeoyl feruloylquinic acids along with some redundant data points corresponding to dimeric ions of monocaffeoylquinic acid (Table 6). Here too, the Robusta samples all contain increased concentrations of diacyl quinic acids if compared to the Arabica samples.
Fig. 5 Score and loading plot of PCA analysis using molecular feature advanced bucketing of Arabica samples as circles and Robusta samples as triangles using a reduced m/z range window for diacyl chlorogenic acids only.
Fig. 6 Score and loading plot of PCA analysis using molecular feature advanced bucketing of Arabica samples in blue and Robusta samples in red using a reduced m/z range window for minor metabolites (m/z 370–500 and 600–700) only.
No. | RT/s | m/z [M−H] | Compound |
---|---|---|---|
1 | 1655 | 377 | |
2 | 135 | 387 | |
3 | 165 | 405 | |
4 | 2025 | 481 | |
5 | 735 | 375 | Caffeoyl conjugate |
6 | 135 | 405 | |
7 | 135 | 379 | |
8 | 795 | 375 | Caffeoyl conjugate |
9 | 2985 | 379 | |
10 | 1425 | 375 | Caffeoyl conjugate |
11 | 1725 | 375 | Caffeoyl conjugate |
12 | 375 | 405 | |
13 | 255 | 387 | |
14 | 255 | 455 | |
15 | 255 | 377 | |
An influence plot (Fig. 7) of the molecular feature PCA analysis shows no data points in the upper right quadrant of the plot, indicating a close distance of all data points to the model, which is required in a high quality analysis.
Fig. 7 Influence plot of PCA analysis using molecular feature advanced bucketing of Arabica samples as circles and Robusta samples as triangles using a reduced m/z range window for minor metabolites (m/z 370–500 and 600–700) only.
The results here, using a larger sample group and a more advanced statistical tool, confirm our initial hypothesis that sinapoylquinic acids, trimethoxycinnamoyl quinic acids and triacyl quinic acids are indeed unique phytochemical markers for Robusta coffee, absent in all Arabica samples investigated. The use of these markers might be helpful in differentiating between roasted Arabica samples and coffee samples adulterated with Robusta.
In both cases, Arabica and Robusta coffee varieties could be distinguished based solely on secondary metabolites observed in this reduced m/z window. Indeed, as postulated, unique markers for Robusta coffee could be identified using this approach comprising sinapoylquinic acids.
This result clearly demonstrates that with a priori knowledge about sample composition unique markers allowing distinction between samples can be identified.
For this reason, we have investigated the outcome of a PCA analysis of LC-MS data using various scaling and normalisation algorithms. It should be noted that alternative mathematical data treatment routines, such as data transformation, could achieve similar results.18,28 For example, the “log ratio” transformation has previously been employed to transform heteroscedastic datasets.18
As scaling and normalisation procedures we have chosen Pareto scaling, variance scaling, and unit variance scaling. Pareto scaling reduces the relative importance of large values, while keeping the data structure largely intact.28 In variance and unit variance scaling (often referred to as autoscaling) the standard deviation is used as the scaling factor, resulting in an outcome in which all metabolites are equally important, at the cost, however, of inflating the measurement errors.28 For the latter, poor differentiation of Arabica versus Robusta coffee was observed; however, as expected, minor components could indeed be located in the loading plot as important parameters responsible for variances.
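The difference between the two scaling approaches can be made explicit in a short sketch. It assumes the commonly used definitions, in which each mean-centred variable is divided by its standard deviation (autoscaling) or by the square root of its standard deviation (Pareto scaling); the function names and the toy bucket table are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Unit variance scaling: centre each column, divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    return centred / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Pareto scaling: centre each column, divide by the square root
    of its standard deviation."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    return centred / np.sqrt(X.std(axis=0, ddof=1))

# Hypothetical bucket table: one abundant and one minor compound
X = np.array([[1000.0, 1.0],
              [1200.0, 2.0],
              [800.0, 3.0]])
auto = autoscale(X)       # both columns end up with standard deviation 1
pareto = pareto_scale(X)  # the abundant compound keeps a larger spread
```

After autoscaling the minor compound contributes as much variance as the abundant one, whereas Pareto scaling only compresses the difference, which is consistent with the behaviour described above.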
Using Pareto scaling, Arabica and Robusta coffees could be readily distinguished in the PC1 versus PC2 score plot (Fig. 8). The loading plot revealed, next to major components such as monocaffeoylquinic acids, a series of minor components including triacyl quinic acids, unique secondary metabolites of Robusta coffee (Table 8).
Fig. 8 Score and loading plot of PCA analysis using molecular feature advanced bucketing, Robusta samples as triangles and Arabica samples as circles using Pareto scaling.
No. | RT/s | m/z [M−H] | Comp. no | Abbreviation |
---|---|---|---|---|
1 | 1410 | 353 | 2 | 4-CQA |
2 | 1605 | 367 | 4 | 3-FQA |
3 | 1790 | 397 | 15 | 5-SQA |
4 | 1065 | 353 | 1 | 3-CQA |
5 | 1635 | 367 | 6 | 5-FQA |
6 | 2090 | 397 | 14 | 4-SQA |
7 | 2265 | 367 | 5 | 4-FQA |
8 | 3395 | 573 | 57 | 5C-3TQA |
9 | 2470 | 677 | 62 | 3,4,5-TriCQA |
10 | 2295 | 515 | 16 | 3,4-DiCQA |
11 | 2250 | 515 | 17 | 3,5-DiCQA |
12 | 2570 | 515 | 18 | 4,5-DiCQA |
13 | 2725 | 529 | 26 | 4C, 5FQA |
14 | 3125 | 691 | 65 | 3,5-DiC, 4FQA |
15 | 3360 | 705 | 67 | 3,4-DiF, 5CQA |
This paper provides a demonstration of the capabilities of PCA analysis using high resolution LC-MS data, pointing out some potential pitfalls. It represents a first systematic study of PCA methodology using LC-MS data in food chemistry facilitating future use of this powerful data reduction methodology in all areas of food chemistry.3,4
Footnote
† Electronic supplementary information (ESI) available: structures of compounds in Table 1, additional score and loading plots. See DOI: 10.1039/c0ay00512f
This journal is © The Royal Society of Chemistry 2011