Comparison of antioxidant capacity assays with chemometric methods

Seven antioxidant capacity assays were compared and evaluated (ranked and grouped) using several statistical methods. The aim of the research was to compare the results of different antioxidant capacity assays and choose preferably one (or two) method(s), which could reproduce on its own the consensus results of all of the others. The two datasets (berries and sour cherries) gave quite similar results. Cluster analysis and principal component analysis could point out the methods that are most similar and best connected to each other. Not only are the groupings of the methods novel in this study but also the application of sum of ranking differences (SRD) and the generalized pair correlation method (GPCM) to compare and rank the various antioxidant capacity assays independently. In the case of berry samples, ferric ion reducing antioxidant power assay (FRAP) was the most successful as demonstrated by the results of SRD and GPCM. Moreover, GPCM (with conditional exact Fisher's test and probability weighted ordering) could distinguish between 2,2-diphenyl-1-picrylhydrazyl free radical scavenging (DPPH) and lipid soluble antioxidant capacity (ACL) methods, which was not revealed by the SRD procedure. In the case of sour cherry samples the total polyphenolic content (TPC) was the most appropriate method and FRAP was the second to replace all the other assays. GPCM could differentiate between FRAP and trolox equivalent antioxidant capacity (TEAC) methods. The suggested techniques were FRAP and TPC for both datasets to replace all the others, whereas the ACL and water soluble antioxidant capacity (ACW) techniques give extremely distant results.


Antioxidant activity
Examination of antioxidant capacities has become a hot topic as health-conscious lifestyles are becoming more popular. Consequently, the number of determination techniques is increasing rapidly, although as of now they are not able to determine the antioxidant capacity in vivo precisely [1]. Free radicals in cells are formed in biochemical processes under natural conditions, especially in oxidation processes. A certain amount of free radicals is required for normal maintenance in vivo, but a greater amount may damage lipids, proteins, nucleic acids and carbohydrates [2,3]. Free radical formation is induced by internal and external factors (smoke, stress, alcohol, etc.). Antioxidants are able to decrease the free radical concentration or inhibit their formation. They are part of an integrated defense system for cellular components [4,5]. Some well-known antioxidant components are, for example, vitamin E, vitamin C, polyphenols, carotenoids, zinc and copper.

Determination of antioxidant activity
[7][8][9] Consequently, the number of elaborated methods is over one hundred [1]. The reason for this large variety is that they are unable to measure and model all of the natural in vivo reactions precisely [10]. Different reactions perturb the measured values differently according to the reaction mechanism and the system used. Every method is selective for some antioxidant components and reactions, and none of them is able to measure the capacity of all antioxidants properly [11]. The techniques can be grouped in many ways. Depending on what kind of reaction is involved, the methods can be classified into two groups: those based on hydrogen atom transfer (HAT) and those based on electron transfer (ET). HAT methods (for example ACL, ACW, and ORAC) are related to free radicals and based on reaction kinetics. Most of the HAT methods are based on a competitive reaction scheme, where the antioxidant and substrate compounds compete for thermally generated peroxyl radicals. The scheme of the reaction is as follows [12]:
$$\mathrm{ROO^{\bullet} + AH \rightarrow ROOH + A^{\bullet}} \qquad (1)$$
ET methods (for example FRAP, TPC, TEAC, etc.) are based on the measurement of the capacity of an antioxidant in the reduction of an oxidant, which has a different color in its reduced form. From the change in color and from the extent of this change the antioxidant capacity can be determined [13]. The general reaction scheme of ET methods is given in ref. [12].
All antioxidant capacity assays measure the same thing, the radical scavenging ability. However, they produce different results, as the assays use different model compounds with various side reactions and other disturbing factors. Numerous antioxidant capacity assays are commonly applied, but a justified answer is missing about which one(s) should be preferred. Some authors use only one assay without knowledge of its systematic and random errors. Therefore, a justified substitution of all methods with a less biased (less erroneous) one is a desirable aim.
The measurement of the antioxidant activity in fruits is a very popular topic nowadays, especially in berries, because generally their antioxidant effect is higher than that of other fruits [14,15]. In the present study, we measured the antioxidant capacity of berry and sour cherry cultivars. The aim of the study was to compare the antioxidant capacity assays using statistical methods (HCA, PCA, SRD and GPCM) and to select the most representative method or methods for the available datasets based on time and cost efficiency. Grouping and detection of similarities and differences among the methods will also be revealed. The statistical methods presented below are suitable for the comparison of other antioxidant capacity assays and datasets.

Results and discussion
Antioxidant capacity values for thirteen berry genotypes and twelve sour cherry cultivars were measured by seven antioxidant capacity assays in total (FRAP, TPC, TRSC, DPPH, ACL, and ACW for the berry samples and FRAP, TPC, TEAC, ACL, and ACW for the sour cherry samples). Every sample was analyzed in duplicate and each duplicate was measured three times. The average value of the measurements for each sample was used for the further statistical analysis.
First, the comparisons and connections between the methods are shown with HCA and PCA. Then the rankings by the SRD and GPCM methods are presented.

HCA results
Fig. 1 presents the different clusters and connections between the antioxidant capacity methods in the case of berry samples. The Euclidean distance was used as the distance measure and Ward's method as the linkage rule.
Two groups are separated clearly: one contains the ACL and ACW techniques, and the other one contains the TRSC, DPPH and FRAP methods. It means that, within each of the two groups, the results of the techniques are more similar (more closely related) to each other.
Fig. 2 presents the grouping pattern for the sour cherry dataset. It compares and clusters five antioxidant capacity methods. The distance measure and the linkage rule were the same as in the previous case.
Two clusters are also observed in this case. The ACW and ACL methods clearly form a distinct group and the other three are more closely connected to each other.
Cluster analysis can reveal similarities, but it is not able to rank the different techniques, only to infer the connections. The coupling between the methods and the grouping pattern will be verified by PCA.
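The clustering step described above can be illustrated with a minimal sketch. It assumes SciPy's `linkage` and `fcluster` functions and uses a small hypothetical data matrix (six samples, four made-up assays); neither the values nor the assay names come from the measurements reported here, and the original analysis was not necessarily carried out this way.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical values (6 samples x 4 assays); NOT the measured data.
assay_names = ["assay_A1", "assay_A2", "assay_B1", "assay_B2"]
data = np.array([
    [1.0, 1.1, 3.0, 3.2],
    [2.0, 2.1, 1.0, 0.9],
    [3.0, 2.9, 6.0, 5.8],
    [4.0, 4.2, 2.0, 2.1],
    [5.0, 4.8, 5.0, 5.1],
    [6.0, 6.1, 4.0, 3.9],
])


def cluster_assays(matrix, n_clusters=2):
    """Cluster the columns (assays) of a samples-by-assays matrix."""
    # Column-wise standardization before clustering.
    z = (matrix - matrix.mean(axis=0)) / matrix.std(axis=0, ddof=1)
    # Transpose so that each assay becomes one observation to be clustered.
    tree = linkage(z.T, method="ward", metric="euclidean")
    # Cut the dendrogram into the requested number of groups.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


tree, labels = cluster_assays(data)
for name, label in zip(assay_names, labels):
    print(name, "-> cluster", label)
```

The matrix is transposed because the assays, not the samples, are the objects being grouped. The returned linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a plot of the kind shown in Fig. 1 and Fig. 2.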

PCA results
When PCA was applied to the berry data matrix, the first two principal components were sufficient to explain the majority of the overall variance in the dataset (over 90%). When plotting the first and second PC loading vectors, the following observations can be summarized (Fig. 3).
Most of the methods in the plot are scattered, but DPPH and FRAP are in close proximity. ACL, ACW and TPC are not grouped in the projection of PC loading 1 vs. PC loading 2.
In the second case, when sour cherry samples were analyzed, the first three principal components were used for evaluation. Three principal components explained 98% of the variance in the data. The second and third components are "individual" components, which means that only one original variable carries most of the variance in that principal component. The projection of points for antioxidant capacity methods onto the plane defined by the first and third PC loading vectors shows the most characteristic clustering (Fig. 4).
The pattern can verify the results of cluster analysis (where ACW and ACL formed an individual cluster), because here ACW and ACL are detected as outliers (here PC loadings of less than 0.8 were considered as outliers). The points for the other three methods are close to each other, as in the previous case.
Although the first dataset was not sufficient on its own for making conclusions, the PCA and HCA results together have shown that the ACW and ACL methods are very different from the others. The TPC, FRAP and TEAC methods have a close resemblance, similar to the FRAP and DPPH methods in the case of berry samples. This is not so surprising if we consider the similarities between TPC and other ET-based assays (FRAP, TEAC) and the differences between ET-based and HAT-based assays (ACL, ACW). In the case of berry samples, the FRAP and DPPH methods are highly correlated with each other: the correlation coefficient is 0.941 (significant at the α = 0.05 level). Although in that case the correlation coefficients were significant for both ET-based and HAT-based methods, these results have shown that there are differences between the two types of antioxidant capacity assays.
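A minimal sketch of how PC loadings of the kind plotted in Fig. 3 and Fig. 4 can be obtained is given below. It assumes NumPy and a hypothetical data matrix; the function name `pca_loadings` and the values are illustrative only. The loadings are computed from the correlation matrix of the standardized variables, so each loading lies between −1 and +1; the sign of each component is arbitrary.

```python
import numpy as np

# Hypothetical values (6 samples x 4 assays); NOT the measured data.
data = np.array([
    [1.0, 1.1, 3.0, 3.2],
    [2.0, 2.1, 1.0, 0.9],
    [3.0, 2.9, 6.0, 5.8],
    [4.0, 4.2, 2.0, 2.1],
    [5.0, 4.8, 5.0, 5.1],
    [6.0, 6.1, 4.0, 3.9],
])


def pca_loadings(matrix):
    """Return PC loadings (variables x components) and explained variance ratios."""
    # Standardize each column (assay) to zero mean and unit variance.
    z = (matrix - matrix.mean(axis=0)) / matrix.std(axis=0, ddof=1)
    # Correlation matrix of the assays.
    corr = (z.T @ z) / (z.shape[0] - 1)
    # Eigen-decomposition of the symmetric matrix; sort by decreasing eigenvalue.
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]
    eigval = np.clip(eigval[order], 0.0, None)
    eigvec = eigvec[:, order]
    # Scale eigenvectors so that loadings are correlations with the PC scores.
    loadings = eigvec * np.sqrt(eigval)
    explained = eigval / eigval.sum()
    return loadings, explained


loadings, explained = pca_loadings(data)
print("explained variance ratios:", np.round(explained, 3))
print("loadings on PC1 and PC2:")
print(np.round(loadings[:, :2], 3))
```

Plotting the first column of `loadings` against the second (or third) gives a loading plot in which assays that behave similarly appear close to each other.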

SRD results
Before the evaluation, the data matrix had to be preprocessed. The number of rows and columns had to be added in the header and the "golden reference" had to be pasted as the last column of the table. In this case, the average was chosen as the reference for all of the datasets. It can also be called the consensus, in accordance with the maximum likelihood principle, which yields a choice of the estimator as the value for the parameter that makes the observed data most probable (the average) [16]. SRD is implemented in an Excel VBA program. SRD values are given on two scales. The first is the original one and the second is the scaled one, denoted by $\mathrm{SRD_{nor}}$. In the diagram (Fig. 5) the scaled results are used, which makes the methods comparable, because the number of samples is different in the two datasets. The scaled SRD values are between 0 and 100. The equation of the scaling is as follows:
$$\mathrm{SRD_{nor}} = 100 \cdot \frac{\mathrm{SRD}}{\mathrm{SRD_{max}}}$$
where $\mathrm{SRD_{max}}$ is the maximum of the SRD values for the actual variable (method).
We have assumed that all methods measure the same antioxidant capacity with random and systematic errors (biases). Then the row-average (or row-sum) is the best way of data fusion. We can plausibly expect that the random errors and biases at least partially cancel each other out. The lower the SRD value is, the closer the method is to the reference (to the average). Thus, FRAP can substitute all methods for antioxidant capacity with the smallest error. We have also plotted the random probability distribution curve (a Gauss-like one), which helps us decide whether the applied method is better than or similar to the use of random numbers. All of the methods produce better results than random numbers, except for ACW.
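The comparison with random numbers can be sketched as follows. This is a simulation under stated assumptions, not the procedure of the Excel VBA program: random rankings are generated with Python's `random` module, the 5% quantile used as a threshold is an assumed convention, and the function names are hypothetical. The maximum SRD for n tie-free ranks is taken as the integer part of n²/2.

```python
import random


def srd(ranks, reference_ranks):
    """Sum of absolute rank differences between a method and the reference."""
    return sum(abs(r - q) for r, q in zip(ranks, reference_ranks))


def max_srd(n_samples):
    """Largest possible SRD for n tie-free ranks (reversed order)."""
    return (n_samples * n_samples) // 2


def random_srd_distribution(n_samples, n_trials=10000, seed=0):
    """SRD values of randomly permuted rankings against the ordered reference."""
    rng = random.Random(seed)
    reference = list(range(1, n_samples + 1))
    values = []
    for _ in range(n_trials):
        permuted = reference[:]
        rng.shuffle(permuted)
        values.append(srd(permuted, reference))
    return values


n_samples = 13  # e.g. the number of berry genotypes
values = random_srd_distribution(n_samples)
scaled = sorted(100.0 * v / max_srd(n_samples) for v in values)
# Assumed convention: a method below the 5% quantile is "better than random".
threshold_5pct = scaled[int(0.05 * len(scaled))]
print("5% quantile of scaled random SRD values:", round(threshold_5pct, 1))
```

A method whose scaled SRD falls below `threshold_5pct` ranks the samples more like the reference than a random ordering would in 95% of the simulated cases.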
Validation of the ranking has been carried out using a randomization test and a seven-fold cross-validation. For the latter, the dataset was split into seven subsets, each subset was left out in turn, and the SRD values were recalculated on the remaining part. The SRDs calculated on the seven 6/7th portions, together with the original SRD values, define the uncertainty of the SRD values for each method. Otherwise, we would not know whether the colored lines on the diagram are distinguishable or not (whether the distances between the lines are negligible or statistically significant). The Box & Whisker plot shows the SRD ranges (minimum and maximum), the second and third quartiles (the box, in which 50% of the data are located) and the median (small box).
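The cross-validation idea can be sketched in a few lines. The sketch below is only an illustration: the fold assignment (every seventh row) is an assumption, since the splitting scheme of the original program is not described here, and the data matrix and function names are hypothetical. Scaled SRD values are used because the reduced datasets contain different numbers of rows.

```python
def average_ranks(values):
    """Ranks by increasing magnitude; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def scaled_srd(matrix, column):
    """Scaled SRD (0-100) of one column against the row-average reference."""
    reference = average_ranks([sum(row) / len(row) for row in matrix])
    method = average_ranks([row[column] for row in matrix])
    total = sum(abs(a - b) for a, b in zip(method, reference))
    return 100.0 * total / ((len(matrix) ** 2) // 2)


def fold_srd_values(matrix, column, n_folds=7):
    """Recompute the scaled SRD with each fold of rows left out in turn."""
    return [
        scaled_srd([row for i, row in enumerate(matrix) if i % n_folds != fold], column)
        for fold in range(n_folds)
    ]


# Hypothetical 13 x 3 matrix on a common scale; NOT the measured data.
matrix = [[float(i), float((i * 5) % 13), float((i * i) % 13)] for i in range(13)]
for column in range(3):
    folds = fold_srd_values(matrix, column)
    print("method", column, "min", round(min(folds), 1), "max", round(max(folds), 1))
```

The spread of the seven values per method is what a Box & Whisker plot of the kind described above would display.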
Fig. 6 shows that the medians of DPPH and ACL are very close to each other. The null hypothesis is that the mean values of the methods are equal (assuming normality). The two-sample t-test compares the mean SRD values pairwise for all methods. Similarly, the nonparametric sign test and the Wilcoxon matched pair test reveal the differences in median values without distributional assumptions. The nonparametric tests were suitable because not all SRD values followed a normal distribution. The t-test and the nonparametric tests clearly indicated that the SRDs for DPPH and ACL are derived from the same distribution. The median SRDs for all other methods are significantly different.
For the sour cherry dataset, SRD analysis was applied in the same manner as in the previous case. Scaled SRD values for the five methods are shown in Fig. 7, which suggests that TPC has the smallest error out of the five applied methods and hence it can be used to replace all of the other methods. The ACL and ACW methods are outside the acceptable region of the graph. FRAP and TEAC had the same SRD value, and their medians are indistinguishable according to the sign test and the Wilcoxon matched pair test. The pattern found for the berry dataset is very similar to the one for the sour cherry dataset, as evidenced by the Box & Whisker plot and the aforementioned tests.
The Box & Whisker plot and the above tests produced unambiguous results. Fig. 8 shows that the medians of the SRD values of FRAP and TEAC are the same, and the parametric and nonparametric tests do not reject the null hypothesis, i.e. there is no significant difference between the SRD values of these two methods.
In addition, SRD could rank the different antioxidant activity methods for berry and sour cherry samples.For both datasets FRAP and TPC have the lowest SRD values, while ACW and ACL the highest.

GPCM results
GPCM may also be used for ranking antioxidant activity techniques. We wanted to verify the SRD results with another ranking method, which is based on entirely different calculations (and a different way of thinking). GPCM can be run using an Excel VBA program, and its input dataset is quite similar to the SRD dataset, but the reference has to be in the first column for the evaluation. Again, the row average was selected as the "golden standard" (as in the case of the SRD methodology).
The analysis was completed for all possible variable pairs. Probability weighted ordering and the conditional Fisher's exact test were chosen for GPCM analysis. In the case of probability weighted ordering, wins and losses are counted, but they are weighted with the calculated confidence level (= 1 − calculated significance level) based on a suitably chosen test statistic (in this work the conditional Fisher's exact test).
Table 1 contains a selected GPCM result for the berry dataset. The variables are ordered according to the probability weighted differences between the number of wins and the number of losses. FRAP and TRSC are associated the most with the Y variable, which contained the mean of the antioxidant capacity values. The probability corrected numbers of wins were very close to each other in the case of these two methods. The TPC method was the third one, just a little behind them. ACW proved to be the worst method for this case.
Table 2 contains the same selection of GPCM results for the sour cherry dataset as in the case of the berry data. This time the TPC and FRAP assays were practically indistinguishable and superior to the other methods. The difference between the probability weighted numbers of wins minus numbers of losses for the two methods was very small. The third in line was TEAC, and ACW was the last for this case, too.
The results are highly similar to the SRD results, but GPCM could even distinguish the methods DPPH and ACL in the first case, and FRAP and TEAC in the second case.

Table 1 Comparison of six antioxidant capacity determination techniques for the berry dataset by GPCM. pWinner means the number of probability weighted wins; pLoser is the number of probability weighted losses. The ranking was performed by the differences between pWinner and pLoser. The predefined error limit is α (user); α (emp.) means the theoretical limit probability. The critical sum is the sum of the probability weighted wins without losses (and its confidence value, 1 − α times the critical sum).

Antioxidant capacity methods
The antioxidant capacity was determined by different assays. FRAP, which is based on the ferric reducing power of the antioxidants in the sample, was developed by Benzie and Strain [17]. The reaction is based on the reduction of the Fe³⁺-TPTZ complex to the ferrous form (Fe²⁺-TPTZ) by the antioxidants at pH 3.6:
$$[\mathrm{Fe(III)(TPTZ)_2}]^{3+} + e^- \rightarrow [\mathrm{Fe(II)(TPTZ)_2}]^{2+}$$
The reduced complex has a blue color, thus the reaction can be followed with a spectrophotometer at 593 nm [12].
The total polyphenolic content was measured using the Folin-Ciocalteu reagent according to the method of Singleton and Rossi [18,19]. The reagent is a mixture of tungsten and molybdenum oxides and its color is yellow, while the product of the metal oxide reduction has a blue color. In the electron transfer reaction molybdenum(VI) is reduced to molybdenum(V). Contrary to its name, the measurement is not selective for the polyphenolic components [13]. The reaction can be followed at 760 nm.
The TEAC method was developed by Miller et al. [20]. The key reaction is that the antioxidants reduce the quantity of the 2,2'-azinodi-(3-ethylbenzothiazoline)-6-sulfonic acid free radical (ABTS•⁺). The ABTS•⁺ radical cation has a dark green color, but if antioxidants are in the reaction medium, the radical cation is transformed to ABTS²⁻ and loses its color. The reaction can be followed with a spectrophotometer at 734 nm [21].
The radical-scavenging activity was measured by the DPPH [22] and TRSC [23] methods. DPPH is one of the earliest methods; it uses the commercially available stable radical 2,2-diphenyl-1-picrylhydrazyl. Upon reduction the color of the solution fades, which can be measured with a spectrophotometer at 515 nm. The TRSC method is based on the inhibition of the H₂O₂/•OH-microperoxidase-luminol system. This system emits light in alkaline solution, and •OH is generated from H₂O₂ in a Fenton-type reaction by the iron complex (microperoxidase). The total scavenger capacity was measured by a chemiluminescence assay at 420 nm [24].
The ACW and ACL antioxidant capacities were measured using methods described by Popov and Lewin [25,26]. This assay involves the photochemical generation of superoxide anion free radicals (O₂•⁻) combined with chemiluminescence detection. The superoxide anion free radicals react with the antioxidant compounds present in the sample, so the quantity of the free radicals decreases. Luminol is activated by the residue of the superoxide radical anions and exhibits luminescence. Based on these reactions the method is suitable for the measurement of radical scavenging properties. The results are expressed in Trolox equivalent in the case of lipid soluble and in ascorbic acid equivalent in the case of water soluble antioxidant capacity.
A Nicolet Evolution 300 BB (Thermo Electron Corporation, Cambridge, UK) was used for all spectrophotometric measurements. The TRSC was measured with a Lumat 9501 luminometer (Berthold, Bad Wildbad, Germany) and the ACW and ACL assays with a Photochem instrument (Analytik Jena AG, Jena, Germany). In the latter case the hydrophilic antioxidants were measured with the ACW kit, which contains 1.5 mL reagent 1 (carbonate buffer solution pH 10.5), 1 mL reagent 2 and 25 µL reagent 3 (luminol as the photosensitizer). The lipophilic antioxidants were measured with the ACL kit. The main components of the kit are as follows: Merck methanol (reagent 1), carbonate buffer solution (reagent 2) and working solution of reagent 3 (luminol as the photosensitizer) [25,26].

Statistical methods
PCA. Principal component analysis is an unsupervised pattern recognition method [27], which has been commonly used for these problems in all fields of chemistry in the last twenty years [28,29]. The basic idea is that "latent variables" are created by the linear combination of the original variables. It means that the original data matrix (X) can be decomposed into the product of two matrices, which must be orthonormal. The two matrices are called PC loadings (P) and scores (T). The principal components are ordered in such a way that the variance explained by the first principal component is the greatest; the variance explained by the second one is smaller, and so on, whereas that of the last is the smallest.
A basic assumption in the use of PCA is that the score and loading vectors corresponding to the largest eigenvalues contain the most useful information related to a specific problem and that the remaining ones comprise mainly the noise. The points (samples) are projected onto a subspace of smaller dimensions, where dominant groups (clusters) and outliers can be observed. The similarity of variables can be evaluated in the same way from the PC loadings (i.e. from the directions); only their scaling is different: between −1 and +1 in the case of a standardized input matrix. In the present study the variables (assays) were grouped and evaluated, though the pattern (similarity) of samples was also calculated (data not shown).
HCA. Hierarchical cluster analysis is also an unsupervised pattern recognition technique [30,31]. It is a very simple and illustrative statistical method, which is used in many fields of science [32]. Most often it is used together with PCA so that the two methods confirm each other's results [33]. The basic idea of the method is that the connections between antioxidant capacity methods are based on distance measures (between samples or methods) and linkage (amalgamation) rules. Linkage rules determine the way to define distances between groups (clusters), whereas the distance measure defines the distance between samples or methods. The latter can be the Euclidean distance, Mahalanobis distance, Manhattan distance, Minkowski distances, etc. Linkage rules can be single linkage, complete linkage, Ward's method, etc. The disadvantage of cluster analysis is that the use of different distance definitions and linkage rules can provide different results. The best practice is to try every combination and accept the pattern only if there is no significant difference. The most-used combination of distance measure and linkage rule is the Euclidean distance with Ward's method. We also used this combination in this work. The data were (column-wise) standardized before HCA.
SRD methodology. Sum of ranking differences is a novel and simple method [34,35] to compare models, methods, analytical techniques, etc., and it is entirely general [32,36,37]. In the input matrix the samples are arranged in the rows and the methods (variables) are arranged in the columns. In the first step the antioxidant capacity values in every column are ranked by increasing magnitude. Then, the difference between the rank of the actual method and the rank of the known reference (golden standard) is computed for each sample. If the golden standard is not known, the average (minimum or maximum) can be used instead. In the last step, the absolute values of the differences are summed for each method to be compared. The closer the SRD value is to zero, the closer the method is to the reference (i.e. it contains fewer errors and in this sense it is a "better" method). The SRD methodology exploits a basic principle of analytical chemistry: the random and systematic errors (biases) of the different antioxidant capacity techniques cancel each other out, at least partially. Not knowing the "truth", we are better off using the average than any of the individual methods. We cannot reasonably assume that all measurements are shifted in one direction (biased in the same way). SRD is validated by a randomization test and a bootstrap-like cross-validation. Leave-one-out cross-validation is used if the number of samples is smaller than 14, whereas a seven-fold cross-validation is applied if the number of samples is higher than 13.
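The SRD steps listed above can be sketched as follows. This is a minimal illustration, not the Excel VBA implementation: the helper names are hypothetical, the small data matrix is made up, the columns are assumed to be on comparable scales so that the row average is a meaningful reference, and ties are given average ranks. The scaled value divides by the largest possible SRD for n tie-free ranks, the integer part of n²/2.

```python
def average_ranks(values):
    """Rank values by increasing magnitude; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def srd_table(matrix, names):
    """Return {method: (SRD, scaled SRD)} using the row average as reference."""
    n_rows = len(matrix)
    reference = [sum(row) / len(row) for row in matrix]  # consensus (row average)
    ref_ranks = average_ranks(reference)
    srd_max = (n_rows * n_rows) // 2
    result = {}
    for j, name in enumerate(names):
        col_ranks = average_ranks([row[j] for row in matrix])
        srd = sum(abs(a - b) for a, b in zip(col_ranks, ref_ranks))
        result[name] = (srd, 100.0 * srd / srd_max)
    return result


# Hypothetical 5 samples x 3 methods; NOT the measured data.
names = ["m1", "m2", "m3"]
matrix = [
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 3.0],
    [4.0, 5.0, 2.0],
    [5.0, 4.0, 1.0],
]
result = srd_table(matrix, names)
for name in sorted(result, key=lambda key: result[key][0]):
    print(name, result[name])
```

In this made-up example `m2` orders the samples exactly as the row average does and therefore gets an SRD of zero, while `m3` reverses that order and reaches the maximum.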
GPCM. The pairwise correlation method is also an easy and fast way to select and rank different variables (features), in our case antioxidant capacity methods [32,38,39]. Here the input matrix is the same as for SRD, so the samples are arranged in the rows and the methods (variables) are arranged in the columns. The variables are compared in pairs and in all possible combinations. The three possible outcomes are winner (when one member of the compared pair is superior to the other in its relation to the reference according to a statistical test), loser (when one member of the compared pair is inferior to the other) and no decision (tie) if neither member of the pair is superior (or inferior) according to the statistical test. There are three ways to order the variables: simple ordering (it counts the number of wins), difference ordering (it calculates the differences between wins and losses) and significance ordering (the probability weighted variant of difference ordering). The conditional Fisher's exact test, based on testing significance in 2 × 2 contingency tables, is a suitable selection criterion for GPCM, but there are other selection criteria available, for example McNemar's test [40], the χ² test [41] and the Williams t-test. Of the above tests only the Williams t-test is parametric and requires the assumption of normality.
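The pairwise comparison idea can be illustrated with a simplified sketch. It is not the GPCM program used in this work: instead of the conditional Fisher's exact test, it applies an exact McNemar-type (binomial) test to the discordant counts, it implements only difference ordering without probability weighting, and the way sample pairs are counted is an assumed, simplified reading of the method. All function names and data are hypothetical.

```python
from itertools import combinations
from math import comb


def pair_counts(y, x1, x2):
    """Count sample pairs where only x1 (b) or only x2 (c) follows the order of y."""
    b = c = 0
    for i, j in combinations(range(len(y)), 2):
        dy = y[i] - y[j]
        if dy == 0:
            continue
        s1 = (x1[i] - x1[j]) * dy  # > 0 when x1 changes in the same direction as y
        s2 = (x2[i] - x2[j]) * dy
        if s1 > 0 and s2 < 0:
            b += 1
        elif s1 < 0 and s2 > 0:
            c += 1
    return b, c


def exact_two_sided_p(b, c):
    """Exact two-sided binomial (McNemar-type) p-value for discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def rank_variables(y, columns, alpha=0.05):
    """Difference ordering: number of wins minus number of losses per variable."""
    score = {name: 0 for name in columns}
    for n1, n2 in combinations(columns, 2):
        b, c = pair_counts(y, columns[n1], columns[n2])
        if b != c and exact_two_sided_p(b, c) < alpha:
            winner, loser = (n1, n2) if b > c else (n2, n1)
            score[winner] += 1
            score[loser] -= 1
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference (y) and three made-up methods; NOT the measured data.
y = [float(v) for v in range(1, 11)]
columns = {
    "good": y[:],
    "mid": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 10.0, 9.0],
    "bad": y[::-1],
}
ranking = rank_variables(y, columns)
print(ranking)
```

In this example "good" and "mid" are not separated by the test (only five sample pairs differ between them), while both clearly beat "bad"; this corresponds to the "no decision" outcome described above.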
The general comparison of the four statistical methods is shown in Table 3. The methods are compared according to the type, used software, distance measure, significance test and robustness.

Conclusion
Although the chemometric methods provide somewhat deviating results, credible general conclusions can be drawn: the principal component analysis (PCA) and hierarchical cluster analysis (HCA) show that the antioxidant capacity assays based on similar principles are closely connected to each other. All statistical methods suggest that water soluble antioxidant capacity (ACW) and lipid soluble antioxidant capacity (ACL) differ from the others. This can be a confirmation of the difference between the reaction mechanisms of the ACL and ACW group and those of the other methods used. The ferric ion reducing antioxidant power assay (FRAP), together with the total polyphenolic content method (TPC), was recommended to substitute all the other antioxidant capacity methods for both datasets. Our goal was to determine which assay(s) can be used with the least error if only one technique has to be chosen. Sum of ranking differences and the pairwise correlation method order antioxidant capacity assays in a statistically correct way using Wilcoxon's matched pair test, the conditional Fisher's exact test, McNemar's test, etc. The mentioned methods, although based on different ways of thinking and different ways of calculation, still support each other in revealing the order of assays.

Fig. 1 Grouping pattern of six different antioxidant activity assays (dendrogram) for the berry dataset. Euclidean distance and Ward's method were used.

Fig. 2 Grouping pattern of five different antioxidant activity assays (dendrogram) for sour cherry samples. Euclidean distance and Ward's method were used.

Fig. 3 PC loading 1 against PC loading 2 for the berry dataset.

Fig. 4 PC loading 1 against PC loading 3 for the sour cherry dataset.

Fig. 5 Evaluation of six antioxidant capacity methods using sum of ranking differences (berries dataset). Average was used as reference. Scaled SRD values are plotted on the x axis and left y axis. The right y axis shows the relative frequencies for the black Gauss-like curve with triangles (exact theoretical distribution).

Fig. 6 Box & Whisker plot of SRD % values for six antioxidant capacity methods of the berry dataset. The uncertainties for SRDs are derived from a seven-fold cross-validation.

Fig. 7 Evaluation of five antioxidant capacity methods using sum of ranking differences of the sour cherry dataset. Average was used as reference. Scaled SRD values are plotted on the x axis and left y axis. The right y axis shows the relative frequencies for the black Gauss-like curve with triangles (exact theoretical distribution).

Fig. 8 Box & Whisker plot of SRD % values for five antioxidant capacity methods of the sour cherry dataset. The uncertainties for SRDs are derived from a seven-fold cross-validation.

Table 2 Comparison of five antioxidant capacity determination techniques for the sour cherry dataset by GPCM. For notations see Table 1.

Table 3 Comparison of statistical methods. The statements about distance measure and software are connected to our investigations.