Aline Renée Coscione, João Carlos de Andrade* and Ronei J. Poppi
Universidade Estadual de Campinas, Instituto de Química, CP 6154, 13083-970 Campinas, SP, Brazil. E-mail: dandrade@iqm.unicamp.br
First published on 13th December 2001
Real samples were used for PLS model calibration and validation steps, showing that this approach can be of value in preventing deviations in the results caused by the matrix effects for the simultaneous spectrophotometric determination of aluminum and iron in plant extracts. One hundred UV-vis spectra, obtained from samples of the 1997 to 2000 International Plant-Analytical Exchange (IPE) program (The Netherlands), were used for model development, with ICP-AES aluminum and iron determinations as reference values for model calculation. The plant extracts were analyzed both by ICP-AES and by the PLS models developed in this work, using calibrations with both aqueous standard solutions and with real sample extracts. In addition, since the use of smaller calibration sets could be of value in reducing both the cost and the time of analysis, sets with fewer calibration samples were also investigated, with the help of the Kennard and Stone algorithm for sample selection. The predictability of the best model obtained with each calibration set was compared using the ratio of their relative root mean square errors (%RMSEV) for samples in the validation set, for aluminum or iron determinations, and these ratios were compared against tabulated F-test values. For all the models developed with real samples, the differences in the %RMSEV values for the aluminum or iron determinations were found not to be statistically significant, at a confidence level of 95%. Although it was observed that the aluminum, but not the iron, determinations with the PLS 2 model prepared with aqueous standards tend to be slightly lower than the ICP-AES determinations, this model has a good global prediction ability, as observed through the correlation curves presented, and can be used for screening determinations or for other agricultural purposes.
Thus, the use of real samples for both the calibration and validation steps is desirable in multivariate modeling procedures, but this is most commonly used in industrial process monitoring, owing to their well defined behavior, where large changes are less expected.
On the other hand, owing to the high number of variables present in environmental and biological samples, associated with their usual complex interaction with the chemical components of these matrices, the use of real samples for calibration should be preferred. However, the lack of reliable real samples that can be used as reference standards has led to the almost exclusive use of pure aqueous standard solutions in multivariate calibration, and often also in the validation steps, when modeling these chemical systems.2–6 Spiked real samples may also be used in the calibration process, but their applicability can be questioned from the analytical point of view if the interfering species are present in lower concentrations.4,6
This work was intended to demonstrate that the use of real samples in the calibration and validation steps of a multivariate modeling procedure is possible and that this approach can be of value in preventing deviations in the results caused by matrix effects on the simultaneous spectrophotometric determination of aluminum and iron in plant extracts.5
A Hitachi U2000 double-beam scanning spectrophotometer, equipped with a sipper and 10 mm pathlength silica spectrophotometric cells, was used to obtain the UV-vis spectra. All data were recorded as ASCII files, by interfacing a computer through the serial port to an RS 232C bi-directional cable, according to the manufacturer’s instruction manual. The data acquisition software was written in Quick Basic.
A Jobin-Yvon JY-50P inductively coupled argon plasma atomic emission spectrometer was employed for the analysis of the plant extracts used in the calibration and validation steps, as described below. The ICP operating frequency was 40.68 MHz at a power of 1000 W, with an argon torch flow rate of 12 l min−1, using the atomic lines of 308.215 nm for aluminum and 259.940 nm for iron.
Stock standard solutions of 1000 mg l−1 aluminum and of iron were prepared by dilution of the contents of standardized Merck Titrisol ampoules to 1000 ml with water. Aliquots of these stock standard solutions were taken to prepare working standard solutions for both the spectrophotometric multivariate aqueous standard calibration experiments and the ICP-AES multielemental calibration plots.
Both a 3.5 mol l−1 NaOH solution, prepared by dissolving 140 g of NaOH pellets in water and diluting to 1 l, and a formate–formic acid buffer solution, obtained by dissolving 102 g of sodium formate in about 600 ml of water followed by the addition of 132 ml of formic acid (98% v/v) and dilution to 1000 ml with water, were employed to adjust the pH of the final solutions, according to the sample preparation procedure described below.
A 0.5% m/v xylenol orange (XO) solution, used as the chromogenic reagent in the proposed spectrophotometric procedure,5 was obtained by dissolution of a mass of 2.5 g (±10 mg) of the solid tetrasodium salt (Merck) in 500.0 ml of water. Before final dilution to the mark, it is necessary to add 2 ml of chloroform and a few drops of 2 mol l−1 HCl, until a clear orange solution is obtained. The addition of CHCl3 prevents the action of microorganisms and HCl is added to make the pH adjustment of the final reaction medium easier. If kept in a refrigerator when not in use, this XO solution was shown to be effective, without changes, for at least 6 months.
Then, 5.00 ml of each of these plant extracts were transferred into other 50.0 ml calibrated flasks, followed by the addition of 1.0 ml of the XO solution and 1.0 ml of 3.5 mol l−1 NaOH solution plus a few additional drops of the NaOH solution, until the xylenol orange color first turns deep violet. A larger excess of base should be avoided. In sequence, 5.0 ml of the buffer solution and 25.0 ml of ethanol were added. The volume was completed with water and the solution was mixed carefully. At this point, a clear orange to reddish solution should be obtained, depending on the metal concentrations. The color development is almost fully achieved after a standing period of 2 h. For the best results, the spectra of these solutions should be recorded from 2 to 4 h after mixing the reagents, against water as blank, in the wavelength range 350–650 nm, using a spectral resolution of 2 nm. A higher resolution will slow the sample throughput and a lower resolution may result in lower precision and accuracy. A mineralized sample blank should always be run in parallel, for further correction of the predicted sample values. Since inconsistent results may be observed if the glassware is not well cleaned, the calibrated flasks should be decontaminated prior to their use by immersion in 10% v/v nitric acid solution for at least 2 h, followed by efficient washing with water.
In the present study, 100 UV-vis spectra obtained from samples from the 1997 to 2000 International Plant-Analytical Exchange (IPE) program (Wageningen Agricultural University, The Netherlands)7–11 were used for model development and validation. Forty of these spectra, including a few replicates, obtained from new sample preparations, were also used in the calibration set.
The identification of the plant extracts used as the calibration set and their aluminum and iron concentrations are shown in Table 1. These extracts contained from 7.6 to 2949 mg kg−1 of aluminum and from 51 to 2348 mg kg−1 of iron, dry weight basis (DW), respectively. This corresponds to values of 0.02–1.16 mg l−1 of aluminum and 0.10–1.05 mg l−1 of iron in the final solutions used to obtain the M–XO (M = Al, Fe) complex spectra.
Table 1 notes: a The figures in terms of mg l−1 are the concentration values found in the final solutions used to obtain the spectra, after the appropriate dilutions.5 b 10 sample calibration set. c 10 sample calibration set, selected after PCA.

Spectra | IPE sample No.–year/report | Al in plant extract, reported value/mg kg−1 (DW) | Fe in plant extract, reported value/mg kg−1 (DW) | Al in final solution, ICP-AES value/mg l−1 a | Fe in final solution, ICP-AES value/mg l−1 a |
---|---|---|---|---|---|
1 | 950–98/5 | 229.8 | 344.4 | 0.46 | 0.69 |
2b | 980–98/5 | 108.6 | 158.1 | 0.22 | 0.32 |
3 | 999–98/2 | 55.5 | 142.6 | 0.11 | 0.29 |
4 | 652–98/4 | 50.7 | 87.9 | 0.10 | 0.18 |
5c | 124–98/4 | 264.2 | 247.2 | 0.53 | 0.50 |
6c | 100–98/4 | 433.9 | 487.5 | 0.87 | 0.98 |
7bc | 883–98/6 | 2948 | 1370 | 1.15 | 0.54 |
8 | 126–99/3 | 396.8 | 523.8 | 0.80 | 1.05 |
9 | 980–99/4 | 165.2 | 219.1 | 0.33 | 0.44 |
10 | 652–99/4 | 59.9 | 108.9 | 0.12 | 0.22 |
11 | 950–99/4 | 229.9 | 403.7 | 0.46 | 0.81 |
12b | 132–99/3 | 201.6 | 245.0 | 0.40 | 0.49 |
13c | 952–99/2 | 263.2 | 432.6 | 0.53 | 0.87 |
14 | 133–99/5 | 62.5 | 177.7 | 0.13 | 0.36 |
15 | 949–97/4 | 158.6 | 188.1 | 0.32 | 0.38 |
16bc | 883–97/4 | 2592 | 1337 | 1.02 | 0.52 |
17 | 108–97/3 | 174.7 | 249.8 | 0.35 | 0.50 |
18b | 109–97/3 | 435.4 | 443.3 | 0.87 | 0.89 |
19 | 118–97/5 | 27.7 | 51.4 | 0.06 | 0.10 |
20b | 686–98/6 | 1006 | 1233 | 0.77 | 0.94 |
21c | 113–97/3 | 62.3 | 136.7 | 0.13 | 0.27 |
22 | 129–99/1 | 1382 | 2348 | 0.54 | 0.92 |
23 | 885–99/4 | 293.8 | 600.1 | 0.23 | 0.46 |
24c | 547–99/1 | 7.6 | 57.1 | 0.02 | 0.12 |
25 | 114–97/2 | 56.1 | 120.3 | 0.11 | 0.24 |
26b | 118–97/5 | 27.2 | 55.1 | 0.06 | 0.11 |
27 | 849–97/2 | 90.3 | 119.3 | 0.18 | 0.24 |
28b | 904–97/2 | 240.0 | 246.0 | 0.48 | 0.49 |
29 | 949–97/2 | 193.8 | 194.7 | 0.39 | 0.39 |
30 | 995–97/2 | 28.0 | 62.2 | 0.06 | 0.13 |
31c | 980–97/1 | 135.8 | 183.2 | 0.27 | 0.37 |
32c | 100–98/4 | 392.1 | 406.4 | 0.79 | 0.82 |
33 | 677–98/4 | 47.3 | 105.4 | 0.10 | 0.21 |
34b | 686–99/6 | 1054 | 1203 | 0.81 | 0.92 |
35 | 874–98/2 | 78.5 | 124.4 | 0.16 | 0.25 |
36c | 949–98/6 | 222.5 | 225.8 | 0.45 | 0.45 |
37 | 125–98/1 | 192.5 | 303.1 | 0.39 | 0.91 |
38 | 132–99/3 | 216.1 | 234.6 | 0.43 | 0.47 |
39b | 883–99/5 | 2949 | 1342 | 1.16 | 0.53 |
40 | 933–99/4 | 329.3 | 234.9 | 0.66 | 0.47 |
The remaining 60 spectra, which were not used in the model calibration, were taken as the validation set. The year 2000 IPE plant sample extracts were used only in this step. Some of them were analyzed in triplicate and were also used for further correlations.10,11 The plant extracts of the validation set contained from 29 to 2889 mg kg−1 of aluminum and from 70 to 2371 mg kg−1 of iron, corresponding to 0.06–1.10 mg l−1 and 0.12–1.04 mg l−1 of aluminum and iron, respectively, in the final solution used to record the spectra.
The relative root mean square error (%RMSE) was calculated by eqn. (1),1
[Eqn. (1): equation image not recovered from the source.]
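Eqn. (1) does not survive in this text, so the exact form used by the authors cannot be shown. As a rough illustration only, the sketch below assumes one common definition of a relative RMSE, in which the root mean square of the prediction errors is divided by the mean reference value and expressed as a percentage. Another frequent choice divides each error by its own reference value before averaging; which of the two the authors used is not stated here. The function name `pct_rmse` and the numbers are hypothetical.

```python
import numpy as np

def pct_rmse(y_ref, y_pred):
    """Relative RMSE in percent (ASSUMED form, not taken from the paper):
    100 * sqrt(mean((y_pred - y_ref)**2)) / mean(y_ref)."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_ref) ** 2))
    return 100.0 * rmse / np.mean(y_ref)

# Hypothetical reference (e.g. ICP-AES) and predicted (e.g. PLS) values, mg/l
reference = [1.0, 2.0, 3.0]
predicted = [1.1, 1.9, 3.2]
value = pct_rmse(reference, predicted)
print(f"%RMSE = {value:.2f}")
```

Applied to the calibration samples this would give a %RMSEC, and applied to the validation samples a %RMSEV.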
Both calibration and data processing were performed using the commercially available MatLab package v.4.2c (Math Works), and the PLS Toolbox v.1.5 for use with MatLab (Eigenvectors Technologies). Laboratory-prepared MatLab-based software was used to perform the sample selection by the Kennard and Stone algorithm.
Our development of PLS models with real samples started with 40 spectra obtained with the plant extracts, as listed in Table 1. A larger number of calibration samples was used, compared with the 26 aqueous standard solutions previously employed,5 as a first attempt to obtain a more representative set of the real sample population, which could enable us to select the objects (samples) more appropriately.
Principal component analysis (PCA)1 of the 40 plant extract spectra of the calibration set was performed, after mean centering. The PCA showed that two principal components (PCs) can explain 99.7% of the variance. The PCA scores plot obtained for PC1 and PC2 is shown in Fig. 1. From this figure (with the help of Table 1), it can be noted that it is possible to discriminate the concentrations but not the year of the IPE sampling, as shown by extract numbers 7, 16 and 39 (Table 1), corresponding to sample 883 (carnation). This observation confirms the absence of chemical alterations in the samples with time,15 enabling us to use these spectra, obtained from the replicates of these samples, as independent objects for model calculation. In addition, since the use of smaller calibration sets could be of value, reducing cost and analysis time, sets with 30, 20 and 10 calibration samples were also investigated.
Fig. 1 Scores plot for principal component analysis of the samples used for calibration. The composition of the samples is described in Table 1. The Kennard and Stone 10 sample calibration sets are marked with circles (direct distance calculations) and squares (PCA scores distance calculations).
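The PCA step described above (mean centering, then the fraction of variance explained by the first principal components) can be sketched with a singular value decomposition. This is a minimal NumPy illustration, not the PLS Toolbox routine used by the authors; the function name `pca_mean_centered` and the synthetic two-component "spectra" are assumptions made only so that the example runs.

```python
import numpy as np

def pca_mean_centered(X, n_components=2):
    """PCA of a (samples x wavelengths) matrix after mean centering.
    Returns the scores on the first components and the fraction of
    variance explained by every component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance fraction per PC
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained

# Hypothetical stand-in for 40 spectra recorded at 151 wavelengths:
# two absorbing species plus a little noise.
rng = np.random.default_rng(0)
conc = rng.uniform(0.05, 1.1, size=(40, 2))
pure = np.abs(rng.normal(size=(2, 151)))
spectra = conc @ pure + rng.normal(scale=1e-3, size=(40, 151))

scores, explained = pca_mean_centered(spectra)
print(f"PC1 + PC2 explain {100 * explained[:2].sum():.1f}% of the variance")
```

Plotting the two columns of `scores` against each other gives a scores plot of the kind shown in Fig. 1.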
However, the commonly used scores plot sample selection procedure could not be recommended, because it would be difficult to select properly the most representative spectra from the scores in Fig. 1. In this case, where the spectra scores are not uniformly distributed, algorithms designed for sample selection, such as the Kennard and Stone algorithm,13 should be used to ensure the proper selection of the objects for model calibration.
The Kennard and Stone algorithm,3 used in this work to select the spectra to be used as calibration sets, is based on maximizing the Euclidean distance between two consecutive samples. When the distance calculation is performed directly on the samples, it can be said that the direct distance calculation mode is used. However, when only a few samples are to be selected, the redundant information should first be eliminated and then the distance calculation performed, in order to obtain a more uniform coverage of the experimental region. This can be done, for example, by using PCA. The distance calculation is then made using the scores and will be called, in this work, the scores distance calculation mode. Both the direct distance and the scores distance calculation modes were tested for calibration sample selection, to ensure the best sampling even when as few as 10 spectra were used for calibration. The samples selected for each 10 spectra calibration set are indicated in Fig. 1. The aluminum and iron contents of the selected samples are indicated in Table 1.
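The two selection modes can be illustrated with a short sketch. It follows the usual textbook form of the Kennard and Stone procedure (start from the two most distant samples, then repeatedly add the sample whose smallest distance to the already selected ones is largest); the authors' laboratory-written MatLab code is not available here, so the details, the function names and the synthetic data are assumptions.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Full matrix of pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)   # most distant pair
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # Distance of each candidate to its nearest selected sample
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        chosen = remaining[int(np.argmax(nearest))]
        selected.append(chosen)
        remaining.remove(chosen)
    return selected

def pca_scores(X, n_components):
    """Scores of mean-centered X on its first principal components."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

# Hypothetical stand-in for 40 calibration spectra (151 wavelengths)
rng = np.random.default_rng(0)
conc = rng.uniform(0.05, 1.1, size=(40, 2))
pure = np.abs(rng.normal(size=(2, 151)))
spectra = conc @ pure + rng.normal(scale=1e-3, size=(40, 151))

direct_mode = kennard_stone(spectra, 10)                 # direct distance mode
scores_mode = kennard_stone(pca_scores(spectra, 2), 10)  # scores distance mode
print("direct:", sorted(direct_mode))
print("scores:", sorted(scores_mode))
```

The number of principal components kept in the scores mode (two here) is a choice of this sketch; the text only states that PCA was used to remove redundant information before the distances were computed.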
Considering the two calibration sets obtained with 30, 20 and 10 spectra, using the Kennard and Stone algorithm, plus the 40 spectra and the aqueous standard solution calibration set, models using 2–10 latent variables were built. For each calibration tested, the best model was chosen as a compromise between calibration fitting and the prediction ability by comparing %RMSEC and %RMSEV for the 60 spectra validation set. For all the cases studied, the best PLS 2 models were those with four latent variables.
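The search over 2–10 latent variables can be sketched as a loop that fits a PLS 2 model for each number of latent variables and tabulates the calibration and validation errors. The code below is an independent NumPy sketch of a standard PLS 2 fit with deflation, not the PLS Toolbox implementation used by the authors. The relative RMSE is taken as the RMSE divided by the mean reference value, which is an assumption, and all data are synthetic.

```python
import numpy as np

def pls2_fit(X, Y, n_lv):
    """Fit PLS 2 with n_lv latent variables.
    Returns (B, x_mean, y_mean) so that Y_hat = y_mean + (X - x_mean) @ B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        # Weight vector: dominant left singular vector of the X-Y covariance
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                      # scores
        tt = t @ t
        p = E.T @ t / tt               # X loadings
        q = F.T @ t / tt               # Y loadings
        E = E - np.outer(t, p)         # deflate X
        F = F - np.outer(t, q)         # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def pls2_predict(X, B, x_mean, y_mean):
    return y_mean + (X - x_mean) @ B

def pct_rmse(y_ref, y_pred):
    """Relative RMSE per analyte, in percent (ASSUMED normalization by the mean)."""
    return 100.0 * np.sqrt(np.mean((y_pred - y_ref) ** 2, axis=0)) / y_ref.mean(axis=0)

# Synthetic two-analyte data standing in for the real spectra (hypothetical)
rng = np.random.default_rng(1)
pure = np.abs(rng.normal(size=(2, 50)))

def make_set(n):
    conc = rng.uniform(0.05, 1.1, size=(n, 2))
    return conc @ pure + rng.normal(scale=0.005, size=(n, 50)), conc

X_cal, Y_cal = make_set(40)   # calibration set
X_val, Y_val = make_set(60)   # validation set

results = {}
for n_lv in range(2, 11):
    B, x_mean, y_mean = pls2_fit(X_cal, Y_cal, n_lv)
    rmsec = pct_rmse(Y_cal, pls2_predict(X_cal, B, x_mean, y_mean))
    rmsev = pct_rmse(Y_val, pls2_predict(X_val, B, x_mean, y_mean))
    results[n_lv] = (rmsec, rmsev)
    print(n_lv, np.round(rmsec, 2), np.round(rmsev, 2))
```

Reading the printed table row by row mirrors the compromise described above: the calibration error can only decrease as latent variables are added, so the validation error is what signals when further latent variables stop helping.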
Although the two selection modes of the Kennard and Stone algorithm did not select exactly the same spectra for the real sample calibration sets (the spectra common to both modes went from 25 of the 30 spectra in this calibration set, 80% of the spectra, to just two spectra in the 10 spectra calibration set, 20% of the spectra), the determination performance of the models can be compared through their %RMSEV values, which are shown in Table 2.
No. of calibration spectra | Direct distance calculation, Al | Direct distance calculation, Fe | Scores distance calculation, Al | Scores distance calculation, Fe |
---|---|---|---|---|
30 | 12.1 | 8.1 | 10.8 | 7.0 |
20 | 12.1 | 7.8 | 11.4 | 7.4 |
10 | 12.0 | 7.8 | 12.5 | 9.1 |
The prediction ability of the models, shown in Table 2, may be compared against the tabulated F-test values, after obtaining the ratio (%RMSEVi/%RMSEVj),2 where %RMSEVj was the value for the most precise model.16 The model developed using 40 samples for calibration was taken as the most precise because it included the largest number of real samples for calibration among the sets tested. It also has one of the lowest %RMSEV values and was used for further comparisons unless specified otherwise. The calculated %RMSEV ratios for the aluminum or iron determinations, for all the models developed with real samples, were lower than the tabulated F-value, which is 1.53 for 60 samples, at a confidence level of 95%. Thus, the Al and Fe values obtained with these models are not significantly different from those obtained with the 40 spectra calibration set. Also, according to this criterion, there is no difference between models with the same number of spectra that differ only in the Kennard and Stone algorithm selection mode.
On the other hand, the validation set results are statistically different for the aluminum determinations obtained with the aqueous standard solutions calibration model and with the 40 spectra set (F calculated = 2.53). These differences are also significant in relation to all the other models using real samples for calibration (F calculated values from 1.89 to 2.53). For iron, the validation results are not statistically different (F calculated values ranged from 0.74 to 1.00), but are sometimes (as when compared with the 30 spectra selected through the direct distance calculation mode) slightly better.
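As a worked example of this comparison, the %RMSEV values of Table 2 can be paired by calibration set size and compared against the tabulated F-value of 1.53 quoted above. The sketch assumes that the test statistic is the squared ratio of the two %RMSEV values, as is usual for an F-test on variances, and it places the smaller value of each pair in the denominator; both choices are assumptions of this illustration, and the 40 spectra model is not included because its %RMSEV is not listed in Table 2.

```python
F_TABULATED = 1.53  # value quoted in the text for 60 samples, 95% confidence

# %RMSEV values from Table 2: {size: {mode: {element: value}}}
rmsev = {
    30: {"direct": {"Al": 12.1, "Fe": 8.1}, "scores": {"Al": 10.8, "Fe": 7.0}},
    20: {"direct": {"Al": 12.1, "Fe": 7.8}, "scores": {"Al": 11.4, "Fe": 7.4}},
    10: {"direct": {"Al": 12.0, "Fe": 7.8}, "scores": {"Al": 12.5, "Fe": 9.1}},
}

def f_ratio(rmse_i, rmse_ref):
    """Squared ratio of two relative RMSE values (ASSUMED form of the statistic)."""
    return (rmse_i / rmse_ref) ** 2

ratios = {}
for size, modes in rmsev.items():
    for element in ("Al", "Fe"):
        a = modes["direct"][element]
        b = modes["scores"][element]
        # Use the smaller (more precise) value as the reference
        ratios[(size, element)] = f_ratio(max(a, b), min(a, b))

for key, value in ratios.items():
    verdict = "not significant" if value < F_TABULATED else "significant"
    print(key, f"F = {value:.2f}", verdict)
```

Under these assumptions every ratio between the two selection modes stays below 1.53, in line with the statement that the selection mode makes no significant difference at a given calibration set size.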
The analysis of such results suggests the preferred use of real sample calibration models for the aluminum and iron determinations. Also, it seems that, with the proper selection, 10 plant samples would be sufficient for the development of models with good predictability. In practice, both these statements should be carefully considered and some interesting information about the PLS models developed was obtained from the evaluation of correlation curves for the ICP determinations vs. PLS predictions for the samples in the validation set, as shown in Fig. 2 for the 40 sample calibration set. These samples contained 29–2889 mg kg−1 of aluminum and 70–2371 mg kg−1 of iron.
Fig. 2 Correlation curves for the ICP vs. PLS predictions (based on a 40 spectra calibration set) for the 60 spectra in the validation set. (a) Aluminum determinations; (b) iron determinations.
The correlation curves for the 40 and the 10 extract calibration sets, and the aqueous standard calibration, are shown in Table 3. Confidence limits were calculated for all the regression coefficients, using their standard errors and the t-test (95% confidence) as [regression coefficient ± standard error × t58], and showed that the model developed with the 40 extract calibration set was the most precise in relation to the ICP-AES determinations, since its angular coefficient was not significantly different from 1.00 and its linear coefficient was not significantly different from zero. Also, for the other models in Table 3, the confidence limits calculated for their linear coefficients showed that there was no systematic error in the determinations, as might have been expected for the Fe determinations with the 10 samples selected in the scores distance calculation mode or for the Al determinations with the standard solution calibration set, since none of the linear coefficients was significantly different from zero. In spite of that, based on this same statistical criterion, these models do not compare well with the ICP results, as their angular coefficient confidence limits do not include 1.00. Thus, the results on a plant dry weight basis, obtained with the models calibrated with aqueous standard solutions, were equivalent to those from the 10 real sample calibration model.
Element | Calibration set | Latent variables | Correlation curve | Correlation coefficient (r) |
---|---|---|---|---|
Al | 40 | 4 | y = 5.04 + 0.98x | 0.989 |
Al | 10a | 4 | y = 6.13 + 0.92x | 0.989 |
Al | 10 | 4 | y = 8.66 + 0.93x | 0.976 |
Al | 26b | 6 | y = −14.53 + 0.93x | 0.982 |
Fe | 40 | 4 | y = 0.20 + 0.98x | 0.992 |
Fe | 10a | 4 | y = −18.29 + 1.06x | 0.990 |
Fe | 10 | 4 | y = 2.47 + 0.96x | 0.992 |
Fe | 26b | 6 | y = 0.26 + 0.93x | 0.988 |

a Samples selected by the Kennard and Stone algorithm using the scores distance calculation mode. b Aqueous standard calibration set.
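The confidence limits used to judge the correlation curves, [regression coefficient ± standard error × t], can be sketched for a straight-line fit as follows. The standard errors are the usual ordinary least squares expressions. The value 2.002 is the rounded two-sided 95% Student t for 58 degrees of freedom (60 validation spectra minus two fitted coefficients); the function name and the small data set in the example are hypothetical.

```python
import numpy as np

T58_95 = 2.002  # two-sided 95% Student t, 58 degrees of freedom (rounded)

def regression_confidence_limits(x, y, t_crit):
    """Least squares line y = intercept + slope * x with confidence limits
    coefficient +/- t_crit * standard error for both coefficients."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    s2 = residuals @ residuals / (n - 2)          # residual variance
    sxx = np.sum((x - x.mean()) ** 2)
    se_slope = np.sqrt(s2 / sxx)
    se_intercept = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        "slope_limits": (slope - t_crit * se_slope, slope + t_crit * se_slope),
        "intercept_limits": (intercept - t_crit * se_intercept,
                             intercept + t_crit * se_intercept),
    }

# Tiny hypothetical example (5 points, so t for 3 degrees of freedom = 3.182)
fit = regression_confidence_limits([0, 1, 2, 3, 4], [0.1, 0.9, 2.1, 2.9, 4.1], 3.182)
slope_ok = fit["slope_limits"][0] <= 1.0 <= fit["slope_limits"][1]
intercept_ok = fit["intercept_limits"][0] <= 0.0 <= fit["intercept_limits"][1]
print("slope consistent with 1.00:", slope_ok)
print("intercept consistent with zero:", intercept_ok)
```

For the 60 validation spectra, `T58_95` would be passed as `t_crit`, with the ICP-AES values and the PLS predictions as the two variables.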
In conclusion, it was observed that the PLS 2 model prepared with aqueous standard calibration has a good global predictive ability, as observed through the correlation curves presented. Despite the high linear coefficients shown in Table 3, there is no evidence of a systematic error due to the use of this calibration and the regression curve is very similar to those of the models obtained with 10 real samples. The fact that the %RMSEV values were significantly different from those obtained with the real sample calibration models may be explained by the sum of higher prediction errors obtained for samples with increased silica contents. Thus, when only an estimate of Al concentration is acceptable, such as for screening determinations or for agricultural purposes, the PLS models calibrated with aqueous standard solutions can be used.
Although the best results compared with ICP-AES were obtained using the 40 spectra calibration set, the reductions in time and cost obtained by the use of fewer calibration samples are very interesting for routine purposes. Then, with the proper selection, it would be possible to use a total of 15 plant extract spectra (10 for calibration and 5 for validation), representative of the sample population, to prepare a model for the simultaneous spectrophotometric determination of aluminum and iron in plant extracts, with good predictability. As a reference for model calculation, the concentrations of aluminum and iron determined with an analytical reference, such as ICP-AES, should be used. The main advantages of this procedure are the prevention of deviations in results due to matrix effects in unknown samples and the possibility of using characteristic plant material for calibration in routine work.
This journal is © The Royal Society of Chemistry 2002