J. M. Andrade*, M. J. Cal-Prieto, M. P. Gómez-Carracedo, A. Carlosena and D. Prada
Department of Analytical Chemistry, University of A Coruña, Campus da Zapateira, E-15071 A Coruña, Spain. E-mail: andrade@udc.es; Fax: +34-981-167065
First published on 26th June 2007
Coupling multivariate regression methods to atomic spectrometry is an emerging field from which important advantages can be obtained. These include lower workloads, faster laboratory turnaround, economy, higher efficiency in method development, and relatively simple ways to take account of complex interferences. In this paper four typical regression methods (ordinary multiple linear regression, principal components regression, partial least squares and artificial neural networks) are presented in a practice-oriented way. The main emphasis is placed on explaining their advantages and drawbacks, how to overcome the latter, and how atomic spectrometry can benefit from multivariate regression. Finally, a retrospective review covering the last sixteen years presents practical applications to flame, hydride generation and electrothermal atomic absorption spectrometry, inductively coupled plasma spectrometry and laser-induced breakdown spectrometry.
Nowadays chemometric methods offer efficient and powerful alternatives to complement and refine classical working procedures, either to optimise a method or to develop calibration models. This tutorial will focus only on the latter issue, as optimisation using experimental designs or evolutionary algorithms (simplex, EVOP, etc.) has become routine in many fields within analytical chemistry. In contrast, although the idea of combining multivariate regression and atomic absorption spectrometry is not new, and although it has proved to yield analytical methodologies that are cheap, environmentally friendly (because of a lower consumption of reagents, atomisers, modifiers, etc.), relatively fast and free of trial-and-error assays, not many papers have been published on this issue.
Calibration is defined as the operation that determines the functional relationship between measured values (e.g., signal intensities, absorbances (at particular signal positions), wavelengths) and analytical quantities characterizing types of analytes and their amount (content, concentration). Calibration includes the selection of the model, the estimation of the model parameters as well as their errors, and their validation.3
Several aspects must be underlined from the definition.
(i) Calibration is an empirical process and, accordingly, ad hoc.
(ii) Calibration leads to a functional relationship; this means that a given function (model) can be excellent for us and disastrous for other colleagues. Clearly, issues (i) and (ii) preclude the existence of universally good solutions. Further, as Mark pointed out in his practice-oriented book,4 calibrations do not exist by themselves: they are intimately intertwined with the nature of the samples and with the characteristics of the spectrophotometers that analysts use.
(iii) Selection of the proper function (model) is, often, a highly difficult matter. There are so many possible options that the analyst can feel overwhelmed and it might be tempting to just press a button in any software and forget the very basic principles underlying (correct application of) those models. Using Mark’s words again, such mystique related to the calibration process yielded SCSEAN (“super-calibrate-statistic-expeditions-algorithm-ad-nauseous”). This means that one should apply the simplest models possible (in mathematics, this is called parsimony) and only dig into more complex approaches when it is required by the problem at hand. According to Meloun et al.5 all regression diagnostics should examine the “regression triplet” (data, model, method) to assess the data quality for a proposed model, the model quality for a dataset and the fulfilment of the method assumptions.
(iv) Validation is a critical issue since it is the step of the model-development process that assesses model performance. First, and far from trivial, it has to be verified that the assumed model, often a straight line, is correct, i.e. that the experimental standards do define a true straight line. Some people would argue this is quite simple because “only” (?) the correlation coefficient and a scatter plot of the samples (e.g. absorbance versus concentration) are required. Nevertheless, remember that a high correlation coefficient does not necessarily imply that the points follow a straight line, and that many calibration plots seen these days are based on a scarce number of experimental points and, therefore, any decision is risky. Just consider how many “regression lines” have the form of a parabola on inspecting the residuals. Validating a proposed model is far from trivial and there is an overwhelming number of different statistics that may be applied in different situations; two excellent reviews covered some of them.5,6
Then, once the chemist has enough evidence to support the straight-line behaviour, the average error of the regression line has to be compared with his/her (or the client's) needs. Is it low enough? Does it have to be reduced? If it is acceptable, the model should then be tested/validated. This means analysing known samples (CRMs would be welcome) and checking how good the results obtained with that particular regression line (model) are. If they are satisfactory, the model can be used routinely (provided it is tested periodically to verify that it remains valid).
If things are not simple when working with univariate data, they can be a nightmare in multivariate regression. It is worth noting that when a model/calibration (even the simplest one) has proved valid, it is so only for the type of samples used for validation. In effect, any calibration is only acceptable as long as it is applied to samples for which it has proved to yield good results.6 Some typical questions are: was the calibration set large enough to bracket future unknowns? Was its chemical variability large enough? Was the variability in the validation set large enough to represent what can be obtained in routine samples? Will all samples be treated equally? Are the sample matrices always "similar"? All these issues have to be considered under the term validated calibration (validated model).
The aim of this paper is to present an introductory tutorial where the very basics of several well-known and broadly applied multivariate calibration methods are presented to atomic spectroscopists. They are explained sequentially in a practice-oriented way, each new methodology being introduced as a consequence of the shortcomings of the previous one. Mathematics has been kept to a minimum to make the text readable for analysts not skilled in chemometrics. Whenever possible, other tutorials or reviews are referred to for more technical aspects, and much emphasis is placed on explaining the advantages, requirements and limitations of the calibration methods studied hereinafter. Unlike other tutorials on similar chemometric subjects, the present paper also reviews practical applications where atomic spectrometry measurements (FAAS, ETAAS, ICP, etc.) were combined with multivariate calibration, restricted to the last sixteen years (1990–2006).
Many real situations do not lead to positive answers to all the previous questions and, thus, will present serious problems when classical calibration is performed. Just as an example, can errors in the concentrations be neglected when preparing ppb or ppt standards? Can concomitants be controlled/avoided? One solution (employed by different atomic spectroscopists, as will be presented later) consists in including known amounts of the interferents in the standards during their preparation and evaluating their contribution to the signal being measured. This can be done by multiple ordinary linear least squares regression, MOLSR, which is the natural extension of OLSR and (typically) corresponds to the generalized Lambert–Beer–Bouguer's law.
Interestingly, note that measuring a signal at different wavelengths is multivariate analysis (and the spectral data has to be represented by vectors, also termed first-order data7). In principle, as many instrumental wavelengths as interferents contributing to the measured signal are required, but since the interfering effects can be complex it may be advisable to measure absorbances at some additional wavelengths. This is called redundancy and, sometimes, it can be advantageous in order to evaluate interaction between interferents, significance of the coefficients, etc.
Multiple ordinary least squares regression (MOLSR) may be a good option in atomic spectrometry because quite often the samples undergo a previous treatment (acid digestion, ashing, etc.) which can somewhat "simplify" the final matrix. However, important drawbacks of this approach are the need to identify all concomitants in the unknown samples and/or their interactions, which might be almost impossible, and, of course, the increased experimental workload required to quantify them (mainly for monoelemental techniques yielding transient signals, such as ETAAS).
Fortunately, these pitfalls can be overcome easily by generalizing the idea of “multiple wavelengths”. In most circumstances an atomic peak is registered and, indeed, such a peak is multivariate in nature (absorbance versus time) so it can be treated as any other typical molecular spectrum, where multivariate regression methods are applied frequently. Thus, it is the overall peak (its shape, height, total area, etc.) that becomes affected directly by concomitants and it should be possible to model it. The term “model” is used here in a loose sense but it immediately links the atomic peak with the initial definition of calibration. Recall that in univariate regression an empirical model relates a physical measurement (absorbance) to a cause (concentration). It follows that the whole atomic peak can be related to the concentration of the analyte and, maybe, of the concomitants. The only point here is to realize that there are no conceptual differences between the model given by c = 0.53 + 0.8 × A (λ) (a unique absorbance measured at a given λ) and the model given by c = 0.63 + 0.6 × A (t), where now the absorbance is measured at time t of the atomic peak (registered at a given λ).
The main advantage of the latter approach is that the use of the whole atomic peaks constitutes a straightforward way to implement multivariate regression and it generalizes the Lambert–Beer–Bouguer’s law in atomic spectrometry. According to Fig. 1 and using a classical formulation, we can write:
a(t1) = c1 × s1(t1) + c2 × s2(t1) + … + cn × sn(t1)
…
a(tj) = c1 × s1(tj) + c2 × s2(tj) + … + cn × sn(tj)

or, in vector notation,

a = c1 × s1 + … + cn × sn = S × c, or, including the error term, a = S × c + e  (1)
Fig. 1 Decomposition of an atomic peak into its different components and related basic terms used throughout the text.
Since, in general, more than one standard is used to calibrate, eqn (1) becomes:
A = C × S + E  (2)
This formulation is usually termed the "classical" regression model8 because the absorbance is defined as a function of the pure spectra (at unit concentration) of the absorbing species and their concentrations; the concentrations of an analyte in unknown samples are then sought by measuring their corresponding atomic peaks. Hence, once the matrix S, containing the pure temporal profile (at unit concentration) for each absorbing species, has been calculated in the calibration stage, the concentrations in the unknowns are predicted using eqn (3):
C = S⁺ × a  (3)
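To make the classical formulation concrete, the following minimal sketch simulates eqns (2) and (3) on synthetic atomic peaks. The use of Python/NumPy, the Gaussian-shaped temporal profiles and all variable names are illustrative assumptions of ours, not part of the original formulation.

```python
# Sketch of "classical" (direct) calibration, eqns (2)-(3):
# A = C x S + E in calibration, then c = a x S+ for an unknown peak.
# Dimensions follow the notation table: k standards, n absorbing species,
# j measuring times of the atomic peak.
import numpy as np

rng = np.random.default_rng(0)
k, n, j = 10, 2, 60

# Pure unit-concentration temporal profiles (rows of S): two Gaussian peaks
t = np.linspace(0, 1, j)
S_true = np.vstack([np.exp(-(t - 0.45)**2 / 0.005),   # analyte profile
                    np.exp(-(t - 0.60)**2 / 0.010)])  # concomitant profile

C = rng.uniform(0.1, 1.0, size=(k, n))                # known concentrations
A = C @ S_true + rng.normal(0, 0.01, size=(k, j))     # measured peaks, eqn (2)

# Calibration stage: least-squares estimate of the pure profiles, S = C+ x A
S_hat = np.linalg.pinv(C) @ A

# Prediction stage, eqn (3): concentrations of an unknown from its peak
c_unknown_true = np.array([0.5, 0.3])
a = c_unknown_true @ S_true + rng.normal(0, 0.01, size=j)
c_pred = a @ np.linalg.pinv(S_hat)
print(c_pred)                                         # close to [0.5, 0.3]
```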
Most chemometric methods, however, use the so-called "inverse" calibration approach, in which the analyte concentration is modelled as a function of the measured signals. Its main advantages can be summarised as follows.
(i) Formulae get simplified and interpretations are simpler (not a trivial advantage in complex models).
(ii) Inverse models directly treat analyte concentrations as a function of the spectral measurements6,8 (not the opposite, as classical calibration does).
(iii) It was shown in several studies dealing with molecular spectroscopy (with a uniform distribution of the calibration standards representing future unknowns well) that prediction uncertainties were lower using the inverse approach (see, for example, Martens and Martens9 and Brereton6 for more discussion). Whether this also occurs in atomic spectroscopy remains to be studied formally.
(iv) It has been proved that inverse models cope better with noise in the spectra, so their predictions are better than those of "classical" models.8 The spectral variables now act as the "signals" from which the concentration is sought, hence random errors in them are permitted, which is not the case in classical calibration. Inverse regression is, accordingly, a way to comply with the critical requirement of the least squares criterion that there should be no errors (not even random ones) in the "independent" variables (whatever they are). This can be understood more easily by considering that instruments are nowadays becoming so precise that the random errors (uncertainty) of standards preparation might be of the same order as those of the signal measurement. For instance, around 1% uncertainty can be expected in standards preparation (weighing and diluting) and around 5% in the ETAAS measurement of aqueous standards (around 1% for ICP techniques), so 1% is not negligible compared with 5%. Therefore, the "classical" calibration formulation (where error on the abscissa is not allowed) may not be correct, because a more appropriate assumption would be that errors are present in the concentrations,6 since they cannot be neglected. This has to be studied for each particular case. Unfortunately, the equations to evaluate the standard error of the predicted values are still (good) approximations.
(v) A major advantage of inverse methods is that concomitants do not participate explicitly in the equations (contrary to the "classical" applications of the Lambert–Beer–Bouguer's law), which enables us to calibrate considering accurate concentrations only for the single analyte that we are interested in.8 Nevertheless, although the exact concentrations of the concomitants are not required, it is necessary to vary them throughout the standards to increase the real variance modelled by the regression. The fact that only the concentrations of the analyte are needed may be quite surprising because c1, c2,…, cn in the generalized Lambert–Beer–Bouguer's law (eqn (1) and related explanations) are explicitly considered in the equations. Fortunately, algebraic transformations can convert such a dependence into one based on the si(tj) values (analogous to the ε values when working with Beer's law, see Mark4 for a simple, understandable example), which are the regression coefficients.
Therefore, hereinafter, equations will refer to the inverse models (unless otherwise stated). As an example, the inverse calibration model corresponding to eqn (1) above is:
c = M × b + e  (4)
In addition to the five reasons given above for preferring "inverse" regression over the "classical" one (more technical details can be found elsewhere; see, e.g., Olivieri et al.8 and Mark4 for a simple introduction), two further practical motivations can be underlined.
(i) Except for a theoretically perfect regression (i.e., only Gaussian random errors in the measured signal), the regressions "absorbance on concentration" and "concentration on absorbance" are not equivalent.10 In univariate regression this can be addressed by weighting both the dependent and independent variables and performing iterative calculations11 or by applying bivariate least-squares regression.12 In multivariate regression this problem has been addressed as well, although focused on principal components regression (a technique to be explained in Section 4.1).13 Inverse regression yields the solution in terms of the experimental atomic peak that we can actually register. See Brereton6 for a nice general-purpose example.
(ii) Spurious spectral effects that may be present in the measurements and are not easy to account for explicitly can be corrected for. In mathematical terms, it is said that the true signal and the noise are orthogonal, and this is implicitly handled by the coefficients yielded by the models.8
In MOLSR, the vector of regression coefficients of eqn (4) is estimated by least squares as:
b = (MᵀM)⁻¹ × Mᵀ × c  (5)
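As an illustration, the sketch below applies eqn (5) to the synthetic peaks of the previous sketch, using only two measuring times (one near each peak maximum) so that MᵀM remains invertible; the selected times and the NumPy implementation are assumptions made only for this example.

```python
# Sketch of inverse MOLSR, eqns (4)-(5): b = (M^T M)^-1 M^T c.
# Reuses A, C and the unknown peak a from the previous sketch.
import numpy as np

sel = [27, 35]                         # two (hypothetical) well-resolved times
M = A[:, sel]                          # k standards x (few) predictors
c = C[:, 0]                            # concentrations of the analyte only

# Normal equations, eqn (5); np.linalg.lstsq would be numerically safer
b = np.linalg.inv(M.T @ M) @ M.T @ c

print(a[sel] @ b)                      # prediction for the unknown, close to 0.5
```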
Recording the atomic peak using a large number of predictors, typically anything between 30 and 100 (or even more), leads to a practical problem because in most cases many fewer standards are measured for calibration. In traditional terms, we would say that fewer equations are available than predictors and, hence, fewer equations than unknown regression coefficients (think, for instance, of classical calibration using the generalized Lambert–Beer–Bouguer's law). This is an overestimated regression model.5 In that case MᵀM cannot be inverted and hence no unique solution can be obtained; even if the matrix were not exactly singular, the solution would be highly unstable.14
A related problem is collinearity: variables are collinear if there are high correlations or other near or exact linear relations among them.9 This problem cannot be solved even when the number of standards is increased, and it is common to both "classical" and "inverse" formulations. It also leads to the pragmatic conclusion that MOLSR inverse models do not exhibit real advantages over classical models.6 Overestimation can be overcome by increasing the number of standards to, at least, the number of wavelengths or by selecting a very reduced set of uncorrelated predictors ("independent" variables) on which to perform MOLSR. Although the idea is simple, there is no broadly accepted way to obtain such a set. Many commercial software packages offer backward and forward procedures, in which variables are successively discarded or introduced, but they still have problems. See Caulcutt15 for more detailed explanations.
There are several standard ways to handle both problems in the chemometrics field. All can be covered by the “data compression” heading. Two main approaches became popular:9 to use MOLSR on a set of a few carefully selected variables (pointed out above) and to regress the analyte concentrations onto a few linear combinations of the original predictors (those combinations are called components, factors or latent variables). In both cases the collinearity problem is solved and the final prediction equation is stable. In general, the second alternative is preferred. According to Martens and Martens9 the basic idea is that by using these new variables (linear combinations or factors) in the appropriate regression equations irrelevant and unstable information is avoided and only useful spectral information is employed. In addition, many times, a sound chemical interpretation is also possible with the aid of simple plots.
For the purpose of this tutorial only two classical, broadly applied methods will be discussed briefly: principal components regression and partial least squares regression. Both overcome multicollinearity problems by concentrating the atomic peak spectral information onto a reduced number of factors (this is termed rank reduction). The inverse formulation will be presented as it is typical of the chemometrics literature (the advantages reviewed in Section 3.2 become real).
A description of PCA (principal components analysis) is outside the scope of this paper. Here, only a short recapitulation is given to simplify readability; interested readers are referred to references 9, 14 and 16 or to recent tutorials from Brereton,6 Geladi17,18 or Wold et al.19 PCA is a mathematical process by which a data matrix is decomposed into the product of two simpler, independent matrices. One contains information related to the samples (the scores matrix, T) whereas the other contains information related to the variables (the loadings matrix, P). Each column of T (i.e., ti) and its corresponding loading vector in P (i.e., pi) constitute a principal component, PC. Hence, for a mean-centred data matrix of atomic peaks, M = T × Pᵀ + E, where the symbols M and E have the same meanings as in previous sections. Mean centring the data matrix is a common data pre-treatment which consists in subtracting the average value of each column from all values in that column. It is used to avoid the PCs being affected by the different metrics and/or ranges that can be associated with each variable (e.g., a variable whose absorbances range between 0.05 and 0.15 is not necessarily less important than another one varying from 0.50 to 0.60) and it has to be applied before proceeding with any multivariate technique.
Graphically, the PCs can be visualized as a new set of (convenient) variables with several important properties: (i) they are orthogonal, i.e., the information explained by a PC is not present in the other PCs, although this somewhat depends on the scaling being used;6 (ii) they are obtained successively so that the first PC explains most of the information, the second less (but more than the 3rd), and so forth; (iii) in general, a very reduced number of PCs (2, 3, 4,…) can explain almost all the information contained in the original set of atomic peaks. It follows from the latter property that random noise and irrelevant spectral artifacts are relegated to the last PCs, which is highly advantageous.
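The following minimal sketch illustrates mean centring and the decomposition M = T × Pᵀ + E on the synthetic peaks introduced earlier; the SVD route used here is only one of several equivalent ways of extracting the PCs and is our own choice, not taken from the paper.

```python
# Sketch of mean centring and PCA on the matrix of atomic peaks.
import numpy as np

M_c = A - A.mean(axis=0)                  # mean-centre each measuring time
U, s, Vt = np.linalg.svd(M_c, full_matrices=False)

z = 3                                     # number of PCs retained
T = U[:, :z] * s[:z]                      # scores, k standards x z factors
P = Vt[:z].T                              # loadings, j times x z factors

explained = s**2 / np.sum(s**2)
print(np.round(explained[:z], 3))         # most variance lies in the first PCs
print(np.linalg.norm(M_c - T @ P.T))      # small residual: E holds mostly noise
```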
PCA, and so PCR, yields two essential benefits: the number of variables is reduced dramatically (it can well happen that the initial set of 100 variables defining each atomic peak reduces to only 3 or 4 PCs) and they will be uncorrelated. Therefore, once the PCs are extracted the regression coefficients (b, see eqn (6)) can be calculated. Finally, the analyte concentration for an unknown sample can be predicted using eqn (7), where x is the (mean centred) atomic peak measured for the unknown.
b = P × (TᵀT)⁻¹ × Tᵀ × c  (6)
c_unknown = x × b  (7)
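A minimal PCR sketch implementing eqns (6) and (7) with the scores and loadings of the previous sketch is given below; note that the concentrations are mean-centred too, so the analyte mean is added back when predicting (an implementation detail the equations leave implicit).

```python
# Sketch of PCR, eqns (6)-(7), built on T and P from the PCA sketch.
import numpy as np

c = C[:, 0]                                    # analyte concentrations
c_c = c - c.mean()

b = P @ np.linalg.inv(T.T @ T) @ T.T @ c_c     # eqn (6)

x = a - A.mean(axis=0)                         # mean-centred unknown peak
print(x @ b + c.mean())                        # eqn (7) plus offset; close to 0.5
```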
Unlike MOLSR, PCR can be used advantageously when there are many fewer samples than variables in the regression problem. If all PCs were retained in the model (which is very rarely the case), the predictions would be identical to those from MOLSR (at least when there are more samples than variables). In some sense, it can be seen that the PCR model 'converges' to the MOLSR model as PCs are added16 (the more PCs, the less reduction in the number of variables and, thus, the more similar the information used by both models).
Disappointingly, not all problems can be removed by applying PCR. The main issue is that, although PCA does explain the spectral atomic peaks very well, it is not guaranteed that the first PCs are useful to describe the concentration of the analyte. For instance, if a severe spectral artifact or an incomplete/interfered atomization affects several samples, it may appear in the first PCs, but it will not be very useful to predict the analyte. What if a minor PC happens to be really important for predicting the concentration? In general, the first problem is not too risky, as one will only have to increase the complexity of the model (i.e., add more PCs to the PCR model) to get good predictions, but the second problem is more serious because, in general, the last PCs are discarded as they are supposed to contain only noise, and hence the really useful information may be lost unexpectedly.20 A classical and reader-oriented paper from Sutter et al.21 deals with the selection of the proper number of components. More recently, Meloun et al.5 have reviewed the generalized principal component regression method, which may be a potential solution for considering only the PCs relevant to predicting the concentration of the analyte of interest. Further, Martínez et al.13 presented a PCR method where the uncertainties in the reference concentrations and in the molecular spectroscopic measurements were taken into account.
The idea underlying PLSR is simple once PCR has been discussed, although the correct mathematical formulation is far more complicated. It turns out that PCR might either introduce components that are not useful to evaluate the concentration of interest or leave important ones out of the model. To avoid this, why not calculate new components (instead of the classical PCs) with the requisite that they are maximally correlated with the concentration of the analyte? In this way, we can extract not only the most important information in the atomic peaks but also the information which is really useful to get good predictions for the unknowns. This is done in an iterative process which has been described in detail many times (see, for example, McLennan and Kowalski,29 Brereton6 or Wold et al.19 for a very simple approach, or any of the textbooks cited throughout this work). Here, it was decided not to develop a detailed formulation for the sake of simplicity, but just to present the final equation for the regression coefficients (eqn (8)) and how to predict the analyte concentration of the unknown sample (eqn (9)).
b = W × (PᵀW)⁻¹ × (TᵀT)⁻¹ × Tᵀ × c  (8)
c_unknown = x × b  (9)
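As a practical counterpart of eqns (8) and (9), the sketch below fits an inverse PLSR model to the synthetic peaks of the earlier sketches; scikit-learn's PLSRegression (a NIPALS-based implementation) is our own choice, made purely for illustration.

```python
# Sketch of a PLSR calibration equivalent to eqns (8)-(9).
from sklearn.cross_decomposition import PLSRegression

pls = PLSRegression(n_components=2, scale=False)   # mean centring only
pls.fit(A, C[:, 0])                                # X-block: peaks; y: analyte

print(pls.predict(a.reshape(1, -1)))               # close to 0.5

# pls.x_scores_, pls.y_scores_ and pls.x_loadings_ correspond to the T, U and
# P matrices used in the diagnostic plots discussed next.
```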
One very important fact in PLS is that it supports errors both in the X- and Y-blocks.6 Besides, PLSR models lead to relatively simple graphs which may yield an enormous amount of visual information that, many times, can be interpreted chemically (this is not always fully exploited). Also, anomalous samples, curved distribution of samples, etc., can be visualized. Some examples are presented next.
A first critical step in any model development is to check for outlier samples, since they can strongly bias the model. Much effort should be devoted to assessing this issue. In general, a preliminary model is obtained and a visual inspection of some typical plots and statistical diagnostics is carried out. A recent review details many of these statistics5 and they will not be repeated here (despite being defined for MOLSR, they are of general applicability). Here, we present some useful and typical plots which can be obtained from most software packages. From the PLSR calculations, the "scores" (T matrix) contain the information about the samples, their similarities and how they distribute. It is straightforward, accordingly, to visualize whether any of the samples separate from the rest. Fig. 2 reveals that samples #13 and #20 present quite different behaviour in the factor 1–factor 2 (t1 versus t2) and factor 1–factor 3 (t1 versus t3) subspaces, respectively. This means that their most prominent spectral features do not agree totally with those of the other standards. In principle, samples #13 and #20 might be considered for elimination from the calibration set, and the model repeated.
Fig. 2 PLS scores plots to evaluate the presence of outlier samples in the X-block.
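A scores plot such as Fig. 2 can be generated directly from the fitted model of the previous sketch. The matplotlib code below is a minimal, assumed implementation; samples lying far from the main cloud would be inspected as potential outliers.

```python
# Sketch of a t1 versus t2 scores plot for outlier screening.
import matplotlib.pyplot as plt

T = pls.x_scores_                        # k samples x n_components

fig, ax = plt.subplots()
ax.scatter(T[:, 0], T[:, 1])
for i, (t1, t2) in enumerate(T[:, :2], start=1):
    ax.annotate(str(i), (t1, t2))        # label samples to spot, e.g., #13, #20
ax.set_xlabel('factor 1 (t1)')
ax.set_ylabel('factor 2 (t2)')
plt.show()
```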
Fig. 3 represents how the scores (samples) relate to the concentration of the analyte. This is called a "t versus u" plot (scores of the X-block versus scores of the Y-block) and, for each factor, it represents how linear (in a straight-line sense) such a relation is. It allows us to determine whether the corresponding factor is modelling linear effects or whether a non-linear phenomenon is being included in the model. In the example, it can be seen that the first two factors hold good linearity and that all samples follow a similar pattern (otherwise, the sample(s) falling apart should be examined in more detail to assess whether they were outliers). As expected, the first factor (by definition the most important one) shows the nicest "regression appearance".
Fig. 3 PLS "t versus u" plots to evaluate whether linear effects are being modelled by each factor.
Another very useful plot (not shown here because it is broadly used) is that obtained by plotting the actual (real) concentrations of the calibration (or testing) set against those predicted for the same samples by the model under consideration. Departure from the "theoretical fit" (i.e., 45° line, slope = 1) points to a probably biased regression. In addition, samples far from the regression line can be identified (they may be anomalous either in the X-block or in the actual concentration value). More examples and explanations can be found elsewhere.6,18,19
Two diagnostic statistics are commonly plotted against each other to further evaluate the samples, namely the "leverage" and the "studentized residuals". They are nicely defined elsewhere5,29 and we will discuss only their typical plot (see Fig. 4). The studentized residuals are the residuals (predicted minus actual concentration) scaled so that they have unit variance. Hence, a sample with a high studentized residual has not been properly predicted by the model. The sample leverage is a measure of the influence that a sample has on the model. Samples that have unique values of the predictor variables tend to have a high leverage on the model. Fig. 4 highlights three areas where the different behaviour of a sample can be attributed to a known reason (being an anomalous sample in the X- or Y-blocks). Influential samples like #10 have been modelled well. Such a sample behaves as "influential" because, commonly, it corresponds to the lowest or highest concentration value of the training set. In the absence of other diagnostics pointing to an anomalous behaviour, it should not be considered an outlier.
Fig. 4 PLS plot of the sample studentized residuals versus sample leverage.
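The sketch below computes sample leverages and studentized residuals from the PLS scores of the earlier sketches. This is one common formulation of these diagnostics, not necessarily the exact one used to produce Fig. 4.

```python
# Sketch of the leverage / studentized-residual diagnostics of Fig. 4.
import numpy as np

T = pls.x_scores_                               # k samples x z factors
n_samp, n_fac = T.shape

H = T @ np.linalg.inv(T.T @ T) @ T.T            # hat matrix from the scores
leverage = np.diag(H)

resid = C[:, 0] - pls.predict(A).ravel()        # concentration residuals
s_e = np.sqrt(np.sum(resid**2) / (n_samp - n_fac - 1))
stud_resid = resid / (s_e * np.sqrt(1.0 - leverage))

for i, (h, r) in enumerate(zip(leverage, stud_resid), start=1):
    print(f'sample {i}: leverage = {h:.2f}, studentized residual = {r:.2f}')
```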
Fig. 5 can be used for the chemical interpretation of the PLSR model. The upper line (a) shows an atomic peak from an aqueous standard containing the analyte of interest and several potential concomitants, measured by ETAAS and registered at 110 measuring times. The arrows point out two smooth spectral "shoulders" (which suggest that phenomena other than the "pure" atomization peak are occurring). Lines (b) and (c) represent the PLSR "loadings" for factors 1 and 3, respectively (those corresponding to the 2nd factor could not be interpreted). Line (b) is, clearly, an "average" atomic peak, and this is almost always observed.19 It represents the influence of the size of the overall measured peak on the concentration. Line (c) is more complex because it gives positive and relatively large loadings for those measuring times (predictors/independent variables) associated with the spectral shoulders. Hence, it seems that the 3rd factor is modelling the physical phenomena giving rise to those underlying, totally overlapped, peaks affecting the main atomization peak of the analyte. Finally, line (d) depicts which variables have the overall largest regression coefficients. They show that not only the predictors defining the main peak are useful to predict the analyte concentration, but also the much smaller and almost hidden side-peaks. An acceptable interpretation of this fact may be that the frontal peak is related to easily volatilizable chemical species to which the analyte is bound, whereas the last peak points towards more refractory species containing part of the analyte.
Fig. 5 Interpretation of the PLS model can be made by comparing the original spectra (a), the loadings of the first factors (b) and (c), and the regression coefficients (d).
For the sake of simplicity and brevity, selection of the proper number of PLS factors (components or latent variables) was deliberately omitted from the previous discussions. There is still no universally accepted way to determine this critical parameter. Most reviews cited here explain several approaches, the most common one being cross-validation,6,17,19,20,29 whereby some samples are left out of the calibration set, the model is obtained and the left-out samples are predicted. The process is repeated a number of times, considering different subsets of samples, until all of them have been left out. The number of factors selected is that yielding the lowest prediction errors. Several diagnostic plots can be obtained and all of them should be studied in detail. Unfortunately, this approach, despite being broadly applied, is not perfect and may lead to a wrong number of components, so other approaches should be studied, mainly when experimental designs cannot be deployed because of the types of samples being studied. As an example, classical cross-validation was compared with several recent approaches in molecular spectrometry.30
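As an illustration of factor selection by cross-validation, the following sketch (assuming scikit-learn and the synthetic data used so far) computes the leave-one-out RMSECV for an increasing number of PLS factors and retains the number giving the lowest error.

```python
# Sketch of leave-one-out cross-validation to choose the number of PLS factors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

y = C[:, 0]
rmsecv = []
for n_fac in range(1, 6):
    pred = cross_val_predict(PLSRegression(n_components=n_fac, scale=False),
                             A, y, cv=LeaveOneOut())
    rmsecv.append(np.sqrt(np.mean((y - pred.ravel())**2)))

best = int(np.argmin(rmsecv)) + 1          # factors giving the lowest RMSECV
print(best, np.round(rmsecv, 4))
```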
To summarise, PLSR is a method for constructing predictive models when the underlying factors are many and highly collinear. It is worth noting that the emphasis is on predicting the concentrations and not necessarily on understanding the underlying relationship between the variables, despite the fact that many times this is also possible. A final important advantage is that PLSR is capable of yielding good predictive models even when the underlying system is not linear, which can be frequent in spectroscopic measurements. This fact gives rise to another important point since the good properties of PLSR models can reduce the amount of work to be carried out in the laboratory. These issues will be considered further in the next sections where different published papers are reviewed.
ANN is a relatively loose term referring to computational models which have some kind of "distributed architecture". This means that, instead of a unique equation or system of equations, an ANN consists of highly interconnected "processing nodes" (generally termed neurons, analogous to biological neurons) to which multiple connections are linked (analogous to biological dendrites and axons).31 Fig. 6 shows a typical ANN scheme where a set of neurons receives the chemical signals from the external world (input layer) and transfers them either to a final set of neurons (output layer) or to one or several intermediate layers which process the signals (hidden layers) and, finally, transfer their result to the output layer. All layers and neurons are fully interconnected by the so-called weights (dendrites in real cells), which constitute the mathematical model itself.
Fig. 6 Scheme of (a) an artificial neuron compared with a biological neuron and of (b) an ANN. |
The graphical arrangement of processing nodes and connections is termed the "architecture" of the ANN and it has to be optimised for each problem at hand, which is not trivial. As a general rule, the input layer is easily set by the chemist, since as many neurons as predictor spectral variables are fixed. The output layer is also fixed in advance since, typically, a single node is considered for regression problems. The connections or dendrites not only conduct the "signals" (numerical values) they receive from one or several neurons towards other neurons, but also modulate them by the weights. All these weights are optimised when the overall ANN optimises itself.
What is surprising about ANNs is that the neurons perform very simple calculations (in some cases they only output a value, either 0 or 1). As in the brain, the complex regression model is composed of all the weights and simple processing units, which "organize themselves" to get good results. Each neuron first decides whether the input signal (numerical value) is high enough to activate it and, if so, transforms such an input into another numerical value (e.g., using the slope of a linear function) and, finally, transfers the new result to another neuron.
In simple terms, each node i applies a transfer function fi to the weighted sum of its incoming signals (an in-depth discussion is beyond the scope of this paper, and readers are encouraged to review one of the most famous books on ANNs in chemometrics32):
outi = fi(neti), with neti = Σj wij × xj  (10)
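Since the exact form used in the original article cannot be reproduced here, the sketch below shows the transfer function most commonly assumed in BPNs, a sigmoid applied to the weighted sum of the incoming signals; the numerical values are arbitrary.

```python
# Sketch of a single processing node with a sigmoidal transfer function.
import numpy as np

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid transfer function."""
    net = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-net))

x_in = np.array([0.2, 0.7, 0.1])     # signals arriving from the previous layer
w = np.array([0.5, -1.2, 0.8])       # connection weights (optimised in training)
print(neuron_output(x_in, w, bias=0.1))
```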
There are numerous types of ANN to address many types of chemical problems, such as pattern recognition, sample classification, multivariate regression, etc.34–37 In any case, the general (and important) rule is that three sample sets are required to develop an ANN properly: a calibration set, an external testing (validation) set and the set of true unknowns (a minimal code sketch of this three-set workflow is given after the lists of advantages and disadvantages below). The first set is obviously used to optimise (calibrate, train or learn) the architecture of the net: the number of hidden layers/nodes and the numerical values of the weights. The testing set (new, different samples with known values of the property of interest) is needed to verify that the ANN did not merely memorize the calibration samples. Such memorization is called over-fitting or over-training and, unfortunately, occurs quite easily. Over-training leads to a final model which is so focused on the calibration samples (it yields almost perfect predictions for them) that new samples are predicted wrongly. Therefore a compromise has to be reached between training and prediction ability (the latter being the priority). The testing set is constituted by new samples, uncorrelated with the previous ones, representing the samples to be measured routinely but whose concentrations are already known. This set should never be used to optimise the ANN, just to test (verify or validate) it. Only after the external validation set has been satisfactorily predicted can the ANN be used to predict unknown samples. Of course, great care is required to obtain good calibration, testing and validation samples (absence of anomalous samples, bracketing of the working range, representativeness, etc.), but this is the case for any regression model.
The most common way to train an ANN is "backpropagation" (BPN). This is an algorithm conceived by Rumelhart et al.38 to inform the net how wrong its predictions are once a model has been developed using the calibration data. Most likely, the ANN will not predict properly after its first pass through the calibration data. Therefore, backpropagation causes the ANN model to be revised automatically and the weights updated to reach a lower overall prediction error. The algorithm entails: running the training set through the ANN; calculating the error given by the difference between the actual and target network outputs; propagating the errors back, in the same manner as the signal propagation but in the opposite direction, i.e., from the output to the input layer (hence the term backpropagation); and, finally, re-adjusting all the weights according to the error (each weight is modified in proportion to its value). This process is repeated iteratively until the average (or total) error across the training set is below a specified maximum. Each such iteration is known as an epoch. The main advantages of ANNs are:
(i) there is no need to assume an underlying data distribution, as is usually the case in parametric modelling (the methodologies described in the previous sections);
(ii) they are applicable to multivariate linear and non-linear problems (sometimes the latter are difficult to handle with other approaches);
(iii) high tolerance to noise;
(iv) ability to classify patterns on which they have not been trained previously;
(v) high adaptability, which means that they auto-organize themselves to learn how to get some outputs from a given set of inputs;
(vi) no specific formal equations have to be defined for each problem.
Nevertheless, they also present several disadvantages that hinder their application, among them:
(i) they are so good at learning that overfitting is a common problem and avoiding it requires a great number of tests (where a good validation sample set is required);
(ii) in general, the relationship between the predictors and the concentration cannot be interpreted chemically so that the model tends to be a “black-box” approach without a clear meaning;
(iii) the sample size has to be large, and three sets are needed to develop a proper model (calibration, testing and validation sets)—sometimes this increases workload;
(iv) ANN optimization requires changing the number of neurons in the hidden layers, the number of hidden layers, several parameters such as the learning rate (to avoid the net becoming stuck in a local minimum) and, sometimes, the transfer function, etc.; this makes optimization a laborious and iterative process where many tests have to be carried out before a satisfactory net is obtained;
(v) ANNs have clear advantages for strongly non-linear systems, but many times they cannot outperform typical linear methods (e.g. PLSR39) when the problem is linear (or when there are only minor non-linearities).
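Before reviewing the published applications, a minimal sketch of the three-set workflow described above is given. It assumes scikit-learn's MLPRegressor (a feed-forward network trained by backpropagation) and synthetic data generated from the profiles of the first sketch; it illustrates the procedure only and does not reproduce any published model.

```python
# Sketch of an ANN calibration with separate calibration and testing sets.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng2 = np.random.default_rng(1)
C_all = rng2.uniform(0.1, 1.0, size=(120, 2))                  # 120 standards
X_all = C_all @ S_true + rng2.normal(0, 0.01, size=(120, 60))  # their peaks
y_all = C_all[:, 0]                                            # analyte only

# Calibration (training) set and external testing (validation) set
X_cal, X_test, y_cal, y_test = train_test_split(X_all, y_all,
                                                test_size=0.3, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(5,), learning_rate_init=0.01,
                   max_iter=5000, random_state=0)
net.fit(X_cal, y_cal)                       # backpropagation training

rmsep = np.sqrt(np.mean((y_test - net.predict(X_test))**2))
print(rmsep)     # only if this error is acceptable are true unknowns predicted
```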
Grotti et al.43 combined MOLSR and experimental designs to correct for the (simultaneous) effects of Na, K, Mg and Ca nitrates on Mn atomisation from sea-water by ETAAS. The same authors extended their previous work by comparing MOLSR models with other classical alternatives such as matrix-matched standards and standard additions, which systematically yielded excess quantitations.44 They drew conclusions from the different cross-product terms included in the regression models (e.g. an interaction between Mg and Ca). They addressed similar problems with Te45 and the complex interferences of Cl–, Ca2+, Na+, K+ and Mg2+ on Pb, Cd, Ni, Cr and Mn ETAAS determinations.46
Despite MOLSR models being helpful and allowing much insight into the atomic measurements, several major experimental pitfalls can be underlined. First, as MOLSR needs to model all possible spectral and matrix effects47 each standard and unknown sample must be “matched” so that the matrix composition is approximately the same.
Besides, in order to strictly apply the MOLSR equations the concentration of the concomitants in the unknowns should be known, which is not always the case, and this would imply too much analytical work. Nevertheless, when multi-elemental techniques can be used, such as ICP, this problem can be alleviated.48
Some pioneering authors of ANNs in chemometrics49 diagnosed calibration problems related to the use of AAS spectral lines. As they focused on classifying potential calibration lines, Kohonen neural networks were used instead of the typical error back-propagation feed-forward ANNs (BPN). Kohonen nets are often best suited to classification tasks, whereas BPNs are preferred for calibration purposes, as explained in Section 5.
Despite PCR solving many of the statistical problems MOLSR possesses, it has scarcely been applied in atomic spectrometry. Some papers applied it as “another” option among several and, mainly, as a matter of comparison.42,50,51
PLSR (partial least squares regression) seems by far the most popular multivariate regression tool employed by atomic spectroscopists. Flores et al.52 employed PLSR to quantify Cd in marine and river sediments measured by direct solid sampling flame atomic absorption spectrometry.
There are three pioneering works that, in our opinion, are a must for interested readers. Baxter and Öhman53 discussed the fact that background correction in ETAAS has several problems and, thus, corrects the interferences only incompletely, and that such a problem can be overcome by PLSR. In order to do so, they also applied the generalised standard addition method (GSAM) to develop the calibration set and, then, the final model. Among other strategies, they combined signal averaging of the replicates and signal smoothing before implementing the PLSR model.
ETAAS was also combined with PLSR by Baxter et al.54 to determine As in marine sediments. In this work, they demonstrated that classical standard additions methods do not correct for spectral interferences (their main problem) because of mutual interactions between the two analytes of interest (As and Al). PLSR-2 block was applied to quantify both elements simultaneously (PLSR-2 block is a PLSR procedure that predicts several analytes simultaneously; it is not widely applied, except when the analytes are correlated).
Quite surprisingly, ETAAS has not attracted much attention with regard to PLSR. Apart from the Baxter et al. papers,53,54 no further work was found until that of Felipe-Sotelo et al.55 These authors considered a problem where a major element (Fe) caused spectral and chemical interferences on a minor one (Cr), which had to be quantified in natural waters. They demonstrated that linear PLSR handled any non-linearities present, since polynomial PLSR and locally weighted regression (non-linear models) did not outperform it. Furthermore, it was found that linear PLSR was able to model three typical effects which commonly occur in ETAAS: peak shift, peak enhancement (depletion) and random noise.
More complex is the situation where several major concomitants affect the atomic signal of the trace element(s) to be measured (whose concentrations are several orders of magnitude lower). PLSR was very recently demonstrated to give good results56 when a proper experimental design was developed to quantify Sb in water samples by ETAAS. Instead of the traditional approach, where a single experimental design is deployed to establish the calibration (and validation) set, a saturated experimental design considering the concomitants was deployed at each of the concentration levels considered for the analyte. Once more, polynomial PLSR performed worse than or equal to linear PLSR, demonstrating that linear models are good enough. Further, multivariate figures of merit were calculated following IUPAC and ISO guidelines, setting both the α- and β-errors at 5%. In these two latter papers the authors studied the factor loadings and regression coefficients graphically to gain insight into how the models behave.
Very recently, Felipe-Sotelo et al.39 modelled complex interfering effects on Sb when soil, sediments and fly ash samples were analysed by slurry sampling-ETAAS (SS-ETAAS). Sometimes, spectral and chemical interferences cannot be totally solved on slurries using chemical modifiers, ashing programs, etc., because of the absence of a sample pre-treatment step to eliminate/reduce the sample matrix. Hence, the molecular absorption signal is so high and structured that background correctors cannot be totally effective. In addition, alternative wavelengths may not be a solution, owing to their low sensitivity when trace levels are measured. In order to circumvent all these problems, the authors employed standard PLSR, second-order polynomial PLSR and ANNs to develop predictive models on experimentally designed calibration sets. Validation with five CRMs showed that the main limitations of the models were related to the SS-ETAAS technique, i.e., the mass/volume ratio and the low analyte content of some solid matrices (which forced the introduction of too much sample matrix into the atomizer). Both PLSR and ANN gave good results, since they could handle severe problems like peak displacement, peak enhancement/depletion and peak tailing. Nevertheless, PLSR was preferred because the loadings and regression coefficients could be interpreted chemically. Moreover, this approach allowed the laboratory workload needed to optimise the analytical procedure to be reduced by around 50%.
In addition to the work from Felipe-Sotelo et al.39 in which they employed the complete atomic peaks, other authors applied ANN as multivariate regression tools as well, although without using the overall information in the peaks.
Hernández-Caraballo et al.57 applied BPNs and aqueous standards to develop a calibration model capable of enlarging the (typically) short linear dynamic range of Cd curves determined by ETAAS. They employed a univariate approach since only those absorbances at the maxima of the atomic peaks were regressed against Cd concentrations. Unfortunately, they did not apply the model to real samples and they did not consider that the atomic maxima could appear at slightly different times. In a later work58 they compared BPNs to other mathematical functions, with good results for the BPNs.
Several papers can be traced back to the early 1990s: Glick et al.60 compensated for spectral and stray-light interferences in an ICP-OES photodiode-array spectrometer; Ivaldi et al.61 extracted ICP-OES information using least squares regression; and Danzer et al. used PCR62 and multi-line PCR and PLS63 on ICP-OES spectra. Two other applications are those of Van Veen et al.,64 applying Kalman filtering to ICP-OES spectra; the same authors reviewed several procedures to perform background correction and multi-component analysis.65 Sadler and Littlejohn66 applied PLS to detect uncorrected additive interferences. Venth et al.67 compared PLSR and canonical correlation analysis to solve isobaric and polyatomic ion interferences in Mo–Zr alloys measured by ICP-MS. Pimentel et al.68 applied PLS and PCR to simultaneously measure five metals (Mn, Mo, Cr, Ni and Fe) in steel samples, using a low-resolution ICP with diode-array detection.
Rupprecht and Probst69 corrected ICP-MS spectral and non-spectral interferences by different multivariate regression methods. They studied MOLSR, PCR and PLSR and compared them with OLSR. Further, they tested different data pre-treatments, namely mean centring, autoscaling, scaling from 0 to 1 and internal standardization. The best model was developed using PLSR, mean-centred data (internal standardization would also be fine) and variable selection according to the regression coefficients (studied in a previous model). A method for the quantification of spectral interferences in ICP-MS, based on empirical modelling and experimental design, was developed by Grotti et al.70 They used MOLSR to deduce the relationship between interfering effects and matrix composition. They applied a similar approach to study the interferences caused by complex matrices containing Na, K, Ca, Al and Fe47 on ICP-AES measurements.
Moberg et al.71 used ICP-MS to determine Cd in fly ash and metal alloys; severe spectral overlaps arose and multivariate regression outperformed other univariate approaches. They also studied whether day-to-day ICP-MS recalibration could be avoided, so that the calibration set could be constructed over several runs.
Haaland et al.72 developed a so-called multi-window classical least squares method for ICP-AES measurements (CCD, charge-coupled device, detector arrays). Essentially, it consisted in performing a classical least squares regression in each of the spectral windows measured and combining the concentration predictions (for a given analyte). The methodology was compared with PLSR and proved superior, as well as capable of handling interferences from several concomitants.
Griffiths et al.73 quantified Pt, Pd and Rh in autocatalyst digests by ICP (CCD detector array). They compared univariate techniques (pure standards, pure standards with inter-element correction factors and matrix-matched standards) and PLSR, the latter being superior in general although less effective at low concentrations due to spectral noise. They also studied the effect of using the gross signals or background-corrected ones, the former option being more successful.
In a following work, Griffiths et al.74 studied how to reduce the ICP-AES (segmented-array charge-coupled device detector) raw variables (5684 wavelengths per spectrum). This application holds many similarities with classical molecular spectrometry, from which they selected two advanced algorithms, applied in three steps: (i) application of an uninformative variable elimination PLSR algorithm (UVE-PLSR), which identifies variables with close-to-zero regression coefficients; (ii) application of an informative variable degradation-PLSR, which ranked variables using a ratio calculated as the regression coefficient divided by its estimated standard error; and (iii) selection of the variables according to that ratio. Interestingly, they had to autoscale the data instead of applying the more frequent mean-centring pre-treatment.
Kola et al.48 corrected spectral interferences on ICP measurements of S when Ca was present. They discussed how linear terms (each associated with a given wavelength) and cross-product terms in the MOLSR equation would account for matrix effects. In order to develop the regression equation they had to restrict the working range to concentration levels where the correlation between S and Ca was negligible (a typical problem of MOLSR, as discussed in the corresponding section).
Other multivariate methods have been applied to ICP spectra for quantitative measurements. Examples include: multi-component spectral fitting (which is incorporated in several commercial instrument software packages);61 matrix projection, which avoids measurement of background species;75,76 the generalised standard addition method77 and Bayesian analysis.78
Two other papers presented routine LIBS applications, without too many details on the multivariate models.81,82
Martín et al.83 presented a recent work where LIBS was employed for the first time on wood-based materials in which metal-containing preservatives had to be determined. They applied PLSR-1 block and PLSR-2 block (because of the interdependence of the analytes) to multiplicative scatter-corrected data (a data pre-treatment mostly used when diffusely scattered radiation is employed to obtain spectra). These authors studied the loadings of a PCA decomposition to identify the main chemical features that grouped the samples. Unfortunately, they did not extend this study to the PLSR factors. However, they analysed the regression coefficients to determine the most important variables of some predictive models.
a | Denotes a number (scalar)
a | Bold font denotes a vector |
A | Capital letter, bold font, denotes a matrix |
E, e | Matrix and vector of random errors |
C | The matrix of concentrations with k rows (= number of standards) and n columns (= absorbing species, i = 1, 2,…, n) |
T | The scores matrix, with k rows (standards) and z columns (factors) |
P | The loadings matrix, with j rows (variables, measuring times) and z columns (factors)
S | The matrix of the pure unit-concentration spectra or "regression coefficients", with n rows (= pure spectral atomic peak for each absorbing species, i = 1, 2,…, n) and j columns (= measuring times)
X, X-block | In PLS, the matrix of measured atomic peaks for all standards (analyte/s plus concomitant/s) (k rows and j columns) |
Y, y, Y-block | In PLS, the matrix (or vector) with the concentrations of the analyte(s) for each standard (k rows)
W | In PLS, the matrix of weights
ANN | Artificial neural networks |
BPNs | Backpropagation neural networks |
CCD | Charge-coupled device detector
ETAAS | Electrothermal atomic absorption spectrometry |
FAAS | Flame atomic absorption spectrometry |
GSAM | Generalised standard addition method |
HG-AAS | Hydride generation atomic absorption spectrometry |
ICP | Inductively coupled plasma |
ICP-AES | Inductively coupled plasma atomic emission spectrometry |
ICP-OES | Inductively coupled plasma optical emission spectrometry |
ICP-MS | Inductively coupled plasma mass spectrometry |
LIBS | Laser-induced breakdown spectrometry |
MOLSR | Multiple ordinary linear least squares regression |
OLSR | Ordinary least squares regression |
PC | Principal component |
PCA | Principal components analysis |
PCR | Principal components regression |
PLSR | Partial least squares regression |
SS-ETAAS | Slurry-sampling-electrothermal atomic absorption spectrometry |
UVE-PLSR | Uninformative variable elimination partial least squares regression algorithm |