Saliha Şahin, Esra Sarıburun and Cevdet Demir*
University of Uludag, Faculty of Science and Arts, Department of Chemistry, 16059, Bursa, Turkey. E-mail: cevdet@uludag.edu.tr; Fax: +90-224-2941899; Tel: +90-224-2941727
First published on 14th October 2009
Two multivariate calibration methods, moving window selection partial least squares regression (MWPLSR) and net analyte signal (NAS), were employed for the simultaneous determination of a mixture of C.I. Disperse Blue 183, C.I. Disperse Blue 79, C.I. Disperse Red 82, C.I. Disperse Red 65, C.I. Disperse Yellow 211 and C.I. Disperse Orange 25 by UV-vis spectrophotometry. The absorption spectra of the six disperse dyes were recorded between 320 and 680 nm. Modified changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) procedures were proposed to search for an optimized spectral interval and an optimized combination of spectral regions from the informative regions obtained by MWPLSR. Different wavelength regions were selected by taking into account different spectral parameters, including the starting wavelength, the ending wavelength and the wavelength interval. It was found that wavelength selection improved the performance of the corresponding net analyte signal-partial least squares (NAS-PLS) model, in terms of root mean square error (RMSE), compared with the results obtained using whole spectra or direct combination of informative regions for each dye. The importance of calibration design was also investigated by calculating the prediction and validation errors. The influence of using independent validation sets was emphasized. The proposed calibration method gave better results in the combination and informative spectral regions for the determination of the six disperse dyes without prior separation.
Disperse azo dyes have been used continuously in the textile industry.6 These dyes can be applied to synthetic fibres such as polyester, nylon, acetate, cellulose and acrylic.7 The concentration of disperse dyes in waste water can be at the µg/L level.2 Therefore, a pre-concentration step is necessary to achieve better detection and quantification limits for disperse dyes.
Recently, the determination of dyes in waste water has been performed successfully by high performance liquid chromatography (HPLC), liquid chromatography-mass spectrometry (LC-MS), capillary electrophoresis (CE), and gas chromatography-mass spectrometry (GC-MS).2 However, chromatographic determination of dyes in a mixture takes much more time, and a prior separation is needed because of spectral and chromatographic overlap with matrix components. Therefore, UV-vis spectrophotometric determination is preferred to chromatographic techniques, since it is possible to obtain high accuracy and reproducibility in complex matrices.
Multivariate calibration methods such as principal component regression (PCR) and partial least squares (PLS) have been applied successfully to overlapping spectra and chromatograms.8–11 These methods offer an advantage of speed in the determination of components of matrices, because sample preparation is eliminated or minimized and a preliminary separation step in complex matrices is avoided.12,13 PLS and PCR use the full spectral region to calculate a calibration model, and the use of the whole spectral region does not necessarily yield optimal results. Thus, a wavelength selection method is still important and necessary for quantifying highly complicated samples. A method of spectral interval selection called moving window partial least squares regression (MWPLSR) has been proposed to improve the quality of the model.14–16 The advantage of MWPLSR is that it searches for informative spectral regions in multi-component overlapped spectral analysis. MWPLSR develops PLS calibration models in every window as the window moves over the whole spectral region, and then locates the informative regions, i.e. those in which PLS models of the least complexity reach the lowest calculated sum of residuals. Although MWPLSR is a powerful method for selecting informative regions, an individual informative region obtained by MWPLSR does not necessarily supply the best predictive results, and such regions may be unsatisfactory for obtaining the optimum results. When complicated samples such as environmental matrices are analyzed, one informative region may contain several other regions because of significant interferences. A combination of informative regions can be used to overcome interference problems and to collect more useful information from the spectra, improving the prediction ability of a PLS model. Each informative region is optimized with the combination of separate best windows in the whole spectral region.
The search for an optimized sub-region of each selected informative region, and for the optimized combination of informative regions, by the changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) methods has been reported in the literature.14,17,18 The CSMWPLS procedure changes the window size and moves the window over a selected informative region with each window size. SCMWPLS looks for an optimized combination of informative regions by performing the CSMWPLS procedure for every informative region step by step.
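The two search procedures can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names (`pls1_fit`, `loo_rmse`, `csmw_search`, `scmw_search`), the leave-one-out RMSE criterion, the fixed number of PLS components, the minimum window size and the synthetic Gaussian bands are all hypothetical choices. Regions are given as channel indices with an exclusive stop and are assumed not to overlap.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS by NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1], X.shape[0] - 1)):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to model
            break
        w = w / norm
        t = Xr @ w                       # scores
        p = Xr.T @ t / (t @ t)           # X loadings
        qa = (yr @ t) / (t @ t)          # y loading
        Xr = Xr - np.outer(t, p)         # deflate X and y
        yr = yr - qa * t
        W.append(w); P.append(p); q.append(qa)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean

def loo_rmse(X, y, n_components):
    """Leave-one-out cross-validated RMSE (an assumed error criterion)."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, xm, ym = pls1_fit(X[keep], y[keep], n_components)
        errors.append((X[i] - xm) @ b + ym - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def csmw_search(X, y, region, base_cols=(), min_size=3, n_components=2):
    """Try every window size and position inside region = (start, stop)."""
    start, stop = region
    best_err, best_cols = np.inf, None
    for size in range(min_size, stop - start + 1):
        for s in range(start, stop - size + 1):
            cols = list(base_cols) + list(range(s, s + size))
            err = loo_rmse(X[:, cols], y, n_components)
            if err < best_err:
                best_err, best_cols = err, cols
    return best_err, best_cols

def scmw_search(X, y, regions, **kw):
    """Start from the best single sub-region, then add the others step by step."""
    singles = [csmw_search(X, y, r, **kw) for r in regions]
    order = np.argsort([e for e, _ in singles])
    best_err, best_cols = singles[order[0]]          # base region
    for idx in order[1:]:
        err, cols = csmw_search(X, y, regions[idx], base_cols=best_cols, **kw)
        if err < best_err:                           # keep only if it helps
            best_err, best_cols = err, cols
    return best_err, sorted(best_cols)

# Synthetic two-component mixtures (hypothetical data, not the dye spectra).
rng = np.random.default_rng(1)
ch = np.arange(60)
band = lambda c: np.exp(-0.5 * ((ch - c) / 3.0) ** 2)
conc = rng.uniform(0, 1, size=(20, 2))
pure = np.vstack([band(40) + 0.5 * band(10), band(35)])  # analyte, interferent
X = conc @ pure + rng.normal(scale=0.01, size=(20, 60))
y = conc[:, 0]
regions = [(4, 16), (32, 48)]                            # hypothetical regions
err, cols = scmw_search(X, y, regions)
print("combined RMSE:", round(err, 4), "channels:", cols)
```

Because the full region is itself one of the candidate windows, the sub-region returned by `csmw_search` can never have a higher RMSE than the region it was searched in, and the combination returned by `scmw_search` can never be worse than the best single sub-region.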
Recently, comparative studies of the advantages and limitations of net analyte signal (NAS) based methods and PLS calibration in mixture analysis have been performed.19,20 The use of signal filtering algorithms such as NAS may help to simplify calibration models and to construct models with adequate predictive ability. The NAS calibration method, previously described by Lorber,21 has been used for the reduction of noise (i.e., to isolate the analyte signal) and for describing the part of a spectrum that the model relates to the predicted quantity, i.e. the part of the mixture spectrum that is directly related to the concentration of the analyte. In this work, the NAS vector was calculated and used in the corresponding PLS model to predict the unknown concentration for the informative and combination spectral regions.
In the present work, the MWPLSR and NAS multivariate calibration methods were applied to the simultaneous determination of C.I. Disperse Blue 183, C.I. Disperse Blue 79, C.I. Disperse Red 82, C.I. Disperse Red 65, C.I. Disperse Yellow 211, and C.I. Disperse Orange 25 by UV-vis spectrophotometry. The absorption spectra of the six disperse dyes were recorded between 320 and 680 nm, and the best informative wavelength regions were selected by MWPLSR for each dye separately. Modified changeable size moving window and searching combination moving window wavelength selection strategies were employed to enhance the predictions of the multivariate calibration methods, and to investigate the effect of wavelength selection on the performance of the NAS-PLS method. Root mean square errors were calculated for each dye as comparison criteria. To see how well the calibration set predicts the concentrations of the six dyes, two independent validation sets were generated. It was found that the NAS-PLS, MWPLSR and known validation set results were compatible. The results also demonstrate that the MWPLSR and NAS multivariate calibration methods can be applied successfully to a highly complex mixture of samples.
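PLS is an extension of the multiple linear regression model. In its simplest form, a linear model specifies the relationship between the spectra in the window (X) and the concentrations of an analyte (c), so that:
$$\mathbf{c} = \mathbf{X}\mathbf{b} + \mathbf{e} \tag{1}$$
where b is the vector of regression coefficients and e is the vector of residuals.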
In this study, an informative spectral window starting at the ith spectral channel and ending at the (i + h − 1)th spectral channel was constructed. The window size, h, was fixed at 7 throughout the spectral region (Fig. 1). The window was moved over the whole spectral region between 320 and 680 nm. For every window, a PLS model was constructed with the number of PLS components selected by cross-validation, and the model was evaluated by the root mean square error (RMSE). Through this process, the informative regions, which have peak-like shapes with low RMSE values, can easily be found.
Fig. 1 Scheme for explanation of MWPLSR.
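The moving window scan can be illustrated with a short sketch. It assumes a NIPALS PLS routine for a single analyte, a leave-one-out RMSE as the error measure and a fixed number of PLS components per window (in the procedure described above, the number of components is chosen by cross-validation). The function names and the synthetic spectra are hypothetical; only the window size h = 7 is taken from the text.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS by NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1], X.shape[0] - 1)):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to model
            break
        w = w / norm
        t = Xr @ w                       # scores
        p = Xr.T @ t / (t @ t)           # X loadings
        qa = (yr @ t) / (t @ t)          # y loading
        Xr = Xr - np.outer(t, p)         # deflate X and y
        yr = yr - qa * t
        W.append(w); P.append(p); q.append(qa)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean

def loo_rmse(X, y, n_components):
    """Leave-one-out cross-validated RMSE (an assumed error criterion)."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, xm, ym = pls1_fit(X[keep], y[keep], n_components)
        errors.append((X[i] - xm) @ b + ym - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def moving_window_rmse(X, y, h=7, n_components=2):
    """RMSE of a PLS model built on channels i .. i + h - 1, for every i."""
    n_channels = X.shape[1]
    return np.array([loo_rmse(X[:, i:i + h], y, n_components)
                     for i in range(n_channels - h + 1)])

# Synthetic mixtures: analyte band at channel 40, interferent band at 15.
rng = np.random.default_rng(0)
channels = np.arange(60)
band = lambda centre: np.exp(-0.5 * ((channels - centre) / 3.0) ** 2)
conc = rng.uniform(0.0, 1.0, size=(25, 2))       # analyte, interferent
spectra = conc @ np.vstack([band(40), band(15)])
spectra += rng.normal(scale=0.01, size=spectra.shape)

rmse = moving_window_rmse(spectra, conc[:, 0])
best_start = int(np.argmin(rmse))
print("lowest RMSE for channels", best_start, "to", best_start + 6)
```

Plotting `rmse` against the window position would give the kind of curve in which informative regions show up as dips around the analyte band, while windows that contain only the interferent or only noise keep a high RMSE.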
Fig. 2 Scheme for explanation of SCMWPLS.
In SCMWPLS, the region with the smallest RMSE was always selected as the base-region. A rational base-region should give a PLS model whose RMSE reaches an acceptable error level with a relatively small number of PLS components. Therefore, the maximum number of PLS components is constrained in this algorithm to avoid selecting the smallest RMSE with a relatively high number of PLS components; i.e., the number of PLS components selected by cross-validation must not be larger than the maximum number of PLS components. The number of PLS components was determined as the number at which the RMSE begins to decrease insignificantly with increasing number of PLS components. This number was taken as the maximum number of PLS components.
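The constraint on the number of PLS components can be sketched as a rule applied to an RMSE curve. The text does not quantify when a decrease becomes insignificant, so the 5% relative threshold below is an assumption, as are the function names and the example curve.

```python
import numpy as np

def max_components_from_curve(rmse_curve, rel_tol=0.05):
    """Number of components after which the RMSE stops falling noticeably.

    rmse_curve[a - 1] is the cross-validated RMSE obtained with a components.
    rel_tol is a hypothetical threshold for an "insignificant" decrease.
    """
    rmse = np.asarray(rmse_curve, dtype=float)
    for a in range(1, len(rmse)):
        if rmse[a - 1] <= 0 or (rmse[a - 1] - rmse[a]) / rmse[a - 1] < rel_tol:
            return a
    return len(rmse)

def constrained_choice(rmse_curve, max_components):
    """Components with the lowest RMSE among 1 .. max_components."""
    rmse = np.asarray(rmse_curve, dtype=float)[:max_components]
    return int(np.argmin(rmse)) + 1

# Hypothetical RMSE curve for 1 to 6 components.
curve = [0.90, 0.50, 0.20, 0.195, 0.194, 0.150]
a_max = max_components_from_curve(curve)
a_selected = constrained_choice(curve, a_max)
a_unconstrained = int(np.argmin(curve)) + 1
print(a_max, a_selected, a_unconstrained)
```

In this example the unconstrained minimum lies at six components, whereas the constrained choice stays at three, which is the behaviour the constraint is meant to enforce.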
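$$\mathbf{r}^{*} = \left[\mathbf{I} - \mathbf{X}_{-k}^{\mathrm{T}}\left(\mathbf{X}_{-k}^{\mathrm{T}}\right)^{+}\right]\mathbf{r} \tag{2}$$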
Different algorithms have been proposed for NAS calculations.13,23–31 In this work, the NAS algorithm was used to obtain the NAS for multivariate calibration. The proposed methodology for NAS calculation is depicted in Fig. 3. In this procedure, the matrix XA was first calculated by reconstruction from A principal components after standardizing the original data. Second, the NAS vectors were calculated according to Lorber et al.21 The calculated NAS vectors and concentration values (Y) were then used for the PLS calibration. The spectra of the prediction samples were centered by subtracting the average spectrum of the calibration samples and reconstructed from A principal components. Finally, the NAS vectors of the prediction samples were calculated and used in the PLS model to predict the unknown concentrations.
Fig. 3 Scheme of the methodology for NAS calculation.
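The projection in eqn (2) and the rank-A reconstruction step can be sketched as follows. The sketch reads eqn (2) in the usual way: r is a mixture spectrum, X−k holds spectra that carry no contribution from analyte k, and + denotes the Moore-Penrose pseudoinverse. For illustration, X−k is built from known interferent spectra, which is an assumption. The function names and the Gaussian bands are hypothetical, and mean-centering is used in place of the standardization mentioned above.

```python
import numpy as np

def rank_a_reconstruction(X, n_components):
    """Rebuild X from its first principal components (mean-centred PCA)."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    A = n_components
    return (U[:, :A] * s[:A]) @ Vt[:A] + mean

def net_analyte_signal(r, X_minus_k):
    """r* = [I - X_{-k}^T (X_{-k}^T)^+] r, with one spectrum per row of X_minus_k."""
    Xt = X_minus_k.T                                  # wavelengths x spectra
    projector = np.eye(Xt.shape[0]) - Xt @ np.linalg.pinv(Xt)
    return projector @ r

# Hypothetical Gaussian bands standing in for pure-component spectra.
wl = np.linspace(320, 680, 73)
band = lambda centre, width: np.exp(-0.5 * ((wl - centre) / width) ** 2)
s_k = band(580, 30)                                   # analyte k
S_other = np.vstack([band(500, 35), band(440, 30)])   # interferents

c_k = 0.7
r = c_k * s_k + np.array([0.4, 1.2]) @ S_other        # noiseless mixture
r_star = net_analyte_signal(r, S_other)

# The NAS is orthogonal to the interferents and scales with c_k.
c_est = np.linalg.norm(r_star) / np.linalg.norm(net_analyte_signal(s_k, S_other))
print(round(c_est, 6))
```

In this noiseless example the norm of the NAS vector is proportional to the analyte concentration, which is the property that makes the NAS vectors usable as input to the subsequent PLS calibration.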
Fig. 4 UV-vis spectra of six disperse dyes (A; C.I. Disperse Blue 183 (10 mg L⁻¹), B; C.I. Disperse Blue 79 (10 mg L⁻¹), C; C.I. Disperse Red 82 (5 mg L⁻¹), D; C.I. Disperse Red 65 (5 mg L⁻¹), E; C.I. Disperse Orange 25 (10 mg L⁻¹) and F; C.I. Disperse Yellow 211 (10 mg L⁻¹)).
The informative regions obtained in the first step by MWPLSR for C.I. Disperse Blue 183, C.I. Disperse Blue 79, C.I. Disperse Red 82, C.I. Disperse Red 65, C.I. Disperse Yellow 211 and C.I. Disperse Orange 25 are shown in Figs. 5 and 6. MWPLSR provides informative regions at 455–468, 470–484 and 546–567 nm for C.I. Disperse Blue 183; 425–450, 542–557, 576–609 and 618–649 nm for C.I. Disperse Blue 79; 539–587 nm for C.I. Disperse Red 82; 491–515, 534–568 and 573–585 nm for C.I. Disperse Red 65; 320–436 and 431–451 nm for C.I. Disperse Orange 25; and 494–517 nm for C.I. Disperse Yellow 211. Figs. 5 and 6 show that well-defined informative regions, as for C.I. Disperse Blue 79, are in accordance with the absorption bands of the corresponding disperse dye. The minimum RMSE values were obtained at the maximum absorbance regions of C.I. Disperse Blue 79. This indicates that these informative regions are free from interference by the other dyes in the mixtures. A clear informative region at 576–609 nm (Fig. 5b) was observed for C.I. Disperse Blue 79, which can easily be attributed to the absorption band in the same region. The absorption band in the visible region located at 580 nm is due to the azo linkage of C.I. Disperse Blue 79.
Fig. 5 Selection of informative regions obtained by the first step of MWPLSR for (a) C.I. Disperse Blue 183, (b) C.I. Disperse Blue 79, (c) C.I. Disperse Red 82.
Fig. 6 Selection of informative regions obtained by the first step of MWPLSR for (a) C.I. Disperse Red 65, (b) C.I. Disperse Orange 25, (c) C.I. Disperse Yellow 211.
C.I. Disperse Blue 183 has three optimum informative regions suggested by CSMWPLS, one direct combination of these regions selected by SCMWPLS, and the whole region, as listed in Table 1 (see ESI†). The lowest RMSE values, 0.095, 0.601 and 0.567 for the calibration set and the two validation sets, respectively, were obtained with six PLS components in the combination region. The optimized combination found by SCMWPLS thus improves the prediction results. For spectroscopic data calibrated with standards, the number of components is expected to equal the number of compounds present in the mixture when the spectra are not highly overlapped. The results confirm that the calibration was well modeled by the number of components selected during the validation. The optimum informative region at 546–567 nm shows a higher RMSE than the other two regions in the calibration set with three PLS components, owing to the stronger interference of C.I. Disperse Blue 79. However, better validation errors were obtained in this region, which indicates that the prediction performance was better even in the case of more overlapped spectra.
For the second disperse dye, C.I. Disperse Blue 79 (Table 2, see ESI†), four optimum informative regions and one combination region were found by CSMWPLS and SCMWPLS. The optimum informative regions at 542–557 and 576–589 nm show smaller RMSE values than the other optimum informative regions for the calibration set, and better prediction was obtained at 542–557 nm for the two validation sets. The lowest error was obtained for the combination region selected by SCMWPLS. On the other hand, the number of PLS components is three for all four optimum informative regions, but six for the combination region. The informative region at 576–589 nm and the combination region at 320–648 nm are therefore the optimum regions selected by CSMWPLS and SCMWPLS, respectively, for C.I. Disperse Blue 79. In contrast to our previous study,2 a higher RMSE was obtained for C.I. Disperse Blue 79 by conventional PLS calibration, owing to the narrow spectral region used in this study. It is possible that the information is spread over the whole spectral range, so that variable selection per interval reduces the information and increases the RMSE compared with full-spectrum PLS.34
C.I. Disperse Red 82, C.I. Disperse Orange 25 and C.I. Disperse Yellow 211 behave differently from the other dyes in that each has only one optimum informative region from the second step, obtained by SCMWPLS (Tables 3, 5 and 6, see ESI†), and each of these optimum informative regions provides better prediction errors than the whole spectral region. The number of PLS components for the optimum informative region is five for C.I. Disperse Orange 25 and C.I. Disperse Yellow 211, but seven for C.I. Disperse Red 82. The reason for this might be that the spectral points are highly overlapped. C.I. Disperse Orange 25 has the lowest RMSE in the informative region at 431–451 nm, which can be attributed to the absorption bands in the same region (Fig. 6b). C.I. Disperse Red 82 and C.I. Disperse Yellow 211 have maximum absorption bands in the 476–523 nm (Fig. 5c) and 413–448 nm (Fig. 6c) regions, respectively. However, these compounds are highly overlapped in these regions, so the corresponding informative regions show high RMSE values.
C.I. Disperse Red 65 also shows three optimum informative regions and one combination region, as does C.I. Disperse Blue 183. The optimum informative regions, the combination region and the number of PLS components are given in Table 4 (see ESI†). The combination region suggested by SCMWPLS is the optimum region because the model that includes it provides the smallest RMSE values, with a number of PLS components higher than that of the three individual regions. Models built on the two individual regions in the 491–498 and 534–546 nm ranges show high RMSE values because of the larger number of overlapping points between these wavelengths. The informative region in the 573–585 nm range gives a better PLS model, since better calibration and validation errors were obtained in this region. On the other hand, the informative region in the 491–515 nm range, which has the second lowest RMSE, coincides with the maximum absorption band of the dye (Fig. 6a). It is clear that SCMWPLS can decrease the prediction error of the PLS model significantly (Table 4, see ESI†). If more than one informative region is available, a combination of regions may be more important.
The comparison of the prediction and validation results for the six dyes in environmental mixtures clearly demonstrates the potential of SCMWPLS. All these SCMWPLS results, as also reported in the literature,14,15 provide the best prediction results for the PLS calibrations of C.I. Disperse Blue 183, C.I. Disperse Blue 79, C.I. Disperse Red 82, C.I. Disperse Red 65, C.I. Disperse Yellow 211 and C.I. Disperse Orange 25 in highly overlapped spectra of mixtures.
As can be seen from Tables 1–6 (see ESI†), the prediction results for validation set 1 provided by the model were very similar to those for validation set 2, as was also found with CSMWPLS. The RMSEs calculated by CSMWPLS in the optimum informative regions at 455–468, 470–481 and 546–567 nm were smaller than the RMSEs calculated by NAS-PLS for the validation sets, whereas the RMSE calculated by SCMWPLS in the 320–567 nm region was higher than the RMSE calculated by NAS-PLS for the calibration and validation sets (Table 1, see ESI†). It is clear that, for NAS-PLS, the optimum region for C.I. Disperse Blue 183 was the one obtained by SCMWPLS. For C.I. Disperse Blue 79, only in the informative region at 425–445 nm was the RMSE calculated by NAS-PLS higher than the RMSEs calculated by CSMWPLS for the validation sets (Table 2, see ESI†); in the other optimum informative regions, the RMSEs obtained by CSMWPLS were higher than those calculated by NAS-PLS. These results are as expected when NAS pretreatment is applied to informative regions.
The RMSEs obtained by NAS-PLS for C.I. Disperse Red 82, C.I. Disperse Orange 25 and C.I. Disperse Yellow 211 in the informative regions, and for C.I. Disperse Red 65 including the combination region, were smaller than the RMSEs obtained by CSMWPLS (Tables 3, 5 and 6, see ESI†). Among these dyes, only C.I. Disperse Red 65 has a combination region (Table 4, see ESI†). The number of PLS components is 7, 5, 5 and 5 for C.I. Disperse Red 82, C.I. Disperse Orange 25, C.I. Disperse Yellow 211 and C.I. Disperse Red 65, respectively. A smaller RMSE was obtained for C.I. Disperse Red 82, which has 7 PLS components.
As a result, the combination regions for C.I. Disperse Blue 183, C.I. Disperse Blue 79 and C.I. Disperse Red 65, and the informative regions at 539–582 nm for C.I. Disperse Red 82, 320–390 nm for C.I. Disperse Orange 25 and 494–517 nm for C.I. Disperse Yellow 211, were found to be the optimum regions by NAS-PLS. The prediction capability of NAS-PLS was confirmed by comparison with PLS prediction over the whole spectral region. In order to evaluate whether there are significant differences between the concentrations found for each dye with each calibration method, the F-test (at the 95% confidence level) was employed to compare the RMSE values. The results showed no significant differences (F0.95 < Fcrit) between the calibration methods for the determination of C.I. Disperse Blue 183 and C.I. Disperse Red 65. However, the NAS-PLS method was much better at predicting the concentrations of C.I. Disperse Blue 183 and C.I. Disperse Red 65 in the calibration and validation sets. For the rest of the dyes, there were significant differences between the NAS-PLS and conventional PLS calibration methods. Since the NAS-PLS method gave the lowest RMSE values, it was the method adopted for predicting dye concentrations in real samples.
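A comparison of two RMSE values by the F-test can be sketched as below. The sketch treats the squared RMSEs as variances and takes the ratio of the larger to the smaller. The degrees of freedom are not stated in the text, so the critical value is passed in by the user: the value 1.98 used here is the tabulated one-tailed 5% value for 24 and 24 degrees of freedom, chosen purely for illustration. The two RMSEs are the validation errors 0.429 and 0.414 quoted for the two validation designs.

```python
def f_ratio(rmse_a, rmse_b):
    """Ratio of the larger to the smaller squared RMSE (always >= 1)."""
    high, low = max(rmse_a, rmse_b), min(rmse_a, rmse_b)
    return (high / low) ** 2

def differs_significantly(rmse_a, rmse_b, f_crit):
    """True when the F ratio exceeds the supplied critical value."""
    return f_ratio(rmse_a, rmse_b) > f_crit

# Validation errors quoted in the text for the two validation designs.
rmse_set_1, rmse_set_2 = 0.429, 0.414
F_CRIT = 1.98   # assumed: table value for (24, 24) degrees of freedom, 95% level

f_value = f_ratio(rmse_set_1, rmse_set_2)
print(round(f_value, 3), differs_significantly(rmse_set_1, rmse_set_2, F_CRIT))
```

With these numbers the ratio is close to 1, well below the assumed critical value, so the two errors would not be judged significantly different.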
Experimental design and the nature of the validation sets play an important role when assessing the quality of calibration models. Two experimental design sets with r₁₂ = 0.0 and r₁₂ = 1.0 used in the validation of the PLS model gave similarly low errors of 0.429 and 0.414, respectively. It was shown that either of the validation sets can be used to see how well the calibration set predicts the concentrations of each of the compounds in the mixture. Comparing the results obtained using the whole spectral region, the optimum informative regions and the combination regions, NAS improves the prediction ability of the corresponding PLS calibration. The PLS models yield better prediction results with the NAS pretreated spectra in the combination region than in the informative and whole spectral regions. The present study has demonstrated that SCMWPLS can successfully select the optimum combination of informative regions, even for mixtures with highly overlapped spectra, and that NAS-PLS can improve the performance of PLS calibration models for the quantitative determination of components in complicated environmental samples.
Footnote
† Electronic supplementary information (ESI) available: Tables 1–6 show the selected PLS components and optimum RMSEs of the predictions by PLS calibration methods for a calibration set and two validation sets for each of the six dyes. Table 7 summarizes the calibration results. See DOI: 10.1039/b9ay00009g
This journal is © The Royal Society of Chemistry 2009