Gang Li (a, b),
YongShun Luo (a, c),
Zhe Li (a, b),
ZeYun Li (a, b) and
Ling Lin* (a)
(a) State Key Laboratory of Precision Measurement Technology and Instruments, Tianjin University, Tianjin 300072, China. E-mail: linling@tju.edu.cn
(b) Tianjin Key Laboratory of Biomedical Detecting Techniques & Instruments, Tianjin University, Tianjin 300072, China
(c) College of Mechanical and Electronic Engineering, Guangdong Polytechnic Normal University, Guangzhou, 510635, China
First published on 7th April 2016
The pathlength is a special factor in near-infrared (NIR) spectroscopic analysis. Like the sample concentration, it is a physical variable, and it combines the geometric dimension deviation of the sample pool with the installation errors of the light source and the incident optical fiber. The loss of accuracy caused by pathlength changes is a difficult problem in quantitative spectrometric analysis. The “M + N” theory is a measurement theory in which the quantities that affect measurement accuracy are classified as M elements and N factors. Based on the definitions of the theory, its connotation and three application methods of N factors are proposed and analyzed. The theory provides a set of guidelines for finding ways to counter the negative impact of N factors on measurement accuracy. To verify the effectiveness of the three application methods, an experiment was performed in which the concentration of intra-lipids was predicted by single-pathlength and multi-pathlength modelling. The results showed that the prediction accuracy of multi-pathlength modelling was higher than that of single-pathlength modelling. They also showed that the distribution of the calibration set should be wider than that of the prediction set, in which case the prediction accuracy improves significantly. The connotation provides a basis for the measurement, and the three methods of N factors can be used to improve the accuracy of quantitative spectrometric analysis. When the model is built with the multi-pathlength method, the errors caused by replacing the sample pool and by instrument installation are reduced.
The accuracy of spectrometric detection is affected by pathlength,7 temperature,8 humidity,9 filament voltage of the light source,10 instruments and other environmental factors.11 Among these, pathlength error and temperature variation are the two most influential. In research on reducing the temperature effect, the loss of accuracy with temperature was mitigated by selecting an effective algorithm.12,13 At present, NIR spectroscopy research on pathlength calibration falls into three areas: (i) minimizing the pathlength error through detection, and selecting the optimal pathlength by evaluating it with the differential pathlength factor (DPF),7,14 (ii) reducing the impact of errors with different pretreatment methods,15,16 and (iii) obtaining better measurement accuracy through homogenization modelling over multiple pathlengths.17 However, these studies share a common omission: the impact of the error of the pathlength itself on the measurement accuracy. The pathlength error is a composite error that combines geometric errors of the sample dish, instrument installation error and other geometric or installation errors. These errors are difficult to measure and calculate precisely. If the measurement technique or modelling method can be changed to reduce the effect of the pathlength error, the extra cost of a high-precision instrument and of laboriously adjusting the pathlength position can be avoided.
The connotation of the “M + N” theory18 has not previously been studied in detail, and the relationship between M elements and N factors remained to be defined. This work defines that relationship and provides application methods for N factors. The objectives of the experiment were to: (i) analyze the influence of the pathlength as a non-linear factor in spectrometric detection and (ii) show that the three application methods of N factors are feasible.
The premise of the “M + N” theory is that both M elements and N factors are measurable. Measurable means that if the M elements or N factors change, the spectral data inevitably change as well. Changes in the spectral data indicate that some M elements or N factors are present, and the data values represent the degree of the change. Only quantities that can be measured in this way are considered M elements or N factors. Some N factors are not reflected in the model, because their effect on the system generates random error that is eliminated during modelling. The relationship between the M elements or N factors and the spectral data is reflected in the model. Whether the relationship is linear or non-linear, it can be described in a model, and M elements and N factors are mathematically similar within it.
The properties of M elements and N factors in the measurement system are equivalent; only their mechanisms of action and their weights differ. M elements and N factors influence the measurement accuracy to different degrees, but the nature of their influence is the same. The measurement errors they generate are systematic errors and random errors. Each error can be given a weight that corresponds to its influence on the measurement system, and a linear or non-linear model can be established according to the mechanism of influence. These models can eventually be merged into a single relational model. For example, temperature and the instrument affect the measurement differently, and their relationship models also differ, but the models can ultimately be combined into one. Just as temperature affects the absorbance of solution components, there must be other N factors that have the same kind of effect on one or more solution components.
Some N factors must affect some M elements, and this effect can be expressed in the model. According to the “M + N” theory, N factors are all factors that affect light absorption, and some of them affect the physical or chemical properties of some M elements. From the energy point of view, these N factors change the capacity for light absorption and hence the real spectra of the sample. As these N factors change the physical or chemical properties of the sample, the relation model changes accordingly. When the external environment changes, the degree of influence of the N factors also changes, and the spectra ultimately change.
The roles of M elements and N factors in a measurement can be exchanged under different environments and measurement requirements. When the measurement environment changes, the composition of the N factors also changes: some become prominent and some become weak. For example, if the environment is held at a constant temperature, the influence of temperature on the measurement is weak, and temperature need not be counted among the N factors. When the measurement environment changes, some M elements become background components, and the error caused by the influence of some N factors on the M elements changes from systematic error to random error. For instance, when a new sample is measured, the degree of absorption, scattering and reflection of light by the current components differs from that of the previous sample. The type and size of the final error change, and the error may change from random to systematic or from systematic to random, so the form of the relationship changes. Within a small error range the relationship is linear, but within a large error range it is non-linear.
When the “M + N” theory is applied to a measurement, the adverse effects of N factors can be turned into an advantage that improves the measurement accuracy. Lambert–Beer’s law is the basic theory of spectrometric analysis. The molar concentration and the molar absorption coefficient reflect the properties of the sample, so they are M elements. The pathlength is an N factor. The three quantities affect the absorbance in the same way, since each enters the law as a multiplicative factor.
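$A = \varepsilon c l$ | (1) |
where $A$ is the absorbance, $\varepsilon$ the molar absorption coefficient, $c$ the molar concentration and $l$ the pathlength.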
At present, in the spectrometric detection of the concentration of solution components, the pathlength is the only controlled interference quantity. It is fixed precisely after strict calculation and regarded as an optimal constant. The pathlength error is a composite quantity that mainly contains the thickness difference of the sample pool, the position error of the sample pool, the installation error of the light source, and the geometric dimension error of the fixture. These errors are difficult to measure and adjust accurately, so controlling them alone cannot reduce the error to a satisfactory level. This study explores, by experiment, methods to reduce the loss of measurement accuracy caused by pathlength error, and through these practices the application of N factors was identified.
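As a rough illustration of why a small pathlength error matters, the sketch below assumes that Lambert–Beer's law holds exactly, so that the absorbance is proportional to the pathlength, and that the concentration is recovered with a nominal pathlength of 8.2 mm. The function name and the idealized inversion are assumptions made for illustration; in a scattering sample the dependence would not be this simple.

```python
def apparent_concentration(c_true, l_true_mm, l_nominal_mm):
    """Concentration inferred when the true pathlength differs from the nominal one.

    Assumes A = eps * c * l exactly (hypothetical idealization, no scattering).
    The measured absorbance is eps * c_true * l_true, and inverting it with the
    nominal pathlength gives c_true * l_true / l_nominal; eps cancels out.
    """
    return c_true * l_true_mm / l_nominal_mm


NOMINAL_MM = 8.2  # pathlength at which the model is assumed to be calibrated

for l_true in (7.8, 8.0, 8.2, 8.4, 8.6):
    # Relative concentration error caused purely by the pathlength deviation.
    rel_error = apparent_concentration(1.0, l_true, NOMINAL_MM) - 1.0
    print(f"{l_true:.1f} mm: relative concentration error {rel_error:+.2%}")
```

Under this idealization a 0.4 mm deviation from 8.2 mm already shifts the inferred concentration by roughly 5%, which is why the pathlength cannot be treated as a negligible constant.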
Scheme | Calibration set | Prediction set |
---|---|---|
1 | 8.2 mm | 7.8 mm |
2 | 8.2 mm | 8.0 mm |
3 | 8.2 mm | 8.2 mm |
4 | 8.2 mm | 8.4 mm |
5 | 8.2 mm | 8.6 mm |
6 | 8.0 mm and 8.2 mm | 5 pathlengths |
7 | 5 pathlengths | 5 pathlengths |
8 | 5 pathlengths | 8.0 mm and 8.2 mm |
The purpose of the experiment is to observe the influence of pathlength error on the prediction accuracy, and the effect of multi-pathlength modelling on improving that accuracy. Previous studies showed that partial least squares regression (PLSR) is a suitable method for handling spectral data.11 The following parameters were computed to characterize the predictive ability of the eight schemes: (i) the number of principal components, N; (ii) the correlation coefficient of the calibration set, $R_c^2$; (iii) the correlation coefficient of the prediction set, $R_p^2$; and (iv) the root mean square error of prediction, RMSEP. The results of the analysis are shown in Tables 2 and 3.
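The two accuracy measures can be sketched as follows. The text does not give the exact formulas used, so this sketch assumes the common definitions: RMSEP as the root of the mean squared prediction residual, and the squared Pearson correlation between reference and predicted concentrations. The function names are illustrative only.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over paired reference/predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between reference and predicted values.

    Assumes both sequences have non-zero variance.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)
```

Note that a high squared correlation does not by itself imply a low RMSEP: predictions that are perfectly correlated with the reference values but scaled or offset still give a large RMSEP, so the two measures are best read together.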
Parameter | Scheme 1 | Scheme 2 | Scheme 3 | Scheme 4 | Scheme 5 |
---|---|---|---|---|---|
N | 2 | 2 | 1 | 2 | 2 |
$R_c^2$ | 0.9661 | 0.9661 | 1 | 0.9661 | 0.9661 |
$R_p^2$ | 0.5248 | 0.9592 | 0.9598 | 0.9602 | 0.9516 |
RMSEP | 4.9836 | 1.3576 | 0.6104 | 1.4105 | 1.9995 |
Parameter | Scheme 2 | Scheme 4 | Scheme 6 | Scheme 7 | Scheme 3 | Scheme 8 |
---|---|---|---|---|---|---|
N | 2 | 2 | 7 | 4 | 1 | 7 |
$R_c^2$ | 0.9661 | 0.9661 | 0.9988 | 0.9374 | 1 | 0.9926 |
$R_p^2$ | 0.9592 | 0.9602 | 0.8997 | 0.9206 | 0.9598 | 0.9922 |
RMSEP | 1.3576 | 1.4105 | 1.1796 | 1.0101 | 0.6104 | 0.2992 |
In quantitative spectrometric analysis, the model information includes relational characteristic information and range information. Relational characteristic information refers to the characteristics of the relationship between the two parts connected by the model. Range information refers to the range of uncertainty in the calibration set, which determines the scope within which the model can be applied. The spectral information contains deterministic information and uncertain information. Deterministic information is the real spectral characteristic information, and uncertain information is the background information superimposed on the sample spectrum. Uncertain information arises from some of the M elements and N factors. The range information determines the robustness of the model.
The RMSEP of scheme 3 is the smallest, the RMSEP values of schemes 2 and 4 are larger than that of scheme 3, and those of schemes 1 and 5 are larger than those of schemes 2 and 4. Schemes 1, 2, 4, and 5 are the error states of scheme 3, and the prediction accuracy becomes worse as the pathlength error increases.
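The trend can be checked directly against the reported values. The sketch below groups the RMSEP values of schemes 1 to 5 by the absolute offset between the prediction pathlength and the 8.2 mm calibration pathlength; the data are copied from the tables, while the grouping helper is only an illustrative construction.

```python
CALIBRATION_MM = 8.2

# scheme number -> (prediction pathlength in mm, reported RMSEP)
SINGLE_PATHLENGTH_RESULTS = {
    1: (7.8, 4.9836),
    2: (8.0, 1.3576),
    3: (8.2, 0.6104),
    4: (8.4, 1.4105),
    5: (8.6, 1.9995),
}


def rmsep_by_offset(results, calibration_mm):
    """Group RMSEP values by |prediction pathlength - calibration pathlength|."""
    groups = {}
    for pathlength_mm, rmsep in results.values():
        # Round to 0.1 mm so that floating-point noise does not split groups.
        offset = round(abs(pathlength_mm - calibration_mm), 1)
        groups.setdefault(offset, []).append(rmsep)
    return groups


groups = rmsep_by_offset(SINGLE_PATHLENGTH_RESULTS, CALIBRATION_MM)
for offset in sorted(groups):
    print(f"offset {offset:.1f} mm: RMSEP values {groups[offset]}")
```

Every RMSEP at a 0.4 mm offset exceeds every RMSEP at 0.2 mm, which in turn exceeds the zero-offset value. The effect is not symmetric, however: scheme 1 (7.8 mm) degrades far more than scheme 5 (8.6 mm).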
The component information of M elements and the interference information of N factors are the only two kinds of information in the relation model. Once the sample is prepared, the component information of the M elements becomes deterministic information. The customary method is therefore to assume that some of the N factors are constant. The specific approach is to adjust the N factors to their best states and to hold them in those states precisely. The states are considered to be continuous. The following relation model is formed with the information of the first application level, and it contains only one kind of information:
MI = RSI | (2) |
The calibration samples should be numerous and varied enough to cover all cases that may occur in the samples to be measured. This rule was verified by the experiment, and the results provide three pieces of evidence: (i) the RMSEP of scheme 8 is the lowest, because it has five kinds of calibration samples and two kinds of prediction samples, so the calibration set includes all kinds of samples; (ii) the RMSEP of schemes 7 and 3 is smaller than that of schemes 2, 4, and 6 but larger than that of scheme 8, because their calibration sample kinds are exactly equal to the kinds of all samples; and (iii) the RMSEP of scheme 6 is lower than that of schemes 2 and 4, because it has a larger number of samples than those two do.
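The coverage argument can be made explicit by comparing the calibration and prediction pathlength sets of each scheme. The sketch below uses the reported RMSEP values and assumes that “5 pathlengths” means 7.8, 8.0, 8.2, 8.4 and 8.6 mm, which is inferred from schemes 1 to 5 rather than stated outright; the labels “wider”, “equal” and “incomplete” are illustrative names.

```python
# Assumption: the five pathlengths are those used in schemes 1 to 5.
FIVE = frozenset({7.8, 8.0, 8.2, 8.4, 8.6})

# scheme -> (calibration pathlengths, prediction pathlengths, reported RMSEP)
MULTI_PATHLENGTH_RESULTS = {
    2: (frozenset({8.2}), frozenset({8.0}), 1.3576),
    4: (frozenset({8.2}), frozenset({8.4}), 1.4105),
    6: (frozenset({8.0, 8.2}), FIVE, 1.1796),
    7: (FIVE, FIVE, 1.0101),
    3: (frozenset({8.2}), frozenset({8.2}), 0.6104),
    8: (FIVE, frozenset({8.0, 8.2}), 0.2992),
}


def coverage(calibration, prediction):
    """Classify how the calibration set covers the prediction set."""
    if prediction < calibration:   # proper subset: calibration is strictly wider
        return "wider"
    if prediction == calibration:
        return "equal"
    return "incomplete"            # some prediction pathlength was never calibrated


for scheme, (cal, pred, rmsep) in sorted(
    MULTI_PATHLENGTH_RESULTS.items(), key=lambda item: item[1][2]
):
    print(f"scheme {scheme}: coverage={coverage(cal, pred):<10} RMSEP={rmsep}")
```

Sorted by RMSEP, the schemes fall into three bands: the one with a strictly wider calibration set (scheme 8) comes first, those with equal sets (3 and 7) follow, and those with incomplete coverage (6, 2 and 4) come last.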
The RMSEP of scheme 7 is higher than that of scheme 3 in Table 3. The two schemes share a common feature: in each, the calibration set is the same as the prediction set. However, scheme 7 contains more kinds of samples than scheme 3, and the effect of scattering is more serious under the multi-pathlength condition. These two reasons account for the difference in measurement accuracy.
In most cases, it is difficult to measure and compensate precisely for the error caused by some N factors, such as filament voltage and lamp aging. If the modelling set consists of samples that span the whole range of the N factors and reflects all of their changes, the trained model can respond to all states of the N factors throughout the measurement process. Since the samples are complete, the prediction accuracy is relatively high. Expanding the scope of the samples is equivalent to adding range information to the model. The uncertain information is changed into deterministic information, and the calibration model is trained on the full set of samples. The following relation model applies to this case, and consists of the relational information and the range information:
MI = RSI + RGI | (3) |
MI = RSI + RGI + FI | (4) |
The experiment indicated that, for eliminating the influence of pathlength error, three methods can be used to improve measurement accuracy. In order of increasing accuracy, the methods are: calibration samples covering all variation, adding calibration samples, and adding more information about the N factors to the model. These methods are based on the “M + N” theory. Multi-pathlength modelling can be used to improve prediction accuracy.
On the basis of the “M + N” theory, its connotation was presented, and three application methods of N factors for improving measurement precision were proposed. As an N factor, the pathlength was used as the experimental object to verify the effect of pathlength error on the measurement accuracy. The results show that the first two methods are effective, and the accuracy of the second method can meet general needs. The last method still needs to be verified by experiments. In spectrometric detection, the pathlength is one of the key N factors, and it represents the thickness of the sample pool, the position errors of the incident light source and the receiving optical fiber, and other uncertainty factors.
This journal is © The Royal Society of Chemistry 2016