Multi-pathlength method to improve the spectrometric analysis accuracy based on “M + N” theory

Gang Li; YongShun Luo; Zhe Li; ZeYun Li; Ling Lin

doi:10.1039/C6RA04323B


DOI: 10.1039/C6RA04323B (Paper) RSC Adv., 2016, 6, 38849-38854

Gang Li^ab, YongShun Luo^ac, Zhe Li^ab, ZeYun Li^ab and Ling Lin*^a
^aState Key Laboratory of Precision Measurement Technology and Instruments, Tianjin University, Tianjin 300072, China. E-mail: linling@tju.edu.cn
^bTianjin Key Laboratory of Biomedical Detecting Techniques & Instruments, Tianjin University, Tianjin 300072, China
^cCollege of Mechanical and Electronic Engineering, Guangdong Polytechnic Normal University, Guangzhou, 510635, China

Received 17th February 2016 , Accepted 5th April 2016

First published on 7th April 2016

Abstract

The pathlength is a special factor in near-infrared (NIR) spectroscopy analysis. It is a physical variable that is similar to the sample concentration, and it combines the geometric dimension deviation of the sample pool with the installation errors of the light source and the incident optical fiber. The loss of accuracy caused by pathlength changes is a difficult problem in quantitative spectrometric analysis. The proposed “M + N” theory is a measurement theory in which the quantities affecting the measurement accuracy are classified as M elements and N factors. The theoretical connotation and three application methods of N factors are proposed and analyzed based on the definitions of the theory. “M + N” theory provides a set of guidelines for finding ways to counter the negative impact of N factors on the measurement accuracy. To verify the effectiveness of the three application methods, an experiment was carried out to predict the concentration of intra-lipids by single-pathlength and multi-pathlength modelling. The experimental results showed that the prediction accuracy of multi-pathlength modelling was higher than that of single-pathlength modelling. They also showed that the calibration set should cover a wider distribution than the prediction set, in which case the prediction accuracy improves significantly. The connotation provides a basis for the measurement, and the three methods of N factors can be used to improve the accuracy of quantitative spectrometric analysis. When the multi-pathlength method is used to build a model, the errors caused by replacement of the sample pool and by instrument installation are decreased.

1. Introduction

Quantitative spectrometric analysis is a measuring method that is non-destructive, pollution-free, simple, and convenient. It is widely used in medical treatment,^1,2 foodstuffs,^3,4 and agriculture,^5,6 and has broad application prospects. Spectrometric analysis accuracy has been improved by weakening the influence of interference factors and increasing the accuracy of measuring instruments. In practice, however, error can never be eliminated completely, and even a minute error has a great influence on the measurement accuracy when detecting solution composition.

The accuracy of spectrometric detection is affected by pathlength,⁷ temperature,⁸ humidity,⁹ filament voltage of the light source,¹⁰ instruments and other environmental factors.¹¹ Among these, the pathlength error and temperature variation are the two most influential factors. In the research of reducing the temperature effect, the condition of the accuracy declining with the temperature effect was improved by selecting an effective algorithm.^12,13 At present, in NIR spectroscopy measurement-related research, there are three study areas on pathlength calibration: (i) minimizing the pathlength error through detecting, and selecting the optimal pathlength by evaluating the pathlength on the differential pathlength factor (DPF),^7,14 (ii) reducing the impact of errors by using different pretreatment methods,^15,16 and (iii) obtaining better measurement accuracy with homogenization modelling on a multiple-pathlength.¹⁷ However, there is a common problem that these studies ignored, which is the impact of the error of the pathlength itself on the measurement accuracy. The pathlength error is a synthetic error combined with geometric errors of the sample dish, instrument installation error and other geometric or installation errors. Indeed, these errors are difficult to precisely measure and calculate. If the measurement technique or modelling method can be changed to reduce the effect of the pathlength error, the increased cost caused by using a high precision instrument and precisely adjusting the pathlength location with difficulty will be avoided.

The connotation of “M + N” theory¹⁸ has not been studied in detail, and the correlation between M elements and N factors remains to be defined. This work presents the connotation of the theory and three application methods of N factors. The objectives of the experiment were to: (i) analyze the influence of the pathlength as a non-linear factor in spectrometric detection and (ii) show that the three application methods are feasible.

2. “M + N” theory

“M + N” theory is a measurement methodology that considers the measured object and the interference factors as a whole in order to improve the measurement accuracy. In quantitative spectrometric analysis, the measured object contains a total of M kinds of components, expressed as m₁, m₂, m₃ and so on. This group of M species is regarded as the M elements, also known as the M class elements when they represent the component information. In addition to the M elements, factors affecting the spectral information include the pathlength, the error of the instrument, the installation error, the light source error and environmental disturbance. These errors of the measurement system itself and the environmental disturbances are considered as N factors (n₁, n₂, n₃ and so on), also known as N class factors when they represent the interference information.¹⁸ From a system point of view, both M elements and N factors can change the spectrum of the sample, and therefore the spectral data. In particular, some M elements and N factors should be completely reflected in the model. From a measurement error point of view, the measurement error caused by M elements and N factors is divided into systematic error and random error. If the error is systematic, the M elements or N factors have a clear correlation with the spectral data. If the error is random, there is no definite correlation between the M elements (or N factors) and the spectral data, so modelling cannot reduce its impact, although accuracy can still be improved by superposition averaging and normalization. “M + N” theory is based on modelling measurement, and is used in quantitative spectrometric analysis.¹⁰ The research group has further refined the theoretical connotation, which consists of the following four points.

The premise of “M + N” theory is that both M elements and N factors are measurable. Measurable means that if the M elements or N factors change, this will inevitably lead to the spectral data changing. The changes of spectral data reflect that some of the M elements or N factors are existent, and the data values represent the degree of the changes. Only when they are measured can they be considered as M elements or N factors. Some of the N factors cannot be reflected in the model, because their effect on the system generates random error, and is eliminated during the modelling. The relationship of M elements or N factors and spectral data is reflected in the model. No matter whether the relationship is non-linear or linear, it can be described in a model. M elements or N factors in a model are similar in mathematics.

The properties of M elements and N factors in the measurement system are equivalent, and it is only their effect mechanism and the weight that are different. The influence degree of M elements and N factors on the measurement accuracy is different, but the influence nature is equivalent. Measure error generated separately by them is system error and random error. The error generated can be given the corresponding weight in accordance with their influence on the measurement system. And a linear model and non-linear model can be established on the mechanism of influence. In addition, these models can eventually be attributed to the same relational model. For example, the impact of temperature and the instrument is different, and their relationship models are also different. But their models can be synthesized to be a model in the end. Similar to the effect of temperature on the absorbance of solution components, there must be other N class factors that have the same effect on one or more kinds of solution components.

Some of the N factors must have impact on some of the M elements, and the impact can be expressed in the model. Known from the “M + N” theory, N factors are all factors which affect the light absorption, and some of them have impact on the physical or chemical properties of some M elements. From the energy point of view, these N factors change the capacity of light absorption, and the real spectra of the sample. Due to these N factors, as the physical properties or chemical properties of the sample change, the relation model is also accordingly changed. When the external environment changes, the influence degree of the N factors also changes, and the spectra are eventually changed.

The roles of M elements and N factors in a measurement can be exchanged under different environments and measurement requirements. When the measurement environment changes, the composition of the N factors also changes: some become pronounced and some become weak. For example, if the environment is held at a constant temperature, the impact of temperature on the measurement is weak, and the N factors need not include temperature. When the measurement environment changes, some of the M elements become background components, and the error caused by the influence of some N factors on the M class elements changes from systematic error to random error. For instance, when a new sample is measured, the degree of absorption, scattering and reflection of light by the current component differs from that of the previous sample. The type and size of the final error change, and the error may change from random to systematic or from systematic to random, so the form of the relationship changes. In a small error range the relationship is linear, but in a large error range it is non-linear.

Applying the “M + N” theory to the measurement, it is possible to change the adverse effects of N factors into an advantage to improve the measurement accuracy. Lambert–Beer’s law is the basic theory of spectrometric analysis. The molar concentration and molar absorption coefficient reflect the properties of the sample, so they belong to the M class element. The pathlength belongs to the N class factors. The effect of the three quantities on the absorbance is equal.


A = lg(I₀/I) = εbc	(1)

where A is absorbance, I is transmitted light intensity, I₀ is incident light intensity, ε is the molar absorption coefficient, b is the pathlength, and c is molar concentration.
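As a rough illustration of why the pathlength matters as much as the concentration in Lambert–Beer's law, the sketch below assumes an ideal, purely absorbing sample in which absorbance is the product of ε, b and c. The numerical values of ε and c and the function names are hypothetical and chosen only for the example. If the concentration is recovered using a nominal pathlength while the true pathlength differs, the relative concentration error equals the relative pathlength error.

```python
import math


def absorbance_from_intensity(i0, i):
    """Absorbance from incident (i0) and transmitted (i) intensity: A = lg(I0 / I)."""
    return math.log10(i0 / i)


def absorbance(epsilon, b, c):
    """Ideal Lambert-Beer absorbance: A = epsilon * b * c."""
    return epsilon * b * c


def relative_concentration_error(b_actual, b_nominal, epsilon=0.05, c_true=2.0):
    """Relative error in c when A is measured at b_actual but inverted with b_nominal.

    epsilon and c_true are hypothetical illustration values, not data from the paper.
    """
    a = absorbance(epsilon, b_actual, c_true)
    c_estimated = a / (epsilon * b_nominal)
    return (c_estimated - c_true) / c_true


if __name__ == "__main__":
    nominal = 8.2  # mm, the centre pathlength used in the experiment
    for b in (7.8, 8.0, 8.2, 8.4, 8.6):
        err = relative_concentration_error(b, nominal)
        print(f"b = {b:.1f} mm -> relative concentration error = {err:+.2%}")
```

In a strongly scattering sample the real dependence on pathlength is not this simple, which is why the linear picture is only a starting point.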

At present, in spectrometric detection of the concentration of solution components, the pathlength is the only controlled interference quantity. It is fixed precisely after strict calculation and regarded as an optimal constant. The pathlength error is a composite quantity that mainly contains the thickness difference of the sample pool, the position error of the sample pool, the installation error of the light source, and the geometric dimension error of the fixture. These errors are difficult to measure and adjust accurately, so these practices alone cannot reduce the error to a satisfactory level. This study explores, by experiment, methods to reduce the loss of measurement accuracy caused by pathlength error, and in doing so identifies how N factors can be applied.

3. Experimental

3.1 Instruments

A supercontinuum laser (NKT Photonics, SuperK™ Compact, Denmark) was employed as the light source. All NIR spectra were measured from 1000 nm to 2000 nm using an AvaSpec-NIR256-2.5(TEC) near-infrared spectrometer (Avantes, Apeldoorn, the Netherlands). The spectra are digitized at ca. 3 nm intervals, resulting in 256 data points. Each sample was scanned for 200 ms, and the mean spectrum from ten measurements was used for analysis. The solution is held in an 18 × 18 × 16 mm quartz plate. The schematic diagram of the experimental equipment is shown in Fig. 1. The incident optical fiber of the spectrometer is inserted into the solution. The laser light source is fixed at the 2 mm position from the plate bottom, and the light transmits through the solution. The incident optical fiber is fixed on a precision sliding platform and moved by controlled micro-displacement. The NIR spectra were collected with AvaSoft 7.6 software. To avoid interference from ambient light, the setup was shaded during the experiment.


	Fig. 1 Schematic diagram of the multi-pathlength measurement system.

3.2 The samples and measurements

The initial sample is 30% intra-lipids produced by Huarui Pharmaceutical Co. Ltd. Because they have strong scattering properties, the spectra changed obviously with the variation of the pathlength. There are twelve samples of different concentrations from 30% to 19%, and the spacing is 1%. The initial position of the incident optical fiber is at a 4.4 mm distance from the bottom of the quartz plate. There are five pathlengths from 7.8 mm to 8.6 mm, and the spacing is 0.2 mm.

3.3 Pretreatment of the optical data

The spectral data were pretreated with a normalized method, and modelled with partial least squares regression (PLSR). The normalization process is used to eliminate the noise, reduce the difference of the absorbance, and restrain the influence of non-linearity. The sample processing scheme is shown in Table 1. In order to observe the effects of pathlength error on the spectrum measurement, the 8.2 mm location was selected as the center. The deviations are ±0.2 mm and ±0.4 mm, and the pathlengths are 7.8 mm, 8.0 mm, 8.2 mm, 8.4 mm and 8.6 mm. The spectrum data of position 8.2 mm are used as the calibration set, and the spectrum data of five pathlengths are prediction sets in schemes 1–5 (Fig. 2). The prediction effect of a single sample is observed. These experiment methods can be a simulation of the pathlength error condition. The multi-pathlength sample distribution is shown in schemes 6–8, the number of calibration sets is less than the number of prediction sets in scheme 6 and their numbers are equal in scheme 7, and the number of calibration sets is greater than the number of prediction sets in scheme 8 (Fig. 2).
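The text does not specify which normalization was applied. As one possible reading, the sketch below scales each spectrum to unit Euclidean length; this choice, the function name `vector_normalize` and the spectrum values are assumptions made for illustration. Under an ideal Lambert–Beer response this step removes a purely multiplicative pathlength factor, but it cannot remove the non-linear, scattering-related changes that the experiment targets.

```python
import math


def vector_normalize(spectrum):
    """Scale a spectrum (list of absorbance values) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / norm for v in spectrum]


# Hypothetical absorbance shape; the same shape scaled by two pathlengths (mm).
base_shape = [0.12, 0.30, 0.45, 0.20]
short_path = [7.8 * v for v in base_shape]
long_path = [8.6 * v for v in base_shape]

normalized_short = vector_normalize(short_path)
normalized_long = vector_normalize(long_path)

if __name__ == "__main__":
    # Prints near-identical rows: the multiplicative factor has been removed.
    print([round(v, 6) for v in normalized_short])
    print([round(v, 6) for v in normalized_long])
```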

Table 1 The experiment scheme

Scheme	Calibration set	Prediction set
1	8.2 mm	7.8 mm
2	8.2 mm	8.0 mm
3	8.2 mm	8.2 mm
4	8.2 mm	8.4 mm
5	8.2 mm	8.6 mm
6	8.0 mm and 8.2 mm	5 pathlengths
7	5 pathlengths	5 pathlengths
8	5 pathlengths	8.0 mm and 8.2 mm


	Fig. 2 Sample distribution of eight experiment schemes.
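The eight schemes in Table 1 can be written down as a small lookup table, which makes the calibration and prediction splits explicit. The sketch below assumes that each of the twelve concentrations was measured at each of the five pathlengths (60 spectra in total); the source does not state this layout explicitly, and the names `SCHEMES`, `sample_keys` and `split` are hypothetical.

```python
PATHLENGTHS_MM = (7.8, 8.0, 8.2, 8.4, 8.6)
CONCENTRATIONS_PCT = tuple(range(30, 18, -1))  # 30 %, 29 %, ..., 19 %

# scheme number -> (calibration pathlengths, prediction pathlengths), from Table 1
SCHEMES = {
    1: ((8.2,), (7.8,)),
    2: ((8.2,), (8.0,)),
    3: ((8.2,), (8.2,)),
    4: ((8.2,), (8.4,)),
    5: ((8.2,), (8.6,)),
    6: ((8.0, 8.2), PATHLENGTHS_MM),
    7: (PATHLENGTHS_MM, PATHLENGTHS_MM),
    8: (PATHLENGTHS_MM, (8.0, 8.2)),
}


def sample_keys(pathlengths):
    """All (pathlength, concentration) pairs for the given pathlengths.

    Assumes a full grid: every concentration measured at every pathlength.
    """
    return [(b, c) for b in pathlengths for c in CONCENTRATIONS_PCT]


def split(scheme):
    """Return (calibration keys, prediction keys) for a scheme number."""
    calibration, prediction = SCHEMES[scheme]
    return sample_keys(calibration), sample_keys(prediction)


if __name__ == "__main__":
    for number in sorted(SCHEMES):
        cal, pred = split(number)
        print(f"scheme {number}: {len(cal)} calibration / {len(pred)} prediction samples")
```

Under this layout, scheme 8 has 60 calibration samples against 24 prediction samples, a ratio of 2.5.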

4. Results and discussion

The spectra of 25% intra-lipids at 7.8 mm, 8.0 mm, 8.2 mm, 8.4 mm, and 8.6 mm are shown in Fig. 3. The relationship between the spectra and the pathlength was non-linear, and the spacing between adjacent spectra increased as the pathlength decreased.


	Fig. 3 Spectra of 25% concentration intra-lipids at five pathlengths.

The purpose of the experiment is to observe the influence of pathlength error on the prediction accuracy, and the effect of multi-pathlength modelling on improving the prediction accuracy. Previous studies showed that partial least squares regression (PLSR) is a better method for handling spectral data.¹¹ The following parameters characterizing the predictive ability of the eight schemes were computed: (i) number of principal components, N (not to be confused with the N factors); (ii) correlation coefficient of the calibration set, R_c²; (iii) correlation coefficient of the prediction set, R_p²; and (iv) root mean square error of prediction, RMSEP. The results of the analysis are shown in Tables 2 and 3.
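The source does not give formulas for these figures of merit. A minimal sketch under common conventions is shown below: RMSEP as the root of the mean squared difference between reference and predicted concentrations, and R² as the squared Pearson correlation between them. The latter is an assumption, since other definitions of R² exist, and the function names and example numbers are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Squared Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


if __name__ == "__main__":
    reference = [30.0, 29.0, 28.0, 27.0]   # made-up concentrations (%)
    predicted = [29.5, 29.5, 27.5, 27.5]   # made-up model output (%)
    print("RMSEP =", rmsep(reference, predicted))
    print("R^2   =", r_squared(reference, predicted))
```

RMSEP carries the units of the predicted quantity (here, percent concentration), so values are comparable across schemes only because all schemes predict the same quantity.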

Table 2 The experiment results of schemes 1–5

Parameter	Scheme 1	Scheme 2	Scheme 3	Scheme 4	Scheme 5
N	2	2	1	2	2
R_c²	0.9661	0.9661	1	0.9661	0.9661
R_p²	0.5248	0.9592	0.9598	0.9602	0.9516
RMSEP	4.9836	1.3576	0.6104	1.4105	1.9995

Table 3 The experiment results of different sample distributions

Parameter	Scheme 2	Scheme 4	Scheme 6	Scheme 7	Scheme 3	Scheme 8
N	2	2	7	4	1	7
R_c²	0.9661	0.9661	0.9988	0.9374	1	0.9926
R_p²	0.9592	0.9602	0.8997	0.9206	0.9598	0.9922
RMSEP	1.3576	1.4105	1.1796	1.0101	0.6104	0.2992

In quantitative spectrometric analysis, the model information includes relational characteristic information and range information. Relational characteristic information refers to the characteristics of the relationship between the two parts connected by the model. Range information refers to the range of uncertainty in the calibration set, which determines the scope of the model to be applied. The spectral information contains deterministic information and uncertainty information. Deterministic information is the real spectrum characteristic information, and the uncertainty information is the background information which is superimposed on the sample spectrum. Uncertain information is composed of some of M elements and N factors. The range information determines the robustness of the model.

4.1 The decrease of prediction accuracy for pathlength error

The correlation coefficient R_c² reaches 0.9661 in Table 2 because the schemes share the same calibration set. The correlation coefficient R_p² was very high except for scheme 1. The R_p² value of scheme 1 is lower than that of the others, because its pathlength is the shortest and the scattering phenomenon is more serious. These two correlation coefficients show that the sets were reasonable.

The RMSEP of scheme 3 is the smallest; those of schemes 2 and 4 are larger than that of scheme 3, and those of schemes 1 and 5 are larger still. Schemes 1, 2, 4, and 5 represent error states of scheme 3, and the prediction accuracy became worse as the pathlength error increased.

The component information of M elements and the interference information of N factors are the only two kinds of information in the relation model. When the sample was prepared, the component information of M elements becomes a deterministic information. So the customary method is to assume that some of the N factors were constant. The specific approach is to adjust the N factors to be the best states, and to keep them in the state precisely. The states were considered to be continuous. The following relation model is formed with the information of the first application level. There is only one kind of information in the model:


MI = RSI	(2)

where MI is all the information in the relation model, and RSI is the relational information of the N factors. Some of the N class factors include the light source voltage, light intensity of stable light, ambient temperature, pathlength, sample plate thickness, and so on. The effect of the pathlength error on accuracy is shown in Table 2, and the error always existed. There needs to be more ways to increase the accuracy.

4.2 Calibration samples covering all variation to increase accuracy

The correlation coefficients R_c² and R_p² are high in Table 3, and the relationship between the spectrum and the concentration was expressed by the calibration model and the prediction model. But R_p² of scheme 6 is lower than that of the others (0.8997 in Table 3). The reason is that its calibration set covers fewer pathlengths than its prediction set. The prediction accuracy was affected by whether the calibration samples covered at least as many kinds of samples as the whole sample set.

The calibration samples should be extensive enough to cover all cases that may occur in the samples. This rule was verified by experiment. There are three pieces of evidence in the results: (i) the RMSEP of scheme 8 is the lowest, because it has five kinds of calibration samples and two kinds of prediction samples, and the calibration set includes all kinds of samples; (ii) the RMSEP of schemes 7 and 3 is smaller than that of schemes 2, 4, and 6 but larger than that of scheme 8, because their calibration sample kinds were just equal to the kinds of all samples; and (iii) the RMSEP of scheme 6 is lower than that of schemes 2 and 4, because it has a larger number of calibration samples than those two do.

The RMSEP of scheme 7 is higher than that of scheme 3 in Table 3. The two schemes share a common feature: the calibration set is the same as the prediction set. But scheme 7 includes more kinds of samples than scheme 3, and the effect of scattering is more serious in the multi-pathlength condition. These two reasons led to the different measurement accuracy.

In most cases, it is difficult to measure and compensate for the error caused by some of the N factors precisely, such as filament voltage and lamp aging. So if the modelling set consists of the samples in all the range of N factors, and reflects all changes, then the trained model can respond to all states of N factors in the whole measurement process. Since the samples are complete, the prediction accuracy is relatively high. Expanding the scope of samples is equivalent to adding range information in the model. The uncertain information was changed into deterministic information, and the calibration model is trained by the full samples. The following relation model is for the application, and consists of the relational information and the range information:


MI = RSI + RGI	(3)

where RGI is the range information of the N factors. Compared with formula (2), as the number of samples becomes larger, the former contains more information of the N class factors. So the measurement accuracy is higher. In addition, if we want a higher accuracy, adding more information of N factors and M elements is an effective method.
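The effect of adding range information can be illustrated with a toy simulation. The sketch below is not the paper's data or software: it uses a made-up two-channel "spectrum" (one channel proportional to b·c, one proportional to b alone) and a small single-response PLS fit written with the NIPALS deflation scheme. All function names and the toy signal model are assumptions. A model calibrated only at 8.2 mm never sees the pathlength vary and so cannot correct for it, while a model calibrated over all five pathlengths can.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components, tol=1e-12):
    """Fit a single-response PLS model using NIPALS deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xk = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yk = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        w = [sum(Xk[i][j] * yk[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:  # no remaining covariance between X and y
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xk]          # scores
        tt = dot(t, t)
        p = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(yk, t) / tt
        Xk = [[Xk[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yk = [yk[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict by repeating the deflation steps on a new, centred sample."""
    x_mean, y_mean, components = model
    r = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(r, w)
        y_hat += q * t
        r = [ri - t * pi for ri, pi in zip(r, p)]
    return y_hat


def toy_spectrum(b, c):
    """Hypothetical two-channel signal: [b * c, b]."""
    return [b * c, b]


def calibrate(pathlengths, concentrations, n_components=2):
    X = [toy_spectrum(b, c) for b in pathlengths for c in concentrations]
    y = [float(c) for b in pathlengths for c in concentrations]
    return pls1_fit(X, y, n_components)


def rmsep(model, pathlengths, concentrations):
    sq = [(pls1_predict(model, toy_spectrum(b, c)) - c) ** 2
          for b in pathlengths for c in concentrations]
    return math.sqrt(sum(sq) / len(sq))


ALL_PATHLENGTHS = (7.8, 8.0, 8.2, 8.4, 8.6)   # mm
CONCENTRATIONS = tuple(range(19, 31))          # 19 % ... 30 %

single_model = calibrate((8.2,), CONCENTRATIONS)
multi_model = calibrate(ALL_PATHLENGTHS, CONCENTRATIONS)
rmsep_single = rmsep(single_model, ALL_PATHLENGTHS, CONCENTRATIONS)
rmsep_multi = rmsep(multi_model, ALL_PATHLENGTHS, CONCENTRATIONS)

if __name__ == "__main__":
    print(f"single-pathlength calibration RMSEP: {rmsep_single:.3f}")
    print(f"multi-pathlength calibration RMSEP:  {rmsep_multi:.3f}")
```

The toy model is linear in b·c, so the residual error of the multi-pathlength fit comes only from the curvature of c = (b·c)/b; real scattering spectra behave differently, and the numbers here should not be compared with Tables 2 and 3.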

4.3 Adding calibration samples to increase accuracy

The coefficient RMSEP of scheme 8 is smallest in Table 3. Its calibration samples are 2.5 times as much as the prediction samples, and the prediction accuracy of the model trained with large samples is much higher. A large number of samples ensures that the quantity of the sample is enough. Other N factors like the pathlength also have a similar feature. So expanding the sample scope is an effective method to achieve a higher accuracy than usual, on the basis of decreasing unit variation of N class factors purposefully.

4.4 Adding more information of N factors in the model to increase accuracy

If the model is simply established with the linear model defined by Lambert–Beer’s law, it cannot express the non-linear relation generated by M elements and N factors. At the same time the measurement influence from other factors except for the pathlength also cannot be expressed. Pathlength is the only one of the N class factors in Lambert–Beer’s law. Except for the pathlength more relational information is needed, and when the measurement accuracy is required to be higher than normal, the information includes both linear and non-linear information. The following relation model forms the information under this condition, and it consists of the relational information, the range information, and the feature information:


MI = RSI + RGI + FI	(4)

where FI is the feature information. For example, the temperature and pathlength were deliberately changed within a certain range, and the spectral data were obtained under different temperatures and pathlengths. The model information was then constituted by the spectral data together with the feature information of temperature and pathlength. Before measurement, experiments were designed according to this method. The precision of the model established with large and multiple types of samples was the highest.

The experiment proved that in terms of eliminating the influence of pathlength error, there were three methods used to improve measurement accuracy. According to the accuracy from a low to high level, the methods are calibration samples covering all variation, adding calibration samples, and adding more information of N factors in the model, respectively. These methods were used based on the “M + N” theory. Multi-pathlength modelling can be used to improve prediction accuracy.

5. Conclusions

The “M + N” theory is a measurement theory applied in quantitative spectrometric analysis, and suggests that all the components and their relationships should be considered in the measurement system. The aim of the theory is to improve the final measurement accuracy and to find more effective and convenient measurement methods. At present, measurement accuracy is usually improved by raising the accuracy of the instrument and by selecting algorithms, and a feasible alternative has been lacking. The methods of using N factors can significantly improve the accuracy.

On the basis of “M + N” theory, the connotation of the theory was presented, and three application methods of N factors for improving measurement precision were proposed. As an N factor, the pathlength was used as the experimental object to verify the effects of pathlength error on the measurement accuracy. The results show that the first two methods are effective, and the accuracy of the second method can meet general needs. The last one needs to be verified by experiments. In spectrometric detection, the pathlength is one of the key N factors, and represents the thickness of the sample pool, the position error of the incident light source and receiving optical fiber, and other uncertainty factors.

Acknowledgements

This project was supported by Tianjin Application Basis & Front Technology Study Programs (No. 14JCZDJC33100 and No. 11JCZDJC17100).

Notes and references

1. K. Maruo and Y. Yamada, J. Biomed. Opt., 2015, 20, 047003.
2. M. S. Bergholt, W. Zheng, K. Lin, K. Y. Ho, M. Teh and K. G. Yeoh, Biosens. Bioelectron., 2011, 26, 4104.
3. G. Bazar, R. Romvari, A. Szabo, T. Somogyi, V. Eles and R. Tsenkova, Food Chem., 2016, 194, 873.
4. S. Saranwong and S. Kawano, J. Near Infrared Spectrosc., 2008, 16, 389.
5. S. R. Delwiche and R. A. Graybosch, Talanta, 2016, 146, 496.
6. M. Zude, M. Pflanz, L. Spinelli, C. Dosche and A. Torricelli, J. Food Eng., 2011, 103, 68.
7. E. Gussakovsky, O. Jilkina, Y. Yang and V. Kupriyanov, Anal. Biochem., 2008, 382, 107.
8. H. Xu, S. Alur, Y. Wang, A.-J. Cheng, K. Kang, Y. Sharma, M. Park, C. Ahyi, J. Williams, C. Gu, A. Hanser, T. Paskova, E. A. Preble, K. R. Evans and Y. Zhou, In Situ Raman Analysis of a Bulk GaN-Based Schottky Rectifier Under Operation, J. Electron. Mater., 2010, 39(10), 2237–2242.
9. L. Miñambres, M. N. Sánchez, F. Castaño and F. J. Basterretxea, Infrared Spectroscopic Properties of Sodium Bromide Aerosols, J. Phys. Chem. A, 2008, 112(29), 6601–6608.
10. L. Gang, L. Zhe, L. Xiao-xia, L. Ling, Z. Bao-ju and W. We, Study on Effect of Source Voltage on the Accuracy of Quantitative Spectral Analysis Based on “M + N” Theory, Spectrosc. Spectral Anal., 2013, 33(6), 1457–1461.
11. I. V. Kovalenko, G. R. Rippke and C. R. Hurburgh, Determination of Amino Acid Composition of Soybeans (Glycine max) by Near-Infrared Spectroscopy, J. Agric. Food Chem., 2006, 54, 3485–3491.
12. X. Zhang and M. Chang, Infrared Phys. Technol., 2010, 53, 177.
13. T. Chen and E. Martin, J. Chemom., 2007, 21, 198.
14. P. S. Jensen and J. Bak, Appl. Spectrosc., 2002, 56, 1600.
15. T. Tanveer, J. H. Moore and S. G. Diamond, J. Biomed. Opt., 2013, 18, 056001.
16. W. Yahong, D. Daming, Z. Ping, Z. Wengang, Y. Song and W. Wenzhong, Spectrosc. Spectral Anal., 2014, 34, 2863.
17. L. Gang, Z. Zhe, L. Rui, W. Huiquan, W. Hongjie and L. Ling, Spectrosc. Spectral Anal., 2010, 30, 2381.
18. L. Gang, L. Zhe, W. Xiaofei, L. Gui-li and L. Ling, Journal of Beijing Information Science and Technology University, 2013, 28, 9.
