A green method for the quantification of polysaccharides in Dendrobium officinale

Yong-Huan Yun; Yang-Chao Wei; Xing-Bing Zhao; Wei-Jia Wu; Yi-Zeng Liang; Hong-Mei Lu

doi:10.1039/C5RA21795D


DOI: 10.1039/C5RA21795D (Paper) RSC Adv., 2015, 5, 105057-105065


Yong-Huan Yun†^a, Yang-Chao Wei†^a, Xing-Bing Zhao^b, Wei-Jia Wu^b, Yi-Zeng Liang^a and Hong-Mei Lu*^a
^aCollege of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China. E-mail: hongmeilu@csu.edu.cn; Tel: +86 731 88830831
^bHunan Longshishan Dendrobium Candidum Wall.ex Lindl Base Co., Ltd, Changsha 410205, PR China

Received 19th October 2015, Accepted 18th November 2015

First published on 26th November 2015

Abstract

Polysaccharides are among the active components of Dendrobium officinale (D. officinale), and their content is one of the main quality assessment criteria. Existing methods for polysaccharide quantification destroy the sample, involve tedious sample processing, are costly and require environmentally unfriendly pretreatment. The aim of this study was to develop a simple, rapid, green and nondestructive analytical method based on near infrared (NIR) spectroscopy and chemometrics. A set of 84 D. officinale samples from different origins was analyzed using NIR spectroscopy. Potential outlying samples were first removed from the NIR data in two steps using the Monte Carlo sampling (MCS) method. Spectral preprocessing methods were then compared for the construction of a partial least squares (PLS) model. To eliminate uninformative variables and improve model performance, several wavelength selection methods were applied to the pretreated full spectrum: competitive adaptive reweighted sampling (CARS), Monte Carlo-uninformative variable elimination (MC-UVE) and interval random frog (iRF). The models built on the selected wavelengths (1) improved prediction performance, (2) used fewer variables and (3) were easier to understand and interpret, which shows that wavelength selection is necessary in NIR analytical systems. Among the three wavelength selection methods, CARS performed best, with the lowest root mean square error of prediction (RMSEP) on the independent test set and the fewest latent variables (nLVs). This study demonstrates that NIR spectroscopy combined with the CARS wavelength selection algorithm can be used successfully to quantify the polysaccharide content of D. officinale.

1. Introduction

Dendrobium officinale (D. officinale) is one of the most precious and famous traditional Chinese medicinal materials in China. It is claimed to have the functions of maintaining gastric tonicity, nourishing Yin and enhancing the production of body fluid.^1,2 It has also been used as a therapeutic agent for cataracts, throat inflammation, fever and chronic superficial gastritis.³ Many studies have suggested that these properties are related to its polysaccharides, one of the main active components of D. officinale.^4–7

The polysaccharide content is one of the quality assessment criteria in the Chinese Pharmacopoeia (no less than 0.2500 g of glucose per g dry weight).⁸ This content varies with geographical origin and harvest time. So far, polysaccharides in D. officinale have mainly been quantified with colorimetric methods, such as the phenol-sulphuric acid method or the anthrone-sulphuric acid method. However, because these methods require harsh conditions (high temperature and a strong acid), they destroy the sample, involve tedious sample processing, are costly and rely on environmentally unfriendly pretreatment. A simple, rapid, green and nondestructive analytical technique for determining the polysaccharide content of D. officinale is therefore in great demand.

Near infrared (NIR) spectroscopy is a rapid, green, cost-effective and nondestructive analytical technique that has been widely applied to qualitative and quantitative analysis in agriculture, pharmaceuticals, polymer production and food quality evaluation.^9–18 Recently, NIR spectroscopy has also been employed to study traditional Chinese herbs,¹⁹ and several studies on the quantitative analysis of total polysaccharides by NIR have been reported.^20–22 NIR spectra probe chemical structure through molecular bonds that absorb in the NIR region (e.g. C–H, N–H and O–H, the primary structural components of organic molecules); the characteristic spectra consist of overtones and combinations of vibrations attributable to the make-up of the molecules.²³ NIR spectroscopy has gained wide acceptance in many fields because it is efficient, economical and easy to operate and, most notably, because it can record spectra of solid and liquid samples without any sample preparation. However, NIR data usually suffer from collinearity because the absorption bands are broad and strongly overlapped.²⁴ Partial least squares (PLS)²⁵ regression has been proposed to build calibration models that cope with this problem. Such calibration models usually use all of the measured wavelengths, yet a full spectrum model may contain useless or irrelevant information that can worsen its predictive ability. Liang et al. have demonstrated the importance and necessity of wavelength selection in NIR analytical systems,^26,27 and many other studies have shown that wavelength selection improves prediction performance.^28–31 The aims of wavelength selection can be summarized in three points: (1) improving the prediction performance of the calibration model, (2) providing faster and more cost-effective predictors by mitigating the curse of dimensionality, and (3) providing a better understanding and interpretation of the underlying process that generated the data.^32,33

In this work, we first established a PLS calibration model between the NIR full spectrum data of D. officinale and its polysaccharide content. We then compared the prediction results of the full spectrum model with those of three recent and often-used wavelength selection methods: competitive adaptive reweighted sampling (CARS),³⁴ Monte Carlo-uninformative variable elimination (MC-UVE)³⁵ and interval random frog (iRF).³⁶ Finally, the best wavelength selection method was chosen on the basis of prediction performance and model complexity to develop a calibration model for predicting the polysaccharide content of D. officinale.

2. Materials and methods

2.1. Sample collection and reagents

A total of 84 D. officinale samples were collected from different locations in China between April 2012 and April 2014 (Table 1). They were intended to provide a representative set of the D. officinale consumed in China, with enough variation to make the quantitative model robust. Analytical grade D-glucose was purchased from Sigma-Aldrich (St. Louis, MO, USA). Water was purified with a Milli-Q Academic water purification system (Milford, MA, USA). Guaranteed reagent grade sulphuric acid was purchased from Sinopharm Chemical Reagent Co., Ltd. (Shanghai, China). Other reagents, including phenol and ethanol, were of analytical grade.

Table 1 D. officinale sample information

Sample no.	Origin	Collection time
1–6	Yunnan	Feb. 2013–Mar. 2013
7–12	Zhejiang	Apr. 2012–Oct. 2012
13–14	Hunan	Sep. 2012–Jul. 2013
15–16	Zhejiang	Jul. 2013–Aug. 2013
17–20	Henan	Jul. 2013–Aug. 2013
21–32	Hunan	Dec. 2013
33–49	Hunan	Feb. 2014
50–53	Yunnan	Feb. 2014
54–61	Yunnan	Mar. 2013
62–67	Zhejiang	Apr. 2012–Jul. 2012
68–84	Hunan	Apr. 2014

2.2. Sample preparation and quantitative analysis

All samples were dried at 55 °C in a forced-draught oven (Shanghai Pharmacy Machine Co., Shanghai, China). After soil dust had been brushed off the surface, the samples were ground into a fine powder with a blender and passed through a 60-mesh sieve (particle size ≤ 0.2 mm). The sieved powders were used for further analysis.

The polysaccharide content of D. officinale was first measured with the phenol-sulphuric acid method of the Chinese Pharmacopoeia (State Pharmacopoeia Committee 2010). To this end, a glucose calibration curve was prepared. Glucose (0.255 g), dried to constant weight at 105 °C, was placed in a 250 ml volumetric flask, and water was added to obtain a 100 μg ml⁻¹ solution. Aliquots of 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0 ml of this glucose solution were accurately transferred into 10 ml test tubes with lids and made up to 1 ml with water. Then 1 ml of a 5% phenol solution was added and mixed, 5.0 ml of sulphuric acid was quickly added, and the mixture was shaken, heated in a 90 °C water bath for 20 min and then cooled in an ice bath for 5 min. Absorbance spectra were measured with a BTT miniature array spectrophotometer (B&W Tek, Newark, DE, USA) equipped with glass or quartz cells of 1 cm path length. The spectrometer was controlled, and data were collected, with BWSpec4 software on a Lenovo personal computer. Absorbance was recorded at 488.02 nm, and the calibration curve was constructed by regressing absorbance against glucose concentration.
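The calibration step amounts to an ordinary least-squares line through the six standards. The sketch below, assuming NumPy is available, fits absorbance against the amount of glucose in each tube (0–100 μg, since 0.0–1.0 ml of a 100 μg ml⁻¹ solution is used) and reports R². The absorbance values are hypothetical placeholders for illustration, not data from this study.

```python
import numpy as np


def fit_calibration(glucose_ug, absorbance):
    """Least-squares line A = slope * m + intercept and its coefficient of determination."""
    glucose_ug = np.asarray(glucose_ug, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    slope, intercept = np.polyfit(glucose_ug, absorbance, 1)
    predicted = slope * glucose_ug + intercept
    ss_res = np.sum((absorbance - predicted) ** 2)
    ss_tot = np.sum((absorbance - absorbance.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Standards: 0.0-1.0 ml of a 100 ug/ml glucose solution -> 0-100 ug glucose per tube
volumes_ml = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
glucose_ug = volumes_ml * 100.0
# Hypothetical absorbances at 488 nm (illustrative only, not measured values)
absorbance = np.array([0.001, 0.190, 0.377, 0.567, 0.753, 0.943])

slope, intercept, r2 = fit_calibration(glucose_ug, absorbance)
print(f"A = {slope:.4f} * m + {intercept:.4f}, R^2 = {r2:.5f}")
```

With real readings, the fitted slope and intercept play the role of the calibration equation reported in Section 3.1.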

The polysaccharide content of each sample was measured as follows. An accurately weighed portion (0.3 g) of powdered D. officinale was refluxed for 2 h with 200 ml of water in a standard reflux apparatus. The extract was cooled to room temperature, transferred to a 250 ml volumetric flask, made up to the mark with water, shaken and filtered. Then 2 ml of the filtrate was precipitated with 10 ml of ethanol at 4 °C and centrifuged for 30 min at 4000 rpm. The precipitate was washed twice with 8 ml of 80% ethanol. The precipitate obtained after filtration was dissolved in water and transferred to a 25 ml volumetric flask. The subsequent colour development and absorbance measurement followed the procedure used for the glucose calibration curve. Results were expressed as grams of glucose equivalents per gram of dry weight (g glucose per g DW) using the glucose calibration curve. Each sample was analyzed in triplicate, and the mean of the three measurements was used for further analysis.
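Converting an absorbance reading into g glucose per g DW involves inverting the calibration line and undoing the dilution chain described above (0.3 g sample extracted to 250 ml; 2 ml of filtrate precipitated; precipitate made up to 25 ml). The Python sketch below uses the calibration equation reported in Section 3.1. Two inputs are assumptions not stated in the text: that X in that equation is micrograms of glucose in the reacted aliquot, and the volume of that aliquot (a hypothetical default of 1 ml). The weighed powder is also assumed to be on a dry-weight basis.

```python
# Glucose calibration equation from Section 3.1: A = SLOPE * X + INTERCEPT
SLOPE, INTERCEPT = 0.0094, 0.0016


def polysaccharide_content(absorbance, sample_mass_g, aliquot_ml=1.0):
    """g glucose equivalents per g DW for one sample (hypothetical helper).

    Assumptions: X in the calibration equation is micrograms of glucose in the
    reacted aliquot, and `aliquot_ml` (not stated in the text) of the 25 ml
    redissolved precipitate is used for colour development.
    """
    glucose_ug = (absorbance - INTERCEPT) / SLOPE      # invert the calibration line
    ug_in_flask = glucose_ug * (25.0 / aliquot_ml)     # aliquot -> whole 25 ml flask
    ug_in_extract = ug_in_flask * (250.0 / 2.0)        # 2 ml taken from the 250 ml extract
    return ug_in_extract * 1e-6 / sample_mass_g        # ug -> g, per g of dried powder


content = polysaccharide_content(0.5, 0.3)
print(f"{content:.4f} g glucose per g DW; meets 0.2500 limit: {content >= 0.25}")
```

Under these assumptions an absorbance of 0.5 for a 0.3 g sample corresponds to about 0.55 g glucose per g DW, which lies inside the range reported in Table 2.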

2.3. NIR spectroscopy measurements

Using the Integrating Sphere module of the Antaris II Fourier transform near infrared (FT-NIR) analyzer (Thermo Scientific, Madison, USA), NIR diffuse reflectance spectra were collected from 10 000 to 4000 cm⁻¹ (1557 data points). Gold foil was used as the reference. Each sample was scanned 32 times at a resolution of 8 cm⁻¹ against an air background, and the 32 scans were averaged into one spectrum. The ambient temperature was kept at 25 ± 1 °C with an air conditioner.

A standard sample cup, the standard accessory sample holder designed by Thermo Electron Co., was used to collect the spectra of the D. officinale samples. About 0.5 g of powdered sample was filled into the cup following the standard procedure. To reduce errors caused by sample inhomogeneity, the cup was rotated by 120° after each measurement and another spectrum was recorded, so that each sample was measured three times. The mean of the three spectra from the same sample was used in the subsequent analysis.

A set of 84 D. officinale samples from different origins in China was analyzed using NIR spectroscopy. Their raw spectra are shown in Fig. 1(a).


	Fig. 1 (a) Raw NIR spectra of the 84 D. officinale samples; (b) spectra of the 75 D. officinale samples preprocessed with SNV + SG 1st derivative.

2.4. Outlier detection and spectral data preprocessing

Constructing a high-quality model involves several steps, one of the most important of which is outlier detection; it should be carried out before the calibration model is established. Outliers are samples that are abnormal in some sense: they may be non-representative and can introduce large errors into a model. In this work, the Monte Carlo sampling (MCS) method was used for outlier detection. This method distinguishes three types of outliers.³⁷ The first type lies in the direction of the dependent variable y: such samples deviate from the distribution of y and produce a large error sum of squares. The second type lies in the direction of the predictor (independent) variables x: such samples are far from the main body of the data. The third type, outliers towards the model, can be found only after the regression model has been built, because these samples follow a different relationship between x and y. In the MCS method, the number of latent variables (nLVs) was first determined by cross-validation in PLS. The whole data set was then repeatedly and randomly divided into a calibration set and a test set. In each repetition, a PLS model with the optimal nLVs was built on the calibration set and used to predict the test set, giving a prediction error for each test sample. This cycle was repeated 1000 times, yielding a distribution of prediction errors for each sample. Histograms of these distributions were plotted, and their statistical features (the mean and standard deviation of the prediction errors) were used to detect the outliers.
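The MCS procedure can be sketched in Python with NumPy as follows. The PLS model is a plain NIPALS PLS1 written out so that the example is self-contained (it is not the authors' MATLAB code), and the 80% calibration fraction is an assumption because the text does not state the split ratio. Which mean or standard deviation values count as outlying is left to inspection, as in Fig. 2.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """NIPALS PLS1. Returns (b, x_mean, y_mean) with y_hat = (X - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)               # unit weight vector
        t = E @ w                               # scores
        p, qa = E.T @ t / (t @ t), (f @ t) / (t @ t)   # X and y loadings
        E, f = E - np.outer(t, p), f - qa * t   # deflate before the next component
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean


def mcs_error_stats(X, y, n_lv, n_runs=1000, cal_fraction=0.8, seed=0):
    """Mean and SD of each sample's prediction error over the runs in which it was a test sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_cal = int(round(cal_fraction * n))       # assumed split ratio
    errors = [[] for _ in range(n)]
    for _ in range(n_runs):
        perm = rng.permutation(n)
        cal, test = perm[:n_cal], perm[n_cal:]
        b, x_mean, y_mean = pls1_coef(X[cal], y[cal], n_lv)
        residuals = y[test] - ((X[test] - x_mean) @ b + y_mean)
        for i, e in zip(test, residuals):
            errors[i].append(e)
    mean_err = np.array([np.mean(e) for e in errors])
    sd_err = np.array([np.std(e) for e in errors])
    return mean_err, sd_err
```

A large absolute mean error suggests a y-direction outlier and a large standard deviation an x-direction outlier; removing flagged samples and calling the function again mirrors the two-step use described in Section 3.2.1.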

In addition to useful information, spectral signals contain systematic variation such as baseline shifts, sample background and light scattering.³⁸ To build a robust and reliable model, the spectra must be preprocessed to reduce or eliminate these interferences. In this study, eight signal pretreatment methods were evaluated and compared, including multiplicative scatter correction (MSC), standard normal variate (SNV), first and second derivatives computed with the Savitzky–Golay (S–G) method, and combinations of MSC (or SNV) with the derivatives. MSC corrects for light scattering caused by differences in particle size and is also used to correct additive and multiplicative effects in the spectra. SNV is a transformation of the log(1/R) spectra that removes slope variation and corrects for scatter effects.^39,40 The first derivative removes constant baseline offsets, the second derivative additionally removes linear baseline drift, and both reduce peak overlap. Derivatives are therefore often used to eliminate baseline drift and enhance small spectral differences between samples.⁴¹
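The preprocessing building blocks compared here can be written compactly with NumPy. The sketch below is illustrative: MSC uses the mean spectrum as its reference (a common choice, assumed here), and the Savitzky–Golay first derivative is computed from a local least-squares polynomial fit, with edge points handled by differentiating the polynomial fitted to the first or last full window. The default 11-point window and 3rd-order polynomial follow the settings reported in Section 3.2.2.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative scatter correction; the mean spectrum is the default reference."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)   # model: x ~ intercept + slope * ref
        corrected[i] = (x - intercept) / slope
    return corrected


def sg_first_derivative(x, window=11, order=3):
    """Savitzky-Golay first derivative of one spectrum (unit point spacing)."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    k = np.arange(-half, half + 1)
    # Rows of pinv(A) map a window of points to polynomial coefficients a_0..a_order in k.
    proj = np.linalg.pinv(np.vander(k, order + 1, increasing=True))
    out = np.empty_like(x)
    # Interior points: the derivative at the window centre (k = 0) is the coefficient a_1.
    out[half:-half] = np.correlate(x, proj[1], mode="valid")
    # Edge points: differentiate the polynomial fitted to the first / last full window.
    edges = ((proj @ x[:window], k[:half], slice(0, half)),
             (proj @ x[-window:], k[half + 1:], slice(len(x) - half, None)))
    for a, kk, where in edges:
        out[where] = sum(m * a[m] * kk.astype(float) ** (m - 1) for m in range(1, order + 1))
    return out
```

Applying SNV and then the derivative row by row, `np.apply_along_axis(sg_first_derivative, 1, snv(X))`, gives an SNV + SG 1st pretreatment of a spectra matrix `X`.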

2.5. Multivariate calibration methods

2.5.1. Partial least squares (PLS) regression. PLS is a commonly used multivariate calibration method that models the relationship between the response vector y (the property of interest) and the spectral data matrix X. The data are compressed into orthogonal latent variables, which are similar to the principal components of principal component analysis (PCA) but are extracted to maximize covariance with y.^42,43 Here, PLS was used to build a regression model that predicts the concentration of a chemical constituent. By extracting features from spectra, PLS extends the potential applications of spectroscopic techniques in the food industry.⁴⁴
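To show what a PLS1 calibration computes, the following NumPy sketch implements the NIPALS algorithm for a single response. It mean-centres X and y, extracts one latent variable at a time, and converts the weights W, loadings P and y-loadings q into regression coefficients b = W(PᵀW)⁻¹q. This is a generic textbook implementation, not the authors' MATLAB code.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; returns the model tuple (b, x_mean, y_mean)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)      # weights: maximal covariance with the current y residual
        t = E @ w                      # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt               # X loadings
        qa = f @ t / tt                # y loading
        E = E - np.outer(t, p)         # deflate X and y before the next component
        f = f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, np.array(q))   # b = W (P^T W)^-1 q
    return b, x_mean, y_mean


def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ b + y_mean
```

When `n_lv` equals the number of (linearly independent) variables, PLS reduces to ordinary least squares; with fewer latent variables it gives up some fit in exchange for stability under collinearity, which is why nLVs is chosen by cross-validation.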

Three wavelength selection methods combined with PLS, namely competitive adaptive reweighted sampling (CARS), Monte Carlo-uninformative variable elimination (MC-UVE) and interval random frog (iRF), were compared for identifying the informative wavelengths.

CARS³⁴ is a variable selection algorithm inspired by the “survival of the fittest” principle of Darwin’s theory of evolution. Wavelengths with large absolute regression coefficients, as selected by CARS, are regarded as the key wavelengths. In each sampling run, CARS performs four successive steps: (1) Monte Carlo (MC) sampling to select modelling samples randomly; (2) forced removal of the wavelengths with relatively small absolute regression coefficients according to an exponentially decreasing function (EDF); (3) adaptive reweighted sampling (ARS) to realize a competitive selection among the remaining wavelengths; and (4) cross-validation to evaluate the resulting subset, so that the subset with the lowest root mean squared error of cross-validation (RMSECV) can finally be chosen. For CARS, the number of sampling runs was set to 100.
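The four CARS steps can be sketched as below. This is a simplified illustration, not the reference implementation: the 80% Monte Carlo sampling fraction, the 5-fold contiguous cross-validation, the floor of two retained variables and the compact NIPALS PLS1 helper are all assumptions. The EDF follows the usual CARS form r_i = a*exp(-k*i), with a and k chosen so that all p variables are kept in the first run and two in the last.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """Compact NIPALS PLS1 (mean-centred). Returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        p, qa = E.T @ t / (t @ t), (f @ t) / (t @ t)
        E, f = E - np.outer(t, p), f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean


def rmsecv(X, y, n_lv, n_folds=5):
    """RMSE of cross-validation with contiguous folds."""
    n = len(y)
    sq_err = 0.0
    for test in np.array_split(np.arange(n), n_folds):
        cal = np.setdiff1d(np.arange(n), test)
        b, x_mean, y_mean = pls1_coef(X[cal], y[cal], n_lv)
        sq_err += np.sum((y[test] - ((X[test] - x_mean) @ b + y_mean)) ** 2)
    return float(np.sqrt(sq_err / n))


def edf_ratios(p, n_runs):
    """EDF r_i = a * exp(-k * i): ratio 1 at run 1 and 2/p at the last run."""
    a = (p / 2) ** (1 / (n_runs - 1))
    k = np.log(p / 2) / (n_runs - 1)
    return a * np.exp(-k * np.arange(1, n_runs + 1))


def cars(X, y, n_lv, n_runs=100, mc_fraction=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kept = np.arange(p)
    best_subset, best_err = kept, np.inf
    for ratio in edf_ratios(p, n_runs):
        # (1) Monte Carlo sampling of modelling samples
        cal = rng.choice(n, size=int(mc_fraction * n), replace=False)
        b, _, _ = pls1_coef(X[cal][:, kept], y[cal], min(n_lv, len(kept)))
        weights = np.abs(b) / np.abs(b).sum()
        # (2) EDF: force-remove all but the n_keep largest |coefficients|
        n_keep = max(2, int(round(ratio * p)))
        top = np.argsort(weights)[::-1][:n_keep]
        # (3) ARS: weighted sampling with replacement among the survivors
        drawn = rng.choice(top, size=n_keep, replace=True, p=weights[top] / weights[top].sum())
        kept = kept[np.unique(drawn)]
        # (4) evaluate the subset by cross-validation and remember the best one
        err = rmsecv(X[:, kept], y, min(n_lv, len(kept)))
        if err < best_err:
            best_subset, best_err = kept, err
    return best_subset, best_err
```

Because ARS draws variables at random, calls with different seeds can return different subsets, which is why CARS is run many times and the best result is kept.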

MC-UVE³⁵ is a variable selection algorithm that combines a Monte Carlo (MC) strategy with the uninformative variable elimination (UVE) method. MC-UVE first builds a large number of PLS sub-models on randomly selected calibration samples and then evaluates each variable by the stability of its regression coefficient, i.e. the mean of the coefficient divided by its standard deviation across the sub-models. Variables with poor stability are regarded as uninformative and are eliminated. The number of MC sampling runs was set to 1000 in this study.
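The stability criterion can be illustrated with the following sketch, which collects PLS coefficients from many random sub-models and divides their mean by their standard deviation. The 80% sampling fraction and the compact NIPALS PLS1 helper are assumptions made for a self-contained example; how many top-ranked variables to keep (or which cutoff to apply) is a separate choice not shown here.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """Compact NIPALS PLS1 (mean-centred). Returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        p, qa = E.T @ t / (t @ t), (f @ t) / (t @ t)
        E, f = E - np.outer(t, p), f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean


def mc_uve_stability(X, y, n_lv, n_runs=1000, cal_fraction=0.8, seed=0):
    """Stability = mean / SD of each PLS coefficient over Monte Carlo sub-models."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_cal = int(round(cal_fraction * n))      # assumed sub-model size
    coefs = np.empty((n_runs, p))
    for r in range(n_runs):
        idx = rng.choice(n, size=n_cal, replace=False)
        coefs[r], _, _ = pls1_coef(X[idx], y[idx], n_lv)
    return coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)
```

Variables are then ranked by the absolute stability, and the low-ranked ones are discarded as uninformative.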

iRF³⁶ is a wavelength interval selection method that takes the continuity of spectra into account. It is based on random frog,⁴⁵ which searches the model space with a reversible jump Markov chain Monte Carlo (RJMCMC)-like algorithm, making both fixed-dimensional and trans-dimensional moves between models. Its objective is to find the subset with the maximum regression coefficients. The spectrum is first divided into sub-intervals by a moving window of fixed width, which yields all possible contiguous spectral intervals of that width. Each interval is treated as a single variable and input into the RJMCMC-like algorithm. A pseudo-MCMC chain is used to compute the selection probability of each interval, and the intervals are ranked by this probability. Finally, the top-ranked intervals giving the lowest RMSECV are chosen. In this work, with 1557 spectral points and an interval width of 20, there were 1538 intervals in total, each containing 20 variables.
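The interval bookkeeping behind iRF is easy to state in code. The sketch below enumerates every moving-window interval and maps a set of chosen intervals back to variable indices; it confirms that 1557 points and a width of 20 give 1538 intervals. The selection probabilities would come from the random frog chain, which is not reproduced here, so the hypothetical helper `top_intervals` simply ranks whatever probabilities it is given.

```python
def moving_window_intervals(n_points, width):
    """All contiguous intervals of `width` variables as (start, stop) pairs, 0-based, stop exclusive."""
    return [(start, start + width) for start in range(n_points - width + 1)]


def top_intervals(selection_prob, k):
    """Indices of the k intervals with the highest selection probability."""
    return sorted(range(len(selection_prob)), key=lambda i: selection_prob[i], reverse=True)[:k]


def variables_in(intervals, chosen):
    """Sorted union of the variable indices covered by the chosen intervals."""
    return sorted({v for i in chosen for v in range(*intervals[i])})


intervals = moving_window_intervals(1557, 20)
print(len(intervals), intervals[0], intervals[-1])
```

Because neighbouring windows overlap by 19 points, the union of two adjacent intervals covers only 21 variables, so the size of the final model depends on how much the chosen intervals overlap.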

2.6. Data division and model performance evaluation

After outlier detection and selection of the best pretreatment, the whole data set was divided into a calibration set and an independent test set, which were used to build and validate the model, respectively. To ensure that both sets were representative of the whole data set, the split was made with a procedure based on the Duplex algorithm.^46,47

In this work, a splitting ratio of 2:1 was used (50 samples formed the calibration set, and the remaining 25 samples served as the independent test set). The statistics of the polysaccharide content in the calibration and independent test sets are listed in Table 2. After the division, the content values in both sets covered a wide range, which helps in developing a robust model.
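A minimal Duplex sketch is shown below. It assumes Euclidean distances between spectra (rows of X), which the text does not specify, and handles the unequal 2:1 ratio by alternating between the two sets until the test set is full and then assigning all remaining samples to the calibration set; the procedure actually used by the authors may differ in these details.

```python
import numpy as np


def duplex_split(X, n_test):
    """Duplex-style split on Euclidean distances; returns (calibration indices, test indices)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    remaining = set(range(len(X)))

    def take_farthest_pair():
        idx = sorted(remaining)
        sub = D[np.ix_(idx, idx)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [idx[i], idx[j]]
        remaining.difference_update(pair)
        return pair

    def add_farthest(target):
        idx = sorted(remaining)
        # distance from each remaining sample to its nearest member of the target set
        nearest = D[np.ix_(idx, target)].min(axis=1)
        pick = idx[int(np.argmax(nearest))]
        target.append(pick)
        remaining.discard(pick)

    cal = take_farthest_pair()     # the two most distant samples seed the calibration set
    test = take_farthest_pair()    # the next most distant pair seeds the test set
    while remaining:
        add_farthest(cal)
        if remaining and len(test) < n_test:
            add_farthest(test)
    return np.array(cal), np.array(test)
```

With 75 samples and `n_test=25`, this yields a 50/25 split like the one used here.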

Table 2 Polysaccharide content of D. officinale measured with the phenol-sulphuric acid method and the number of samples in each data set

Data set	Number	Max (g glucose per g DW^b)	Min (g glucose per g DW)	Mean ± S.D^a (g glucose per g DW)
^a S.D. is the standard deviation. ^b DW is the dry weight.
Total	75	0.7063	0.1863	0.4006 ± 0.1329
Calibration set	50	0.7063	0.1863	0.4111 ± 0.1302
Test set	25	0.6952	0.1925	0.3796 ± 0.1385

The calibration set was used for building the PLS model and for wavelength selection, and the independent test set was used for external validation. The optimal nLVs for the calibration set were determined by 10-fold cross-validation, with the maximum nLVs set to 15. The resulting model was then used to predict the calibration set and the test set, giving the root mean squared error of fitting on the calibration set (RMSEC) and the root mean squared error of prediction on the independent test set (RMSEP). RMSEC, R²_cal, RMSEP and R²_pre (R² on the test set) were used to assess model performance, while RMSECV and R²_cv were used to select the spectral preprocessing method.
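The figures of merit and the choice of nLVs can be sketched as follows. RMSE and R² use their standard definitions (R² = 1 − SSE/SST); `select_nlv` performs 10-fold cross-validation on randomly permuted folds and returns the nLVs with the lowest RMSECV, capped at 15 as in the text. The fold assignment and the compact NIPALS PLS1 helper are assumptions for illustration.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """Compact NIPALS PLS1 (mean-centred). Returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        p, qa = E.T @ t / (t @ t), (f @ t) / (t @ t)
        E, f = E - np.outer(t, p), f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def select_nlv(X, y, max_lv=15, n_folds=10, seed=0):
    """K-fold CV over 1..max_lv latent variables; returns (best nLVs, list of RMSECV values)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    n_cal_min = n - max(len(fold) for fold in folds)
    max_lv = min(max_lv, X.shape[1], n_cal_min - 1)   # cannot exceed the data rank
    rmsecv_values = []
    for lv in range(1, max_lv + 1):
        y_cv = np.empty(n)
        for test in folds:
            cal = np.setdiff1d(np.arange(n), test)
            b, x_mean, y_mean = pls1_coef(X[cal], y[cal], lv)
            y_cv[test] = (X[test] - x_mean) @ b + y_mean
        rmsecv_values.append(rmse(y, y_cv))
    return int(np.argmin(rmsecv_values)) + 1, rmsecv_values
```

RMSEC and RMSEP are then `rmse` applied to the calibration-set and test-set predictions of the model built with the selected nLVs, and R²_cal and R²_pre follow from `r2` in the same way.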

2.7. Software

NIR spectra were collected with the Antaris II FT-NIR spectrometer using its spectral acquisition software, “Results”. The spectra were then imported directly into MATLAB (Version 2013A, The MathWorks, Inc.) on a general-purpose computer with an Intel® Core® i5 3.2 GHz CPU and 3 GB RAM running Microsoft Windows XP. Spectral preprocessing and multivariate calibration were implemented in MATLAB code, which can be downloaded freely from http://www.libpls.net/.

3. Results and discussion

3.1. Polysaccharide content measurements

The polysaccharide content of all 84 samples was determined with the reference method (see Section 2.2). The glucose calibration equation was Y = 0.0094X + 0.0016 (R² = 0.9998), showing a good linear relationship between absorbance and glucose content over 0.0–0.1 mg ml⁻¹. After outlier removal (see Section 3.2.1), 75 samples remained for PLS modelling. Their polysaccharide contents, calculated from the absorbance using the glucose calibration equation, are summarized in Table 2; the overall value was 0.4006 ± 0.1329 g glucose per g DW. Some samples contained less than 0.2500 g glucose per g DW, the minimum required by the pharmacopoeia, which underlines the need to monitor the quality of D. officinale.

3.2. Model building

3.2.1. Deletion of outlying samples. The results of outlier detection with the MCS method are shown in Fig. 2. In Fig. 2(a), the three samples in the top left area (12, 28 and 29) are outliers in the x direction, with a large standard deviation of prediction errors, and the lower right area contains two outliers in the y direction (57 and 70), with a large mean prediction error. Because the MCS results depend on random divisions of the samples, and strong outliers can mask weaker ones, the first pass may not reveal all of the outliers. To detect further potential outliers, the MCS method was therefore run again on the remaining samples. Fig. 2(b) shows the second-pass results, which again contain two types of outliers. The data are clearly divided into three groups, and each type of outlier clusters compactly. Two samples in the lower right area (69 and 71) are outliers in the y direction, and the two samples in the top right (27 and 47) are outliers in both the x and y directions. These four samples, which did not stand out in the first step, lie far from the main body of the data, with higher mean values or larger deviations of prediction errors. In this study, the MCS method was thus applied in two steps to reveal potential outliers. After removal of the nine outliers, the remaining 75 samples were used for the following analysis.


	Fig. 2 The results of the variance of residuals versus the mean of residuals for the polysaccharide content of D. officinale samples. (a) The first step of MCS; (b) the second step of MCS.

3.2.2. Selection of the spectral data preprocessing methods. PLS full spectrum models were developed with different data preprocessing methods. A 10-fold cross-validation on the whole sample set (75 samples) was used to select the nLVs and the most suitable preprocessing, judged by low RMSECV, high R²_cv and few nLVs. According to Table 3, SNV combined with the SG 1st derivative (11 points, 3rd order polynomial) gave an RMSECV of 0.0543 and an R²_cv of 0.8309 with only 6 nLVs. Smooth + MSC and SG 1st alone gave marginally lower RMSECV values (0.0539 and 0.0540) but required 11 and 12 nLVs, and MSC + SG 1st performed almost identically (0.0543, 0.8308, 6 nLVs). SNV + SG 1st was therefore chosen as the best compromise between accuracy and model complexity, which is consistent with the work in Ref. 21. When the original NIR spectra contain overlapping peaks, combining SNV with the 1st derivative is usually useful to enhance resolution and to correct for scatter and baseline effects: SNV corrects multiplicative scatter effects, while the SG 1st derivative removes additive baseline offsets. The preprocessed spectra are shown in Fig. 1(b). Most of the preprocessed values are close to zero, and the overlapping peaks and baseline effects have been removed. Differences between samples appear mainly around 4000–4300 cm⁻¹ and 5750 cm⁻¹.

Table 3 The 10-fold cross-validation results using PLS with different data preprocessing methods

Pretreatment	nLVs	RMSECV	R²_cv
Original	14	0.0558	0.8211
Smooth + MSC	11	0.0539	0.8330
Smooth + SNV	6	0.0585	0.8036
SG 1st	12	0.0540	0.8330
SG 2nd	4	0.0651	0.7571
MSC + SG 1st	6	0.0543	0.8308
MSC + SG 2nd	6	0.0619	0.7800
SNV + SG 1st	6	0.0543	0.8309
SNV + SG 2nd	6	0.0619	0.7802

3.2.3. Full spectrum and wavelength selection models. The NIR full spectral data contained 1557 variables. The full spectrum calibration model was developed on the calibration set and then used to predict the independent test set. In addition, iRF, CARS and MC-UVE were employed to select wavelengths. Because these methods rely on Monte Carlo sampling and therefore give different results in each run, each was run 100 times and the best result was retained.

Compared with the full spectrum model, a model built on selected wavelengths should meet three criteria: (1) improved prediction performance; (2) fewer wavelengths; (3) better understanding and interpretation. The calibration and validation results of the full spectrum and wavelength selection models are shown in Table 4.

Table 4 Results of D. officinale polysaccharide content using PLS models based on different wavelength selection methods

	Full spectrum	CARS	MC-UVE	iRF
a N.W is the number of wavelengths.
N.W^a	1557	39	339	364
nLVs	10	8	10	9
RMSECV	0.0549	0.0156	0.0260	0.0423
R²_cv	0.8397	0.9872	0.9640	0.9048
RMSEC	0.0101	0.0096	0.0010	0.0025
R²_cal	0.9946	0.9952	0.9999	0.9997
RMSEP	0.0542	0.0468	0.0533	0.0486
R²_pre	0.7978	0.8495	0.8044	0.8373

For the full spectrum model, RMSEP and R²_pre were 0.0542 and 0.7978, respectively, with 10 nLVs. All of the wavelength selection methods outperform the full spectrum PLS model in terms of RMSEP and R²_pre, which satisfies the first criterion of improved prediction performance; CARS and iRF also require fewer nLVs (8 and 9), whereas MC-UVE uses the same number as the full spectrum model (10). Moreover, CARS, MC-UVE and iRF selected 39, 339 and 364 wavelengths, respectively, far fewer than the 1557 wavelengths of the full spectrum. This indicates that eliminating uninformative and irrelevant variables yields models with good prediction performance.
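The relative gains in Table 4 can be made explicit with a few lines of arithmetic on the reported values; no new data are involved.

```python
# RMSEP and number of wavelengths copied from Table 4
rmsep = {"Full spectrum": 0.0542, "CARS": 0.0468, "MC-UVE": 0.0533, "iRF": 0.0486}
n_wavelengths = {"Full spectrum": 1557, "CARS": 39, "MC-UVE": 339, "iRF": 364}


def relative_change(new, old):
    """Percentage change of `new` with respect to `old`."""
    return 100.0 * (new - old) / old


for method in ("CARS", "MC-UVE", "iRF"):
    d_rmsep = relative_change(rmsep[method], rmsep["Full spectrum"])
    kept = 100.0 * n_wavelengths[method] / n_wavelengths["Full spectrum"]
    print(f"{method}: RMSEP {d_rmsep:+.1f}%, wavelengths kept {kept:.1f}%")
```

This prints an RMSEP reduction of 13.7% for CARS using 2.5% of the wavelengths, 1.7% for MC-UVE using 21.8%, and 10.3% for iRF using 23.4%.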

CARS and MC-UVE are discrete wavelength selection methods, whereas iRF is a wavelength interval selection method; all three are based on PLS regression coefficients. We do not aim to establish whether discrete wavelength selection or interval selection is better in general, since the performance of wavelength selection methods is data dependent. For the determination of the polysaccharide content of D. officinale, the comparison of the three methods indicates that CARS achieves the best prediction performance, with the lowest RMSEP and the highest R²_pre. CARS also uses the fewest nLVs, indicating that it produces the most parsimonious PLS model. A likely reason is that the full spectral data contain many irrelevant variables, and CARS is effective at eliminating uninformative variables and improving the predictive precision of the model. Driven by the exponentially decreasing function, CARS removes a large number of wavelengths rapidly in the first stage and then refines the selection in later runs. Although CARS runs fast, its results vary from run to run, so it should be run many times to obtain the best result.

Polysaccharides are carbohydrates and therefore contain aliphatic cyclic groups with attached OH groups and ether linkages. To understand and interpret the wavelengths chosen by each selection method, the selected variables are displayed in Fig. 3. The wavelengths selected by MC-UVE are widely scattered, which may explain why MC-UVE performs only slightly better than the full spectrum model. CARS and iRF share many selected regions. As CARS performs best in this work, the interpretation focuses on the wavelengths selected by CARS, which are concentrated mainly in the regions 4000–4200 cm⁻¹, 4300–4450 cm⁻¹, 4700–5250 cm⁻¹, 5750–7300 cm⁻¹, 7900–8950 cm⁻¹ and 9000–10 000 cm⁻¹. The absorption at 4000–4200 cm⁻¹ is related to C–H stretching and a C–C and C–O–C stretching combination.⁴⁸ The absorption at 4300–4450 cm⁻¹ corresponds to a combination of C–H stretching and CH₂ deformation, while that at 4700–5100 cm⁻¹ corresponds to O–H bending, O–H stretching, a C–O stretching combination and an HOH bending combination.⁴⁸ The absorption at 5750–7300 cm⁻¹ is related to the first overtone of C–H stretching.⁴⁸ The absorption at 7900–8950 cm⁻¹ could be attributed to the first overtone of O–H in polysaccharides,⁴⁹ while that at 9000–10 000 cm⁻¹ corresponds to the second overtone of O–H.⁵⁰
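Readers more used to wavelength units can convert the CARS regions with λ(nm) = 10⁷/ν̃(cm⁻¹). The short sketch below does this for the regions listed above; it is a unit conversion only and adds no new band assignments.

```python
def wavenumber_to_nm(nu_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm: lambda = 1e7 / nu."""
    return 1e7 / nu_cm1


# Regions selected by CARS, in cm^-1 (from the text)
cars_regions_cm1 = [(4000, 4200), (4300, 4450), (4700, 5250),
                    (5750, 7300), (7900, 8950), (9000, 10000)]

for low, high in cars_regions_cm1:
    # the higher wavenumber gives the shorter wavelength
    print(f"{low}-{high} cm-1 -> {wavenumber_to_nm(high):.0f}-{wavenumber_to_nm(low):.0f} nm")
```

For example, 4000–4200 cm⁻¹ corresponds to about 2381–2500 nm and 9000–10 000 cm⁻¹ to about 1000–1111 nm.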


	Fig. 3 The distribution of the selected variables obtained using different wavelength selection methods.

These results show that wavelength selection is necessary and important in multivariate calibration for NIR analytical systems.

Fig. 4 shows the correlation between the values determined with the phenol-sulphuric acid method and the values predicted with the NIR full spectrum model (Fig. 4(a)) and the CARS model (Fig. 4(b)). The blue and red circles correspond to the calibration and independent test sets, respectively, and the diagonal line represents ideal agreement: the closer the points lie to the diagonal, the better the model. The samples lie closer to the diagonal in Fig. 4(b), indicating the better predictive performance of the CARS model. These results demonstrate the feasibility of using NIR spectroscopy combined with CARS to determine the polysaccharide content of D. officinale.


	Fig. 4 Correlation between the predicted and measured polysaccharide content based on (a) the full spectrum PLS model; (b) the 39 wavelengths selected by CARS.

4. Conclusions

In this study, a rapid, cost-effective and nondestructive technique, NIR spectroscopy, coupled with the multivariate calibration method PLS, was demonstrated for determining the polysaccharide content of D. officinale. The complete workflow, comprising outlier detection, data preprocessing and calibration model building, was described. Compared with the full spectrum model, three recent and often-used wavelength selection methods (MC-UVE, CARS and iRF) gave better prediction performance, used fewer variables and yielded selected wavelengths that were easier to understand and interpret. Wavelength selection is therefore necessary when building multivariate calibration models for NIR analytical systems. Among the three methods, CARS performed best, with the lowest RMSEP, the highest R²_pre and the fewest latent variables.

NIR spectroscopy could therefore provide a fast and green alternative to classical reference methods, as it dramatically reduces analysis time and requires no chemical reagents. The established method is expected to significantly improve the efficiency of quality control. Future work will develop similar NIR calibration models coupled with the CARS algorithm to predict the content of other components in D. officinale, such as alkaloids, sesquiterpenoids and aromatic compounds. More attention should also be paid to the robustness of the calibration models, by collecting more samples and evaluating more wavelength selection methods.

Acknowledgements

The authors gratefully thank the National Natural Science Foundation of China for support of the projects (No. 21175157, 21375151 and 21275164), the Hunan Provincial Science and Technology Department, China, for support of the project (No. 2012FJ4139), Central South University for special support of the basic scientific research project (No. 2010QZZD007), and the Fundamental Research Funds for the Central Universities of Central South University (Grant No. 2014zzts014).

References

X. Chen, F. Wang, Y. Wang, X. Li, A. Wang, C. Wang and S. Guo, Sci. China: Life Sci., 2012, 55, 1092–1099.
T. Ng, J. Liu, J. Wong, X. Ye, S. Wing Sze, Y. Tong and K. Zhang, Appl. Microbiol. Biotechnol., 2012, 93, 1795–1803.
X. Q. Zha, J. P. Luo and P. Wei, S. Afr. J. Bot., 2009, 75, 276–282.
X. Xing, S. W. Cui, S. Nie, G. O. Phillips, H. Douglas Goff and Q. Wang, Bioact. Carbohydr. Diet. Fibre, 2013, 1, 131–147.
L. Xia, X. Liu, H. Guo, H. Zhang, J. Zhu and F. Ren, J. Funct. Foods, 2012, 4, 294–301.
L.-H. Pan, X.-F. Li, M.-N. Wang, X.-Q. Zha, X.-F. Yang, Z.-J. Liu, Y.-B. Luo and J.-P. Luo, Int. J. Biol. Macromol., 2014, 64, 420–427.
L.-Z. Meng, G.-P. Lv, D.-J. Hu, K.-L. Cheong, J. Xie, J. Zhao and S.-P. Li, Molecules, 2013, 18, 5779–5791.
S. P. Committee, Chinese Pharmacopoeia, People’s Medical Publishing House, Beijing, 2010.
Y. Ozaki, W. F. McClure and A. A. Christy, Near-infrared spectroscopy in food science and technology, John Wiley & Sons, 2006.
P. Williams and K. Norris, Near-infrared technology in the agricultural and food industries, American Association of Cereal Chemists, Inc., 1987.
O. Escuredo, M. Carmen Seijo, J. Salvador and M. Inmaculada González-Martín, Food Chem., 2013, 141, 3409–3414.
M. J. de la Haba, A. Garrido-Varo, J. E. Guerrero-Ginel and D. C. Pérez-Marín, J. Agric. Food Chem., 2006, 54, 7703–7709.
J. Sarembaud, G. Platero and M. Feinberg, Food Anal. Methods, 2008, 1, 227–235.
E. Fernández-Ahumada, A. Garrido-Varo, J. Guerrero-Ginel, A. Wubbels, C. van der Sluis and J. van der Meer, J. Near Infrared Spectrosc., 2006, 14, 27–35.
L. León, A. Garrido-Varo and G. Downey, J. Agric. Food Chem., 2004, 52, 4957–4962.
M. Manley, G. du Toit and P. Geladi, Anal. Chim. Acta, 2011, 686, 64–75.
D. Wu, H. Shi, S. Wang, Y. He, Y. Bao and K. Liu, Anal. Chim. Acta, 2012, 726, 57–66.
W. Li, Y. Wang and H. Qu, Vib. Spectrosc., 2012, 62, 159–164.
Q. Luo, Y. Yun, W. Fan, J. Huang, L. Zhang, B. Deng and H. Lu, RSC Adv., 2015, 5, 5046–5052.
H. Yan, B.-X. Han, Q.-Y. Wu, M.-Z. Jiang and Z.-Z. Gui, Spectrochim. Acta, Part A, 2011, 79, 179–184.
Y. Chen, M. Xie, H. Zhang, Y. Wang, S. Nie and C. Li, Food Chem., 2012, 135, 268–275.
Y. Wei, W. Fan, X. Zhao, W. Wu and H. Lu, Anal. Lett., 2014, 48, 817–829.
J. R. Lucio-Gutiérrez, J. Coello and S. Maspoch, Food Res. Int., 2011, 44, 557–565.
M. Blanco, J. Cruz and M. Bautista, Anal. Bioanal. Chem., 2008, 392, 1367–1372.
S. Wold, M. Sjöström and L. Eriksson, Chemom. Intell. Lab. Syst., 2001, 58, 109–130.
H.-D. Li, Y.-Z. Liang, X.-X. Long, Y.-H. Yun and Q.-S. Xu, Chemom. Intell. Lab. Syst., 2013, 122, 23–30.
Y.-H. Yun, Y.-Z. Liang, G.-X. Xie, H.-D. Li, D.-S. Cao and Q.-S. Xu, Analyst, 2013, 138, 6412–6421.
J. H. Kalivas, N. Roberts and J. M. Sutter, Anal. Chem., 1989, 61, 2024–2030.
D. Jouan-Rimbaud, B. Walczak, D. L. Massart, I. R. Last and K. A. Prebble, Anal. Chim. Acta, 1995, 304, 285–295.
L. Xu and I. Schechter, Anal. Chem., 1996, 68, 2392–2400.
C. H. Spiegelman, M. J. McShane, M. J. Goetz, M. Motamedi, Q. L. Yue and G. L. Coté, Anal. Chem., 1998, 70, 35–44.
A. Lorber and B. R. Kowalski, J. Chemom., 1988, 2, 67–79.
I. Guyon and A. Elisseeff, J. Mach. Learn. Res., 2003, 3, 1157–1182.
H. Li, Y. Liang, Q. Xu and D. Cao, Anal. Chim. Acta, 2009, 648, 77–84.
W. Cai, Y. Li and X. Shao, Chemom. Intell. Lab. Syst., 2008, 90, 188–194.
Y. H. Yun, H. D. Li, L. R. E. Wood, W. Fan, J. J. Wang, D. S. Cao, Q. S. Xu and Y. Z. Liang, Spectrochim. Acta, Part A, 2013, 111, 31–36.
D. S. Cao, Y. Z. Liang, Q. S. Xu, H. D. Li and X. Chen, J. Comput. Chem., 2010, 31, 592–602.
T. Naes, T. Isaksson, T. Fearn and T. Davies, A user friendly guide to multivariate calibration and classification, NIR Publications, 2002.
Y. He, X. Li and X. Deng, J. Food Eng., 2007, 79, 1238–1242.
Q. Chen, J. Zhao and H. Lin, Spectrochim. Acta, Part A, 2009, 72, 845–850.
Q. Chen, J. Zhao, C. Fang and D. Wang, Spectrochim. Acta, Part A, 2007, 66, 568–574.
W. Dong, Y. Ni and S. Kokot, J. Agric. Food Chem., 2013, 61, 540–546.
D. Wu, Y. He, P. Nie, F. Cao and Y. Bao, Anal. Chim. Acta, 2010, 659, 229–237.
L. Xie, X. Ye, D. Liu and Y. Ying, Food Res. Int., 2011, 44, 2198–2204.
H.-D. Li, Q.-S. Xu and Y.-Z. Liang, Anal. Chim. Acta, 2012, 740, 20–26.
R. D. Snee, Technometrics, 1977, 19, 415–428.
M. Bevilacqua, R. Bucci, A. D. Magrì, A. L. Magrì and F. Marini, Anal. Chim. Acta, 2012, 717, 39–51.
J. Workman Jr and L. Weyer, Practical guide to interpretive near-infrared spectroscopy, CRC Press, 2007.
The Technology of Modern Near Infrared Spectral Analysis, ed. W. Z. Lu, H. F. Yuan, G. T. Xu and D. M. Qiang, China Petrochemical Press, Beijing, 2000.
X. B. Zou, J. W. Zhao, M. J. W. Povey, M. Holmes and H. P. Mao, Anal. Chim. Acta, 2010, 667, 14–32.

Footnote

† The first two authors contributed equally to this work.
