Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data

Roman M. Balabin; Sergey V. Smirnov

doi:10.1039/C2AN15972D

Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data†

Roman M. Balabin*^a and Sergey V. Smirnov^b

* Corresponding authors

^a Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich, Switzerland
E-mail: balabin@org.chem.ethz.ch
Tel: +41-44-632-4783

^b Unimilk Joint Stock Co., 143421 Moscow region, Russian Federation

Abstract

Modern analytical chemistry of industrial products is in need of rapid, robust, and cheap analytical methods to continuously monitor product quality parameters. For this reason, spectroscopic methods are often used to control the quality of industrial products in an on-line/in-line regime. Vibrational spectroscopy, including mid-infrared (MIR), Raman, and near-infrared (NIR), is one of the best ways to obtain information about the chemical structures and the quality coefficients of multicomponent mixtures. Together with chemometric algorithms and multivariate data analysis (MDA) methods, which were especially created for the analysis of complicated, noisy, and overlapping signals, NIR spectroscopy shows great results in terms of its accuracy, including classical prediction error, RMSEP. However, it is unclear whether the combined NIR + MDA methods are capable of dealing with much more complex interpolation or extrapolation problems that are inevitably present in real-world applications. In the current study, we try to make a rather general comparison of linear, such as partial least squares or projection to latent structures (PLS); “quasi-nonlinear”, such as the polynomial version of PLS (Poly-PLS); and intrinsically non-linear, such as artificial neural networks (ANNs), support vector regression (SVR), and least-squares support vector machines (LS-SVM/LSSVM), regression methods in terms of their robustness. As a measure of robustness, we will try to estimate their accuracy when solving interpolation and extrapolation problems. Petroleum and biofuel (biodiesel) systems were chosen as representative examples of real-world samples. Six very different chemical systems that differed in complexity, composition, structure, and properties were studied; these systems were gasoline, ethanol–gasoline biofuel, diesel fuel, aromatic solutions of petroleum macromolecules, petroleum resins in benzene, and biodiesel. Eighteen different sample sets were used in total. General conclusions are made about the applicability of ANN- and SVM-based regression tools in the modern analytical chemistry. The effectiveness of different multivariate algorithms is different when going from classical accuracy to robustness. Neural networks, which are capable of producing very accurate results with respect to classical RMSEP, are not able to solve interpolation problems or, especially, extrapolation problems. The chemometric methods that are based on the support vector machine (SVM) ideology are capable of solving both classical regression and interpolation/extrapolation tasks.

Analyst

Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data†

Abstract

Supplementary files

Article information

Download Citation

Permissions

Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data

Social activity

Search articles by author

Spotlight

Advertisements