Sebastian Mirz*, Robin Groessle and Alexander Kraus
Karlsruhe Institute of Technology, PO Box 3640, 76021 Karlsruhe, Germany. E-mail: sebastian.mirz@kit.edu; Tel: +49 721 608 26694
First published on 3rd June 2019
Spectral pre-processing, especially baseline approximation, is a crucial part of quantitative spectroscopic applications, such as Raman or FTIR spectroscopy. Filters used for this task need to be optimized for their application in order to achieve a sufficient baseline approximation while minimizing the distortion of the spectral lines. We propose a combined method that optimizes a rolling circle filter and quantifies the residual systematic influence on the spectral lines by a Monte Carlo approach that simulates and subsequently analyses spectra with known line properties and known maximum baseline curvature.
An excellent overview of baseline-removal techniques is given by H. G. Schulze et al.,5 covering e.g. polynomial or spline approximation methods, band-pass filters and derivative methods. Fully automated pre-processing methods have been implemented using the Savitzky–Golay6 or second derivative methods.7 The Savitzky–Golay filter8 is a smoothing filter that obtains results similar to a moving average. Therefore, the filter tends to deliver uneven baselines. Modifications to improve this behaviour, e.g. as proposed by H. G. Schulze et al.,6 are time-consuming, as are the smoothing splines used by the second derivative method presented by C. Rowlands and S. Elliott.7
In this work, we use a geometrical rolling circle filter for baseline estimation. This filter was originally published by I. K. Mikhailyuk and A. P. Razzhivin.9 They demonstrated its application to 1-dimensional data in the form of Raman spectra and chromatograms and 2-dimensional electrophoresis patterns. The rolling circle filter for baseline-removal is commonly used in spectroscopic techniques based on Raman spectroscopy, such as composition analysis via micro-Raman spectroscopy of algae10,11 or the investigation of the structural phase transition of vanadium dioxide.12 For the continuous in-line concentration monitoring of hydrogen isotopologue mixtures, James et al.13 implemented a fully automatic spectral analysis based on a slightly modified rolling circle filter in combination with a Savitzky–Golay filter. Our application is the development of a real-time and inline measurement system for the concentration of liquid hydrogen isotopologues for the integration in a cryogenic distillation column in the fusion fuel cycle.3,4,14 We use the original rolling circle filter and apply it on the FTIR spectra of liquid hydrogen isotopologues, which delivers a sufficient baseline-removal without the necessity of an additional Savitzky–Golay filter for the characteristic background curvatures and peak shapes of these spectra.
In FTIR spectroscopy, a division by a reference spectrum, recorded with only the solvent in the measuring cell, is typically used for baseline removal. However, in the case of gases or pure liquids as the sample, a smooth baseline remains to be removed after this division, which implies the use of a filter for pre-processing before the actual analysis. The parameters of the filter need to be optimized, depending on its application, to minimize the distortion of the spectral lines and preserve the important parameters for spectral analysis, namely the peak position, peak intensity (height or integral) and peak width. Since this optimization cannot completely avoid the distortion effects, the systematic influence of the filter on the peak parameters needs to be known, to be used either as a systematic uncertainty or for correction. We present a method, demonstrated on the example of a rolling circle filter, that both optimizes the filter and quantifies its systematic influence on the peak parameters, namely width, intensity and position.
Since the circle radius strongly defines this baseline and needs to be adapted to the curvature of the input spectrum, it is indispensable to optimize this parameter depending on the application of the filter.
This plateau is created by the following effects:
• For small radii r, the filter-circle rolls into the signal peaks, therefore χ2 is small.
• For intermediate radii r (the plateau), the filter eliminates the background. Its radius is larger than the peak width, but smaller than the background curvature, therefore χ2 increases only slowly with increasing filter radius.
• For large radii r, the baseline detaches from the spectrum, as the filter radius becomes larger than the spectrum's curvature, leading to a rapid increase in χ2.
We also use the rolling circle filter for the presented optimization; however, we apply three modifications in comparison to Mikhailyuk and Razzhivin.
First, we use the rolling circle filter for transmission spectra in absorption spectroscopy and therefore negative signals.
Second, we normalize the spectra to their value at the wavenumber 2500 cm−1.
Third, we use an ellipse instead of a circle and therefore characterize our filter by two parameters. This allows us to minimize the calculation time, which depends on the size of the arrays holding the spectrum and the circle shape.
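As a rough illustration of such a filter, the sketch below treats the rolling ellipse as a morphological closing (for transmission dips) or opening (for positive peaks) with a half-ellipse structuring element. This is one common way to realize a rolling-circle baseline and is an assumption here, not necessarily the implementation of Mikhailyuk and Razzhivin or of this work. The function name `rolling_ellipse_baseline`, the uniform grid spacing `dx`, the edge padding and the synthetic test spectrum are likewise illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ellipse_baseline(y, dx, r, r_t, dips=True):
    """Baseline traced by an ellipse with semi-axes r (abscissa) and r_t (ordinate).

    dips=True  : the ellipse rolls on top of the spectrum (transmission dips).
    dips=False : the ellipse rolls underneath the spectrum (positive peaks).
    Assumes a uniformly sampled spectrum with spacing dx.
    """
    y = np.asarray(y, dtype=float)
    k = max(1, int(np.floor(r / dx)))              # ellipse half-width in samples
    u = np.arange(-k, k + 1) * dx
    cap = r_t * np.sqrt(np.clip(1.0 - (u / r) ** 2, 0.0, None))  # half-ellipse profile

    sign = -1.0 if dips else 1.0
    z = sign * y                                   # flip so the ellipse always rolls underneath

    # Erosion: highest admissible centre height of the ellipse at each sample.
    windows = sliding_window_view(np.pad(z, k, mode="edge"), 2 * k + 1)
    centre = np.min(windows - cap, axis=1)

    # Dilation: upper envelope of all admissible ellipses.
    windows = sliding_window_view(np.pad(centre, k, mode="edge"), 2 * k + 1)
    envelope = np.max(windows + cap, axis=1)
    return sign * envelope


# Synthetic transmission spectrum: smooth background with one Gaussian dip.
x = np.arange(0.0, 2000.0, 1.0)
background = 1.0 + 0.05 * np.sin(x / 300.0)
spectrum = background - 0.3 * np.exp(-0.5 * ((x - 1000.0) / 5.0) ** 2)
baseline = rolling_ellipse_baseline(spectrum, dx=1.0, r=100.0, r_t=1.0)
```

In this sketch the curvature at the vertex of the ellipse is `r_t / r**2`, so a smaller `r_t` can be traded against a smaller `r`, and the window of `2 * k + 1` samples shrinks accordingly.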
Since this extends our optimization problem to the two dimensions r and rT, χ2 is represented by a matrix (dimension n × m). We can improve the search for kinks and the plateau in this matrix by analyzing the norm of the Hessian matrix corresponding to each element of the χ2 matrix. The Hessian matrices (dimension 2 × 2) are calculated for each element of the χ2 matrix from its second partial derivatives with respect to r and rT. We then calculate the sub-multiplicative max-norm for each Hessian matrix. This results in an (n − 2) × (m − 2) matrix sensitive to the edges of the plateau in the χ2 matrix.
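The step from the χ2 matrix to the matrix of Hessian norms can be sketched as follows. Two points are assumptions, since the text does not spell them out: the second derivatives are approximated by central finite differences (which is what reduces the size to (n − 2) × (m − 2)), and the sub-multiplicative max-norm of a 2 × 2 matrix is taken as 2·max|h_ij|. The function name `hessian_norm` and the grid spacings `dr` and `dt` are hypothetical.

```python
import numpy as np


def hessian_norm(chi2, dr=1.0, dt=1.0):
    """Norm of the 2 x 2 Hessian at every interior element of a chi-squared matrix.

    chi2[i, j] is assumed to be sampled on a regular grid with spacing dr along
    the first axis (radius r) and dt along the second axis (radius r_T).
    """
    c = np.asarray(chi2, dtype=float)
    centre = c[1:-1, 1:-1]
    # Central second differences along each axis.
    h_rr = (c[2:, 1:-1] - 2.0 * centre + c[:-2, 1:-1]) / dr**2
    h_tt = (c[1:-1, 2:] - 2.0 * centre + c[1:-1, :-2]) / dt**2
    # Mixed derivative from the four diagonal neighbours.
    h_rt = (c[2:, 2:] - c[2:, :-2] - c[:-2, 2:] + c[:-2, :-2]) / (4.0 * dr * dt)
    # Sub-multiplicative max-norm of a 2 x 2 matrix: 2 * max |h_ij|.
    return 2.0 * np.maximum.reduce([np.abs(h_rr), np.abs(h_tt), np.abs(h_rt)])


# Quadratic test surface chi2 = r^2 + 3 r t with Hessian [[2, 3], [3, 0]].
r_grid, t_grid = np.meshgrid(np.arange(6.0), np.arange(5.0), indexing="ij")
norms = hessian_norm(r_grid**2 + 3.0 * r_grid * t_grid)
```

For the quadratic test surface the finite differences are exact, so every entry of `norms` equals 2 · 3 = 6. On a real χ2 matrix, large entries mark kinks such as the plateau edges.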
Parameter | Min. | Max. |
---|---|---|
r (cm−1) | 100 | 9000 |
rT | 0.1 | 25 |
ν̃ (cm−1) | 5800 | 12000 |
Fig. 1 shows χ2 and the corresponding norms of the Hessian matrices for these parameter intervals. The edges of the plateau, which depend quadratically on the ν̃- and T-radius, are clearly visible in the norm of the Hessian matrix. To illustrate the determination of the ideal rolling circle filter radius, Fig. 2 shows the χ2 dependency on the radius r for fixed radii rT. The previously discussed plateau in the χ2 dependency is visible between ν̃-radii of approximately 500 cm−1 and 1350 cm−1 for rT = 4.0, and between 500 cm−1 and 2000 cm−1 for rT = 8.0. This shows that a smaller T-radius can be compensated by a correspondingly smaller radius in the ν̃ direction.
Fig. 1 The χ2 matrix (a) and the corresponding norms of the Hessian matrices of the χ2 matrix ‖H‖ (b). |
For the radius rT = 4.0, χ2 shows a second plateau with its right edge at approximately r = 6000. To investigate its origin, we filtered a spectrum used for this optimization with different filter radii, see Fig. 3. The baselines determined in this way begin to detach from the peak edges at the filter radii of r = 2500 and rT = 8.0.
The baseline created with the radii of r = 6000 and rT = 4.0 corresponds to the right edge of the second plateau in the χ2 dependency at a fixed rT = 4.0. For this larger filter ellipse, the baseline completely detaches from the peaks between ν̃ = 5800 cm−1 and ν̃ = 6800 cm−1 and between ν̃ = 7800 cm−1 and ν̃ = 9100 cm−1. At these radii, the rolling circle filter is unable to create a baseline that compensates for the curvature that is caused by the transmission function of the KBr beam-splitter used in the FTIR spectrometer for the recording of this spectrum.
In conclusion, the right edge of the left plateau with the radii r = 2000 and rT = 8.0, or similarly r = 1350 and rT = 4.0, serves as an ideal radius according to our optimization procedure.
First, the optimal filter parameters enable the determination of a baseline for the filtered spectrum that minimizes the distortion of the spectral lines.
Second, as seen in Fig. 3, the optimal filter parameters prevent a detachment of the baseline from the spectrum in regions with high curvature. Depending on the spectrum, the optimal filter parameters are not necessarily represented by the global maximum, but rather by local maxima, in the matrix of the norms of the Hessian matrices of the χ2 matrix, and therefore, a careful selection of filter parameters is necessary.
Third, regarding the rolling circle filter, the optimal parameter curve in the ν̃-T radius space enables us to choose a small radius and therefore allows us to minimize the calculation time.
In a real spectrum, not only the line width and intensity but also the baseline's curvature varies. Therefore, a simple optimization of a filter for spectral pre-processing cannot be performed on a single line, but has to be obtained for the spectral part that is the subject of the investigation. However, the result of such an optimization is a parameter set that is only optimal for an average line. Some lines are therefore filtered perfectly; for some lines, the baseline does not touch the peak edges in the case of neighboring lines; and for some lines, the baseline already subtracts part of the actual peak area. The first effect is minimized by the division of the sample spectrum by a reference spectrum, in case the baseline's curvature is caused by an instrumental effect. The second effect, however, cannot be prevented if a single filter parameter set is used for lines with different widths. Therefore, the quantification of this effect is indispensable if the line absorbance and width are the subject of the spectroscopic investigation.
To study the systematic influence of the filter on these line parameters, we implemented a method based on the simulation of random peaks on a random background. We set the constraint that the parameters defining the background and peaks are chosen randomly, but within a defined range. These spectra are then treated with the rolling circle filter and the peaks are analyzed to extract their parameters after filtering. There are three main reasons to rely on statistics, rather than doing a specific analysis for this simulation.
First, the correlations between the peak and background parameters and the influence of the filter on these are unknown beforehand. Therefore, since it is not known which input parameters lead to which result, direct non-randomized analysis is not possible.
Second, this method is intended as a general method that is applicable to more than one filter. For a specific filter, there might be specific methods to investigate possible correlations; a universal method, however, needs to rely on statistics.
Third, usually in a spectroscopic application, the final filtered spectrum is the result that is used to determine the peak width, integral and position. Therefore, the systematic influence on the peak shape must be known on the basis of the already filtered peak. A method based on statistics provides suitable tools to achieve this.
Line | μ (cm−1) | I | σ |
---|---|---|---|
L35 | 6644.8 | 0.008 | 6.7 |
L43 | 7156.9 | 0.018 | 3.0 |
L52 | 8858.6 | 0.038 | 20.1 |
L62 | 8868.3 | 0.064 | 7.4 |
Parameter | Minimum | Maximum |
---|---|---|
κ | — | 2.15 × 10−6 |
μ | 6050 | 9750 |
σ | 2 | 50 |
I | 0.001 | 1 |
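Drawing the peak parameters from these ranges could look like the sketch below. The text only states that the parameters are chosen randomly within a defined range, so the uniform distribution is an assumption, and the names `RANGES`, `KAPPA_MAX` and `draw_peak_parameters` are hypothetical.

```python
import numpy as np

# Ranges copied from the table above (position, width, intensity).
RANGES = {"mu": (6050.0, 9750.0), "sigma": (2.0, 50.0), "intensity": (0.001, 1.0)}
KAPPA_MAX = 2.15e-6  # maximum absolute background curvature from the same table


def draw_peak_parameters(n, rng=None):
    """Draw n independent parameter sets, uniformly within each range (assumed)."""
    rng = np.random.default_rng(rng)
    return {name: rng.uniform(low, high, n) for name, (low, high) in RANGES.items()}


params = draw_peak_parameters(1000, rng=1)
```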
We place the simulated peaks on a random cubic spline background defined only by its maximum curvature. This kind of background b(ν̃) is generated by randomly choosing curvature values with a given maximum absolute value. These curvature values are interpolated with cubic splines, resulting in the curvature κ(ν̃). The background then follows from the non-linear second-order differential equation
$b''(\tilde{\nu}) = \kappa(\tilde{\nu})\,\bigl(1 + b'(\tilde{\nu})^{2}\bigr)^{3/2}$ | (1) |
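A minimal sketch of this background generation is given below. It uses the identity d/dν̃ [b′/√(1 + b′²)] = b″/(1 + b′²)^{3/2}, so that eqn (1) can be integrated with two cumulative quadratures instead of a general ODE solver. To stay self-contained, the curvature values are interpolated linearly rather than with cubic splines; this substitution, the number of knots, the starting values `b0` and `slope0`, and the function names are assumptions made for illustration only.

```python
import numpy as np


def _cumtrapz(f, x):
    """Cumulative trapezoidal integral of f over x, starting at zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))


def random_background(nu, kappa_max, n_knots=8, b0=1.0, slope0=0.0, rng=None):
    """Random background b(nu) whose curvature never exceeds kappa_max in magnitude."""
    nu = np.asarray(nu, dtype=float)
    rng = np.random.default_rng(rng)
    knots = np.linspace(nu[0], nu[-1], n_knots)
    # Linear interpolation as a stand-in for the cubic-spline interpolation in the text.
    kappa = np.interp(nu, knots, rng.uniform(-kappa_max, kappa_max, n_knots))
    # First integration: s = b' / sqrt(1 + b'^2) satisfies s' = kappa.
    s = slope0 / np.sqrt(1.0 + slope0**2) + _cumtrapz(kappa, nu)
    if np.any(np.abs(s) >= 1.0):
        raise ValueError("curvature too large: the slope diverges")
    slope = s / np.sqrt(1.0 - s**2)
    # Second integration: b is the integral of the slope.
    return b0 + _cumtrapz(slope, nu), kappa


nu = np.arange(6050.0, 9751.0, 1.0)
background, kappa = random_background(nu, kappa_max=2.15e-6, rng=0)
```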
For simplicity and to avoid overlaps, we restrict the simulation to a single peak per spectrum. An example of a transmission spectrum simulated in this way is given in Fig. 4.
The simulated spectra are then filtered with the rolling circle filter with a radius of r = 2000 in the wavenumber and rT = 8 in the transmission direction and the absorbance is calculated. As input parameters for the following peak fit, the intensity, width and center of mass of the filtered peak are determined numerically. The peak is then fitted with a Gaussian shape using a least squares method with a Levenberg–Marquardt minimizer18,19 to determine its parameters. From the rolling circle filter baseline, the average gradient and curvature in the five-sigma interval surrounding the peak are numerically calculated.
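The numerical start values for the peak fit can be obtained from the moments of the filtered peak, as in the sketch below. The function name `peak_start_values` is hypothetical, the conversion from area to height assumes a Gaussian line shape, and the example peak takes the tabulated intensity as a peak height, which is an assumption.

```python
import numpy as np


def _trapz(f, x):
    """Trapezoidal integral of f over x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


def peak_start_values(nu, absorbance):
    """Area, centre of mass, width and Gaussian-equivalent height of a single peak."""
    nu = np.asarray(nu, dtype=float)
    a = np.asarray(absorbance, dtype=float)
    area = _trapz(a, nu)                                  # integrated intensity
    centre = _trapz(a * nu, nu) / area                    # first moment
    sigma = np.sqrt(_trapz(a * (nu - centre) ** 2, nu) / area)  # second central moment
    height = area / (sigma * np.sqrt(2.0 * np.pi))        # valid for a Gaussian shape
    return area, centre, sigma, height


# Noise-free Gaussian with mu = 7156.9 cm^-1, sigma = 3.0 and height 0.018.
nu = np.arange(7100.0, 7214.0, 0.5)
peak = 0.018 * np.exp(-0.5 * ((nu - 7156.9) / 3.0) ** 2)
area, centre, sigma, height = peak_start_values(nu, peak)
```

These values could then be passed as start parameters to a Levenberg–Marquardt fit, for example through the `p0` argument of `scipy.optimize.curve_fit` with `method="lm"`.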
First, broad peaks tend to vanish after filtering with the rolling circle filter; therefore, only peaks with an intensity after filtering of I > 10−5, corresponding to 1% of the minimum initial intensity in the simulation, are used for the following analysis.
Second, in the case of a large peak curvature, the background can contain features similar to those of a broad peak; therefore, only peaks for which the standard deviation of the curvature in the five-sigma interval around the peak is σκ < 10−9 were accepted.
Fig. 5 shows four histograms for negative and positive gradients and curvatures. The histograms show a positive shift of the line position for negative gradients with a mean value of 0.13 cm−1 and a negative shift for positive gradients with a mean value of −0.33 cm−1. On a negatively curved background, the rolling circle filter similarly induces a positive shift of the line position of 0.08 cm−1 on average and a negative shift with a mean value of −0.28 cm−1 for positively curved backgrounds. To translate this to an uncertainty contribution to the line position induced by the rolling circle filter, we calculated 68% and 95% intervals of the histograms shown in Fig. 5. These intervals, serving as a measure for this uncertainty contribution, together with the symmetrical standard deviation of the distributions are shown in Table 4.
Criteria | Δμ (cm−1) | std. dev. (cm−1) | 68% interval, lower (cm−1) | 68% interval, upper (cm−1) | 95% interval, lower (cm−1) | 95% interval, upper (cm−1) |
---|---|---|---|---|---|---|
m > 0 | −0.33 | 0.57 | −0.90 | 0.24 | −1.45 | 0.80 |
m < 0 | 0.13 | 0.30 | −0.17 | 0.43 | −0.46 | 0.72 |
κ > 0 | −0.28 | 0.63 | −0.91 | 0.35 | −1.52 | 0.96 |
κ < 0 | 0.08 | 0.29 | −0.20 | 0.37 | −0.48 | 0.65 |
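The tabulated interval bounds appear consistent with a symmetric Gaussian construction, mean ± 1σ for the 68% interval and mean ± 1.96σ for the 95% interval. This reading is inferred from the numbers and is not stated explicitly in the text; the short check below only verifies the arithmetic against the rounded table values.

```python
# Rows of the table above: criterion, mean shift, std. dev., 68% and 95% interval (cm^-1).
rows = [
    ("m > 0", -0.33, 0.57, (-0.90, 0.24), (-1.45, 0.80)),
    ("m < 0", 0.13, 0.30, (-0.17, 0.43), (-0.46, 0.72)),
    ("kappa > 0", -0.28, 0.63, (-0.91, 0.35), (-1.52, 0.96)),
    ("kappa < 0", 0.08, 0.29, (-0.20, 0.37), (-0.48, 0.65)),
]


def gaussian_interval(mean, std, z):
    """Symmetric interval mean +/- z * std."""
    return (mean - z * std, mean + z * std)


max_deviation = 0.0
for name, mean, std, interval_68, interval_95 in rows:
    for z, tabulated in ((1.0, interval_68), (1.96, interval_95)):
        calculated = gaussian_interval(mean, std, z)
        deviation = max(abs(c - t) for c, t in zip(calculated, tabulated))
        max_deviation = max(max_deviation, deviation)
        print(f"{name:10s} z={z:4.2f}  calculated=({calculated[0]:+.2f}, {calculated[1]:+.2f})")
```

All bounds are reproduced to within 0.02 cm−1, which is compatible with the rounding of the tabulated means and standard deviations to two decimals.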
$y(\sigma) = a\,\sigma^{2} + b\,\sigma^{3}$ | (2) |
Fig. 6 Correlation of the width difference Δσ and width after filtering σf. The data are selected according to the intervals given in Table 5. First, the data with all parameters in the first interval (all 1st int.) are shown. Then, only one parameter is switched to its second interval, and these data are labelled with the corresponding parameter and ‘2nd int.’. The solid line shows the fit of eqn (2) with the resulting confidence intervals. |
Fig. 7 Correlation of the intensity difference ΔI and width after filtering σf. The data are selected according to the intervals given in Table 5. First, the data with all parameters in the first interval (all 1st int.) are shown. Then, only one parameter is switched to its second interval, and these data are labelled with the corresponding parameter and ‘2nd int.’. The solid line shows the fit of eqn (2) with the resulting confidence intervals. |
Parameter | 1st interval, min | 1st interval, max | 2nd interval, min | 2nd interval, max |
---|---|---|---|---|
σ | 5 | 25 | 25 | 50 |
I | 0.3 | 0.6 | 0.6 | 1.0 |
m | −5 × 10−5 | 0 | 0 | 5 × 10−5 |
κ | −5 × 10−7 | 0 | 0 | 5 × 10−7 |
The results of this regression, suitable for quantification and correction of the systematic uncertainty induced by the rolling circle filter, are shown in Table 6.
y(σ) | a | b | red. χ2 |
---|---|---|---|
ΔI | (9.3 ± 4) × 10−5 | (1.7 ± 0.2) × 10−5 | 6.8 × 10−3 |
Δσ | (−4.3 ± 0.5) × 10−3 | (5.6 ± 0.2) × 10−4 | 6.1 × 10−2 |
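Because eqn (2) is linear in a and b, the regression can be sketched as an ordinary linear least-squares problem. The example below is illustrative only: it fits noise-free synthetic data generated from the tabulated Δσ coefficients, read here as a = −4.3 × 10−3 and b = 5.6 × 10−4, and the function names `model` and `fit_square_cubic` are hypothetical. Weighting and the confidence intervals shown in the figures are not reproduced.

```python
import numpy as np


def model(sigma, a, b):
    """Eqn (2): y(sigma) = a * sigma^2 + b * sigma^3."""
    return a * sigma**2 + b * sigma**3


def fit_square_cubic(sigma, y):
    """Unweighted least-squares estimate of (a, b) in eqn (2)."""
    sigma = np.asarray(sigma, dtype=float)
    design = np.column_stack((sigma**2, sigma**3))   # one column per coefficient
    coeff, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeff


# Synthetic data from the tabulated coefficients of the width difference.
a_true, b_true = -4.3e-3, 5.6e-4
sigma_f = np.linspace(5.0, 50.0, 40)
a_fit, b_fit = fit_square_cubic(sigma_f, model(sigma_f, a_true, b_true))

# Predicted width difference for a filtered width of 10: -0.43 + 0.56 = 0.13.
delta_sigma_10 = model(10.0, a_true, b_true)
```

With the fitted coefficients, `model` gives the systematic shift for any width after filtering, which can then be used either as a correction or as an uncertainty contribution.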
Regarding the peak width and intensity, there is a strong influence of the rolling circle filter, causing broad peaks to vanish completely after the filtering. Using the parametrizations of the systematic effect of the rolling circle filter on the peak intensity and width, it is possible to quantify this effect as a systematic uncertainty, and the parametrization can be used to correct it for further analysis. This intensity correction makes the rolling circle filter as suitable for the investigation of the intensities of spectral lines as it is for line positions.
In our case, line intensities are used for concentration measurement, and this is a crucial result on the way to a calibration that is independent of the instrument used. A transfer of the calibration results to a different experimental setup is only possible if the systematic influences of the respective instrument and analysis procedure are known and can be corrected, if significant. Also, if the measurement samples significantly differ from the calibration samples, the performed quantification and correction of the systematic effects of the filter is the only possibility to obtain reliable measurement values in such a setting. Both issues can be solved with the presented combination of filter optimization and residual systematic uncertainty quantification.
The method can, with small modifications, be applied to different filters, since filters for baseline approximation should always provide a smooth baseline without extreme curvatures or even edges. Under these conditions, χ2 should be a suitable measure for the quality of the baseline approximation. Furthermore, the creation of a filter bank with different filter settings, e.g. radii of the rolling circle filter, for different spectral regions is a promising subject for further studies. This can deliver a significant advantage compared to a single filter setting in the case of strongly varying line widths or background curvature. The presented method for the optimization of the rolling circle filter and the quantification of the remaining systematic effects on the peak shape can be the basis for a study that optimizes and compares different filter methods, such as rolling circle, averaging, wavelet and derivative filters. While such a systematic comparison and evaluation is beyond the scope of the presented work, we will address this in a future publication on this topic.
This journal is © The Royal Society of Chemistry 2019 |