Tiejun Chen a, YoungJae Son a, Changqing Dong b and Sung-June Baek *a
a Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, South Korea. E-mail: tozero@jnu.ac.kr; Tel: +82 62-530-1795
b Department of Auckland Bioengineering Institute, University of Auckland, 6/70 Symonds Street, Grafton, Auckland 1010, New Zealand
First published on 24th April 2025
Raman spectroscopy requires baseline correction to address fluorescence- and instrumentation-related distortions. The existing baseline correction methods can be broadly classified into traditional mathematical approaches and deep learning-based techniques. While traditional methods often require manual parameter tuning for different spectral datasets, deep learning methods offer greater adaptability and enhance automation. Recent research on deep learning-based baseline correction has primarily focused on optimizing existing methods or designing new network architectures to improve correction performance. This study proposes a novel deep learning network architecture to further enhance baseline correction effectiveness, building upon prior research. Experimental results demonstrate that the proposed method outperforms existing approaches by achieving superior correction accuracy, reducing computation time, and more effectively preserving peak intensity and shape.
Raman spectra typically contain two main types of noise: background noise, also known as the baseline, and additive noise caused by external conditions and auto-fluorescence.7 Both degrade the accuracy of subsequent studies and analyses. Consequently, pre-processing steps, particularly baseline correction, have become essential in most Raman spectroscopy applications. Over the years, various baseline correction methods have been developed, and the topic has attracted increasing research attention.
Baseline correction methods are generally categorized into two main approaches: traditional mathematical methods and deep learning-based methods. Traditional methods include wavelet transform methods,8–10 polynomial fitting methods,11,12 and penalized least squares methods.13–15 In contrast, deep learning methods use deep neural networks to perform regression, predicting the baseline so that it can be removed from the spectrum. Research on deep learning-based baseline correction has focused on designing more efficient network structures or improving existing methods to address their limitations.
Among traditional baseline correction methods, the asymmetrically reweighted penalized least squares (arPLS) method13 and the adaptive smoothness parameter penalized least squares (asPLS) method14 are both based on penalized least squares. The arPLS method is considered one of the most effective methods for baseline correction, as it prevents the overestimation of the baseline and ensures satisfactory curve fitting in non-peak regions. However, it tends to misinterpret the tails of peaks as non-peak regions due to their relatively low intensity. The asPLS method mitigates this limitation by introducing a coefficient vector, which dynamically adjusts the arPLS smoothness parameter. This adjustment assigns a larger value to peak regions and a smaller value to non-peak regions, thereby improving the accuracy of baseline correction in areas with low intensity.
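To make the penalized least squares idea concrete, the following minimal sketch implements the arPLS iteration with NumPy and SciPy, written from the update rule published with arPLS (ref. 13). Each pass solves for a baseline z that balances a weighted fit to the spectrum against λ times the squared second differences of z, then reweights every point with a logistic function of its residual, so points well above the current fit (peaks) receive weights near zero. The function name `arpls` and the defaults (λ = 10⁵, tolerance, iteration cap) are illustrative assumptions, not settings from this study; asPLS would additionally scale λ point by point with the coefficient vector described above.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def arpls(y, lam=1e5, ratio=1e-6, max_iter=100):
    """Sketch of arPLS: minimize (y - z)^T W (y - z) + lam * ||D z||^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator D, shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    H = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(max_iter):
        z = spsolve((sparse.diags(w) + H).tocsc(), w * y)
        d = y - z
        dn = d[d < 0]  # residuals below the fit, treated as baseline noise
        if dn.size < 2:
            break
        m, s = dn.mean(), dn.std()
        # Logistic weights: residuals far above 2s - m (peaks) get weights near 0.
        expo = np.clip(2.0 * (d - (2.0 * s - m)) / s, -500, 500)
        w_new = 1.0 / (1.0 + np.exp(expo))
        # Stop when the relative change of the weight vector is small.
        if np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio:
            break
        w = w_new
    return z
```

The corrected spectrum is then `y - arpls(y)`. The iteration cap and tolerance also bound the run time, which is one reason the inference time of such methods depends on their termination settings.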
These traditional baseline correction methods share a common characteristic: they rely on user judgment to obtain optimal results. Specifically, they typically require manually selecting an appropriate wavelet basis, polynomial order, or balance (smoothness) parameter to achieve ideal correction. Consequently, practical applications demand significant time and effort to fine-tune these parameters on the basis of prior knowledge.
Deep learning-based baseline correction methods effectively address the limitations of traditional approaches. Once the deep learning model is trained, it can be applied directly without additional adjustments, significantly reducing the time and effort required. Furthermore, deep learning methods generally demonstrate superior baseline correction performance compared to traditional techniques, leading to increased interest in applying these models to Raman spectroscopy.
However, training deep learning models requires large amounts of labeled data, which is challenging because Raman spectral data are scarce and often unlabeled. To address this issue, Liu manually extracted peak and baseline information from actual spectra and generated training data by randomly combining peaks, baselines, and additive noise.16 However, this approach cannot always guarantee the quality of the generated data. In response, Chen et al. proposed an improved method that generates peaks, baselines, and additive noise separately from mathematical models and constructs training data by randomly combining these components, thereby improving data quality.17
In this study, we introduce a novel triangular deep convolutional network (TDCN) for baseline correction, trained on synthetically generated spectral data.17 Qualitative and quantitative analyses of simulated and actual Raman spectra show that the proposed model achieves better baseline correction performance than traditional methods while also reducing inference time. In addition, we compared it with an existing deep learning model, confirming the improvements achieved by the proposed model.
The simulated spectral data contain a pure spectral signal (peak), baseline, and additive noise, expressed mathematically as follows:
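$$s[i] = (1-\beta)\,p[i] + \beta\,b[i] + n[i] \qquad (1)$$
where p[i], b[i] and n[i] are the peak, baseline and additive noise at index i, and β is the baseline mixing ratio.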
The peak p[i] is simulated by using the following Hanning window function:
![]() | (2) |
The peak width N was set within the range 5–21, and the minimum peak height was set to 0.05. The number of peaks per spectrum was set between 5 and 15. The baseline was modeled by cubic spline interpolation through 2–7 randomly generated anchor points; randomizing these anchor points produces diverse baselines, which improves prediction performance. The additive noise was modeled as Gaussian white noise.
The final simulated spectra were generated by mixing the three elements mentioned above. The baseline mixing ratio, β, ranges from 0.1 to 0.8, while the peak mixing ratio is 1 − β. To ensure compatibility with the deep learning model, the data length was set to 512 and the intensity was normalized to a range of 0–1 using the min–max normalization method.
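The generation recipe above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' generator: the Hann peak shape comes from `numpy.hanning` (the exact form of Eq. (2) is not reproduced here), peak amplitudes are drawn uniformly from 0.05 to 1, spline anchors are evenly spaced with random heights, and the noise standard deviation is drawn from 0 to 0.02. `simulate_spectrum` is a hypothetical helper name.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def simulate_spectrum(rng, length=512):
    """Return (spectrum, baseline_target), both scaled by the same min-max map."""
    idx = np.arange(length)

    # Peaks p[i]: 5-15 Hann-window peaks, width N in 5..21, height >= 0.05 (assumed <= 1).
    p = np.zeros(length)
    for _ in range(rng.integers(5, 16)):
        n = int(rng.integers(5, 22))
        start = int(rng.integers(0, length - n + 1))
        p[start:start + n] += rng.uniform(0.05, 1.0) * np.hanning(n)
    p /= p.max()

    # Baseline b[i]: cubic spline through 2-7 anchors (evenly spaced, random heights).
    k = int(rng.integers(2, 8))
    spline = CubicSpline(np.linspace(0, length - 1, k), rng.uniform(0.0, 1.0, k))
    b = spline(idx)

    # Eq. (1): s = (1 - beta) p + beta b + n, with beta in [0.1, 0.8].
    beta = rng.uniform(0.1, 0.8)
    noise = rng.normal(0.0, rng.uniform(0.0, 0.02), length)  # assumed noise level
    s = (1 - beta) * p + beta * b + noise

    # Min-max normalize s to [0, 1]; apply the same map to the baseline target.
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo), (beta * b - lo) / (hi - lo)


rng = np.random.default_rng(0)
batch = [simulate_spectrum(rng) for _ in range(4)]
```

Applying the same affine map to β·b keeps the target aligned with the normalized spectrum: subtracting it leaves ((1 − β)p + n)/(max s − min s), i.e. the peak-plus-noise component on the same scale.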
Fig. 1 shows an example of a simulated spectrum and its baseline. Compared with the spectral data generation method proposed by Liu,16 this method produces more natural, higher-quality spectra. The generated training data are also more diverse, which improves the representativeness and applicability of the dataset. In this study, 128 000 training samples and 32 000 validation samples were generated to train the deep learning model.
ResNet, one of the best-known deep CNN architectures, employs a residual learning framework to address the vanishing gradient problem that arises as network depth increases, enabling the extraction of more representative features from the input data.22 The residual framework primarily includes two types of shortcut connections: identity shortcuts and projection shortcuts. An identity shortcut adds a block's input directly to its output when the two have the same shape. A projection shortcut instead adjusts the output feature size of the previous layer through downsampling (in ResNet, a strided 1 × 1 convolution), allowing it to be added to the output features of the subsequent layer.
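A residual block with both shortcut types can be sketched in PyTorch for 1-D spectra. The layer choices (3-tap convolutions, batch normalization, a strided 1 × 1 projection) follow the basic ResNet block and are assumptions for illustration, not the configuration used in this study; `ResidualBlock1d` is a hypothetical name.

```python
import torch
from torch import nn


class ResidualBlock1d(nn.Module):
    """Basic ResNet-style residual block for 1-D signals."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv1d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm1d(out_ch),
        )
        if stride == 1 and in_ch == out_ch:
            # Identity shortcut: input and output shapes already match.
            self.shortcut = nn.Identity()
        else:
            # Projection shortcut: strided 1x1 conv matches length and channel count.
            self.shortcut = nn.Sequential(
                nn.Conv1d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm1d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))


x = torch.randn(2, 1, 512)                 # batch of 2 spectra, 512 points
y = ResidualBlock1d(1, 16, stride=2)(x)    # projection shortcut: (2, 16, 256)
y = ResidualBlock1d(16, 16)(y)             # identity shortcut: shape unchanged
```

With stride 2, both the 3-tap path and the 1 × 1 projection map 512 points to 256, so the two branches can be added element-wise.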
The UNet architecture was originally designed for medical image segmentation.23 It consists of two main components: a contracting path and an expansive path. The contracting path functions like a conventional deep CNN, extracting features from the input data. The expansive path upsamples the feature maps and concatenates them with the corresponding features from the contracting path through skip connections. These skip connections allow UNet to recover spatial detail lost during downsampling, improving segmentation accuracy.
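The contracting path, expansive path, and skip connection can be illustrated with a deliberately tiny two-level 1-D UNet in PyTorch. Channel counts, kernel sizes, and the single resolution level are assumptions chosen for brevity, and `TinyUNet1d` is a hypothetical name. As in the original UNet, the skip connection concatenates encoder features with the upsampled decoder features along the channel axis.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch):
    """Two 3-tap convolutions with ReLU, keeping the sequence length."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv1d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class TinyUNet1d(nn.Module):
    """One contracting step, one expansive step, one skip connection."""

    def __init__(self, ch=16):
        super().__init__()
        self.enc = conv_block(1, ch)                            # full resolution
        self.down = nn.MaxPool1d(2)                             # halve the length
        self.bottom = conv_block(ch, 2 * ch)
        self.up = nn.ConvTranspose1d(2 * ch, ch, 2, stride=2)   # double the length
        self.dec = conv_block(2 * ch, ch)                       # upsampled + skip
        self.head = nn.Conv1d(ch, 1, 1)

    def forward(self, x):
        skip = self.enc(x)
        y = self.up(self.bottom(self.down(skip)))
        # Skip connection: concatenate encoder features along the channel axis.
        y = self.dec(torch.cat([y, skip], dim=1))
        return self.head(y)


out = TinyUNet1d()(torch.randn(3, 1, 512))  # output keeps the input length
```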
In this study, we propose a deep learning network architecture called the Triangular Deep Convolutional Network (TDCN), which integrates design concepts from both ResNet and UNet. As illustrated in Fig. 2(a), the network has a triangular structure composed of multiple convolutional cells. Each cell can be constructed in different ways, as depicted in Fig. 2(b), (c) and (d); to distinguish them, we refer to these cells as SimpleCell, SimpleResCell, and ResCell, respectively. All three fuse the outputs of surrounding cells by addition; they differ mainly in how the residual block is implemented.
Fig. 2 Architecture of the proposed TDCN: (a) overall network structure, (b) SimpleCell unit, (c) SimpleResCell unit, and (d) ResCell unit.
Let $x_{i,j}$ denote the output of the convolutional cell $C_{i,j}$, where i is the row index and j is the column index. The output of each cell $x_{i,j}$ is computed as follows:
![]() | (3) |
Here, C(·) is the integrated convolution block of a cell, applied to the fused outputs of the preceding cells. The functions D(·) and U(·) denote down-sampling and up-sampling operations, implemented with a convolutional layer (Conv1d) and a transposed convolutional layer (ConvTranspose1d), respectively. Note that Fig. 2 illustrates only the structure of the cell outlined by the red dashed line; the structures of the other cells are adapted to their inputs. For instance, cell $C_{0,0}$ receives the raw spectrum as its only input, so its structure is simpler and consists of a single-input convolution.
The key distinction between SimpleCell and SimpleResCell lies in the use of residual connections. SimpleResCell employs a residual structure, whereas SimpleCell does not. Meanwhile, the difference between SimpleResCell and ResCell lies in the scope of the residual structure: SimpleResCell applies the residual structure only after fusing outputs from different cells, whereas ResCell applies it throughout the entire cell.
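The fusion step and the SimpleCell/SimpleResCell distinction can be sketched in PyTorch. This is a hypothetical illustration, not the TDCN implementation: `FusionCell`, `down`, and `up` are invented names, and the kernel sizes, batch normalization, and channel counts are assumptions. Which neighbours feed a given cell is set by Eq. (3), which is not reproduced here, so the sketch simply takes a list of inputs already brought to the cell's resolution by D(·) (Conv1d) or U(·) (ConvTranspose1d). ResCell is omitted because the text does not specify its residual layout in detail.

```python
import torch
from torch import nn


def down(ch):
    """D(.): strided Conv1d that halves the sequence length (kernel/stride assumed)."""
    return nn.Conv1d(ch, ch, kernel_size=2, stride=2)


def up(ch):
    """U(.): ConvTranspose1d that doubles the sequence length."""
    return nn.ConvTranspose1d(ch, ch, kernel_size=2, stride=2)


class FusionCell(nn.Module):
    """Hypothetical TDCN-style cell: add neighbour outputs, then convolve.

    residual=False mimics SimpleCell; residual=True mimics SimpleResCell,
    whose residual connection wraps only the post-fusion convolutions.
    """

    def __init__(self, ch, residual):
        super().__init__()
        self.residual = residual
        self.conv = nn.Sequential(
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch),
        )
        self.act = nn.ReLU()

    def forward(self, inputs):
        # Fuse inputs that are already at this cell's resolution by addition.
        fused = torch.stack(inputs, dim=0).sum(dim=0)
        out = self.conv(fused)
        if self.residual:
            out = out + fused  # skip connection around the convolution block
        return self.act(out)


# One cell at length 256 fusing a same-scale, a finer, and a coarser neighbour.
x_same = torch.randn(2, 16, 256)
x_fine = torch.randn(2, 16, 512)
x_coarse = torch.randn(2, 16, 128)
cell = FusionCell(16, residual=True)
y = cell([x_same, down(16)(x_fine), up(16)(x_coarse)])
print(y.shape)  # torch.Size([2, 16, 256])
```

Because every input is summed element-wise, all neighbours must first be resampled to the same length and channel count, which is why D(·) and U(·) are needed.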
We selected ResUNet17 as the deep learning method for comparison. This model combines the advantages of ResNet and UNet. While its overall structure resembles UNet, ResUNet incorporates residual connections, which enhance the performance compared to both ResNet and UNet. To comprehensively evaluate the effectiveness of the proposed approach, we used ResUNet as a benchmark model, enabling a direct comparison with our method and an indirect comparison with ResNet and UNet.
A distinctive feature of the proposed model is its simultaneous execution of contraction and expansion within each cell, unlike UNet, which separates the contracting and expanding paths. This architecture allows the model to receive and integrate multi-dimensional outputs from adjacent cells, effectively capturing spatial relationships within the data.
The experimental results are presented in two parts. The first part uses simulated spectra with known reference baselines to assess the efficacy of the proposed method: specific simulated spectral signals are used for qualitative analysis, and a generated test dataset is used for quantitative analysis. The second part uses actual Raman spectra to demonstrate the baseline correction performance of the proposed method and compares its results with those of traditional methods.
To comprehensively evaluate the proposed method, we selected an established traditional mathematical method and a deep learning method of proven efficacy for comparison. Because the deep learning methods show no significant qualitative differences from one another, the qualitative analysis focuses on visually illustrating the differences between deep learning and traditional methods. We also quantitatively compared the baseline correction performance of all methods to demonstrate the advantages of the proposed approach.
In total, 128 000 training samples were generated to train the deep learning model, with an additional 32 000 and 20 000 samples generated for validation and quantitative analysis, respectively. To qualitatively evaluate the feasibility and advantages of the proposed method on actual Raman spectra, experiments were conducted using spectra of 10 different substances recorded with two distinct Raman spectroscopy systems. The first was a Renishaw 2000 Raman microscope equipped with a 514.5 nm argon ion laser, and the second was an inVia Inspector portable Raman system using a 632.8 nm He–Ne laser.
In the experiments, the deep learning model was optimized with the adaptive moment estimation (Adam) algorithm, using a learning rate of 5 × 10⁻⁴ and a batch size of 500. The root mean square error (RMSE) was used as the training loss function, and the mean absolute error (MAE) as the validation loss function. The RMSE and MAE are defined as follows:
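$$\mathrm{RMSE} = \sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(y_i - \hat{y}_i\right)^2} \qquad (4)$$
$$\mathrm{MAE} = \frac{1}{L}\sum_{i=1}^{L}\left|y_i - \hat{y}_i\right| \qquad (5)$$
where L is the number of data points in a spectrum, y_i is the reference value, and ŷ_i is the value predicted by the model.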
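Eqs. (4) and (5) translate directly into NumPy. The small worked example shows why RMSE, which squares each error, reacts more strongly to a single large deviation than MAE does. The function names are illustrative; in PyTorch, an RMSE training loss could be written as `torch.sqrt(torch.nn.functional.mse_loss(pred, target))`, which is an assumption about implementation rather than the authors' code.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (4)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (5)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


# Errors of magnitude 1, 1, 1 and 3: MAE = 6/4 = 1.5, RMSE = sqrt(12/4) = sqrt(3) ≈ 1.732.
y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [1.0, -1.0, 1.0, 3.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```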
The deep learning model was trained for up to 1000 epochs, and the model weights were saved whenever the validation loss decreased. A learning rate decay strategy was applied: the learning rate was reduced to 4/5 of its current value whenever the validation loss did not decrease for 75 epochs. Training was terminated once the learning rate had been reduced to 1/8 of its initial value or below.
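The checkpointing and learning-rate rules above can be expressed as a small framework-independent helper. This pure-Python sketch assumes that the 75-epoch condition means 75 consecutive epochs without a new best validation loss; `PlateauSchedule` is a hypothetical class. In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.8, patience=75)` provides a similar decay, although it cuts the rate only after more than `patience` non-improving epochs and, by default, measures improvement with a small relative threshold.

```python
class PlateauSchedule:
    """Track validation loss; decay the LR on plateaus and signal when to stop."""

    def __init__(self, lr=5e-4, factor=0.8, patience=75, stop_fraction=1 / 8):
        self.lr = lr
        self.factor = factor          # reduce to 4/5 of the current value
        self.patience = patience      # non-improving epochs before a cut
        self.stop_lr = lr * stop_fraction
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch; returns (save_weights, stop_training)."""
        improved = val_loss < self.best
        if improved:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return improved, self.lr <= self.stop_lr


# Usage with a validation loss that stops improving after the first epoch.
schedule = PlateauSchedule()
for epoch in range(1000):  # 1000-epoch budget
    save_weights, stop = schedule.step(1.0)
    if stop:
        break
print(epoch)  # prints 750
```

In this example the rate is cut every 75 epochs; after nine cuts it is 0.8⁹ ≈ 0.134 of its initial value, still above 1/8, and after the tenth cut it is 0.8¹⁰ ≈ 0.107, so training stops at epoch index 750, within the 1000-epoch budget.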
The effectiveness and superiority of the proposed method were verified using simulated data. To facilitate qualitative comparison of the performance of the proposed method and traditional methods under varying peak and baseline conditions, we generated simulated peaks using a Gaussian function and constructed baselines using quadratic and cubic polynomials. The simulated pure signals, which consist of three Gaussian peaks, were constructed as follows:
![]() | (6) |
![]() | (7) |
Fig. 3(b) displays the generated simulated spectral signal alongside the baseline correction results obtained using both the traditional method and the proposed method. To comprehensively evaluate the performance of the proposed method, we introduced additive noise and baselines of varying intensities. The peaks and baselines were combined at different ratios. As shown in the figure, the height of the peaks varies with the combination ratio, even when the same baseline is used.
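The construction of these test signals can be sketched as follows: three Gaussian peaks, a quadratic and a cubic baseline, and mixing at several ratios and noise levels according to Eq. (1). All peak positions, widths, amplitudes, polynomial coefficients, ratios, and noise levels here are hypothetical placeholders, because the actual values of Eqs. (6) and (7) are not reproduced in this text; `gaussian` and `mix` are illustrative names.

```python
import numpy as np

x = np.arange(512, dtype=float)


def gaussian(x, amp, center, sigma):
    """Single Gaussian peak."""
    return amp * np.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Hypothetical pure signal: three Gaussian peaks (narrow, wide, medium).
pure = gaussian(x, 1.0, 120, 6) + gaussian(x, 0.6, 260, 20) + gaussian(x, 0.8, 400, 10)

# Hypothetical quadratic and cubic baselines on a normalized axis t in [0, 1].
t = x / x[-1]
baselines = {
    "quadratic": 0.5 + 0.8 * t - 0.6 * t ** 2,
    "cubic": 0.3 + 1.2 * t - 2.0 * t ** 2 + 1.5 * t ** 3,
}


def mix(pure, baseline, beta, noise_std, rng):
    """Combine per Eq. (1): (1 - beta) * pure + beta * baseline + Gaussian noise."""
    return (1 - beta) * pure + beta * baseline + rng.normal(0.0, noise_std, pure.size)


rng = np.random.default_rng(0)
spectra = {
    (name, beta, std): mix(pure, b, beta, std, rng)
    for name, b in baselines.items()
    for beta in (0.2, 0.6)
    for std in (0.005, 0.02)
}
```

With a zero baseline and no noise, each peak's height is exactly (1 − β) times its pure value, which is why peak heights change with the combination ratio even when the same baseline is used.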
The proposed method was compared with asPLS, a traditional baseline correction method that has shown excellent performance. Before the experiments, the asPLS parameters were optimized through preliminary tests on the 20 000 test samples to ensure the best results. All results on simulated data were evaluated using the RMSE and the MAE.
The smoothness parameter λ of asPLS strongly influences the baseline correction results: a value that is too large or too small degrades correction performance. It is recommended that λ be varied on a logarithmic scale,24 here from 10² to 10⁸, to select the optimal value. As shown in Fig. 3(c), the MAE is minimized when λ is set to 10⁴.
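A logarithmic λ sweep of this kind can be sketched as a grid search that scores each candidate by the mean MAE against known baselines. To keep the example short and self-contained, it uses the classic asymmetric least squares (AsLS) smoother of Eilers and Boelens as a stand-in for asPLS; the selection logic is the same, but AsLS is a different, simpler method. The asymmetry p = 0.01, the ten iterations, and the toy test spectra are assumptions, and `asls` and `select_lambda` are hypothetical names.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls(y, lam, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (AsLS), used here as a stand-in for asPLS."""
    n = y.size
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    H = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        z = spsolve((sparse.diags(w) + H).tocsc(), w * y)
        w = np.where(y > z, p, 1 - p)  # points above the fit (peaks) get small weight
    return z


def select_lambda(spectra, baselines, exponents=range(2, 9)):
    """Grid-search lambda over 10^2..10^8; return (best_lambda, mean MAE per lambda)."""
    lams = [10.0 ** e for e in exponents]
    scores = [
        np.mean([np.mean(np.abs(asls(y, lam) - b)) for y, b in zip(spectra, baselines)])
        for lam in lams
    ]
    return lams[int(np.argmin(scores))], scores


# Toy test set: quadratic baselines plus two fixed peaks and mild noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256)
true_baselines = [0.5 + a * x + c * x ** 2 for a, c in rng.uniform(-1, 1, (3, 2))]
peaks = np.exp(-((x - 0.3) / 0.01) ** 2) + 0.7 * np.exp(-((x - 0.7) / 0.02) ** 2)
test_spectra = [b + peaks + rng.normal(0, 0.01, x.size) for b in true_baselines]
best_lam, scores = select_lambda(test_spectra, true_baselines)
```

Stepping one decade at a time keeps the sweep to seven fits per spectrum while still covering the full recommended range; a finer grid could then be searched around the best decade.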
From the four baseline correction results, it is evident that the traditional method generally overestimates the baseline compared to the proposed method, particularly in regions with wider peaks. Moreover, for data with high baseline ratios and high-intensity additive noise, the overestimation of the baseline by the traditional method is more pronounced. In the spectra with high-intensity additive noise, the presence of noise significantly impacts the baseline correction performance of the traditional method, particularly in non-peak regions. Overall, qualitative analysis indicates that the proposed method offers significant advantages over the traditional method, particularly under complex conditions involving both additive noise and varying baseline characteristics.
To further evaluate performance, the proposed method was compared quantitatively with the traditional method (asPLS) and another deep learning approach (ResUNet) on a test dataset of 20 000 samples. Table 1 presents the MAE, RMSE, and MAE variance of each method on the test data. The deep learning-based methods clearly outperform the traditional method overall. Among the proposed network architectures, the SimpleCell model achieves the lowest MAE variance, whereas the SimpleResCell model achieves the lowest MAE and RMSE, indicating the highest baseline correction accuracy. ResUNet also performs well overall, but the SimpleResCell model outperforms it on all evaluation metrics.
| Methods | MAE (×10⁻⁴) | RMSE (×10⁻⁴) | MAE variance (×10⁻⁷) | Parameters |
|---|---|---|---|---|
| SimpleCell | 5.145 | 6.719 | 1.004 | 0.877 M |
| SimpleResCell | 4.876 | 6.342 | 1.029 | 0.877 M |
| ResCell | 5.347 | 7.033 | 1.196 | 1.274 M |
| ResUNet | 5.062 | 6.609 | 1.097 | 0.898 M |
| asPLS | 21.22 | 63.86 | 22.542 | N/A |
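The aggregate columns of Table 1 can be computed from per-spectrum errors with a few lines of NumPy. The sketch assumes that MAE and RMSE are computed per spectrum and then averaged over the test set, and that the variance column is the population variance of the per-spectrum MAE values; the text does not state these conventions explicitly. `summarize` is a hypothetical helper.

```python
import numpy as np


def summarize(pred, true):
    """Mean per-spectrum MAE and RMSE, plus the variance of per-spectrum MAE.

    pred, true: arrays of shape (num_spectra, num_points).
    """
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    mae = np.abs(err).mean(axis=1)            # one MAE per spectrum
    rmse = np.sqrt((err ** 2).mean(axis=1))   # one RMSE per spectrum
    return {"MAE": mae.mean(), "RMSE": rmse.mean(), "MAE variance": mae.var()}


# Two spectra with constant errors of 1 and 3: MAE = RMSE = 2, MAE variance = 1.
print(summarize([[1.0, 1.0], [3.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]]))
```

A low MAE variance means the errors are similar across test spectra, which is the sense in which SimpleCell is the most consistent model even though SimpleResCell is the most accurate on average.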
Deep learning methods are slightly faster than the traditional method, while the proposed model requires marginally more time than ResUNet due to the integration of outputs from surrounding cells through element-wise addition. However, it is important to note that the inference time of the traditional method can vary depending on the maximum number of iterations and the adjustment of termination conditions. Similarly, the inference time of the deep learning methods may vary slightly based on the implementation of the output stage.
The baseline correction results for the real Raman spectra are presented in Fig. 4. To highlight the differences between deep learning and traditional methods on real Raman spectra, the correction results of the traditional method are also included. Consistent with the analysis of simulated data, the traditional method tends to overestimate the baseline, so the corrected peak intensities are not accurately preserved. In contrast, the spectra corrected by the proposed method appear more natural.
The real Raman spectra used in this experiment exhibit a variety of baselines, and the results show that the proposed method effectively removes them across different spectra. Because the proposed method performs well on both synthetic and real spectra, it is expected to be a valuable preprocessing technique not only for Raman spectra but also for other types of spectra, such as FT-IR and FT-NIR spectra.
Through both qualitative and quantitative analyses of all methods on simulated spectra, we have verified the overall superiority of the proposed method. The experimental results demonstrate that it accurately estimates the baseline of spectra with varying baselines and additive noise, particularly in regions with wide peaks. Moreover, the inference time of the deep learning methods is lower than that of the traditional method. The proposed method also remains robust at higher levels of additive noise, with minimal impact on baseline correction in non-peak regions.
To further validate the effectiveness of the proposed method in practical applications, we conducted experiments using various Raman spectra recorded using different instruments. The results show that the baseline correction achieved by the proposed method is natural and reasonable for real Raman spectra. The experimental results align with the analysis of the simulated spectra, further demonstrating the superiority and broad applicability of the proposed method in baseline correction tasks for Raman spectra. We anticipate that the proposed method can be applied not only to Raman spectra but also to various IR spectra.
However, the number of parameters of the proposed model grows faster with network depth than that of the other models. In addition, fusing the outputs of surrounding cells through element-wise addition increases inference time. In future work, we plan to reduce these computational requirements.
This journal is © The Royal Society of Chemistry 2025