Open Access Article
Shanshan Zhu (a), Xiaoyu Cui (a,d), Wenbin Xu (b), Shuo Chen* (a,d) and Wei Qian (c)
(a) Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, China 110169. E-mail: chenshuo@bmie.neu.edu.cn; Tel: +86 24 83680230
(b) Science and Technology on Optical Radiation Laboratory, Beijing, China 110854
(c) College of Engineering, University of Texas at El Paso, El Paso, USA 79968
(d) Key Laboratory of Data Analytics and Optimization for Smart Industry (Northeastern University), Ministry of Education, China
First published on 25th March 2019
Raman spectroscopy is a label-free and non-destructive spectroscopic technique that has been explored for bacterial identification. However, because the Raman signal is inherently weak, especially for bacterial samples, noise often obscures the Raman peaks of interest. Although this problem can be mitigated by increasing the exposure time or the power of the excitation laser, doing so lengthens the acquisition time or increases the risk of sample damage. Conversely, a short exposure time and low laser power often lead to inadequate acquisition of Raman scattering, yielding Raman spectra with a low signal-to-noise ratio (SNR) that are difficult to analyze further. To characterize biological samples quickly and accurately from low SNR Raman measurements, a weighted spectral reconstruction based method was developed and tested on low SNR Raman spectra from 20 bacterial samples of two species. To discriminate the bacterial species, principal component analysis followed by a support vector machine was applied to the reference Raman spectra and to the spectra recovered from the low SNR Raman measurements by the proposed method, the traditional spectral reconstruction method, and four other commonly used de-noising methods. The results showed that a classification accuracy of 90% was achieved with our method, which was comparable to that of the reference Raman spectra and showed significant advantages over other spectral recovery methods. Therefore, the weighted spectral reconstruction method can preserve most of the biochemical information needed for bacterial species identification while removing the noise from the low SNR Raman spectra, and its advantages of reduced sample damage and shorter acquisition time could promote wider biomedical applications of Raman spectroscopy.
In recent years, Raman spectroscopy has been explored for the detection and identification of bacteria as a fast, non-destructive, and label-free spectroscopic technique based on measuring the inelastic scattering of photons by vibrating molecules or crystal lattices.8,9 Because the vibrational modes are specific to the molecules, Raman spectra carry a large amount of qualitative and quantitative biochemical information that enables the identification and differentiation of bacterial biochemical components.10,11 Unfortunately, because the Raman signal is inherently weak, especially for bacterial samples, noise often obscures the Raman peaks of interest.12,13 Although this problem can be mitigated by increasing the exposure time or the power of the excitation laser, doing so lengthens the acquisition time or increases the risk of sample damage. If a short exposure time and low laser power are used during the Raman measurement, inadequate acquisition of Raman scattering leads to low signal-to-noise ratio (SNR) measurements, which makes further spectral data processing and analysis difficult. Therefore, a method that can quickly and accurately discriminate bacterial species from low SNR Raman measurements acquired with low laser power and short exposure time could potentially solve the above problems.
In this study, a weighted spectral reconstruction based method was developed and tested on Raman spectra with a low SNR of 0.98 from 20 bacterial samples of two species, i.e., Pseudomonas aeruginosa and Staphylococcus aureus. To identify the different bacterial species, principal component analysis (PCA) followed by a support vector machine (SVM) was applied to the reference Raman spectra (high SNR), the low SNR Raman spectra, and the low SNR Raman spectra processed by the proposed method, the traditional spectral reconstruction method, and four other commonly used de-noising methods, i.e., the Savitzky–Golay (SG) algorithm, wavelet transform, finite impulse response (FIR) filtration, and factor analysis. Compared with the other methods, the proposed method demonstrated significant improvement in spectral recovery and higher accuracy in the discrimination of bacterial species; the classification accuracy of the Raman spectra recovered by the proposed method was even comparable to that of the reference Raman spectra with high SNR. Therefore, our method demonstrated significant potential for rapid and accurate bacterial species identification based on low SNR Raman spectra, with less sample damage and a shorter acquisition time.
000 rpm for five minutes to concentrate and purify the bacterial samples. The bacterial samples were rinsed and immersed twice in distilled water to wash away the culture medium, and finally resuspended in 100 μL distilled water. To prepare the samples for Raman measurements, 2 μL of the suspension was repeatedly dropped and air dried at the same location on an aluminum foil, in which the bacterial samples were concentrated and a relatively higher Raman signal could be achieved.14
Traditional spectral reconstruction was performed to retrieve the Raman spectra from the low SNR Raman measurements,15–17 which requires a calibration data set, as shown in Fig. 1. The calibration data set included both the reference Raman spectra and the narrow-band measurements derived from the low SNR Raman spectra, whereas the test data set contained only the low SNR Raman spectra. The narrow-band measurements were numerically calculated as the inner product of the low SNR Raman spectra and the transmittance of the first six non-negative principal component (PC) based filters.18 In the calibration stage, the Wiener matrix W was calculated from the calibration data set to capture the relation between the narrow-band measurements Ccal and the reference Raman spectra Rref, as shown in eqn (1).
$$W = E\left(R_{\mathrm{ref}} C_{\mathrm{cal}}^{T}\right)\left[E\left(C_{\mathrm{cal}} C_{\mathrm{cal}}^{T}\right)\right]^{-1} \tag{1}$$
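The calibration step in eqn (1) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the row-wise storage of spectra, and the use of a pseudo-inverse for numerical robustness are all assumptions, and the ensemble average E(·) is approximated by the sample mean over the calibration set.

```python
import numpy as np

def narrow_band(spectra, filters):
    """Inner product of each spectrum (rows) with each filter transmittance (rows).

    spectra: (n_samples, n_wavenumbers); filters: (n_filters, n_wavenumbers).
    """
    return spectra @ filters.T  # (n_samples, n_filters)

def wiener_matrix(R_ref, C_cal):
    """Sample estimate of W = E(R C^T) [E(C C^T)]^-1, with samples stored as rows.

    R_ref: reference spectra, (n_samples, n_wavenumbers).
    C_cal: narrow-band measurements, (n_samples, n_filters).
    """
    n = R_ref.shape[0]
    E_rc = R_ref.T @ C_cal / n  # cross-correlation, (n_wavenumbers, n_filters)
    E_cc = C_cal.T @ C_cal / n  # auto-correlation, (n_filters, n_filters)
    # Pseudo-inverse is an assumption here, used to tolerate ill-conditioning.
    return E_rc @ np.linalg.pinv(E_cc)

def reconstruct(W, C_test):
    """Recover spectra from narrow-band measurements stored as rows."""
    return C_test @ W.T
```

In use, `C_cal` would be obtained by applying `narrow_band` to the low SNR calibration spectra, and `reconstruct` would then be applied to the narrow-band measurements of the test spectra.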
The weighted spectral reconstruction method developed in this study differs from the traditional spectral reconstruction method in that the ensemble average is replaced by a weighted average when constructing the weighted Wiener matrix Ŵ, as shown in eqn (2).
![]() | (2) |
![]() | (3) |
![]() | (4) |
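As a rough illustration of replacing the ensemble average by a weighted average, the sketch below builds a separate weighted Wiener matrix for each test measurement. The exact weight definition is not recoverable from this text, so the form used here is purely an assumption: each calibration sample is weighted by its Euclidean distance to the test narrow-band measurement raised to a negative power, a choice suggested only by the "Power" parameter listed in Table 1. The function name and the small `eps` guard are likewise hypothetical.

```python
import numpy as np

def weighted_wiener_reconstruct(c_test, R_ref, C_cal, power=-1.0, eps=1e-12):
    """Reconstruct one spectrum with a weighted Wiener matrix.

    ASSUMPTION: weights are distance**power (power < 0), so calibration samples
    whose narrow-band measurements are closer to c_test count more. This weight
    form is illustrative and is not taken from the source equations.
    """
    d = np.linalg.norm(C_cal - c_test, axis=1) + eps  # distance to each calibration sample
    w = d ** power
    w = w / w.sum()                                   # normalize weights to sum to 1
    E_rc = (R_ref * w[:, None]).T @ C_cal             # weighted average of R C^T
    E_cc = (C_cal * w[:, None]).T @ C_cal             # weighted average of C C^T
    W_hat = E_rc @ np.linalg.pinv(E_cc)               # weighted Wiener matrix
    return W_hat @ c_test
```

Because the weights depend on the test measurement, the matrix is recomputed for every test spectrum, which is one plausible source of the trade-off between time efficiency and recovery accuracy.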
Besides the traditional spectral reconstruction and weighted spectral reconstruction methods, the SG algorithm, wavelet transform, FIR filtration, and factor analysis were applied to the same set of low SNR Raman spectra for comparison. In the SG algorithm, each part of the original spectrum within a selected window size is fitted to a polynomial function for smoothing.20 In contrast, the wavelet transform, FIR filtration, and factor analysis remove noise by filtering techniques. In the wavelet transform,21,22 the spectral data are decomposed into the wavelet domain using various wavelet bases and reconstructed after noise removal by certain thresholds. FIR filtration is a linear filtration technique, in which a window-based FIR filter is designed based on the frame size and cut-off frequency and subsequently used for noise removal.23 In factor analysis,24 the original spectral information is projected onto a linear combination of a certain number of subspectra, and the subspectra related to the noise can subsequently be removed. The parameters of each de-noising method and the range of the parameters are shown in Table 1.
| Method | Parameter | Parameter range |
|---|---|---|
| Traditional spectral reconstruction | Number of non-negative PC scores based filters | 6 |
| Weighted spectral reconstruction | Number of non-negative PC scores based filters | 6 |
| | Power | −0.1 to −10 |
| SG algorithm | Window size | 3 to 729 |
| | Polynomial degree | 1 to 9 |
| Wavelet transform | Wavelet basis | Common wavelet filters built in Matlab |
| | Decomposition level | 1 to 10 |
| | Threshold | Soft threshold or hard threshold: threshold values were selected according to the Birge–Massart strategy |
| FIR filtration | Frame size | 2 to 243 |
| | Cut-off frequency | 1 × 10⁻¹⁰ to 1 |
| Factor analysis | Number of subspectra | 1 to 20 |
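To make the window size and polynomial degree parameters of the SG algorithm concrete, the following is a minimal NumPy sketch of SG smoothing: a polynomial is fitted by least squares inside each sliding window and the fitted value at the window center replaces the original point. The function name and the edge-padding choice are assumptions for illustration; this is not the implementation used in the study.

```python
import numpy as np

def savgol_smooth(y, window, degree):
    """Savitzky-Golay smoothing via a fixed convolution kernel.

    window: odd number of points per local fit; degree: polynomial degree.
    """
    if window % 2 == 0 or window <= degree:
        raise ValueError("window must be odd and larger than degree")
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    # Row 0 of the pseudo-inverse maps window samples to the fitted value at x = 0.
    kernel = np.linalg.pinv(A)[0]
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")  # assumed edge handling
    # Reverse the kernel so the convolution acts as a sliding dot product.
    return np.convolve(padded, kernel[::-1], mode="valid")
```

A larger window or a lower degree smooths more aggressively, which is also how weak Raman features comparable to the noise level can be smoothed away.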
After the low SNR Raman spectra were de-noised, the broad and slowly varying fluorescence background was estimated by fifth-order polynomial fitting and subtracted from the original spectra.25 Each Raman spectrum was then normalized by dividing the Raman intensity at each wavenumber by the sum of the Raman intensities at all wavenumbers. To evaluate the accuracy of the recovered Raman spectra, the same fluorescence background removal and normalization were applied to the corresponding reference Raman spectra, and the mean relative RMSE26 was used as the metric.
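The post-processing chain described above can be sketched as follows. Several details are assumptions, since the text gives only the outline: the background fit uses iterative clipping so that the polynomial settles beneath the peaks (a plain least-squares fit would leave residuals summing to zero, which would break the sum normalization), and the relative RMSE is taken as the RMSE divided by the maximum reference intensity. All function names are hypothetical.

```python
import numpy as np

def polynomial_background(wavenumbers, spectrum, order=5, n_iter=100):
    """Estimate a smooth background with a polynomial of the given order.

    ASSUMPTION: iterative clipping, so the fit converges below the Raman peaks.
    """
    x = (wavenumbers - wavenumbers.mean()) / wavenumbers.std()  # condition the fit
    work = np.asarray(spectrum, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, order), x)
        work = np.minimum(work, fit)  # clip peaks down to the current fit
    return fit

def remove_background(wavenumbers, spectrum, order=5):
    return np.asarray(spectrum, dtype=float) - polynomial_background(wavenumbers, spectrum, order)

def normalize_by_sum(spectrum):
    """Divide each intensity by the sum of intensities over all wavenumbers."""
    return spectrum / spectrum.sum()

def relative_rmse(recovered, reference):
    """ASSUMPTION: RMSE scaled by the maximum reference intensity."""
    return np.sqrt(np.mean((recovered - reference) ** 2)) / np.max(reference)

def mean_relative_rmse(recovered_set, reference_set):
    """Average the relative RMSE over paired spectra."""
    return float(np.mean([relative_rmse(r, f) for r, f in zip(recovered_set, reference_set)]))
```

Applying the same `remove_background` and `normalize_by_sum` steps to both the recovered and the reference spectra keeps the comparison on a common scale.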
| | Low SNR Raman spectra | Traditional spectral reconstruction | Weighted spectral reconstruction | SG algorithm | Wavelet transform | FIR filtration | Factor analysis |
|---|---|---|---|---|---|---|---|
| Mean relative RMSE | 1.98 × 10⁻¹ | 8.21 × 10⁻² | 7.86 × 10⁻² | 1.47 × 10⁻¹ | 1.45 × 10⁻¹ | 1.54 × 10⁻¹ | 1.48 × 10⁻¹ |
Table 3 compares the classification accuracy, sensitivity, and specificity for Pseudomonas aeruginosa and Staphylococcus aureus obtained, after fluorescence background removal and normalization, from the reference Raman spectra, the low SNR Raman spectra, and the Raman spectra recovered by the traditional spectral reconstruction method, weighted spectral reconstruction method, SG algorithm, wavelet transform, FIR filtration, and factor analysis. For identifying Pseudomonas aeruginosa and Staphylococcus aureus, the Raman spectra recovered by both the traditional and weighted spectral reconstruction methods achieved a classification accuracy of 90%, which was exactly the same as that of the reference Raman spectra and showed significant advantages over the other commonly used de-noising methods as well as over the low SNR Raman spectra. Furthermore, the weighted spectral reconstruction method yielded exactly the same sensitivity and specificity as the reference Raman spectra, whereas the traditional spectral reconstruction method did not. Although the specificity of the traditional spectral reconstruction is the highest, it sacrifices sensitivity, as shown in Table 3, and we believe that some improper prior information is used during its spectral recovery process. Thus, the higher spectral recovery accuracy of the weighted spectral reconstruction method is indeed critical for better performance in the subsequent spectral data analysis. In practical applications, the choice between traditional and weighted spectral reconstruction should depend mainly on the specific application, in which the compromise between time efficiency and spectral recovery accuracy should be considered.
Interestingly, the classification accuracy did not fully follow the mean relative RMSE, i.e., the agreement between the recovered Raman spectra and the reference Raman spectra. The reason might be that most of the information is preserved whereas some information critical for bacterial identification is lost during noise removal, especially for the SG algorithm, FIR filtration, and factor analysis methods. The classification results of these three methods are even lower than those of the low SNR Raman spectra, indicating that more information is lost than gained during spectral recovery by these three methods. This can be attributed to the fact that these commonly used de-noising methods cannot distinguish the importance of the information. For the SG algorithm, weak features in the Raman spectra that are comparable to the noise level can easily be smoothed out during noise removal, resulting in some shifted Raman peaks and a distorted spectral shape.20 With the FIR filtration method, the noise is well removed, but some important spectral shape information is lost at the same time. Although the Raman spectra de-noised by the SG algorithm and FIR filtration methods retain some important information about the peak locations and the spectral shape, the information needed to discriminate the two bacterial species might be removed during noise removal, resulting in a relatively low classification accuracy of only 70%. The classification accuracy of the factor analysis method was the lowest among all the methods, whereas its mean relative RMSE was not the worst. The reason is that factor analysis loses the ability to separate the noise from the signal when the SNR is extremely low, so many Raman peaks important for bacterial discrimination are smoothed out together with the noise.24
| | Reference Raman spectra | Low SNR Raman spectra | Traditional spectral reconstruction | Weighted spectral reconstruction | SG algorithm | Wavelet transform | FIR filtration | Factor analysis |
|---|---|---|---|---|---|---|---|---|
| Classification accuracy | 90% | 75% | 90% | 90% | 70% | 85% | 70% | 35% |
| Sensitivity | 90% | 80% | 80% | 90% | 70% | 90% | 50% | 40% |
| Specificity | 90% | 70% | 100% | 90% | 70% | 80% | 90% | 30% |
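For reference, the three metrics in Table 3 follow from the confusion counts of a two-class classifier. The sketch below is illustrative only: which species is treated as the positive class is an assumption, as are the labels and the ten-per-species split (the source states only the total of 20 samples). With balanced classes the accuracy equals the average of sensitivity and specificity, which is consistent with every column of Table 3.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical example: 10 samples per species, "PA" taken as the positive class.
y_true = ["PA"] * 10 + ["SA"] * 10
y_pred = ["PA"] * 8 + ["SA"] * 2 + ["SA"] * 10  # two positives missed, no false alarms
metrics = classification_metrics(y_true, y_pred, positive="PA")
```

Under these assumed counts the result is 80% sensitivity, 100% specificity, and 90% accuracy, the same pattern as the traditional spectral reconstruction column.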
To further evaluate the performance of the PCA-SVM-based classification model for bacterial species, receiver operating characteristic (ROC) curves were generated at different threshold levels for each group of Raman spectra. The integrated area under the ROC curve (AUC) is a quantitative indicator of classifier performance, in which a larger AUC value usually means that the classifier has higher prediction accuracy.32 According to the results in Fig. 4, the AUC values are 0.96, 0.79, 0.98, 0.99, 0.66, 0.86, 0.78, and 0.32 for the reference Raman spectra, the low SNR Raman spectra, and the Raman spectra recovered from the low SNR Raman measurements using the traditional spectral reconstruction, weighted spectral reconstruction, SG algorithm, wavelet transform, FIR filtration, and factor analysis, respectively. Thus, the Raman spectra recovered from the low SNR Raman measurements using the weighted spectral reconstruction method demonstrate the strongest ability to identify bacterial species with high sensitivity and specificity, and even outperform the reference Raman spectra with high SNR. The reason might be that the spectral reconstruction procedure removes some of the useless information within the reference Raman spectra, which may negatively affect bacterial species identification.
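The AUC can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition; the function name and the example scores are hypothetical and are not data from this study.

```python
def auc_from_scores(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical decision scores for two positive and two negative samples.
example_auc = auc_from_scores([0.9, 0.4], [0.5, 0.1])
```

An AUC below 0.5, such as the 0.32 reported for factor analysis, means that negative samples tend to score higher than positive ones, i.e., the ranking is worse than chance.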
Fig. 5 shows the average reference Raman spectra and the average Raman spectra recovered by the spectral reconstruction method for Pseudomonas aeruginosa and Staphylococcus aureus, respectively. Visual inspection of the reference Raman spectra (red curve) and the Raman spectra after spectral reconstruction (blue curve) in Fig. 5(a) and (b) shows that the major Raman features were at 853 cm⁻¹ (tyrosine ring breathing vibration of protein), 1003 cm⁻¹ (phenylalanine ring vibration of protein), 1126 cm⁻¹ (C–N, C–C stretching of protein and C–C lipid stretch), 1447 cm⁻¹ (CH₂, CH₃ lipid, and protein), and 1556 cm⁻¹ (C=C vibration of protein).33,34 Moreover, the Raman peaks at 725 cm⁻¹ and 751 cm⁻¹ can be identified and assigned to the adenine and thymine ring breathing vibrations of DNA. From Fig. 5(c), it can be seen that the intensity of these bands was consistently lower in Pseudomonas aeruginosa than in Staphylococcus aureus, indicating a significant reduction in both the DNA and RNA concentrations in Pseudomonas aeruginosa. In addition, the Raman peaks at 1298 cm⁻¹ and 1447 cm⁻¹, which are assigned to the CH₂ and CH₃ bending modes, were found primarily in proteins and lipids. The intensity of these two peaks was notably lower in Pseudomonas aeruginosa than in Staphylococcus aureus, as shown in Fig. 5(c), which may indicate a decrease in the membranous lipids in Pseudomonas aeruginosa.14 The proteins have a prominent peak at 1556 cm⁻¹, which was assigned to the C=C vibration.35 As shown in Fig. 5(c), the intensity at this wavenumber was higher in Pseudomonas aeruginosa than in Staphylococcus aureus, which indicated a difference in the protein level between these two bacterial samples.35 Although the recovered Raman spectra retained the information about most of the Raman peaks, some Raman information was still lost or overlapped, especially for the Raman peaks at around 751 cm⁻¹ (thymine ring breathing vibration of DNA) and 1251 cm⁻¹ (amide III of protein and adenine ring breathing vibration of DNA), as shown in Fig. 5(c). The difference between the spectra of the two bacterial species at these two peaks was much closer to zero after spectral reconstruction than in the reference spectra, which demonstrated that some Raman information of DNA and proteins was lost or overlapped by the surrounding peaks. Even though the information in these bands and modes of bacterial components was lost or overlapped, this had only a very minor impact on the final classification results between Pseudomonas aeruginosa and Staphylococcus aureus (see Table 3). In other words, these bands and modes may not be the key biomarkers responsible for the discrimination between these two bacterial species.
Although only twenty pairs of Raman spectra of Pseudomonas aeruginosa and Staphylococcus aureus were tested in this study, many studies based on a large number of bacterial samples have verified the feasibility of using Raman spectroscopy as a powerful tool for bacterial species identification when the Raman signals are acceptable.34,36,37 In this study, we mainly focused on how well the proposed weighted spectral reconstruction method preserves information while removing noise from the low SNR Raman spectra. Among the recovered spectra, those recovered by the proposed weighted spectral reconstruction method have the lowest mean relative RMSE and the accuracy, sensitivity, specificity, and AUC values closest to those of the reference Raman spectra, which demonstrates the proposed method's excellent preservation of the most useful spectral information, such as the Raman peaks and the spectral shape, for bacterial species identification. Thus, we believe that the proposed method can work as well or even better for bacterial species identification when the number of bacterial samples increases. In future work, a larger sample data set and additional bacterial species will be investigated to validate and confirm these conclusions.
This journal is © The Royal Society of Chemistry 2019