Rajendhar Junjuri‡ *ab, Ali Saghi‡ a, Lasse Lensu a and Erik M. Vartiainen a
aLUT School of Engineering Science, LUT University, 53851 Lappeenranta, Finland. E-mail: rajendhar.j2008@gmail.com
bLeibniz Institute of Photonic Technology, Albert-Einstein-Strasse 9, 07745 Jena, Germany
First published on 26th May 2023
The nonresonant background (NRB) contribution to the coherent anti-Stokes Raman scattering (CARS) signal distorts the spectral line shapes and thus degrades the chemical information. Hence, finding an effective approach for removing NRB and extracting resonant vibrational signals is a challenging task. In this work, a bidirectional LSTM (Bi-LSTM) neural network is explored for the first time to remove the NRB in the CARS spectra automatically, and the results are compared with those of three DL models reported in the literature, namely, convolutional neural network (CNN), long short-term memory (LSTM) neural network, and very deep convolutional autoencoders (VECTOR). The results of the synthetic test data have shown that the Bi-LSTM model accurately extracts the spectral lines throughout the range. In contrast, the other three models’ efficiency deteriorated while predicting the peaks on either end of the spectra, which resulted in a 60 times higher mean square error than that of the Bi-LSTM model. The Pearson correlation analysis demonstrated that Bi-LSTM model performance stands out from the rest, where 94% of the test spectra have correlation coefficients of more than 0.99. Finally, these four models were evaluated on four complex experimental CARS spectra, namely, protein, yeast, DMPC, and ADP, where the Bi-LSTM model has shown superior performance, followed by CNN, VECTOR, and LSTM. This comprehensive study provides a giant leap toward simplifying the analysis of complex CARS spectroscopy and microscopy.
$$I_{\mathrm{CARS}}(\omega) \propto \left|\chi^{(3)}_{\mathrm{NR}} + \chi^{(3)}_{\mathrm{R}}(\omega)\right|^{2} \qquad (1)$$
Consequently, exploring other methods to extract the phase relationship without physically removing the NRB is of paramount importance. In this context, numerical approaches such as the maximum entropy method (MEM)13 and the Kramers–Kronig (KK) relation14 have been widely utilized for phase retrieval. Furthermore, other algorithmic methods such as “phase-error correction”,15 “factorized Kramers–Kronig and error correction”,16 and “wavelet prism decomposition analysis”17 have also been reported in the literature to mitigate the experimental artefacts and spectral line distortions in the CARS spectra. Recently, Charles et al. have proposed the discrete Hilbert transform to remove the NRB.18 However, these numerical techniques require a surrogate reference material, and/or their simulation parameters need to be tuned to obtain the best results. All these complications can be overcome by utilizing machine learning algorithms, where the model learns from the input CARS data and predicts the Raman signal.19 Deep neural networks (DNNs) have been explored in several applications, such as weather forecasting,20 natural language processing,21 and computer vision.22 Moreover, they have also been utilized in different spectroscopies, such as hyperspectral image analysis,23 vibrational spectroscopy,24,25 molecular excitation spectroscopy,26 and laser-induced breakdown spectroscopy.27–29
Various deep learning (DL) approaches have also been recently explored in CARS spectroscopy to tackle the NRB removal problem.30–35 Valensise et al. have utilized a convolutional neural network (CNN) model to retrieve the imaginary part from the CARS spectral data.31 It is the first report on utilizing DL methods for removing the NRB and is referred to as SpecNet. Houhou et al. have used a long short-term memory (LSTM) neural network model to retrieve the Raman signal and compared their results with those of the MEM and KK methods.30 Wang et al. deployed very deep convolutional autoencoders (VECTOR) for removing the NRB and compared their model's performance with that of SpecNet.32 They have also shown that the VECTOR model with 16 layers gives optimum results in less computational time.
Our recent works have demonstrated that retraining the SpecNet with a combination of semisynthetic and synthetic data improves its performance.33 We have also applied a transfer learning approach to increase the CNN model efficacy in retrieving the imaginary part of the CARS spectra.34 Furthermore, the noise is also varied at various levels to analyse the sensitivity of the model after transfer learning. Very recently, we have also explored three different NRB types to simulate the CARS data.35 It has been revealed that considering the NRB as a fourth-order polynomial function instead of a product of two sigmoids improves the CNN model's efficiency. These three works have shown superior performance compared to the SpecNet, where spectral lines with minimal intensities are also predicted.33–35 Even though the CNN model trained with polynomial NRB has predicted all the spectral lines of the experimental data, the intensity of a few lines deviated from the true one. Also, similar results were obtained with the LSTM30 and VECTOR models,32 where the performance was found to be sensitive when evaluating the experimental CARS data.
Furthermore, estimating the mean square error (MSE) throughout the spectral range can be considered a critical parameter for evaluating the model's efficiency. However, to the best of our knowledge, no other reports apart from our works33–35 have presented it. It should be noted that SpecNet gives a high MSE while predicting the peaks at the ends of the spectrum. This occurs because the model cannot extract a peak when it encounters a spectral line that has only a rising or falling part instead of a full line shape. Even retraining SpecNet with semisynthetic data,33 applying transfer learning,34 and training with the CARS data simulated via polynomial NRB35 could not avoid it and challenged the predictive ability of the models. These studies hint that exploring other DL approaches in addition to the CNN, LSTM, and VECTOR models can mitigate the aforementioned limitations.
Hence, in this work, we have explored the Bi-LSTM model for the first time for extracting the imaginary part of the CARS spectra. Also, the NRB is assumed to be a fourth-order polynomial function while producing the CARS training data, which has already shown optimum results.35 Furthermore, a comprehensive study is performed by comparing the performance of four DL models, namely, (1) VECTOR, (2) CNN, (3) LSTM, and (4) Bi-LSTM. To the best of our knowledge, this comparative study has been done for the first time, and it critically evaluates the trained models' efficiency in retrieving the Raman signal from the CARS data.
Table 1 Simulation parameters and their ranges used to generate the synthetic CARS spectra

| S. no | Simulation parameter | Range |
|---|---|---|
| 1 | No. of peaks | (1, 15) |
| 2 | Peak amplitude (Ak) | (0.01, 1) |
| 3 | Line width (Γk) | (0.001, 0.008) |
| 4 | Noise η(ω) | (0.0005, 0.003) |
In brief, the vibrational frequencies are sampled over a normalized scale [0, 1]. The NRB is modelled as a fourth-order polynomial, as given in eqn (2):
$$\mathrm{NRB}(\omega) = a\omega^{4} + b\omega^{3} + c\omega^{2} + d\omega + e \qquad (2)$$
The coefficients a, b, and d are randomly selected from the range [−10, 10], whereas c and e are selected from [−1, 1]. Uniformly distributed noise η(ω) is added to the χ(3) data to generate the CARS data. A total of 50000 synthetic training spectra are generated in Python, where each spectrum has 640 data points/wavenumbers. All the simulation parameters are randomly selected from the ranges given in Table 1 for each CARS spectrum. The code to simulate the synthetic spectra is available here.36 The same synthetic dataset (640 data points) is used for training all the models, except for the VECTOR, as its architecture inherently requires a longer data length (1000 data points). Hence, 1D cubic spline interpolation was used to generate 1000 points from the 640 points of the synthetic dataset. This interpolation preserves the spectral shapes and intensities, as shown in Fig. S1 in the ESI.†
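As a rough illustration of this simulation recipe, the following Python sketch generates one synthetic CARS spectrum from Lorentzian resonant lines, a fourth-order polynomial NRB as in eqn (2), and additive uniform noise. The exact implementation and normalization used in this work are given in the code of ref. 36; the line-shape form and scaling below are therefore illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cars_spectrum(n_points=640):
    """Illustrative sketch of one synthetic CARS spectrum (see ref. 36 for the actual code)."""
    w = np.linspace(0, 1, n_points)                  # normalized wavenumber axis

    # Resonant susceptibility: sum of complex Lorentzian lines (ranges from Table 1)
    chi_r = np.zeros(n_points, dtype=complex)
    for _ in range(rng.integers(1, 16)):
        a_k = rng.uniform(0.01, 1.0)                 # amplitude A_k
        w_k = rng.uniform(0.0, 1.0)                  # peak position
        g_k = rng.uniform(0.001, 0.008)              # line width Gamma_k
        chi_r += a_k / (w_k - w - 1j * g_k)

    # Nonresonant background: fourth-order polynomial, eqn (2)
    a4, b3, d1 = rng.uniform(-10, 10, size=3)
    c2, e0 = rng.uniform(-1, 1, size=2)
    nrb = a4 * w**4 + b3 * w**3 + c2 * w**2 + d1 * w + e0

    # CARS intensity (eqn (1)) with additive, uniformly distributed noise
    noise_amp = rng.uniform(0.0005, 0.003)
    cars = np.abs(chi_r + nrb) ** 2
    cars = cars / cars.max() + rng.uniform(-noise_amp, noise_amp, n_points)

    raman = np.imag(chi_r)                           # ground-truth target for training
    return cars, raman / raman.max()

cars_spectrum, true_raman = simulate_cars_spectrum()
```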
The CNN architecture used here is SpecNet.35 A typical schematic of the CNN model's architecture is presented in Fig. 1(a). It is composed of five 1-dimensional convolutional layers (CLs) with (128, 64, 16, 16, 16) filters of kernel sizes (32, 16, 8, 8, 8) and three fully connected layers (FCLs) of (32, 16, 640) dimensions, all followed by the ReLU activation function; Adam is applied as the optimizer and MSE as the loss function. The model aims to remove the NRB, which produces different levels of spectral distortion, from the input broadband CARS spectra.
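A minimal Keras sketch of such a SpecNet-like model is given below; the padding scheme, the output activation, and the single-channel input reshaping are assumptions of the sketch, since only the filter counts, kernel sizes, and dense-layer widths are specified above.

```python
from tensorflow.keras import layers, models

def build_specnet_like(n_points=640):
    """SpecNet-like 1D CNN: five convolutional layers followed by three dense layers (sketch)."""
    inp = layers.Input(shape=(n_points, 1))
    x = inp
    # Five 1D convolutional layers with (128, 64, 16, 16, 16) filters of kernel sizes (32, 16, 8, 8, 8)
    for n_filters, k_size in zip((128, 64, 16, 16, 16), (32, 16, 8, 8, 8)):
        x = layers.Conv1D(n_filters, k_size, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    # Three fully connected layers of (32, 16, 640) units, ReLU throughout
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(16, activation="relu")(x)
    out = layers.Dense(n_points, activation="relu")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")      # Adam optimizer, MSE loss
    return model

specnet = build_specnet_like()
```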
In this work, we have used the VECTOR-16 architecture proposed by Wang et al.32 and retrained it without modifying its architecture. It is composed of an eight-layer, fully convolutional (1D) encoder and a symmetrical eight-layer, fully transposed-convolutional (1D) decoder, with stochastic gradient descent (SGD) used as the optimizer. The mean absolute error (MAE) between the network output and the clean Raman spectrum is used as the loss function. In addition, skip connections49 have been used, which connect each layer of the encoder to the corresponding paired layer of the decoder; they avoid the padding phenomenon that usually occurs in convolutional layers. These skip connections speed up the training process and improve the model's performance in deeper networks compared to plain ones. They also help to mitigate overfitting when the model is too complex and therefore improve model generalization.
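The sketch below outlines what such an encoder–decoder with skip connections can look like in Keras. The filter counts, kernel size, strides, and the padded input length of 1024 points are illustrative assumptions and do not reproduce the exact VECTOR-16 configuration of ref. 32.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vector_like(n_points=1024, base_filters=16, kernel_size=8):
    """VECTOR-like 1D convolutional autoencoder with encoder-decoder skip connections (sketch)."""
    inp = layers.Input(shape=(n_points, 1))

    # Eight-layer fully convolutional encoder; each layer halves the sequence length
    skips, x = [], inp
    for i in range(8):
        x = layers.Conv1D(base_filters * min(2 ** i, 8), kernel_size,
                          strides=2, padding="same", activation="relu")(x)
        skips.append(x)

    # Eight-layer transposed-convolutional decoder, mirroring the encoder
    for i in reversed(range(8)):
        x = layers.Conv1DTranspose(base_filters * min(2 ** i, 8), kernel_size,
                                   strides=2, padding="same", activation="relu")(x)
        if i > 0:
            # Skip connection: reuse the encoder feature map of matching resolution
            x = layers.Concatenate()([x, skips[i - 1]])

    out = layers.Conv1D(1, 1, padding="same")(x)     # map back to a single spectral channel
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="mae")                        # SGD optimizer and MAE loss, as above
    return model

vector_like = build_vector_like()
```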
The LSTM architecture used here is adapted from the work of Houhou et al.30 Their code is not available for direct reuse, but the architecture is simple: it contains one LSTM layer of 30 units with ReLU as the activation function and sigmoid as the recurrent activation function. The loss function is MSE, the optimizer is Adam, and the learning rate is 0.005. In their work, the CARS spectra were simulated with both weak and strong NRB.
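A minimal Keras sketch of this single-layer LSTM model is given below. How each 640-point spectrum is framed as an input sequence and how the per-wavenumber output is produced are assumptions of the sketch, since they are not prescribed above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm_like(n_points=640):
    """Single-layer LSTM model in the spirit of ref. 30 (input framing and output head assumed)."""
    inp = layers.Input(shape=(n_points, 1))           # each spectrum treated as a 640-step sequence
    x = layers.LSTM(30, activation="relu",
                    recurrent_activation="sigmoid",
                    return_sequences=True)(inp)
    out = layers.TimeDistributed(layers.Dense(1))(x)  # one Raman value per wavenumber
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005), loss="mse")
    return model

lstm_model = build_lstm_like()
```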
The LSTM architecture is typically applied to ordered data without time labels, such as text classification, or to data with a constant time-sampling rate, such as stock-price prediction. The data in the original application of ref. 44, by contrast, have irregular time-sampling rates. Hence, a preprocessing method named functional principal component analysis (FPCA) was applied there, and an additional dimension was needed to carry the phase information of each spectrum, so the DL model required an additional input channel to hold it. A plain LSTM does not provide this channel; as a solution, the Bi-LSTM model was used and the phase information was integrated into the input as well. The Bi-LSTM architecture proposed here was obtained from that work.44 It consists of three bidirectional layers, each with 30 units, and a time-distributed fully connected layer as the output, thereby producing an output for each time step. MSE and Nadam were used as the loss function and the optimizer, respectively.44
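A hedged Keras sketch of this Bi-LSTM model is given below. Treating the phase information as a second stacked input feature per wavenumber is an assumption of the sketch; the layer counts, unit sizes, loss, and optimizer follow the description above.

```python
from tensorflow.keras import layers, models

def build_bilstm_like(n_points=640, n_channels=2):
    """Bi-LSTM model: three bidirectional LSTM layers of 30 units and a time-distributed output."""
    # n_channels = 2 assumes the CARS intensity and the phase information are stacked
    # as two input features per wavenumber (an assumption of this sketch).
    inp = layers.Input(shape=(n_points, n_channels))
    x = inp
    for _ in range(3):
        x = layers.Bidirectional(layers.LSTM(30, return_sequences=True))(x)
    out = layers.TimeDistributed(layers.Dense(1))(x)  # one output per time step / wavenumber
    model = models.Model(inp, out)
    model.compile(optimizer="nadam", loss="mse")      # Nadam optimizer, MSE loss
    return model

bilstm_model = build_bilstm_like()
```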
It is noticed that some peaks with lower intensity were observed on either side of the spectra for CNN, which were actually not present in the true Raman signal. These spurious lines can degrade the performance of the CNN compared to other models. In the case of LSTM, the predicted spectral line intensities are a little higher; on the contrary, it is slightly lower for the VECTOR prediction. Fig. 2(c) illustrates the Raman signal retrieved from the Bi-LSTM model, where the extracted imaginary spectrum closely resembles the true spectrum. Also, it has not predicted any other spurious lines throughout the spectral range.
The SE plot visualization (shown at the bottom of each figure, for example, see Fig. 2(a)) efficaciously demonstrates the differences between the true and retrieved Raman signals throughout the spectral range for a single test spectrum. Nevertheless, such a visualization for the entire test set would not be feasible. Therefore, the mean square error (MSE) plot is considered for evaluating each trained model, as shown in Fig. 3(a)–(d). The MSE is estimated by averaging the measured SE over 300 test spectra. The black dots in Fig. 3(a)–(d) represent the average SE, and the red line corresponds to their standard deviation. For easy interpretation, the total spectral window can be divided into three parts: the first region (0–0.1 cm−1), the midregion (0.1–0.9 cm−1), and the last region (0.9–1 cm−1), where the middle region alone accounts for 80% of the total data points and the remaining 20% falls in the first and last regions.
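The pointwise error statistics plotted in Fig. 3 can be reproduced along the lines of the short NumPy sketch below; the array names and shapes (300 test spectra of 640 points each, held in y_true and y_pred) are assumptions of the sketch.

```python
import numpy as np

def pointwise_error_stats(y_true, y_pred):
    """Per-wavenumber error statistics over a test set of predicted imaginary parts.

    y_true, y_pred: arrays of shape (n_spectra, n_points), e.g. (300, 640).
    """
    se = (y_true - y_pred) ** 2      # squared error per spectrum and per wavenumber
    mse = se.mean(axis=0)            # average SE over the test spectra (black dots in Fig. 3)
    std = se.std(axis=0)             # standard deviation of the SE (red line in Fig. 3)
    return mse, std
```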
It is also observed that the error is smaller in the midregion than in the other two regions of the spectra. The measured MSE is the highest in the first region compared to the remaining two regions, as shown in Fig. 3(a)–(d). This is true for all four models, irrespective of their architecture. The standard deviation reaches a maximum of ∼0.06 for the VECTOR and CNN, and it is slightly less, ∼0.055, for the LSTM. However, a drastic change is observed for the Bi-LSTM model, which shows a 60 times lower standard deviation than the other models, only ∼0.0012. Also, for the Bi-LSTM, the deviation is approximately the same throughout the spectral range, except for a few points. The scenario is entirely different for the other three models: the deviation in the first region is more than 15 times that in the midregion for the VECTOR model, whereas it is 5 and 10 times higher for the LSTM and CNN models, respectively.
In the last region, the maximum deviation is observed for the CNN model, that is, ∼0.035, whereas the minimum is for the Bi-LSTM, that is, ∼0.001. In the case of VECTOR, it is ∼0.025, and it is ∼0.005 for the LSTM model. Also, the deviation is nearly the same in the mid and last regions for the LSTM. Overall, the MSE plot visually demonstrated that the Bi-LSTM model has a superior capability in predicting the imaginary part from the CARS spectra among all four models. The same behaviour is noticed for the mean absolute error (MAE), as shown in Fig. S5 in the ESI.† In the following section, Pearson correlation analysis is performed. It provides a unique numerical parameter for each test spectrum, that is, a correlation coefficient. Hence, it can be utilized as a performance metric for validating the predictions of the four different models.
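The per-spectrum Pearson correlation coefficients, and the histogram of their distribution discussed below (Fig. 5), can be computed as in the following sketch; the array names and bin edges are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def pcc_per_spectrum(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted imaginary parts, per test spectrum."""
    return np.array([pearsonr(t, p)[0] for t, p in zip(y_true, y_pred)])

# Histogram of the PCC distribution over the 0.9-1.0 range (cf. Fig. 5), e.g.:
# counts, edges = np.histogram(pcc_per_spectrum(y_true, y_pred),
#                              bins=np.arange(0.90, 1.001, 0.02))
```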
The PCC values estimated for the Bi-LSTM model have given higher coefficients compared to the other models for more than 97% of test spectra, as shown in Fig. 4(b).
Only one spectrum has given a PCC value of ∼0.80 out of 300 spectra; all others have given PCCs of more than 0.92. In the case of CNN and VECTOR, four spectra have a PCC value less than 0.8, and it is five spectra for the LSTM model. Furthermore, a histogram plot is drawn to graphically visualize the distribution of the estimated PCCs for the 300 test spectra, as shown in Fig. 5. This plot presents the number of spectra that have the PCCs in a specific range, that is, frequency count in the selected PCC range. For example, seven spectra have PCC between 0.9 and 0.92 for the LSTM. Cumulatively, 273, 299, 264, and 273 spectra have PCCs > 0.9 for the VECTOR, Bi-LSTM, CNN, and LSTM models, respectively, which account for more than ∼90% of the test spectra. Hence, their distribution (on the x-axis) is presented only in the range of 0.9–1 instead of 0–1, which ascertains the best visualization of the PCC distribution.
Also, it is noticed that 282 spectra have PCC values >0.98 for the Bi-LSTM model, which corresponds to ∼94% of the total test data. This demonstrates that the Raman signal extracted using the Bi-LSTM model is in better agreement with the ground truth. On the other hand, only 102 and 131 spectra have PCC values >0.98 for the CNN and LSTM models, respectively, which account for less than ∼50% of the total data. The CNN and LSTM models' performances were found to be almost the same when comparing their PCC values; hence, the frequency count in most of the bins is approximately the same for the LSTM and CNN models. Furthermore, their estimated PCC difference is less than 0.05 for 253 spectra and less than 0.1 for 285 spectra, as shown in Fig. S2 in the ESI.†
Furthermore, it is observed that the maximum PCC value obtained is close to 1 for all the models. Nevertheless, the minimum values have shown a notable variation when compared with the predictions of the Bi-LSTM model. The lowest predicted PCC value is ∼0.81 for the Bi-LSTM; meanwhile, it is ∼0.58, ∼0.62, and ∼0.67 for the LSTM, VECTOR, and CNN models, respectively. The test spectrum with the lowest PCC value in each model is marked with a red asterisk (*) for easy representation. For example, it is the 257th spectrum for the VECTOR prediction, the 108th spectrum for CNN, and the 172nd spectrum in the case of Bi-LSTM and LSTM models.
The second lowest PCC value is presented with a blue asterisk. These test spectra, along with the Raman line shapes extracted by the four models, are shown in Fig. 6. These visualizations inherently represent the limitations of each model in retrieving the imaginary part from the CARS spectra and also reveal the root cause of the lowest PCC value attained by each trained model.
Fig. 6(a1)–(a4) illustrates the results obtained from the 257th test spectrum using VECTOR, Bi-LSTM, CNN, and LSTM models, respectively. The input CARS spectrum has four spectral features in the entire spectral range where all the lines have a higher intensity, except for the peak at 0.66 cm−1. Among four lines, one is located near the right extrema, that is, at 0.99 cm−1, and it could not be extracted by the VECTOR, whereas the other three models predicted it, but a huge error is found in the case of CNN. A similar observation was noticed in our previous work,33 where the CNN prediction capability is poor at the edges. The LSTM and Bi-LSTM models have predicted all the lines, including the line at 0.99 cm−1, and the predictive performance was found to be the same for both models. Furthermore, this inefficient extraction of the Raman line at 0.99 cm−1 has given an SE of ∼0.19 for the VECTOR and led to the minimum PCC value in the entire test dataset, that is, ∼0.62. The SE for the Bi-LSTM, LSTM, and CNN is ∼0.005, ∼0.01, and ∼0.08, respectively.
Fig. 6(b1)–(b4) shows the results of the 172nd test spectrum obtained from the VECTOR, Bi-LSTM, CNN, and LSTM models, respectively. The input CARS spectrum has one strong line at ∼0.86 cm−1 and three very faint spectral lines in the remaining spectral range. These faint lines' intensities are close to the noise level. Also, the maximum spectral line intensity is only ∼0.062; due to this, the spectrum looks noisy compared to other test spectra whose intensities are higher by more than an order of magnitude. All four models predicted only the one line at ∼0.86 cm−1, and the rest of the lines were not extracted properly. Furthermore, the predicted intensities match the true ones for the Bi-LSTM, where the lowest SE is noticed, ∼0.001, and the SE is 4, 48, and 18 times higher for the VECTOR, LSTM, and CNN models, respectively. It is also observed that all the models predicted some spurious lines with minute intensities throughout the spectral range. These observations affected the PCC measurements, and hence, the lowest coefficients, ∼0.81 and ∼0.58, are obtained for the Bi-LSTM and LSTM models, respectively.
Fig. 6(c1)–(c4) illustrates the 108th test spectrum results obtained from the VECTOR, Bi-LSTM, CNN, and LSTM models, respectively. The input CARS spectrum has several vibrational spectral features with different peak intensities. Nevertheless, only half of the first spectral line at ∼0.006 cm−1 (on the left extreme) is present; that is, the spectral line starts with its trailing part instead of the rising part. This occurs because, during the CARS spectra simulation, the spectral lines/peaks are generated anywhere in the entire spectral range (0–1). So, the lines generated close to the extremes sometimes have only a rising or trailing part, depending on the peak position and width. Hence, the error may also occur on the right side of the spectrum, as reported in our previous study.33 The CNN and VECTOR models have predicted all the Raman lines except for the first line at ∼0.006 cm−1, because only half of the spectral line is present. Similar observations were also made in previous studies, where the CNN model performance deteriorated when it encountered spectral lines having only a rising or trailing part.33 This inherent constraint gives a high SE of ∼0.15 and affects the PCC measurements, where the value is minimum (∼0.67) for the CNN model and the second lowest (∼0.70) for the VECTOR model. This could be a reason for the high MSE observed on either side of the extrema for the VECTOR and CNN models, as shown in Fig. 3(a) and (c), respectively. Furthermore, the Bi-LSTM and LSTM have predicted all the lines, including the first one on the left end. However, the LSTM model gives a high error compared to the Bi-LSTM model, by more than an order of magnitude.
Furthermore, the test spectra corresponding to the second lowest PCC values are presented in the ESI,† Fig. S3. They are the 84th spectrum for the CNN (∼0.69), the 111th spectrum for the LSTM (∼0.65), and the 129th spectrum for the Bi-LSTM (∼0.93) model. The results of the 111th spectrum are presented in the ESI,† Fig. S3(a1)–(a4), where two spectral features are not predicted by any of the four models. These two spectral lines are very faint. Also, the predicted intensities deviate, and the error is the highest for the LSTM, which is reflected in the PCC measurements, while the error is minimum for the Bi-LSTM. The results of the 129th test spectrum are shown in the ESI,† Fig. S3(b1)–(b4), where the input CARS spectral line intensity is low. The SE of the spectral line at ∼0.97 (on the right extreme) is only ∼0.004 for the Bi-LSTM, whereas it is more than 20 times higher for the other three models. However, the Bi-LSTM could not predict two of the lines, which led to the second lowest PCC value. Furthermore, the other three models did not retrieve three or four lines. In the case of the 84th spectrum, all four models predicted all the lines. Nevertheless, the retrieved peak intensities deviate for all the models except for the Bi-LSTM, as shown in the ESI,† Fig. S3(c1)–(c4). The deviation is found to be maximum for the CNN, followed by the LSTM, VECTOR, and Bi-LSTM, respectively. The SE of the spectral line at ∼0.97 (on the right extreme) is only ∼0.002 for the Bi-LSTM, whereas it is 30, 22, and 13 times higher for the CNN, LSTM, and VECTOR models, respectively. These visual findings clearly demonstrate that the Bi-LSTM model has a superior capability in predicting the imaginary parts compared to the other three models.
In conclusion, Fig. 4 and 6 and the ESI,† Fig. S3, have visually demonstrated the imaginary part prediction capability of the four models, where the performance of the Bi-LSTM model was found to be the best. Numerically, it performed well on more than 97% of the total test dataset (i.e., it has a higher PCC value than the other three models). It also revealed that the Bi-LSTM model has a better capability of extracting spectral lines at the ends, even when they have only a rising or trailing part, which led to the lowest MSE even at the edges, as shown in Fig. 3(b).
Furthermore, its efficiency decreased only when it encountered a noisy CARS spectrum with a very low intensity. The results of the experimental CARS spectra are discussed in detail in the next section.
Fig. 7 shows the results obtained from the four models on these experimental CARS data. Each plot in Fig. 7 is a three-stacked plot (see Fig. 7(a) for reference). The first row represents the input CARS spectrum (green line), and the second row shows the true (black line) and predicted (red line) imaginary parts. The labels ‘True’ and ‘Pred’ in the Figure correspond to the imaginary part extracted by the maximum entropy method and trained DL models, respectively. Furthermore, the third row represents the square of the error (blue line), that is, the square of the difference between the predicted and true imaginary parts. In each sample, the y-axis scale is considered to be the same for all four models for better visualization.
Fig. 7(a1)–(a4) represents the results of the protein sample in the fingerprint region (700–1900 cm−1) obtained from the VECTOR, Bi-LSTM, CNN, and LSTM models, respectively. It has various resonance vibrational bands, including tyrosine peaks at 850, 1210, and 1616 cm−1; amide III bands (∼1220–1250 cm−1); amide I bands (∼1600–1700 cm−1); and the CH2 band at 1445 cm−1. Here, the prediction of the Bi-LSTM model is in good agreement with the true one, where the SE is only ∼0.02. In contrast, the LSTM model prediction is poor, where the extracted line shapes are very broad and the intensities deviate from the true ones. The other two models also predicted the spectral lines, albeit with intensities that differed from the actual signal. Hence, the SE is found to be 6 times higher for the VECTOR model compared to the Bi-LSTM, 20 times higher for the LSTM, and 5 times higher for the CNN. These observations are reflected in the PCC measurements shown in Fig. 9(a), where the highest value is obtained for the Bi-LSTM, ∼0.95, and the minimum for the LSTM, ∼0.42. The other two models give the same value, ∼0.89.
Fig. 7(b1)–(b4) represents the results of the protein sample in the range of 1830–3100 cm−1 from the VECTOR, Bi-LSTM, CNN, and LSTM models, respectively. The predicted line shapes match with the true one for the Bi-LSTM, but the peak intensities have slightly deviated. In contrast, the CNN and VECTOR models have correctly extracted the peak intensities; however, the line shapes have deteriorated. Similar behaviour is noticed for the LSTM, and in addition, a broad spurious peak is also observed in the spectral range of 2200–2400 cm−1. An overall minimum SE of ∼0.032 is noticed for the Bi-LSTM, and a maximum of ∼0.059 is noticed for the LSTM.
Fig. 7(c1)–(c4) illustrates the imaginary part retrieved from the yeast sample by these four models, respectively. All the models except for the LSTM have extracted the major resonance spectral features (the C–H bend of the aliphatic chain at 1440 cm−1, the amide band at 1654 cm−1, and the CC bending mode of phenylalanine at ∼1590 cm−1); nonetheless, the predicted intensities deviate for the VECTOR and CNN models compared to the Bi-LSTM. In the case of the LSTM, an intense ringing structure, not present in the true Raman spectrum, appears throughout the spectral region outside the resonance peak positions. The maximum estimated SE for the Bi-LSTM is ∼0.04, and it is more than two times higher for the LSTM and CNN. The error is more than an order of magnitude higher for the LSTM due to the deteriorated spectral line shapes. The measured PCC values convey the same information, where the predictive performance is superior for the Bi-LSTM (∼0.96) model, followed by the VECTOR (∼0.92), CNN (∼0.89), and LSTM (∼0.41), as shown in Fig. 9(a).
The results of the ADP/AMP/ATP mixture obtained by the VECTOR, Bi-LSTM, and LSTM models are presented in Fig. 8(a1)–(a3). The CNN model prediction can be found here.35 The adenine vibrations are observed in the range of 1270 to 1400 cm−1, and the strongest one is noticed at ∼1330 cm−1, as shown in Fig. 8(a1)–(a3).55 All four models have retrieved these adenine vibrations, albeit the extracted line intensities do not match the true intensities. The measured SE in this spectral range is the highest for the VECTOR model (∼0.003), followed by the LSTM (∼0.04), Bi-LSTM (∼0.004), and CNN (∼0.001).56 Furthermore, the symmetric stretching vibration of the triphosphate group of ATP (∼1123 cm−1) is retrieved by all the models except for the LSTM. Hence, the highest SE is observed for the LSTM (∼0.36) and the minimum for the CNN (∼10−5). In the case of the Bi-LSTM, the SE is ∼10−4, and it is ∼10−2 for the VECTOR. Similar behaviour is noticed for the diphosphate resonance band (∼1100 cm−1). The monophosphate resonance band of AMP (979 cm−1) is extracted only by the Bi-LSTM and CNN; the LSTM and VECTOR models could not predict it, which led to a high SE. Here also, among the PCC values measured for all four models, the Bi-LSTM and CNN are the best, with the highest coefficient of ∼0.93, followed by the VECTOR (∼0.85) and LSTM (∼0.42).
Fig. 8(b1)–(b3) depicts the results of the DMPC sample retrieved from the VECTOR, Bi-LSTM, and LSTM models, respectively. The results of the CNN can be found here.35 Prominent vibrational bands such as the CH stretch mode, the symmetric and antisymmetric stretching modes of the methylene groups, and the overtone of the methylene scissoring mode appear in the range of 2600–3000 cm−1.57,58 All the models except for the LSTM have extracted these vibrational bands; the LSTM could not predict the vibrational mode at 2946 cm−1, which led to a high error. Also, a strong spurious line appears at the right extreme of the LSTM prediction. These observations affected the PCC measurements, where the PCC values are ∼0.8, ∼0.93, ∼0.89, and ∼0.42 for the VECTOR, Bi-LSTM, CNN, and LSTM models, respectively, as shown in Fig. 9(a). Overall, the Bi-LSTM model performance was found to be the best, with the highest average correlation coefficient, followed by the CNN, VECTOR, and LSTM models, respectively, as shown in Fig. 9(b). However, a relatively higher computational time is required for the Bi-LSTM, as presented in Table S1 in the ESI.†
Fig. 9 (a) The PCC values measured on the experimental CARS predictions using the four trained models. (b) The average value of the PCCs estimated from the five experimental CARS data.
Overall, the Bi-LSTM model predictions are optimum not only on the synthetic data but also on the experimental data. However, a few limitations were noticed when evaluating it on spectra with low spectral line intensities and higher noise levels, where it could not find some peaks. These observations suggest that modification of the spectral simulation parameters is required. In particular, we plan to train the model with data generated at different noise levels in our future work. It would also be interesting to train the model with data generated using different simulation parameters (number of peaks, frequencies, amplitudes, etc.) to fit specific applications in different spectral regions.59 Also, fine-tuning or transfer learning mechanisms can be explored to circumvent these limitations, which would positively impact the model performance.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cp01618h |
‡ These authors contributed equally. |
This journal is © the Owner Societies 2023 |