Ryan Muddiman a, Kevin O'Dwyer a, Charles H. Camp Jr b and Bryan Hennelly *ac
a Department of Electronic Engineering, Maynooth University, Co. Kildare, Ireland
b Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
c Department of Computer Science, Maynooth University, Co. Kildare, Ireland
First published on 31st July 2023
Broadband coherent anti-Stokes Raman scattering (BCARS) is capable of producing high-quality Raman spectra spanning broad bandwidths, 400–4000 cm−1, with millisecond acquisition times. Raw BCARS spectra, however, are a coherent combination of vibrationally resonant (Raman) and non-resonant (electronic) components that may challenge or degrade chemical analyses. Recently, we demonstrated a deep convolutional autoencoder network, trained on pairs of simulated BCARS-Raman datasets, which could retrieve the Raman signal with high quality under ideal conditions. In this work, we present a new computational system that incorporates experimental measurements of the laser system spectral and temporal properties, combined with simulated susceptibilities. Thus, the neural network learns the mapping between the susceptibility and the measured response for a specific BCARS system. The network is tested on simulated and measured experimental results taken with our BCARS system.
The BCARS signal is generated through the interference of two optical pulses and the third-order electric susceptibility of the sample χ(3). The intensity is proportional to the squared susceptibility; therefore, the detected signal is quadratic in the sample concentration. The susceptibility contains a contribution from vibrational resonances within the molecule as well as a non-resonant component known as the non-resonant background (NRB). The NRB arises from the electronic response of the sample to the incident fields and is generated via four-wave mixing. The coherent interference of the NRB and the vibrationally resonant response gives rise to asymmetric lineshapes in the BCARS signal intensity and, as such, obscures the relative species concentration. This property of BCARS measurements is a well-known drawback, for which multiple experimental and mathematical methods have been developed. Experimental methods include, for example, controlling the excitation and detection polarization angle3 (P-CARS), frequency-modulation CARS (FM-CARS)4 and time-resolved CARS (T-CARS).5 These methods, although useful, add to the complexity of the measurement and in some instances suppress the resonant contribution in the process.
Another approach for recovering the resonant susceptibility is to use BCARS intensity data and mathematical relationships between the phase and intensity of the BCARS signal. One such method relies on the fact that the susceptibility obeys causality; therefore, the Kramers–Kronig relations can be applied to a BCARS measurement to provide the phase of χ(3), as demonstrated by Liu et al.6 In practice, this method requires pre-processing of the spectra, either using the singular value decomposition as a denoising and spectral encoding step, as shown by Masia et al.,7 or a baseline correction of the retrieved phase to account for errors arising from the finite frequency range of actual measurements and the use of a surrogate NRB.8 Another approach is the maximum entropy method (MEM), which computes χ(3) through maximisation of the entropy of an autocorrelation function whose parameters are consistent with the measured data.9 For proper retrieval, this method requires prior knowledge of at least two locations in the spectrum where the phase is known, as well as a measurement of the NRB. Since baseline correction is a supervised technique and prior information is required for the MEM, both of these methods must be supervised for Raman signal extraction. Thus, the current mathematical approaches are unsuitable for bulk phase retrieval of unknown BCARS spectra.
Another emerging technique for NRB removal is to use a deep learning network to recover the Raman spectrum directly from BCARS measurements. This has been demonstrated first by Valensise et al. using purely synthetic BCARS data which was used to train a convolutional neural network (CNN).10 Long-short term memory (LSTM) networks were also applied to this problem, with the performance of the network tested on real spectra.11 Saghi et al.12 then demonstrated improved retrieval using semi-synthetic training data and fine-tuning an already trained network. A comparison with the performance of the MEM approach was also shown. The semi-synthetic data were generated using actual recordings of Raman spectra, with an NRB generated mathematically using sigmoid functions.
In our previous paper,13 we trained a CNN called VECTOR on paired CARS intensity data and the respective Raman intensity. The CARS spectra were simulated assuming a spectrally flat excitation. The stimulation profile from the laser sources has a profound effect on the CARS spectra generated, and this impacted negatively on the network's ability to remove the NRB from real BCARS measurements. In this paper, we introduce VECTOR2, whereby the training set is produced using a synthetically generated susceptibility; however, we also take into account the effect of laser pulse characteristics on the spectral shape. Two processing steps are applied to the synthetic susceptibility in accordance with the physical model. Firstly, the laser stimulation profile, which can be measured experimentally, is used to modulate the susceptibility, and secondly, the result is convolved with the probe laser spectrum. Thus, we more accurately mimic the effect of a given system in simulating a spectrum.
In CARS, the third-order nonlinear susceptibility χ(3) is probed. This susceptibility is defined as the sum of a resonant part that is complex and a purely real non-resonant part
$$\chi^{(3)} = \chi_{R}^{(3)} + \chi_{NR}^{(3)}$$ | (1) |
In BCARS, it is the squared magnitude of the susceptibility modulated by the laser profile that is detected, as given by eqn (2)
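$$I_{\mathrm{CARS}} \propto \left|\left[\chi^{(3)}\,(E_S \times E_P)\right] * E_{pr}\right|^2$$ | (2) |
$$I_{\mathrm{CARS}} \propto \left|S\,\chi^{(3)}\right|^2 \approx |S|^2\,\left|\chi^{(3)}\right|^2$$ | (3) |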
![]() | (4) |
It has been shown that the Kramers–Kronig (KK) method allows quantitative and reliable extraction of the Raman spectrum of neat chemicals and tissues.8 The KK relation rests on the fact that the real and imaginary parts of an analytic function determine one another. In common usage, the Bode gain-phase relation is employed14
![]() | (5) |
The main problem of the current technique is the requirement of a surrogate NRB reference spectrum. If the NRB is not purely real, then any absorbance or scattering will affect the NRB spectrum. The implementation of a windowed Hilbert transform for performing the KK relation also results in an approximately additive error term in the retrieved spectral phase of the susceptibility. The removal of this baseline through detrending is a method requiring supervision, since the error term depends on the window size. Classical methods are also susceptible to noise, since any detrending method will be affected by noise present in the phase, and thus often statistical denoising is employed using many BCARS spectra in order to prepare the spectra for phase retrieval. The retrieval of the susceptibility through KK also assumes that the probe spectrum is significantly narrower than the resonant linewidths.
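To make the KK procedure discussed above concrete, the following is a minimal sketch of phase retrieval with a discrete Hilbert transform. It is not the implementation used in this work. The function names `hilbert_imag` and `kk_retrieve`, the single-Lorentzian test signal, and the sign convention (resonances of the form A/(Ω − ω − iΓ) on an ascending frequency axis) are all assumptions made for illustration, and the NRB reference is assumed to be known exactly.

```python
import numpy as np

def hilbert_imag(x):
    """Imaginary part of the discrete analytic signal (FFT-based Hilbert transform)."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights).imag

def kk_retrieve(i_cars, i_nrb):
    """Return the Raman-like spectrum and the retrieved phase (assumed convention)."""
    log_amp = 0.5 * np.log(i_cars / i_nrb)    # ln|chi / chi_NR|
    phase = hilbert_imag(log_amp)             # phase from the gain-phase relation
    amplitude = np.sqrt(i_cars / i_nrb)
    return amplitude * np.sin(phase), phase

# Hypothetical test signal: one Lorentzian on a flat, real NRB
w = np.arange(4000, dtype=float)
chi_nr = 1.0
chi = chi_nr + 10.0 / (2000.0 - w - 1j * 10.0)
i_cars = np.abs(chi) ** 2
i_nrb = np.full_like(w, chi_nr ** 2)
raman_like, phase = kk_retrieve(i_cars, i_nrb)
```

Because the transform is applied over a finite window, the retrieved phase carries a small, slowly varying error relative to `np.angle(chi)`, which is the additive baseline term referred to in the text.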
Using deep learning for phase retrieval has multiple benefits; firstly, deep neural networks are very accurate function approximators. Secondly, they have the capability to perform efficient dimensionality reduction. It has been shown that a single hidden layer autoencoder network solves the same problem as singular value decomposition (SVD).15 This is due to the fact that the loss function is equal to the SVD objective function. Deep learning approaches also make no assumption of the phase of S, and the full electric field can be included to reproduce complex laser phase conditions. This is discussed in more detail in the ESI.† This approach also makes no assumption on the probe shape, as this is included in the simulation. This could possibly be beneficial in terms of the deconvolution process inherent to VECTOR2.
Stage | Layer | O (K, Cin, Cout, S) | n (T × C) |
---|---|---|---|
Input | — | — | 1000 × 1 |
Encoder | Layer 1 | 8, 1, 64, 1 | 993 × 64 |
Encoder | Layer 2 | 8, 64, 128, 2 | 493 × 128 |
Encoder | Layer 3 | 8, 128, 256, 2 | 243 × 256 |
Encoder | Layer 4 | 8, 256, 512, 2 | 118 × 512 |
Encoder | Layer 5 | 8, 512, 1024, 2 | 56 × 1024 |
Encoder | Layer 6 | 8, 1024, 2048, 2 | 25 × 2048 |
Encoder | Layer 7 | 8, 2048, 2048, 2 | 18 × 2048 |
Encoder | Layer 8 | 8, 2048, 2048, 2 | 11 × 2048 |
Encoder | Layer 9 | 8, 2048, 2048, 2 | 4 × 2048 |
Latent space | — | — | 4 × 2048 |
Decoder | Layer 1 | 8, 2048, 2048, 2 | 11 × 2048 |
Decoder | Layer 2 | 8, 2048, 2048, 2 | 18 × 2048 |
Decoder | Layer 3 | 8, 2048, 2048, 2 | 25 × 2048 |
Decoder | Layer 4 | 8, 2048, 1024, 2 | 56 × 1024 |
Decoder | Layer 5 | 8, 1024, 512, 2 | 118 × 512 |
Decoder | Layer 6 | 8, 512, 256, 2 | 243 × 256 |
Decoder | Layer 7 | 8, 256, 128, 2 | 493 × 128 |
Decoder | Layer 8 | 8, 128, 64, 2 | 993 × 64 |
Decoder | Layer 9 | 8, 64, 1 | 1000 × 1 |
Output layer | — | Sigmoid | 1000 × 1 |
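The temporal lengths in the table can be checked against the standard output-length formula for a one-dimensional convolution. The sketch below assumes no padding, which is not stated in the table, and the helper names are hypothetical. Under that assumption the lengths of encoder layers 1 to 6 are reproduced with the listed kernel size and stride, while the lengths listed for encoder layers 7 to 9 (18, 11, 4) correspond to a stride of 1 in this formula.

```python
def conv1d_out_len(n_in: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution."""
    return (n_in - kernel) // stride + 1

def encoder_lengths(n_in, layers):
    """Apply conv1d_out_len layer by layer; layers is a list of (K, S) pairs."""
    lengths = []
    n = n_in
    for kernel, stride in layers:
        n = conv1d_out_len(n, kernel, stride)
        lengths.append(n)
    return lengths

# (K, S) for encoder layers 1 to 6 as listed in the table
first_six = [(8, 1)] + [(8, 2)] * 5
print(encoder_lengths(1000, first_six))  # [993, 493, 243, 118, 56, 25]
```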
Here, we use our previously designed architecture, VECTOR,13 which incorporates convolutional layers instead of fully-connected layers as well as symmetric skip connections17–19 between each paired convolutional layer and transposed convolutional layer in the encoder and the decoder respectively. We previously demonstrated that skip connections significantly boosted performance for deeper networks such as the one used in this paper. Skip connections allow some features in the input or in shallow layers to propagate through the network and bypass the low-dimensional latent space. This property allows high-level features to be preserved that might otherwise fail to be preserved by the latent space.
This previous network performed Raman extraction on fully synthetic BCARS spectra that had a spectrally flat excitation; the input was the ideal BCARS spectrum and the ℓ1 loss function was applied to the output and the true Raman spectrum. The performance of this previous network was poor when applied to real BCARS spectra because artefacts that do not depend on the material, such as the stimulation spectrum or probe shape, were not present in the training data. Therefore, the network had no information about the optical system generating the spectra. Our solution is to generate BCARS training spectra using the actual laser stimulation profile of our BCARS microscope, as illustrated in Fig. 2; this is the core subject of this manuscript, and the details are given in Section 4. Including the system parameters in the training samples provided much more realistic BCARS training data from which the network can learn. Although we have previously demonstrated the VECTOR architecture for BCARS NRB removal in ref. 13, its usage in this paper is novel in several respects. The primary novelty is the use of training sets of simulated BCARS spectra that include the specific stimulation profile of the BCARS system. The resultant network performs optimally for a specific system. Another point of novelty is that the convolution with the probe is also included in the simulated training sets. The result is that the network will perform deconvolution and the resolution of the retrieved spectra will be enhanced. A custom loss function has also been designed for these new training sets to better accommodate the significant disparity that can exist between resonance amplitudes in the fingerprint and CH-stretching regions of the Raman spectrum.
The resonant susceptibility for each region was generated as a sum of individual complex Lorentzians as follows:
![]() | (6) |
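As an illustration of a sum of complex Lorentzians, the sketch below uses the common form A/(Ω − ω − iΓ). The exact normalisation of eqn (6), the function name `resonant_susceptibility`, and the 1000-point axis spanning 400 to 4000 cm−1 are assumptions. The sampling ranges are the fingerprint values listed for the sparse dataset in Table 2.

```python
import numpy as np

def resonant_susceptibility(wavenumber, amplitudes, centres, half_widths):
    """Sum of complex Lorentzians A_k / (Omega_k - w - i*Gamma_k) (assumed form)."""
    w = np.asarray(wavenumber, dtype=float)[:, None]
    a = np.asarray(amplitudes, dtype=float)[None, :]
    omega = np.asarray(centres, dtype=float)[None, :]
    gamma = np.asarray(half_widths, dtype=float)[None, :]
    return np.sum(a / (omega - w - 1j * gamma), axis=1)

rng = np.random.default_rng(0)
wavenumber = np.linspace(400.0, 4000.0, 1000)      # assumed spectral axis, cm^-1
n_peaks = int(rng.integers(1, 16))                 # integer in 1..15
amps = rng.uniform(0.0, 0.1, n_peaks)              # amplitude a
centres = rng.uniform(600.0, 1800.0, n_peaks)      # resonant frequency, cm^-1
widths = rng.uniform(2.0, 10.0, n_peaks)           # half-width, cm^-1
chi_r = resonant_susceptibility(wavenumber, amps, centres, widths)
raman_like = chi_r.imag                            # non-negative Lorentzian peaks
```

With this sign convention the imaginary part is a sum of positive Lorentzian peaks, which is the Raman-like quantity a retrieval method aims to recover.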
Dataset | |||||
---|---|---|---|---|---|
Component | Symbol | Name | Unit | VECTOR-MU-sparse | VECTOR-MU-dense |
NRB | ω NRB | NRB centre frequency | THz | U(600, 800) | |
σ NRB | NRB width | THz | U(400, 500) | ||
β | NRB amplitude | — | U(0.3, 0.5) | ||
Noise | SNR | SNR (relating to shot noise) | — | U(200, 1000) | |
σ read | Gaussian additive noise (std) | — | U(0.00001, 0.0001) | ||
Distortion | A | Ripple amplitude | — | U(0, 0.02) | |
T | Ripple frequency | Pixels | U(50, 1000) | ||
Φ | Ripple phase | — | U(0, π) | ||
Fingerprint | a | Amplitude | — | U(0, 0.1) | |
N | Number of peaks | — | U(1, 15)∈![]() |
U(1, 50)∈![]() |
|
Γ | Half-width | cm−1 | U(2, 10) | U(2, 75) | |
Ω | Resonant frequency | cm−1 | U(600, 1800) | ||
CH-region | a | Amplitude | — | U(0, 1) | |
N | Number of peaks | — | U(0, 3)∈![]() |
||
Γ | Half-width | cm−1 | U(2, 30) | ||
Ω | Resonant frequency | cm−1 | U(2900, 3500) |
The non-resonant susceptibility was generated using a wide Gaussian for the shape:
![]() | (7) |
![]() | (8) |
In Fig. 3(a), the XFROG spectrogram is shown for our system, including the temporal profile at 578.2 nm, corresponding to the envelope of Epr. This function can be used to obtain Epr, which is shown in Fig. 3(b). The estimate of ES (amplitude and phase) is shown in Fig. 3(c), from which it can be seen that the phase is flat over the width of the pulse. Fig. 3(d) shows the resulting amplitude and phase of the stimulation profile S = ES × EP, which are calculated using ES and Epr as described in the figure caption. It is notable that the amplitude in the 2-colour region of the stimulation profile is relatively weak compared to the 3-colour region. Similar systems have been reported to produce a much stronger amplitude in this region21 and it is likely that the amplitude of our system could be increased by optimising the laser spectrum. Nevertheless, the amplitude is sufficient to capture resonance effects in the 2-colour region, as demonstrated in Section 4.
A simpler method also exists to estimate S and Epr. The latter can simply be modelled using a Gaussian function with a standard deviation related to the full width at half maximum of the spectrum, which is often provided by the laser manufacturer. The result of this approach is also shown in Fig. 3(b) and is found to be in good agreement with the XFROG method. Estimating S involves recording a BCARS spectrum from a non-resonant material such as glass, and taking the square root. Any measurement of the four-wave mixing response in a medium with no electronic or Raman resonances results in $\chi^{(3)} = \chi_{NR}^{(3)}$, which is real valued. It follows from eqn (2) that $I_{\mathrm{NRB}} \propto \left|\left[\chi_{NR}^{(3)}\,S\right] * E_{pr}\right|^2$. If the amplitude of the NRB is assumed to be approximately flat, ES is assumed to have an approximately flat phase, and Epr is assumed to be sufficiently narrow, then it follows that $|S| \propto \sqrt{I_{\mathrm{NRB}}}$.
The result of this approach is shown in Fig. 3(d), in which it can be seen that there is good agreement with the XFROG method.
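The simpler estimation route described above can be sketched in a few lines. The function names and the peak normalisation of S are assumptions, as is the choice to treat the quoted full width at half maximum as belonging to the Gaussian profile itself (rather than, say, its square).

```python
import numpy as np

def stimulation_from_glass(i_glass):
    """Estimate |S| as the square root of a non-resonant (glass) BCARS spectrum."""
    i_glass = np.clip(np.asarray(i_glass, dtype=float), 0.0, None)  # guard against negative counts
    s = np.sqrt(i_glass)
    return s / s.max()                                              # peak-normalised (assumption)

def gaussian_probe(offset, fwhm):
    """Gaussian probe profile; sigma follows from FWHM = 2*sqrt(2*ln 2)*sigma."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * (np.asarray(offset, dtype=float) / sigma) ** 2)

# Hypothetical usage: a probe with a 10 cm^-1 FWHM sampled on a 1 cm^-1 grid
offsets = np.arange(-20.0, 21.0)
probe = gaussian_probe(offsets, fwhm=10.0)
```

Because the glass spectrum is recorded on the same spectrometer as the BCARS data, an estimate obtained this way carries the same detector sensitivity as the spectra it is later used to model.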
Both of the methods described above provide values for S and Epr. The XFROG method has the advantage of providing the phase of S; however, if this phase is flat (which is true for the three and two-colour regions in Fig. 3) then this advantage may be overlooked. The second method has the advantage of being simpler, requiring only a single non-resonant spectrum. The second method also has the very significant advantage of including the same sensitivity response of the spectrometer with respect to the BCARS spectra. This cannot be said for the XFROG method, which involves recording spectra in a different band of wavelengths. We believe that the slight difference in the estimates for S that can be observed in Fig. 3(d) can be attributed to this difference. Therefore, the use of S obtained directly from glass is preferable as this will provide self-referencing for intensity calibration of BCARS spectra. For the reasons highlighted here, we employ the simpler method for estimating S in subsequent sections.
Once the susceptibility is generated, and S and Epr have been obtained by one of the two methods above, the final BCARS spectrum is calculated according to eqn (2). This involves multiplication of the susceptibility with S, followed by convolution with Epr. The resultant function ICARS is normalised between 0 and 1 and serves as input to the noise generation described in the next section.
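The three steps just described (modulate, convolve, take the squared magnitude and normalise) can be sketched as follows. The function name `simulate_bcars`, the flat stimulation profile, and the 21-point Gaussian probe kernel are illustrative assumptions rather than the parameters of the actual system.

```python
import numpy as np

def simulate_bcars(chi, stimulation, probe):
    """|(chi * S) convolved with E_pr|^2, min-max normalised to [0, 1]."""
    modulated = np.asarray(chi) * np.asarray(stimulation)   # multiply by S
    field = np.convolve(modulated, probe, mode="same")      # convolve with the probe
    intensity = np.abs(field) ** 2
    lo, hi = intensity.min(), intensity.max()
    return (intensity - lo) / (hi - lo)

# Hypothetical example: one resonance at 1000 cm^-1 on a real, flat NRB
w = np.linspace(400.0, 4000.0, 1000)
chi = 0.4 + 0.05 / (1000.0 - w - 5.0j)
stimulation = np.ones_like(w)                    # flat S, for illustration only
offsets = np.arange(-10, 11, dtype=float)
probe = np.exp(-0.5 * (offsets / 3.0) ** 2)      # Gaussian probe kernel in pixels
i_cars = simulate_bcars(chi, stimulation, probe)
```

In this example the intensity maximum falls on the low-wavenumber side of the resonance, which is the asymmetric lineshape caused by interference with the NRB. The `mode="same"` convolution zero-pads the ends of the spectrum, so the first and last few pixels are attenuated.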
For a given irradiance I, shot noise results from variations in the number of photons arriving in a unit time, and is governed by a Poisson distribution whose variance is equal to I. This can be directly related to the well-known signal-to-noise ratio (SNR) through the ratio of the signal, I, to the standard deviation, $\sqrt{I}$. Given the linear relationship between intensity and irradiance, the SNR of a given BCARS spectrum can, therefore, be approximated as $\mathrm{SNR} \approx \sqrt{I_{\mathrm{CARS}}}$.
To simulate an appropriate BCARS intensity, we first decide on a desirable SNR value for the maximum signal intensity. This is randomly chosen from a uniform distribution over a range of values from 200 to 1000, which approximates the experimental reality. Each individual sample $I_{\mathrm{CARS}}^{m}$ in the normalised spectrum is scaled by the factor SNR², and the resultant value is taken as the mean of a Poisson distribution, P. A value is randomly drawn from this distribution and is then scaled by SNR⁻² in order to restore the normalisation, which is required by the VECTOR2 architecture. This process is summarised in eqn (9):
$$\tilde{I}_{\mathrm{CARS}}^{m} = \mathrm{SNR}^{-2}\,P\!\left(\mathrm{SNR}^{2}\,I_{\mathrm{CARS}}^{m}\right)$$ | (9) |
![]() | (10) |
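A minimal sketch of the noise model is given below, using the SNR and read-noise ranges from Table 2. Treating the second noise term as zero-mean additive Gaussian noise with standard deviation σ_read is our reading of the table entry, and the function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(i_norm, snr, sigma_read, rng):
    """Poisson shot noise at the chosen peak SNR, plus additive Gaussian read noise."""
    scaled = snr ** 2 * np.asarray(i_norm, dtype=float)   # expected counts per pixel
    shot = rng.poisson(scaled) / snr ** 2                 # draw, then undo the scaling
    return shot + rng.normal(0.0, sigma_read, size=shot.shape)

rng = np.random.default_rng(1)
clean = np.linspace(0.1, 1.0, 1000)        # stand-in for a normalised BCARS spectrum
snr = rng.uniform(200.0, 1000.0)           # SNR at the maximum intensity
sigma_read = rng.uniform(1e-5, 1e-4)       # read-noise standard deviation
noisy = add_noise(clean, snr, sigma_read, rng)
```

At the maximum intensity of 1 the expected count is SNR², so the relative shot-noise level there is 1/SNR, and it is larger for weaker pixels.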
During training, a third noise term was included in the form of a multiplicative, low-amplitude and low-frequency distortion (ripple) in order to simulate variation in the signal intensity that is not predicted by our model. The origins of this experimental variation appear to be sample dependent and might be related to sample-induced chromatic aberration. Inclusion of this distortion was necessary in order to prevent over-training the network on a single, rigid stimulation profile shape; in such a case, any slight deviations from the expected profile shape are erroneously recovered as appreciable resonances. Early attempts to train our networks without this noise source resulted in poor performance when the trained network was applied to experimental data. The distortion is defined as a multiplicative sinusoidal signal, which was applied to the normalised intensity as follows:
![]() | (11) |
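One plausible form of such a multiplicative ripple is sketched below. The expression 1 + A sin(2πn/T + Φ), with T interpreted as a period in pixels, is an assumption based on the parameter list in Table 2 and may differ in detail from eqn (11). The function name `apply_ripple` is hypothetical.

```python
import numpy as np

def apply_ripple(i_norm, amplitude, period_px, phase):
    """Multiply a spectrum by a low-amplitude sinusoid (assumed form of the distortion)."""
    i_norm = np.asarray(i_norm, dtype=float)
    n = np.arange(i_norm.size)                                   # pixel index
    ripple = 1.0 + amplitude * np.sin(2.0 * np.pi * n / period_px + phase)
    return i_norm * ripple

rng = np.random.default_rng(2)
spectrum = np.ones(1000)                                         # flat stand-in spectrum
distorted = apply_ripple(
    spectrum,
    amplitude=rng.uniform(0.0, 0.02),     # A
    period_px=rng.uniform(50.0, 1000.0),  # T, in pixels
    phase=rng.uniform(0.0, np.pi),        # Phi
)
```

With A at most 0.02, the distortion changes each pixel by no more than 2%, which is small compared with typical resonance features but enough to vary the apparent shape of the stimulation profile between training samples.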
![]() | (12) |
![]() | (13) |
Thus, the objective of the network is to retrieve the Raman spectrum directly from the BCARS input.
Fig. 5 Average training and validation loss per epoch for the two different autoencoder networks (both on same ordinate scale).
Parameter | Value |
---|---|
Training set size | 1 000 000 |
Validation set size | 10 000 |
Testing set size | 10 000 |
Epochs | 100 |
Batch size | 256 |
Weight decay | 5 × 10⁻⁴ |
Momentum | 0.9 |
Initial learning rate | 0.1 |
For quantitative comparison, two test sets were generated. The dense and sparse test sets each contained 10 000 simulated spectra generated using the parameters listed in Table 2. Both test sets were input to both networks and the mean absolute error (MAE) of the output spectra was measured with respect to the true Raman spectrum. The results of testing are shown in Fig. 7 as a violin plot. On average, each network performs better on its own spectral type within the fingerprint region, as expected. When tested on sparse spectra, however, VECTOR-MU-dense performs comparably to VECTOR-MU-sparse, which shows only a slight improvement. This is expected because sparse spectra are a subset of the dense training spectra. The loss in the CH-region is similar for both networks, which is expected since the CH-regions in both training sets used identical parameters. The results for the dense test set are markedly better for the VECTOR-MU-dense network. This is also expected, since the VECTOR-MU-sparse network was trained on relatively simpler spectra.
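For reference, a region-wise MAE of the kind reported here can be computed as in the sketch below. The function name and the region limits passed to it are illustrative. The example values are made up solely to show the calculation and are not results.

```python
import numpy as np

def region_mae(pred, true, wavenumber, lo, hi):
    """Mean absolute error restricted to lo <= wavenumber <= hi."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    wavenumber = np.asarray(wavenumber, dtype=float)
    mask = (wavenumber >= lo) & (wavenumber <= hi)
    return float(np.mean(np.abs(pred[mask] - true[mask])))

# Toy values for illustration only
wn = np.array([500.0, 1000.0, 1500.0, 3000.0])
true = np.zeros(4)
pred = np.array([1.0, 0.2, 0.4, 0.5])
fingerprint_mae = region_mae(pred, true, wn, 600.0, 1800.0)   # pixels at 1000 and 1500
ch_mae = region_mae(pred, true, wn, 2900.0, 3500.0)           # pixel at 3000
```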
Fig. 8 Six experimental BCARS spectra. The logarithm of the intensity is shown and the spectra were offset vertically for clarity. The retrieved Raman spectra are shown in the next figure.
The intensity in the 2-colour region of our BCARS system is relatively weak compared to the 3-colour as expected, due to the shape of the stimulation profile as discussed in Section 4.2. The SNR, however, is sufficient to recover resonances in the 2-colour region with a relatively low exposure time (1 s) as evidenced in Fig. 9. We note that no effort was made to apply intensity calibration to the BCARS spectra shown in Fig. 8, relating to the sensitivity response of the system. This step is superfluous because the stimulation profile that is used to train the network (obtained from a glass spectrum) has been modulated by the same sensitivity response. Therefore, all retrieved Raman spectra will be inherently corrected for the system sensitivity. This point is also true for spectra retrieved using the KK method, which also makes use of the glass spectrum or other similar non-resonant material.
Fig. 9 Retrieval of the six chemicals: (a) glycerol, (b) a proprietary polymer slide, (c) PMMA, (d) polystyrene, (e) ethanol, (f) benzonitrile. The spectra retrieved from both networks are shown, together with the corresponding intensity calibrated spontaneous Raman spectrum. These results are reproduced in the ESI† over full band and compared with the Kramers–Kronig method.
In order to compare a Raman spectrum retrieved using VECTOR-MU with a corresponding spontaneous Raman spectrum, each sample was also analysed using a commercial Raman spectrometer with high resolution. A Horiba Jobin Yvon LabRAM HR Raman micro-spectroscopy system with a 660 nm excitation laser and an 1800 L mm−1 grating was used with an MPlan 10×/0.25 NA (Olympus) objective. Using an automated routine, the system recorded from 400 to 4500 cm−1 with an approximate theoretical resolution of 0.4 cm−1, resulting in spectra with 25232 spectral pixels. The acquisition time per spectral band was 10 s and the average of five separate measurements was taken. The Raman spectra were dark current subtracted and intensity calibrated using a NIST calibrated white light source as described in ref. 25. For the case of benzonitrile, laser excitation at 532 nm was used to avoid a broad spectral peak at 800 cm−1 that appears when 660 nm excitation is used, which may result from fluorescence. The Raman data were filtered using a Savitzky–Golay filter of window size 9 and order 3. These spectra provided high-resolution and background-free spontaneous Raman spectra for qualitatively judging the performance of VECTOR-MU. Fig. 9 shows the Raman spectra of each of the six chemicals.
For the case of glycerol, both VECTOR-MU networks show good agreement with the relative peak heights and locations found in the spontaneous Raman spectrum in both the fingerprint and CH regions. VECTOR-MU-sparse fails to reconstruct the slow background from 1200 to 1500 cm−1, which is expected, since no examples of this type were used in training. Both networks failed to recover the relative strength between the 3- and 2-colour regions. This may be due to the low relative intensity of the 2-colour region, which results in a low dynamic range for the BCARS response in our system. This failure is seen for all of the spectra that were tested and is discussed further at the end of this section. In the polymer sample, there is good performance in the fingerprint region for both networks; however, VECTOR-MU-dense performs better at retrieving relative peak heights. In the CH region, there is a noticeable error in the height of the peak at 2900 cm−1 for the sparse network; this may be due to sampling error in training since, while both networks used the same CH region parameters, the actual spectra used for training were different.
In the PMMA sample, the Raman spectrum is relatively sparse, containing a small number of strong sharp resonances, and for this reason, the sparse network performs better than the dense. In the CH-stretch region, the retrieval is very similar to the actual Raman spectrum for both networks. For the case of polystyrene, the Raman spectrum has a strong resonance at around 1000 cm−1, the amplitude of which VECTOR-MU-sparse fails to retrieve correctly relative to the neighbouring peaks. It is notable that the dense network erroneously produces a baseline at 780 cm−1 and 1550 cm−1. The dense network can clearly produce some baseline errors when dealing with sparse narrow resonances. The sample of ethanol is particularly sparse, containing approximately ten resonances in the spontaneous Raman spectrum. VECTOR-MU-sparse outperformed VECTOR-MU-dense, which once again inserts an incorrect baseline. The hydroxyl (O–H) stretch at approximately 3200–3400 cm−1 is also recovered by both networks. For the final spectrum, the locations of the fingerprint peaks of benzonitrile were recovered with good accuracy by both networks; however, the peak height ratios between 1150 and 1600 cm−1 were noticeably incorrect, as was the peak at 1000 cm−1. Once again, VECTOR-MU-sparse showed superior performance.
The current implementation of VECTOR2 takes longer to run (especially to train) than the MEM or KK, but applying the trained network to new data can be extremely fast (∼1 ms) and could be made faster still with better coding implementations. Once a system has been optimised, the laser stimulation profile will remain approximately unchanged for an extended period, and therefore, a given trained network can be used over this period. Additionally, new experimental results that are produced by an updated system (with a new laser stimulation profile) can be used to update the current network via “transfer learning”, which avoids retraining from scratch. Aside from the speed discrepancies, VECTOR2 has strong potential to improve Raman signal extraction from raw BCARS spectra. The KK and MEM methods suffer from artifacts that VECTOR2 may not. For example, the discrete Hilbert transform (DHT) is not exactly equivalent to the continuous Hilbert transform on which the KK method is founded, leading to consequential baseline errors as shown by Camp.26 The same effect is seen in MEM methods,27 and it has led to padding schemes for both the KK and MEM that are inaccurate, especially for peaks approaching the window edge.
Another important benefit that VECTOR2 has over KK is higher accuracy in the presence of high levels of noise, which is twofold: firstly, the noise is more effectively removed from the retrieved Raman signal and secondly, the signal itself is extracted with higher accuracy from this noisy background, as evidenced by Fig. S1 and S2 of the ESI.† Furthermore, phase-retrieval methods (MEM and KK) rely upon detrending to correct their errors, as shown by Camp et al.,8 but these detrending methods may not accurately retrieve the baseline. For example, higher-order polynomial detrending is often used (especially in the Raman community), but this assumes that the baseline is a polynomial, which it has no theoretical reason to be. Methods such as asymmetric least-squares (ALS) offer more flexibility, but they too are limited in that they are governed solely by two hyperparameters, which may not be expressive enough for complex spectra. Deep learning methods, such as VECTOR2, have the benefit that they can be trained using purely synthetic data (or synthetic data mixed with experimental data). Furthermore, VECTOR2 can be trained with data that uses the KK or MEM to convert a raw BCARS spectrum, thus incorporating other physics-based approaches into the deep learning model. There is no way to “train” a polynomial detrender or ALS to be better.
VECTOR2, being a neural network, requires some normalization in order for simulated spectra to be properly generated. In our case, the input spectra and the simulated Raman spectra are normalized to their minimum and maximum values. Therefore, absolute concentration information is not retrievable with VECTOR2 in its current state. However, relative species concentration, obtained from the ratio of two or more retrieved peaks, is preserved. We have not tested the capability of concentration measurements in this paper, but this could be grounds for future work. KK retrieved spectra have been shown to be linear in the concentration.2
In summary, the MEM and KK offer powerful methods to extract Raman features. However, they require additional tools to maximize their extraction potential (phase-error correction, scale-error correction, denoising, etc.), and they are also limited for samples with low NRB-to-resonant ratios, low SNR, or both.11
It is important to stress that this approach to training a network is highly specific to the laser system used in the BCARS setup. On the one hand, this is a benefit to our approach since any BCARS system can simply record a glass spectrum and include it in the training process, resulting in a network that is bespoke to the system used. Furthermore, inclusion of this information in training provides an inherent intensity calibration to correct for the sensitivity response of the system. On the other hand, the major drawback of this approach is the time required to train VECTOR2, which means that lengthy retraining is necessary if the stimulation profile changes. In this paper, a network required >48 hours to train and the per spectrum runtime of the trained network is on the order of ∼1 ms. However, it is possible that the time taken to retrain the network could be significantly shortened as described in the previous section.
We could improve the performance of VECTOR2 in primarily two ways: improving the network architecture and extending the physical model that underpins the training sets. For the first case, experimenting with deeper and higher-dimensional layers could potentially improve the performance of VECTOR2 for more complex spectra containing a denser set of resonances. Other architectures could also be investigated, such as generative adversarial networks, transformers or diffusion networks. For the second case, the physical model could be extended to account for electronic resonances for relevant samples. This could possibly be achieved by modulating the stimulation profile appropriately in the training sets.
Disregarding the stimulation profile, it is important to emphasise the type of training data in terms of number of resonances and linewidth. We trained two networks using two different Raman datasets. VECTOR-MU-sparse was representative of pure chemical spectra (e.g., simple compounds such as benzonitrile) while VECTOR-MU-dense represented more complex spectra such as glycerol or a biological spectrum. It is notable that the sparse dataset is a subset of the dense dataset and, therefore, it could be argued that the sparse network is redundant. However, it is important to emphasise that using the sparse network on sparse data is advantageous for two reasons: (i) the sparse network takes less time to train, as evidenced by the faster converging loss functions in Fig. 5, and (ii) it provides higher accuracy than the dense network applied to sparse data, as evidenced in Fig. 7. While both networks performed well on sparse test sets, VECTOR-MU-sparse performed poorly on dense test sets, as expected. We demonstrated that both networks also performed well at retrieving the Raman spectra from experimental chemical measurements taken with our system and were in good agreement with the corresponding spontaneous Raman spectra.
As well as requiring no tuning, another important feature of the network is the potential benefit of deconvolution with respect to the probe laser; an example of this is provided in the ESI.† Since the simulated data used for training includes convolution with the probe, it can be expected that the network learns to deconvolve, potentially producing higher resolution retrieved spectra than can be produced using KK and MEM. This could be further enhanced with a second convolution to model the spectrometer impulse response function. We do not explore this capability here, although this could be performed in future work.
We note that both VECTOR-MU networks failed to recover the relative strength between the 3-colour and 2-colour regions for all of the chemical spectra. We believe that this may be due to the low intensity of the 2-colour region, resulting from the weak stimulation profile for our specific system. This results in a low dynamic range for the BCARS intensity in that region; thus, a large range of values for the resonant susceptibility in the 2-colour regions are mapped to only a small range of values in the BCARS spectrum. This makes it challenging for VECTOR2 to accurately identify the correct scale in the retrieved spectrum. We believe that our system could be optimised to produce a stronger response in the 2-colour region by laser optimisation. Despite scaling issues in this region, results were far better than those provided by the KK method, which suffered due to the low signal-to-noise ratio in this region.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3ay01131c
This journal is © The Royal Society of Chemistry 2023