Teemu Härkönen,*a Erik M. Vartiainen,a Lasse Lensu,a Matthew T. Mooresab and Lassi Roininena
aDepartment of Computational Engineering, School of Engineering Sciences, LUT University, Yliopistonkatu 34, FI-53850, Lappeenranta, Finland. E-mail: teemu.harkonen@lut.fi
bNational Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW 2522, Australia
First published on 8th January 2024
We propose an approach utilizing gamma-distributed random variables, coupled with log-Gaussian modeling, to generate synthetic datasets suitable for training neural networks. This addresses the challenge of limited real observations in various applications. We apply this methodology to both Raman and coherent anti-Stokes Raman scattering (CARS) spectra, using experimental spectra to estimate gamma process parameters. Parameter estimation is performed using Markov chain Monte Carlo methods, yielding a full Bayesian posterior distribution for the model which can be sampled for synthetic data generation. Additionally, we model the additive and multiplicative background functions for Raman and CARS with Gaussian processes. We train two Bayesian neural networks to estimate parameters of the gamma process which can then be used to estimate the underlying Raman spectrum and simultaneously provide uncertainty through the estimation of parameters of a probability distribution. We apply the trained Bayesian neural networks to experimental Raman spectra of phthalocyanine blue, aniline black, naphthol red, and red 264 pigments and also to experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose. The results agree with deterministic point estimates for the underlying Raman and CARS spectral signatures.
Deep neural networks offer a compelling solution for automatic spectral correction across various applications, from weather prediction13–15 to medical imaging16–18 and many others.19–23 In the realm of Raman spectroscopy, deep neural networks have been used for chemical species identification and background removal.24–28 Similarly, they have been applied to extract the underlying Raman spectra from CARS measurements.29–35 Despite their efficacy, non-Bayesian neural networks lack a critical feature: the ability to quantify uncertainty in Raman spectrum estimation. Bayesian inference, on the other hand, provides an avenue to solve this problem.
Bayesian inference treats the parameters of a given model as random variables. These models consist of a likelihood function that is combined with prior distributions for the parameters to produce posterior estimates. The likelihood function is analogous to a utility function in an optimization context: it quantifies how well the model fits the observed data. The prior distributions for the model parameters represent the information known beforehand, including any constraints dictated by the physical nature of the parameters, such as non-negativity. In spectroscopic analysis, the model parameters can be, for example, the amplitudes, locations, and widths of Gaussian, Lorentzian, or Voigt line shape functions. The combination of the likelihood and the priors results in a posterior distribution over the model parameters, which is a probabilistic representation of the uncertainty in the parameter estimates. Bayesian approaches have been used to estimate spectrum parameters, with sequential Monte Carlo algorithms employed to sample numerically from the posterior distribution.36,37 While the uncertainty quantification provided by Bayesian modeling and Markov chain Monte Carlo (MCMC) methods is compelling, the approach is known to be computationally expensive.38 This becomes a major issue particularly with hyperspectral data sets. A hyperspectral data set, or image, consists of pixels where each pixel contains a spectrum, which can quickly result in millions of individual spectra, experimental or synthetic, to be analyzed.
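To make these ingredients concrete, the following minimal sketch evaluates an unnormalized log-posterior for a single Lorentzian line with amplitude, location, and width parameters, assuming Gaussian measurement noise with a known standard deviation and flat positivity priors; the function names, the noise level, and the prior choices are illustrative and not taken from this work.

```python
import numpy as np

def lorentzian(nu, amplitude, location, width):
    """Lorentzian line shape evaluated on a wavenumber grid."""
    return amplitude * width**2 / ((nu - location)**2 + width**2)

def log_posterior(params, nu, y, noise_std=0.05):
    """Unnormalized log-posterior: Gaussian likelihood plus flat positivity priors."""
    amplitude, location, width = params
    # Prior: amplitude and width must be positive (a physical constraint).
    if amplitude <= 0.0 or width <= 0.0:
        return -np.inf
    residual = y - lorentzian(nu, amplitude, location, width)
    # Log-likelihood of the residuals under independent Gaussian noise.
    return -0.5 * np.sum((residual / noise_std) ** 2)
```

An MCMC sampler applied to such a log-posterior yields samples that represent the uncertainty in the amplitude, location, and width estimates.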
Bayesian neural networks are a synthesis of the aforementioned two ideas. Bayesian neural networks model the weights and biases of standard neural networks as random variables, which can be assigned prior distributions. When combined with a likelihood according to Bayes’ theorem, the resulting utility function corresponds to the posterior for the neural network parameters. Advantages of this Bayesian neural network approach in comparison to non-Bayesian neural networks include robustness against overfitting, uncertainty estimates instead of only point estimates, sequential learning, and better generalization.39 In particular, uncertainty quantification has seen widespread research across many application areas and topics.40
One of the challenges of Bayesian neural networks is that they typically contain an enormous number of parameters. For instance, our network comprises over 11 million parameters, far beyond what is commonly considered high-dimensional for MCMC.41,42 Some neural networks, such as large language models (LLMs), can have billions of parameters.43 Thus, it can be challenging to establish convergence for such a large number of parameters in a statistically rigorous manner. To combat this, partially-Bayesian neural networks have been used as a practical tool to provide uncertainty estimation with neural networks. In addition to empirical validation in practice, studies have provided compelling analytical and numerical evidence that partially-Bayesian neural networks are capable of providing posterior estimates with performance on par with, or even superior to, fully-Bayesian neural networks.44 These points lead us to construct the neural network for this study as a partially-Bayesian neural network.
Neural networks typically require large volumes of training data. This is also a problem in spectroscopic applications, where it is difficult to acquire large sets of independent measurements.28 Therefore, many of the studies mentioned above use synthetic data to train the neural networks. The synthetic data are usually generated using random linear combinations of Lorentzian line shapes, where the amplitudes, locations, and widths are sampled from predefined probability distributions.25,29,45 The background data are generated similarly: the backgrounds are modeled explicitly using a parametric functional form, such as a polynomial or a sigmoidal function, and the parameters of the model are again sampled from a predefined probability distribution.25,32,46 An extension to this is to use experimental Raman spectra on top of the randomly generated spectra.34
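The conventional approach described above can be sketched as follows: random Lorentzian line shapes are combined with a random low-order polynomial background, with all sampling ranges chosen here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = np.linspace(300.0, 1800.0, 1024)  # wavenumber grid; range chosen for illustration

def random_signature(n_peaks=5):
    """Random linear combination of Lorentzian line shapes."""
    spectrum = np.zeros_like(nu)
    for _ in range(n_peaks):
        amplitude = rng.uniform(0.1, 1.0)
        location = rng.uniform(nu[0], nu[-1])
        width = rng.uniform(5.0, 30.0)
        spectrum += amplitude * width**2 / ((nu - location)**2 + width**2)
    return spectrum

def random_background(degree=3):
    """Random low-order polynomial background, shifted and scaled to [0, 1]."""
    coefficients = rng.normal(size=degree + 1)
    x = np.linspace(-1.0, 1.0, nu.size)
    background = np.polyval(coefficients, x)
    background -= background.min()
    return background / max(background.max(), 1e-12)

# One synthetic training spectrum: signature plus scaled background.
y = random_signature() + 0.5 * random_background()
```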
Stochastic processes can be used to draw samples of random functions. A typical example of a stochastic process is the widely-used Gaussian process (GP). Properties of the drawn samples, such as differentiability, are governed by kernel functions, which model dependencies between data points. For readers unfamiliar with GPs, we recommend the book by Rasmussen and Williams.47 Instead of using explicit, parametric functions to model the spectroscopic features, we propose using stochastic processes as a more flexible tool for this purpose. In this study, we use GPs as a generative model for the additive and multiplicative backgrounds of Raman and CARS spectra, see Fig. 1.
For the purpose of generating synthetic Raman spectral signatures, we propose a specific type of doubly-stochastic Lévy process which we call a log-Gaussian gamma process. Our construction of the log-Gaussian gamma process is inspired by the log-Gaussian Cox process, which has previously been used as a model for spectra.48 While it makes sense to model spectra as a Cox process, where relaxation from higher energy levels happens at a constant rate and results in counts of photons, the data are often available as scaled floating-point numbers, which prevents direct application of the log-Gaussian Cox process model. Gamma-distributed variables have direct connections to Poisson-distributed variables, which constitute the Cox process, making the extension to a log-Gaussian gamma process intuitive as a model for Raman spectroscopy. The log-Gaussian gamma process can be used to generate arbitrary amounts of synthetic spectra once the parameters of the stochastic process have been estimated. We perform the estimation using MCMC methods, which allow us to construct a Bayesian posterior distribution for the model parameters, thereby including the uncertainty of the parameter estimates in our data generation. This also applies to our GP-based background model. We present a high-level diagram of our stochastic process method for data generation in Fig. 1. Fig. 2 shows an example of the aim of this paper, a Raman spectral signature extracted from a CARS spectrum using a Bayesian neural network. We provide a pseudo-code description of our approach in Algorithm 1.
The key contributions of this paper are the following. We propose using log-Gaussian gamma processes for modeling Raman spectral signatures and GPs to model additive or multiplicative background signals. The aforementioned doubly-stochastic processes are sampled randomly, enabling us to generate an arbitrary number of synthetic spectra that are statistically similar to experimental spectra. Finally, we present a partially-Bayesian neural network for analyzing Raman and CARS spectral measurements, which we train using the sampled synthetic spectra. Once trained, we use these neural networks to estimate the spectral signatures for experimental Raman spectroscopy measurements of phthalocyanine blue, naphthol red, aniline black, and red 264 pigments and for experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose in addition to synthetic test spectra.
Algorithm 1 Log-Gaussian gamma process data generation for training Bayesian neural networks | |
---|---|
Step 1: | Fit a log-Gaussian gamma process to Raman spectrum data. |
Step 2: | Fit a GP to background data. |
Step 3: | Draw a large number of realizations from the fitted log-Gaussian gamma process. |
Step 4: | Draw a large number of realizations from the fitted GP. |
Step 5: | Use a forward model to combine the realizations to form a data set of synthetic spectra. |
Step 6: | Train the Bayesian neural networks with the synthetic data set. |
The rest of the paper is structured as follows. We detail the steps used to generate the synthetic training data in three stages in the following three sections. We first present the log-Gaussian gamma process as a model for Raman spectral signatures and explain how to draw realizations of this doubly-stochastic process. This is followed by a description of our GP-based additive and multiplicative background models. We finalize the explanation of our synthetic data generation method with definitions of the forward models used to simulate synthetic training data for Raman and CARS measurements with additive and multiplicative backgrounds, respectively. Next, we present our partially-Bayesian neural network architecture, which we train against the synthetic data sets that we have generated. We document computational details and prior distributions in the next section, followed by a presentation of our results for both artificial and real experimental data. Finally, we conclude with a discussion of the significance and other potential applications for our method.
We model a measured Raman spectral signature r(ν) at wavenumbers νk, k = 1,…,K, as gamma-distributed random variables with a shared shape parameter α and a wavenumber-dependent scale β(νk),

rk := r(νk) ∼ Gamma(α, β(νk)).   (1)
We extend eqn (1) by modeling the log-scale as a GP, resulting in a hierarchical model
log β(ν) ∼ GP(μβ, Σβ(ν, ν′, θβ)),   (2)
(α, μβ, θβ)T ∼ π0(α, μβ, θβ).   (3)
The posterior distribution involves the likelihood function L(r | α, β(ν)), a log-GP prior for the scale π0(β(ν) | μβ, θβ), and a joint prior distribution π0(α, μβ, θβ) for the rest of the model parameters. Given a measured Raman spectrum r := (r(ν1),…,r(νK))T, we can formulate the likelihood as a product of conditionally-independent, gamma-distributed random variables
L(r | α, β(ν)) = ∏k=1K Γ(α)^(−1) β(νk)^(−α) rk^(α−1) exp(−rk / β(νk)).   (4)
The log-GP prior for the scale has the closed-form log-normal density

π0(β(ν) | μβ, θβ) = (2π)^(−K/2) |Σβ|^(−1/2) [∏k=1K β(νk)]^(−1) exp(−(1/2) (log β(ν) − μβ)^T Σβ^(−1) (log β(ν) − μβ)),   (5)

where Σβ := Σβ(ν, ν′, θβ) is evaluated at the measurement wavenumbers.
Combining the likelihood with the prior distributions yields the posterior distribution

π(α, β(ν), μβ, θβ | r) ∝ L(r | α, β(ν)) π0(β(ν) | μβ, θβ) π0(α, μβ, θβ).   (6)
In the posterior in eqn (6), the dimension of β(ν) is K.
The scale is a vector of the same dimension as the data, β := (β(ν1),…,β(νK))T. MCMC methods are known to struggle when estimating high-dimensional parameters; at a minimum, high-dimensional parameters incur a considerable computational cost for inference with MCMC. To alleviate these issues and to simplify the inference, we perform dimension reduction for the scale β(ν). To achieve this, we observe that our data r should be a reasonable estimate for the expectation of the gamma process in eqn (1), E[rk] = αβ(νk). This implies that the shape of the data r is close to the shape of the scale function β(ν). Thus, we approximate the scale β(ν) as a convolution between a Gaussian kernel and the data
β(ν) ≈ cβ G(ν; σG) * r,   (7)

where cβ is a scaling constant, G(ν; σG) is a Gaussian kernel with width σG, and * denotes convolution.
This reduces the inference to the parameters (α, cβ, σG, μβ, θβ)T, with the posterior distribution

π(α, cβ, σG, μβ, θβ | r) ∝ L(r | α, cβ, σG) π0(cβ G(ν; σG) * r | μβ, θβ) π0(α, cβ, σG, μβ, θβ),   (8)

where the likelihood is evaluated with the scale approximation of eqn (7).
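As a minimal sketch of how the reduced-dimension posterior of eqn (8) could be evaluated, the following assumes strictly positive data, approximates the scale via a Gaussian-kernel convolution of the data as in eqn (7), and uses flat positivity priors in place of the actual prior terms; the kernel width is expressed in grid points rather than wavenumbers for simplicity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.special import gammaln

def gamma_log_likelihood(r, alpha, beta):
    """Sum of log gamma densities with shape alpha and scale beta, cf. eqn (4)."""
    return np.sum((alpha - 1.0) * np.log(r) - r / beta
                  - gammaln(alpha) - alpha * np.log(beta))

def reduced_log_posterior(params, r):
    """Unnormalized log-posterior over (alpha, c_beta, sigma_G), cf. eqn (8).

    The GP hyperparameter terms are omitted and flat positivity priors are
    assumed; this is an illustrative stand-in, not the full model."""
    alpha, c_beta, sigma_G = params
    if alpha <= 0.0 or c_beta <= 0.0 or sigma_G <= 0.0:
        return -np.inf
    # Eqn (7): approximate the scale by convolving the data with a Gaussian kernel.
    beta = c_beta * gaussian_filter1d(r, sigma=sigma_G)
    return gamma_log_likelihood(r, alpha, beta)
```

A log-posterior of this form can be passed directly to an adaptive Metropolis-type sampler such as DRAM.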
Given samples from the posterior distribution π(α, cβ, σG, μβ, θβ | r) obtained with MCMC, we can sample realizations for the synthetic spectra to generate an arbitrary amount of synthetic data in the following way. First, we sample the GP parameters (μ̃β, θ̃β)T from the MCMC chain. Next, we use (μ̃β, θ̃β)T to sample a GP realization β̃(ν* | μ̃β, θ̃β) at prediction locations ν*, modeling the scale β(ν*) with

log β̃(ν* | μ̃β, θ̃β) ∼ N(μ̃β, Σβ(ν*, ν*, θ̃β)).   (9)
We normalize the realizations r̃(ν) such that max{r̃(ν)} = 1 and introduce an additional parameter to control the amplitudes of the realizations. With an amplitude parameter A, we sample a normalized shape r̃N(ν | ψ) of the spectrum, where ψ denotes the log-Gaussian gamma process parameters, and multiply this by a sampled amplitude Ã. This procedure results in the following statistical model

r(ν | Ã, ψ) = Ã r̃N(ν | ψ),  r̃N(ν | ψ) = r̃(ν)/max{r̃(ν)},  r̃(ν) ∼ Gamma(α̃, β̃(ν | μ̃β, θ̃β)),  Ã ∼ π0(A).   (10)
Fig. 3 Example realizations drawn from the log-Gaussian gamma process model defined in eqn (10). On the left, realizations for the scale process β(ν), drawn from a log-Gaussian process. On the right, corresponding realizations from the gamma process. All realizations are normalized and multiplied by a sampled amplitude.
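A sketch of the sampling procedure in eqn (9) and (10) is given below: one set of parameter values is taken from the MCMC chain, a log-GP realization is drawn for the scale, gamma variates are drawn for the spectrum, and the result is normalized and rescaled by a sampled amplitude. The squared-exponential kernel, the uniform amplitude prior, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def squared_exponential(nu, variance, lengthscale):
    """Covariance matrix on the prediction grid (illustrative kernel choice)."""
    d = nu[:, None] - nu[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_lggp_realization(nu_star, alpha, mu_beta, var_beta, len_beta,
                            amplitude_prior=(0.5, 2.0)):
    """Draw one synthetic Raman signature from the log-Gaussian gamma process."""
    # Eqn (9): log-GP realization for the scale beta(nu*).
    cov = squared_exponential(nu_star, var_beta, len_beta) + 1e-8 * np.eye(nu_star.size)
    log_beta = rng.multivariate_normal(np.full(nu_star.size, mu_beta), cov)
    beta = np.exp(log_beta)
    # Gamma process realization with the sampled shape and scale, cf. eqn (1).
    r = rng.gamma(shape=alpha, scale=beta)
    # Eqn (10): normalize to unit maximum and multiply by a sampled amplitude.
    amplitude = rng.uniform(*amplitude_prior)
    return amplitude * r / r.max()

nu_star = np.linspace(300.0, 1800.0, 512)
signature = sample_lggp_realization(nu_star, alpha=3.0, mu_beta=0.0,
                                    var_beta=1.0, len_beta=40.0)
```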
As noted above, we model additive or multiplicative spectral backgrounds as a GP
e(ν) ∼ GP(μe, Σe(ν, ν′, θe)),   (11)
where μe is a constant mean and the covariance function is parameterized by hyperparameters θe, for example the squared exponential kernel

Σe(ν, ν′, θe) = σe^2 exp(−(ν − ν′)^2 / (2ℓe^2)),  θe := (σe, ℓe)T.   (12)
Given a measurement of the background process, e := (e(ν1),…,e(νK))T, we can formulate a posterior distribution for the background GP parameters (μe, θe)T as
π(μe, θe | e) ∝ L(e | μe, θe) π0(μe, θe),   (13)
where the GP likelihood is the multivariate normal density

L(e | μe, θe) = (2π)^(−K/2) |Σe|^(−1/2) exp(−(1/2) (e − μe)^T Σe^(−1) (e − μe)),   (14)

with Σe := Σe(ν, ν′, θe) evaluated at the measurement wavenumbers and π0(μe, θe) a prior distribution for the GP parameters.
Given a posterior π(μe, θe | e), we construct realizations for the background by drawing from the GP predictive distribution. We sample starting and ending points for the background function from priors π0(estart) and π0(estop), with π0(estart, estop) = π0(estart)π0(estop). Next, we compute the predictive mean

ē*(ν*) = μe + Σe(ν*, ν̂, θe) Σe(ν̂, ν̂, θe)^(−1) (ê − μe),   (15)

and the predictive covariance

Σ̄*(ν*, ν*) = Σe(ν*, ν*, θe) − Σe(ν*, ν̂, θe) Σe(ν̂, ν̂, θe)^(−1) Σe(ν̂, ν*, θe),   (16)

where ν̂ := (ν1, νK)T collects the first and last measurement wavenumbers and ê := (estart, estop)T the sampled endpoint values. With the above mathematical machinery, we can sample realizations for the background function by drawing

ẽ(ν*) ∼ N(ē*(ν*), Σ̄*(ν*, ν*)).   (17)

Combining the sampled GP parameters, the sampled endpoints, and the predictive draw yields the background model

ẽ(ν* | μ̃e, θ̃e, ẽstart, ẽstop) ∼ N(ē*(ν*), Σ̄*(ν*, ν*)),  (μ̃e, θ̃e)T ∼ π(μe, θe | e),  (ẽstart, ẽstop)T ∼ π0(estart, estop).   (18)
Fig. 4 Example realizations drawn from the background model defined in eqn (18) for a multiplicative background. The starting and end points are sampled from a prior distribution and the GP predictive mean and covariance are used to sample the background shape.
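The endpoint-conditioned background sampling of eqn (15)–(17) can be sketched as follows, again with a squared-exponential kernel and uniform endpoint priors as illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def kernel(x1, x2, variance, lengthscale):
    """Squared-exponential covariance between two sets of wavenumbers."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_background(nu_star, mu_e, var_e, len_e,
                      start_prior=(0.5, 1.5), stop_prior=(0.5, 1.5)):
    """Draw one background realization conditioned on sampled endpoints."""
    # Sample the starting and ending values of the background from their priors.
    e_hat = np.array([rng.uniform(*start_prior), rng.uniform(*stop_prior)])
    nu_hat = np.array([nu_star[0], nu_star[-1]])
    # Eqn (15): GP predictive mean given the two endpoint observations.
    K_ss = kernel(nu_star, nu_star, var_e, len_e)
    K_sh = kernel(nu_star, nu_hat, var_e, len_e)
    K_hh = kernel(nu_hat, nu_hat, var_e, len_e) + 1e-8 * np.eye(2)
    mean = mu_e + K_sh @ np.linalg.solve(K_hh, e_hat - mu_e)
    # Eqn (16): GP predictive covariance.
    cov = K_ss - K_sh @ np.linalg.solve(K_hh, K_sh.T) + 1e-8 * np.eye(nu_star.size)
    # Eqn (17): sample a realization from the predictive distribution.
    return rng.multivariate_normal(mean, cov)

nu_star = np.linspace(300.0, 1800.0, 512)
background = sample_background(nu_star, mu_e=1.0, var_e=0.1, len_e=400.0)
```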
Raman spectra y(ν) with an additive background B(ν) are constructed using
y(ν) ∼ r(ν | Ã, ψ) + B(ν | μe, θe),   (19)

where r(ν | Ã, ψ) is a log-Gaussian gamma process realization from eqn (10) and B(ν | μe, θe) is an additive background realization drawn according to eqn (18).
CARS spectra z(ν) are generated similarly to the additive Raman realizations. The CARS model consists of a multiplicative background function εm(ν | μe, θe) distorting a CARS spectrum S(ν | BNR, ψ), given as
z(ν) ∼ εm(ν | μe, θe) S(ν | BNR, ψ),   (20)
where the CARS spectrum is given by the squared modulus of the total third-order susceptibility,

S(ν | BNR, ψ) = |BNR + χR(ν | ψ)|^2,   (21)

with BNR denoting the non-resonant background and χR(ν | ψ) the resonant susceptibility, whose imaginary part corresponds to the underlying Raman spectral signature.
Fig. 5 Example realizations for the Raman spectrum model defined in eqn (19). The realizations correspond to the log-Gaussian gamma process realizations in Fig. 3.
Fig. 6 Example realizations for the CARS spectrum model defined in eqn (20). The realizations correspond to the log-Gaussian gamma process realizations in Fig. 3.
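A sketch of the two forward models is given below. The additive Raman case of eqn (19) simply adds a background realization to a signature realization. For the CARS case of eqn (20), a resonant susceptibility whose imaginary part equals the Raman signature is built with a Hilbert transform; this Kramers-Kronig-style construction, the non-resonant constant, and the sign conventions are assumptions of this sketch rather than a verbatim description of the forward model used in this work.

```python
import numpy as np
from scipy.signal import hilbert

def synthetic_raman(signature, additive_background):
    """Synthetic Raman measurement: signature plus additive background, cf. eqn (19)."""
    return signature + additive_background

def synthetic_cars(signature, multiplicative_background, b_nr=0.5):
    """Synthetic CARS measurement, cf. eqn (20), under the assumptions stated above."""
    # Resonant susceptibility with Im(chi_R) equal to the Raman signature;
    # the real part is taken from the Hilbert transform of the signature.
    chi_r = np.imag(hilbert(signature)) + 1j * signature
    # Squared modulus of the total susceptibility, modulated by the background.
    return multiplicative_background * np.abs(b_nr + chi_r) ** 2
```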
To achieve a partially Bayesian neural network,44 we use a Bayesian layer for the first convolutional layer. Additionally, we augment the architecture with a probabilistic output layer. This transforms the neural network output into estimates of a stochastic process instead of a direct estimate of the Raman spectrum. We use a gamma distribution as our output layer, following our formulation of Raman spectra as a log-Gaussian gamma process. We also found that L1 or L2 regularization was not necessary for the deterministic parts of the network and therefore only employ Dropout53 regularization with the last dense layer of the network. This is in agreement with the documented robustness of Bayesian neural networks with respect to overfitting.39 The above results in the following partial posterior probability distribution, or cost function, used for training the neural network
π(ΨS | D, ΨD) ∝ L(D | ΨS, ΨD) π0(ΨS),   (22)

where ΨS denotes the stochastic (Bayesian) network parameters, ΨD the deterministic network parameters, and D the synthetic training data set.
The likelihood evaluates each training Raman signature under the gamma distribution output by the network,

L(D | ΨS, ΨD) = ∏i=1N ∏k=1K Gamma(ri,k; αNN(νk | yi, ΨS, ΨD), βNN(νk | yi, ΨS, ΨD)),   (23)

where yi is the i-th synthetic measurement spectrum and ri,k the value of the corresponding underlying Raman signature at wavenumber νk.
In the log-Gaussian gamma process section, we estimated the parameters of a doubly-stochastic process via MCMC. The Bayesian neural network architecture proposed here can be seen as an estimate of a triply-stochastic process, where the neural network outputs are two stochastic process realizations αNN(ν) and βNN(ν). This extends the analytical log-Gaussian gamma process in Section 2, where the log-Gaussian parameterization of the scale process β(ν) is used for mathematical convenience due to the closed form of the probability density in eqn (5).
The uncertainty quantification of the Bayesian neural network is achieved in two stages: the Bayesian convolutional layer and the gamma-distributed output layer. Numerical samples are generated for the parameters of the Bayesian convolutional layer during the training process. These samples are propagated through the deterministic layers of the network and ultimately into the output layer. The output layer, modelled as a gamma process, outputs a gamma distribution for each prediction point, which ultimately controls the uncertainty of the spectrum estimate.
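A minimal TensorFlow Probability sketch of such an architecture is shown below: a variational (Flipout) first convolutional layer, deterministic convolutional and dense layers with Dropout on the last dense layer, and a gamma-distributed output layer trained with the negative log-likelihood. The layer sizes, kernel widths, input length, and the use of a rate (rather than scale) parameterization are illustrative choices, not the exact configuration used in this work.

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfpl = tfp.layers

def build_partially_bayesian_network(n_channels=640):
    """Sketch of a partially-Bayesian network with a gamma output layer."""
    inputs = tf.keras.Input(shape=(n_channels, 1))
    # Bayesian (variational) first convolutional layer; its KL term against the
    # weight prior is added to the model losses automatically.
    x = tfpl.Convolution1DFlipout(32, kernel_size=16, padding="same",
                                  activation="relu")(inputs)
    # Deterministic layers.
    x = tf.keras.layers.Conv1D(32, kernel_size=16, padding="same",
                               activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.1)(x)
    # Two positive outputs per wavenumber: gamma concentration and rate.
    params = tf.keras.layers.Dense(2 * n_channels, activation="softplus")(x)
    outputs = tfpl.DistributionLambda(
        lambda t: tfd.Gamma(concentration=t[..., :n_channels] + 1e-3,
                            rate=t[..., n_channels:] + 1e-3))(params)
    return tf.keras.Model(inputs, outputs)

model = build_partially_bayesian_network()
# Negative log-likelihood of the target Raman signature under the gamma output.
negative_log_likelihood = lambda y_true, y_dist: -y_dist.log_prob(y_true)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss=negative_log_likelihood)
```

Repeated forward passes through the variational layer then yield samples that propagate into the gamma output and hence into the uncertainty of the spectrum estimate.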
Fig. 8 On the left, experimental Raman spectra of phthalocyanine blue, naphthol red, aniline black, and red 264 pigments in blue and point estimates for their respective additive backgrounds B(ν). On the right, point estimates for the underlying Raman spectra corresponding to the Raman measurements on their left. The 4 cases are used to estimate the LGGP and GP parameters of the models defined in eqn (10) and (18).
Fig. 9 On the left, experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose in blue and point estimates for their respective multiplicative backgrounds εm(ν). On the right, point estimates for the underlying Raman spectra corresponding to the CARS measurements on their left. The 4 cases are used to estimate the LGGP and GP parameters of the models defined in eqn (10) and (18).
We run the DRAM algorithm with 5 proposal steps and a chain length of 100 000 samples for both the log-Gaussian gamma process parameters and the GP parameters, with a burn-in of 50 000 samples. The prior distributions for the log-Gaussian gamma process likelihood and the GP background likelihood are documented in Table 1. We use TensorFlow and TensorFlow Probability together with Keras to implement the neural network architecture.56–58 We use the Adam optimizer for estimating the network parameters ΨD and ΨS.
The aforementioned MCMC sampling runs were performed on an AMD Ryzen 3950X processor. Wall times for the MCMC sampling ranged from a couple of minutes to approximately one hour, depending on the number of data points in the spectra. The MCMC samplers can be run embarrassingly parallel for each measurement spectrum. Training the Bayesian neural network took approximately two hours on an NVIDIA 1070 graphics card for sets of 500 000 spectra. Given the small computational cost of the MCMC sampling and of training the neural network, both of which are offline costs, the main computational limitation comes from loading data during inference.
Fig. 10 On the left, synthetic Raman test spectra generated using eqn (19). On the right, corresponding Raman spectrum estimates with the median estimate shown in solid blue and the 90% confidence interval in shaded blue. The solid black line shows the ground truth spectrum.
Results for synthetic CARS spectra are shown in Fig. 12 and results for experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose are presented in Fig. 13. The spectra themselves were not part of the training data set. The results show the median estimate of the Raman spectrum obtained from the trained Bayesian neural network along with the 90% confidence intervals of the Raman spectrum estimate. We overlay the Raman spectrum estimate with scaled versions of the point estimates from Fig. 9. The point estimates are scaled such that their minima and maxima are equal to the minima and maxima of the median estimate of the Raman spectrum. The results coincide with the overall shape of the point estimates, supporting the validity of the data generation approach and the Bayesian neural network design.
Fig. 12 On the left, synthetic CARS test spectra generated using eqn (20). On the right, corresponding Raman spectrum estimates with the median estimate shown in solid blue and the 90% confidence interval in shaded blue. The solid black line shows the ground truth spectrum.
As additional validation for the synthetic spectrum generation approach, we compare the cost function values, see eqn (6), of the fully synthetic training spectra to the cost function values obtained for partially experimental spectra. We generate the partially experimental spectra by taking the point estimates of the experimental CARS spectra in Fig. 9 and generating sets of CARS spectra with the forward model defined in eqn (20) combined with the GP realizations. The results are in agreement, which implies that our log-Gaussian gamma process is capable of generating valid Raman spectra for training neural networks. This approach is similar to approaches used in approximate Bayesian computation.60,61 The resulting log cost function distributions of the synthetic spectra and the partially experimental spectra are provided in the ESI.†
This data generation method is applied to train two Bayesian neural networks, specifically designed for correcting spectral measurements. One network is tailored for Raman spectra with additive backgrounds, while the other is optimized for coherent anti-Stokes Raman scattering (CARS) spectra with multiplicative backgrounds. Bayesian neural networks expand upon prior research involving neural networks for spectral corrections, offering not only point estimates but also the critical capability of uncertainty quantification.
Our approach is validated using synthetic test data generated from the stochastic processes and experimental Raman spectra of phthalocyanine blue, aniline black, naphthol red, and red 264 pigments, along with experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose. The results demonstrate excellent agreement with deterministically obtained point estimates of the Raman spectra, while simultaneously providing valuable uncertainty estimates for the Raman spectrum estimates.
As a future avenue of research, our log-Gaussian gamma process formulation could be extended to a mixture of log-Gaussian gamma processes, similarly to mixtures of Gaussian processes.62–64 Such an extension would allow modelling of nonstationarity and heteroscedasticity, meaning different signal or noise behaviour in different parts of the measurement spectrum. A more straightforward modification, if necessary, would be to include an additional noise term, for example a Gaussian white noise process, in the log-Gaussian gamma process formulation.
In addition, log-Gaussian gamma processes might be used to model other inherently positive measurements such as reflectance, absorbance, fluorescence, or transmittance spectra,65–67 and other non-spectroscopic data sets such as pollutant or protein concentrations or masses of stellar objects, for which Gaussian processes have been used.68–70 Modifications to the log-Gaussian gamma process and the Bayesian neural network should be minimal due to their uninformative priors and general-purpose structure. The generation of synthetic data sets for other types of problems requires an appropriate forward model. If the forward model has non-local behaviour, it might warrant architectural changes to the Bayesian neural network, such as augmenting the first layer with a dense layer for modelling the non-local dependencies in the data.
A comparison study similar to that of ref. 35 would be an interesting avenue of future research. However, a direct comparison between non-Bayesian and Bayesian neural networks is not straightforward due to the lack of uncertainty quantification in the non-Bayesian neural network estimates. One could possibly use Dropout regularization as an approximate Bayesian approach.71 This could be combined with simulation-based calibration72 to provide an appropriate one-to-one comparison between the non-Bayesian and Bayesian neural networks.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cp04960d
This journal is © the Owner Societies 2024 |