Katherine J. I. Ember (a,b), Nassim Ksantini (a,b), Frédérick Dallaire (a,b), Guillaume Sheehy (a,b), Trang Tran (a,b), Mathieu Dehaes (c,d,e), Madeleine Durand (b,f), Dominique Trudel (b,g) and Frédéric Leblond* (a,b,g)
(a) Department of Engineering Physics, Polytechnique Montréal, Montreal, Quebec, Canada. E-mail: frederic.leblond@polymtl.ca
(b) Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montreal, Quebec, Canada
(c) Department of Radiology, Radio-oncology and Nuclear Medicine, Université de Montréal, Montreal, Canada
(d) Institute of Biomedical Engineering, Université de Montréal, Montreal, Canada
(e) Centre de Recherche du Centre Hospitalier Universitaire Sainte-Justine (CRCHUSJ), Montreal, Canada
(f) Internal Medicine Service, Centre Hospitalier de l'Université de Montréal (CHUM), Montreal, Quebec, Canada
(g) Institut du cancer de Montréal, Montreal, Quebec, Canada
First published on 21st October 2024
With greater population density, the likelihood of viral outbreaks achieving pandemic status is increasing. However, current viral screening techniques use specific reagents, and as viruses mutate, test accuracy decreases. Here, we present the first real-time, reagent-free, portable analysis platform for viral detection in liquid saliva, using COVID-19 as a proof-of-concept. We show that vibrational molecular spectroscopy and machine learning (ML) detect biomolecular changes consistent with the presence of viral infection. Saliva samples were collected from 470 individuals, including 65 who were infected with COVID-19 (28 hospitalized patients and 37 from a walk-in testing clinic) and 251 who had a negative polymerase chain reaction (PCR) test. A further 154 samples were collected from healthy volunteers. Each saliva measurement took 6 minutes or less, and ML models based on linear support vector machines (SVM) predicted COVID-19 infection with sensitivity and specificity reaching 90%, depending on volunteer symptoms and disease severity. This platform could be deployed to manage future pandemics using the same hardware and a tunable ML model that could be rapidly updated as new viral strains emerge.
Current viral testing techniques are limited. For COVID-19, polymerase chain reaction (PCR) tests are the gold standard. However, they are time-consuming, require trained personnel and are expensive. The average PCR test costs $127 (ref. 3) and takes hours from sample acquisition to diagnosis if a PCR machine is on site,4 which is often not the case. Meanwhile, rapid antigen tests have limited accuracy, with recent studies suggesting that 90% of asymptomatic individuals go undetected.5 Furthermore, both PCR and rapid antigen tests rely on tailored biochemical reagents that may need to be re-adapted for new viral strains. Indeed, the U.S. Food and Drug Administration (FDA) has warned that PCR tests may be less effective at detecting variants, a situation that has already been observed with influenza.6 Managing future pandemics requires low-cost, high-sensitivity tests that allow frequent screening regardless of variants.
Raman spectroscopy (RS) is a label-free analytical tool that uses laser light to yield molecular information about samples.7 The Raman scattering phenomenon was predicted in 1923 by Smekal8 and experimentally demonstrated in 1928 by Raman and Krishnan.9 When light is shone on a sample, the inelastically scattered light carries information about molecular structure and bonding, which is visualized as a Raman spectrum. The position of a peak on the x-axis (Raman shift) gives information about the identity of the molecule responsible for the peak, whereas the peak intensity on the y-axis is related to the concentration of that molecule in the sample.7,10,11 RS has proven sensitive to the disease and metabolic state of tissues and biofluids, making it a possible screening tool for cancer,12–14 organ health,15 and pathogenic infections.16,17 We developed a high-speed, low-cost, reagent-free, portable instrument that uses RS to detect the biomolecular changes associated with COVID-19 infection in saliva. The switch from normal cellular metabolism to viral synthesis,18 the death of host cells,19 and the activation of the immune response20 all bring about biomolecular changes that may affect biofluid composition. In previous studies, vibrational spectroscopy has been used to detect COVID-19 infection in saliva from senior, hospitalized volunteers.21,22 Ember et al. (including authors of this manuscript) extended this work to asymptomatic and symptomatic volunteers at a walk-in COVID testing clinic.23 That study achieved a sensitivity of 79% and a specificity of 75% in males, and a sensitivity of 84% and a specificity of 64% in females. However, these results were obtained using an expensive, slow commercial RS instrument unsuitable for widespread testing.
An RS-based rapid COVID test would not rely on chemical reagents, which can be costly, require refrigeration, have a limited shelf life, and can lose their specificity as the virus mutates. These constraints can limit deployment in low- and middle-income countries, where rapid on-site screening with minimal sample preparation and operator involvement may be required. Furthermore, an RS-based system can integrate a user-friendly graphical user interface into a portable device, making it more widely applicable than PCR tests. Here, we present a study designed to evaluate Raman spectroscopy for its potential to detect COVID-19-infected individuals from their saliva, specifically from the saliva supernatant. The dataset was collected during the COVID-19 pandemic in 2020 and is the first demonstration of label-free, optical detection of COVID-19 infection using liquid saliva samples. It is also the first such study to employ a system suitable for point-of-care use. As of 2023, COVID-19 symptoms have rapidly declined in severity; however, SARS-CoV-2 remains in widespread circulation and may mutate into more severe variants. Infection is associated with lung damage,24 brain damage25 and long-term exhaustion.26
Outbreaks in the workplace continue to directly impact the economy whilst outbreaks in care homes and hospitals often prove lethal. Detection of the virus, therefore, remains paramount.
A saliva preparation protocol, described in detail by Ember et al.,23 was developed to minimize the presence of confounding saliva constituents (e.g., food products) during spectroscopic interrogation. The protocol allows samples to be collected at any time of day and includes rinsing with water and optimized centrifugation cycles, which separate each sample into a pellet (containing food debris, not used) and a supernatant used for analysis. Samples were aliquoted into 4 tubes and stored in a −80 °C freezer for research. For liquid measurements, supernatant samples were thawed, a 10 μL drop was pipetted onto a low-Raman-background aluminum holder, and Raman spectra were immediately acquired from the liquid sample. Each spectrum was linked to the corresponding demographic and infection status data (ESI Table S1†). For dried saliva supernatant samples, a 10 μL drop was pipetted onto a low-Raman-background aluminum slide and dried for 45 minutes.
Before each measurement, the CCD sensor of the system was cooled to −80 °C. The x-axis (Raman shift) was calibrated before each measurement using a spectrum acquired from acetaminophen powder (Tylenol®). The system response was characterized using the fluorescence spectrum of a standard reference material (SRM 2214, National Institute of Standards and Technology, NIST, USA). For each sample, a dark count measurement was taken with the laser off (integration time of 1600 ± 800 ms), followed by a series of 200 repeat spectroscopic measurements. The laser power was kept at a fixed output of 890 mW at the surface of the biofluid. Automated exposure control was used to maximize the overall photon counts while avoiding camera saturation.27 The laser spot size was approximately 1 mm². Several processing steps were applied to each raw spectrum using custom software20 to isolate the vibrational spectroscopy contribution: averaging of the 200 repeat spectra, subtraction of the dark count spectrum acquired with the laser turned off, normalization with the NIST standard to correct for the instrument response, x-axis (wavenumber shift) calibration and interpolation, removal of low-frequency background signals using the BubbleFill algorithm,28 and standard normal variate (SNV) normalization.28 The spectral range of the instrument is 350–2100 cm−1. However, spectra are cropped at 1500 cm−1 because the very large water peak between ∼1520 and 1720 cm−1 completely covers the Amide I peak.
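The processing chain listed above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' custom software: the helper name `preprocess_spectrum`, its arguments and the precomputed instrument-response and calibration arrays are hypothetical, the BubbleFill baseline step is deliberately left out, and applying the 1500 cm−1 crop before SNV is an assumed ordering.

```python
import numpy as np


def preprocess_spectrum(raw_frames, dark, response, pixel_shift, target_axis,
                        crop_max=1500.0):
    """Hypothetical sketch of the per-sample processing chain.

    raw_frames  : (n_repeats, n_pixels) repeat acquisitions (200 in the study)
    dark        : (n_pixels,) dark-count spectrum acquired with the laser off
    response    : (n_pixels,) relative instrument response, assumed already
                  derived from the NIST SRM 2214 fluorescence standard
    pixel_shift : (n_pixels,) increasing Raman shift of each pixel (cm^-1),
                  assumed already derived from the acetaminophen calibration
    target_axis : common wavenumber grid shared by all samples (cm^-1)
    """
    # 1. Average the repeat spectra to reduce shot noise.
    spectrum = np.mean(raw_frames, axis=0)
    # 2. Subtract the dark-count spectrum.
    spectrum = spectrum - dark
    # 3. Divide by the instrument response to correct spectral sensitivity.
    spectrum = spectrum / response
    # 4. Interpolate onto the common wavenumber axis.
    spectrum = np.interp(target_axis, pixel_shift, spectrum)
    # 5. Baseline removal (BubbleFill in the study) is NOT implemented here;
    #    a real pipeline would subtract the estimated background at this step.
    # 6. Crop away the water band region above ~1500 cm^-1 (assumed order).
    keep = target_axis <= crop_max
    axis, spectrum = target_axis[keep], spectrum[keep]
    # 7. Standard normal variate: zero mean and unit standard deviation.
    spectrum = (spectrum - spectrum.mean()) / spectrum.std()
    return axis, spectrum
```

Because SNV is applied last, every returned spectrum has zero mean and unit standard deviation, which is what makes the fixed 0.5 peak-height threshold and the QF cutoff comparable across samples.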
Several metrics were computed to assess the signal quality of each measured spectrum. Before SNV normalization, two metrics were computed: (i) the signal-to-noise ratio (SNR), the ratio between the total number of photon counts in the Raman spectrum and the square root of the total signal (Raman + baseline + dark count), and (ii) the signal-to-background ratio (SBR), the ratio between the total number of photon counts in the Raman spectrum and in the baseline. A quality factor (QF) was then computed on the SNV-normalized spectra and used to separate the dataset into low- and high-quality spectra (ESI Fig. S1†). The QF ranges between 0 and 1; low QF values usually indicate higher stochastic noise, lower inelastic scattering photon counts and poorly defined Raman peaks. A random signal would yield a QF close to 0 (ref. 28).
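The two pre-SNV quality metrics follow directly from their definitions. The sketch below uses a hypothetical helper, `signal_quality`, and assumes that the Raman, baseline and dark-count contributions are available as per-bin count arrays and that "total" means the sum over all bins; the QF metric is not reproduced because its formula is given in ref. 28 rather than here.

```python
import numpy as np


def signal_quality(raman, baseline, dark):
    """Hypothetical helper computing SNR and SBR as defined in the text.

    raman    : background-subtracted Raman counts per bin (before SNV)
    baseline : estimated background (baseline) counts per bin
    dark     : dark-count spectrum per bin
    """
    raman_total = float(np.sum(raman))
    baseline_total = float(np.sum(baseline))
    total = raman_total + baseline_total + float(np.sum(dark))
    # SNR: Raman counts divided by the shot-noise level of all detected counts.
    snr = raman_total / np.sqrt(total)
    # SBR: Raman counts relative to background counts.
    sbr = raman_total / baseline_total
    return snr, sbr
```

For example, 100 Raman counts on top of 200 baseline counts and 100 dark counts give SNR = 100/√400 = 5 and SBR = 0.5.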
In each SNV-normalized spectrum, every peak with an intensity above 0.5 was fitted with a Gaussian to extract its position, height and width, with a 2 cm−1 tolerance on the position. Only peaks present in at least 50% of the spectra in the dataset were considered for training machine learning models.29
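The peak-feature extraction can be illustrated as follows. This is a sketch under explicit assumptions: the helpers `find_and_fit_peaks` and `common_peaks` are hypothetical, the Gaussian is fitted with a least-squares parabola on log intensities (Caruana's method) rather than whatever nonlinear fit the authors used, and the grouping rule implementing the 2 cm−1 tolerance (positions within 2 cm−1 of a group's lowest member) is our own choice.

```python
import numpy as np


def find_and_fit_peaks(axis, y, height=0.5, half_window=3):
    """Hypothetical helper: fit a Gaussian to every local maximum above `height`.

    Uses Caruana's method (quadratic fit to log(y)) as a stand-in for a full
    nonlinear Gaussian fit. Returns a list of (center, height, fwhm) tuples.
    """
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i] > height and y[i] > y[i - 1] and y[i] >= y[i + 1]:
            lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
            xs, ys = axis[lo:hi], y[lo:hi]
            ok = ys > 0  # the log fit needs strictly positive intensities
            if ok.sum() < 3:
                continue
            # Fit ln y = c2*x^2 + c1*x + c0 with x centered on the maximum.
            x0 = axis[i]
            c2, c1, c0 = np.polyfit(xs[ok] - x0, np.log(ys[ok]), 2)
            if c2 >= 0:  # upward curvature: not a Gaussian-like peak
                continue
            sigma = np.sqrt(-1.0 / (2.0 * c2))
            center = x0 - c1 / (2.0 * c2)
            amp = np.exp(c0 - c1 ** 2 / (4.0 * c2))
            fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
            peaks.append((center, amp, fwhm))
    return peaks


def common_peaks(peak_lists, tol=2.0, min_fraction=0.5):
    """Hypothetical helper: keep peaks found (within tol) in >= min_fraction of spectra."""
    entries = sorted((c, k) for k, peaks in enumerate(peak_lists) for c, _, _ in peaks)
    groups, current = [], []
    for c, k in entries:
        # Start a new group once a position is more than tol above the group start.
        if current and c - current[0][0] > tol:
            groups.append(current)
            current = []
        current.append((c, k))
    if current:
        groups.append(current)
    kept = []
    for g in groups:
        n_spectra = len({k for _, k in g})  # count spectra, not peaks
        if n_spectra >= min_fraction * len(peak_lists):
            kept.append(float(np.mean([c for c, _ in g])))
    return kept
```

For a noise-free Gaussian the log-parabola fit is exact; on noisy spectra a nonlinear least-squares fit would usually be more robust.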
To compare liquid and dried saliva supernatant samples, measurements were taken from 136 dried samples and 298 liquid samples.
Model | Negative group size (after cutoff) | Negative group description | Positive group size (after cutoff) | Positive group description
---|---|---|---|---
Model 1a | 251 (174) | PCR negatives | 37 (30) | PCR positives
Model 2a | 251 (174) | PCR negatives | 28 (23) | Hospitalized
Model 3a | 251 (174) | PCR negatives | 65 (53) | 37 PCR positives + 28 hospitalized
Model 1b | 405 (293) | 251 PCR negatives + 154 healthy | 37 (30) | PCR positives
Model 2b | 405 (293) | 251 PCR negatives + 154 healthy | 28 (23) | Hospitalized
Model 3b | 405 (293) | 251 PCR negatives + 154 healthy | 65 (53) | 37 PCR positives + 28 hospitalized
Model 1c | 154 (124) | Healthy | 37 (30) | PCR positives
Model 2c | 154 (124) | Healthy | 28 (23) | Hospitalized
Model 3c | 154 (124) | Healthy | 65 (53) | 37 PCR positives + 28 hospitalized
Before machine learning model training and validation, the feature pool consisted of 700 individual intensity bins and 45 peak features (15 peaks × 3 features: position, height and width). The number of features was reduced to retain only those that contributed most to discriminating between the categories. This was accomplished using linear support vector machines (SVM) with L1 regularization (regularization parameter between 0.05 and 0.5). Machine learning models were then trained on the reduced feature set using linear SVM with regularization parameter C. Each time a model was trained, the hyperparameters (number of features, C) were selected by a grid search across all possible combinations: C was varied between 0.05 and 1, and the number of retained features between 10 and 40 for the individual intensities and between 5 and 15 for the peak features. Because we wanted our models to account for age and sex, these two variables were added to the training pool as extra features, making the total number of features for a given model between 17 and 57. This was done because age and sex had been identified as potential confounding variables in a previous analysis.23 Only spectra with QF > 0.4 were considered for classification, and outliers (spectra with an intensity at 1002 cm−1 lower than 1) were also removed. For each combination, performance was assessed using leave-one-out 5-fold cross-validation based on the numbers of false/true positives and false/true negatives, by comparing the model predictions with the assigned PCR test result. Accuracy, sensitivity and specificity were calculated from a receiver operating characteristic (ROC) curve analysis, and the area under the curve (AUC) was reported. The features retained for machine learning model building are detailed in Table 2.
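The L1-regularized linear SVM used for feature reduction can be sketched with NumPy alone. The objective below (L1 penalty on the weights plus C times the squared hinge loss) is the one used by L1-penalized linear SVMs such as scikit-learn's `LinearSVC(penalty="l1", loss="squared_hinge", dual=False)`, apart from how the intercept is handled, but here it is solved with plain proximal gradient descent (ISTA). The function name `l1_svm_select`, the iteration count and the rule of keeping the features with the largest |w| are illustrative assumptions; the study does not specify its solver.

```python
import numpy as np


def l1_svm_select(X, y, C=0.5, n_keep=10, n_iter=3000):
    """Hypothetical helper: rank features with an L1-regularized linear SVM.

    Minimizes ||w||_1 + C * sum_i max(0, 1 - y_i (w . x_i + b))^2
    (intercept b unpenalized) by proximal gradient descent (ISTA).
    y must contain labels in {-1, +1}. Returns (sorted indices of the
    n_keep features with largest |w|, w, b).
    """
    n, p = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    # Step size 1/L, with L the Lipschitz constant of the squared-hinge gradient.
    L = 2.0 * C * np.linalg.norm(Xb, 2) ** 2
    step = 1.0 / L
    theta = np.zeros(p + 1)
    for _ in range(n_iter):
        slack = np.maximum(0.0, 1.0 - y * (Xb @ theta))
        grad = -2.0 * C * (Xb.T @ (slack * y))
        theta = theta - step * grad
        # Proximal step for the L1 norm: soft-threshold the weights, not b.
        w = theta[:p]
        theta[:p] = np.sign(w) * np.maximum(np.abs(w) - step, 0.0)
    w, b = theta[:p], theta[p]
    selected = np.sort(np.argsort(-np.abs(w))[:n_keep])
    return selected, w, b
```

In a grid search like the one described, this selection step would be repeated for each regularization value and feature count, with age and sex appended to the selected features before training the final linear SVM.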
Peak center [cm−1] | Biomolecular assignment | Model 1a | Model 2a | Model 3a | Model 1b | Model 2b | Model 3b | Model 1c | Model 2c | Model 3c
---|---|---|---|---|---|---|---|---|---|---
878 | Phosphate (dihydrogen phosphate)23,31 | √ | √ | √ | √ | √ | ||||
927 | Protein (N–C–C)23,32 | √ | ||||||||
990 | Phosphate (monohydrogen phosphate)23,31 | |||||||||
1001 | Protein (phenylalanine, tryptophan),23,30,32 carotenoids,33,34 urea23 | √ | √ | √ | ||||||
1046 | Nitrate,23,31 protein (phenylalanine)23,30,32 | |||||||||
1080 | Phosphate (dihydrogen phosphate, monohydrogen phosphate),23,31 lipid (C–C)35 | |||||||||
1090 | √ | √ | √ | |||||||
1126 | Fatty acid (C–C),35 protein (C–N, serine),23,30,32 glucose33,34,36 | √ | √ | √ | √ | √ | √ | √ | √ | |
1163 | Fatty acid (C–C)35 | √ | √ | √ | √ | |||||
1205 | Protein (tyrosine)23,30,32 | √ | √ | √ | ||||||
1245 | Protein (amide III)23,32 | √ | √ | √ | √ | |||||
1267 | Protein (amide III, histidine, valine),30,32 glucose,33,34,36 lipid (CH)35 | √ | √ | √ | √ | √ | √ | |||
1347 | Protein (histidine, leucine, lysine, methionine, serine, threonine)23,30,32 | √ | √ | |||||||
1416 | Lipid (beta, CH2),35 protein (alanine, cystine, glycine, lysine, methionine, proline, serine, threonine)23,30,32 | √ | √ | |||||||
1453 | Protein (C–H, glycine, isoleucine, lysine, valine)23,30,32 lipid (CH2/CH3)35 | √ |
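The ROC-based evaluation described in the methods can be sketched as follows. The helper `roc_summary` is hypothetical: it computes the AUC as the Mann-Whitney probability that a positive sample scores above a negative one (ties counting one half), and reports sensitivity and specificity at the threshold maximizing Youden's J statistic. The study does not state how its operating point was chosen, so the Youden rule is an assumption, and in practice the scores would be cross-validated SVM decision values.

```python
import numpy as np


def roc_summary(scores, labels):
    """Hypothetical helper for a binary ROC analysis (1 = positive, 0 = negative).

    Returns (auc, sensitivity, specificity, threshold), where the threshold
    maximizes Youden's J = sensitivity + specificity - 1 (assumed rule).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # AUC = P(positive score > negative score), ties counted as one half.
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    best = (-1.0, 0.0, 0.0, None)
    for t in np.unique(scores):
        pred = scores >= t  # predict "positive" at or above the threshold
        sens = np.mean(pred[labels == 1])
        spec = np.mean(~pred[labels == 0])
        j = sens + spec - 1.0
        if j > best[0]:  # keep the first threshold reaching the maximum J
            best = (j, sens, spec, t)
    _, sens, spec, thr = best
    return auc, sens, spec, thr
```

For scores [0.9, 0.8, 0.7, 0.4, 0.35, 0.2] with labels [1, 1, 0, 1, 0, 0], 8 of the 9 positive/negative pairs are correctly ordered (AUC = 8/9), and the threshold 0.4 gives a sensitivity of 1.0 and a specificity of 2/3.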
The time from obtaining the supernatant to spectral acquisition was 51 minutes for dried samples and 6 minutes for wet samples. The mean signal-to-noise ratio (SNR) was 15.31 ± 7.66 for dried saliva supernatant and 15.73 ± 4.98 for liquid saliva supernatant. Wet samples therefore yielded a slightly higher and more consistent SNR than dried samples. The mean signal-to-background ratio (SBR) was 0.0048 ± 0.0016 for dried saliva supernatant and 0.0044 ± 0.0023 for liquid saliva supernatant.
Spectral shifts in some peaks were apparent when comparing wet and dried samples, consistent with numerous studies indicating that Raman spectra of biomolecules change depending on whether they are interrogated in the solid or liquid state.36 For example, peaks at 877, 989 and 1079 cm−1 were clearly visible in the liquid saliva (Table 2) but were less apparent in the dried saliva. The peaks at 877 and 1079 cm−1 correspond to peaks of dihydrogen phosphate (H₂PO₄⁻) in solution, and also to weak peaks of monohydrogen phosphate (HPO₄²⁻) in solution, while the peak at 989 cm−1 corresponds to the very strong peak of monohydrogen phosphate in solution (ESI Fig. S3†).30
It is also worth noting that the saliva supernatant spectra from dried samples taken using a Raman microscope and those taken using our device are comparable in terms of relative peak intensity and position (ESI Fig. S4†).
Raman spectra from the PCR-positive samples collected at the COVID testing clinic (blue line, Fig. 2A) overlapped more with the PCR-negative spectra than did those from hospitalized patients. However, spectral differences remained at 876 and 989 cm−1, associated with phosphates,23,30 and at 1347 cm−1, associated with multiple amino acid side chains.23,31,36 The latter change suggests that COVID-19 infection is associated with a difference either in the composition of free amino acids in saliva or in the types of proteins found in saliva. Indeed, metabolomics studies using nuclear magnetic resonance (NMR) spectroscopy have shown that alanine, glutamine, histidine, leucine, lysine, phenylalanine and proline were all downregulated in COVID-19 PCR positives vs. PCR negatives.41
The greater overlap between the COVID-negative samples and the positives from the testing clinic is likely because testing clinic volunteers exhibited fewer symptoms and less severe pathology than hospitalized patients. The metabolism of testing clinic volunteers was therefore less likely to be perturbed by SARS-CoV-2, and their saliva less likely to contain the immune cells, cytokines, metabolic by-products and cellular debris associated with COVID-19 infection. Multiple studies have reported COVID-19-associated metabolic changes, for example in ceramide metabolism, tryptophan degradation, lipoproteins and cholesterol.42,43 Notably, one saliva sample from a hospitalized COVID patient contained visible tissue (possibly lung tissue), whereas none of the testing clinic samples did. Saliva from hospitalized patients was also much stickier than that from other individuals.
An ML model discriminating between PCR-confirmed COVID-19-negative individuals and hospitalized individuals achieved an AUC of 0.95, corresponding to a sensitivity of 88% and a specificity of 87% (Model 2a, Fig. 2C, and Table 2). This sensitivity is 26 percentage points higher than the median lateral flow test sensitivity, and the specificity is only slightly lower than that of lateral flow tests, while representing an increase of 14% compared with detection of testing clinic volunteers. As stated earlier, better performance when detecting COVID-19 in hospitalized individuals than in testing clinic volunteers is expected, as disease severity can affect the viral load and the metabolic signature of COVID-19 in biofluids.46 The three key features used were 1001, 1126 and 1453 cm−1, all of which can arise from proteins but also have contributions from glucose and lipids. The peak at 1001 cm−1 is assigned to phenylalanine, which may be present within proteins or as a free amino acid. An NMR spectroscopy study showed that salivary phenylalanine is reduced in COVID-19 patients compared to PCR-negative controls.41
Finally, a model was created to discriminate between PCR-confirmed COVID-19-negative individuals and the whole COVID-positive dataset (hospitalized and testing clinic individuals). This yielded an ROC curve with an AUC of 0.76, corresponding to a sensitivity of 75% and a specificity of 66% (Model 3a, Fig. 2D, and Table 2). All features overlapped with those of Model 1a, with one extra feature at 1205 cm−1 associated with tyrosine.23,31,36
An ML model trained to discriminate between all COVID-19-negative individuals (healthy + PCR negatives) and PCR positives achieved an AUC of 0.79, corresponding to a sensitivity of 77% and a specificity of 70% (Model 1b, Fig. 2E, and Table 2), a slightly higher AUC than that of Model 1a. The ML model trained to classify all COVID-19-negative individuals and hospitalized COVID-19 patients showed an almost identical AUC, sensitivity and specificity (0.95, 88% and 91%, respectively) to the model trained with PCR-confirmed negative samples only (Model 2b, Fig. 2F, and Table 2). The model trained on the whole COVID-positive dataset (hospitalized and testing clinic individuals) yielded an ROC curve with a greater AUC than Model 3a, 0.80, corresponding to a sensitivity of 69% and a specificity of 81% (Model 3b, Fig. 2G). Overall, these increases in accuracy may arise because the “b” machine learning models have more examples of COVID-19-negative spectra than the “a” machine learning models. They can therefore identify features particular to COVID-19-negative spectra more easily than when using PCR-negative cases alone.
Model 1c discriminated between the healthy group and COVID-19-positive volunteers from the testing clinic with an AUC of 0.93, a sensitivity of 95% and a specificity of 83% (Fig. 2H and Table 2). Both sensitivity and specificity were greatly improved compared to Model 1a. The sensitivity is 33 percentage points higher than the median lateral flow test sensitivity, and the specificity is 13 percentage points lower. Such performance would be most useful in settings where missing an infected individual is more costly than misclassifying a non-infected one, such as deciding which individuals may visit a clinic, hospital or care home.
Model 2c discriminated between the healthy group and hospitalized COVID-19 patients with an AUC of 0.94, a sensitivity of 81% and a specificity of 91%, comparable to both the “a” and “b” models (Models 2a and 2b).
Finally, Model 3c, which was developed to classify healthy and all COVID positive samples, achieved an AUC of 0.93, a sensitivity of 82% and a specificity of 92%. Both sensitivity and specificity are comparable to those of lateral flow tests.
All “c” models used the peak at 1163 cm−1, which was not present in the “a” models. This peak is associated with fatty acids, which are implicated in respiratory diseases. Metabolomics using liquid chromatography and high-resolution mass spectrometry has shown that the lipid profile of human sputum changes with different respiratory viruses (e.g., influenza H3, rhinovirus).48 Furthermore, Pérez-Torres et al. suggest that SARS-CoV-2 may alter fatty acid metabolism, as total non-esterified fatty acids are reduced in the plasma of COVID-19 patients compared to healthy subjects.49 Changes in fatty acids may therefore indicate the presence of respiratory viruses. The peak at 1126 cm−1 is the most consistent feature across all models, although it was not a major feature in Model 2b, which discriminated between all negatives and hospitalized COVID-19 patients (Table 2). This is the main Raman peak of aqueous glucose, and glucose dysregulation is strongly linked with COVID-19 severity.50–52 The peak at 1126 cm−1 also has contributions from proteins and unsaturated lipids (discussed earlier).
The COVID-19-positive cohort analysed by Ember et al. in 2022 (see Table 1 of that article) was almost identical to the COVID-19-positive testing clinic cohort analysed in this study. For the COVID-19-negative testing clinic cohort, we were able to analyse over six times more samples using our rapid single-point system than with the microscope system. We matched the positive and negative samples for as many demographic characteristics as possible. Finally, adding healthy volunteers with no suspicion of respiratory infection showed that ML model accuracy increases when the negative class consists of individuals with no respiratory infection symptoms and no likely exposure to respiratory infection.
Overall, the preliminary results shown in this article provide enticing evidence that Raman spectroscopy and machine learning could be used for rapid biochemical analysis and biofluid classification for infectious disease screening. The true value of the platform for infectious disease characterization will come from machine learning models developed from larger-scale datasets with high spectral quality. For example, our power studies suggest that at least 150 COVID-19-positive, 150 RSV-positive and 150 influenza-positive patients would allow us to use many features from the Raman spectra to develop truly generalizable models tested on independent hold-out datasets. Studies will be carefully designed to ensure that models can be trained from data fully capturing the heterogeneity of the general population and of the disease pathology. For hospitalized COVID-19 patients, the treatment for COVID-19 may itself influence the Raman spectrum of saliva through metabolic changes. Therefore, in a future clinical trial of such a device, controls would need to be drawn from patients without COVID-19 receiving the same course of treatment.
The system used in this study is the size of a microwave oven and sits on a cart, allowing it to be wheeled into a pharmacy, doctor's office or testing clinic. We have since developed a suitcase-based system for greater portability. The total cost of the system is about one fifth that of commercial Raman microscopes.
The new system will be adaptable to other biofluids (e.g., blood, urine and tears) and to the detection of other diseases, e.g., seasonal influenza (which kills 500 000 people each year), measles (one of the most infectious human viruses) and early-stage cancer. In future projects, we also aim to investigate the use of RS to monitor COVID-19 progression through the pre-symptomatic, symptomatic and immunogenic periods of the disease. To this end, surface-enhanced Raman spectroscopy (SERS) nanoparticles or surfaces53 could be functionalized using ligands against viral proteins and/or antibodies,54–56 allowing selective enhancement of the Raman signal.
In conclusion, our spontaneous Raman-based saliva assay allows detection of biomolecular changes associated with viral infection in real time.
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4an00729h
This journal is © The Royal Society of Chemistry 2024