Kelly Brown(a), Amy Farmer(a), Sabina Gurung(a), Matthew J. Baker(b), Ruth Board(c) and Neil T. Hunt*(a)
(a) Department of Chemistry and York Biomedical Research Institute, University of York, UK
(b) School of Medicine and Dentistry, University of Central Lancashire, UK
(c) Department of Oncology, Lancashire Teaching Hospitals NHS Trust, Preston, UK
E-mail: neil.hunt@york.ac.uk
First published on 7th April 2025
Non-linear laser spectroscopy methods such as two-dimensional infrared (2D-IR) produce large, information-rich datasets, while developments in laser technology have brought substantial increases in data collection rates. This combination of data depth and quantity creates the opportunity to unite advanced data science approaches, such as Machine Learning (ML), with 2D-IR to reveal insights that surpass those from established data interpretation methods. To demonstrate this, we show that ML and 2D-IR spectroscopy can classify blood serum samples collected from patients with melanoma according to diagnostically-relevant groupings. Using just 20 μL samples, 2D-IR measures ‘protein amide I fingerprints’, which reflect the protein profile of blood serum. A hyphenated Partial Least Squares-Support Vector Machine (PLS-SVM) model was able to classify 2D-protein fingerprints taken from 40 patients with melanoma according to the presence, absence or later development of metastatic disease. Area under the receiver operating characteristic curve (AUROC) values of 0.75 and 0.86 were obtained when identifying samples from patients who were radiologically cancer free and with metastatic disease respectively. The model was also able to classify (AUROC = 0.80) samples from a third group of patients who were radiologically cancer-free at the point of testing but would go on to develop metastatic disease within five years. This ability to identify post-treatment patients at higher risk of relapse from a spectroscopic measurement of biofluid protein content shows the potential for hybrid 2D-IR-ML analyses and raises the prospect of a new route to an optical blood-based test capable of risk stratification for melanoma patients.
In parallel with progress in 2D-IR interpretation, the last decade has also seen considerable advances in measurement technology, with high pulse repetition-rate lasers and mid-IR pulse shaping meaning that a 2D-IR spectrum now takes just seconds or minutes to acquire.29–33
This combination of information density and data abundance makes 2D-IR a promising candidate for combination with data science approaches such as machine learning (ML) to maximise the insight obtained from experimental datasets. The possibility of linking ML with 2D-IR has been assessed using simulated data, showing the potential for models to learn spectral signatures of dynamic proteins.34,35 Experimental applications of ML to 2D-IR have also shown the ability to classify spectra of small, purpose-designed sets of chemically distinct samples.25
An important barrier for hybrid 2D-IR-ML approaches to cross, however, is to provide insights from experimental data that could not be achieved with traditional spectroscopic analyses. The ability to approach problems that are intractable by other means would open the door to many new applications in protein analysis, ranging from structure interpretation and intermolecular interactions to biomedical analysis.36 To explore this, we have linked 2D-IR protein fingerprints and ML to classify blood serum samples collected from patients with melanoma according to their protein profile.
Human serum is a protein-rich fluid, containing around 70 mg mL−1 proteins composed mainly of serum albumin (35–50 mg mL−1) and the globulins (25–35 mg mL−1).37 The latter group is comprised of more than 50 individual types of protein, present at concentrations ranging from milligrams to less than micrograms per millilitre. The types and concentrations of proteins present in blood serum samples respond sensitively to metabolic processes37 and, of relevance to this study, the protein profile can also be a marker for disease.38–40
The range and varying abundance of constituent proteins also mean that measuring the serum protein profile quickly and directly is challenging. Infrared (IR) absorption spectroscopy studies have highlighted changes in protein signatures in samples from cancer patients,41 but a combination of a lack of resolution and confounding absorptions from water hinder direct interpretation of protein signals.42 Despite this, studies using sample drying or background subtraction methods have shown that IR signatures of blood serum samples can be used to detect cancers and have reported changes in the protein region of the spectrum, but detailed analysis was restricted to bands assigned to non-proteinaceous species.41–49 In contrast to IR absorption, 2D-IR not only spreads the protein signature over two spectral dimensions, increasing resolution, but also suppresses the background water absorption27 allowing a more direct and detailed measurement of changes in serum protein profiles without sample manipulation or background subtraction.
Here, we apply 2D-IR and ML to the problem of melanoma risk stratification. Melanoma is the fifth most common cancer in the UK, with incidences rising worldwide. A major challenge in treatment planning for melanoma patients is the accurate assessment of the post-operative risk of relapse. Patients at high risk of developing melanoma metastasis (relapse) after surgery can reduce the risk and increase their distant melanoma-free survival through adjuvant treatment.50–53 Whilst adjuvant therapies, both immunotherapy and BRAF-targeted treatments, reduce the recurrence risk, more work is required to distinguish patients needing treatment from those cured by surgery alone.54 This is important to healthcare providers in terms of reducing treatment burden and the high price of drugs, but vital to patients who could avoid treatment toxicities if adjuvant therapy is not required. Furthermore, melanoma patients at high risk of relapse undergo regular radiological imaging for five years post-surgery, irrespective of adjuvant therapy.55 This exposes patients to serial radiation, which increases the risk of cancer. The ability to identify patients with high-risk disease through alternative methods would therefore improve follow up stratification.
The diagnostic process to establish a patient's risk of relapse currently depends simply on the stage of the melanoma. A liquid biopsy, using biofluids to identify at-risk patients, would therefore provide a step-change in early detection, leading to life-saving and life-prolonging treatment for some patients whilst avoiding treatment toxicities in others. Our results show that a hybrid 2D-IR-ML approach is capable of differentiating serum samples according to diagnostically relevant groups. The considerable overlap of the spectra in these groups means that such an outcome would be extremely difficult without the application of ML tools and so highlights the potential of such methods. Although exploratory, our results also suggest that optical tools based on advanced spectroscopies and ML could have a role to play in future diagnostic approaches.
For the experiments described below, the output of both OPAs was centred at 1650 cm−1, resonant with the protein amide I mode. The OPAs produced usable bandwidths of >200 cm−1 with energies of 2.5 and 1.5 μJ per pulse, respectively, at a pulse repetition rate of 50 kHz.
2D-IR data collection was via a 2DQuick spectrometer (Phasetech) employing the pump–probe beam geometry and a mid-IR pulse shaper to generate and control the time delay (τ) between the pair of “pump” pulses.57,58 Signal detection was via a 64-element HgCdTe array detector using the ZZZZ (parallel) polarization geometry, which maximises signal intensity. Each sample was measured at waiting time (Tw) values of 250 fs and 5 ps, yielding both the protein signal (Tw = 250 fs) and a small thermal signal from H2O (Tw = 5 ps) that was used for signal pre-processing and standardisation via previously published methods.27,59,60 For a given value of Tw, τ was scanned in steps of 24 fs to a maximum delay time of 3 ps, applying a rotating frame frequency of 1208 cm−1. Each 2D-IR plot represents the average of 500 spectra, repeated 3 times.
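The τ sampling parameters quoted above fix the pump-axis sampling. The sketch below works through the arithmetic, assuming the standard discrete Fourier transform relationships and that the rotating frame simply offsets the sampled window so that it starts at the rotating frame frequency. Any zero padding or apodisation applied during processing is not stated in the text and is ignored here; the function and variable names are illustrative only.

```python
# Sampling parameters quoted in the text.
C_CM_PER_S = 2.99792458e10    # speed of light in cm/s
STEP_FS = 24.0                # tau step size
MAX_DELAY_FS = 3000.0         # maximum tau delay (3 ps)
ROTATING_FRAME_CM1 = 1208.0   # rotating frame frequency


def pump_axis_parameters(step_fs, max_delay_fs, rotating_frame_cm1):
    """Return basic Fourier sampling quantities for the tau scan."""
    step_s = step_fs * 1e-15
    # Number of tau points, counting tau = 0.
    n_points = int(round(max_delay_fs / step_fs)) + 1
    # Highest frequency sampled without aliasing, in the rotating frame.
    nyquist_cm1 = 1.0 / (2.0 * C_CM_PER_S * step_s)
    # Spacing between points on the pump axis before any zero padding.
    spacing_cm1 = 1.0 / (C_CM_PER_S * n_points * step_s)
    return {
        "n_points": n_points,
        "nyquist_cm1": nyquist_cm1,
        "spacing_cm1": spacing_cm1,
        "upper_limit_cm1": rotating_frame_cm1 + nyquist_cm1,
    }


params = pump_axis_parameters(STEP_FS, MAX_DELAY_FS, ROTATING_FRAME_CM1)
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the amide I band near 1650 cm−1 sits about 440 cm−1 above the rotating frame frequency, comfortably inside the sampled window.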
The serum samples were representative of three patient groups: the control group, where the patient did not present with a subsequent cancer diagnosis after surgery; the metastatic group, where the presence of metastatic disease was already confirmed at the time the blood sample was obtained; and the developed metastasis group, where patients were radiologically cancer free following surgery but went on to develop metastatic disease within the five-year follow-up period. The sample cohort analysed consisted of 40 individual patients: 8 control, 21 metastatic and 11 developed metastasis. A breakdown of the relevant patient metadata for each class is given in Table S1.†
Each patient sample was measured in triplicate, generating three spectra per sample. To account for potential variations in instrument performance with time, control group spectra were collected during each measurement set, resulting in the measurement of 16 control samples, with each of the 8 individual control patients' serum measured twice. Overall, this resulted in the collection of 144 spectra: 48 in the control group, 63 in the metastatic group and 33 in the developed metastasis group.
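The spectrum counts follow directly from the sample numbers given above. A short check of the arithmetic is shown below; the dictionary keys and variable names are chosen for illustration.

```python
# Samples measured per group, as stated in the text. The 8 control
# patients were each measured twice, giving 16 control samples.
SPECTRA_PER_SAMPLE = 3
samples_measured = {
    "control": 8 * 2,
    "metastatic": 21,
    "developed metastasis": 11,
}

# Each sample was measured in triplicate.
spectra_per_group = {
    group: n_samples * SPECTRA_PER_SAMPLE
    for group, n_samples in samples_measured.items()
}
total_spectra = sum(spectra_per_group.values())

print(spectra_per_group)
print("total:", total_spectra)
```

The resulting class sizes (48, 63 and 33 spectra) show the imbalance that the stratified cross-validation scheme needs to accommodate.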
PLS was employed to address the high dimensionality of the spectral dataset by projecting the scaled and mean centred spectral data onto a lower-dimensional latent variable (LV) space while maximising covariance with the class labels. PLS was applied independently to each training and test split to extract 15 LVs representing the most informative spectral features. During execution of the nested CV, the overlap within the feature space between the training and testing PLS LV scores was assessed by comparing the distribution of PLS scores for the training and test sets in each outer fold. This evaluation confirmed that the test and training sets within each outer-fold produced scores of similar magnitudes, validating the suitability of this approach within the nested CV PLS-SVM model (Fig. S2†). The extracted LV scores were then used as input features for training the SVM models with a radial basis function (RBF) kernel. Hyperparameters for the cost parameter (C) and sigma were optimised using a grid search strategy with area under the one-vs.-all receiver operating characteristic curve (AUROC) of the validation sets used to guide parameter selection.
For each outer fold, the final SVM model was trained on the full training set with the optimal hyperparameters obtained from the inner loop. Model performance was assessed on the independent test set using Cohen's kappa, sensitivity, specificity and AUROC parameters. Probabilistic predictions were recorded to facilitate post hoc analysis and visualisation of class separations. Variable importance in projection (VIP) scores were calculated for each PLS LV to assess their contribution to the model. VIP scores were computed by weighting each component's contribution to the explained variance of the PLS model. The use of the nested CV approach not only allowed for unbiased estimates of generalisation performance but also ensured that model tuning and evaluation were conducted on strictly independent datasets. By employing a stratified, hierarchical framework, we mitigate the risk of overfitting, especially given the imbalanced dataset.
The spectra in Fig. 1(a)–(c) show averaged results encompassing all of the spectra measured from patients in each of the three groups (control (a), developed metastasis (b) and metastatic (c)). The spectra are broadly similar, as would be expected given the general similarities of human protein profiles, though some small differences are apparent in the amplitude and shape of the amide I bands in Fig. 1(a–c). Difference spectra (Fig. 1(d–f)) produced via subtraction of the spectra in Fig. 1(a–c) from one another show that the spectral changes between classes only appear clearly following magnification (Fig. 1(g–i)), revealing the subtle distinctions between the patient groups. This broad consistency between samples confirms the effectiveness of the data pre-processing strategy. It is encouraging to note that the changes displayed in Fig. 1(g–i) appear not only in the α-helix region of the spectrum, near 1660 cm−1, but also in the β-sheet region near 1630–1640 cm−1. This firstly suggests that the changes between samples are localised on the protein portion of the response, rather than arising from spectral noise, for example. Secondly, it suggests that there may be changes in both albumin and globulin content that can be used to differentiate spectra of the three sample groups.
Although we can extract these subtle changes through averaging of all spectra from a given group and careful spectral subtraction, visual classification on a per-sample basis would be challenging and unreliable, a fact that would be further complicated by patient-to-patient and sample-to-sample variation as protein levels respond to many everyday factors. The overlapping spectral features, combined with variations in peak intensity and shape, create an intricate pattern that does not lend itself to straightforward interpretation, as demonstrated by the application of PCA or PLS analyses (Fig. S3†). However, ML models offer a solution via the ability to identify patterns within complex datasets. By training models on a range of spectral data, the ability to detect subtle spectral variations can be developed, potentially enhancing classification accuracy. Our aim was thus to exploit ML methods to leverage the nuanced spectral information, improving diagnostic reliability that could ultimately reveal markers of disease progression or risk from the serum protein profile.
Subsequently, more powerful classification approaches such as k-Nearest Centroid (kNC), Random Forest (RF) and Support Vector Machines (SVM) were evaluated due to their proven efficacy for high-dimensional dataset classifications.62–67 All three hyphenated models (with PLS) were implemented using the nested CV approach described in the experimental section. The performance of each model was assessed using the standard evaluation metrics of AUROC, sensitivity and specificity. Each of these metrics provides unique insights into the model's classification performance. AUROC evaluates the model's ability to distinguish between classes, with values closer to unity indicating better discriminating power. Sensitivity assesses the ability to identify true positives correctly, which is crucial for detecting subtle spectral differences, while specificity evaluates the ability to identify true negatives correctly, reflecting the model's robustness in minimising false positives.
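For a three-class problem, sensitivity and specificity are evaluated per class in a one-vs-rest sense. The sketch below computes them, together with the Cohen's kappa used elsewhere in the paper, from a confusion matrix. The matrix shown is invented purely for illustration and is not data from this study; the function names are likewise hypothetical.

```python
import numpy as np


def one_vs_rest_metrics(cm):
    """Per-class sensitivity and specificity (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                  # correctly assigned to each class
    fn = cm.sum(axis=1) - tp          # members of the class that were missed
    fp = cm.sum(axis=0) - tp          # other classes assigned to this class
    tn = cm.sum() - tp - fn - fp      # everything else
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(cm):
    """Agreement between predicted and true labels, corrected for chance."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    observed = np.trace(cm) / total
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix for three classes (not study data).
example_cm = [[8, 1, 1],
              [2, 6, 2],
              [1, 1, 8]]
sensitivity, specificity = one_vs_rest_metrics(example_cm)
kappa = cohens_kappa(example_cm)
print("sensitivity:", sensitivity)
print("specificity:", specificity)
print("kappa:", round(kappa, 3))
```

For this invented matrix the sensitivities are 0.8, 0.6 and 0.8, the specificities are 0.85, 0.9 and 0.85, and kappa is 0.6.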
The performance of the three models is summarised in Table 1, where classification performance of the control, developed metastasis and metastatic groups is shown. The kNC model demonstrated moderate improvements in sensitivity and AUROC compared to PLS-DA for the developed metastasis and metastatic groups, although its performance for the control group remained limited. The RF model further improved AUROC and specificity, particularly for the metastatic group, but its sensitivity for the control and developed metastasis groups remained below commonly accepted performance standards.
| Model | Parameter | Control | Developed metastasis | Metastatic |
|---|---|---|---|---|
| k-Nearest centroid (kNC) | AUROC | 0.53 | 0.66 | 0.73 |
| | Sensitivity | 0.53 | 0.56 | 0.93 |
| | Specificity | 0.78 | 0.96 | 0.71 |
| Random Forest (RF) | AUROC | 0.63 | 0.70 | 0.81 |
| | Sensitivity | 0.36 | 0.50 | 0.93 |
| | Specificity | 0.82 | 0.90 | 0.69 |
| Support vector machine (SVM) | AUROC | 0.75 | 0.80 | 0.86 |
| | Sensitivity | 0.69 | 0.72 | 0.70 |
| | Specificity | 0.76 | 0.89 | 0.88 |
The SVM model emerged as the most effective approach, achieving the highest AUROC values across all groups (0.75, 0.80 and 0.86 for control, developed metastasis and metastatic, respectively). PLS-SVM also achieved a balance between sensitivity and specificity, with notable improvements in sensitivity for the control and developed metastasis groups. These results therefore show that SVM offers the most promising approach for addressing the classification challenges posed by the 2D-IR dataset.
The ROC curves (Fig. 2(b)) for each sample group further demonstrate the discriminative power of the PLS-SVM model, with AUROC values of 0.75, 0.80, and 0.86 for the control, developed metastasis, and metastatic groups, respectively. These values show that the model effectively separates the classes, particularly for the metastatic group, where the highest AUROC value reflects superior classification performance. The shape of the ROC curves for all groups, with an upward trajectory towards the top left corner of the plot, indicates high sensitivity and specificity across the range of classification thresholds. This progression highlights the ability of the model to classify true positives correctly while minimising false positives. The model achieves balanced sensitivity and specificity values across all groups (Table 1), with sensitivity values ranging from 0.69 for the control group to 0.72 for the developed metastasis group. Specificity values are higher, peaking at 0.89 for the developed metastasis group. These results indicate that the model is not only capable of correctly identifying true positives but also robust in minimising false positives. The Kappa value of 0.523 reflects a more moderate agreement between predicted and actual classifications but is still consistent with reliable performance of the model.
While the PLS-SVM model clearly captures the underlying patterns in the data, the inherent overlap in spectral features would be expected to impose a limitation on classification accuracy for this relatively small experimental dataset. This can be assessed via probability box plots (Fig. 2(c)), which provide a quantitative measure of predictive confidence returned by the model for each sample. These plots show that for each of the three groups there is a significant clustering of high probabilities for the correct class, showing that the model maintains strong confidence in its predictive ability. However, the box plots also illustrate the challenge posed by the overlapping spectral features, which results in a relatively wide distribution of probabilities showing lower confidence in some of the predictions. For example, the control group exhibits a broad distribution of predicted probabilities, with significant overlap into the developed metastasis and metastatic regions. Similarly, the developed metastasis group demonstrates a range of probabilities, perhaps reflecting its intermediate nature between the other two groups and so the potential for shared spectral characteristics with the control and metastatic classes. These overlapping distributions align with the misclassifications observed in the confusion matrix and highlight the presence of some uncertainty in distinguishing between groups but overall, the performance is strong, and uncertainties would be expected to be reduced with the addition of more data to the model.
It is instructive to consider the regions of the 2D-IR response that the ML model uses to make decisions when classifying samples. Variable Importance in Projection (VIP) scores show the contribution of each PLS LV to the model's classification performance. This not only provides useful spectroscopic insight but can also be used to assess whether classification was based on meaningful, biologically relevant spectral features rather than random noise, and to guard against overfitting. The VIP scores, Fig. 3(a), highlight the importance of the specific PLS components in distinguishing between the control, developed metastasis and metastatic groups. Components with VIP scores greater than unity are considered the most influential, as they capture significant variations in the data and reflect the spectral regions that contribute most to the model. Here we observe that the most important LVs identified are 15, 13 and 14, with LV 15 capturing the most significant variations. The corresponding loading plots (Fig. 3(b–d)) illustrate the specific spectral regions associated with these LVs. The most prominent features in LV 15 and 13 primarily lie in the region around 1660 cm−1, with additional contributions from the 1640 cm−1 region. LV 14 appears to be dominated by changes in the 1625–1640 cm−1 region. It is noteworthy that, although no direct correlation between these loadings and the difference spectra discussed above is expected, or necessary for good model performance, many of the areas that arise in LVs 13–15 align with features in the inter-group difference spectra from Fig. 1 (reproduced in Fig. 3(e–g), see coloured arrows). These observations underline that the PLS loadings are identifying regions of the 2D-IR spectrum that correspond to the main parts of the amide I band, as would be expected for a model that is using spectral information for sample classification.
In combination with the other parameters that are used to assess model performance this further confirms that the ML approach is leading to an accurate and robust sample classification output.
Fig. 3 (a) Average variable importance in projection scores across training set of outer folds. VIP scores greater than 1 indicate important latent variables used in making model predictions. Panels (b) to (d) show the spectral loadings of the three variables considered to be the most important, (b) = LV 15, (c) = LV 13, (d) = LV 14. Panels (e–g) reproduce the difference spectra from Fig. 1(g–i) for comparison. Coloured arrows highlight points of interest as discussed in the text.
The fact that our 2D-IR-ML method is able to differentiate spectra obtained from different patient groups based on the serum protein profile is equally encouraging. The method requires only small volumes of blood serum, with measurement times on the order of minutes, while data collection requires no prior sample manipulation to account for the presence of water, all of which suggest that 2D-IR-ML methods have the potential for further development towards applications in biomedical diagnostics and more generally for solution-phase protein analysis.
We now consider the results of this study in the more specific context of risk stratification for the treatment of melanoma. A promising approach to detecting melanoma residual disease exploits detection of circulating tumour DNA (ctDNA).54,68,69 Presence of ctDNA as a biomarker has been shown to correlate with relapse risk68 and clinical trials are ongoing, though quantities of ctDNA in cases of non-metastatic melanoma are small and so hard to detect.54 Our results show that variations in the protein profile of the patient's blood serum may offer another, parallel, route to identifying disease states and predicting relapse risk. This correlates with observations relating to other cancers using IR absorption spectroscopy,41–49 but the addition of superior spectral resolution means that 2D-IR may offer a useful complementary technology to these tools.
One advantage of the 2D-IR-ML approach is the detailed insight that is contained within the regions of the amide I band that were identified with sample classifications. Fig. 3 shows that changes to both the α-helix and β-sheet regions were highlighted by the model as being of importance, suggesting that the changes could encompass a range of proteins. It is also noteworthy that some consistency was achieved between ML output and the difference spectra obtained from the average signal from each sample class. This suggests that 2D-IR results may ultimately be able to point towards molecular markers for disease based on changes in the broad protein profile of serum samples. As discussed above, these changes may include variations in the relative concentrations of some of the major proteins, but there are also indications from prior 2D-IR studies of serum that changes in structure, dynamics or ligand binding can all influence the amide I profile.15 Equally, the contribution of post-translational modifications to the amide I band is as yet unexplored. In making a link between serum spectroscopy and disease, one has to be aware of potential confounding factors given that serum reports on many bodily processes,70 but these results offer a firm basis for follow-up studies. Further experiments using protein libraries to understand some of the potential spectroscopic contributions would be instructive. Similarly, combining 2D-IR results with powerful supporting technologies like proteomics analyses would be of particular value in identifying specific molecular changes that are leading to the 2D-IR-based classifications. Such a multi-platform approach could add vital new information relating to understanding the molecular nature of disease progression.71
One clear result of the study is that there is overlap of spectral features between the individual sample groups and that this has proved a challenge for the ML model. This overlap in spectral features was expected given the broad molecular similarity of patient serum samples, the presumed gradual progression of disease states and the shared biochemical markers likely to be present between the groups. For instance, the differences between control and developed metastasis samples may stem from subtle changes in protein profile that are indicative of early-stage disease. However, since the developed metastasis samples come from different patients at varying stages of disease progression, these subtle differences may not be consistently evident across all samples, making it harder to differentiate them from the control group. This in essence is the challenge that the 2D-IR-ML approach sought to overcome, so indications that it may be possible are encouraging. The fact that the ML model uses the full 2D-plot also shows that the information density inherent in the 2D-IR method will be valuable in doing so.72,73 Similarly, patients in the later stages of developed metastasis may exhibit spectral profiles that resemble those of metastatic disease, further blurring the distinction between these two groups and complicating classification. As has been shown to be the case with ML-based approaches, such problems would benefit considerably from larger studies involving many more samples and serial samples over time.74,75 Additionally, the provision of true controls from healthy individuals would provide useful insights. In this respect, the clear differentiation between the three patient groups, all of which have had treatment for melanoma that would be expected to reduce the variation between them, is another positive indicator for the potential of combined 2D-IR-ML strategies.
Despite the nuanced spectral differences observed, manual classification was not tractable due to overlapping spectral features and subtle variations across patient groups. However, advanced ML strategies, particularly the PLS-SVM model, proved capable of good classification performance, achieving AUROC values of 0.75, 0.80, and 0.86 for the control, developed metastasis and metastatic groups, respectively and demonstrating robust discriminative power. Balanced sensitivity and specificity further reinforced the model's reliability in identifying disease states.
These findings highlight the potential of 2D-IR spectroscopy combined with ML to contribute to cancer diagnostics. While the inherent overlap in spectral features imposes some limitations on classification accuracy, the demonstrated ability to differentiate between patient groups at an accepted level underscores the feasibility of this approach for clinical applications. Future work should focus on refining ML strategies, particularly through the expansion of datasets, including the addition of non-symptomatic healthy individuals. These could potentially be enhanced by inclusion of data collected using different polarisation geometries, which could enhance off-diagonal regions of the spectrum, though careful consideration of how to combine the datasets would be required.76 Additionally, exploring complementary spectroscopic techniques could enhance classification performance and provide deeper insights into the biological features of disease progression leveraged for classification. Ultimately, this study lays a foundation for the exploration of 2D-IR-ML approaches, offering a promising tool for harnessing the information content of 2D datasets.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5sc01526j
This journal is © The Royal Society of Chemistry 2025