Open Access Article
Megha Mehta, Hao Meng, Suzy Eldershaw and Nick Stone*
Department of Physics and Astronomy, University of Exeter, Exeter EX4 4QL, UK. E-mail: N.Stone@exeter.ac.uk
First published on 23rd June 2026
Understanding the biochemical diversity among breast cancer subtypes is essential for developing more precise diagnostic and treatment strategies. In this study, we use optical photothermal infrared (O-PTIR) micro-spectroscopy to characterise three human breast cancer cell lines (MCF7, MDA-MB-231, and SKBR3) without the need for dyes or labels. Unlike conventional FTIR, which is limited to a spatial resolution of ∼5 µm by the wavelength-dependent IR diffraction limit, O-PTIR achieves sub-micron resolution, allowing biochemical analysis at near-single organelle levels. By analysing the spectral data, we identified variations in key biomolecules such as nucleic acids, proteins, and lipids that correlate with the distinct biological characteristics of each cell type. Our approach also included peak deconvolution and ratiometric analysis, which provided further insight into the biochemical variations. Principal component analysis (PCA) was used to assess these differences and, when coupled with linear discriminant analysis (LDA), showed clear separation between the cell lines. These results highlight the potential of O-PTIR as a powerful tool for label-free cancer cell characterisation, overcoming FTIR limitations by using a pulsed mid-IR laser for absorption and a visible probe laser for detection, enabling reflection-mode detection that bypasses water interference and achieves submicron resolution.
Conventional diagnostic methods, including immunohistochemistry (IHC), fluorescence in situ hybridisation (FISH), and gene expression profiling, are routinely employed in clinical settings to classify breast cancer subtypes. While these techniques have proven valuable, they often involve elaborate sample preparation, are dependent on staining protocols that may perturb the native biochemical state of cells, and are susceptible to variability in interpretation.5,7 These limitations underscore the need for more reliable, label-free analytical techniques that can capture the intrinsic molecular signatures of cancer cells with minimal intervention.
Vibrational spectroscopic techniques such as Fourier-transform infrared (FTIR) imaging and Raman spectroscopy have shown considerable promise in biomedical applications by enabling the detection of biomolecular components including proteins, lipids, and nucleic acids.8,9,18 However, conventional FTIR is constrained by the diffraction limit to a low spatial resolution (∼5–12 µm), making it difficult to resolve subcellular features. Raman spectroscopy, which uses light of much shorter wavelengths (532–830 nm), can resolve these features well, but the approach is usually much slower and can suffer from interference caused by sample autofluorescence.10,11
Optical Photothermal Infrared (O-PTIR) spectroscopy presents a novel alternative that addresses these limitations. This technique combines mid-infrared absorption, from a tuneable quantum cascade laser source, with visible/NIR probe beam detection to achieve submicron spatial resolution, facilitating label-free spectral imaging of biological samples.12 Moreover, O-PTIR enables the simultaneous acquisition of infrared and Raman spectra at the same spatial location, offering complementary and detailed molecular information.13 Previous studies have demonstrated its utility in diverse biomedical contexts, including bacterial classification,13,14 lipid droplet characterisation,15,16 and tissue pathology.17,18 To date, the application of O-PTIR to breast cancer cell subtyping has yet to be explored.
In the present study, we employed O-PTIR micro-spectroscopy to analyse and compare three breast cancer cell lines representing distinct subtypes: MCF7 (luminal A), MDA-MB-231 (triple-negative), and SKBR3 (HER2-enriched). Through a combination of univariate spectral evaluation, Gaussian peak deconvolution, and multivariate statistical analysis including principal component analysis (PCA) and principal components-linear discriminant analysis (PCA-LDA), we investigated the biochemical heterogeneity of these cell lines. Our findings demonstrate that O-PTIR can effectively discriminate between breast cancer subtypes based on their vibrational fingerprints, highlighting its potential as a high-resolution, label-free tool for cellular phenotyping and diagnostic applications.
Cells were maintained at 37 °C in a humidified atmosphere with 5% CO2 (v/v) for MCF7 and SKBR3 cells or in a non-CO2 atmosphere19–21 for MDA-MB-231 cells.
Prior to spectroscopic measurements, harvested cells were seeded at a density of approximately 25 000 cells per cm2 onto poly-L-lysine-coated, 0.5 mm thick, 25 mm diameter calcium fluoride (CaF2) windows placed in six-well plates. The cells were allowed to attach overnight, fixed with 10% (w/v) formalin for 10 minutes, and then washed with PBS and ultrapure water.22
Measurements were taken with a spectral resolution of approximately 8 cm−1, using 43% IR pump power and 51% probe power. Data collection was based on four averaged scans per point with a spatial step size of 2 µm, producing hyperspectral maps covering single-cell areas of approximately 16–30 µm per cell, depending on cell line. Depending on the mapped region, this corresponded to approximately 112–150 spectra per cell, with a total acquisition time of approximately 30–40 minutes per cell map, which includes hyperspectral acquisition and stage positioning overhead. The IR excitation beam has an approximately diffraction-limited spot diameter of ∼10 µm at the sample, and the co-aligned visible probe beam has a spot size of ∼500 nm, enabling sub-micron spatial resolution.
For each of the three cell lines (MCF7, MDA-MB-231, and SKBR3), triplicate slides were prepared and measured over three consecutive experimental days, sampling several cells per slide each day. In total, 25 cells per cell line were analysed (75 cells overall). Before data acquisition, background spectra were recorded using a Kevley low-emissivity (low-E) substrate for calibration. Instrument control and background subtraction were managed using PTIR Studio software.
Prior to statistical analysis, each individual pixel spectrum was pre-processed to remove background contributions. Baseline correction was performed using polynomial background subtraction, followed by min–max normalisation to account for variations in spectral intensity and sampling thickness. PCA and clustering analyses were performed on these baseline-corrected and normalised pixel spectra. For ratiometric analysis, the processed spectra within each cell map were averaged to obtain a representative per-cell mean spectrum, from which band intensity ratios were calculated.
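The pre-processing chain described above can be illustrated with a minimal sketch. The polynomial order and the choice to fit the polynomial over the whole spectrum are assumptions made here for illustration only; the text does not state them, and the function names are hypothetical.

```python
import numpy as np

def preprocess_spectrum(wavenumbers, intensities, poly_order=3):
    """Polynomial baseline subtraction followed by min-max normalisation.

    poly_order is an assumed value; the source does not specify it.
    """
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(intensities, dtype=float)
    # Centre and scale the axis so the polynomial fit is well conditioned.
    x_scaled = (x - x.mean()) / x.std()
    coeffs = np.polyfit(x_scaled, y, poly_order)
    baseline = np.polyval(coeffs, x_scaled)
    corrected = y - baseline
    span = corrected.max() - corrected.min()
    if span == 0:
        # A flat spectrum carries no information after correction.
        return np.zeros_like(corrected)
    # Min-max normalisation maps the corrected spectrum onto [0, 1].
    return (corrected - corrected.min()) / span

def per_cell_mean(pixel_spectra):
    """Average processed pixel spectra (rows) into one spectrum per cell."""
    return np.mean(np.asarray(pixel_spectra, dtype=float), axis=0)
```

Fitting the baseline to the full spectrum is the simplest option; in practice the fit is often restricted to peak-free regions, which this sketch does not attempt.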
To evaluate the PCA-LDA classification model's performance, leave-one-cell-out cross-validation was employed. In this procedure, all spectral data from one individual cell were excluded from the training dataset in each iteration and used exclusively for testing, and the process was repeated such that each cell served once as the test set. Because pixel spectra from the same cell are highly correlated, this strategy avoids the data leakage that may arise from random pixel-wise training/test splitting and provides a more realistic estimate of the model's ability to generalise to unseen cells.
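The splitting logic of leave-one-cell-out cross-validation can be sketched in a few lines of plain Python. The cell identifiers below are hypothetical, and the sketch only generates the train/test index sets; fitting the PCA-LDA model on each training set is omitted.

```python
def leave_one_cell_out(cell_ids):
    """Yield (held_out_cell, train_indices, test_indices) for each unique cell.

    cell_ids[i] is the identifier of the cell that pixel spectrum i came from.
    """
    for held_out in sorted(set(cell_ids)):
        test_idx = [i for i, c in enumerate(cell_ids) if c == held_out]
        train_idx = [i for i, c in enumerate(cell_ids) if c != held_out]
        yield held_out, train_idx, test_idx

# Hypothetical example: six pixel spectra drawn from three cells.
cell_ids = ["cell_A", "cell_A", "cell_B", "cell_B", "cell_B", "cell_C"]
splits = list(leave_one_cell_out(cell_ids))
for held_out, train_idx, test_idx in splits:
    # No pixel from the held-out cell may appear in the training set.
    assert not set(train_idx) & set(test_idx)
```

Grouping by cell rather than by pixel is the design choice that matters here. scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter for use with a full modelling pipeline.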
Each cell was analysed using hyperspectral O-PTIR mapping with a spatial step size of 2 μm, generating multiple pixel spectra per cell. A representative spectrum for each cell was obtained by averaging the processed pixel spectra within the cell map. The spectra shown in Fig. 1 and 2 represent subtype-averaged spectra obtained by averaging the per-cell mean spectra across all analysed cells of each breast cancer subtype.
MCF7 cells presented pronounced nucleic-acid-associated bands at ∼1088 cm−1 and ∼1244 cm−1, with a shoulder near ∼1068 cm−1, consistent with ordered nucleic acid structures.24 All three cell lines exhibited broadly similar spectral profiles in the nucleic acid region, with only modest differences in relative intensity, most notably at ∼1088 cm−1, which appeared slightly higher in MCF7 spectra. While additional bands exhibited similar trends, the discussion was restricted to the most robust and biochemically interpretable features to avoid over-interpretation of minor intensity variations.25 Peak assignments are summarised in Table 1.
| Peak (cm−1) | Assignment |
|---|---|
| 965 | C–C/protein backbone |
| 1088 | PO2− sym. stretch (DNA/RNA) |
| 1170 | C–O/carbohydrates |
| 1244 | Amide III/PO2− asym. (DNA) |
| 1398 | CH3 bending (lipids) |
| 1456 | CH2 bending (lipids/proteins) |
| 1544 | Amide II (proteins) |
| 1656 | Amide I (α-helix) |
| 1744 | C=O stretch (lipids/esters) |
In the protein–lipid region, MDA-MB-231 spectra displayed relatively stronger intensities at ∼1398, ∼1456, and ∼1544 cm−1, indicating increased protein and lipid-associated contributions consistent with its aggressive phenotype. SKBR3 spectra showed comparatively stronger intensity at ∼1744 cm−1, corresponding to the ester carbonyl stretching vibration, which is often associated with lipid droplet accumulation and membrane-associated lipid content.13
The Amide II band around ∼1544 cm−1 was relatively more intense in MDA-MB-231 spectra, suggesting increased protein-associated contributions. Meanwhile, the Amide III band near ∼1244 cm−1 showed subtle variation in relative intensity between cell lines. These spectral differences are consistent with variations in protein-associated biochemical composition across the three breast cancer cell lines.
The resulting fits (Fig. 2) demonstrate comparable band structures among the three cell lines, with subtype-dependent differences primarily reflected in relative peak amplitudes within nucleic acid (∼1088, 1244 cm−1), lipid (∼1456 cm−1), and protein (∼1544, 1656 cm−1) regions. This provides an internally consistent basis for comparative biochemical interpretation.
Quantitative parameters derived from the Gaussian fitting—including integrated peak area, FWHM, and peak height—were used to characterise subtype-dependent spectral variation.29 The distribution of fitted peak areas and corresponding FWHM values across individual cells is presented in SI Fig. S1 and S2, providing a quantitative visualisation of intra- and inter-cell line variability.
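For a single Gaussian component, the three fitted quantities mentioned above are linked by closed-form expressions, sketched below. The parametrisation (amplitude, centre, standard deviation) and the example values are assumptions chosen for illustration; the fitting routine itself is not shown.

```python
import math

def gaussian(x, amplitude, centre, sigma):
    """Gaussian band profile with peak height `amplitude` at `centre`."""
    return amplitude * math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

def gaussian_fwhm(sigma):
    """Full width at half maximum: 2 * sqrt(2 * ln 2) * sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

def gaussian_area(amplitude, sigma):
    """Integrated area under the band: amplitude * sigma * sqrt(2 * pi)."""
    return amplitude * sigma * math.sqrt(2.0 * math.pi)

# Hypothetical Amide I component: unit height, centred at 1656 cm-1.
amplitude, centre, sigma = 1.0, 1656.0, 12.0
fwhm = gaussian_fwhm(sigma)
# By definition, the profile falls to half its height at centre +/- FWHM / 2.
half_height = gaussian(centre + fwhm / 2.0, amplitude, centre, sigma)
```

Because area scales with both height and width, two bands of equal height can differ substantially in integrated area, which is why the parameters are reported separately.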
MCF7 spectra exhibited relatively narrow FWHM values in the amide region, indicating comparatively uniform band profiles across the analysed cells. In contrast, MDA-MB-231 spectra showed broader peak widths and larger variations in integrated peak areas across the protein (∼1544 and ∼1656 cm−1) and lipid (∼1456 cm−1) bands, reflecting increased heterogeneity in biochemical composition. SKBR3 spectra displayed intermediate peak widths with relatively enhanced lipid-associated contributions (∼1456 cm−1). Notably, while overall band positions remained conserved across subtypes, relative amplitude variations were observed within nucleic acid- and protein-associated regions. MCF7 displayed comparatively higher intensity within the symmetric phosphate (∼1088 cm−1) region, whereas lipid-associated contributions (∼1456 cm−1) were comparatively enhanced in MDA-MB-231 and SKBR3. These trends are consistent with the ratiometric analysis presented in Section 3.1.4 and are also evident in the constrained Gaussian fits shown in Fig. 2.
To further differentiate the biochemical profiles among MCF7, MDA-MB-231, and SKBR3 cell lines, spectral band intensity ratios were calculated from representative spectra of individual cells. For each cell map, pixel spectra were first screened to remove background contributions and then min–max normalised. A representative spectrum for each cell was obtained by averaging the processed pixel spectra, yielding one spectrum per cell (n = 25 per subtype). Band intensity ratios were subsequently calculated from these per-cell mean spectra to enable statistical comparison between breast cancer subtypes.30–32
Two diagnostic ratios were calculated using characteristic vibrational bands within the biochemical fingerprint region. The nucleic acid to protein ratio was calculated using the symmetric phosphate band at ∼1088 cm−1 as a nucleic acid marker, normalised to the combined protein-associated Amide I (∼1656 cm−1) and Amide II (∼1544 cm−1) bands. The ∼1244 cm−1 band, which contains contributions from both Amide III and asymmetric PO2− vibrations, was not used directly in the ratio calculation but was considered qualitatively to support observed trends.
Consistent with this interpretation, the relative intensity of the ∼1244 cm−1 band was lowest in MCF7, highest in SKBR3, and intermediate in MDA-MB-231, supporting the observed nucleic acid-related trends across the three cell lines. The lipid-to-protein ratio was calculated using the ester carbonyl stretch at ∼1744 cm−1, normalised to the same protein bands. Boxplots were generated to visualise these ratios, with medians represented by the central line and whiskers spanning 1.5× the interquartile range (IQR).
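The two ratios can be written out as a short sketch. Two assumptions are made here that the text does not confirm: the "combined" protein signal is taken as the sum of the Amide I and Amide II intensities, and each band intensity is read as the maximum within a ±4 cm−1 window around the nominal position. Function names and the toy spectrum are hypothetical.

```python
def band_intensity(wavenumbers, spectrum, target, window=4.0):
    """Maximum intensity within +/- window cm-1 of the target band (assumed rule)."""
    values = [s for w, s in zip(wavenumbers, spectrum) if abs(w - target) <= window]
    if not values:
        raise ValueError(f"no data points within {window} cm-1 of {target}")
    return max(values)

def band_ratios(wavenumbers, spectrum):
    """Nucleic acid/protein and lipid/protein ratios from a per-cell mean spectrum."""
    amide_i = band_intensity(wavenumbers, spectrum, 1656.0)
    amide_ii = band_intensity(wavenumbers, spectrum, 1544.0)
    protein = amide_i + amide_ii  # assumption: "combined" means summed
    phosphate = band_intensity(wavenumbers, spectrum, 1088.0)
    ester = band_intensity(wavenumbers, spectrum, 1744.0)
    return {
        "nucleic_acid_to_protein": phosphate / protein,
        "lipid_to_protein": ester / protein,
    }

# Toy per-cell mean spectrum sampled only at the four bands of interest.
toy_wavenumbers = [1088.0, 1544.0, 1656.0, 1744.0]
toy_spectrum = [0.30, 0.50, 1.00, 0.15]
ratios = band_ratios(toy_wavenumbers, toy_spectrum)
```

Using the same protein denominator for both ratios keeps them directly comparable across cells.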
To determine statistical significance, one-way ANOVA was applied to each ratio type, followed by unpaired post-hoc t-tests between cell line pairs. Significance thresholds were denoted with a single asterisk (*) for p < 0.05. The results revealed that for the nucleic acid/protein ratio, MCF7 significantly differed from MDA-MB-231 (p = 0.0013), but neither MCF7 vs. SKBR3 nor MDA-MB-231 vs. SKBR3 exhibited significant differences. In contrast, the lipid/protein ratio showed greater discriminatory power: MCF7 differed significantly from both MDA-MB-231 (p = 0.0002) and SKBR3 (p < 0.0001).
These findings demonstrate that, while nucleic acid-associated variations are present between certain subtypes, lipid-associated features provide a more robust marker for distinguishing MCF7 from more aggressive phenotypes (Fig. 3). The full dataset of calculated ratios is provided as SI. These quantitative measures validate the utility of O-PTIR in biomolecular profiling and offer potential for use in non-invasive diagnostic models.
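The one-way ANOVA step applied to each ratio can be sketched as follows. This computes only the F statistic and its degrees of freedom from first principles; the group values are made up for illustration and are not the study's data. Obtaining p-values requires the F distribution, for which a statistics library such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.ttest_ind`) would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: scatter of values around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ratio values for three cell lines (three cells each).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```

A large F indicates that differences between group means are large relative to the scatter within groups, which is what motivates the subsequent pairwise tests.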
Spatial inspection of the segmentation maps demonstrated that the orange cluster, consistently located at the cell centre, most plausibly corresponds to nuclear regions rich in nucleic acids.24 The associated centroid spectrum exhibited strong bands at 1088 cm−1 and 1244 cm−1, arising from PO2− symmetric and asymmetric stretching vibrations characteristic of DNA and RNA phosphate backbones.
In contrast, clusters positioned adjacent to the nucleus displayed weaker or broadened nucleic acid features, suggesting cytoplasmic regions or zones exhibiting partial DNA degradation or chromatin relaxation.32 These areas may represent cells in transitional or apoptotic states, consistent with spectral signatures reported previously for chromatin decondensation and protein structural alterations.28,33 A shifted Amide II peak and reduced nucleic acid-to-protein ratio in these regions further support biochemical changes associated with stress or programmed cell death.
The green and purple clusters, previously interpreted as nuclear domains, instead localised predominantly to peripheral or punctate regions and displayed pronounced 1744 cm−1 C=O ester peaks, indicative of lipid-enriched domains, such as membrane boundaries or lipid droplets. Elevated CH2 deformation around 1450 cm−1 reinforced their lipid-dominant character, distinguishing them from the protein- and nucleic acid–rich clusters near the nucleus.
Within the protein region (1400–1650 cm−1), variations in the Amide I band between 1655 and 1659 cm−1 suggest differences in protein secondary structure composition and local biochemical environment across cellular regions, potentially marking transitions between α-helical and β-sheet conformations. Collectively, this refined spatial–spectral analysis redefines the biochemical cluster assignments: the orange cluster represents nucleic-acid-rich nuclei, light green clusters correspond to protein-dense cytoplasmic regions, and green/purple clusters identify lipid-dominant subcellular domains. These results demonstrate the sensitivity of O-PTIR hyperspectral imaging in resolving subcellular biochemical heterogeneity and capturing molecular variations associated with different breast cancer phenotypes.
Together, this cluster-level analysis uncovers spatially distinct biochemical signatures17 within and across cell populations, demonstrating the resolution power of O-PTIR imaging to capture intra-sample molecular variation.
Fig. 5a demonstrates the discrimination between the three cell types in the PC3–PC4 score space, while Fig. 5b and c provide the corresponding loading plots, used for tentative biochemical interpretation.
The 2D PCA score plot reveals partial clustering among the three subtypes, with MCF7 (green) distinguishable from both MDA-MB-231 (blue) and SKBR3 (red) (Fig. 5a), although complete segregation is not observed. While the majority of variance was captured by PC1 (56%) and PC2 (18%), accounting for 74% cumulatively, statistical analysis revealed that PC3 (10%) and PC4 (5%) contained the most biologically discriminative information between cell lines, highlighting that lower-variance components can encode critical subtype-specific biochemical differences.
The PCA loading spectra identify the wavenumbers contributing most strongly to variance captured by PC3 and PC4 (Fig. 5b and c). PC3 exhibits prominent loading features in the phosphate region (∼1088 cm−1), protein-associated bands (∼1544 and ∼1656 cm−1), and the lipid ester region (∼1744 cm−1). The sign of the loading indicates that spectra with positive PC3 scores are relatively enriched in features corresponding to positive loading peaks, whereas spectra with negative PC3 scores show relatively greater contribution from bands with negative loadings.
Similarly, PC4 displays notable loading contributions in the amide region (∼1544 and ∼1656 cm−1) and the lipid ester band (∼1744 cm−1), reflecting variability in protein- and lipid-associated spectral features across the dataset. In the PC3–PC4 score space (Fig. 5a), the three cell types exhibit partial separation with observable overlap, indicating subtype-dependent biochemical trends rather than complete segregation.
Spectra of MDA-MB-231 cells tend to exhibit positive PC3 scores, indicating higher relative contributions from protein-associated bands, as reflected in the loading plot. In contrast, MCF7 cells tend to show negative PC4 scores, suggesting a comparatively higher contribution from nucleic acid-associated features. SKBR3 spectra display positive PC4 scores, consistent with increased lipid-associated contributions. These trends highlight subtype-dependent biochemical differences captured by PCA.
SKBR3 spectra tend to distribute toward regions associated with stronger lipid contributions, whereas MDA-MB-231 spectra display broader dispersion along PC4, suggesting greater variability in protein-associated features. MCF7 spectra form a comparatively compact cluster, consistent with more uniform spectral profiles.
Collectively, PC3 and PC4 capture relative contrasts among nucleic acid-, protein-, and lipid-associated bands, complementing the dominant variance described by PC1 and PC2. The integration of PCA with ratiometric analysis provides a consistent multivariate framework for characterising subtype-dependent biochemical variation.
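The relationship between scores, loadings, and explained variance discussed in this section can be illustrated with a minimal PCA sketch based on the singular value decomposition. This is a generic implementation using NumPy, not the software used in the study; the function name and the default of four components are assumptions.

```python
import numpy as np

def pca_svd(spectra, n_components=4):
    """PCA of a (n_spectra, n_wavenumbers) matrix via SVD.

    Returns scores, loadings, and the explained-variance fraction
    of the first n_components principal components.
    """
    X = np.asarray(spectra, dtype=float)
    # Mean-centre each wavenumber channel before decomposition.
    X_centred = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    # Squared singular values are proportional to the variance per component.
    explained = S ** 2 / np.sum(S ** 2)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]  # one row per PC, one column per wavenumber
    return scores, loadings, explained[:n_components]
```

Each score is the projection of a centred spectrum onto a loading vector, so a positive score on a component means the spectrum is relatively enriched in the bands where that component's loading is positive. Components are ordered by variance, not by how well they separate classes, which is why lower-variance components such as PC3 and PC4 can still be the most discriminative.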
Classification performance was further quantified using a confusion matrix derived from leave-one-cell-out cross-validation (Fig. 6b). Here, spectra from one entire cell were held out as a test set while the model was trained on the remaining cells. Only the statistically significant PCs (p < 0.001) were included in each training model. This approach reduces overfitting risks associated with intra-cell spectral similarity and provides a realistic measure of model generalisability. The model achieved an overall accuracy of 95.4%, where overall accuracy was defined as the ratio of correctly classified spectra to the total number of spectra across all classes. Class sensitivities were 95.5% for MCF7, 93.0% for MDA-MB-231, and 98.4% for SKBR3. These results demonstrate the robustness of O-PTIR combined with PCA-LDA for identification of breast cancer cell phenotypes. The elevated accuracy for SKBR3 may be attributed to its distinctive HER2-driven biochemical phenotype, while the slightly lower value for MDA-MB-231 reflects its greater biochemical heterogeneity. These outcomes collectively underscore the strength of combining vibrational spectroscopic data with multivariate modelling for subtype identification in breast cancer diagnostics.
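The definitions of overall accuracy and class sensitivity used above can be made concrete with a short sketch. The confusion matrix below contains made-up counts chosen purely for illustration; they are not the values from Fig. 6b. Rows are assumed to hold the true class and columns the predicted class.

```python
def classification_metrics(confusion, labels):
    """Overall accuracy and per-class sensitivity from a confusion matrix.

    confusion[i][j] is the number of spectra of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    overall_accuracy = correct / total
    # Sensitivity: correctly classified spectra of a class / all spectra of that class.
    sensitivity = {
        label: confusion[i][i] / sum(confusion[i]) for i, label in enumerate(labels)
    }
    return overall_accuracy, sensitivity

# Hypothetical counts, not taken from the study.
labels = ["MCF7", "MDA-MB-231", "SKBR3"]
confusion = [
    [90, 6, 4],
    [5, 85, 10],
    [1, 2, 97],
]
accuracy, sensitivity = classification_metrics(confusion, labels)
```

Because overall accuracy pools spectra across classes, it is weighted towards classes that contribute more spectra, whereas sensitivities are computed per class.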
The present findings are consistent with previous vibrational spectroscopic investigations of breast cancer cell lines using FTIR and Raman methodologies. Gaussian deconvolution of IR absorption spectra has been widely applied to resolve overlapping amide and phosphate bands in biological systems, typically employing between four and ten component peaks within the Amide I and II regions to characterise protein secondary structure and nucleic acid contributions.24,25,28 Constrained multicomponent fitting approaches similar to that adopted here have been described in protein structural and cancer-related FTIR analyses, supporting the use of fixed peak centres with variable amplitude and bandwidth to avoid over-parameterisation.24,27
Ratiometric analysis of nucleic acid-to-protein and lipid-to-protein contributions has also been reported in breast cancer spectroscopy studies. Raman and FTIR investigations have demonstrated increased lipid-associated features in more aggressive phenotypes such as MDA-MB-231, alongside subtype-dependent variations in nucleic acid and protein signatures.29–31 These observations are consistent with the lipid enrichment trends observed here for SKBR3 and the intermediate behaviour of MDA-MB-231 in the ∼1244 cm−1 region.
In terms of multivariate classification performance, FTIR and Raman studies employing PCA-LDA and related chemometric methods have reported sensitivity values typically ranging between 85% and 95% for discrimination of breast cancer cell lines or tissue subtypes.9,10,32,34 The sensitivity values achieved in the present study (95.5% for MCF7, 93.0% for MDA-MB-231, and 98.4% for SKBR3) fall within or slightly above this reported range. The improved spatial resolution of O-PTIR relative to diffraction-limited FTIR may contribute to enhanced sensitivity by enabling subcellular biochemical heterogeneity to be captured more effectively.
Collectively, these comparisons indicate that the biochemical trends and classification performance observed here are consistent with established vibrational spectroscopy literature while demonstrating the applicability of O-PTIR within a sub-micron spatial resolution framework.
Peak intensity ratio analysis revealed subtype-dependent trends, with lipid-associated contributions more prominent in SKBR3 and comparatively stronger nucleic acid-associated features in MCF7, while MDA-MB-231 displayed intermediate behaviour and broader spectral variability. Principal Component Analysis demonstrated partial clustering with observable overlap, indicating biochemical trends rather than complete segregation. Supervised classification using PCA-LDA, validated through leave-one-cell-out cross-validation, achieved high class-wise sensitivity values (95.5% for MCF7, 93.0% for MDA-MB-231, and 98.4% for SKBR3), consistent with previously reported vibrational spectroscopy studies.
While the present results demonstrate the potential of O-PTIR for high-resolution, label-free biochemical profiling of cancer cell subtypes, this study represents a proof-of-concept investigation limited to malignant cell lines. Inclusion of non-malignant breast epithelial cells and larger datasets would be required to fully evaluate diagnostic applicability. Furthermore, although cross-validation was employed to minimise overfitting, independent external validation with new replicate cell populations would strengthen translational interpretation.
Overall, these findings support the utility of O-PTIR as a sub-micron vibrational spectroscopy platform capable of capturing intracellular biochemical heterogeneity and contributing to multivariate discrimination of breast cancer subtypes within a research framework.
See DOI: https://doi.org/10.1039/d6an00125d.
This journal is © The Royal Society of Chemistry 2026