Optical photothermal infrared (O-PTIR) micro-spectroscopic characterisation of breast cancer cell lines: a comparative molecular profile of MCF7, MDA-MB-231, and SKBR3 cells

Megha Mehta; Hao Meng; Suzy Eldershaw; Nick Stone

doi:10.1039/D6AN00125D


Open Access Article

This Open Access Article is licensed under a Creative Commons Attribution 3.0 Unported Licence

DOI: 10.1039/D6AN00125D (Paper) Analyst, 2026, Advance Article


Megha Mehta, Hao Meng, Suzy Eldershaw and Nick Stone*
Department of Physics and Astronomy, University of Exeter, Exeter EX4 4QL, UK. E-mail: N.Stone@exeter.ac.uk

Received 2nd February 2026, Accepted 19th June 2026

First published on 23rd June 2026

Abstract

Understanding the biochemical diversity among breast cancer subtypes is essential for developing more precise diagnostic and treatment strategies. In this study, we use optical photothermal infrared (O-PTIR) micro-spectroscopy to characterise three human breast cancer cell lines (MCF7, MDA-MB-231, and SKBR3) without the need for dyes or labels. Unlike conventional FTIR, whose spatial resolution is limited to ∼5 µm by the wavelength-dependent diffraction limit in the infrared, O-PTIR achieves sub-micron resolution, allowing biochemical analysis close to the single-organelle level. By analysing the spectral data, we identified variations in key biomolecules such as nucleic acids, proteins, and lipids that correlate with the distinct biological characteristics of each cell type. Peak deconvolution and ratiometric analysis provided further insight into these biochemical variations. Principal component analysis (PCA) was used to assess the differences and, when coupled with linear discriminant analysis (LDA), showed clear separation between the cell lines. These results highlight the potential of O-PTIR as a powerful tool for label-free cancer cell characterisation. By using a pulsed mid-IR laser for absorption and a shorter-wavelength visible/near-infrared probe laser for detection, O-PTIR overcomes key limitations of FTIR, enabling reflection-mode detection that bypasses water interference and achieves sub-micron resolution.

1. Introduction

Breast cancer is a major global health concern and one of the leading causes of cancer-related deaths among women.¹ It encompasses a heterogeneous group of diseases, broadly classified into molecular subtypes such as luminal A, luminal B, HER2-enriched, and triple-negative breast cancer (TNBC), each exhibiting distinct genetic, biochemical, and clinical characteristics.^2–4 Accurate identification of these subtypes is critical for prognosis and the implementation of effective, targeted therapies.^5,6

Conventional diagnostic methods, including immunohistochemistry (IHC), fluorescence in situ hybridisation (FISH), and gene expression profiling, are routinely employed in clinical settings to classify breast cancer subtypes. While these techniques have proven valuable, they often involve elaborate sample preparation, are dependent on staining protocols that may perturb the native biochemical state of cells, and are susceptible to variability in interpretation.^5,7 These limitations underscore the need for more reliable, label-free analytical techniques that can capture the intrinsic molecular signatures of cancer cells with minimal intervention.

Vibrational spectroscopic techniques such as Fourier-transform infrared (FTIR) imaging and Raman spectroscopy have shown considerable promise in biomedical applications by enabling the detection of biomolecular components including proteins, lipids, and nucleic acids.^8,9,18 However, conventional FTIR has low spatial resolution because it is restricted by the diffraction limit (∼5–12 µm), making it difficult to resolve subcellular features. Raman spectroscopy, which uses light of much shorter wavelengths (532–830 nm), can resolve such features well, but it is usually much slower and can suffer from interference caused by sample autofluorescence.^10,11

Optical photothermal infrared (O-PTIR) spectroscopy presents a novel alternative that addresses these limitations. The technique combines mid-infrared absorption, driven by a tuneable quantum cascade laser source, with detection by a visible/NIR probe beam to achieve submicron spatial resolution, facilitating label-free spectral imaging of biological samples.¹² Moreover, O-PTIR enables the simultaneous acquisition of infrared and Raman spectra at the same spatial location, offering complementary and detailed molecular information.¹³ Previous studies have demonstrated its utility in diverse biomedical contexts, including bacterial classification,^13,14 lipid droplet characterisation,^15,16 and tissue pathology.^17,18 To date, O-PTIR has yet to be applied to breast cancer cell subtyping.

In the present study, we employed O-PTIR micro-spectroscopy to analyse and compare three breast cancer cell lines representing distinct subtypes: MCF7 (luminal A), MDA-MB-231 (triple-negative), and SKBR3 (HER2-enriched). Through a combination of univariate spectral evaluation, Gaussian peak deconvolution, and multivariate statistical analysis including principal component analysis (PCA) and principal components-linear discriminant analysis (PCA-LDA), we investigated the biochemical heterogeneity of these cell lines. Our findings demonstrate that O-PTIR can effectively discriminate between breast cancer subtypes based on their vibrational fingerprints, highlighting its potential as a high-resolution, label-free tool for cellular phenotyping and diagnostic applications.

2. Materials and methods

2.1. Cell culture and preparation

Three breast cancer cell lines, MCF7, SKBR3, and MDA-MB-231, were studied. MCF7 cells were cultured in Dulbecco's modified Eagle's medium (DMEM, PAN Biotech), MDA-MB-231 cells in Leibovitz's L-15 medium (Gibco), and SKBR3 cells in McCoy's 5A (modified) medium (Gibco). All media were supplemented with 10% heat-inactivated foetal bovine serum (FBS, Gibco).

Cells were maintained at 37 °C in a humidified atmosphere, with 5% CO₂ (v/v) for MCF7 and SKBR3 cells and without added CO₂ for MDA-MB-231 cells.^19–21

Prior to spectroscopic measurements, harvested cells were seeded at a density of approximately 25 000 cells per cm² onto poly-L-lysine-coated, 0.5 mm thick, 25 mm diameter calcium fluoride (CaF₂) windows placed in six-well plates. The cells were allowed to attach overnight, fixed with 10% (w/v) formalin for 10 minutes, and then washed with PBS and ultrapure water.²²

2.2. O-PTIR measurements and spectral analysis

O-PTIR spectra were acquired using a mIRage infrared microscope (Photothermal Spectroscopy Corp.), equipped with a 40× Cassegrain reflective objective (NA = 0.78) and a motorised precision stage. This system employs a pump–probe configuration, utilising a tuneable pulsed mid-infrared quantum cascade laser (QCL) as the pump and a continuous-wave 785 nm laser as the probe. The QCL covers four discrete spectral regions between 900 and 1800 cm⁻¹, operating at repetition rates of up to 100 kHz with a duty cycle of 5%, a pulse width of 490 ns, and a gain of 5×.
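
As a quick consistency check on the quoted laser settings, the duty cycle D of a pulsed source equals the pulse width τ multiplied by the repetition rate f. Assuming the 490 ns pulses are delivered at the maximum 100 kHz rate:

$$
D = \tau f = \left(490 \times 10^{-9}\,\mathrm{s}\right)\left(100 \times 10^{3}\,\mathrm{s^{-1}}\right) = 0.049 \approx 5\%
$$

which agrees with the stated 5% duty cycle.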

Measurements were taken with a spectral resolution of approximately 8 cm⁻¹, using 43% IR pump power and 51% probe power. Each point was the average of four scans, and maps were acquired with a spatial step size of 2 µm, producing hyperspectral maps with single-cell fields of view of approximately 16–30 µm, depending on cell line. Depending on the mapped region, this corresponded to approximately 112–150 spectra per cell, with a total acquisition time of approximately 30–40 minutes per cell map owing to hyperspectral acquisition and stage-positioning overheads. The IR excitation beam has a diffraction-limited spot diameter of ∼10 µm at the sample, whereas the co-aligned 785 nm probe beam has a spot size of ∼500 nm; because the photothermal response is read out by the probe, this enables sub-micron spatial resolution.
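
The contrast between the IR and probe spot sizes follows from the diffraction limit. The sketch below assumes the Rayleigh criterion, d = 0.61 λ/NA, and the objective NA of 0.78 quoted above; the helper names are illustrative and not part of the instrument software. It estimates the lateral resolution at both ends of the IR range and at the probe wavelength.

```python
def wavenumber_to_wavelength_um(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a vacuum wavelength in micrometres."""
    return 1e4 / wavenumber_cm


def rayleigh_limit_um(wavelength_um: float, na: float) -> float:
    """Lateral resolution from the Rayleigh criterion, d = 0.61 * wavelength / NA."""
    return 0.61 * wavelength_um / na


NA = 0.78  # Cassegrain objective numerical aperture quoted in the Methods

for label, wavelength in [
    ("IR pump at 1800 cm^-1", wavenumber_to_wavelength_um(1800)),
    ("IR pump at 900 cm^-1", wavenumber_to_wavelength_um(900)),
    ("785 nm probe", 0.785),
]:
    d = rayleigh_limit_um(wavelength, NA)
    print(f"{label}: wavelength {wavelength:.3f} um -> d = {d:.2f} um")
```

Under these assumptions the IR-limited resolution is roughly 4–9 µm across 900–1800 cm⁻¹, whereas the probe-limited value is about 0.6 µm, of the same order as the ∼500 nm probe spot. Other resolution conventions (e.g. Sparrow or full width at half-maximum) give somewhat different constants but the same ratio between the two beams.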

For each of the three cell lines (MCF7, MDA-MB-231, and SKBR3), triplicate slides were prepared and measured on three consecutive experimental days, with several cells sampled per slide each day. In total, 25 cells per cell line were analysed (75 cells overall). Before data acquisition, background spectra were recorded on a Kevley low-emissivity (low-E) substrate for calibration. Instrument control and background subtraction were managed using PTIR Studio software.

2.3. Statistical analysis and software

OriginPro 2025 (OriginLab Corporation, USA) and MATLAB R2023a (MathWorks, USA) were used to calculate average spectra, principal component analysis (PCA), k-means cluster average spectra and maps, and PCA-fed linear discriminant analysis (PCA-LDA). Gaussian peak fitting was performed in OriginPro 2025 using a constrained nonlinear least-squares fitting routine, with peak positions fixed from second-derivative analysis and only peak amplitudes and full-width at half-maximum (FWHM) values allowed to vary.

Prior to statistical analysis, each pixel spectrum was pre-processed to remove background contributions: baseline correction by polynomial background subtraction was followed by min–max normalisation to account for variations in spectral intensity and sample thickness. PCA and clustering were performed on these baseline-corrected, normalised pixel spectra. For ratiometric analysis, the processed spectra within each cell map were averaged to obtain a representative per-cell mean spectrum, from which band intensity ratios were calculated.
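
The pre-processing chain can be sketched in Python with NumPy. The text does not give the polynomial order or fitting scheme, so this sketch assumes an iterative modified-polynomial baseline of order 3 (a common choice for vibrational spectra), followed by min–max scaling of each pixel spectrum and averaging to a per-cell mean. All function names and parameters are hypothetical, and the synthetic spectra are for illustration only.

```python
import numpy as np


def polynomial_baseline(wavenumbers, spectrum, order=3, n_iter=100, tol=1e-6):
    """Estimate a baseline by iterative (modified) polynomial fitting.

    Assumed scheme: at each step the working spectrum is clipped to the current
    polynomial fit, so absorption bands are progressively excluded and the
    polynomial settles onto the underlying background.
    """
    x = (wavenumbers - wavenumbers.mean()) / wavenumbers.std()  # improve conditioning
    y = np.asarray(spectrum, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, y, order), x)
        y_clipped = np.minimum(y, fit)
        converged = np.linalg.norm(y_clipped - y) <= tol * np.linalg.norm(y)
        y = y_clipped
        if converged:
            break
    return fit


def min_max_normalise(spectrum):
    """Scale a spectrum linearly so that its minimum is 0 and its maximum is 1."""
    lo, hi = spectrum.min(), spectrum.max()
    return (spectrum - lo) / (hi - lo)


def preprocess_cell(wavenumbers, pixel_spectra, order=3):
    """Baseline-correct and normalise every pixel spectrum, then average them."""
    processed = np.array([
        min_max_normalise(s - polynomial_baseline(wavenumbers, s, order))
        for s in pixel_spectra
    ])
    return processed, processed.mean(axis=0)  # (pixel spectra, per-cell mean)


# Synthetic example: two amide-like bands on a sloping background (illustrative only).
rng = np.random.default_rng(0)
wn = np.linspace(900, 1800, 451)
bands = (1.0 * np.exp(-0.5 * ((wn - 1656) / 15) ** 2)
         + 0.6 * np.exp(-0.5 * ((wn - 1544) / 15) ** 2))
pixels = np.array([0.3 + 2e-4 * (wn - 900) + a * bands + rng.normal(0, 0.005, wn.size)
                   for a in rng.uniform(0.5, 1.5, size=20)])
processed, cell_mean = preprocess_cell(wn, pixels)
print(wn[np.argmax(cell_mean)])  # position of the strongest band in the per-cell mean
```

Min–max scaling removes the pixel-to-pixel intensity factor `a` in the synthetic data, which mimics the thickness-related variation the normalisation is meant to absorb.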

To evaluate the performance of the PCA-LDA classification model, leave-one-cell-out cross-validation was employed. Pixel spectra from the same cell are highly correlated, so a random pixel-wise training/test split would place near-identical spectra in both sets and inflate the apparent accuracy through data leakage. In each iteration, all spectral data from one cell were therefore excluded from the training dataset and used exclusively for testing, and the process was repeated until every cell had served once as the test set. This strategy accounts for within-cell spectral correlation, reduces the risk of overfitting to cell-specific features, and provides a more realistic estimate of the model's ability to generalise to unseen cells.
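
The leave-one-cell-out scheme can be sketched as follows. This is a hypothetical NumPy re-implementation, not the MATLAB code used in the study: PCA (via SVD) and a pooled-covariance LDA classifier are refitted in each fold on the training cells only, so no information from the held-out cell reaches the projection or the discriminant. The synthetic data, string class labels, and the use of 10 rather than 25 components are illustrative assumptions.

```python
import numpy as np


def fit_pca(X, n_components):
    """PCA via SVD of mean-centred data; returns the mean and loading vectors."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings: (n_features, n_components)


def fit_lda(T, y):
    """Linear discriminant classifier with a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    pooled = sum(np.atleast_2d(np.cov(T[y == c], rowvar=False)) * (np.sum(y == c) - 1)
                 for c in classes)
    pooled /= len(y) - len(classes)
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(pooled), log_priors


def predict_lda(model, T):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, inv_cov, log_priors = model
    # delta_k(t) = t' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k)
    scores = T @ inv_cov @ means.T - 0.5 * np.sum(means @ inv_cov * means, axis=1) + log_priors
    return classes[np.argmax(scores, axis=1)]


def leave_one_cell_out(X, y, cell_ids, n_components=25):
    """Hold out every spectrum of one cell per fold; refit PCA and LDA on the rest."""
    y_pred = np.empty_like(y)
    for cell in np.unique(cell_ids):
        test = cell_ids == cell
        mean, loadings = fit_pca(X[~test], n_components)  # PCA fitted on training cells only
        model = fit_lda((X[~test] - mean) @ loadings, y[~test])
        y_pred[test] = predict_lda(model, (X[test] - mean) @ loadings)
    return y_pred


# Synthetic data: 3 classes x 5 cells x 30 pixel spectra, with a per-cell offset.
rng = np.random.default_rng(1)
n_features = 100
labels = np.array(["MCF7", "MDA-MB-231", "SKBR3"])
class_means = rng.normal(0, 1, (3, n_features))
X, y, cells = [], [], []
for k, label in enumerate(labels):
    for c in range(5):
        cell_offset = rng.normal(0, 0.3, n_features)  # shared by all pixels of one cell
        X.append(class_means[k] + cell_offset + rng.normal(0, 1.0, (30, n_features)))
        y += [label] * 30
        cells += [f"{label}-{c}"] * 30
X, y, cells = np.vstack(X), np.array(y), np.array(cells)
y_pred = leave_one_cell_out(X, y, cells, n_components=10)
print(f"spectrum-level accuracy: {np.mean(y_pred == y):.3f}")
```

Refitting PCA inside each fold is the design choice that matters here: fitting the loadings once on all spectra would let the held-out cell influence the projection it is later tested in.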

3. Results and discussion

3.1. Univariate analysis

3.1.1. Spectral profile overview. The O-PTIR spectral range between 900 and 1800 cm⁻¹ was analysed for the MCF7, MDA-MB-231, and SKBR3 breast cancer cell lines. This region contains characteristic vibrational bands of nucleic acids, proteins, lipids, and carbohydrates,^23,24 providing a molecular fingerprint that differs between the cell lines (Fig. 1a and b).


	Fig. 1 Representative O-PTIR analysis of breast cancer cell lines. (a) Optical images of individual example MCF7, MDA-MB-231, and SKBR3 cells. Scale bar: 10 µm (field of view ∼16–30 µm). (b) Normalised O-PTIR spectra of the three cell lines across 900–1800 cm⁻¹, with vibrational band assignments marking the nucleic acid-, protein-, and lipid-dominated regions.

Each cell was analysed using hyperspectral O-PTIR mapping with a spatial step size of 2 μm, generating multiple pixel spectra per cell. A representative spectrum for each cell was obtained by averaging the processed pixel spectra within the cell map. The spectra shown in Fig. 1 and 2 represent subtype-averaged spectra obtained by averaging the per-cell mean spectra across all analysed cells of each breast cancer subtype.


	Fig. 2 Constrained multi-peak Gaussian deconvolution of averaged O-PTIR spectra (900–1800 cm⁻¹) for (a) MCF7, (b) MDA-MB-231, and (c) SKBR3 breast cancer cell lines. The experimental spectra are shown in blue, while coloured component curves represent the individual constrained Gaussian bands contributing to the overall fit. Peak centres were constrained at 965, 1088, 1170, 1244, 1398, 1456, 1544, 1656, and 1744 cm⁻¹ based on second-derivative analysis and established vibrational assignments, while peak amplitudes and full-width at half-maximum (FWHM) values were allowed to vary. Dotted vertical lines indicate the constrained peak positions used during fitting.

MCF7 cells presented pronounced nucleic acid-associated bands at ∼1088 cm⁻¹ and ∼1244 cm⁻¹, with a shoulder at ∼1068 cm⁻¹, consistent with ordered nucleic acid structures.²⁴ All three cell lines exhibited broadly similar spectral profiles in the nucleic acid region, with only modest differences in relative intensity, most notably at ∼1088 cm⁻¹, which was slightly higher in MCF7 spectra. Although additional bands showed similar trends, the discussion is restricted to the most robust and biochemically interpretable features to avoid over-interpreting minor intensity variations.²⁵ Peak assignments are summarised in Table 1.

Table 1 Assignment of the main spectral features in the O-PTIR spectra of different cell lines

Peak (cm⁻¹)	Assignment
965	C–C/protein backbone
1088	PO₂⁻ sym. stretch (DNA/RNA)
1170	C–O/carbohydrates
1244	Amide III/PO₂⁻ asym. (DNA)
1398	CH₃ bending (lipids)
1456	CH₂ bending (lipids/proteins)
1544	Amide II (proteins)
1656	Amide I (α-helix)
1744	C=O stretch (lipids/esters)

In the protein–lipid region, MDA-MB-231 spectra displayed relatively stronger intensities at ∼1398, ∼1456, and ∼1544 cm⁻¹, indicating increased protein- and lipid-associated contributions, consistent with its aggressive phenotype. SKBR3 spectra showed comparatively stronger intensity at ∼1744 cm⁻¹, corresponding to the ester carbonyl stretching vibration, which is often associated with lipid droplet accumulation and membrane-associated lipid content.¹³

3.1.2. Amide band structural insights. Detailed analysis of the Amide I, II, and III bands provides insight into protein structural variations across the cell lines (Fig. 2). Minor variations in the Amide I region were observed, mainly in relative band intensity and band shape. Given the instrumental spectral resolution (∼8 cm⁻¹), no definitive peak-position shifts are inferred. Such variations may reflect differences in hydration state or in the relative contributions of protein secondary structure components.^26,27

The Amide II band at ∼1544 cm⁻¹ was relatively more intense in MDA-MB-231 spectra, suggesting increased protein-associated contributions. Meanwhile, the Amide III band near ∼1244 cm⁻¹ showed subtle variation in relative intensity between cell lines. These spectral differences are consistent with variations in protein-associated biochemical composition across the three breast cancer cell lines.

3.1.3. Constrained Gaussian deconvolution of the amide and nucleic acid regions. To ensure consistency and reproducibility across cell lines, spectral deconvolution was performed using a constrained Gaussian fitting approach. Peak centres were fixed at 965, 1088, 1170, 1244, 1398, 1456, 1544, 1656, and 1744 cm⁻¹ based on second-derivative analysis and established vibrational assignments, while peak amplitudes and full-width at half-maximum (FWHM) values were allowed to vary. This strategy standardises peak positioning across spectra while allowing biologically relevant variations in peak intensity.
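
A minimal SciPy sketch of fixed-centre Gaussian fitting is shown below. The study used OriginPro; here `scipy.optimize.curve_fit` is swapped in, and the FWHM bounds, starting values, and synthetic spectrum are assumptions. As in the procedure described above, the nine centres are held fixed and only amplitudes and FWHMs are free parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fixed band centres (cm^-1) taken from the text; they are not fitted.
CENTRES = np.array([965, 1088, 1170, 1244, 1398, 1456, 1544, 1656, 1744], dtype=float)
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_sum(wn, *params):
    """Sum of Gaussians at the fixed CENTRES; params = 9 amplitudes then 9 FWHMs."""
    n = CENTRES.size
    amps, fwhms = np.asarray(params[:n]), np.asarray(params[n:])
    sigmas = fwhms * FWHM_TO_SIGMA
    z = (wn[None, :] - CENTRES[:, None]) / sigmas[:, None]
    return np.sum(amps[:, None] * np.exp(-0.5 * z ** 2), axis=0)


def fit_fixed_centres(wn, spectrum, fwhm_bounds=(5.0, 120.0)):
    """Bounded least-squares fit of amplitudes (>= 0) and FWHMs (within fwhm_bounds)."""
    n = CENTRES.size
    p0 = np.concatenate([np.interp(CENTRES, wn, spectrum).clip(min=1e-3), np.full(n, 30.0)])
    lower = np.concatenate([np.zeros(n), np.full(n, fwhm_bounds[0])])
    upper = np.concatenate([np.full(n, np.inf), np.full(n, fwhm_bounds[1])])
    popt, _ = curve_fit(gaussian_sum, wn, spectrum, p0=p0, bounds=(lower, upper))
    return popt[:n], popt[n:]


# Synthetic spectrum with known parameters (illustrative only, not measured data).
wn = np.linspace(900, 1800, 451)
true_amps = np.array([0.10, 0.50, 0.20, 0.30, 0.25, 0.35, 0.60, 1.00, 0.15])
true_fwhm = np.array([20, 25, 30, 35, 25, 30, 30, 40, 20], dtype=float)
rng = np.random.default_rng(2)
spectrum = gaussian_sum(wn, *true_amps, *true_fwhm) + rng.normal(0, 0.002, wn.size)

amps, fwhms = fit_fixed_centres(wn, spectrum)
for c, a, w in zip(CENTRES, amps, fwhms):
    print(f"{c:.0f} cm^-1: amplitude {a:.3f}, FWHM {w:.1f} cm^-1")
```

Fixing the centres removes nine nonlinear parameters, which keeps the fit stable for overlapping bands and makes amplitudes directly comparable across spectra.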

The resulting fits (Fig. 2) demonstrate comparable band structures among the three cell lines, with subtype-dependent differences primarily reflected in relative peak amplitudes within nucleic acid (∼1088, 1244 cm⁻¹), lipid (∼1456 cm⁻¹), and protein (∼1544, 1656 cm⁻¹) regions. This provides an internally consistent basis for comparative biochemical interpretation.

Quantitative parameters derived from the Gaussian fitting—including integrated peak area, FWHM, and peak height—were used to characterise subtype-dependent spectral variation.²⁹ The distribution of fitted peak areas and corresponding FWHM values across individual cells is presented in SI Fig. S1 and S2, providing a quantitative visualisation of intra- and inter-cell line variability.
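
The three fitted quantities are not independent. For a Gaussian band of height h and full width at half-maximum w centred at ν̃₀, the integrated area follows from the standard Gaussian integral:

$$
A = \int_{-\infty}^{\infty} h \exp\!\left[-\frac{4\ln 2\,(\tilde{\nu}-\tilde{\nu}_0)^2}{w^2}\right]\mathrm{d}\tilde{\nu} = h\,w\sqrt{\frac{\pi}{4\ln 2}} \approx 1.0645\,h\,w
$$

A difference in integrated area between cell lines can therefore arise from a change in peak height, in FWHM, or in both.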

MCF7 spectra exhibited relatively narrow FWHM values in the amide region, indicating comparatively uniform band profiles across the analysed cells. In contrast, MDA-MB-231 spectra showed broader peak widths and larger variations in integrated peak areas across the protein (∼1544 and ∼1656 cm⁻¹) and lipid (∼1456 cm⁻¹) bands, reflecting greater heterogeneity in biochemical composition. SKBR3 spectra displayed intermediate peak widths with relatively enhanced lipid-associated contributions (∼1456 cm⁻¹). Because peak centres were fixed during fitting, band positions are identical across subtypes by construction, and subtype differences appear as relative amplitude variations, notably within the nucleic acid- and protein-associated regions. MCF7 displayed comparatively higher intensity in the symmetric phosphate (∼1088 cm⁻¹) region, whereas lipid-associated contributions (∼1456 cm⁻¹) were comparatively enhanced in MDA-MB-231 and SKBR3. These trends are consistent with the ratiometric analysis presented in Section 3.1.4 and are also evident in the constrained Gaussian fits shown in Fig. 2. Together, these findings highlight distinct molecular compositions among the cell lines, with MDA-MB-231 showing the greatest spectral variability (SI).

3.1.4. Peak intensity ratio analysis. To further differentiate the biochemical profiles of the MCF7, MDA-MB-231, and SKBR3 cell lines, spectral band intensity ratios were calculated from representative spectra of individual cells. For each cell map, pixel spectra were first baseline-corrected to remove background contributions and then min–max normalised. A representative spectrum for each cell was obtained by averaging the processed pixel spectra, yielding one spectrum per cell (n = 25 per subtype). Band intensity ratios were then calculated from these per-cell mean spectra to enable statistical comparison between breast cancer subtypes.^30–32

Two diagnostic ratios were calculated using characteristic vibrational bands within the biochemical fingerprint region. The nucleic acid-to-protein ratio used the symmetric phosphate band at ∼1088 cm⁻¹ as a nucleic acid marker, normalised to the summed intensities of the protein-associated Amide I (∼1656 cm⁻¹) and Amide II (∼1544 cm⁻¹) bands. The ∼1244 cm⁻¹ band, which contains contributions from both Amide III and asymmetric PO₂⁻ vibrations, was not used directly in the ratio calculation but was considered qualitatively to support the observed trends.

Considered qualitatively, the relative intensity of the ∼1244 cm⁻¹ band was lowest in MCF7, highest in SKBR3, and intermediate in MDA-MB-231, supporting the nucleic acid-related trends observed across the three cell lines. The lipid-to-protein ratio was calculated using the ester carbonyl stretch at ∼1744 cm⁻¹, normalised to the same protein bands. Boxplots were generated to visualise these ratios, with the central line marking the median and whiskers extending up to 1.5× the interquartile range (IQR) beyond the quartiles.
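
The two ratios, whose formulas match the Fig. 3 caption, can be computed from a per-cell mean spectrum as sketched below. Reading each intensity at the sampled wavenumber nearest to the band centre is an assumption: the text does not state whether point intensities, band maxima within a window, or fitted amplitudes were used. The function names and the synthetic spectrum are hypothetical.

```python
import numpy as np


def band_intensity(wavenumbers, spectrum, centre):
    """Intensity at the sampled wavenumber closest to the band centre (assumed readout)."""
    return spectrum[np.argmin(np.abs(wavenumbers - centre))]


def diagnostic_ratios(wavenumbers, cell_mean):
    """Nucleic acid/protein = I1088/(I1544 + I1656); lipid/protein = I1744/(I1544 + I1656)."""
    i = {c: band_intensity(wavenumbers, cell_mean, c) for c in (1088, 1544, 1656, 1744)}
    protein = i[1544] + i[1656]  # Amide II + Amide I
    return {
        "nucleic_acid/protein": i[1088] / protein,
        "lipid/protein": i[1744] / protein,
    }


# Illustrative per-cell mean spectrum built from four well-separated Gaussian bands.
wn = np.linspace(900, 1800, 451)
cell_mean = sum(h * np.exp(-0.5 * ((wn - c) / 12) ** 2)
                for c, h in [(1088, 0.4), (1544, 0.6), (1656, 1.0), (1744, 0.2)])
print(diagnostic_ratios(wn, cell_mean))
```

Because both ratios share the same protein denominator, a cell with unusually strong amide bands lowers both ratios together, which is worth keeping in mind when comparing the two boxplots.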

To determine statistical significance, one-way ANOVA was applied to each ratio type, followed by unpaired post-hoc t-tests between cell line pairs. Significance was denoted with a single asterisk (*) for p < 0.05. For the nucleic acid/protein ratio, MCF7 differed significantly from MDA-MB-231 (p = 0.0013), but neither MCF7 vs. SKBR3 nor MDA-MB-231 vs. SKBR3 showed significant differences. In contrast, the lipid/protein ratio showed greater discriminatory power: MCF7 differed significantly from both MDA-MB-231 (p = 0.0002) and SKBR3 (p < 0.0001).
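
The testing procedure can be sketched with SciPy. The sketch assumes Student's unpaired t-test with equal variances (the text does not say whether Welch's correction was used) and applies no multiple-comparison correction, since none is described above. The ratio values are random numbers for illustration, not the study's data.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical per-cell lipid/protein ratios, n = 25 cells per line (not the study's data).
ratios = {
    "MCF7": rng.normal(0.10, 0.02, 25),
    "MDA-MB-231": rng.normal(0.14, 0.03, 25),
    "SKBR3": rng.normal(0.16, 0.03, 25),
}

# Omnibus test across the three cell lines.
f_stat, p_anova = stats.f_oneway(*ratios.values())
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_anova:.2g}")

# Post-hoc unpaired t-tests for every pair of cell lines.
pairwise = {}
for a, b in combinations(ratios, 2):
    t_stat, p = stats.ttest_ind(ratios[a], ratios[b])  # equal variances assumed
    pairwise[(a, b)] = p
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p:.2g}{' *' if p < 0.05 else ''}")
```

With three pairwise comparisons, a Bonferroni-style correction would compare each p-value against 0.05/3; passing `equal_var=False` to `ttest_ind` gives Welch's test if the group variances differ.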

These findings show that nucleic acid-associated variations are present between certain subtypes, whereas lipid-associated features provide a more robust marker for distinguishing MCF7 from the more aggressive phenotypes (Fig. 3). The full dataset of calculated ratios is provided in the SI. These quantitative measures support the utility of O-PTIR in biomolecular profiling and its potential for use in non-invasive diagnostic models.


	Fig. 3 Boxplots of biochemical band intensity ratios calculated from per-cell mean spectra of MCF7, MDA-MB-231, and SKBR3 cells. Nucleic acid/protein ratio: I₁₀₈₈/(I₁₅₄₄ + I₁₆₅₆); lipid/protein ratio: I₁₇₄₄/(I₁₅₄₄ + I₁₆₅₆). Asterisks mark statistically significant differences between subtypes (*p < 0.05).

3.2. Multivariate analysis

3.2.1. K-Means clustering and spatial biochemistry. Common k-means clustering, applied jointly to the hyperspectral maps of all 75 measured cells, partitioned the pixel spectra into ten spatially resolved biochemical clusters (Fig. 4). These clusters showed substantial variation in the nucleic acid-, protein-, and lipid-associated regions. Viewing the maps of all 75 cells together (Fig. S3 and S4) also reveals distinct differences in cluster membership between cells, with cellular structures assigned to different k-means clusters in different cells.
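
Common (pooled) k-means can be sketched as Lloyd's algorithm applied to the stacked pixel spectra of all cells, after which each cell's labels are reshaped back into its own map. This NumPy version, with k-means++ seeding and several restarts, is an illustrative assumption (the study used MATLAB); the demo uses k = 3 synthetic spectral prototypes rather than the ten clusters reported.

```python
import numpy as np


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; the lowest-inertia run is kept."""
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        # k-means++ seeding: spread the initial centroids out in spectral space.
        centroids = [X[rng.integers(len(X))]]
        for _seed_step in range(1, k):
            dist2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            centroids.append(X[rng.choice(len(X), p=dist2 / dist2.sum())])
        centroids = np.array(centroids)
        for _it in range(n_iter):
            # Assign every spectrum to its nearest centroid (squared Euclidean distance).
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = d2[np.arange(len(X)), labels].sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)
    return best[1], best[2]


# Pool the pixel spectra of several synthetic 8 x 8 cell maps, cluster once, split back.
rng = np.random.default_rng(4)
wn = np.linspace(900, 1800, 226)
prototypes = np.array([np.exp(-0.5 * ((wn - c) / 20) ** 2) for c in (1088, 1656, 1744)])
truth = [rng.integers(0, 3, size=(8, 8)) for _ in range(4)]
maps = [prototypes[t] + rng.normal(0, 0.02, (8, 8, wn.size)) for t in truth]

pixels = np.concatenate([m.reshape(-1, wn.size) for m in maps])
labels, centroids = kmeans(pixels, k=3)
splits = np.cumsum([m.shape[0] * m.shape[1] for m in maps])[:-1]
label_maps = [lab.reshape(m.shape[:2]) for lab, m in zip(np.split(labels, splits), maps)]
```

Clustering the pooled spectra once, rather than each cell separately, is what makes cluster labels comparable across cells, so the same label index means the same centroid spectrum in every map.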


	Fig. 4 Unsupervised spectral classification. (a) Overlaid O-PTIR common k-means centroid spectra; each coloured mean spectrum represents the spectral features most commonly found among the image pixels assigned to that cluster. Inset: k-means cluster map of a single example cell, showing the spatial separation of biochemical domains. (b) Expanded view of the nucleic acid region of the centroids. (c) Expanded view of the protein/lipid region of the centroids.

Spatial inspection of the segmentation maps demonstrated that the orange cluster, consistently located at the cell centre, most plausibly corresponds to nuclear regions rich in nucleic acids.²⁴ The associated centroid spectrum exhibited strong bands at 1088 cm⁻¹ and 1244 cm⁻¹, arising from PO₂⁻ symmetric and asymmetric stretching vibrations characteristic of DNA and RNA phosphate backbones.

In contrast, clusters positioned adjacent to the nucleus displayed weaker or broadened nucleic acid features, suggesting cytoplasmic regions or zones exhibiting partial DNA degradation or chromatin relaxation.³² These areas may represent cells in transitional or apoptotic states, consistent with spectral signatures reported previously for chromatin decondensation and protein structural alterations.^28,33 An apparent shift of the Amide II band and a reduced nucleic acid-to-protein ratio in these regions are consistent with biochemical changes associated with stress or programmed cell death.

The green and purple clusters localised predominantly to peripheral or punctate regions and displayed pronounced C=O ester peaks at 1744 cm⁻¹, indicative of lipid-enriched domains such as membrane boundaries or lipid droplets. Elevated CH₂ deformation intensity around 1450 cm⁻¹ reinforced their lipid-dominant character, distinguishing them from the protein- and nucleic acid-rich clusters near the nucleus.

Within the protein region (∼1400–1700 cm⁻¹), small variations in Amide I band position between 1655 and 1659 cm⁻¹ may indicate differences in protein secondary structure composition, such as the balance between α-helical and β-sheet conformations, and in the local biochemical environment across cellular regions; because these variations are smaller than the nominal spectral resolution (∼8 cm⁻¹), they are interpreted with caution. Collectively, the spatial–spectral analysis supports the following cluster assignments: the orange cluster represents nucleic-acid-rich nuclei, light green clusters correspond to protein-dense cytoplasmic regions, and green/purple clusters identify lipid-dominant subcellular domains. These results demonstrate the sensitivity of O-PTIR hyperspectral imaging in resolving subcellular biochemical heterogeneity and capturing molecular variations associated with different breast cancer phenotypes.

Together, this cluster-level analysis reveals spatially distinct biochemical signatures¹⁷ within and across cell populations, demonstrating the ability of O-PTIR imaging to resolve intra-sample molecular variation.

3.2.2. Principal component analysis (PCA). To explore variance across the O-PTIR spectral profiles, principal component analysis (PCA) was performed on the full spectral dataset.^33,36 PCA is an unsupervised multivariate statistical technique that reduces the dimensionality of complex datasets by transforming correlated variables (here, the intensities at each wavenumber) into a smaller set of uncorrelated principal components (PCs) that retain the most significant variance, including that associated with biochemical differences. PCA was performed on baseline-corrected and min–max normalised spectra to minimise variance arising from baseline offsets and sampling differences. Although the first four PCs captured the majority of spectral variance, 25 PCs (>99% cumulative variance) were retained during PCA-LDA training to ensure that lower-variance yet biologically relevant features were not excluded. One-way ANOVA of the PC scores identified PC3 and PC4 as the most discriminating components (see Fig. 5).
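
The PCA step, the count of components needed for >99% cumulative variance, and the per-component ANOVA used to rank PCs can be sketched with NumPy and SciPy. This is an illustrative re-implementation (the study used MATLAB and OriginPro) with hypothetical function names. The synthetic data are constructed so that class differences lie in a low-variance direction, mimicking the situation above in which PC3 and PC4, rather than PC1 and PC2, are the most discriminating.

```python
import numpy as np
from scipy import stats


def pca(X):
    """PCA by SVD of mean-centred spectra: scores, loadings, explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = u * s                    # projection of each spectrum onto each PC
    explained = s**2 / np.sum(s**2)   # fraction of total variance per PC
    return scores, vt.T, explained


def rank_pcs_by_anova(scores, y, n_pcs):
    """One-way ANOVA of each PC's scores across classes; returns (PC number, F, p), best first."""
    groups = np.unique(y)
    results = []
    for k in range(n_pcs):
        f, p = stats.f_oneway(*(scores[y == g, k] for g in groups))
        results.append((k + 1, f, p))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Synthetic data: two large nuisance sources plus a small class-dependent shift.
rng = np.random.default_rng(5)
directions, _ = np.linalg.qr(rng.normal(size=(50, 3)))  # three orthonormal spectral directions
y = np.repeat(["MCF7", "MDA-MB-231", "SKBR3"], 100)
class_shift = np.repeat([-1.0, 0.0, 1.0], 100)
X = (5.0 * rng.normal(size=(300, 1)) * directions[:, 0]   # dominant nuisance variation
     + 3.0 * rng.normal(size=(300, 1)) * directions[:, 1] # second nuisance source
     + class_shift[:, None] * directions[:, 2]            # class signal, low variance
     + rng.normal(0, 0.1, size=(300, 50)))

scores, loadings, explained = pca(X)
n_99 = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)
ranked = rank_pcs_by_anova(scores, y, n_pcs=5)
print(f"PCs needed for 99% variance: {n_99}")
for pc, f, p in ranked:
    print(f"PC{pc}: F = {f:.1f}, p = {p:.2g}")
```

The example shows why ranking by variance and ranking by class discrimination can disagree: PC1 and PC2 carry most of the variance but little class information.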


	Fig. 5 PCA analysis of O-PTIR spectra. (a) PCA score plot of PC4 (5% of variance) versus PC3 (10% of variance) showing partial clustering of the three breast cancer cell lines. (b) Loading spectrum for PC3. (c) Loading spectrum for PC4.

Fig. 5a demonstrates the discrimination between the three cell types in the PC3–PC4 score space, while Fig. 5b and c provide the corresponding loading plots, used for tentative biochemical interpretation.

The 2D PCA score plot reveals partial clustering of the three subtypes, in which MCF7 (green) is distinguishable from both MDA-MB-231 (blue) and SKBR3 (red) (Fig. 5a), although complete segregation is not observed. While the majority of variance was captured by PC1 (56%) and PC2 (18%), 74% cumulatively, statistical analysis revealed that PC3 (10%) and PC4 (5%) contained the most biologically discriminative information between cell lines, highlighting that lower-variance components can encode critical subtype-specific biochemical differences.

The PCA loading spectra identify the wavenumbers contributing most strongly to variance captured by PC3 and PC4 (Fig. 5b and c). PC3 exhibits prominent loading features in the phosphate region (∼1088 cm⁻¹), protein-associated bands (∼1544 and ∼1656 cm⁻¹), and the lipid ester region (∼1744 cm⁻¹). The sign of the loading indicates that spectra with positive PC3 scores are relatively enriched in features corresponding to positive loading peaks, whereas spectra with negative PC3 scores show relatively greater contribution from bands with negative loadings.
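
The sign argument follows from how scores are computed. The score of spectrum i on component k is the projection of its mean-centred spectrum onto the loading vector of that component:

$$
t_{ik} = \sum_{j} \left(x_{ij} - \bar{x}_{j}\right) p_{jk}
$$

where x_ij is the intensity of spectrum i at wavenumber j, x̄_j is the dataset mean at that wavenumber, and p_jk is the loading. A wavenumber with a large positive loading raises the score when the spectrum lies above the mean there, and a large negative loading lowers it. Because the overall sign of each PC is arbitrary, only the relative sign between scores and loadings carries meaning.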

Similarly, PC4 displays notable loading contributions in the amide region (∼1544 and ∼1656 cm⁻¹) and the lipid ester band (∼1744 cm⁻¹), reflecting variability in protein- and lipid-associated spectral features across the dataset. In the PC3–PC4 score space (Fig. 5a), the three cell types exhibit partial separation with observable overlap, indicating subtype-dependent biochemical trends rather than complete segregation.

Spectra of MDA-MB-231 cells tend to exhibit positive PC3 scores, indicating higher relative contributions from protein-associated bands, as reflected in the loading plot. In contrast, MCF7 cells tend to show negative PC4 scores, suggesting a comparatively higher contribution from nucleic acid-associated features. SKBR3 spectra display positive PC4 scores, consistent with increased lipid-associated contributions. These trends highlight subtype-dependent biochemical differences captured by PCA.

SKBR3 spectra tend to distribute toward regions associated with stronger lipid contributions, whereas MDA-MB-231 spectra display broader dispersion along PC4, suggesting greater variability in protein-associated features. MCF7 spectra form a comparatively compact cluster, consistent with more uniform spectral profiles.

Collectively, PC3 and PC4 capture relative contrasts among nucleic acid-, protein-, and lipid-associated bands, complementing the dominant variance described by PC1 and PC2. The integration of PCA with ratiometric analysis provides a consistent multivariate framework for characterising subtype-dependent biochemical variation.

3.2.3. Cluster analysis and classification accuracy. Supervised classification using principal component analysis combined with linear discriminant analysis (PCA-LDA) was performed to assess the ability of O-PTIR spectral data to discriminate among the three breast cancer cell lines. Twenty-five PCs were used to train the model (Fig. 6). The 3D histogram of LD1 and LD2 scores (Fig. 6a) illustrates partial separation between the MCF7 (green), MDA-MB-231 (blue), and SKBR3 (red) clusters, indicating distinct biochemical profiles for each subtype.^30,33–35


	Fig. 6 Supervised classification using PCA-LDA. (a) 3D histogram of LD1 and LD2 values illustrating spectral separation by class. (b) Confusion matrix of spectrum-level predictions for MCF7, MDA-MB-231, and SKBR3 under leave-one-cell-out cross-validation, in which all spectra from the held-out cell form the test set. The numbers inside each square represent the number of spectra assigned to each predicted class, with diagonal elements corresponding to correctly classified spectra.

Classification performance was further quantified using a confusion matrix derived from leave-one-cell-out cross-validation (Fig. 6b). Here, spectra from one entire cell were held out as a test set while the model was trained on the remaining cells. Only the statistically significant PCs (p < 0.001) were included in each training model. This approach reduces the overfitting risk associated with intra-cell spectral similarity and provides a realistic measure of model generalisability. The model achieved an overall accuracy of 95.4%, defined as the ratio of correctly classified spectra to the total number of spectra across all classes. Class sensitivities were 95.5% for MCF7, 93.0% for MDA-MB-231, and 98.4% for SKBR3. These results support the robustness of O-PTIR combined with PCA-LDA for identifying breast cancer cell phenotypes. The higher sensitivity for SKBR3 may be attributable to its distinctive HER2-driven biochemical phenotype, while the slightly lower value for MDA-MB-231 may reflect its greater biochemical heterogeneity. Collectively, these outcomes underscore the strength of combining vibrational spectroscopic data with multivariate modelling for subtype identification in breast cancer diagnostics.
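
The two metrics reported above can be computed from a confusion matrix as sketched below, assuming rows index the true class and columns the predicted class. The counts are invented for illustration and are not the values in Fig. 6b; the function name is hypothetical.

```python
import numpy as np


def accuracy_and_sensitivity(confusion):
    """Rows = true class, columns = predicted class (counts of spectra)."""
    confusion = np.asarray(confusion, dtype=float)
    accuracy = np.trace(confusion) / confusion.sum()          # correct / all spectra
    sensitivity = np.diag(confusion) / confusion.sum(axis=1)  # recall per true class
    return accuracy, sensitivity


# Hypothetical counts for illustration only (not the values in Fig. 6b).
cm = [[90, 6, 4],
      [5, 88, 7],
      [1, 2, 97]]
acc, sens = accuracy_and_sensitivity(cm)
print(f"accuracy = {acc:.3f}")
for name, s in zip(["MCF7", "MDA-MB-231", "SKBR3"], sens):
    print(f"{name}: sensitivity = {s:.3f}")
```

Because overall accuracy is pooled over spectra, it is weighted towards classes that contribute more spectra, whereas each sensitivity is normalised within its own class. With unequal numbers of spectra per cell line, the two can therefore diverge.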

The present findings are consistent with previous vibrational spectroscopic investigations of breast cancer cell lines using FTIR and Raman methodologies. Gaussian deconvolution of IR absorption spectra has been widely applied to resolve overlapping amide and phosphate bands in biological systems, typically employing between four and ten component peaks within the Amide I and II regions to characterise protein secondary structure and nucleic acid contributions.^24,25,28 Constrained multicomponent fitting approaches similar to that adopted here have been described in protein structural and cancer-related FTIR analyses, supporting the use of fixed peak centres with variable amplitude and bandwidth to avoid over-parameterisation.^24,27

Ratiometric analysis of nucleic acid-to-protein and lipid-to-protein contributions has also been reported in breast cancer spectroscopy studies. Raman and FTIR investigations have demonstrated increased lipid-associated features in more aggressive phenotypes such as MDA-MB-231, alongside subtype-dependent variations in nucleic acid and protein signatures.^29–31 These observations are consistent with the lipid enrichment trends observed here for SKBR3 and the intermediate behaviour of MDA-MB-231 in the ∼1244 cm⁻¹ region.

In terms of multivariate classification performance, FTIR and Raman studies employing PCA-LDA and related chemometric methods have reported sensitivity values typically ranging between 85% and 95% for discrimination of breast cancer cell lines or tissue subtypes.^9,10,32,34 The sensitivity values achieved in the present study (95.5% for MCF7, 93.0% for MDA-MB-231, and 98.4% for SKBR3) fall within or slightly above this reported range. The improved spatial resolution of O-PTIR relative to diffraction-limited FTIR may contribute to enhanced sensitivity by enabling subcellular biochemical heterogeneity to be captured more effectively.

Collectively, these comparisons indicate that the biochemical trends and classification performance observed here are consistent with established vibrational spectroscopy literature while demonstrating the applicability of O-PTIR within a sub-micron spatial resolution framework.

4. Conclusions

O-PTIR micro-spectroscopy has demonstrated its capability to characterise subtype-dependent biochemical variation among three breast cancer cell lines—MCF7, MDA-MB-231, and SKBR3—based on their vibrational signatures. Through constrained Gaussian deconvolution, ratiometric intensity analysis, and multivariate modelling, consistent differences were observed in nucleic acid-, protein-, and lipid-associated spectral regions across the three phenotypes.

Peak intensity ratio analysis revealed subtype-dependent trends, with lipid-associated contributions more prominent in SKBR3 and comparatively stronger nucleic acid-associated features in MCF7, while MDA-MB-231 displayed intermediate behaviour and broader spectral variability. Principal component analysis (PCA) showed partial clustering with observable overlap, indicating biochemical trends rather than complete segregation. Supervised classification using PCA-LDA, validated through leave-one-cell-out cross-validation, achieved high class-wise sensitivity values (95.5% for MCF7, 93.0% for MDA-MB-231, and 98.4% for SKBR3), consistent with or slightly exceeding those reported in previous vibrational spectroscopy studies.

While the present results demonstrate the potential of O-PTIR for high-resolution, label-free biochemical profiling of cancer cell subtypes, this study represents a proof-of-concept investigation limited to malignant cell lines. Inclusion of non-malignant breast epithelial cells and larger datasets would be required to fully evaluate diagnostic applicability. Furthermore, although cross-validation was employed to minimise overfitting, independent external validation with new replicate cell populations would strengthen translational interpretation.

Overall, these findings support the utility of O-PTIR as a sub-micron vibrational spectroscopy platform capable of capturing intracellular biochemical heterogeneity and contributing to multivariate discrimination of breast cancer subtypes within a research framework.

Author contributions

M. Mehta and N. Stone: conceptualisation, methodology, visualisation, discussion of results. M. Mehta: writing – original draft preparation, data curation, investigation. N. Stone: funding, supervision, manuscript review & editing. H. Meng: O-PTIR training and system optimisation. S. Eldershaw: cell culture preparation and revision of cell methodology.

Conflicts of interest

There are no conflicts to declare.

Data availability

The data supporting this study are available in the Supplementary Information (SI), which includes Figures S1–S4, and in the accompanying Excel file containing the underlying numerical data. Additional raw O-PTIR datasets generated during the current study are available from the corresponding author upon reasonable request.

See DOI: https://doi.org/10.1039/d6an00125d.

Acknowledgements

This work is supported by EU TROPHY (ulTRafast hOlograPHic FTIR microscopy), funded by the European Innovation Council under the Horizon Europe programme (grant agreement number 101047137), and by UK Research and Innovation (Innovate UK) [10032224].

References

R. L. Siegel, K. D. Miller, H. E. Fuchs and A. Jemal, CA-Cancer J. Clin., 2023, 73, 17–48.
T. Sørlie, et al., Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 10869–10874.
C. M. Perou, et al., Nature, 2000, 406, 747–752.
N. Howlader, et al., SEER Cancer Statistics Review 1975–2018, National Cancer Institute, Bethesda, MD, 2018.
J. Ferlay, et al., Global Cancer Observatory: Cancer Today, IARC, Lyon, 2020.
E. A. Rakha, et al., Pathology, 2020, 52, 134–144.
W. M. Elshemey, A. M. Ismail and N. S. Elbialy, J. Med. Biol. Eng., 2016, 36, 369–378.
M. C. Cummings, et al., Histopathology, 2014, 65, 1–9.
M. J. Baker, et al., Nat. Protoc., 2014, 9, 1771–1791.
J. R. Hands, et al., Nat. Commun., 2014, 5, 3973.
S. Duraipandian, et al., Expert Rev. Mol. Diagn., 2014, 14, 547–564.
T. P. Wrobel, L. Mateuszuk and S. Chlopicki, Analyst, 2020, 145, 2080–2099.
M. Kansiz, et al., Analyst, 2020, 145, 6382–6395.
C. L. M. Morais, et al., Analyst, 2020, 145, 1025–1044.
D. Desai, et al., Anal. Chem., 2020, 92, 12255–12261.
F. K. Lu, et al., Proc. Natl. Acad. Sci. U. S. A., 2015, 112, 11624–11629.
M. S. Bergholt, et al., Biomed. Opt. Express, 2011, 2, 576–589.
M. J. Pilling and P. Gardner, Chem. Soc. Rev., 2016, 45, 1935–1957.
D. L. Holliday and V. Speirs, Breast Cancer Res., 2011, 13, 215.
R. M. Neve, et al., Cancer Cell, 2006, 10, 515–527.
M. A. Lemmon and J. Schlessinger, Cell, 2010, 141, 1117–1134.
H. Fabian, et al., Biochim. Biophys. Acta, 2006, 874–887.
R. Bhargava, Appl. Spectrosc., 2012, 66, 1091–1120.
Z. Movasaghi, S. Rehman and I. U. Rehman, Appl. Spectrosc. Rev., 2008, 43, 134–179.
F. Gasparri and M. Muzio, Biochem. J., 2003, 369, 239–248.
A. Barth, Biochim. Biophys. Acta, 2007, 1073–1101.
D. M. Byler and H. Susi, Biopolymers, 1986, 25, 469–487.
H. Abramczyk, B. Brożek-Płuska, M. Kopeć and J. Surmacki, Sci. Rep., 2020, 10, 1483.
J. Zhao, et al., J. Raman Spectrosc., 2007, 38, 220–226.
S. G. Kazarian and K. L. A. Chan, Analyst, 2013, 138, 1940–1951.
M. J. Baker and E. Gazi, Analyst, 2015, 140, 2114–2120.
S. Kumar, et al., Anal. Bioanal. Chem., 2020, 412, 819–831.
P. Lasch, Chemom. Intell. Lab. Syst., 2012, 117, 100–114.
S. Wold, K. Esbensen and P. Geladi, Chemom. Intell. Lab. Syst., 1987, 2, 37–52.
M. Barker and W. Rayens, J. Chemom., 2003, 17, 166–173.
H. J. Butler, J. M. Ashton, B. Bird, G. Cinque, K. Curtis, M. J. Dorney, K. Esmonde-White, N. J. Fullwood, B. Gardner, P. L. Martin-Hirsch, M. J. Walsh, M. R. McAinsh, N. Stone and F. L. Martin, Nat. Protoc., 2016, 11, 664–687.
