Barnaby G. Ellis a, Conor A. Whitley a, Safaa Al Jedani a, Caroline I. Smith a, Philip J. Gunning b, Paul Harrison a, Paul Unsworth a, Peter Gardner c, Richard J. Shaw bd, Steve D. Barrett a, Asterios Triantafyllou e, Janet M. Risk b and Peter Weightman *a
a Department of Physics, University of Liverpool, L69 7ZE, UK. E-mail: peterw@liverpool.ac.uk
b Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L3 9TA, UK
c Manchester Institute of Biotechnology, 131 Princess Street, University of Manchester, Manchester, M1 7DN, UK
d Regional Maxillofacial Unit, Aintree University Hospital, Liverpool, L9 7AL, UK
e Department of Pathology, Liverpool Clinical Laboratories, University of Liverpool, Liverpool, L69 3GA, UK
First published on 5th July 2021
A novel machine learning algorithm is shown to accurately discriminate between oral squamous cell carcinoma (OSCC) nodal metastases and surrounding lymphoid tissue on the basis of a single metric, the ratio of Fourier transform infrared (FTIR) absorption intensities at 1252 cm−1 and 1285 cm−1. The metric yields a discriminating sensitivity, specificity and precision of 98.8 ± 0.1%, 99.89 ± 0.01% and 99.78 ± 0.02% respectively, and an area under the receiver operating characteristic curve (AUC) of 0.9935 ± 0.0006. The delineation of the OSCC and lymphoid tissue revealed by the image formed from the metric is in better agreement with an immunohistochemistry (IHC) stained image than are either of the FTIR images obtained at the individual wavenumbers. Scanning near-field optical microscopy (SNOM) images of the tissue obtained at a number of key wavenumbers, with high spatial resolution, show variations in the chemical structure of the tissue with a feature size down to ∼4 μm. The image formed from the ratio of the SNOM images obtained at 1252 cm−1 and 1285 cm−1 shows more contrast than the SNOM images obtained at these or a number of other individual wavenumbers. The discrimination between the two tissue types is dominated by the contribution from the 1252 cm−1 signal, which is representative of nucleic acids, and this shows the OSCC tissue to be accompanied by two wide arcs of tissue which are particularly low in nucleic acids. Haematoxylin and eosin (H&E) staining shows the tumour core in this specimen to be ∼40 μm wide and the SNOM topography shows that the core centre is raised by ∼1 μm compared to the surrounding tissue. Line profiles of the SNOM signal intensity taken through the highly keratinised core show that the increase in height correlates with an increase in the protein signal. SNOM line profiles show that the nucleic acids signal decreases at the centre of the tumour core between two peaks of higher intensity. All these nucleic acid features are ∼25 μm wide, roughly the width of two cancer cells.
There have been several reviews of advances in the instrumentation and application of the FTIR technique to cancer12–15 and the application of techniques for obtaining chemical information from FTIR.3,6–9 We recently applied a novel machine learning multivariate metrics analysis (MA) technique to the analysis of FTIR images obtained from four cell lines associated with oesophageal cancer16 and compared its performance with the well-established random forest (RF) method. The MA was found to achieve greater accuracy in discriminating between the cell types in a shorter time than the RF method. In particular the MA was able to discriminate with accuracies in the range of 81% to 97% between OE19 and OE21 cell lines, associated respectively with adenocarcinoma and squamous carcinoma, and more importantly between cancer associated myofibroblasts (CAM) and adjacent tissue myofibroblasts (ATM) obtained from the same patient. In addition to discriminating between these cell lines, the MA yielded a number of key spectral biomarkers that had not been identified in previous FTIR studies of oesophageal cancer.
FTIR and Raman imaging have previously been applied to the discrimination of oral cancer from histologically normal or benign tissue in a number of studies.17 For example, Pallua et al.18 used principal component analysis (PCA) and cluster analysis to produce pseudo-colour images of oral squamous cell carcinoma (OSCC) tissue microarrays and showed correspondence between FTIR and routine histology, suggesting that tissue types are separable by their IR spectra when appropriate methods are used to analyse the dataset. Lloyd et al.2 developed a multivariate analysis technique that applied PCA followed by linear discriminant analysis (LDA) to results obtained by Raman spectroscopy. This was able to discriminate lymph nodes with benign pathology from those harbouring lymphoma or metastases of head and neck cancer with sensitivities and specificities of 81% and 89% respectively. Another study19 used a framework of feature selection and classification algorithms to identify spectral features which distinguished normal mucosa, pre-cancerous tissue and cancer of the oral cavity. Particular wavenumbers, previously correlated with chemical moieties such as glycogen and proteins, were discriminatory, which suggests that relevant information comparable to that previously obtained via other methodologies is achievable from such data. A comprehensive review of Raman and FTIR studies of oral cancers has recently been published by Byrne et al.17
The present investigation examines the value of the MA technique in discriminating between lymph nodal metastasis of oral cancer and indigenous lymphoid tissue. High spatial resolution measurements using an aperture scanning near-field optical microscope (SNOM) provide additional insight into the chemical biology of the metastatic tissue.
Regions of interest (ROIs) (n = 2) containing both metastatic OSCC and surrounding lymphoid tissue were identified by light microscopy on sections routinely prepared and stained with haematoxylin and eosin (H&E). Cores of 1 mm diameter corresponding to the ROIs were then obtained from the formalin-fixed, paraffin-embedded (FFPE) blocks using a Beecher MTA-1 tissue microarrayer for constructing a tissue microarray block. Serial, 5 μm thick, sections were cut from the tissue microarray block and floated onto charged glass slides for histopathology and immunohistochemistry (IHC) and onto calcium fluoride (CaF2) disks for FTIR imaging. While sections for IHC were eventually subjected to deparaffinisation, sections for FTIR remained in paraffin wax to minimise further changes in hydration and structure of the samples. Six serial sections were utilised and comprised two sections for FTIR imaging sandwiched between two sections stained with H&E and two with IHC for pan-cytokeratins using the AE1AE3 antibody (Agilent DAKO, Stockport, UK) and a Bond RX™ autostainer (Leica Biosystems, Milton Keynes, UK). The H&E and IHC stained sections were scanned using an Aperio CS2 scanner (Leica Biosystems) to facilitate co-registration with IR images.
The histopathological and FTIR images were cross-referenced and spectra from the ROIs were identified and labelled as OSCC or lymphoid tissue as appropriate. Labelled FTIR data were used to train a discriminatory model using the MA technique.16 MA is a supervised learning technique which generates an ensemble of bivariate classifiers based on the ratio of absorbances for all pairings of wavenumber features in the data. Through an iterative approach, it seeks to determine the ratios which provide the best classification accuracy, incorporating the top ranking metrics into a dynamic hard-voting ensemble classifier. The main advantage of this approach is that it provides a more direct measure of feature importance: a cumulative importance histogram is obtained, rather than the multivariate weight vector that results from classifiers such as logistic regression and linear discriminant analysis. An equal number of spectra were randomly sampled from each image so as to mitigate the risk of overfitting to image-specific features. The MA model was trained using a three-fold cross validation regime, whereby the data are divided into three partitions, two of which are selected for training while the third is held out for testing. This process is repeated three times so that all data appear in both the training and testing sets.
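The ratio-based ensemble and the three-fold regime described above can be illustrated with a minimal sketch. It is not the published MA implementation: the function names, the single-threshold rule applied to each ratio, the fixed ensemble size `n_metrics` and the synthetic four-channel spectra are all assumptions made for illustration, and the iterative, dynamic selection of metrics is reduced here to a single ranking pass.

```python
import random
from itertools import combinations


def best_threshold(values, labels):
    """Return (accuracy, threshold, polarity) for a one-feature threshold rule.

    polarity = 1 predicts class 1 when value > threshold, -1 the reverse.
    """
    best = (0.0, 0.0, 1)
    ordered = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values.
    for t in [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]:
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = hits / len(labels)
        for a, pol in ((acc, 1), (1 - acc, -1)):
            if a > best[0]:
                best = (a, t, pol)
    return best


def train_ratio_ensemble(spectra, labels, n_metrics=3):
    """Rank every absorbance ratio i/j by training accuracy; keep the top few.

    Only i < j is tested: the reciprocal ratio j/i carries the same
    information and is covered by the polarity of the threshold rule.
    """
    ranked = []
    for i, j in combinations(range(len(spectra[0])), 2):
        ratios = [s[i] / s[j] for s in spectra]
        acc, t, pol = best_threshold(ratios, labels)
        ranked.append((acc, i, j, t, pol))
    ranked.sort(key=lambda m: m[0], reverse=True)  # stable: ties keep pair order
    return ranked[:n_metrics]


def predict(ensemble, spectrum):
    """Hard vote: each ratio metric casts one vote, the majority wins."""
    votes = 0
    for _, i, j, t, pol in ensemble:
        above = spectrum[i] / spectrum[j] > t
        votes += above if pol == 1 else not above
    return int(2 * votes > len(ensemble))


def three_fold_accuracy(spectra, labels, seed=0):
    """Hold out each of three partitions in turn and average the test accuracy."""
    idx = list(range(len(spectra)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::3] for k in range(3)]
    scores = []
    for k in range(3):
        train = [n for m in range(3) if m != k for n in folds[m]]
        model = train_ratio_ensemble([spectra[n] for n in train],
                                     [labels[n] for n in train])
        correct = sum(predict(model, spectra[n]) == labels[n] for n in folds[k])
        scores.append(correct / len(folds[k]))
    return sum(scores) / 3


def make_demo_data(n_per_class=15):
    """Synthetic 4-channel 'spectra' (hypothetical values, not measured data).

    Class 1 has a lower channel-0 absorbance. A varying thickness factor
    rescales every channel, so channel 0 alone overlaps between classes
    while the ratios to the other channels separate them.
    """
    spectra, labels = [], []
    for k in range(n_per_class):
        scale = 0.5 + 0.1 * k  # mimics varying sample thickness
        spectra.append([1.0 * scale, 0.5 * scale, 1.0 * scale, 0.7 * scale])
        labels.append(0)
        spectra.append([0.6 * scale, 0.5 * scale, 1.0 * scale, 0.7 * scale])
        labels.append(1)
    return spectra, labels
```

Calling `three_fold_accuracy(*make_demo_data())` exercises the whole pipeline on the synthetic data. Taking a ratio cancels any factor common to both channels, which is one plausible reason a ratio metric can outperform either absorbance on its own.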
These two wavenumbers and those contained in the next four metrics in rank order, 1254/1285, 1250/1289, 1252/1287 and 1252/1289, draw attention to a very narrow region of the FTIR spectrum, wherein the average spectra of different types of tissue show differences (Fig. 2).
Fig. 2 Average FTIR spectra for (a) lymphoid tissue (grey) and (b) OSCC (black). The shaded grey rectangles show the regions of 1250–1254 cm−1 and 1285–1289 cm−1.
This highest-ranking metric discriminates between OSCC and lymphoid tissue better than the individual wavenumbers (Fig. 3). If the absorbance at 1252 cm−1 and 1285 cm−1 were used individually as discriminatory features, the performance of the model would drop significantly. The sensitivity and specificity obtained by using 1252 cm−1 alone would be 89.3% and 73.3% respectively; the corresponding results acquired using 1285 cm−1 alone would be 90.4% and 54.4% respectively. This is illustrated by the normal distributions shown in Fig. 1(b) and (c). Thus, although a correspondence between tumour cells stained by IHC [Fig. 3(a)] and the low absorbance at 1252 cm−1 [Fig. 3(c)] is observed, a greater correlation is seen between the IHC and the ratio of 1252 cm−1/1285 cm−1 [Fig. 3(d)]. However, topographically different areas of the metastasis (e.g. periphery versus the more heavily keratinised centre as appreciated on H&E sections) are not discriminated by the metric (Fig. 3).
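The sensitivity and specificity figures quoted above follow from counting correct and incorrect assignments against the histology-derived labels. The sketch below shows the arithmetic for a single ratio metric. The threshold value, the example absorbances and the function names are hypothetical, and labelling a spectrum as OSCC when the ratio is low is an assumption based on the lower 1252 cm−1 absorbance reported for the metastasis.

```python
def classify_by_ratio(absorbance_pairs, threshold):
    """Label a spectrum OSCC (True) when A(1252)/A(1285) falls below threshold.

    The direction of the rule and the threshold are illustrative assumptions.
    """
    return [a1252 / a1285 < threshold for a1252, a1285 in absorbance_pairs]


def performance(predicted, actual):
    """Sensitivity, specificity and precision from paired boolean labels.

    Assumes each class and each prediction occurs at least once, so that
    none of the denominators is zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # OSCC called OSCC
    fp = sum(p and not a for p, a in pairs)          # lymphoid called OSCC
    tn = sum(not p and not a for p, a in pairs)      # lymphoid called lymphoid
    fn = sum(not p and a for p, a in pairs)          # OSCC called lymphoid
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical (A1252, A1285) absorbance pairs and their true labels.
demo_pairs = [(0.30, 0.50), (0.55, 0.50), (0.20, 0.40), (0.45, 0.40)]
demo_truth = [True, False, True, True]
demo_scores = performance(classify_by_ratio(demo_pairs, threshold=0.8), demo_truth)
```

The same `performance` function can be applied to a rule based on one absorbance alone, which is how a single-wavenumber classifier would be compared with the ratio metric.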
In order to bring out in more detail the information captured in the images obtained with high spatial resolution using the SNOM (Fig. 4), the smaller region of the tumour in the bottom right-hand corner of the H&E image [Fig. 4(a)] was used to create line profiles of the topography and the SNOM intensities at each wavenumber. Each profile was obtained along a 1-pixel-wide line close to the centre of the OSCC nodal metastasis (Fig. 5). The noise levels in the SNOM images (and hence the profiles) were quantified by comparing raw images with de-noised images, and the noise-to-signal ratios were found to be <5% for all wavenumbers. Line profiles taken within 8 μm of those shown in Fig. 5 show only very small differences from those shown in the figure. The topography [Fig. 5(a)] of the centre of the tumour can be seen to be higher than the surrounding tissue. This increase in height correlates with an increase in the protein signal [Fig. 5(c)] in this region of the image. The line profiles obtained at other wavenumbers show more marked variations in intensity across smaller distances, indicating that there are many subtle changes in the chemistry of the metastasis.
The SNOM images were taken in IR transmission mode and so the SNOM intensity profiles in Fig. 5 have been inverted to present a more intuitive interpretation – peaks (valleys) in the profiles correspond to more (less) absorption. The profiles are presented on vertical scales that have been corrected for image acquisition parameters such as detector sensitivity. Comparison between profiles at different wavenumbers should not be taken as providing values for relative molecular concentrations, as the SNOM fibre transmission varies with wavenumber and each molecular vibration has a different transition dipole strength.
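The two processing steps mentioned for the line profiles, inverting a transmission profile and estimating a noise-to-signal ratio from raw and de-noised data, can be sketched as follows. The source does not state how the inversion or the de-noising was performed, so the flip about the maximum, the moving-average filter and the definition of the ratio (root-mean-square residual divided by the peak-to-peak range of the de-noised profile) are all assumptions, as are the function names.

```python
import math


def invert_profile(transmission):
    """Flip a transmission profile so that peaks correspond to more absorption."""
    top = max(transmission)
    return [top - t for t in transmission]


def moving_average(profile, window=3):
    """Simple de-noising stand-in: mean over a window, shortened at the edges."""
    half = window // 2
    smoothed = []
    for k in range(len(profile)):
        lo, hi = max(0, k - half), min(len(profile), k + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed


def noise_to_signal(raw, denoised):
    """RMS of (raw - denoised) divided by the peak-to-peak range of denoised."""
    span = max(denoised) - min(denoised)
    if span == 0:
        raise ValueError("de-noised profile is flat; the ratio is undefined")
    rms = math.sqrt(sum((r - d) ** 2 for r, d in zip(raw, denoised)) / len(raw))
    return rms / span
```

Under this definition a value below 0.05 would correspond to the <5% level quoted in the text, although the published figure may rest on a different definition.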
FTIR absorbance at 1252 cm−1 would be expected to be related to nucleic acid content. However, absorbance at this wavenumber was observed to be lower in OSCC metastasis compared with the surrounding lymphoid tissue [Fig. 3(c)] and this is reflected in the ratio of the absorbances at 1252 cm−1 and 1285 cm−1. This is surprising since it is known that OSCC, like many solid tumours, often shows changes in DNA ploidy27 and, indeed, that such changes may be an early event.28 The lower absorbance might be explained by the fact that the nuclei in lymphoid tissue are more closely packed than in the tumour, with its typically larger cells, and hence the IR absorbance at 1252 cm−1 would be higher for lymphoid tissue. The inability of FTIR to discriminate between the periphery and highly keratinised centre of the metastasis [Fig. 3(a), (c) and (d)] was overcome in higher resolution studies utilising SNOM.
The high spatial resolution of the SNOM images has the potential to provide some chemical information, although over a smaller region of the specimen and at a limited number of wavenumbers. This makes the choice of wavenumbers particularly important since biological macromolecules give complex IR absorbance spectra. Nevertheless, with a careful choice of wavenumbers, the SNOM images and line profile data, obtained with higher intrinsic spatial resolution than FTIR, can be used to infer the basic chemistry of individual tissues. The wavenumbers 1751 cm−1, 1650 cm−1 and 1369 cm−1 are commonly attributed to lipids, the amide I peak of proteins and the C–N stretch vibrations of the cytosine and guanine components of nucleic acids respectively29 and have been employed in previous SNOM studies,20,24,25 whereas the 1285 cm−1 signal is characteristic of collagen.29 The image obtained at 1252 cm−1 can be attributed to the (PO2−) signal of nucleic acids and/or RNA, since this wavenumber is within a broad range of absorption from these molecules.30 As expected, the SNOM images of the small region of the tissue microarray core shown in Fig. 4 show detail on a finer length scale than is obtained in the diffraction-limited FTIR images of Fig. 3. These images show variations in the chemical structure of the tissue with a feature size down to ∼4 μm. All the images indicate differences in spectral intensities in the region of the OSCC nodal metastatic core and this region of the image is also clearly delineated in the topographic image [Fig. 4(c)]. The image formed from the ratio of the intensities of the SNOM images obtained at 1252 cm−1 and 1285 cm−1 [Fig. 4(i)] shows more contrast between different areas of the tissue than the images obtained at any of the individual wavenumbers. In particular it shows that the centre of the tumour in the bottom right of Fig. 4(a) is bounded by two broad arcs of tissue in which the ratio of the intensity of the discriminating wavenumbers is particularly low. Thus, the SNOM images are able to provide more detail than the FTIR images and highlight differences between the centre and periphery of the metastasis.
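Forming a ratio image from two co-registered single-wavenumber images amounts to a pixel-by-pixel division. The following sketch assumes the two images are equally sized nested lists of intensities; the function name, the `floor` guard against division by very small values and the use of `None` for undefined pixels are illustrative choices rather than details given in the source.

```python
def ratio_image(numerator, denominator, floor=1e-9):
    """Pixel-by-pixel ratio of two equally sized images (lists of rows).

    Pixels whose denominator is not greater than `floor` are returned
    as None instead of producing a division error or a huge value.
    """
    if len(numerator) != len(denominator):
        raise ValueError("images have different numbers of rows")
    result = []
    for row_n, row_d in zip(numerator, denominator):
        if len(row_n) != len(row_d):
            raise ValueError("images have different row lengths")
        result.append([n / d if d > floor else None
                       for n, d in zip(row_n, row_d)])
    return result


# Hypothetical 2 x 2 intensity maps at 1252 cm-1 and 1285 cm-1.
image_1252 = [[2.0, 1.0], [0.0, 3.0]]
image_1285 = [[4.0, 2.0], [1.0, 0.0]]
ratio_1252_1285 = ratio_image(image_1252, image_1285)
```

Regions where the ratio is low, such as the arcs described above, would then appear as low values in the resulting array.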
The line profiles obtained in the small region of the tumour core shown in Fig. 5 provide more detail of the chemical differences therein. As regards topography [Fig. 5(a)] the centre of the tumour was higher than the periphery. Although it is not possible to quantify this difference precisely due to the difficulty in calibrating the vertical scale of the topographic image, it was found to be ∼1 μm. The increase in height correlates with an increase in the protein signal [Fig. 5(c)] in this region of the image. The centre of the metastasis appeared highly keratinised and this is mirrored in the 1650 cm−1 amide I line profile which can be attributed to the α-helical structure of cytokeratins.31,32 Furthermore, changes in spatial arrangement and subpopulations of cytokeratins and the molecules related to keratinisation (involucrin, etc.) are expected between the often heavily keratinised centre of tumour cell aggregates and the less keratinised periphery, the latter also corresponding to the advancing front of the primary, which could be reflected in the line profile at 1650 cm−1. In contrast to the smooth increase and decrease of both the height and the protein intensity in this region of the image, the line profiles obtained at other wavenumbers show more marked variations in intensity over smaller distances, indicating that there are subtle changes in the chemistry of the tissue. The attribution of the 1252 cm−1 signal to the (PO2−) vibration of nucleic acids is supported by the very close correspondence between the line profiles obtained at 1252 cm−1 [Fig. 5(f)] and 1369 cm−1 [Fig. 5(d)], since the latter is attributed to the C–N stretch vibrations of the cytosine and guanine components of nucleic acids. A similar correspondence between the line profiles of these two wavenumbers was found in all regions of the images examined. As previously mentioned, the line profile obtained at 1285 cm−1 [Fig. 5(e)] is attributable to collagen. However, given the relative paucity of collagen in lymph nodes, the discriminating metric of Table 1 possibly arises from variations in the levels of nucleic acids and collagen in the tissue, with the signal from the nucleic acids dominating the discrimination. This would be consistent with the relative discrimination between OSCC and lymphoid tissue obtained from FTIR data [Fig. 3(c) compared to Fig. 3(b)].
Taking the peak in the line profile of the topography as a reference for the centre of the tumour, the nucleic acid line profile shows a small reduction in intensity at the centre of the metastasis with two peaks in intensity ∼25 μm on either side, which is consistent with the increased keratinisation at that sub-site. Two further reductions in intensity are observed at ∼50 μm from the centre and correlate with the periphery of the metastasis, with each of these features ∼25 μm in width, roughly corresponding to 2–3 layers of cancer cells. If this signal were based solely on absorbance by nucleic acids, this would appear counter-intuitive because the more differentiated, keratinised core of the tumour most likely contains fewer, mitotically inactive nuclei compared with the tumour periphery.33 However, if we regard 1252 cm−1 as a wavenumber characteristically absorbed by the phosphate groups in all nucleic acids34,35 and in the phosphate groups of phospholipids,36 we hypothesise that this increase in absorbance reflects a change in the RNA signature and/or an increase in endoplasmic reticulum commensurate with increased protein synthesis in this sub-site.
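The way positions in a chemical line profile are referred to the topographic centre can be sketched in a few lines. This is an illustration only: the profiles and the pixel size below are hypothetical, the function names are invented, and a simple neighbour comparison stands in for whatever peak identification was actually used.

```python
def centre_index(topography):
    """Index of the highest point of the topography profile."""
    return max(range(len(topography)), key=lambda k: topography[k])


def local_maxima(profile):
    """Interior indices that are higher than the left and not lower than the right."""
    return [k for k in range(1, len(profile) - 1)
            if profile[k - 1] < profile[k] >= profile[k + 1]]


def offsets_from_centre(indices, centre, pixel_size_um):
    """Convert pixel indices to signed distances (in micrometres) from the centre."""
    return [(k - centre) * pixel_size_um for k in indices]


# Hypothetical profiles sampled on the same line, 12.5 um per pixel.
demo_topography = [0.1, 0.5, 1.0, 0.4, 0.1]
demo_nucleic = [0.0, 1.0, 0.0, 2.0, 0.0]
demo_centre = centre_index(demo_topography)
demo_offsets = offsets_from_centre(local_maxima(demo_nucleic), demo_centre, 12.5)
```

In this toy case the two maxima of the chemical profile fall one pixel either side of the topographic peak, mirroring the pattern of a central dip flanked by two peaks described above.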
The 1285 cm−1 line profile represents a complex pattern of relative absorbance across the whole section, but notably indicates an increase immediately to the right of the tumour centre. The amount and distribution of collagen, including fibre alignment, density, width, length and straightness, appear to differ between cancer types and at different sites within a tumour.37,38 These attributes have an effect on invasion, metastasis and apoptosis as well as being a prognostic factor correlated with cancer differentiation, invasion, lymph node metastasis, and clinical stage. Collagen concentration is also influenced by the hypoxic microenvironment39 and affects the intensity of the immune cell response.40 It is thus plausible that the differences observed in the 1285 cm−1 SNOM line profiles are due to subtle changes in collagen fibre structure rather than in concentration; this requires further investigation.
SNOM images of the tissues obtained at a number of key wavenumbers, with a higher spatial resolution, show variations in chemistry with a feature size down to ∼4 μm. The image obtained from the ratio of the intensities of the SNOM images obtained at the discriminating wavenumbers supports the finding from the FTIR images that the discrimination between the two tissue types is dominated by the contribution from the 1252 cm−1 signal which is representative of nucleic acids. Additional insight into the chemistry is revealed by line profiles of the SNOM intensity obtained at specific wavenumbers, representative of particular chemical moieties, in the region of the OSCC–lymphoid tissue interface. The differences between the periphery and the centre of the metastasis reflect our current biological knowledge, but also raise additional, more subtle, questions at the cellular level.
This study demonstrates that a combination of the MA technique applied to labelled FTIR spectra together with SNOM images obtained at key wavenumbers identified by MA provides insight into the chemistry of tissues.
This journal is © The Royal Society of Chemistry 2021