Qingfeng Zheng,‡a Junyi Li,‡b Lin Yang,c Bo Zheng,c Jiangcai Wang,b Ning Lv,c Jianbin Luo,b Francis L. Martin,d Dameng Liu*b and Jie He*a
aDepartment of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China. E-mail: prof.jiehe@gmail.com
bState Key Laboratory of Tribology, Tsinghua University, Beijing 100084, China. E-mail: ldm@tsinghua.edu.cn
cDepartment of Pathology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
dSchool of Pharmacy and Biomedical Sciences, University of Central Lancashire, Preston PR1 2HE, UK
First published on 28th November 2019
Patient survival remains poor even after diagnosis in lung cancer cases, and the molecular events resulting from lung cancer progression remain unclear. Raman spectroscopy could be used to noninvasively and accurately reveal the biochemical properties of biological tissues on the basis of their pathological status. This study aimed at probing biomolecular changes in lung cancer, using Raman spectroscopy as a potential diagnostic tool. Herein, biochemical alterations were evident in the Raman spectra (region of 600–1800 cm−1) in normal and cancerous lung tissues. The levels of saturated and unsaturated lipids and the protein-to-lipid, nucleic acid-to-lipid, and protein-to-nucleic acid ratios were significantly altered among malignant tissues compared to normal lung tissues. These biochemical alterations in tissues during neoplastic transformation have profound implications in not only the biochemical landscape of lung cancer progression but also cytopathological classification. Based on this spectroscopic approach, classification methods including k-nearest neighbour (kNN) and support vector machine (SVM) were successfully applied to cytopathologically diagnose lung cancer with an accuracy approaching 99%. The present results indicate that Raman spectroscopy is an excellent tool to biochemically interrogate and diagnose lung cancer.
Raman spectroscopy is an inelastic light scattering method that helps detect molecule-specific vibrations in a sample.3 Each molecular species has its own unique set of molecular vibrations derived from a Raman spectrum, which is applicable as an unlabelled biomolecular fingerprint.4 Water does not interfere with Raman spectroscopy, which is especially advantageous for in vivo measurements. Thus, Raman spectroscopy has already been used for biological characterisation owing to its quantitative and semi-quantitative potential. Furthermore, Raman spectroscopy considerably benefits biomedical diagnoses owing to its non-invasiveness, relatively short acquisition time, and the ability to provide biochemical molecular information.5,6 Although Raman spectroscopy has high chemical specificity, the spectral differences among classes are difficult to visually observe. Multivariate analysis has been used to understand the spectral data.7 Cluster methods including principal component analysis (PCA) and linear discriminant analysis (LDA) can be used to extract biological information derived from Raman spectroscopy. Furthermore, Raman spectroscopy along with classification methods including support vector machine (SVM) and k-nearest neighbour (kNN) methods help classify tissue carcinogenesis.8,9
This study aimed to assess the efficacy of Raman spectroscopy as a non-invasive diagnostic tool for analysing lung tissues with or without cancer and for their clinicopathological classification. This approach is expected to elucidate biochemical changes arising from lung tissue carcinogenesis.
Sections for Raman spectroscopy were prepared from the corresponding frozen samples of patients from the biological sample library. Four-micrometre-thick sections were cut and fixed in 95% alcohol at room temperature (18–24 °C) away from light prior to Raman testing. The tissue sections were laid on aluminium foil substrates, and the size of each dissected sample was approximately 0.5 × 0.5 cm2.
All experiments were performed in accordance with the Guidelines about Research Involving Human Subjects, and approved by the ethics committee at Tsinghua University (No. 20190036). Informed consent was obtained from the human participants of this study.
The dataset was first input into PCA, an unsupervised method for reducing data dimensionality, to determine principal components (PCs), extract key features, and facilitate data visualisation.11–13 The loading vectors (commonly called PCs) are the eigenvectors of the covariance matrix of the data, and were obtained by determining the eigenvectors and eigenvalues of the covariance matrix computed from the data matrix. The eigenvector with the highest eigenvalue yielded the first PC (PC1), which captures the greatest variance in the data. To rank the PCs by importance, the ‘percentage of variance explained’ was calculated for each. To account for most of the variance in the dataset, the first 10 PCs were selected. However, PCA is an unsupervised method that treats all classes as one without considering class labels, and can only recognise the total variance in a whole dataset.
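The eigen-decomposition described above can be written compactly as follows. This is a sketch in notation assumed here, not taken from the original: X is the mean-centred n × p data matrix (n spectra, p wavenumbers), C its covariance matrix, and λ_k and v_k the k-th eigenvalue and eigenvector (loading vector).

```latex
\begin{aligned}
C &= \frac{1}{n-1}\, X^{\top} X, \\
C\,\mathbf{v}_k &= \lambda_k \mathbf{v}_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
\mathbf{t}_k &= X \mathbf{v}_k \quad \text{(scores on PC}_k\text{)}, \\
\text{variance explained by PC}_k\ (\%) &= 100 \times \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}.
\end{aligned}
```

Under this notation, retaining the first 10 PCs corresponds to keeping the scores t_1, ..., t_10, whose cumulative percentage of variance explained is the sum of the first ten terms of the last expression.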
PCA has low discriminative power for data categorisation. Therefore, to maximise the spectral differences between classes and efficiently extract class-specific biomarkers associated with biochemical changes, a supervised approach, LDA, was applied to the dataset after PCA using the selected PCs. The first 10 PCs, accounting for >90% of the variance of the selected spectral regions, were used for the subsequent LDA.14 Herein, this combinatorial PCA–LDA method was used for the multivariate analysis. LDA determines the discriminant function that maximises the inter-class distance and minimises the intra-class distance, thus yielding an optimal linear boundary separating the different classes.
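The trade-off between inter-class and intra-class distance is commonly formalised by the Fisher criterion. The following is a sketch under assumed notation, not the authors' stated formulation: t_i is the vector of retained PC scores for spectrum i, μ_c and n_c are the mean and size of class c, μ is the overall mean, and K is the number of classes.

```latex
\begin{aligned}
S_W &= \sum_{c=1}^{K} \sum_{i \in c} (\mathbf{t}_i - \boldsymbol{\mu}_c)(\mathbf{t}_i - \boldsymbol{\mu}_c)^{\top}, \\
S_B &= \sum_{c=1}^{K} n_c\, (\boldsymbol{\mu}_c - \boldsymbol{\mu})(\boldsymbol{\mu}_c - \boldsymbol{\mu})^{\top}, \\
\mathbf{w}^{*} &= \arg\max_{\mathbf{w}} \frac{\mathbf{w}^{\top} S_B\, \mathbf{w}}{\mathbf{w}^{\top} S_W\, \mathbf{w}}, \qquad S_W^{-1} S_B\, \mathbf{w} = \lambda\, \mathbf{w}.
\end{aligned}
```

In this formulation the discriminant directions are the leading eigenvectors of S_W^{-1} S_B, and at most K − 1 of them have non-zero eigenvalues, so a three-class problem yields at most two linear discriminants (LD1 and LD2).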
Cluster plots derived from the first two LD spaces were used to distinguish alterations among the three tissue types from different developmental stages, where each spectrum was defined as a point while point overlap and separation indicate similar or dissimilar features. Furthermore, PCA–LDA cluster vector plots were used to plot a ‘spectrum-like’ graph indicating the wavenumbers at which inter-class alterations occurred, as the peak magnitude (variance) indicated the extent of alteration.8,15,16 Thus, the peak positions have been considered as spectral biomarkers for tissues among different disease stages.
Furthermore, an SVM is a robust classifier that is easy to implement and performs well on unseen data while simultaneously optimising the decision boundary.18–21 Band intensities at each wavenumber derived from the pre-processed Raman spectra were used as inputs for the SVM model. In the proposed automated screening system, a linear kernel was utilised. This kernel assists in deriving complex associations between the possible output classes and the Raman spectra. Furthermore, a radial basis function (RBF) kernel was investigated, although its results were not substantially better than those obtained with the linear kernel. Both the soft-margin factor and the RBF kernel sigma were tuned by a grid search on the validation subset.
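For reference, the two tuned quantities can be located in the standard soft-margin SVM formulation. This is a sketch that assumes the ‘factor for the soft-margin function’ is the usual penalty parameter C and that sigma is the RBF kernel width; the symbols (x_i for a spectrum, y_i ∈ {−1, +1} for its label, ξ_i for the slack variables, φ for the kernel feature map) are introduced here for illustration.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i\bigl(\mathbf{w}^{\top} \phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \\[4pt]
k_{\text{linear}}(\mathbf{x}, \mathbf{x}') &= \mathbf{x}^{\top} \mathbf{x}', \\
k_{\text{RBF}}(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}}\right).
\end{aligned}
```

A larger C penalises misclassified training spectra more heavily, while a smaller σ makes the RBF decision boundary more local; a grid search evaluates combinations of the two on held-out data.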
The fingerprint region (600–1800 cm−1) of the Raman spectra obtained from the lung tissue dissections is shown in ESI Fig. S2.† As shown in Fig. 2, distinct (despite some overlap) segregations among normal and cancer subtypes were observed in the PCA score plot of the first three PCs. Significant differences in these three spaces were observed between normal and cancer groups (p < 0.001).
As shown in Fig. 3, the loading plot denotes the 10 most pronounced discriminating wavenumbers responsible for score plot segregations in each PC space. In PC1 space, band intensities corresponding to the biochemical segregations were observed at 1595 cm−1, 738 cm−1, 1121 cm−1, 1644 cm−1, 1297 cm−1, 1179 cm−1, 1405 cm−1, 1436 cm−1, 1332 cm−1, and 1233 cm−1; and in PC2 space, at 1305 cm−1, 1011 cm−1, 1578 cm−1, 1441 cm−1, 1472 cm−1, 1228 cm−1, 752 cm−1, 1692 cm−1, 1547 cm−1, and 1152 cm−1. Similarly, in PC3 space, the significant band intensities were observed at 1648 cm−1, 754 cm−1, 1132 cm−1, 1600 cm−1, 1341 cm−1, 1444 cm−1, 1009 cm−1, 1383 cm−1, 1309 cm−1, and 1690 cm−1.
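One simple way to obtain such a list from a loading vector is to rank wavenumbers by absolute loading. The sketch below assumes that ranking rule; the original does not state how the peaks were picked, and the function name and toy values are hypothetical, not data from the study.

```python
def top_loading_wavenumbers(wavenumbers, loadings, k=10):
    """Return the k wavenumbers with the largest absolute loading, largest first."""
    if len(wavenumbers) != len(loadings):
        raise ValueError("wavenumbers and loadings must have the same length")
    # Rank (wavenumber, loading) pairs by the magnitude of the loading.
    ranked = sorted(zip(wavenumbers, loadings),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return [wn for wn, _ in ranked[:k]]


# Toy loading vector (made-up values for illustration only).
toy_wavenumbers = [738, 1011, 1121, 1297, 1595, 1644]
toy_loadings = [-0.40, 0.05, 0.31, -0.12, 0.52, 0.27]

top3 = top_loading_wavenumbers(toy_wavenumbers, toy_loadings, k=3)
print(top3)
```

Ranking by absolute value treats positive and negative loadings alike, since both indicate wavenumbers that contribute strongly to the separation along that PC.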
Data from the cancer groups were divided into three groups in accordance with the lung cancer stage. Thereafter, the pre-processed spectral dataset was analysed using the PCA–LDA method. In the LD1 space, score plots (Fig. 5) derived from the Raman spectral dataset revealed significant segregations among groups (ESI Table S2†). Moreover, a linear variation corresponding to the cancer stage was observed in accordance with the LD1 score plots. To elucidate the mechanism underlying the inter-group alterations at different stages, cluster vector plots were used to assess the biochemical changes (Fig. 5). In the cluster vector plots, biochemical alterations were observed by comparing the other groups with the control group. Distinct wavenumbers corresponding to the segregation at T1, T2, and T3 in comparison with the control group were recorded (ESI Table S3†). By comparing T1 with the control group, the band intensities corresponding to the biochemical segregations were observed at 1132 cm−1, 1591 cm−1, 750 cm−1, 1314 cm−1, 1343 cm−1, 1266 cm−1, 1199 cm−1, 1007 cm−1, 1373 cm−1, and 1527 cm−1. In comparison of T2 with the control group, the band intensities corresponding to the biochemical segregations were observed at 1132 cm−1, 750 cm−1, 1471 cm−1, 1314 cm−1, 1176 cm−1, 1591 cm−1, 1799 cm−1, 1343 cm−1, 1258 cm−1, and 1000 cm−1. By comparing T3 with the control group, the band intensities corresponding to the biochemical segregations were observed at 1132 cm−1, 750 cm−1, 1591 cm−1, 1314 cm−1, 1343 cm−1, 1265 cm−1, 875 cm−1, 1527 cm−1, 1176 cm−1, and 1007 cm−1. Upon tentative assignments of wavenumbers, the most prominent biochemical alterations during the tumorigenesis were mostly observed in DNA/RNA and protein regions.
Fig. 6 Comparisons of the relative intensity ratios of the selected Raman bands with the corresponding tentative biochemical assignments of the tissue samples. All data are represented as mean ± standard deviation values (statistical analyses are shown in ESI Table S4†).
Data from the different cancer types were divided into two groups: adenocarcinomas and squamous carcinomas. This pre-processed spectral dataset was then analysed using the PCA–LDA method. In the LD1 space, score plots (Fig. 7) derived from the Raman spectral dataset displayed significant segregations between these two groups (P < 0.001). In addition, cluster vector plots were generated to assess the biochemical alterations between adenocarcinomas and squamous carcinomas (Fig. 7). The band intensities corresponding to the biochemical segregations were observed at 1134 cm−1, 896 cm−1, 1455 cm−1, 822 cm−1, 1363 cm−1, 1627 cm−1, 866 cm−1, 1170 cm−1, 751 cm−1, and 1331 cm−1. Based on the tentative assignments of wavenumbers, the most prominent biochemical difference between adenocarcinomas and squamous carcinomas mostly appeared in the protein regions.
The ratio plots revealed that the levels of both saturated and unsaturated lipids in tissues tended to increase during tumorigenesis; accordingly, the protein-to-lipid and nucleic acid-to-lipid ratios seemed to decrease, while the protein-to-nucleic acid ratio tended to increase (Fig. 6).
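A band intensity ratio of this kind can be computed as the quotient of two averaged band intensities. The sketch below is a minimal illustration: the helper names, the ±5 cm−1 averaging window, and the band positions 1660 and 1445 cm−1 are placeholders chosen here, not the bands or procedure used in the study.

```python
def band_intensity(wavenumbers, intensities, target, half_width=5.0):
    """Mean intensity of all points within target +/- half_width (cm^-1)."""
    window = [i for wn, i in zip(wavenumbers, intensities)
              if abs(wn - target) <= half_width]
    if not window:
        raise ValueError(f"no data points near {target} cm^-1")
    return sum(window) / len(window)


def band_ratio(wavenumbers, intensities, numerator_band, denominator_band,
               half_width=5.0):
    """Ratio of two averaged band intensities from one pre-processed spectrum."""
    top = band_intensity(wavenumbers, intensities, numerator_band, half_width)
    bottom = band_intensity(wavenumbers, intensities, denominator_band, half_width)
    return top / bottom


# Toy spectrum (made-up values); band positions are illustrative placeholders.
toy_wn = [1440, 1445, 1450, 1655, 1660, 1665]
toy_int = [2.0, 4.0, 3.0, 5.0, 6.0, 7.0]
ratio = band_ratio(toy_wn, toy_int, numerator_band=1660, denominator_band=1445)
print(ratio)
```

Averaging over a small window rather than reading a single point makes the ratio less sensitive to noise and to small shifts in peak position.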
Classification task | Metric | kNN | SVM |
---|---|---|---|
Control vs. cancer | Sensitivity | 97.34% | 98.06% |
Control vs. cancer | Specificity | 97.22% | 99.84% |
Control vs. cancer | Accuracy | 97.24% | 99.31% |
Adenocarcinoma vs. squamous carcinoma | Sensitivity | 92.17% | 95.55% |
Adenocarcinoma vs. squamous carcinoma | Specificity | 81.38% | 75.86% |
Adenocarcinoma vs. squamous carcinoma | Accuracy | 90.17% | 92.20% |
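The three metrics reported in the table follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions; which class is treated as ‘positive’ (for example, cancer in the control vs. cancer task) is an assumption, and the counts are made up for illustration.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positive-class spectra classified correctly/incorrectly.
    tn/fp: negative-class spectra classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Made-up counts, not results from the study.
sens, spec, acc = binary_metrics(tp=90, fn=10, tn=40, fp=10)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}, accuracy={acc:.2%}")
```

Because accuracy weights each class by its size, it can remain high when one class dominates even if specificity is modest, which is worth keeping in mind when the classes are imbalanced.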
When the same methods were used to classify cancer stages, groups with adequate data yielded a more accurate classification. Among the three groups (T1, T2, and T3), the classification accuracy for T1 reached 88.53% with the SVM method and 87.69% with the kNN method, while the accuracy for the other groups was markedly lower, potentially because of the limited data obtained herein.
Furthermore, the classification methods were used to distinguish adenocarcinoma from squamous carcinoma. The sensitivity and specificity of the kNN model were 92.17% and 81.38%, while those of the SVM model were 95.55% and 75.86%, respectively (Table 1).
The detailed Raman band assignment presented in the loadings of the first 3 PCs indicated biochemical alterations resulting from various biomolecular structures including DNA/RNA, amide II, amide III (random coil, β sheet, and α helix), overlapping carbon–hydrogen (CH and CH2) vibrations in protein and lipids, and carbon–carbon (CC) vibrational modes in lipids.22 The ratio plots indicated that lipid metabolism significantly contributes to tumorigenesis.23,24 Saturated lipid levels in cancer tissues appeared markedly increased, while unsaturated lipid levels displayed a marked tendency to decrease, probably owing to lipid peroxidation during tumorigenesis.25–27 Furthermore, protein ratios suggest disordered protein metabolism during tumorigenesis, potentially leading to pulmonary fibrosis.28–30
The molecular pathogenesis of lung cancer is complex owing to diverse biological phenomena, from cell metabolism to signalling, where a knowledge gap persists. Along with the evidence of tumorigenesis from histopathological and immunohistochemical analyses of lung sections, Raman spectroscopy combined with PCA highlighted remarkable biochemical alterations in cancer tissues in comparison with normal tissues. Furthermore, based on the Raman spectra, distinctive changes in the levels of amide proteins, lipids, nucleic acids, carbohydrates, and glycogen were observed. Raman spectroscopy thus presents great potential for detecting cancerous changes and identifying the biochemical biomarkers of lung cancer, consistent with previous reports.5,31–34
The spectral data presented herein constitute a multi-dimensional dataset including abundant information regarding the biochemical characteristics of tissue samples, potentially providing a detailed description of the sample and being applicable for diagnosing lung cancer. Along with multivariate analysis and machine learning methods, patterns of pathological alterations within tissues can be easily detected.35,36
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/C9AN02175B
‡ Joint first authors.
This journal is © The Royal Society of Chemistry 2020