Yuqing Zhang (a), Xiaojia Ye* (b), Gengxin Xu (b), Xiulong Jin (a), Mengmeng Luan (b), Jiatao Lou (c), Lin Wang (c), Chengjun Huang (d) and Jian Ye* (a)
(a) School of Biomedical Engineering & Med-X Research Institute, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, China. E-mail: yejian78@sjtu.edu.cn; Tel: +86 021-62934760
(b) School of Mathematics and Information Science, Shanghai Lixin University of Commerce, 2800 Wenxiang Road, Shanghai 201620, China. E-mail: yxj@lixin.edu.cn; Tel: +86 021-67705383
(c) Department of Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai 200030, China
(d) Key Laboratory of Microelectronics Devices and Integrated Technology, Institute of Microelectronics of Chinese Academy of Sciences, 3 Bei-Tu-Cheng West Road, Beijing 100029, China
First published on 7th January 2016
Non-small-cell lung cancer (NSCLC) comprises ∼75% of all lung cancers and consists of several subtypes. Identification of lung cancer cell subtypes is important for choosing the appropriate therapy plan and reducing mortality. In this study, we identify and distinguish three subtypes of NSCLC cells (H1299, H460 and A549) and leukocytes at the single-cell level by combining surface-enhanced Raman scattering (SERS) spectroscopy with multivariate statistical methods. Of the three statistical methods evaluated, support vector machines (SVM) show better classification performance than hierarchical cluster analysis (HCA) and principal component analysis (PCA), based on a large number of cell SERS spectra obtained with Au nanoshells as intracellular nanoprobes. The SVM classification model provides a prediction accuracy of 88.75% for “unknown” independent cell types and an accuracy of ∼95% for mixed samples of two subtypes at the single-cell level. This method combining SERS and SVM could potentially be adapted to distinguish other types of cancer cells and be applied to non-invasive downstream cell identification after the capture of circulating tumor cells.
At present, there are mainly two types of methods to separate and identify cancer cells from blood: one based on physical properties such as size, density and deformability; the other based on biological properties such as protein expression.7,8 However, methods of the former type, including density gradient centrifugation and membrane filtration, have difficulty identifying and distinguishing different types of cancer cells. Methods of the latter type, such as immunomagnetic separation, can only distinguish cancer cells with specific biomarkers and are limited by the spectral overlap of fluorescent tags.9 Therefore, new methods for the non-invasive, highly sensitive and label-free identification and distinction of closely related cell phenotypes are urgently needed.10
Recently, surface-enhanced Raman scattering (SERS) spectroscopy, a label-free and ultra-sensitive technique for chemical and biomedical analysis,11,12 has been emerging as a powerful new tool for the analysis of individual cells, mainly due to its fingerprint spectral characteristics.10,13–15 As SERS can enhance the Raman signals of molecules close to a metal surface by as much as 6 to 14 orders of magnitude,16 near-infrared (NIR) light can be used for excitation at low laser power, with reduced photo-damage, to analyze living cells.17,18 In SERS, Au nanoparticles are usually used as optical enhancing materials because of their chemical stability, good biocompatibility and strong enhancement capabilities. Au nanoparticles have been employed as intracellular probes facilitating cancer detection in blood plasma,19,20 cells,21,22 and tissues.23,24
The SERS spectra of cells are derived from the cells themselves, and no external label is required.25 SERS spectra of cells can be very complicated, containing spectral information from numerous biological molecules.26 Different types of cells have different SERS spectra due to their different biomolecular composition and structure, so the spectra can serve as the basis for distinguishing cells at a single-cell level.27,28 Since the spectral differences are often minute and difficult to identify, multivariate statistical methods such as principal component analysis (PCA) and hierarchical cluster analysis (HCA) have been applied to extract characteristic biochemical information present in the spectra of different types of cells.10,15,28–31 However, few studies have reported the application of Au nanoparticles as intracellular SERS probes with multivariate statistical methods for the identification and distinction of NSCLC cells of different subtypes.
In this study, superparamagnetic Au nanoshells with a strong NIR SERS effect were used as intracellular SERS nanoprobes, so that strong signals can be obtained from cells with lower laser power and less photo-damage. We combine SERS spectroscopy and multivariate statistical methods including HCA, PCA and support vector machines (SVM) to identify and distinguish three closely related NSCLC cell types (A549, H1299 and H460) and leukocytes. We show successful segregation of the different cell types at a single-cell level using SVM analysis, whereas this is more difficult to achieve with the HCA and PCA approaches. An SVM classification model was built and tested on four independent cell types and on mixture samples (two subtypes of NSCLC). The high prediction accuracies indicate that SERS spectra in combination with the SVM method can provide a highly sensitive means of distinguishing NSCLC cells of different subtypes. Furthermore, if combined with bio-chips, this method has the potential to capture, detect and classify circulating tumor cells.
Following the incubation, the cultured cells and leukocytes were washed extensively with phosphate buffered saline (PBS) and fixed with 4% paraformaldehyde for 10 min at room temperature. The excess paraformaldehyde was then removed with deionized water. The leukocytes were transferred to quartz coverslips and air-dried together with the other cultured cells for SERS measurements. For the mixed cell experiments, A549 cells were pre-incubated with Au nanoshells; they were then trypsinized and separated by an external magnetic field (Magical Trapper, Toyobo, nearly 200 mT) for 2 min after excess Au nanoshells had been removed. For Raman experiments, the Au nanoshell-labeled A549 cells were mixed with blank H1299 cells at two different ratios, 1 : 1 and 1 : 9; they were then cultured on the coverslips and fixed with 4% paraformaldehyde.
For PCA, the same dataset with the same pre-processing was used. PCA was used to highlight the major variability in the spectral dataset. In this study, the PCA model was also calculated within the spectral region 550–1800 cm−1. PCA was then used to reduce the dimension of the dataset, and the first 7 principal components (PCs) described 95.01% of the variance of the dataset.
SVM with a linear kernel was used to build a differentiation model for the four different cell types. SVMs are a set of related methods for supervised learning, applicable to both classification and regression problems.34 Here, all individual spectra from 200 cells (50 leukocytes, 50 H1299, 50 H460, and 50 A549) were background corrected using the Daubechies wavelet transform (10 Daubechies, 7 transform levels, 10 iterations) and vector normalized. The SVM algorithm was trained and tested using leave-one-out cross validation: one spectrum was omitted, the model was trained on the remaining spectra, and the omitted spectrum was then classified. This procedure was repeated for each spectrum in turn. Finally, a probability of prediction was calculated and expressed as a sensitivity and specificity for each group. All 200 spectra could be classified correctly, giving a prediction accuracy of 100%.
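The vector normalization and leave-one-out procedure described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the classifier is a simple nearest-centroid rule used as a stand-in (it is not the linear-kernel SVM used in this work), the wavelet background correction is omitted, and the three-point "spectra" and class names are made up.

```python
import math

def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)

def centroid(rows):
    """Element-wise mean of equally long vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(spectra, labels, predict=nearest_centroid_predict):
    """Hold out each spectrum once, train on the rest, score the prediction."""
    xs = [vector_normalize(s) for s in spectra]
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == labels[i]
    return hits / len(xs)

# Hypothetical toy data: two made-up classes, three intensity values each.
toy_spectra = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.1],
               [0.0, 0.2, 1.0], [0.1, 0.1, 0.9], [0.1, 0.0, 1.2]]
toy_labels = ["A", "A", "A", "B", "B", "B"]
loo_accuracy = leave_one_out_accuracy(toy_spectra, toy_labels)
```

Any function with the same `(train_x, train_y, x)` signature could be passed as `predict`, so a real SVM implementation could replace the stand-in without changing the cross-validation loop.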
Fig. 2A shows the bright field images of representative examples of the four types of cells (from top to bottom: H460, A549, H1299, and leukocytes) with SERS nanoprobes inside. In the bright field images, the nanoprobes can easily be identified as black regions in the cells. SERS nanoprobes were mainly located in the cytoplasm, consistent with previous reports on the cellular location of Au nanoparticles.26,36 The three closely related subtypes of NSCLC cells (H460, A549, and H1299) adhered to the quartz coverslips, so they were slightly stretched. As leukocytes cannot adhere to the coverslips, they were dropped directly onto the coverslip and appeared more or less round. On average, the leukocytes are slightly smaller than the other cells. However, cancer cells are highly heterogeneous in both size and morphology, and they are not universally larger than all leukocytes.37,38 In addition, cells in blood are all suspended, which reduces their size difference; size is therefore not a reliable criterion for classification.38
Averaged SERS spectra of the four types of cells are depicted in Fig. 2B. The shaded areas represent the standard deviations of the means. The SERS spectra were obtained with lower laser power (4.5 mW) and shorter integration time (3 s) than normal Raman measurements.30 This can greatly reduce photo-damage to the cells, so that the constituents and viability of the native cells remain almost unchanged.17 SERS therefore has great potential for live cell analysis. In this study, SERS spectra derived from specific cells were used to characterize the cellular biochemical composition of each individual type. 200 individual cells of four different types (50 H460, 50 A549, 50 H1299, 50 leukocytes) were tested and all SERS spectra were used for further statistical analysis. For convenient comparison, the intensity of the averaged SERS spectrum of each cell type was normalized to a relative intensity ranging from 0 to 1 (ref. 10). The Raman spectra of the cells can act as molecular fingerprints containing information from various cellular components such as DNA, protein, lipids and carbohydrates.39 The molecular components of cells are very complex,21 and typical molecules of interest are often associated with RNA, DNA, carbohydrates, proteins, or lipids. With the help of SERS probes, these biochemical molecules can typically be assessed based on their individual Raman band assignments. With reference to the SERS band assignments in previous reports,10,26,40 we assigned the observed SERS bands as listed in Table 1. The bands at around 747 cm−1 (T), 833 cm−1 (O–P–O backbone stretching), 1342 cm−1 (A, G) and 1582 cm−1 (G, A) are assigned to nucleic acids.
The bands at around 645 cm−1 (C–C twist in tyrosine), 833 cm−1 (tyrosine), 1002 cm−1 (phenylalanine), 1158 cm−1 (C–C and C–N stretch), 1252 cm−1 (amide III), 1307 cm−1 (C–N stretch), 1342 cm−1 (C–H deformation), 1414 cm−1 (aspartate and glutamate) and 1543 cm−1 (tryptophan) are assigned to proteins and amino acids. The band at around 1252 cm−1 (CH deformation) is assigned to lipids. Although the SERS spectra of different types of cells sometimes look very alike, they still differ in some peak positions and peak intensities. To better compare and distinguish them, multivariate statistical methods including HCA, PCA and SVM were carried out.
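As a rough illustration of how an averaged spectrum with a standard-deviation band and a 0 to 1 relative intensity scale could be computed, consider the sketch below. The text does not state the exact formulas, so two choices here are assumptions: the sample standard deviation is used for the band, and min-max scaling is used for the normalization. The three-point spectra are hypothetical.

```python
import statistics

def mean_and_std(spectra):
    """Per-wavenumber mean and sample standard deviation across spectra."""
    columns = list(zip(*spectra))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat input: no contrast to rescale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical spectra of three cells of one type (three intensity values each).
toy_spectra = [[10.0, 40.0, 20.0],
               [14.0, 44.0, 20.0],
               [12.0, 48.0, 20.0]]
mean_spectrum, std_spectrum = mean_and_std(toy_spectra)
relative_intensity = min_max_normalize(mean_spectrum)
```

Dividing by the maximum alone would also give values up to 1 but would not force the minimum to 0; which variant was used for Fig. 2B is not specified in the text.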
Raman shift (cm−1) | Nucleic acid | Protein | Lipids |
---|---|---|---|
645 | | Tyrosine (C–C twist) | |
747 | T | | |
833 | O–P–O str, DNA bk | Tyrosine | |
1002 | | Phenylalanine | |
1158 | | C–C str (and C–N str) | |
1252 | | Amide III | C–H def |
1307 | | C–N str | |
1342 | A, G | C–H def | |
1414 | | Aspartate, glutamate | |
1543 | | Tryptophan | |
1582 | G, A | | |

Abbreviations: str, stretching; def, deformation; bk, backbone.
HCA is a cluster analysis method that aims to build a hierarchy of clusters,34 and it has been successfully used to identify and differentiate breast cancer cells, leukaemia cells and leukocytes with normal Raman measurements.26 In our study, HCA was used to form clusters according to cell type. The Euclidean distance and Ward's algorithm were used to perform HCA, and the spectra in the region 550–1800 cm−1 were vector normalized and averaged.26 Fig. 3 shows the dendrogram for all 200 cells. Two well-separated major clusters can be seen: one for leukocytes and one for NSCLC cells. Within the NSCLC cell cluster, each cell type forms its own sub-cluster with some misclassifications. We can conclude from the dendrogram that although the vast majority of leukocytes were well separated, there were some misclassifications among the NSCLC cells. The HCA clustering can therefore separate leukocytes from lung cancer cells but cannot distinguish the different lung cancer cells well, because they are too closely related. The clustering result is similar to that of a previous study on lung cancer diagnosis.41 In that study, two large clusters were clearly visible in the dendrogram, but no sub-clustering was evident for different subtypes of lung cancer cells and no cluster patterns emerged. The possible reasons are the small number of samples and the great similarity between the samples.41 We thus speculate that HCA can achieve good results when distinguishing samples without high similarity, but that misclassifications may occur when distinguishing closely similar samples.
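To make the clustering step more concrete, the following is a minimal sketch of agglomerative clustering with Euclidean distance and Ward's merge criterion, written in plain Python. It is a naive illustration, not the software used in this study: the two-dimensional points are hypothetical, and the function name `ward_clusters` is introduced here only for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering using Ward's merge criterion.

    Each cluster is stored as (member indices, centroid). At every step the
    pair whose merge gives the smallest increase in within-cluster sum of
    squares, n_i*n_j/(n_i+n_j) * ||c_i - c_j||^2, is joined.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                cost = ni * nj / (ni + nj) * squared_distance(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        merged = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        clusters[i] = (mi + mj, merged)  # keep the merged cluster at slot i
        del clusters[j]                  # j > i, so slot i is unaffected
    return [sorted(members) for members, _ in clusters]

# Hypothetical points: two tight, well-separated groups of three.
toy_points = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
toy_result = ward_clusters(toy_points, 2)
```

Cutting the hierarchy at two clusters recovers the two groups in this toy case, which mirrors the leukocyte versus NSCLC split; groups that lie close together, like the NSCLC subtypes, would be much harder to separate this way.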
Next, we employed PCA to evaluate the possibility of distinguishing the different types of NSCLC cells and leukocytes. PCA selects a smaller number of significant components through a linear transformation of multiple variables. As a classic analysis method, PCA has been widely used in multivariate statistical analysis of spectra.42,43 Herein, the spectral differences in the data set comprising 200 spectra of four types of cells within the same spectral region from 550 to 1800 cm−1 (consisting of 1910 data points) were analyzed by PCA. First, three types of cells (H1299, H460, leukocytes) can be roughly distinguished using PCA, with some overlap between H460 and leukocytes (see Fig. S1†). Fig. S1† shows the three-dimensional plot using the first, second and third principal components (PC1, PC2, and PC3). Different cell types are approximately separated, as indicated by spectra assemblies with different colors and shapes. However, when all four types of cells (A549, H1299, H460, leukocytes) were included, the separation became much worse (Fig. 4). In Fig. 4, the scatter plots of the SERS spectra for each cell line are projected into two-dimensional and three-dimensional plots using the principal component scores PC1 (83.88% variation), PC2 (5.95% variation) and PC3 (1.83% variation). The SERS spectra are mainly divided into two groups, leukocytes and NSCLC cells, which is caused by the intrinsic difference in biomolecular composition and concentration between normal and cancer cells. This result is consistent with the HCA clustering result shown in Fig. 3. However, the point groups of the NSCLC cells are adjacent to each other and some even overlap, which means they cannot be separated well. This result is similar to those of previous studies.26,31 In those studies, PCA only separated normal cells from tumor cells but did not distinguish different tumor cells well.
This means that for high-dimensional data from complex samples, especially when more samples are involved and the noise features of the data have an ambiguous distribution, PCA often cannot achieve good separation.43
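The variance figures quoted for the score plots can be put side by side with a short calculation. The sketch below only adds up the percentages reported in the text; it assumes that the three values for Fig. 4 and the 95.01% stated for the first seven PCs come from the same PCA model, which the text suggests but does not state explicitly.

```python
# Variance explained by each plotted component, as reported for Fig. 4 (percent).
reported_variance_percent = {"PC1": 83.88, "PC2": 5.95, "PC3": 1.83}

cumulative = []
running_total = 0.0
for name, share in reported_variance_percent.items():
    running_total += share
    cumulative.append((name, round(running_total, 2)))

plotted_total = cumulative[-1][1]   # variance captured by the 3D score plot
seven_pc_total = 95.01              # reported for the first seven PCs
# Assumption: same PCA model, so the difference is carried by PC4 to PC7.
remaining_in_pc4_to_pc7 = round(seven_pc_total - plotted_total, 2)
```

Under that assumption, the three plotted components carry 91.66% of the variance and PC4 to PC7 together add about 3.35%, so the score plots already show most of the variance that the seven-component model retains.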
After HCA and PCA failed to discriminate the subtypes of NSCLC cells, an SVM with a linear kernel was trained on this SERS dataset to create a classification model, in order to find a more suitable classification method. Fig. 5 shows the analysis result using three of the six decision values. The four cell types are successfully separated, as indicated by spectra assemblies with different colors and shapes. 200 spectra were used to train the SVM model with leave-one-out cross validation, and all spectra could be classified correctly, giving an overall accuracy of 100% on a single-cell level. The details of the prediction accuracy of the classification model are presented in Table S1.†
Fig. 5 Graphical representation of the support vector machines with linear kernel for classification of four cell types. Three out of six decision values are plotted.
Although the SVM model achieved 100% accuracy, all of these cells were involved in building the model. It is more practical if the model can predict “unknown” cells that were not involved in building the model but belong to these four cell types. To further demonstrate the accuracy and practicability of the trained and validated SVM model, it was used to predict and classify a test set of 80 additional cells (20 A549, 20 H1299, 20 H460, 20 leukocytes). None of these cells was included in building the SVM model. The detailed prediction results for the individual cell types are shown in Table 2. A prediction accuracy of 88.75% was achieved on a single-cell level (71 of 80 cells correctly classified). For better discrimination, more samples are needed for both training and testing the model.44 For clinical purposes, the most relevant question is how to distinguish cancer cells from the body's own healthy leukocytes.45 For this purpose the SVM model showed very good characteristics: all NSCLC cells were correctly identified as cancer cells (100% accuracy), while 2 of the 20 leukocytes were assigned to H460 (Table 2).
True label | Predicted: Leukocytes | Predicted: A549 | Predicted: H1299 | Predicted: H460 |
---|---|---|---|---|
Leukocytes | 18 | 0 | 0 | 2 |
A549 | 0 | 18 | 0 | 2 |
H1299 | 0 | 2 | 18 | 0 |
H460 | 0 | 2 | 1 | 17 |
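The overall accuracy and the per-class sensitivity and specificity follow directly from the counts in the confusion matrix above. The sketch below reproduces that arithmetic in plain Python; the counts are copied from the table, while the variable and function names are chosen here only for illustration.

```python
labels = ["Leukocytes", "A549", "H1299", "H460"]
# Rows are true labels, columns are predicted labels (counts from the table).
confusion = [
    [18, 0, 0, 2],
    [0, 18, 0, 2],
    [0, 2, 18, 0],
    [0, 2, 1, 17],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total

def sensitivity(k):
    """Fraction of true class-k cells that were predicted as class k."""
    return confusion[k][k] / sum(confusion[k])

def specificity(k):
    """Fraction of cells not in class k that were not predicted as class k."""
    false_pos = sum(confusion[r][k] for r in range(len(labels)) if r != k)
    negatives = total - sum(confusion[k])
    return (negatives - false_pos) / negatives

per_class = {name: (sensitivity(k), specificity(k))
             for k, name in enumerate(labels)}

# Cancer-versus-leukocyte view: an NSCLC cell counts as correct if it is
# assigned to any of the three NSCLC classes.
nsclc_rows = confusion[1:]
nsclc_called_cancer = (sum(sum(row[1:]) for row in nsclc_rows)
                       / sum(sum(row) for row in nsclc_rows))
```

With these counts, 71 of 80 cells lie on the diagonal (88.75%), and none of the 60 NSCLC cells is assigned to the leukocyte class, which is the basis of the 100% figure for recognizing cancer cells.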
We further evaluated the performance of this SVM model when more than one subtype of NSCLC cells was present in the sample. Mixtures of two subtypes of NSCLC cells (A549 and H1299) at different ratios were used as the test sets. Again, none of these cells was included in building the SVM model. Although fluorescent biomarkers can help to distinguish cells in a mixture, they may also produce a strong Raman background in the SERS measurements. To verify the accuracy of the SVM model in mixed samples, we therefore mixed A549 cells labeled with SERS probes and blank H1299 cells at different ratios, so that the cell type can be recognized directly from the bright-field images.
Fig. 6 shows the bright-field images of the mixtures of the two subtypes of cells, where A549 cells can be recognized by the SERS probes inside them. Fig. 6A represents the group in which A549 and H1299 were mixed at a ratio of 1 : 1 (A549 indicated by red arrows and H1299 by white arrows). Since the normal Raman signals of H1299 cells (no SERS probes inside) were very weak, only the signals of A549 cells were detected in the mixed samples. Table 3A shows the prediction results based on the SVM model; an accuracy of 95% was achieved on a single-cell level. Fig. 6B represents the group in which A549 and H1299 were mixed at a ratio of 1 : 9, and a prediction accuracy of 100% was achieved for A549 on a single-cell level (Table 3B). These results indicate that the SVM model significantly improves the accuracy of prediction and classification of NSCLC cell subtypes compared with the HCA and PCA methods.
Fig. 6 Bright-field images of the mixed two subtypes of NSCLC cells cultured on quartz coverslips. The ratio of A549 : H1299 is 1 : 1 in (A) and 1 : 9 in (B).
Panel | True label | Predicted: A549 | Predicted: H1299 |
---|---|---|---|
A (A549 : H1299 = 1 : 1) | A549 | 19 | 1 |
B (A549 : H1299 = 1 : 9) | A549 | 5 | 0 |
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra21758j
This journal is © The Royal Society of Chemistry 2016