Lennard M.
Wurm
ab,
Björn
Fischer
cd,
Volker
Neuschmelting
b,
David
Reinecke
b,
Igor
Fischer
a,
Roland S.
Croner
e,
Roland
Goldbrunner
b,
Michael C.
Hacker
c,
Jakub
Dybaś†
f and
Ulf D.
Kahlert†
*e
aDepartment of Neurosurgery, University Hospital Düsseldorf and Medical Faculty Heinrich-Heine University, Düsseldorf, Germany
bDepartment of Neurosurgery, University Hospital Cologne, Cologne, Germany
cInstitute of Pharmaceutics and Biopharmaceutics, University of Düsseldorf, Düsseldorf, Germany
dFISCHER GmbH, Raman Spectroscopic Services, 40667 Meerbusch, Germany
eClinic of General- Visceral-, Vascular and Transplantation Surgery, Department of Molecular and Experimental Surgery, University Hospital Magdeburg and Medical Faculty Otto-von-Guericke University, Magdeburg, Germany. E-mail: Ulf.Kahlert@med.ovgu.de
fJagiellonian Center for Experimental Therapeutics, Jagiellonian University, Krakow, Poland
First published on 6th November 2023
Label-free identification of tumor cells using spectroscopic assays has emerged as a technological innovation with a proven ability for rapid implementation in clinical care. Machine learning facilitates the optimization of processing and interpretation of extensive data, such as various spectroscopy data obtained from surgical samples. The here-described preclinical work investigates the potential of machine learning algorithms combining confocal Raman spectroscopy to distinguish non-differentiated glioblastoma cells and their respective isogenic differentiated phenotype by means of confocal ultra-rapid measurements. For this purpose, we measured and correlated modalities of 1146 intracellular single-point measurements and sustainingly clustered cell components to predict tumor stem cell existence. By further narrowing a few selected peaks, we found indicative evidence that using our computational imaging technology is a powerful approach to detect tumor stem cells in vitro with an accuracy of 91.7% in distinct cell compartments, mainly because of greater lipid content and putative different protein structures. We also demonstrate that the presented technology can overcome intra- and intertumoral cellular heterogeneity of our disease models, verifying the elevated physiological relevance of our applied disease modeling technology despite intracellular noise limitations for future translational evaluation.
Cancer-associated deaths are one of the leading global health problems affecting all levels of society, gender, and ethnicity.3 Over the last decades, research has revealed that the occurrence, progression, and regrowth of malignant cancers and their metastatic offspring are promoted by tumor cells with stem cell properties, so-called cancer stem cells (CSC).4,5 Ample published evidence exists, that targeting CSCs will help to improve the clinical care of cancer patients.6,7 However, the clinical translation of anti-CSC directed therapies or diagnostics is lagging, primarily due to hurdles regarding specificity, side effects, and effectivity of relevant preclinical discoveries in human application.8,9 In our project, we chose glioblastoma (GBM), the most frequently occurring and aggressive type of primary brain tumor in adults, as our biological model for an unmet clinical need. Current routine clinical treatment of GBM patients is still challenging and features surgical resection of tumor mass as diagnosed by anatomical or metabolic imaging, followed by an adjuvant combination of classical DNA alkylating chemotherapy and multi-cyclic radiation therapy.10 The more accurate and complete the surgical resection is performed – as diagnosed with current clinical imaging and morphological procedures – the better the overall survival time of the patient.
As a consequence of exogenous stress, such as limitations in oxygen or nutrient supply in response to the massive cell growth of GBM parenchyma, a population of GBM stem cells (GSCs) undergoes mesenchymal trans-differentiation, which results in augmented invasive potential of the cells to escape the rate-limiting microenvironment, leading to the fatal infiltrative growth pattern of the disease as also GSCs eluding from neurosurgical resection in the sub- and periventricular zone.11,12 Imaging technologies that can detect invaded GSC residing in brain tissue on a cellular level would provide the basis for improved therapeutic strategies. One approach that has shown promising potential to allow such detection is using label-free spectroscopic methods such as Raman spectroscopy. Raman spectroscopy has been established as a rapid, label-free alternative to the more common but time-consuming neuropathological examination of neurosurgical specimen.13–15 Monochromatic light is directed onto a sample, resulting in inelastic scattering that provides information about the molecular binding structure of biological samples. A Raman spectroscopy fingerprint can ultimately identify a biological phenotype.16 It can potentially be used as a highly repetitive method for intraoperative decision-making and neurosurgical guidance with high accuracy compared to other, likewise rapid approaches.17–22 Due to the large amount of information within each spectrum, further processing methods of the spectra are applied, including multivariate statistics and machine learning.23–25 These approaches enabled the correlation of spectroscopic characteristics to diverse diagnostic and biological features as well as to cell populations, illustrated in Fig. 1.25–29 Recent work indicates that processing of large spectroscopic data by machine learning algorithms to generate user-friendly interpretations of the signals supports the dissemination potentiation of imaging in life science and clinical use. Ascending numbers of studies successfully follow this approach using either small specimen counts but also proving its ability on data retrieved from whole tissue sections.30–32
Here we evaluate a machine learning-assisted classification approach to identify GSCs using a collection of recently described in vitro models, that resemble various different neuropathological relevant, genetic markers of primary tumors.33 Especially heterogenetic GSCs remain underestimated in clinical diagnostic routines due to missing evidence and methodology. Our study is intended to further establish Raman technique for consideration of cancer cell heterogeneity and usage in neurosurgical decision-making.
For sample preparation, cells were washed twice with phosphate buffer to remove fluorescence from the pH indicator. Subsequently, 80 μL of the cell solution stored on ice was pipetted onto a calcium fluoride substrate (Korth Kristalle, CaF2 Raman grade optically polished)35 and the microscope objective was dipped into the drop. Three single-point measurements were performed on each cell at randomized positions. Using a laser power of 20 mW, the exposure time per spectrum was set to 20 s (10 × 2 s accumulated).
Raman reference images of the cross-section of the cells were acquired with 20 mW laser power. Areas of 20 × 20 μm were scanned with a spatial resolution of 500 nm. The exposure time per spectrum was set to 1.0 s. The collection of Raman spectra and compilation of Raman images were conducted using WITec FIVE software (version 5.3.10.102, WITec, Ulm, Germany).
Condition | Total samples (instances) | GSC count | DGC count |
---|---|---|---|
Cluster 1 | 600 | 287 | 313 |
Cluster 2 | 67 | 31 | 36 |
Cluster 3 | 406 | 201 | 205 |
Cluster 4 | 73 | 38 | 35 |
HSR-GBM1 | 229 | 113 | 116 |
JHH520 | 222 | 108 | 114 |
NCH644 | 265 | 128 | 137 |
BTSC-233 | 235 | 101 | 134 |
BTSC-407 | 195 | 107 | 88 |
Data was cut onto fingerprint region from 400–1800 rel. 1 cm−1. Spectra which contained strong fluorescence or cosmic spikes were excluded during measurements or afterward, if applicable. Baseline correction was performed using Rubber band38 to remove background fluorescence. In addition, Vector Normalization was applied. After preprocessing, 1146 single-point measurements remained, which were measured on six different days.
To reconstruct origins of the contributing peaks in the figures, a negative second derivative was calculated by Savitzky–Golay filter (window = 9, polynomial order = 3) which counteracts the derivative-induced noise enhancement.
Results were gained by machine learning algorithms predicting the PCA-transformed testing set. Machine Learning algorithms included Artificial Neural Network (ANN), Support Vector Machine (SVM), Tree Classification, Random Forrest, AdaBoost, Gradient Boosting, k Nearest Neighbor, Stochastic Gradient Descending, Naive Bayes, and Logistic Regression.
The ANN was set up as an MLPClassifier using the scikit-learn library and the Relu activation function. The quantity of hidden layers fluctuated for each classification task, and for overall classification purposes, there were two layers containing 100 and 50 neurons each. In addition to fine-tuning the neural network architecture, we meticulously optimized the hyperparameters of the other algorithms through cross-validation using a grid search approach. In the case of SVM, we primarily leveraged the radial kernel (RBF) while experimenting with different combinations of C, epsilon, and gamma pairs within the range of C values between 0.1 and 10. Concerning the tree classifications, including tree classification, RandomForest, Gradient Boosting, and AdaBoost, we adhered closely to the default parameters. However, for the forest-based algorithms, we maintained a consistently high number of trees/estimators exceeding 1000, while significantly varying the tree depth (excluding AdaBoost of course) and the number of attributes considered at each split. Additionally, we adjusted the learning rate for Gradient Boosting. Nevertheless, modifying these parameters had little effect on the outcomes. Ridge Regularization (L2) and C-values ranging from 0.001 to 1000 were employed for Logistic Regression. Stochastic Gradient Descending is considered to be a form of model training rather than a machine learning algorithm. In our study, we employed the SGDClassifier from scikit-learn, which can be interpreted as a linear SVM and Perceptron classifier with SGD training. Furthermore, Elastic Net Regularization was utilized here.
Clustering of cells was performed using k-means (initialized with KMeans++, 300 maximum iterations, and 10 re-runs) with respect to the silhouette plot as also biological assignment and thus a grouping of k = 4.
Distribution of peaks in cluster 1 correlate to the cell nucleus: the range of 600–800 cm−1 can be associated with the ring stretching vibrations of DNA/RNA.39 The wave numbers 783 cm−1 and 825 cm−1 are typical peaks derived from nucleic acids, their pyrimidine rings, and asymmetrical PO2 double bonds.40 Peak 1087 cm−1 is assigned to symmetric stretching of phosphate.40 1583 cm−1 represents the CC stretching of purine.39 The findings indicate that the spectra are derived from an area with a high DNA/RNA concentration, as expected in the cell nucleus.
Cluster 2 is characterized by the presence of intense Raman peaks at the following positions: 747, 1127, 1312, and 1584 cm−1. These bands can be attributed to cytochrome c,41 suggesting that cluster 2 can be identified as rich in mitochondria.
Clusters 3 seems related to cluster 1, while cluster 3 has higher fatty acid and lipid peaks which can be assigned to C–N bonds in membrane phospholipids.42 Presumably, cluster 3 contains a higher membrane content. In addition to portions of the cell membrane, the distribution pattern of cluster 3 shows water content and correlation with proteins of cytoplasmic origin.43
Cluster 4 has increased values at 863, 1071, 1263, 1301, 1439, 1656, and 1745 cm−1. Peaks in the area of 1000–1200 cm−1 and bands around 1301 cm−1 and 1439 cm−1 are known to originate from fatty acids.43 Bands at 1263, 1301, 1439, and 1656 cm−1 are distinctive of diverse contents attributed to lipids.39 The peak at the wavenumber 1745 cm−1 can be attributed to CO stretching found in the ester group of lipids and phospholipids.44 It is likely that cluster 4 is measured within a fatty cell component with high lipid concentrations like lipid droplets or lipid vesicles. We further guided our findings through reference cell imaging as shown in Fig. 6. As reference, a sketch of cell compartments is shown in Fig. 6a with assignment of cluster 1 to the nucleus, cluster 2 to mitochondria, cluster 3 to the cytoplasm with different densities of biological mass and membrane involvement and cluster 4 to a lipid-rich cell organelle, like lipid vesicles or lipid droplets. The color-coded integral of the Raman intensity (400–3700 cm−1) with a Raman spectroscopic resolution of 500 nm across a 20 μm^2 area illustrates measurements of a BTSC-233 GSC cell (Fig. 6b). K-means clustering (k = 5 including background) illustrates comparable subgroups to the clustering of single-point measurements (Fig. 6c). Fig. 6d and e shows a treated HSR-GBM1 DGC counterpart and its Raman intensity (400–3700 cm−1) and comparable clustering by k-means. The mean spectra of each cluster in treated and untreated conditions are shown in Fig. 6f (untreated) and Fig. 6g (treated) serving as references for our previous endeavoured cell organelle classification.
Primary areas of GSCs in cluster 1 can be described in the areas with the most prominent peaks at 783, 1093, 1307, 1334, 1375, 1485, 1490 and 1576 cm−1. DGCs are characterized most distinguishably by dominant signals at 953, 1001, 1030 cm−1 and 1670 cm−1.
The main differences in cluster 2 predominant in GSCs are around 936, 1110, 1333, 1382 and 1653 cm−1, DGCs show highest peaks at 693, 747, 1174, 1311 and 1592 cm−1.
In cluster 3, the main areas of the GSC fraction are 1050 and 1435 cm−1. The main areas of the DGC fraction are leading at 786, 1256, 1371, 1482 and 1577 cm−1.
Cluster 4 contains GSC leading areas around 1261, 1438 and especially 1657 cm−1 has a high intensity. DGC dominating areas can be observed at 1001, 1128, 1337, 1579 and 1680 cm−1.
Machine learning algorithms perform superior according to Table 2. An overall accuracy of 60.3% can be achieved by k Nearest-Neighbors algorithm (Table 2a). Best predictions within each cell component are 57.3% in cluster 1/nucleus by k Nearest-Neighbors, 72.7% in cluster 2/mitochondria by AdaBoost and Tree Classification, 63.1% in cluster 3/cytoplasm by Naive Bayes and 91.7% in cluster 4/lipid droplets by Stochastic Gradient Descending (Table 2b).
Algorithms | Accuracy (%)a | Precision (%) | Recall (%) | Accuracy of cluster 1 (%)b | Accuracy of cluster 2 (%) | Accuracy of cluster 3 (%) | Accuracy of cluster 4 (%) | Accuracy of HSR-GBM1 (%)c | Accuracy of JHH520 (%) | Accuracy of NCH644 (%) | Accuracy of BTSC-407 (%) | Accuracy of BTSC-233 (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
a Accuracy (main target), precision, and recall of GSC prediction. b GSC classification, accuracy of each cluster/cell compartment. c GSC classification, accuracy of each cell line. | ||||||||||||
Artificial Neural Network | 54.9 | 55.7 | 54.9 | 51 | 63.6 | 58.5 | 66.7 | 62.2 | 58.3 | 51.2 | 59.4 | 65.8 |
Support Vector Machine | 51.6 | 50.3 | 51.6 | 55.2 | 63.6 | 56.9 | 66.7 | 56.8 | 52.8 | 58.1 | 62.5 | 57.9 |
Stochastic Gradient Descending | 47.8 | 51.4 | 47.8 | 47.9 | 63.6 | 53.8 | 91.7 | 62.2 | 55.6 | 55.8 | 59.4 | 52.6 |
k Nearest Neighbour | 60.3 | 62.9 | 60.3 | 57.3 | 63.6 | 58.5 | 33.3 | 59.5 | 63.9 | 58.1 | 71.9 | 52.6 |
Logistic Regression | 47.8 | 51.2 | 47.8 | 45.8 | 54.5 | 52.3 | 83.3 | 59.5 | 55.6 | 55.8 | 56.2 | 52.6 |
Gradient Boosting | 50.5 | 52.9 | 50.5 | 47.9 | 18.2 | 60 | 50 | 54.1 | 36.1 | 58.1 | 62.5 | 63.2 |
AdaBoost | 56 | 59.1 | 56 | 49 | 72.7 | 56.9 | 41.7 | 48.6 | 41.7 | 48.8 | 65.6 | 60.5 |
Random Forest | 53.3 | 56.8 | 53.3 | 50 | 54.5 | 52.3 | 58.3 | 45.9 | 52.8 | 67.4 | 71.9 | 55.3 |
Tree | 54.9 | 57.2 | 54.9 | 50 | 72.7 | 41.5 | 58.3 | 59.5 | 63.9 | 53.5 | 34.4 | 52.6 |
Naive Bayes | 48.4 | 49.7 | 48.4 | 52.6 | 63.6 | 63.1 | 75 | 73 | 55.6 | 62.8 | 62.5 | 42.1 |
In cell lines, the best predictions are 73% in HSR-GBM1 by Naive Bayes, 63.9% in JHH520 by k Nearest-Neighbors, 67.4% in NCH644 by Random Forrest, 71.9% in BTSC-407 by Random Forrest and 63.2% in BTSC-233 by Gradient Boosting (Table 2c).
As we are aiming to use the algorithms in surgical practice on larger areas of tissue, the time of acquisition of data that might be defining of GSC residence must be as short as possible. Time can be saved significantly by narrowing the focus of data acquisition to a few selected peaks instead of the entire spectrum acquisition. Considering the described results on putative biological stratification and scaling of the algorithms, we performed a feature selection of distinct peaks from our previous analysis and biological review for the most promising cell compartments. Selecting the top five LDA scaling values, as seen in Fig. 7a, achieves 65.3% accuracy by ANN (Precision 0.681, Recall 0.653) in cluster 2. Narrowing the number of peaks in lipid organelles while still using PCA achieves likewise results of 91.7% by ANN. Selection of up to 10 out of all leading lipid-rich peaks results in a maximum accuracy of 69.4% by SVM (Precision 0.708, Recall 0.694), selection of single peaks achieves an accuracy of up to 63% for 1001 cm−1. Interestingly 1001 cm−1 (associated with tryptophan) scores 80% (Precision 0.81; Recall 0.80) by Tree classification in cluster 2 (cytochrome c-rich compartment).
We hypothesize these statements based on the fact that very recently, machine learning- and deep learning-supported Raman spectroscopy diagnostics has innovated a wide range of sectors in biology and medicine, such as food quality control50,51 or pathogen analytics52 incl. the establishment of a rapid SARS-CoV2 diagnostics that circumvents the long waiting times for the results when using amplification-based PCR tests, meanwhile maintaining high specificity and sensitivity of the test.53 Regarding cancer, machine learning-assisted Raman spectroscopy diagnostics was recently shown to allow the stratification of tumors that are resistant to immune therapy,54 raising hopes to improve the economic effectivity of these revolutionary but certainly very cost-intense cancer therapies. Moreover, machine learning Raman spectroscopy can also classify neural differentiation stages of human induced pluripotent stem cells55 or neural stem cells.25
Based on our experimental setup, we focus our discussion on the clustered cell organelles as well as a few novel selected peaks that we found of most relevance and interest due to the amplitude of differences between GSC and DSC and the previous work of others that associate those signals with biological processes.
In general, GSCs have prevailing areas indicating mainly a greater lipid concentration and different protein structure of amide I and III: 1304 cm−1 is assigned to the CH2 deformation of lipids, 1435–1439 cm−1 is attributed to lipids CH2 scissoring, CH2 bending mode, and CH2 deformation. 1447 cm−1 is a common peak of lipids, fatty acids, phospholipids, and methylene groupings such as CH2 and CH3. 1650–1658 cm−1 is typical for the v(CC) stretch of lipids and fatty acids. It is also assigned to a-helix structure and amide I. 1550 cm−1 can be further described by tryptophan and NADH.42 Significantly higher peaks of DGCs at 407 and 411 cm−1 can be attributed to saccharide.55,56
Savitzky–Golay filtered negative second-derivative confirms noticeable differences by finding an increase of 1207 cm−1 and 1250 cm−1 in DGCs, which complies with the ring breathing mode of hydroxyproline, tryptophan, phenylalanine, adenosine and tyrosine, as also guanine, cytosine and amide III. 1745 cm−1 is increased in GSCs and assigned to the CO stretch of lipid esters, triglycerides, and phospholipids.42
Regarding machine learning classification, main scaling values of LDA can be correlated to protein (963, 1591 cm−1), especially amide 1 (1655, 1685 cm−1), ribose (1018 cm−1), nucleic acids (1655 cm−1) and lipids (1655 cm−1), which follows our findings. Besides confocal measurements, our machine learning algorithms faced the challenge of heterogenic cell lines. Considering this limitation, a general prediction of 60.3% across all organelles is the result of overfitting. This is underlined by the inability of the most promising algorithms like SVM and ANN, which are particularly vulnerable to overfitting despite parameter optimization.
To accentuate the impact and challenge towards the biological limitation in GBM, we would like to discuss our cell line response: regarding tumor heterogeneity, we found that the described cell line transcriptomes of our previous studies and the spectral profiles share affinities, as well as a strong correlation with the cell line response to BMP4 as published earlier by our laboratory.34 Exemplary, the more responsive 407p can be more accurately classified with 71.9% compared to JHH520 with 63.9%.
As in compliance with our measurement setup, we overcome limitation of confocal measurements by automatic cell organelle classification. This further allows a detailed description of differences in GSCs and DGCs in each compartment: it has been described that cluster 1 has its origin in the nucleus while cluster 3 showed similarities and was described to have a cytoplasmic origin due to the water distribution and absence of characteristic peaks, as also partly membrane origin. GSCs seem to have higher content of nucleic acids in cluster 1, because of higher areas at 784, 1483–1491 and 1575–1579 cm−1. On the other hand, DGCs have higher nucleic acid content in cluster 3 in similar areas at 784, 1371, 1487, 1575–1583 cm−1. GSCs also have higher amounts of lipids in cluster 1 around 1375 cm−1, and CH2 groups around 1431–1447 cm−1 in cluster 3. DGCs are leading in phenylalanine in cluster 1 at 1001–1005 and in cluster 3 at 1583 cm−1. They also show higher amide I bands at 1665–1573 cm−1 in cluster 1. Thus, we confirmed the former findings by identifying higher lipid quantity in GSCs and showed larger amide I bands in DGCs cytoplasm.39 Additionally, the shift of nucleic acids from cluster 1 to cluster 3 during differentiation could be a potential conduct of GSCs. The propagation of nucleic acids in cluster 3, an estimated membrane content like rough endoplasmic reticulum, could be debated as increased transcription because of higher RNA content near the ribosomes. Because nucleic acids were identified as a main target for stem cell identification in previous findings, further research is needed to reveal deeper insights into the changes in nucleic acids in stem cells.57 We find main intensities in GSCs at 1439 cm−1 and 1650–1662 cm−1 in the liposomal cluster 4. These areas are described as lipids, CH2, CH3, v(CC) cis of phospholipids, triglycerides, cholesterol band, ceramide backbone, and C
C groups of unsaturated fatty acids. The quotient of the peak intensities at 1656 cm−1/1444 cm−1 is often used as an approximation of the unsaturation degree in fatty acids. In this cluster, GSCs have a 3.31% higher amount of unsaturated lipids.43,58
After clustering, we extracted great results of 91.7% in lipid organelles, which we discussed as lipid droplets. This matches the reports published about lipid droplets in stem cells being differently configured than their differentiated progeny.45,59–61 One inference for the obtained favourable performance against the limited quantity of measurements may be the very intense Raman signal of the lipid bands, as well as the biological valence of these organelles. Regarding clinical usage, an intracellular measurement inside a lipid organelle would lead to a confident hit above 90%. Building on this potential, we strongly recommend that this structure be verified and subjected to more rigorous analysis in follow-up work. Expanding on these foundations, additional follow-up work could elaborate on our features to improve classification success. This could be achieved through examining feature importances from random forest and XgBoost, the coefficient values from linear SVC or employing penalised logistic regression (such as LASSO or elastic net penalty) to reduce the number of necessary features for modelling purposes.
Importantly, our data also suggests the realization of GSC detection of selected stand-alone markers as described in results, which could give a huge opportunity for intraoperative application. We speculate that the selection of 4–5 peaks out of over 1400 peaks in general or with focus on enhanced cytochrome signals and lipid bands will give a significant reduction of time for recording compared to total spectra recording allowing the rapid scanning of various cm of length, feasible to be conducted in an intraoperative setting.
From a surgical standpoint of view, our study has limitations, when it comes to speculating on the clinical potential of our results per se. Firstly, although our applied disease model systems have recently been shown to present a solid basis for repeatable research and recapitulate core molecular parameters of patient tumors,33 facts which we find a fundamental basis to argue any potential translational relevance of our in vitro data, the entire data was generated in pure experimental conditions. Confirmatory studies on fresh tumor material, whether upon short-term in vitro processing or direct tumor resection specimens, are needed to verify our hypothesis. Since we envision to use hand hold Raman devices,62 we do not think ex vivo applications such as simulated with imaging on xenograft in vivo models of cancers are relevant to benchmark our assay for its applicability. Secondly, the Raman microscope applied is a high-end instrument purchased primarily to perform highly sensitive spatially resolved analyses in materials science. It requires verification if our Raman signals can be detected equally with a putatively transferable system in the operation room.
Footnote |
† These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2023 |