Qiang Chen‡ a, Tao Shi‡ a, Dan Du b, Bo Wang b, Sha Zhao b, Yang Gao a, Shuang Wang c and Zhanqin Zhang *b
aDepartment of Cardiovascular Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
bDepartment of Anesthesiology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China. E-mail: zhangzhanqin12@sina.com
cInstitute of Photonics and Photon-Technology, Northwest University, Xi'an, China
First published on 14th April 2023
The symptoms of cardiac myxoma (CM) mainly occur when the tumor is growing, and the diagnosis is determined by clinical presentation. Unfortunately, there is no evidence that specific blood tests are useful in CM diagnosis. Raman spectroscopy (RS) has emerged as a promising auxiliary diagnostic tool because of its ability to simultaneously detect multiple molecular features without labelling. The objective of this study was to identify spectral markers for CM, one of the most common benign cardiac tumors, which has an insidious onset and rapid progression. A preliminary analysis of serum Raman spectra was conducted to obtain the spectral differences between CM patients (CM group) and healthy control subjects (normal group). A principal component analysis-linear discriminant analysis (PCA-LDA) model was constructed to highlight the differences in the distribution of biochemical components between the groups according to the obtained spectral information. Principal component analysis was also combined with a support vector machine model (PCA-SVM) based on three different kernel functions (linear, polynomial, and Gaussian radial basis function (RBF)) to resolve spectral variations between the study groups. The results showed that CM patients had lower serum levels of phenylalanine and carotenoids and higher levels of fatty acids than the normal group. The resulting Raman data were used in a multivariate analysis to determine the Raman range that could be used for CM diagnosis. The chemical interpretation of the spectral results, based on the multivariate curve resolution-alternating least squares (MCR-ALS) method, is presented in the discussion section. These results suggest that RS is a promising adjunct tool for CM diagnosis, and that vibrations in the fingerprint region can be used as spectral markers for the disease under study.
Raman spectroscopy is a non-destructive and non-invasive technique that assesses the chemical composition of a sample through Raman scattering generated by the rotational and vibrational modes of molecular bonds. The Raman shifts of various biomolecules (proteins, nucleic acids, lipids, and carbohydrates, among others) therefore provide detailed information about chemical composition at the molecular level, which has shown excellent potential for applications in disease diagnosis and progression monitoring.9 For instance, Durastanti et al. successfully distinguished samples taken from tumor cells from those taken from healthy cells using Raman spectroscopy combined with multivariate statistical analysis.10 Furthermore, for the early detection of Alzheimer's disease from body fluids, Ralbovsky et al. combined Raman spectroscopy with machine learning and showed that saliva and serum carry potential biomarkers for the diagnosis of early Alzheimer's disease.11,12 Raman spectroscopy thus allows the acquisition of biochemical information about biological macromolecules, cells, tissues and organs, as well as the analysis of bodily fluids such as urine, saliva, blood and tears, to reveal the relationship between changes in the Raman spectra and the clinical condition of patients.13–16 The capability of confocal Raman microspectroscopy for non-destructive detection and rapid analysis enables its application as an adjunct to the diagnosis and treatment of diseases associated with biological tissues or bodily fluids.
Despite the potential of confocal Raman microspectroscopy for clinical diagnosis, no studies have been reported on the use of serum Raman spectroscopy to diagnose CM. In the present work, Raman spectroscopy was applied to CM diagnosis as an adjunct tool, with the aim of improving diagnostic accuracy. First, serum samples were collected from healthy subjects and CM patients, and their Raman spectra were acquired. PCA-LDA16 and PCA-SVM17 models were then constructed from the spectral differences after PCA dimensionality reduction to separate the two groups and to determine the diagnostic accuracy of the Raman spectral features of CM patients. In addition, a correlation between serum Raman spectra and echocardiographic data in CM patients was demonstrated. In this manner, we provide a detailed understanding of CM serum spectra to facilitate their possible clinical application.
Characteristics | Normal (n = 21) | CM (n = 23) | P-value |
---|---|---|---|
Age | 46.19 ± 15.03 | 51.78 ± 14.43 | 0.22 |
Gender (male/female) | 10/11 | 10/13 | |
Tumor volume, mm3 | — | 42![]() ![]() |
SVM, developed by Vapnik and Burges, is considered superior to traditional linear methods because of its ability to represent nonlinear features in data.23,24 As a result, SVM is attracting increasing interest for the classification of spectral data.25–28 To date, SVM-based detection of CM has not been investigated. Here, the SVM algorithm was first used to identify changes in serum spectral information between the normal group and CM patients. SVM separates different classes of sample data with an optimal hyperplane, that is, the maximum-margin decision boundary between the classes, and handles nonlinear problems by introducing a kernel function. By mapping the original, linearly non-separable data vectors into a high-dimensional feature space in which they become separable, it provides balanced classification performance and minimizes structural risk. However, including multiple physical parameters in both the spatial and temporal domains leads to complex computations and inefficiencies in the implementation of SVM algorithms.29–31
To overcome these problems, the Raman spectral data can be reconstructed and reduced by PCA, which lowers the dimensionality of the data set while retaining the important biochemical components of the spectrum. The most significant principal components were then used as input variables to construct PCA-SVM models with linear, polynomial and Gaussian radial basis function (RBF) kernels. Performing a feature transformation such as PCA prior to SVM classification of Raman spectra can also act as a denoising step that improves classification performance. PCA-SVM classification models have been applied successfully to the analysis of Raman spectral data. A grid search was used to find the best parameters for the SVM model, namely the penalty coefficient (C) and the kernel parameter (γ), and the PCA-SVM model was evaluated by tenfold cross-validation. A PCA-LDA discriminant model was then constructed to classify the spectra further, and its sensitivity, specificity and overall accuracy were determined by leave-one-out cross-validation (LOOCV). Linear discriminant analysis (LDA) is a supervised machine learning technique for classification problems, but it is not applicable when the number of independent variables is much larger than the number of samples, whereas PCA is. Therefore, PCA is used first for dimensionality reduction and LDA is then used for discriminant analysis, so that the original data are both reduced and discriminated and the classification efficiency is improved. In addition, LDA uses the mean value of each category, which captures the differences between categories in the data and allows better validation and assessment of the classification performance for serum samples from the normal and CM groups.
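The PCA-SVM workflow described above (PCA scores as inputs, three kernels, grid search over C and γ, tenfold cross-validation) can be sketched as follows. This is a minimal illustration on synthetic spectra, not the code used in this study: the scikit-learn pipeline, the number of retained components, the grid values and the toy data generator `make_synthetic_spectra` are all assumptions introduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def make_synthetic_spectra(n_per_class=60, n_points=300, seed=0):
    """Hypothetical two-class 'spectra': one band whose height differs by class."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(800, 3100, n_points)
    band = np.exp(-0.5 * ((axis - 1518) / 30.0) ** 2)
    X = np.vstack([
        height * band + rng.normal(0, 0.05, (n_per_class, n_points))
        for height in (1.0, 0.6)
    ])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def fit_pca_svm(X, y, n_components=5, seed=0):
    """Grid-search C and gamma for linear, polynomial and RBF kernels."""
    pipe = Pipeline([
        ("pca", PCA(n_components=n_components)),  # PCA is refit inside every fold
        ("svm", SVC()),
    ])
    c_values = [0.1, 1, 10, 100]        # assumed grid, not from the paper
    gamma_values = [0.01, 0.1, 1]       # assumed grid, not from the paper
    param_grid = [
        {"svm__kernel": ["linear"], "svm__C": c_values},
        {"svm__kernel": ["poly"], "svm__C": c_values, "svm__gamma": gamma_values},
        {"svm__kernel": ["rbf"], "svm__C": c_values, "svm__gamma": gamma_values},
    ]
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    search = fit_pca_svm(X, y)
    print(search.best_params_, round(search.best_score_, 3))
```

Placing PCA inside the pipeline means the projection is learned only from the training folds, which avoids leaking information from the held-out fold into the model; whether the original analysis was organised this way is not stated in the text.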
LOOCV is a general cross-validation method for error estimation that gives a nearly unbiased estimate of the true error rate of the classifier. It uses a single observation from the original sample as the validation (test) set and the remaining observations as the training set, and the process is repeated so that each observation in the sample is used once as validation data.32
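A minimal sketch of PCA-LDA evaluated by leave-one-out cross-validation is given below, assuming scikit-learn and synthetic data. The helper `toy_spectra`, the number of principal components and the noise level are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline


def toy_spectra(n_per_class=20, n_points=200, seed=0):
    """Hypothetical spectra: a single band that is weaker in class 1."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(800, 3100, n_points)
    band = np.exp(-0.5 * ((axis - 1153) / 30.0) ** 2)
    X = np.vstack([
        height * band + rng.normal(0, 0.05, (n_per_class, n_points))
        for height in (1.0, 0.6)
    ])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def loocv_pca_lda(X, y, n_components=3):
    """Predict each spectrum with a PCA-LDA model trained on all the others."""
    model = make_pipeline(PCA(n_components=n_components),
                          LinearDiscriminantAnalysis())
    return cross_val_predict(model, X, y, cv=LeaveOneOut())


if __name__ == "__main__":
    X, y = toy_spectra()
    y_pred = loocv_pca_lda(X, y)
    print("LOOCV accuracy:", np.mean(y_pred == y))
```

Each of the n spectra is held out exactly once, so the returned vector contains one out-of-sample prediction per spectrum and can be tabulated directly as a confusion matrix.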
Furthermore, using the multivariate curve resolution-alternating least squares (MCR-ALS) procedure from Joaquim's work,33 we analyzed and interpreted the obtained spectra from a chemical rather than a purely statistical point of view, and obtained important information about the pure compounds in both groups. As an iterative self-modeling method, MCR-ALS does not require prior knowledge of the properties and composition of the sample, and it can provide profiles that satisfactorily reflect chemical significance.34 MCR-ALS was performed in the 800–3100 cm−1 range, and all spectral data were background subtracted and normalized.
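The idea behind MCR-ALS is to factor the data matrix D (samples × wavenumbers) into non-negative concentration profiles C and component spectra S, so that D ≈ C·S, by alternating two least-squares steps. The sketch below is a simplified NumPy illustration and not the software used in this work: non-negativity is imposed by simple clipping, and the initial spectra are taken from the most "extreme" measured rows (`pick_pure_rows`, a successive-projection heuristic assumed here).

```python
import numpy as np


def pick_pure_rows(D, k):
    """Pick k rows of D that are most extreme (assumed initialisation heuristic)."""
    R = D / D.sum(axis=1, keepdims=True)      # remove overall intensity
    chosen = []
    for _ in range(k):
        idx = int(np.argmax(np.linalg.norm(R, axis=1)))
        chosen.append(idx)
        u = R[idx] / np.linalg.norm(R[idx])
        R = R - np.outer(R @ u, u)            # project out the chosen direction
    return chosen


def mcr_als(D, k, n_iter=100):
    """Alternate least-squares updates of C and S with non-negativity clipping."""
    S = D[pick_pure_rows(D, k)].clip(min=0)   # (k, n_wavenumbers)
    for _ in range(n_iter):
        # Solve S.T @ C.T = D.T for the concentrations, then clip negatives
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T.clip(min=0)
        # Solve C @ S = D for the spectra, then clip negatives
        S = np.linalg.lstsq(C, D, rcond=None)[0].clip(min=0)
    lack_of_fit = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
    return C, S, lack_of_fit


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    axis = np.linspace(800, 3100, 200)
    S_true = np.vstack([np.exp(-0.5 * ((axis - 1518) / 30) ** 2),
                        np.exp(-0.5 * ((axis - 2929) / 30) ** 2)])
    C_true = rng.uniform(0.1, 1.0, size=(40, 2))
    D = C_true @ S_true + rng.normal(0, 0.002, size=(40, 200))
    C, S, lof = mcr_als(D, k=2)
    print("relative lack of fit:", lof)
```

The resolved profiles are only defined up to scaling and permutation of the components, so component labels in such an analysis have to be assigned by inspecting the resolved spectra.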
Peak shift (cm−1) | Vibrational mode | Major assignments |
---|---|---|
851 | Ring breathing | Tyrosine |
952 | C–C stretching vibration | α-Helix, proline, valine |
1008 | C–C symmetric stretch | Phenylalanine |
1153 | C–C stretch mode | Carotenoids |
1208 | Ring vibration | L-Tryptophan, phenylalanine |
1346 | CH3, CH2 wagging | Tryptophan, adenine, guanine |
1457 | C–H bending | Protein |
1518 | C–C stretch mode | Carotenoids |
1571 | C=C bending vibration | Phenylalanine, acetoacetate |
1663 | C–C stretching vibration | Amide I |
2929 | C–H stretching vibration | Lipids |
3059 | C–H stretching vibration | Lipids |
The intensity of the phenylalanine Raman peak at 1008 cm−1 is slightly reduced in the CM group compared to the normal group.36,37 The peaks at 1153 cm−1 and 1518 cm−1 are both attributed to the C–C stretching mode vibration of carotenoids.38 The intensity of the peaks at 1153 and 1518 cm−1 was significantly decreased in the CM group compared to the normal group, indicating a sharp decrease in carotenoid content in the CM group; the intensity of the carotenoid peaks may also indicate that they are resonance Raman peaks. In contrast, the intensity of the Raman peak at 1663 cm−1 is slightly increased in the CM group compared to the normal group,39 indicating an increased amide content in the CM group. The Raman peak at 2929 cm−1 is likewise stronger in the serum of the CM group than in the normal group,40 which was attributed to increased protein and fatty acid content. Other prominent Raman bands, such as those at 952, 1208, 1457 and 1571 cm−1, are associated with protein CH-bond vibrations.41 To highlight the changes in the serum spectra due to CM, Raman difference spectra were obtained, as shown in Fig. 1B. The difference spectra show a decrease in the intensity of the Raman bands associated with protein components such as phenylalanine and with carotenoids, with significant differences observed at 952, 1008, 1153, 1208 and 1518 cm−1; these changes derive from the vibrational modes of C–H or C–C bonds.37 The most striking observation is the negative feature of carotenoids at two positions, 1153 cm−1 and 1518 cm−1,38 which indicates a decrease in carotenoid content in the CM group. On the other hand, as shown in Fig. 1B, the tryptophan and protein contents represented by the bands at 1346 and 1457 cm−1 showed a weak increase in intensity in the CM group, and at 2929 cm−1 there is a significant increase in Raman intensity.40 These observations indicate that the CM group has an increased content of the aforementioned substances compared to the normal group, which we attribute to the lesion altering the levels of the associated substances in the serum.
To investigate whether the mean spectra of the normal and CM groups differ, an independent two-sample t-test was performed with the spectral intensity as a continuous variable. The mean spectra were calculated from the normalized spectra of the normal and CM groups. A normality test was performed for each group prior to the independent t-test, and a Mann–Whitney U test was used instead for groups that were not normally distributed or had unequal variances. The statistical significance was evaluated for 30 samples randomly selected from the normal and CM groups at the Raman peaks at 1008, 1153, 1457, 1518, 1663 and 2929 cm−1. Violin plots for the randomly selected normal and CM samples show the normalized spectral intensities with intermediate values (solid lines), as shown in Fig. 1C; p-values less than 0.05 at each peak indicate statistically significant differences between the two groups.
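The decision rule described above (normality check, then a t-test or a Mann–Whitney U test) can be sketched with SciPy as follows. The specific tests used here for normality (Shapiro–Wilk) and for equality of variances (Levene) are assumptions, since the text does not name them, and the intensities in the usage example are synthetic.

```python
import numpy as np
from scipy import stats


def compare_peak_intensity(normal, cm, alpha=0.05):
    """Return (test name, p-value) for one Raman peak measured in two groups."""
    normal = np.asarray(normal, dtype=float)
    cm = np.asarray(cm, dtype=float)

    # Assumed checks: Shapiro-Wilk for normality, Levene for equal variances
    both_normal = (stats.shapiro(normal).pvalue > alpha
                   and stats.shapiro(cm).pvalue > alpha)
    equal_var = stats.levene(normal, cm).pvalue > alpha

    if both_normal and equal_var:
        result = stats.ttest_ind(normal, cm, equal_var=True)
        return "t-test", float(result.pvalue)

    # Fall back to the rank-based test when the t-test assumptions fail
    result = stats.mannwhitneyu(normal, cm, alternative="two-sided")
    return "Mann-Whitney U", float(result.pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal_1518 = rng.normal(1.00, 0.05, 30)   # synthetic normalized intensities
    cm_1518 = rng.normal(0.80, 0.05, 30)
    print(compare_peak_intensity(normal_1518, cm_1518))
```

In practice the function would be called once per peak (1008, 1153, 1457, 1518, 1663 and 2929 cm−1), and a correction for multiple comparisons could be considered.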
Fig. 2 (A) PCA score plots for the first two principal components of the normal and CM groups. (B) PCA loading spectra for PC1 and PC2 from the normal group and CM group.
To identify the salient features that distinguish the serum spectra of the normal and CM groups, the first few PCs were used as input variables to LDA to generate a spectral discriminant diagnostic model. Fig. 3A and B depict the linear discriminant scores and the ROC curve of the PCA-LDA model for serum spectra, respectively. Because the dimension of the LDA projection is the number of classes minus one, the two sample groups are projected onto a single discriminant function. As shown in Fig. 3A, the spectra of the normal group were distributed on the positive side of the first discriminant function, while those of the CM group were distributed on the negative side. As shown in Fig. 3B, the area under the ROC curve for the normal and CM groups was 0.9349, which quantifies the performance of the differential diagnostic model for serum spectra; the more the ROC curve bows toward the upper-left corner, the better the model performance. Table 3 presents the LOOCV confusion matrix for the spectral classification of the normal and CM groups. As shown in Table 4, the discriminant model achieved sensitivities of 90% and 81.43% for the normal and CM groups, respectively, specificities of 81.43% and 90%, and an overall classification accuracy of 85%.
Fig. 3 (A) A scatter plot representing the linear discriminant scores of the normal group and CM group. (B) ROC curve results for the normal and CM groups of the PCA-LDA model.
Actual\predicted | Normal | CM |
---|---|---|
Normal group | 45 | 5 |
CM group | 13 | 57 |
Indicators\group | Normal | CM |
---|---|---|
Sensitivity | 0.9000 | 0.8143 |
Specificity | 0.8143 | 0.9000 |
a Overall accuracy: 85%.
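The sensitivity, specificity and overall accuracy reported for the PCA-LDA model follow directly from the LOOCV confusion matrix (45, 5; 13, 57). The short sketch below shows the arithmetic; the function name `per_class_metrics` is introduced here for illustration only.

```python
import numpy as np


def per_class_metrics(confusion, labels):
    """Sensitivity and specificity per class (rows = actual, columns = predicted)."""
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i, i]                    # correctly assigned to this class
        fn = cm[i].sum() - tp            # this class, predicted as another
        fp = cm[:, i].sum() - tp         # other classes, predicted as this one
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    accuracy = np.trace(cm) / total
    return metrics, accuracy


# LOOCV confusion matrix of the PCA-LDA model (actual x predicted)
pca_lda = [[45, 5],
           [13, 57]]
metrics, accuracy = per_class_metrics(pca_lda, ["Normal", "CM"])
print(metrics["Normal"])   # sensitivity 0.90, specificity 57/70 = 0.8143
print(metrics["CM"])       # sensitivity 57/70 = 0.8143, specificity 0.90
print(accuracy)            # 102/120 = 0.85
```

The same function applied to the confusion matrix with entries 10, 0, 1 and 13 gives an overall accuracy of 23/24 = 95.83%.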
Actual\predicted | Normal | CM |
---|---|---|
Normal group | 7 | 3 |
CM group | 0 | 14 |
Indicators\group | Normal | CM |
---|---|---|
Sensitivity | 0.7000 | 1 |
Specificity | 1 | 0.7000 |
a Overall accuracy: 87.5%.
Actual\predicted | Normal | CM |
---|---|---|
Normal group | 7 | 3 |
CM group | 0 | 14 |
Indicators\group | Normal | CM |
---|---|---|
Sensitivity | 0.7000 | 1 |
Specificity | 1 | 0.7000 |
Actual\predicted | Normal | CM |
---|---|---|
Normal group | 10 | 0 |
CM group | 1 | 13 |
Indicators\group | Normal | CM |
---|---|---|
Sensitivity | 1 | 0.9286 |
Specificity | 0.9286 | 1 |
a Overall accuracy: 95.8333%.
In this exploratory study, the differences in serum Raman spectra between patients with CM and normal subjects were identified using confocal Raman microspectroscopy and multivariate analysis methods. The mean spectra in Fig. 1 and the multivariate analyses (PCA-LDA and PCA-SVM) showed that the differences between the serum spectra of the CM and normal groups derive from changes in the content of biomolecules such as carotenoids and related proteins, reflected in the Raman peaks at 1008, 1153, 1208, 1518, 1663 and 2929 cm−1. The marked increase in peak intensity at 2929 cm−1 indicates an increased lipid content in the CM group, and we hypothesize that this elevated peak may be an essential feature for classifying the serum spectra of CM patients and healthy individuals. The Raman band at 2929 cm−1 is derived from fatty acids, and the disruptive effects of fatty acid accumulation in cardiac cells lead to mitochondrial dysfunction,42 which may in turn reduce biochemical activity.43 Because the phenylalanine content can characterize the level of biochemical activity,44,45 the reduced intensity of the Raman peak at 1008 cm−1 may indicate weaker biochemical activity in the CM group. Another report suggested that changes in serum phenylalanine concentrations were a powerful predictor of the risk of cardiovascular disease.46 The intensity of the carotenoid peaks at 1153 and 1518 cm−1 was lower in the CM group than in the normal group, indicating a significantly lower carotenoid content in the CM group.
Such a sharp spectral difference suggests that the carotenoid bands between 1145 and 1170 cm−1 and between 1506 and 1534 cm−1 could be used as spectral markers to distinguish serum samples of CM patients from those of normal subjects.47 Carotenoids have antioxidant activity and protect the body from damage caused by reactive nitrogen species (RNS) and reactive oxygen species (ROS).48,49 Carotenoids are the main precursors of vitamin A in the human diet, and high dietary intakes of carotenoids and their concentrations have been associated with changes in cardiovascular and cerebrovascular diseases and mortality of specific etiologies.50,51 Conversely, a reduction in carotenoid content (reduced intensity of the Raman peak) may similarly lead to increased morbidity and mortality from cardiovascular disease. Our results indicate that the fingerprint spectra of biomolecules such as carotenoids and lipids could be used to discriminate between the serum of CM patients and that of normal subjects. From an exploratory point of view, for specific biochemical pathways to serve as spectral markers of changes in CM patients, they should be significantly characterized and highly visible in the Raman spectra, as are the fingerprint spectra mentioned above. Furthermore, PCs were used as input variables for the SVM algorithm to classify the spectra of the CM and normal groups. A high accuracy of 95.83% was achieved with the RBF non-linear kernel PCA-SVM model (87.5% for both the linear and polynomial kernels), as indicated in the entries in Table 4. These results demonstrate that, after optimizing the kernel function parameters of the SVM algorithm, the RBF kernel PCA-SVM model has better applicability and higher accuracy than the linear or polynomial kernel PCA-SVM models for both the CM and normal groups.
Dimensionality reduction of this kind is particularly important in applications where large amounts of data need to be processed quickly: PCA reduces hundreds of spectral variables to two variables, which greatly reduces the computational effort. Moreover, most of the sample features and the relationships between their variables are nonlinear, and the RBF kernel is itself a nonlinear kernel.52 Thus, based on Raman spectral analysis and multivariate algorithms (PCA-LDA and PCA-SVM), serum Raman spectral features and variations were obtained and serum spectra from the CM and normal groups were successfully classified.
However, it should be emphasized that the results of the PCA-based multivariate analysis only express the correlation between the principal axes, and that these results reflect a statistical rather than a chemical perspective.53 Hence, we used multivariate curve resolution-alternating least squares (MCR-ALS) to discuss the chemical variation in our results and to interpret them in terms of changes in the chemical composition and distribution of the main biochemical components, which aids the analysis of the content variation between the spectra of the CM and normal groups. As shown in Fig. 5A and B, MCR-ALS resolved the mixture of chemical components in the serum samples of the CM and normal groups into individual component contributions, i.e. the MCR-ALS-resolved concentration distribution spectra could be compared with the standards (normal group) to identify the chemical identity (CM group). In the biochemical component concentration distribution profile (Fig. 5A), the first 70 points were randomly selected from the CM group and the last 50 points were randomly selected from the normal group, for a total of 120 points. The biochemical substances represented by component 2 were higher in the CM group and lower in the normal group. Referring to Fig. 5B, the blue color represents the lipid Raman peak at 2929 cm−1, which indicates that lipids were generally higher in serum samples from the CM group than in those from the normal group. In addition, the most significant changes were in the biochemical substances represented by component 1. As shown in Fig. 5A, component 1 was more abundant in the normal group, and, as shown in the component spectrum of Fig. 5B, it is characterized by the peaks at 1008, 1153 and 1518 cm−1.
The above results were consistent with our spectral analysis and multivariate analysis, which indicated the feasibility of the differential diagnosis of CM based on serum Raman spectroscopy combined with multivariate analysis.
In summary, we have made an illustrative attempt to analyze the relationship between serum spectral results and the distribution of biochemical components and metabolism in patients with CM; the deeper relationship between serum spectral results and CM should be further investigated in preclinical models. From the fitted results of the Raman bands investigated in this study, the normalized intensity of the measured Raman bands varied significantly. Moreover, we extracted the normalized peak area and full width at half maximum (FWHM) of the Raman bands whose intensity varied between groups in order to examine the correlation between CM volume and the Raman spectra. The correlation coefficients between peak area and CM volume (r = −0.37, −0.43, −0.45, 0.45 and 0.31 for area 1 (973–1033 cm−1), area 2 (1119–1187 cm−1), area 4 (1481–1550 cm−1), area 5 (1642–1698 cm−1) and area 6 (2841–3012 cm−1), respectively) were moderate, but the associations were statistically significant (p = 2.14 × 10−3, 0.41 × 10−3, 0.19 × 10−3, 0.17 × 10−3 and 0.01, respectively). Several correlation coefficients between FWHM and CM volume (r = 0.49, 0.37 and 0.46 for FWHM 1 (973–1033 cm−1), FWHM 2 (1119–1187 cm−1) and FWHM 4 (1481–1550 cm−1)) were also moderate, and the associations were also statistically significant (p = 4.05 × 10−5, 2.39 × 10−3 and 0.13 × 10−3, respectively). This analysis suggests that peak area and FWHM might be usable indicators of CM volume. The relevant results and discussion are given in the ESI (Fig. S1–S3†).
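A correlation of this kind between a band feature (normalized peak area or FWHM) and tumor volume can be computed as sketched below. The use of the Pearson coefficient is an assumption, since the type of correlation is not specified here, and the volumes and areas in the example are synthetic values, not data from this study.

```python
import numpy as np
from scipy import stats


def correlate_with_volume(feature, volume):
    """Pearson r and two-sided p-value between a band feature and CM volume."""
    r, p = stats.pearsonr(feature, volume)
    return float(r), float(p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = np.linspace(5.0, 60.0, 23)                        # synthetic volumes
    area = 1.0 - 0.01 * volume + rng.normal(0.0, 0.05, 23)     # synthetic areas
    r, p = correlate_with_volume(area, volume)
    print(f"r = {r:.2f}, p = {p:.2e}")
```

A negative r, as reported for areas 1, 2 and 4, means that the band becomes weaker as the tumor volume increases.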
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3ay00180f
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2023