Identification of hypermucoviscous Klebsiella pneumoniae strains via untargeted surface-enhanced Raman spectroscopy

Li-Yan Zhang *ab, Jia-Wei Tang b, Ben-Shun Tian b, Yuanhong Huang a, Xiao-Yong Liu a, Yue Zhao b, Xu-Xia Cui b, Xin-Yu Zhang b, Yu-Rong Qin b, Guang-Hua Li *b and Liang Wang *bcde
aLaboratory Medicine, Ganzhou Municipal Hospital, Guangdong Provincial People's Hospital Ganzhou Hospital, Ganzhou, Jiangxi Province, China. E-mail: zhangliyan@gdph.org.cn
bLaboratory Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong Province, China. E-mail: 13922128311@139.com
cDivision of Microbiology and Immunology, School of Biomedical Sciences, The University of Western Australia, Crawley, Western Australia, Australia
dCenter for Precision Health, School of Medical and Health Sciences, Edith Cowan University, Perth, Western Australia, Australia. E-mail: healthscience@foxmail.com
eSchool of Agriculture and Food Sustainability, University of Queensland, Brisbane, Queensland, Australia

Received 18th June 2024, Accepted 30th August 2024

First published on 2nd September 2024


Abstract

Klebsiella pneumoniae is one of the most common causes of hospital-acquired infections, especially due to the emergence of the hypervirulent K. pneumoniae (hvKp) strains. Multiple criteria have been developed to discriminate hvKp strains from classical K. pneumoniae (cKp) strains, such as the presence of candidate genes (e.g., peg-344, iroB, and iucA), a high level of siderophore production, and the hypermucoviscosity phenotype. Although the string test is commonly used to confirm the hypermucoviscosity of K. pneumoniae strains, the method lacks rigor and accuracy. Surface-enhanced Raman spectroscopy (SERS) coupled with machine learning algorithms has been widely used in discriminating bacterial pathogens with different phenotypes. However, the technique has not been applied to identify hypermucoviscous K. pneumoniae (hmvKp) strains. In this study, we isolated a set of K. pneumoniae strains from clinical samples, among which hmvKp strains (N = 10) and cKp strains (N = 10) were randomly selected to collect SERS spectra. Eight machine learning algorithms were used for model construction and spectral prediction, among which the support vector machine (SVM) outperformed all other algorithms with the highest prediction accuracy for hmvKp strains (5-fold cross-validation = 99.07%). Taken together, this pilot study confirms that SERS, combined with machine learning algorithms, can accurately identify hmvKp strains, which can facilitate the fast recognition of hvKp strains when combined with relevant methods and biomarkers in clinical settings in the near future.


1. Introduction

Klebsiella pneumoniae (Kp) is a Gram-negative, encapsulated, and non-motile human pathogen that is also widely distributed in the environment.1,2 The bacterium has two distinct pathotypes: classical Klebsiella pneumoniae (cKP) and hypervirulent Klebsiella pneumoniae (hvKp).3,4 The cKP strains can cause diverse infections including but not limited to bacteremia, pneumonia, and urinary tract infection, especially in neonates, the elderly, and immunocompromised individuals,5 while the hvKp strains are much more virulent and mainly cause community-acquired infections in young and healthy individuals with liver abscess, pneumonia, meningitis, and endophthalmitis.6 Considering the vast differences between the two pathotypes regarding their infection capacity and susceptible population, it is important to differentiate cKP and hvKp strains in clinical settings, which could guide further clinical treatment.

Current methods for discriminating cKP and hvKp strains include string test, colony morphology, Galleria mellonella infection model, and mouse lethality assay.7–9 However, these methods are labor-intensive and time-consuming due to complex procedures, which demand the development of simple and rapid methods.10,11 Early studies attributed the phenotype of hypermucoviscosity as a trait for hvKp,12 which led to the interchangeable usage of hmvKp and hvKp.13 This hypermucoviscosity phenotype was thought to be attributed to the carriage of a virulence plasmid that harbors two CPS regulator genes, rmpA and rmpA2, and several siderophore gene clusters.14 However, later studies found inconsistency in the relationship between hypervirulence and hypermucoviscosity.6,15 Therefore, multiple factors should be considered to discriminate hvKp from cKP, including but not limited to genetic structure, virulence-associated capsule type (K), siderophores for iron acquisition, hypermucoviscosity, etc.13

Currently, hypermucoviscosity is still considered a plausible phenotypic feature of hvKp. However, the commonly used string test for hmvKp recognition is subjective and semi-quantitative. Therefore, it is worth developing a detection method to accurately identify hmvKp. Previously, Zhou et al. systematically reviewed the application of SERS technology in bacterial detection, showing SERS technology as an effective method for distinguishing closely related bacteria.16 Das et al. successfully achieved sensitive detection of R6G molecules at concentrations as low as 10^−12 M and obtained reproducible SERS signals at bacterial concentrations as low as 100 CFU mL−1 by optimizing the SERS substrate.17 Almeida et al. confirmed the feasibility of SERS technology in distinguishing Acinetobacter baumannii and Klebsiella pneumoniae and achieved high prediction accuracy by incorporating machine learning algorithms.18

In this study, we developed a method based on SERS coupled with machine learning algorithms for the accurate screening of hmvKp strains. In particular, the average SERS spectra with characteristic peaks for both hmvKp strains and cKp strains were generated for comparison. The Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) algorithm was then applied to cluster SERS spectra into two independent sets. Eight machine learning algorithms were applied to construct predictive models, among which the Support Vector Machine (SVM) algorithm achieved the highest prediction accuracy at 99.07% (5-fold cross-validation). Taken together, we conclude that SERS coupled with the machine learning method has promising potential in accurately identifying hmvKp strains in clinical settings.

2. Methods and materials

2.1 Screening of hypermucoviscous K. pneumoniae strains

All K. pneumoniae strains were collected at the Department of Laboratory Medicine, Guangdong Provincial People's Hospital, China. Ten hmvKp and ten cKp strains were randomly selected for analysis in this study. All the Kp strains were grown in commercial Luria Bertani (LB) liquid culture to the exponential growth phase and harvested by centrifugation at 4500 rpm for 8 min, followed by separating the supernatant and pellet. Pellets were resuspended in 2 mL sterile distilled deionized water (ddH2O). Bacterial concentration was determined by a traditional plate-counting test. A suitable amount of bacterial liquid culture was picked up with a sterile loop, inoculated on blood agar plates, and then incubated in a 37 °C carbon dioxide incubator for 24 h. The semi-quantitative string test was then conducted to identify hmvKp strains. In particular, hmvKp-positive strains generated a string longer than 5 mm when bacterial colonies on an agar plate were stretched using an inoculation loop or needle.

2.2 Preparation of silver nanoparticles as SERS substrate

The methods to synthesize silver nanoparticles (AgNPs) have been well-recorded in a previous study.19 In particular, 200 mL of ddH2O and 33.72 mg of silver nitrate (AgNO3) were added to a triangular flask while being stirred and heated to boiling. 8 mL of sodium citrate (Na3C6H5O7) was then added to the solution, which was heated for 40 min at 650 rpm. After that, heating was terminated while stirring continued until the solution cooled down to room temperature (RT). 1 mL of the prepared solution was then transferred to a clean Eppendorf (EP) tube and centrifuged for 7 min at 7000 rpm. The supernatant was discarded, and the pellet was resuspended with 100 μL of ddH2O, which was then stored at RT without light for long-term use.

2.3 Collection of bacterial SERS spectra

SERS spectra were collected using the InVia™ Confocal Raman Microscope (Renishaw, UK). The Raman spectroscope was equipped with a 785 nm diode laser, achieving a spectral resolution of less than 1 cm−1. A bacteria sample of 10 μL was mixed with 10 μL of AgNPs, which was then incubated for 15 min at RT to ensure that silver nanoparticles sufficiently interacted with bacterial samples before dropping the mixture on a silicon wafer. The instrument's wavelength was calibrated automatically using an interior silicon wafer plus manual adjustment of the external silicon wafer by setting the silicon peak at 520 cm−1. Bacterial samples were excited with a near-infrared 785 nm diode laser in a range of 500–1800 cm−1. The Raman excitation light was focused onto the sample using a 50× objective lens with a laser power of 150 mW. To ensure the stability and reproducibility of the results, a fixed integration time of 20 seconds per spectrum was implemented. For each Kp strain, a total of forty-five spectra were collected under controlled conditions with constant room temperature, guaranteeing the consistency of spectral acquisition for each sample.

2.4 Average SERS spectra and deconvolution analysis

The average intensity of all replicated SERS spectra for each bacterial sample (n = 45/sample) at each Raman shift was calculated to generate average SERS spectra. The spectral standard deviation (SD) was computed and visualized in each average SERS spectrum to indicate the stability and repeatability of the experimental data. The software Origin (OriginLab, United States) was used to plot the average SERS signal, in which the shaded error bands represented standard deviations. The wider the error band, the worse the reproducibility. Spectral characteristic peaks were generated using LabSpec6 (HORIBA Scientific, Japan). Specifically, the GaussLoren function was used to fit peaks with parameters set to level = 15% and size = 20. To explore the differences in SERS spectra between hmvKp and cKp, spectral deconvolution was conducted to process the average SERS signal. Specifically, the Fit Peaks (Pro) function in Origin software was used to fit characteristic peaks, and the Voigt function, the convolution of a Gaussian and a Lorentzian function, was used to generate deconvolution sub-bands for each average SERS spectrum. The biological meanings of all the characteristic peaks were sourced from the literature.
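The averaging and deconvolution steps described above can be illustrated with a minimal Python sketch. It is not the Origin or LabSpec6 workflow used in this study: the spectra are synthetic, the two-band model, the band parameters, and the function names voigt_band and two_bands are assumptions made for illustration, and SciPy's voigt_profile stands in for the Voigt fitting function.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import voigt_profile


def voigt_band(shift, amplitude, center, sigma, gamma):
    """One Voigt sub-band: Gaussian (sigma) convolved with Lorentzian (gamma)."""
    return amplitude * voigt_profile(shift - center, sigma, gamma)


def two_bands(shift, a1, c1, s1, g1, a2, c2, s2, g2):
    """Sum of two Voigt sub-bands (hypothetical model for illustration)."""
    return voigt_band(shift, a1, c1, s1, g1) + voigt_band(shift, a2, c2, s2, g2)


# Synthetic stand-in for 45 replicate spectra of one sample (not real data).
rng = np.random.default_rng(0)
shift = np.linspace(600.0, 800.0, 401)  # Raman shift axis, cm^-1
true_params = (900.0, 653.0, 6.0, 4.0, 1500.0, 730.0, 5.0, 5.0)
replicates = two_bands(shift, *true_params) + rng.normal(0.0, 0.5, (45, shift.size))

# Average spectrum and point-wise standard deviation (the shaded band).
mean_spectrum = replicates.mean(axis=0)
sd_spectrum = replicates.std(axis=0, ddof=1)

# Deconvolution: least-squares fit of the sub-bands to the average spectrum.
initial_guess = (800.0, 650.0, 5.0, 5.0, 1200.0, 735.0, 5.0, 5.0)
fitted, _ = curve_fit(two_bands, shift, mean_spectrum,
                      p0=initial_guess, bounds=(0.0, np.inf))
fitted_centers = (fitted[1], fitted[5])  # fitted band positions, cm^-1
```

In practice the number of sub-bands and their starting positions would be chosen from the observed peaks, and the fit quality would need to be checked for each spectrum.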

2.5 Clustering analysis of SERS spectra

To explore the inherent differences of SERS spectra between hmvKp and cKP samples, two clustering algorithms, T-distributed Stochastic Neighbor Embedding (TSNE) and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), were implemented for analyzing and visualizing the bacterial spectra. For TSNE analysis, the TSNE function from the scikit-learn library (version 0.21.3) was utilized with the parameters set to n_components = 2 and learning_rate = 60. The first two dimensions obtained from TSNE dimensionality reduction were used as the X and Y axes for data visualization. In OPLS-DA analysis, the data matrix was imported into the SIMCA software version 13.0 32 bit (Umetrics, Sweden). The model type was set to OPLS-DA, and then the auto_fit function was performed. The software automatically calculated R2X, R2Y, and Q2 as comprehensive metrics to evaluate the model's performance.
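As a hedged illustration of the TSNE step, the sketch below applies scikit-learn's TSNE with the two parameters named above (n_components = 2 and learning_rate = 60). The input matrix is synthetic, and the random_state argument and array names are assumptions added for repeatability; they are not taken from the study.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for SERS spectra (rows = spectra, columns = Raman shifts).
rng = np.random.default_rng(0)
n_per_class, n_shifts = 60, 200
baseline = rng.normal(size=n_shifts)
class_a = baseline + rng.normal(0.0, 0.3, (n_per_class, n_shifts))
class_b = baseline + 0.2 + rng.normal(0.0, 0.3, (n_per_class, n_shifts))
spectra = np.vstack([class_a, class_b])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# Parameters named in the text; random_state is added here for repeatability.
tsne = TSNE(n_components=2, learning_rate=60.0, random_state=0)
embedding = tsne.fit_transform(spectra)

# The two embedding dimensions serve as the X and Y axes of the scatterplot.
x_axis, y_axis = embedding[:, 0], embedding[:, 1]
```

Because TSNE is unsupervised, the labels are used only to color the points after the embedding has been computed.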

2.6 Supervised learning analysis of SERS spectra

To identify hmvKp strains accurately, we compared the performance of eight machine learning algorithms, including Adaptive boosting (AdaBoost), Bootstrap Aggregating (Bagging), Decision Tree (DT), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGB). The selection of hyperparameters and the generation of decision models are crucial factors in machine learning for sample prediction. In this study, we pre-defined the parameter ranges for each machine learning algorithm. For details on these hyperparameters, please refer to ESI Table S1. We employed the GridSearchCV function (cv = 5) to find the optimal combination of hyperparameters, which performed a grid search over the specified parameter space. By plotting the grid search gradient, we could assess whether the model parameters were being sufficiently iterated. The resulting parameter combination was used to train the predictive model. Before model training, we used the train_test_split function to divide the dataset: 80% of the data formed the training set, within which 20% was allocated as a validation set to detect potential overfitting, and the remaining 20% was reserved as a test set, allowing us to evaluate the model performance on unknown data.
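The splitting and grid-search procedure can be sketched as follows for the SVM case. This is one reading of the split described above (80/20 train/test, then 20% of the training portion held out for validation). The data are synthetic, and the parameter grid is hypothetical because the actual ranges are given only in ESI Table S1.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for SERS spectra (not real data).
rng = np.random.default_rng(0)
n_per_class, n_shifts = 100, 50
X = rng.normal(0.0, 1.0, (2 * n_per_class, n_shifts))
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = cKp, 1 = hmvKp
X[y == 1, :10] += 1.0  # class-dependent offset on the first 10 features

# 80% training / 20% test, then 20% of the training part as validation.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2,
    stratify=y_train_full, random_state=0)

# Hypothetical grid; the ranges used in the study are in ESI Table S1.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)  # refits the best combination on X_train

val_accuracy = search.score(X_val, y_val)     # check for overfitting
test_accuracy = search.score(X_test, y_test)  # performance on unseen data
```

The cv_results_ attribute of the fitted search object holds the mean score of every parameter combination, which is the kind of information a grid-search performance plot would draw on.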

2.7 Model performance evaluation

To evaluate the recognition and prediction capabilities of the eight supervised machine learning algorithms, we computed six evaluation metrics for each model, including accuracy (accuracy_score), precision (precision_score), recall (recall_score), F1 (f1_score), 5-fold cross-validation (cross_val_score), and area under the curve (roc_auc_score). In particular, the average parameters for precision and recall were set to micro and macro, respectively. Considering the interdependence of precision and recall, the F1 score was computed as the harmonic mean of the two metrics, with the average parameter set to weighted. We employed a 5-fold cross-validation (CV) method to guard against overfitting, splitting the training dataset into five equally-sized subsets using the cross_val_score function with parameter cv = 5. Unlike the above metrics, AUC does not depend on the selection of a classification threshold. The larger the area under the curve, the better the model performs. In this study, the roc_auc_score function was used to calculate AUC values. These evaluations provided insights into the model's performance and discriminating capabilities across various thresholds. Additionally, we created confusion matrices for each model to distinguish between hmvKp and cKp. These confusion matrices were generated by the confusion_matrix function, providing a detailed representation of the model's predictions for different types of SERS spectra in matrix form.
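A minimal sketch of how the six metrics can be computed with the scikit-learn functions named above is given below. The data are synthetic and the linear-kernel SVM is an arbitrary choice for illustration; only the function names and the averaging settings (micro, macro, weighted, cv = 5) come from the text.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for SERS spectra (not real data).
rng = np.random.default_rng(0)
n_per_class, n_shifts = 100, 50
X = rng.normal(0.0, 1.0, (2 * n_per_class, n_shifts))
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = cKp, 1 = hmvKp
X[y == 1, :10] += 1.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = SVC(kernel="linear").fit(X_train, y_train)
y_pred = model.predict(X_test)
scores = model.decision_function(X_test)  # continuous scores for the ROC curve

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision_micro": precision_score(y_test, y_pred, average="micro"),
    "recall_macro": recall_score(y_test, y_pred, average="macro"),
    "f1_weighted": f1_score(y_test, y_pred, average="weighted"),
    "cv5_mean": cross_val_score(SVC(kernel="linear"),
                                X_train, y_train, cv=5).mean(),
    "auc": roc_auc_score(y_test, scores),
}
```

For single-label classification, micro-averaged precision is numerically identical to accuracy, so these two metrics carry the same information under the averaging settings described here.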

3. Results

3.1 Average and deconvoluted SERS spectra

Different bacteria exhibit distinct variations in Raman intensities and the distribution of characteristic peaks, which could be utilized to discriminate bacterial species and phenotypes. In this study, we calculated the average Raman intensities at each Raman shift to generate the average SERS spectra for hmvKp (Fig. 1A) and cKp (Fig. 1B). To assess the reproducibility of the data, the standard deviations of SERS spectra for both hmvKp and cKp were calculated and presented as shaded regions in each average SERS spectrum. The narrow shaded regions indicated good reproducibility of the SERS spectra. To further assess the reproducibility of the SERS spectra, a comprehensive analysis was conducted to visualize variations in three representative characteristic peaks for each SERS spectrum, that is, 653 cm−1, 730 cm−1, and 1329 cm−1. The Relative Standard Deviations (RSD) for the representative characteristic peaks for both bacteria types were within acceptable ranges (Fig. 1C and D), demonstrating that the SERS spectra collected in this study had good robustness. Since the average SERS spectra of hmvKp and cKp showed minimal differences, we utilized the spectral deconvolution method to further refine the distinctions between the two average SERS spectra, and the characteristic peaks for each spectrum were shown in Fig. 1E and F, respectively. The two SERS spectra were observed to have seven common spectral characteristic peaks. However, there were noticeable variations in the intensities of these peaks. In particular, the peak at 653 cm−1 was associated with COO-bending in tyrosine.20 Research shows that when tyrosine kinase mutations are encoded on the chromosome, both increased mucoidy and increased cell-free extracellular polysaccharide (EPS) are observed.21 The 730 cm−1 peak was identified as Adenine.22 When this component is methylated, it serves as a diagnostic biomarker for the pathogenicity of K. pneumoniae.23 The relative signal intensity differences of the two peaks were more pronounced in hmvKp than in cKp. In addition to the shared characteristic peaks, each average SERS spectrum exhibited distinctive features. For example, the observed characteristic peak at 1218 cm−1 in cKp samples was also ascribed to Adenine, as shown in Fig. 1G.24 This substance plays a key role in virulence gene expression.23 Conversely, the distinct peak at 1690 cm−1 was attributed to the vibrational mode of Amide I of protein in hmvKp (Fig. 1H).25 The vibration of such proteins may be related to the inner membrane of hmvKp, which could lead to the association of capsular hyperproduction with hypermucoviscosity and hypervirulence.26 Further details on characteristic peaks are provided in ESI Table S2.
Fig. 1 Average and deconvoluted SERS spectra of hmvKp and cKp strains. (A and B) Average SERS spectra of hmvKp (N = 450) and cKp (N = 450). Error bands are shown as shaded regions; the narrower the error band, the higher the repeatability of the SERS spectrum. The X-axis represents Raman shifts in the range of 530–1800 cm−1, while the Y-axis represents the relative Raman intensity in arbitrary units (a.u.). (C and D) The RSD of 653, 730, and 1328 cm−1 characteristic peaks for each SERS spectrum of hmvKp and cKp strains. (E and F) Deconvoluted SERS spectra of hmvKp and cKp strains. (G and H) Characteristic peaks of SERS spectra and their biological meanings in hmvKp and cKp strains.
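The RSD reported for the representative peaks is the standard deviation of the peak intensity across replicate spectra divided by its mean, expressed as a percentage. A small sketch of that calculation is shown below. The function name peak_rsd, the use of the sample standard deviation (ddof = 1), and the toy intensities are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def peak_rsd(spectra, shift, peak_positions):
    """Relative standard deviation (%) of intensity at each requested peak.

    spectra: array (n_replicates, n_shifts); shift: array (n_shifts,).
    The sample SD (ddof=1) is an assumption; the text does not specify it.
    """
    spectra = np.asarray(spectra, dtype=float)
    shift = np.asarray(shift, dtype=float)
    result = {}
    for position in peak_positions:
        # Use the sampled Raman shift closest to the requested peak position.
        column = int(np.argmin(np.abs(shift - position)))
        intensities = spectra[:, column]
        result[position] = 100.0 * intensities.std(ddof=1) / intensities.mean()
    return result


# Toy replicate intensities (invented numbers) at two Raman shifts.
toy_shift = np.array([653.0, 730.0])
toy_spectra = np.array([[10.0, 20.0], [12.0, 20.0], [14.0, 20.0]])
rsd = peak_rsd(toy_spectra, toy_shift, [653, 730])
```

In this toy case the intensities at 653 cm−1 have a mean of 12 and a sample SD of 2, giving an RSD of about 16.7%, while the constant intensities at 730 cm−1 give an RSD of 0%.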

3.2 Clustering analysis of hmvKp and cKp SERS spectra

Clustering analysis allows for the identification of common characteristics between data of the same type, thereby dividing different types of SERS spectra into separate clusters.27 In this study, we utilized two clustering algorithms, TSNE and OPLS-DA, to analyze the SERS spectral data of hmvKp and cKp strains. According to the results presented in Fig. 2, it could be seen that the TSNE was unable to accurately differentiate SERS spectra of hmvKp and cKp strains, and there was no clear discernible boundary between different sample points (Fig. 2A). In contrast, OPLS-DA had prior knowledge and learned the relationship between partial data of the same type, enabling it to better discriminate between two types of SERS spectra (Fig. 2B). The clustering effects of OPLS-DA analysis were quantified by three parameters, namely, R2X (cum) = 0.996, R2Y (cum) = 0.873, and Q2 (cum) = 0.864, which indicated that the algorithm could effectively identify hmvKp and cKp spectra, revealing the associative rules within the data.
Fig. 2 Clustering analysis of SERS spectra of hmvKp and cKp strains through TSNE and OPLS-DA algorithms. (A) Scatterplot of SERS spectra via TSNE algorithm. (B) Scatterplot of SERS spectra via OPLS-DA algorithm.

3.3 Machine learning analysis of hmvKp and cKP SERS spectra

3.3.1 Parameter optimization. Eight machine learning algorithms were recruited for model development based on the training of SERS spectral data to accurately predict hmvKp and cKp strains. The selection of hyperparameters for machine learning models is pivotal. In this study, we predefined the hyperparameter ranges for each model and utilized the GridSearchCV function to iterate and evaluate the performance of different hyperparameter combinations. Fig. 3 shows that the recognition accuracy of each model improved with the combination and iteration of parameters. Among these algorithms, SVM achieved optimal performance with only a few parameter combinations and remained stable. AdaBoost, Bagging, and XGBoost also reached the “steady state” quickly after parameter optimization but exhibited some fluctuations. QDA, on the other hand, showed less significant performance improvement even after extensive parameter combinations.
Fig. 3 Hyperparameter optimization of machine learning algorithms during model construction. The color scheme enhances the visibility of the performance gradient. (A) AdaBoost. (B) Bagging. (C) Decision tree. (D) Linear discriminant analysis. (E) Quadratic discriminant analysis (QDA). (F) Random forest. (G) Support vector machine. (H) eXtreme gradient boosting.
3.3.2 Evaluation of supervised machine learning algorithms. After determining the optimal parameter combination for each algorithm, to establish an effective predictive model for differentiating cKp and hmvKp strains through SERS spectrum analyses, we conducted a comprehensive evaluation of eight different machine learning algorithms by using six performance metrics to assess the discriminative ability of the model. According to the results presented in Table 1, except for QDA, all algorithms achieved a prediction accuracy rate of over 95%, indicating the effectiveness of machine learning algorithms in identifying bacterial samples via SERS spectra. It is worth noting that SVM demonstrated the highest recognition accuracy (accuracy = 99.74%) and had the best overall performance in the remaining metrics. However, due to the high dimensionality of SERS signals, the QDA algorithm encountered difficulties handling the SERS features, resulting in a significant increase in error rate (accuracy = 87.50%).
Table 1 Comparison of the predictive abilities of eight supervised machine learning algorithms in the discriminative prediction of cKp and hmvKp strains based on SERS spectral data
Algorithms Accuracy Precision Recall F1 5-Fold CV AUC
SVM 99.74% 99.74% 99.69% 99.69% 99.07% 0.9998
RF 98.50% 98.50% 97.49% 98.50% 98.98% 0.9849
Bagging 98.00% 98.00% 97.99% 98.00% 98.62% 0.9735
XGBoost 97.50% 97.50% 97.51% 97.50% 95.00% 0.9700
DT 96.50% 96.50% 96.51% 96.51% 92.98% 0.9339
AdaBoost 95.50% 95.50% 95.59% 94.49% 91.02% 0.9223
LDA 95.50% 95.50% 95.45% 94.49% 90.98% 0.9291
QDA 87.50% 87.50% 87.51% 87.50% 74.96% 0.8751


3.4 Confusion matrix

A confusion matrix was utilized to comprehensively evaluate the prediction performance of eight machine learning algorithms on hmvKp and cKp SERS spectra. In this study, we revealed that the SVM algorithm exhibited superior capability in the classification task, achieving a 100% prediction accuracy for hmvKp and cKp spectra in the test dataset. The remaining algorithms also demonstrated relatively stable performance, albeit with some instances of misclassification. For example, the RF algorithm exhibited a 1% error rate for hmvKp and cKp SERS spectra. However, these misclassifications accounted for a low proportion, indicating that these algorithms could be potential tools for distinguishing different bacterial SERS spectra. It is worth noting that the QDA algorithm performed poorly with high-dimensional feature data and sensitivity to redundant features, resulting in misprediction rates of 12% for hmvKp and 13% for cKp, respectively (Fig. 4).
Fig. 4 Confusion matrices of the eight machine learning algorithms for predicting hmvKp and cKp strains. (A) AdaBoost. (B) Bagging. (C) DT. (D) LDA. (E) QDA. (F) RF. (G) SVM. (H) XGBoost. For each confusion matrix, the rows correspond to phenotypes (hmvKp or cKp) determined through experimental validation, while the columns correspond to phenotypes predicted by supervised machine learning algorithms (predicted classes).
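The layout described in the caption (rows for the experimentally determined phenotype, columns for the predicted phenotype) matches the convention of scikit-learn's confusion_matrix function. The toy example below uses invented labels to show how raw counts and row-normalized rates can be obtained; it does not reproduce any result from this study.

```python
from sklearn.metrics import confusion_matrix

class_names = ["hmvKp", "cKp"]

# Invented labels: four spectra per class, one cKp spectrum misclassified.
y_true = ["hmvKp"] * 4 + ["cKp"] * 4
y_pred = ["hmvKp", "hmvKp", "hmvKp", "hmvKp", "hmvKp", "cKp", "cKp", "cKp"]

# Rows = true phenotype, columns = predicted phenotype.
counts = confusion_matrix(y_true, y_pred, labels=class_names)

# normalize="true" divides each row by its total, giving per-class rates.
row_rates = confusion_matrix(y_true, y_pred, labels=class_names,
                             normalize="true")
```

Here the off-diagonal entry in the second row of row_rates (0.25) is the misprediction rate for cKp, which is the kind of per-class error rate quoted in the text.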

4. Discussion

hvKp can cause life-threatening infections in healthy individuals at multiple sites with subsequent metastatic spread; in contrast, cKp is more likely to affect hosts with underlying diseases in hospital settings.4 Therefore, there is a practical demand in clinical settings to accurately discriminate hvKp and cKp strains. However, the identification of hvKp is less appreciated in the clinical laboratory at the current stage due to the lack of clear definitions for the hypervirulent pathogen,6 though it was widely found that hypermucoviscosity is a representative phenotype.6 As far as we know, there is currently no study using Raman spectroscopy to differentiate hmvKp strains from cKp strains. In addition, numerous studies showed that the SERS technique generated much better fingerprint spectra than Raman spectroscopy for bacterial species, which could be easily analyzed via machine learning algorithms.27,28 Therefore, we explored the possibility of using the SERS technique combined with machine learning algorithms to accurately identify hmvKp strains for the first time.

In this study, we first analyzed the average SERS spectral signal and the representative characteristic peaks of hmvKp and cKp strains; high consistency was observed in spectral reproducibility within each group. However, the spectral similarity between Kp strains is often high, making it difficult to differentiate based solely on the average SERS distribution patterns. Deconvolution, as an effective method for extracting spectral features, has previously been successfully used to deconstruct the Raman spectral signals of SARS-CoV-2 and C. auris clades/subclades.29,30 In this study, deconvolution was employed to elucidate the subtle differences between the SERS spectra of hmvKp and cKp. Unique Raman shifts were identified via spectral deconvolution at 1218 cm−1 (Amide III) for hmvKp and at 1690 cm−1 (Amide I) for cKp, though the differences are not sufficient to explain the molecular compositions of the two strains.

Dimensionality reduction and spectral visualization are techniques used for qualitative analysis of the internal structure and patterns in SERS spectral data, allowing for the differentiation of highly similar spectra. TSNE and OPLS-DA algorithms have been widely used in multiple studies,31 such as in the classification of antibiotic resistance profiles of Staphylococcus aureus32 or the identification of different bacterial types in urinary tract infections.17 In this study, the results confirmed that the SERS spectral data from hmvKp and cKp were grouped into separated clusters during OPLS-DA analysis, indicating the two strains were intrinsically different regarding their capsular compositions. However, the predictive capability of this method for unknown data (Q2) is only 0.864, which requires further investigation via advanced instrument and computational methods. In addition, to identify hmvKp and cKp based on the SERS spectra from unknown Kp strains, supervised learning algorithms are needed rather than clustering SERS spectra into different groups.28,33–35 Machine learning, with its powerful feature extraction and data analysis capabilities, has greatly alleviated the difficulty of processing SERS signals. For instance, Tseng et al. developed a machine-learning model for rapidly identifying SERS signals from eight common bacteria in clinical blood samples, which can swiftly determine bacterial Gram-staining classification and antibiotic resistance, effectively guiding early antibiotic use.36 In addition, Rathnayake et al. created an intelligent model for multiplex detection of periodontal pathogens, achieving a 95.6% accuracy in identifying three types of bacteria in samples.37 As for the eight supervised learning algorithms constructed and compared in this study, SVM performed best and achieved the highest predictive accuracy. The results were further confirmed by confusion matrix analysis, which yielded consistent results with the SVM model as the best predictor at 100% accuracy.

However, there are still some challenges in applying SERS technology in clinical practice. For example, the preparation and processing of SERS substrates are still in the laboratory stage, and methods for synthesizing substrates with high stability and strong reproducibility need to be established by forming industry committees dedicated to SERS testing.38 The number of strains and spectra involved in this study is low, which generates a small amount of data for model training and validation, leading to limitations for model robustness and generalization. Therefore, in future studies, data sharing among different research teams, as well as the construction of large Raman spectral databases, is proposed as a solution to this bottleneck.39 Furthermore, the portability of SERS systems presents additional challenges. There are significant differences in data between different Raman spectrometers, even within the same model from the same manufacturer, due to subtle variations in frequency, intensity, and fluorescence levels of optical components.40 Therefore, adopting established system transfer standards would greatly aid in addressing this issue.41 In sum, although a gap remains between laboratory SERS technology and its use in clinical settings, this study confirmed that the low-cost, non-invasive, and label-free SERS technique could differentiate hmvKp and cKp with high accuracy when combined with a machine learning algorithm.

5. Conclusion

Hypermucoviscosity is an important pathogenic phenotype during the infection of K. pneumoniae. It could be used to define the notorious hvKp strains when combined with other phenotypes, genotypes, and biomarkers. Previously, the hypermucoviscosity phenotype was regularly confirmed in K. pneumoniae when a string stretched from a colony via a loop was greater than 5 mm long. However, the method is semi-quantitative and suffers from a high rate of false results. In this pilot study, we developed a method for accurate screening of hmvKp strains by combining the label-free SERS technique with machine learning algorithms. The analysis of average SERS spectra revealed subtle differences in the distribution of characteristic peaks, while the clustering algorithm OPLS-DA showed effectiveness in identifying SERS spectra from hmvKp and cKp strains, revealing the associative rules within the data. Moreover, eight machine learning algorithms were used to construct predictive models with optimized parameters based on SERS spectra, and the comparative analysis showed that the SVM model outperformed all other algorithms and had the best prediction performance, achieving an accuracy of 99.74% with a five-fold cross-validation rate of 99.07%. Taken together, this pilot study shows that the label-free SERS technique, when combined with machine learning algorithms, can accurately identify hmvKp strains, which holds potential for the fast recognition of hypervirulent K. pneumoniae strains when coupled with other phenotypes, genotypes, and biomarkers in clinical settings in the near future.

Data availability

The raw data that were used in this study to support the conclusions are freely available upon request.

Author contributions

LYZ, GHL, and LW conceived the project, designed the experiments, and provided the platform and resources. LYZ and LW contributed to project administration and student supervision. LYZ, JWT, BST, YHH, XYL, YZ, XXC, YRQ, and XYZ conducted experimental and computational investigations. All the authors wrote and revised the manuscript. All the authors approved the submitted version of the manuscript.

Conflicts of interest

The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgements

This study was supported by the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2021A1515220022, 2022A1515220023), Research Foundation for Advanced Talents of Guangdong Provincial People's Hospital (Grant No. KY012023293), and Ganzhou Science and Technology Bureau Project (Grant No. GZ2022ZSF252).

Footnotes

Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4ay01137f
These authors contributed equally to the study.

This journal is © The Royal Society of Chemistry 2024