Mert Tunca Doganay,a Purbali Chakraborty,a Sri Moukthika Bommakanti,a Soujanya Jammalamadaka,a Dheerendranath Battalapalli,a Anant Madabhushi,bc and Mohamed S. Draz*ade
aDepartment of Medicine, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA. E-mail: mohamed.draz@case.edu
bDepartment of Biomedical Engineering, Emory University, Atlanta, GA, USA
cAtlanta Veterans Administration Medical Center, Atlanta, GA, USA
dDepartment of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
eDepartment of Biomedical Engineering, Cleveland Clinic, Cleveland, OH 44106, USA
First published on 20th September 2024
Artificial intelligence (AI) is revolutionizing medicine by automating tasks like image segmentation and pattern recognition. These AI approaches support seamless integration with existing platforms, enhancing diagnostics, treatment, and patient care. While recent advancements have demonstrated the superiority of AI in advancing microfluidics for point of care (POC) diagnostics, a gap remains in comparative evaluations of AI algorithms for testing microfluidics. We conducted a comparative evaluation of AI models specifically for the two-class classification problem of identifying the presence or absence of bubbles in microfluidic channels under various imaging conditions. Using a model microfluidic system with a single channel loaded with 3D transparent objects (bubbles), we challenged each of the tested machine learning (ML) (n = 6) and deep learning (DL) (n = 9) models across different background settings. Evaluation revealed that the random forest ML model achieved 95.52% sensitivity, 82.57% specificity, and 97% AUC, outperforming other ML algorithms. Among DL models suitable for mobile integration, DenseNet169 demonstrated superior performance, achieving 92.63% sensitivity, 92.22% specificity, and 92% AUC. Remarkably, integration of DenseNet169 into a mobile POC system demonstrated high accuracy (>0.84) in testing microfluidics under challenging imaging settings. Our study confirms the transformative potential of AI in healthcare, emphasizing its capacity to advance precision medicine through accurate and accessible diagnostics. The integration of AI into healthcare systems holds promise for enhancing patient outcomes and streamlining healthcare delivery.
Tribute to George Whitesides

I joined George at a challenging time in my scientific career, searching for guidance and inspiration. George became that mentor, and working in his lab was both life-saving and life-changing for me. His approach to science—always thinking from unexpected angles—left a lasting impression. What stood out most was not just his unique perspective but the system he developed to manage both the lab and the science, which was unlike anything I had encountered before.

I never left a meeting with George without feeling more inspired. We shared many enriching discussions, particularly around innovation and technology. His probing questions pushed me to think more deeply, especially at the intersections of biology and engineering. Most significantly, George took the time to personally guide and support my career development—an experience his administrative staff often described as exceptional. This personal investment, combined with the support of his outstanding lab management team, made my time in his lab one of the most inspiring periods of my career.

Mohamed Draz
POC diagnostics represent a cornerstone of modern healthcare, offering timely and accessible testing solutions, particularly in resource-limited settings.11–13 The integration of AI into microfluidic systems presents a promising avenue for enhancing the accessibility and efficiency of POC testing.14,15 By harnessing advanced ML and DL algorithms, AI enhances the sensitivity, specificity, and multiplexing capabilities of microfluidic devices, enabling rapid and accurate detection of a wide range of diseases and biomarkers directly at the POC.16–18 An important approach in which AI is utilized to enhance microfluidic systems is image processing. ML and DL models excel at image classification and pattern recognition tasks and can support microfluidic devices in performing rapid and multiplex assays, allowing for comprehensive screening or testing using minimal resources.19–21 This integration addresses critical gaps in healthcare access and empowers a new level of POC diagnostics, equipping frontline providers with actionable insights and revolutionizing the delivery of healthcare services.
Recent advancements have demonstrated superior performance in identifying disease biomarkers, detecting cancer,22 viruses,23 bacteria,24 and other pathogens,25 underscoring the robustness and clinical relevance of AI-integrated microfluidic platforms in modern healthcare settings. However, despite these advancements, there remains a gap in the comparative evaluation of different AI algorithms for testing microfluidics, and the optimal approach for maximizing their performance in this context remains unclear, particularly in POC diagnostics.26–31 In POC settings, practical constraints such as cost, power consumption, memory limitations, and computational efficiency are crucial, making the choice of algorithm highly impactful. For instance, logistic regression is relatively simple, with a complexity of O(n × m), where n is the number of samples and m the number of features. It requires moderate computational power and memory, making it a good fit for POC settings with limited central processing unit (CPU) power and memory.32 Decision trees, with complexity O(n × m × log(n)),33 and random forests, which add an additional factor for the number of trees (O(k × n × m × log(n)),34 where k is the number of trees), require moderate resources. They build tree structures that evaluate multiple features at once. While computationally more demanding than logistic regression, they can still be feasible in many POC setups, especially with fewer trees. Naive Bayes classifiers are computationally efficient due to their feature-independence assumption, with complexity O(n × m), which makes them well suited to resource-limited environments. However, this simplification can reduce predictive performance when feature independence is not a valid assumption.35 On the other hand, support vector machines (SVMs), especially with non-linear kernels, can have significantly higher complexities (O(n²) to O(n³)), making them less suitable for constrained environments without powerful CPUs or graphics processing units (GPUs). However, using linear kernels or approximation methods (e.g., linear SVM or fast SVM) can reduce the computational load, making SVMs a more viable option for POC.36 K-nearest neighbors (K-NN), while simple in terms of training complexity (O(n × m)), can become computationally intensive during inference due to distance calculations against all data points. Optimization techniques like KD-trees (K-dimensional trees) or ball trees can speed up inference, making K-NN more feasible for real-time POC applications.37 Neural networks and deep learning models (e.g., convolutional neural networks (CNNs)) typically have a higher complexity of O(n × m × d), where d is the depth of the network. These models require substantial memory and processing power, particularly GPU/TPU resources (where TPU stands for tensor processing unit), which are not commonly available in POC devices. However, methods like dropout, batch normalization, weight pruning, and model distillation can help reduce the computational burden, allowing more lightweight versions of these models to be deployed on smaller devices.38 Foundation models, i.e., large-scale AI models (e.g., generative pre-trained transformers (GPT), bidirectional encoder representations from transformers (BERT)), present an even bigger challenge due to their high computational demands during both training and inference.
These models often require substantial GPU clusters or high-performance computing (HPC) environments, making them impractical for resource-constrained POC settings. In such cases, pre-trained models fine-tuned for specific tasks or more compact versions of these models (e.g., TinyBERT, DistilBERT) might be used instead.39 This trade-off between computational demands and resource availability emphasizes the importance of balancing model performance with resource constraints in POC settings.
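To make these trade-offs concrete, the following minimal sketch (not part of the study's pipeline) times the fitting and inference of the lightweight classifiers discussed above on a small synthetic dataset with scikit-learn; the dataset size, model settings, and names are illustrative assumptions.

```python
# Illustrative sketch: rough timing comparison of lightweight classifiers on a
# synthetic dataset, mirroring the complexity trade-offs discussed above for
# resource-constrained POC hardware. Not the study's actual pipeline or data.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=64, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),  # ~O(n x m) per pass
    "naive_bayes": GaussianNB(),                               # ~O(n x m)
    "knn": KNeighborsClassifier(n_neighbors=5),                # cheap to fit, costly at inference
    "linear_svm": LinearSVC(),                                 # far cheaper than kernel SVM
    "rbf_svm": SVC(kernel="rbf"),                              # ~O(n^2) to O(n^3)
    "random_forest": RandomForestClassifier(n_estimators=50),  # ~O(k x n x m x log n)
}

for name, model in models.items():
    t0 = time.time()
    model.fit(X, y)
    fit_s = time.time() - t0
    t0 = time.time()
    model.predict(X[:500])
    pred_s = time.time() - t0
    print(f"{name:20s} fit: {fit_s:.3f}s  predict(500): {pred_s:.4f}s")
```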
We employed a model microfluidic system featuring a single microfluidic channel loaded with 3D transparent objects (bubbles). This model is designed to rigorously challenge the performance of commonly used AI models and provide insights into their effectiveness in real-world diagnostic scenarios. We integrated various ML and DL algorithms into our study, including CNNs such as MobileNetV2, ResNet101V2, and DenseNet169, alongside ML models commonly used in healthcare applications, such as Naive Bayes, logistic regression, KNN, SVM, and random forest.40–44 Among the six evaluated ML algorithms, the random forest model performed best, achieving 95.52% sensitivity, 82.57% specificity, and 97% AUC. Similarly, among the nine DL models, DenseNet169 stood out, achieving 92.63% sensitivity, 92.22% specificity, and 92% AUC. Such a comparative study is critical for gaining a comprehensive understanding of the strengths and weaknesses of different algorithms, informing algorithm selection, optimization, and deployment decisions across diverse domains and applications.45–48
In our study, we investigated the efficacy of AI algorithms, including both ML and DL, in facilitating the process of testing microfluidics within POC settings. We employed a microfluidic system comprising a single microfluidic channel to rigorously assess a set of 15 AI models recognized for data analysis and image classification across biomedical and diagnostic domains. Our experimental setup incorporated testing configurations featuring varying densities of bubbles. Bubbles were selected as a readout to probe the imaging and analytical performance of the examined algorithms. Despite bubbles being less prevalent than conventional color-based or fluorescence-based readouts, their inherent 3D transparency poses challenges, as they may be mistaken for non-targeted constituents within the sample matrix, the microfluidic system, or the testing environment and background. In addition, transparent bubbles can introduce challenges such as refraction and variable light scattering, which may impact imaging accuracy and algorithm performance. By using these bubbles, we aimed to simulate complex real-world imaging conditions and evaluate how well the AI models could handle such complexities. Colorimetric readouts, although linear and compatible with a comparatively simpler workflow, fail to sufficiently capture the intricacies necessary for discerning the strengths and weaknesses of the tested algorithms. Meanwhile, fluorescence, although known to support high specificity and sensitivity, remains impractical for widespread POC adoption due to the need for bulky equipment and a specialized setup to achieve the required sensitivity and specificity in most analyses.
Our set of AI algorithms included ML models, such as Naive Bayes, logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), and random forest, alongside DL CNNs such as MobileNetV2, ResNet101V2, and DenseNet169. By combining traditional ML algorithms with state-of-the-art CNN architectures, we created a diverse ensemble of models that can collectively leverage different aspects of the data. This ensemble approach is essential for enhancing robustness and generalization performance, particularly in scenarios where the dataset may be limited or the target features are challenging to discern (i.e., bubbles). The incorporation of traditional ML algorithms stemmed from their robustness in handling various types of features, including those extracted from images, and their suitability for the often constrained datasets characteristic of microfluidic diagnostics in POC settings. CNN architectures such as MobileNetV2, ResNet101V2, and DenseNet169 have an unparalleled ability to capture intricate spatial relationships within images, which is crucial for discerning subtle and challenging signals such as bubbles. This aligns with the evolving field of diagnostics, which is moving towards inventing and incorporating more versatile readouts like bubbles to allow for more sensitive and unique detection capabilities, distinct from common ones like color and fluorescence. These CNN architectures offer distinct trade-offs in terms of model size, computational efficiency, and classification accuracy, offering flexibility in addressing the specific nuances of the dataset.
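One way such a combination could be realized in practice is sketched below, under the assumption of a pretrained CNN backbone used as a fixed feature extractor whose embeddings feed a classical ML classifier; the images, labels, and 224 × 224 input size are placeholders rather than the study's configuration.

```python
# Hedged sketch: combining a CNN backbone with a classical ML model by using a
# pretrained MobileNetV2 as a fixed feature extractor whose embeddings feed a
# random forest. Data here are placeholders, not the study's images.
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3)
)

def embed(images):
    """Map a batch of uint8 RGB images (N, 224, 224, 3) to 1280-d embeddings."""
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        tf.cast(images, tf.float32)
    )
    return backbone.predict(x, verbose=0)

# Placeholder data standing in for labeled microchip images.
train_images = np.random.randint(0, 256, (32, 224, 224, 3), dtype=np.uint8)
train_labels = np.random.randint(0, 2, 32)  # 1 = bubbles present, 0 = absent

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(embed(train_images), train_labels)
print(clf.predict(embed(train_images[:4])))
```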
To investigate the capabilities of the selected set of ML and DL algorithms in testing microfluidics, we captured 19097 images of our microfluidic model with bubbles in various settings, including different environments, lighting conditions, times of the day, and backgrounds (Fig. 1). We labeled the captured images as either positive or negative based on the number of bubbles, using a threshold value of 10 bubbles per microchip, to train our ML and DL models (Fig. 1a). Out of the 19097 labeled images (Fig. 1b), 15530 were used for training, with Python running on a Lambda Vector GPU workstation (Intel i9-10900X CPU, NVIDIA RTX A6000 GPU).
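The labeling and splitting step described above could be scripted along the following lines; this is a minimal sketch using synthetic bubble counts and placeholder file names in place of the actual annotations, with boundary handling at the 10-bubble threshold assumed.

```python
# Minimal sketch of the labeling and train/test split, with synthetic bubble
# counts standing in for the study's annotations; the >= 10 boundary is an
# assumption about how the 10-bubble threshold was applied.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "image_path": [f"chip_{i:05d}.jpg" for i in range(19097)],  # placeholder paths
    "bubble_count": rng.integers(0, 101, size=19097),           # annotated counts would go here
})
df["label"] = (df["bubble_count"] >= 10).astype(int)            # 1 = positive, 0 = negative

train_df, test_df = train_test_split(
    df, train_size=15530, stratify=df["label"], random_state=42
)  # mirrors the 15530 training images out of 19097
print(len(train_df), len(test_df))
```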
To test the performance of the ML models, we used 1595 randomly selected images, excluding those used for training, to evaluate their classification accuracy. We employed standard performance metrics, including accuracy, precision, recall (i.e., sensitivity), specificity, F1 score, and the Matthews correlation coefficient (MCC) (Table S1†), obtained from each model to determine their effectiveness.56 We conducted all statistical analyses and data visualizations using TensorFlow and TensorBoard, together with Python libraries such as Matplotlib, NumPy, Keras, scikit-learn, pandas, and PyTorch.57,58 The comparison primarily centered on specificity and sensitivity, metrics that strongly influence overall performance and inform the interpretation of the other metrics.
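For reference, the metrics listed above can be computed from true and predicted labels with scikit-learn as in the hedged sketch below; the label arrays are placeholders, and specificity is derived from the confusion matrix since scikit-learn has no dedicated function for it.

```python
# Hedged sketch of the evaluation metrics named above, computed with
# scikit-learn from placeholder ground-truth and predicted labels.
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    matthews_corrcoef, confusion_matrix,
)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # placeholder ground truth
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:   ", accuracy_score(y_true, y_pred))
print("precision:  ", precision_score(y_true, y_pred))
print("sensitivity:", recall_score(y_true, y_pred))   # recall = sensitivity
print("specificity:", tn / (tn + fp))                 # derived from the confusion matrix
print("F1 score:   ", f1_score(y_true, y_pred))
print("MCC:        ", matthews_corrcoef(y_true, y_pred))
```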
Our analysis of the ML models revealed that the logistic regression and random forest models exhibited exceptional sensitivity (>90%), while the K-nearest neighbors and random forest models demonstrated high specificity (>80%) (Fig. 2a). The highest sensitivity was obtained from the random forest model (95.52%) and the highest specificity from the K-nearest neighbors model (89.68%). We assessed the confusion matrix to better understand the positive and negative predictions. Out of 1595 images, 1447 were classified correctly, with 45 false negatives and 103 false positives; the model primarily made errors in the classification of negative samples (Fig. 2b and S1†). ROC analysis of the trained models indicated that random forest (AUC: 97%) (Fig. 2c) and K-nearest neighbors (AUC: 90%) had the highest areas under the ROC curve, which represents the diagnostic ability of a model (Fig. S2†). Additionally, the random forest model outperformed the others in terms of F1 score (92.8%) and accuracy (90.72%), showing that it provides the most balanced results between precision and sensitivity together with the highest accuracy. Consequently, random forest was the most effective model, with 95.52% sensitivity, 82.57% specificity, 90.72% accuracy, 90.3% precision, 92.8% F1 score, 79.95% MCC, and 97% AUC (Table S1†).
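A minimal sketch of how a random forest classifier and its ROC/AUC could be obtained with scikit-learn is shown below; the feature matrices are random placeholders standing in for the image-derived features, so the numbers it prints are not the study's results.

```python
# Illustrative sketch: fitting a random forest and computing ROC/AUC, as
# reported for the best ML model above. Features/labels are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 128)), rng.integers(0, 2, 200)
X_test, y_test = rng.normal(size=(50, 128)), rng.integers(0, 2, 50)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

scores = rf.predict_proba(X_test)[:, 1]      # probability of the positive class
fpr, tpr, _ = roc_curve(y_test, scores)      # points of the ROC curve
print("AUC:", roc_auc_score(y_test, scores))
```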
To test the performance of the DL models, we continued by evaluating the selected CNN architectures using the same dataset of 1595 images. The performance evaluation was conducted with custom Python scripts using the pandas, NumPy, scikit-learn, Matplotlib, Keras, and TensorFlow libraries.57 The deep learning models used for this evaluation included MobileNetV2, EfficientNetV2B0, EfficientNetV2B2, DenseNet169, DenseNet201, InceptionV3, ResNet50V2, EfficientNetB5, and ResNet101V2. In selecting these deep learning models, we prioritized those that do not require significant computing power, ensuring compatibility for evaluating and testing microfluidics at the POC. We also ensured that the chosen models are commonly employed for computer vision tasks, prioritizing ease of integration and robust performance on POC-compatible mobile devices.19
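As an illustration of how such backbones are typically adapted to a binary task, the sketch below attaches a sigmoid classification head to a DenseNet169 backbone in tf.keras; the input size, dropout rate, optimizer, and learning rate are assumptions, not the study's exact training configuration.

```python
# Hedged sketch: adapting a DenseNet169 backbone for binary bubble
# classification with tf.keras; hyperparameters are illustrative only.
import tensorflow as tf

base = tf.keras.applications.DenseNet169(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False   # start with the backbone frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # positive vs. negative chip
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(name="auc"), "accuracy"],
)
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets assumed elsewhere
```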
Our results indicated that DenseNet169, EfficientNetB5, and EfficientNetV2B0 exhibited outstanding sensitivity values of 92.63%, 95.82%, and 91.93%, respectively (Fig. 3a and S3–S5†). ResNet50V2 (89.17%) and InceptionV3 (88.49%) demonstrated high specificity values, while DenseNet169 displayed an exceptional specificity of 92.22% (Table S2†). The confusion matrices revealed further insights into the performance of these algorithms. The DenseNet169 algorithm excelled in detecting negative samples, accurately classifying 545 out of 591, while also achieving the second-highest performance in positive classification with 930 out of 1004, resulting in the highest overall performance at 92% (Fig. 3b). Among the other algorithms, EfficientNetB5 correctly identified 962 out of the 1004 tested positive samples; however, it misclassified 293 negative samples as positive, resulting in a 50.4% performance rate for negative samples and an overall performance rate of 79%. EfficientNetV2B0 exhibited similar performance, albeit with a 7% decrease in overall performance rate, reflecting a 4% difference in true positive performance rate and an 11% decrease in true negative performance rate. The results of the MobileNetV2, EfficientNetV2B2, DenseNet201, InceptionV3, ResNet50V2, and ResNet101V2 algorithms are shown in Fig. S4 and S5,† with misclassification rates <38%. ROC analysis of the trained DL models showed that ResNet50V2 (AUC: 96%), ResNet101V2 (AUC: 96%), InceptionV3 (AUC: 95%), DenseNet169 (AUC: 92%), and DenseNet201 (AUC: 90%) had the highest areas under the ROC curve (Fig. S6 and S7†). Additionally, the DenseNet169 model outperformed the other models in terms of F1 score (93.94%) and accuracy (92.48%) (Table S2†). Overall, DenseNet169 outperformed the other models across the performance metrics and, with an AUC of 0.92, provided the most applicable model (Fig. 3c).
We compared the performance of random forest and DenseNet169, as these models had outperformed the others in our evaluations. To challenge them further, we used a set of 184 microchips prepared with varying numbers of bubbles. A new test set of images was created under environmental conditions different from those used during training. This test set included images taken against different backgrounds (including black, red, brown, metallic grey, and dark blue) and with varying rotation and brightness. This approach allowed us to assess the user experience in suboptimal conditions, ensuring a thorough and comprehensive evaluation of the models' performance in real-world microchip testing scenarios. The generated positive and negative prediction rates were analyzed against the ground truth values of bubbles per chip to evaluate the performance of each model. The results revealed that the DenseNet169 DL model achieved better prediction performance than the random forest ML model, with accuracies of 80.4% and 88.2%; precisions of 77.98% and 91.81%; F1 scores of 81.51% and 87.84%; specificities of 75.3% and 92.31%; and MCCs of 61.03% and 76.69% for random forest and DenseNet169, respectively. The confusion matrix and ROC analyses further confirmed that the DenseNet169 DL algorithm is the optimal prediction model for testing our microfluidic model, outperforming the random forest ML algorithm with 87% AUC and 92% accuracy in classifying true positives and true negatives (Fig. 4b and c).
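The kind of rotation and brightness perturbations used in this challenge set can be mimicked programmatically, as in the minimal sketch below with tf.image; the specific transforms and magnitudes are illustrative assumptions rather than the exact acquisition protocol.

```python
# Illustrative sketch: generating rotated and brightness-shifted variants of a
# chip image with tf.image, mimicking the altered imaging conditions of the
# challenge test set. The image and transform magnitudes are placeholders.
import tensorflow as tf

image = tf.random.uniform((224, 224, 3))       # placeholder for a chip photograph

variants = [
    tf.image.rot90(image, k=1),                # 90-degree rotation
    tf.image.rot90(image, k=2),                # 180-degree rotation
    tf.image.adjust_brightness(image, 0.2),    # brighter scene
    tf.image.adjust_brightness(image, -0.2),   # dimmer scene
    tf.image.adjust_contrast(image, 1.5),      # harsher contrast
]
print(len(variants), "challenge variants generated")
```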
To demonstrate the effectiveness of incorporating AI in real-world sample testing scenarios using POC-compatible systems, we developed a mobile application capable of running the DenseNet169 model seamlessly, without the need for further optimization. The application features a simple interface for initiating model evaluation and presents results as positive and negative prediction rates, along with images of the tested microfluidic chips (Fig. S8†). Out of 250 images, 212 were classified correctly, 29 were classified as false negatives, and 9 were classified as false positives. The model primarily made errors in classifying positive samples. The performance metrics were as follows: accuracy, 84.8%; precision, 93.23%; sensitivity/recall, 81.05%; F1 score, 86.71%; specificity, 90.72%; and MCC, 70.09%. The deep learning model achieved an AUC value of 0.90, highlighting its superiority in testing our microfluidic model with bubbles (Fig. 5b). Furthermore, examination of the confusion matrix alongside sensitivity and specificity values showed that the DenseNet169 DL model achieved 81.05% sensitivity and 90.72% specificity (Fig. 5a). Heatmap analysis was conducted using images with bubble counts ranging from 0 to 100. The results indicated a higher margin of error around the threshold of 10 bubbles; in particular, ∼30% of chips with around 20 to 30 bubbles were misclassified as negative.
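The paper does not specify the mobile deployment toolchain, but one common route for running a Keras model on a phone is conversion to TensorFlow Lite, sketched below under that assumption; the stand-in model, quantization choice, and file name are illustrative.

```python
# Hedged sketch of one common mobile-deployment route: converting a Keras model
# to TensorFlow Lite with post-training quantization. The study's actual
# deployment toolchain is not specified; model and filename are stand-ins.
import tensorflow as tf

model = tf.keras.applications.DenseNet169(weights=None, classes=2)  # stand-in model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # post-training quantization
tflite_model = converter.convert()

with open("densenet169_bubbles.tflite", "wb") as f:    # hypothetical filename
    f.write(tflite_model)
```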
Our study provides a comprehensive evaluation of both ML and DL algorithms in the context of microfluidics testing under POC settings. Among the ML models, random forest emerged as the top performer with a sensitivity of 95.52%, specificity of 82.57%, and an AUC of 97%, showcasing its strong capability in accurately classifying microfluidic device images. The high sensitivity and specificity values underscore random forest's effectiveness in distinguishing positive from negative samples even in challenging imaging conditions. However, its higher rate of false positives indicates a potential area for improvement. In contrast, DL models, particularly DenseNet169, exhibited outstanding performance, with sensitivity and specificity values of 92.63% and 92.22%, respectively. DenseNet169's consistently high performance across different testing conditions, including variations in background and lighting, highlights its robustness and adaptability, making it highly suitable for real-world POC diagnostics where consistent and reliable performance is crucial.
Despite the promising results, several challenges must be addressed to facilitate the widespread adoption of AI in microfluidic POC diagnostics. One key issue is the misclassification of samples with a marginal number of bubbles, especially around the threshold of 10 bubbles, which was evident in the heatmap analysis. Further refinement of the AI models and the incorporation of additional features or training data will be necessary to enhance accuracy in borderline cases. Combining multiple algorithms can also help overcome these challenges. For example, employing ensemble techniques that integrate models like U-Net for image segmentation with Canny edge detection could improve precision in detecting subtle features. Additionally, integrating algorithms such as YOLO (You Only Look Once) for real-time object detection and HOG (histogram of oriented gradients) for robust feature extraction can further enhance the accuracy and reliability of microfluidic POC diagnostics. Such hybrid approaches can leverage the strengths of different algorithms, providing a more comprehensive and accurate analysis.
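As a rough illustration of such a hybrid approach, the sketch below combines a Canny edge map with HOG descriptors into a single hand-crafted feature vector that could complement CNN predictions; the image, thresholds, and HOG parameters are illustrative assumptions.

```python
# Hedged sketch of the hybrid idea above: fusing a Canny edge map with HOG
# descriptors as complementary hand-crafted features that could feed a
# classifier alongside CNN outputs. Image and parameters are placeholders.
import numpy as np
import cv2
from skimage.feature import hog

gray = np.random.randint(0, 256, (224, 224), dtype=np.uint8)  # placeholder chip image

edges = cv2.Canny(gray, threshold1=50, threshold2=150)         # edge map of channel/bubble boundaries
hog_vec = hog(gray, orientations=9, pixels_per_cell=(16, 16),
              cells_per_block=(2, 2))                          # gradient-orientation descriptor

features = np.concatenate([edges.ravel() / 255.0, hog_vec])    # fused feature vector
print(features.shape)
```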
Moreover, integrating AI models into mobile applications for POC testing will necessitate ensuring seamless operation across a wide range of devices and environmental conditions, with a strong emphasis on user-friendliness and reliability. This integration is pivotal for achieving the robustness required for practical deployment in diverse healthcare settings. The successful implementation of AI in microfluidic POC diagnostics has far-reaching implications for the healthcare industry, especially in resource-limited settings where access to sophisticated medical infrastructure is often constrained. By enabling rapid, accurate, and on-site testing, AI-driven POC systems address one of the most pressing challenges in modern medicine: the need for timely and precise diagnostics. By democratizing access to high-quality diagnostic tools, AI-integrated POC systems empower frontline healthcare providers with actionable insights, fostering a more equitable distribution of medical resources. This shift supports personalized medicine approaches, tailoring treatment plans to individual patient profiles based on accurate and immediate diagnostic data. Ultimately, the widespread adoption of AI-enhanced microfluidic POC diagnostics can transform healthcare delivery, making it more accessible, efficient, and responsive to the needs of diverse populations worldwide.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4lc00671b
This journal is © The Royal Society of Chemistry 2024