Kaeul Lim and Arezoo Ardekani*
School of Mechanical Engineering, Purdue University, West Lafayette, Indiana, USA. E-mail: ardekani@purdue.edu
First published on 16th August 2024
Nanoparticle (NP)-based technologies have gained significant attention in targeted drug delivery, encompassing chemotherapies, photodynamic therapy, and immunotherapy. Hyperspectral imaging (HSI) emerges as a label-free, minimally invasive, and high-throughput technique for quantitative NP analysis. Despite its growing importance, the application of HSI to nanoparticle analysis, especially for label-free characterization and classification, remains limited. Here, we propose a novel method integrating hyperspectral imaging with a spectral noise reduction method and machine learning (ML) for robust nanoparticle classification. There are many challenges to extracting information from noisy and overlapping particles in HSI data. To surmount these challenges, we propose a spectral angle matching (SAM) algorithm to effectively denoise hyperspectral datasets. Complementing this, we employ a support vector machine (SVM) algorithm for classification, leveraging preprocessed HSI data to extract unique spectral signatures. Our hyperspectral imaging classification of multiple nanoparticle types reveals distinct spectral characteristics inherent to each class. The classification accuracy reaches 99.9% for single nanoparticle types, highlighting the efficiency of our method. In the case of classifying multiple particle types, the overall accuracy also reaches 99.9%. Visualization of the NP classification map further demonstrates the efficacy of our model. The application of the SAM-SVM algorithm in hyperspectral analysis outperforms traditional SVM methods in classifying multiple samples, highlighting the potential of our nanoparticle analysis. Our findings not only address the challenges posed by noisy and overlapping particles but also demonstrate the potential of hyperspectral imaging in advancing real-time and label-free detection systems for diverse biomedical applications.
HSI can capture a large spectral range, from the ultraviolet to the infrared, providing abundant information for each image pixel.10,11 Hyperspectral camera sensors measure the light reflected, absorbed, and scattered by materials illuminated by a light source. HSI has a proven record of success in astronomy, geosciences, agriculture, and environmental monitoring, among other applications, and is now expanding into the biological and medical sciences. Since HSI was originally developed for remote sensing and space applications,12,13 machine learning-based classification for HSI is mostly focused on remote sensing data;14–18 only a few studies investigate pharmaceutical applications at the nanoscale.19–22 Combining spatial-scanning hyperspectral imaging with dark-field microscopy is considered highly advantageous for optical studies of nanoscale materials. As an emerging imaging modality for medical applications, HSI offers great potential for classifying nanoparticles without the need for labeling.7,23–25
HSI yields a hyperspectral data cube with continuous spectral and spatial information in one measurement, allowing non-contact sensing. When the HSI system scans a sample, it provides higher spectral resolution and more continuity between spectral bands than traditional multispectral fluorescence microscopes.8 The spectral signature results from molecular absorption and particle scattering, which allows materials with different characteristics to be distinguished. In problems with an unknown target spectrum, the continuous spectral data are sufficient to reveal each sample's unique spectral signature. This makes it possible to distinguish spatially and spectrally overlapping components in nanoparticle samples.26,27 Moreover, this method is sensitive to subtle spectral changes, enabling discrimination of chemical or biological entities.
HSIs often suffer from various types of noise, such as random noise, stripe noise, and dead pixels.13,28,29 These issues can hinder classification model development and yield misleading results. Therefore, it is critical to address them and ensure that high-quality data are available for the subsequent analysis.30,31 Various preprocessing methods are commonly applied to hyperspectral data to improve the performance of the classification model. Among the most common preprocessing techniques in signal and image processing are spectral normalization methods, such as min–max scaling and Standard Normal Variate (SNV) normalization. So far, no such preprocessing method has been applied to HSI data at the nanoscale. This paper proposes a modified spectral angle mapper (SAM) based image preprocessing method to denoise hyperspectral images. The SAM uses the spectral angular information of hyperspectral image data and calculates the spectral similarity between each image spectrum and a reference spectral signature. The SAM is suitable for building a fast, efficient, and universal framework that adapts to different HSI data. The reference spectrum can either be obtained from a manufacturer or extracted directly from the HSI data.32,33
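As an illustration of the two normalization techniques named above, the following is a minimal sketch in plain Python. The function names `min_max_scale` and `snv` and the example values are hypothetical, and the SNV variant shown divides by the sample standard deviation (n − 1 denominator), which is an assumption rather than a choice stated in this paper.

```python
import math


def min_max_scale(spectrum):
    """Rescale a spectrum linearly so that its values span [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        # A flat spectrum carries no shape information; map it to zeros.
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


def snv(spectrum):
    """Standard Normal Variate: centre on the mean, divide by the sample std.

    Assumes at least two bands and the n - 1 (sample) denominator.
    """
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    if std == 0:
        return [0.0 for _ in spectrum]
    return [(v - mean) / std for v in spectrum]


# Hypothetical four-band spectrum, for illustration only.
example = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(example)
standardized = snv(example)
```

Min–max scaling fixes the range of each spectrum, whereas SNV fixes its mean and spread; both operate on one spectrum at a time.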
We show the effect of the HSI denoising technique as a preprocessing step for HSI classification. The image preprocessing step before machine learning-based classification can greatly enhance classification performance by extracting only representative features. Various machine learning-based methods built on statistical parameters have been developed for hyperspectral image classification, such as artificial neural networks (ANNs),34 minimum distance classifiers,35 k-nearest neighbors (k-NN),36 Gaussian maximum likelihood estimators,37 convolutional neural networks (CNNs),38 and support vector machines (SVMs).39 Each of these methods brings unique strengths in dealing with hyperspectral data.
In particular, SVM, an effective method grounded in statistical learning theory, has been widely applied to multiclass feature identification. Hyperspectral image classification is challenging because training samples are limited relative to the abundance of spectral bands, the bands are highly correlated, and the bands carry inherent noise; together these factors often compromise classification accuracy. The attraction of SVM lies in its ability to locate the optimal hyperplane between the class of interest and the rest of the classes. It achieves separation in a new high-dimensional feature space by considering only the training samples that lie on the edge of the class distributions, referred to as support vectors. This technique can overcome the difficulties of classifying limited hyperspectral data. Using kernel functions further enhances the classifier's flexibility, making it robust against outliers.
Conversely, CNNs, while favored for their automatic feature extraction from images, tend to require large labeled datasets for effective generalization. Deep learning models are also computationally intensive and often require significant computational resources, especially during training. In this paper, we mainly examine the effect of the preprocessing step alongside machine learning-based classification. The classification was executed using the SVM technique, chosen for its distinct advantages over other methods. SVM possesses effective generalization ability, making it adept at recognizing patterns and making accurate predictions on previously unseen data.40–42
Overtraining, or overfitting, is a common concern when models are trained on small datasets. SVM mitigates this risk and remains resilient against overtraining even with limited data points, so our classification model avoids excessive adaptation to the peculiarities of the training set. SVM also handles the computational demands of hyperspectral image analysis efficiently. This choice therefore enhances the reliability and effectiveness of our machine learning-based approach to nanoparticle classification.
In this paper, we present a novel approach aimed at reducing image noise within HSI data, focusing on its applicability to nanoparticle classification. We introduce the Spectral Angle Matching-Support Vector Machine (SAM-SVM) method, which combines spectral similarity analysis with SVM classification to enhance the accuracy of HSI classification. Through quantitative comparisons with traditional SVM techniques that omit the preprocessing step, our study aims to improve classification performance while ensuring time efficiency, particularly in the context of complex nanoparticle analysis. The proposed SAM-SVM method has the potential to revolutionize hyperspectral imaging for nanoscale applications, offering a rapid, label-free classification approach crucial for advancements in drug delivery and biomedical applications.
The remainder of this paper describes the experimental setup, preprocessing procedures, and machine learning model, followed by a comprehensive analysis of the findings. By addressing the unique challenges posed by nanoscale materials, our work contributes to the advancement of hyperspectral imaging for precise nanoparticle characterization, filling a crucial gap in the current state of research.
The proposed nanoparticle spectral analysis consists of HSI acquisition, preprocessing, extraction of spectral profiles, building an ML-based classification model, and then nanoparticle classification. Multiple nanoparticle types within a sample are identified simultaneously. The overall analysis process is shown in Fig. 2.
Dark-field images were recorded by using an enhanced dark-field illumination system (CytoViva, Auburn, AL) attached to the Nikon ECLIPSE Ni-E microscope. The system consisted of a CytoViva 150 dark-field condenser in place of the microscope's original condenser attached via a fiber optic light guide to the lamp source. A 60× oil immersion color-corrected objective (Nikon UPlanAPO fluorite, N.A. 1.35–0.55) was integral to the system. A 150 W DC-regulated halogen fiber optics light source (Dolan Jenner DC-950, Massachusetts, USA) was used, which covers a wavelength range from approximately 360 nm to 2400 nm. The hyperspectral image of the sample was acquired with a resolution of 2 nm in the wavelength window of visible near-infrared (VNIR, 400 nm–1000 nm) using a 60× objective lens in each 5.2 nm-sized image pixel.
The acquired HSI image undergoes correlation in both the spatial and spectral domains. The hyperspectral data are first preprocessed to adapt them for subsequent feature extraction. Spectral information representing the physicochemical properties of the sample is extracted directly from the segmented objects in the image, which serve as the main region of interest. In most circumstances, the extracted spectral data contain noise and variability, and this variability is one of the most challenging problems in HSI data analysis. If the extracted data exhibit a low signal-to-noise ratio, preprocessing steps become imperative. The denoising process employs a combination of background subtraction and inpainting methods for both spatial and spectral enhancement. Fig. 2 shows the nanoparticle classification workflow. The image preprocessing module is illustrated in Fig. 2b–d. An example of the preprocessing result is shown in Fig. 3.
Spectral angle mapper (SAM) is a similarity measure used for HSI data, which groups samples according to a library of reference spectra.43,44 Reference spectra can be selected from such a library or estimated from the particle segmentation data. Since HSI is beneficial as a label-free method, it is crucial to be able to extract reliable reference spectra from unknown samples. The SAM-based algorithm also provides a reliable method for estimating the reference spectra and validating them.
The SAM algorithm determines the spectral similarity between the reference spectra and test spectra by calculating the angle between the two spectra, treating them as vectors in a space with dimensions equal to the number of bands.45–47 The spectral angle α is calculated by using:
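$$\alpha = \cos^{-1}\left(\frac{\sum_{i=1}^{n} t_i r_i}{\sqrt{\sum_{i=1}^{n} t_i^{2}}\,\sqrt{\sum_{i=1}^{n} r_i^{2}}}\right)$$ | (1) |
Here $t_i$ and $r_i$ denote the values of the test and reference spectra in band $i$, and $n$ is the number of spectral bands.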
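The angle computation described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard spectral angle between two vectors, not the authors' implementation; the function name `spectral_angle` and the example spectra are hypothetical.

```python
import math


def spectral_angle(test, reference):
    """Return the angle (in radians) between two spectra treated as vectors."""
    if len(test) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(t * r for t, r in zip(test, reference))
    norm_t = math.sqrt(sum(t * t for t in test))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_t == 0 or norm_r == 0:
        raise ValueError("the angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_t * norm_r)))
    return math.acos(cosine)


# Hypothetical three-band spectra: the second is the first scaled by 2,
# so the angle between them is (numerically) zero.
angle_same_shape = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
angle_orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
```

Because the angle depends only on direction, two spectra that differ by a constant brightness factor have an angle of essentially zero, which is why the measure is insensitive to overall intensity.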
The spatial distribution of illumination intensity becomes inhomogeneous and nontrivial for a large captured area. Normalization of spectra can lead to minimizing bias from nonuniform spatial illumination, different particle types, or different light intensities. Measurement normalization greatly affects performance, especially when machine learning is used. Moreover, it is necessary in order to convert HSI measurements into reflectance ratios. In addition, in the case of NP spectra, noise affects the location of a peak instead of the height of the peak. Therefore, that peak shift might disappear with spectral filtering.44
The normalized reflectance spectrum is utilized for the preprocessing step. Normalizing the spectral values of every pixel Pλ at a certain wavelength (λ) to the sum of the spectral values of all pixels at all wavelengths (n) using the following equation results in a spectral value Xλ, which is independent of the illumination spectral power distribution, illumination direction, and object geometry.48 This way, bias is removed from reflectance measurements.
$$X_\lambda = \frac{P_\lambda}{\sum_{\lambda'=1}^{n} P_{\lambda'}}$$ | (2) |
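A minimal sketch of this sum normalization is given below, assuming that the sum runs over the n wavelength bands of a single pixel spectrum. The function name `normalize_spectrum` and the numbers are hypothetical and serve only to show that a uniform change in illumination intensity cancels out.

```python
def normalize_spectrum(pixel_spectrum):
    """Divide each band value by the sum over all bands of the same pixel."""
    total = sum(pixel_spectrum)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [value / total for value in pixel_spectrum]


# Hypothetical pixel measured under two illumination intensities:
# the second spectrum is the first one multiplied by 3.
dim = [1.0, 2.0, 5.0]
bright = [3.0, 6.0, 15.0]

normalized_dim = normalize_spectrum(dim)
normalized_bright = normalize_spectrum(bright)
```

Both normalized spectra come out as the same three fractions, so the normalized values reflect spectral shape rather than overall brightness.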
To extract spectral data, particle segmentation was carried out, the main purpose of which was to separate only nanoparticles from the background. Nanoparticles are identified and differentiated based on the spectral angle value in each pixel. The entire scanned region (1024 × 200 pixels) was designated as the region of interest (ROI) for each particle sample. The sample concentration is 0.099 mg mL−1, with a sample volume of 15 μL. The result of particle segmentation is shown in Fig. 2d. The average spectra of all segmented particle pixels within each ROI were then calculated and presented in Fig. 2e. We acquired more than three sample datasets to extract representative mean spectra of each nanoparticle shown in Table 1.
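The segmentation and averaging step described above can be sketched as follows. This is an illustration under stated assumptions: a pixel is kept when its spectral angle to a reference spectrum is below a threshold, and the threshold value (0.1 rad), the function names, and the toy three-band pixels are all hypothetical rather than taken from this paper.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def segment_particles(pixels, reference, max_angle):
    """Mark a pixel True when its angle to the reference is at most max_angle."""
    return [spectral_angle(p, reference) <= max_angle for p in pixels]


def mean_spectrum(pixels, mask):
    """Average, band by band, the spectra of the pixels selected by the mask."""
    selected = [p for p, keep in zip(pixels, mask) if keep]
    if not selected:
        raise ValueError("no pixel passed the segmentation threshold")
    n_bands = len(selected[0])
    return [sum(p[i] for p in selected) / len(selected) for i in range(n_bands)]


# Toy data: two particle-like pixels and two background-like pixels.
reference = [0.1, 0.8, 0.1]
pixels = [
    [0.2, 1.5, 0.2],
    [0.12, 0.9, 0.1],
    [0.5, 0.5, 0.5],
    [0.9, 0.1, 0.1],
]
mask = segment_particles(pixels, reference, max_angle=0.1)  # assumed threshold
particle_mean = mean_spectrum(pixels, mask)
```

In this toy case only the first two pixels are kept, and their band-wise average plays the role of the representative mean spectrum.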
Dataset | Traditional SVM accuracy (%) | SAM-SVM accuracy (%) |
---|---|---|
Yellow green fluorescent particles | 80.0 | 99.9 |
Europium chelate fluorescent particles | 80.0 | 99.9 |
Flash red fluorescent particles | 78.5 | 99.9 |
This preprocessing method is validated by comparing the estimated average spectra in Fig. 4 with the spectra in the fluorescent particle manufacturer's catalog. The estimated mean spectra in Fig. 4 agree with the catalog spectra, specifically in the locations of their spectral peaks. Yellow-green fluorescent particles show a reflectance peak at 515 nm, and europium chelate fluorescent particles show a reflectance peak at 605 nm. In this manner, we can also obtain the representative spectral signature of an unknown sample. The average reflectance spectra of the three nanoparticle types were used to generate a ground-truth map for the SVM model in the ML-based classification step.
Fig. 4 Spectral validation: (a) yellow-green fluorescent particles; (b) europium chelate fluorescent particles.
SVM works by mapping data from a low-dimensional space into a higher-dimensional space in which a separating hyperplane is constructed to realize linear classification. It separates the data into different categories by finding the hyperplane that maximizes the margin, that is, the distance between the hyperplane and the nearest points of each class. In practice, most classification problems cannot be solved by using a simple hyperplane as the decision boundary in the original space, and a more complex decision boundary is required. Introducing a kernel function effectively reduces the computational complexity, because the mapping never has to be computed explicitly. The most typical kernel is the radial basis function (RBF) kernel.
The RBF kernel is one of the most powerful, useful, and popular kernels in the SVM family of classifiers. Compared with linear or polynomial kernels, the RBF kernel is more expressive while remaining efficient: it can be viewed as combining polynomial kernels of different degrees, projecting nonlinearly separable data into a higher-dimensional space where they can be separated by a hyperplane. The kernel is evaluated from the dot products and squares of the features, and classification then follows the basic idea of the linear SVM. The radial basis function used for this projection can be written as:
$$K(X_1, X_2) = \exp\left(-\gamma \lVert X_1 - X_2 \rVert^{2}\right)$$ | (3) |
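The kernel in eqn (3) can be evaluated directly, as in the minimal sketch below. The function name `rbf_kernel`, the example vectors, and the value of γ are hypothetical; the paper does not state which γ was used.

```python
import math


def rbf_kernel(x1, x2, gamma):
    """Evaluate exp(-gamma * squared Euclidean distance between x1 and x2)."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


# Identical vectors give a similarity of 1; the value decays with distance.
k_identical = rbf_kernel([0.2, 0.7], [0.2, 0.7], gamma=0.5)
k_apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)  # exp(-0.5 * 2)
```

A larger γ makes the kernel value fall off faster with distance, which yields a more local and more flexible decision boundary.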
As noted above, both the spectral and the spatial features influence a pixel's class label prediction. Because spatially close pixels tend to belong to the same class, predicting the class label of a pixel should take into account the class labels of the surrounding pixels. Hence, a good hyperspectral image classification method should consider the spectral and spatial features together. Figs. 5, 6 and 7 show preprocessed images and ground truth images for the training of the SVM model. We split the data into training and test sets (80:20) to ensure that the classification algorithm generalizes well to unseen data. For multi-class classification, the regularization parameter C and the kernel parameter γ are selected. We also use pseudo-random number generation for shuffling the data for probability estimates.52,53 The major task of the confusion matrix, also known as the error matrix, is to compare whether the classification result matches the ground truth or not.
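The 80:20 split with pseudo-random shuffling can be sketched as below. This is an illustration only: the function name `train_test_split_indices`, the seed value, and the sample count are hypothetical, and the paper does not specify how its split was implemented.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices reproducibly and split them into train and test."""
    indices = list(range(n_samples))
    # A dedicated generator with a fixed seed makes the shuffle repeatable.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Hypothetical example: 100 labeled pixels split 80:20.
train_idx, test_idx = train_test_split_indices(100, test_fraction=0.2, seed=42)
```

Fixing the seed means that repeated runs evaluate the classifier on the same held-out pixels, which keeps accuracy figures comparable across experiments.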
Each pixel in a hyperspectral image contains a spectrum covering the whole spectral range of the hyperspectral imaging system. Overall accuracy (OA) is used to evaluate the classification performance of the model. It is the percentage of correctly predicted pixels relative to the total number of pixels. The correctly classified pixels lie along the diagonal of the confusion matrix, and the total number of pixels equals the number of pixels in the ROI.
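$$\mathrm{OA} = \frac{\text{number of correctly classified pixels}}{\text{total number of pixels}} \times 100\%$$ | (4) |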
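Overall accuracy follows directly from a confusion matrix, as the short sketch below shows. The function name `overall_accuracy` and the pixel counts are made up for illustration and are not results from this study.

```python
def overall_accuracy(confusion):
    """Return the percentage of pixels on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("the confusion matrix contains no pixels")
    return 100.0 * correct / total


# Made-up three-class example: rows are true classes, columns are predictions.
confusion = [
    [48, 2, 0],
    [1, 45, 4],
    [0, 3, 47],
]
oa = overall_accuracy(confusion)  # 140 correct out of 150 pixels
```

Off-diagonal entries show which classes are confused with each other, information that a single accuracy figure hides.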
The advantage of acquiring spectral and spatial information simultaneously provides the feasibility of predicting chemical, physical, and category information of each pixel within the samples, based on the established calibration models. The average spectra from all pixels within the sample were used for visualization in this study. Generating classification maps made it possible to visualize category information for the samples, which was beneficial for a convenient and intuitive distinction of different nanoparticle types.
Particle size | 500 nm | 300 nm | 100 nm | 44 nm |
---|---|---|---|---|
Overall accuracy (%) | 99.9 | 99.9 | 99.9 | 99.9 |
The main objective is to identify and classify different NPs based on the signature of spectrum profiles. In the training phase of this work, 80% of pixels were randomly selected and labeled by ground-truth data to determine the weights and biases. The other 20% of pixels were then used in the testing phase to evaluate the classification performance.
Table 1 shows the classification accuracy of two different methods for the single particle classification case. The overall accuracy of our SVM model after applying SAM is improved by roughly 20 percentage points compared with the traditional SVM model (from 78.5–80.0% to 99.9%), which is visually shown in Fig. 5. Fig. 5 shows the input data including (a) the original HSI image before image preprocessing, (b) the ground truth image to train the machine learning methods, (c) a classification example of the traditional SVM method, (d) a classification example of our SAM-SVM method, (e) the confusion matrix of the traditional SVM result, and (f) the confusion matrix for our method. The results highlight the significance of the preprocessing step in achieving better classification performance. The overall accuracy is 99.9% (Table 1), which is the ratio of correctly classified pixels to the total number of testing pixels on the three datasets, corresponding to the single particle classification case. The confusion matrix analysis shows that the SAM-SVM model performs well. The classification results in Fig. 5(d) clearly show that the present method works well in the presence of noisy points or missing data points.
Fig. 6 illustrates the classification results for two different nanoparticles analyzed by using our SAM-SVM method. Two classes of interest are considered, namely: NP1 (europium chelate fluorescent nanoparticle) and NP2 (yellow green fluorescent nanoparticle). The ground truth image was generated by applying the SAM method based on the reference spectral profile presented in Fig. 4. Accurate classification becomes particularly demanding in the analysis of nanoparticle mixtures due to overlapping particles, necessitating the segmentation of individual nanoparticles. To address this issue and minimize the misclassification of overlapping particles, we employed the SVM classification method using spectral features. As Fig. 6 shows, the particle classification accuracy exceeds 90%. Only a small number of pixels were misclassified, attesting to the robustness and effectiveness of our method.
Fig. 6 Classification maps for the two fluorescent particle mixture case: (a) raw image; (b) ground truth image; (c) classification result; (d) the confusion matrix for our result.
Fig. 7 illustrates the classification results for three different nanoparticles. Three classes of interest are considered, namely: NP1 (europium chelate fluorescent nanoparticle), NP2 (yellow green fluorescent nanoparticle), and NP3 (plum purple fluorescent nanoparticle). The particle classification accuracy again exceeds 90%.
Fig. 7 Classification maps for the three fluorescent particle mixture case: (a) raw image; (b) ground truth image; (c) classification result; (d) the confusion matrix for our result.
While traditional SVM methods offer a solid foundation for classification, they typically struggle with the high degree of spectral similarity found among nanoparticles. Specifically, they are limited in classifying overlapping nanoparticles of multiple types. Similarly, approaches employing CNNs, despite their prowess in feature extraction, require substantial labeled datasets and are computationally intensive. In summary, our SAM-SVM model exhibits outstanding nanoparticle classification performance, driven by its robust preprocessing steps and optimized classification parameters, particularly in the challenging task of classifying nanoparticles with overlapping features.
The principal advantages of this technique are its non-contact, non-invasive, and label-free nature. This not only establishes its practicality but also positions it as a transformative tool for the characterization of nanoscale materials. In future studies, this technique can offer a reliable and efficient method for distinguishing nanoparticles in diverse practical applications. In the pharmaceutical industry, our method can significantly enhance the development and quality control of drug delivery systems by enabling the characterization of nanoparticle carriers. Similarly, in materials science, the ability to accurately classify nanoparticles opens up new avenues for creating advanced materials with tailored properties. We aim to broaden the scope of our method to include the classification of various biological particles. This expansion has the potential to catalyze substantial progress in biomedical research, particularly in the characterization and development of nanoparticle-based therapeutics and diagnostic tools. By harnessing the capabilities of HSI and ML, our work lays a foundation for cutting-edge research at the intersection of nanotechnology and biomedical sciences, promising breakthroughs in diverse applications.
This journal is © The Royal Society of Chemistry 2024