Jonas Bals and Matthias Epple*
Inorganic Chemistry, Centre for Nanointegration Duisburg-Essen (CENIDE), University of Duisburg-Essen, 45117 Essen, Germany. E-mail: matthias.epple@uni-due.de
First published on 19th January 2023
The automated analysis of nanoparticles imaged by scanning electron microscopy was implemented with a deep-learning (artificial intelligence) procedure based on convolutional neural networks (CNNs). Quantitative information on particle size distributions and particle shapes can be extracted from pseudo-three-dimensional secondary electron (SE) micrographs as well as from two-dimensional scanning transmission electron (STEM) micrographs. After separation of the particles from the background (segmentation), the particles were cut out of the image and classified by their shape (e.g. sphere or cube). The segmentation of STEM images was considerably enhanced by introducing distance- and intensity-based pixel weight loss maps. These forced the neural network to put emphasis on the areas which separate adjacent particles. Partially covered particles were recognized by training and excluded from the analysis. The separation of overlapping particles, quality-control procedures to exclude agglomerates, and the computation of quantitative particle size distribution data (equivalent particle diameter, Feret diameter, circularity) were included in the routine.
Usually, SEM images are analysed by experienced human reviewers who count and measure the depicted particles, e.g. to determine the particle size distribution of a given sample.2 Another particle property of general interest is the particle shape. If images show an assembly of particles with different shapes, their classification into categories (e.g. sphere, cube, rod, or triangle) is usually done manually. This is a time-consuming process which may be biased by the human reviewer.3 There is therefore a strong need for unbiased, rapid methods to analyse SEM images.4
Machine learning has been applied to detect and identify objects of high variability in SEM images.5,6 In principle, this permits a more objective and usually much faster analysis than a human assessment.7–12 Semantic segmentation of images into coherent areas of the same class (e.g. foreground and background) is possible with convolutional neural networks (CNNs) of encoder–decoder (autoencoder) type such as the UNet architecture.13 UNet and its various descendants have been highly successful in biomedical image processing, and several attempts have been made to apply them to nanoparticle analysis as well.14 Further efforts to classify particles according to their shape were carried out with dedicated classification networks.15,16
The workflow presented here includes two steps performed by convolutional neural networks (CNNs) to accomplish a full analysis of SEM images of nanoparticles. The first step is to label each pixel as either foreground or background (segmentation). Coherent areas of foreground pixels (“particles”) are then cut out and processed by a second neural network which determines the shape of each individual particle (classification). In this step, partially covered particles are identified and removed from the classification. Finally, particle size, diameter, and circularity are computed for all particles shown in an image.
The loss function measures the difference between prediction and true label. A small difference between prediction and true label gives a small loss and indicates a better performance of the network. Each presentation of all images of a given training dataset to the network is called an epoch. The network parameters are changed several times during one epoch: the adaptation is performed after each mini-batch (a subset) of images has been shown to the network. An optimizer algorithm uses the loss function to gradually adjust the parameters of the network; a high loss results in a strong adaptation of the network parameters. Training ends when the network no longer improves its adaptation to the training data, i.e. the loss function does not decrease further. This can require several hundred training epochs.19
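To illustrate these concepts (loss, mini-batch, optimizer, epoch), a minimal training-loop sketch in TensorFlow/Keras is given below. Random dummy data and a toy two-layer model stand in for the real images and the UNet++ network; this is not the training code used in this work.

```python
import numpy as np
import tensorflow as tf

# Dummy data and a toy model as stand-ins for the real images and network.
train_images = np.random.rand(32, 64, 64, 1).astype("float32")
train_masks = (np.random.rand(32, 64, 64, 1) > 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

dataset = tf.data.Dataset.from_tensor_slices((train_images, train_masks)).batch(8)

for epoch in range(5):                               # one epoch = one pass over the whole training set
    for batch_images, batch_masks in dataset:        # parameters are adapted once per mini-batch
        with tf.GradientTape() as tape:
            predictions = model(batch_images, training=True)
            loss = loss_fn(batch_masks, predictions)  # difference between prediction and true label
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # high loss -> strong adaptation
    print("epoch", epoch, "loss", float(loss))
# In practice, training stops when the (validation) loss no longer decreases.
```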
Two different workflows were generated here, one for SE images and one for STEM images. This was necessary because SE and STEM images are strongly different in character. Both workflows shared the UNet++ architecture for segmentation.20
The segmentation training dataset consisted of 30 SE and 12 STEM images, respectively. We also used 32 SE images published by Ruehle et al.21 Validation datasets contained 16 SE images and 3 STEM images, respectively. These images had typical sizes of 2000 × 1600 pixels. The particles in both image types were typically separated by 1 to 3 pixels, i.e. the particle density was high (as common in scanning electron microscopy).
Because the input of UNet++ was fixed to 512 × 512 pixels, we randomly cut out patches of the training images. For data augmentation, we artificially altered each image by random rotation, flipping, intensity variation, shearing, and zooming (up to 15% each) before cutting out the image patches. The number of patches per image depended on the image size; larger images yielded more patches. Approximately 450 patches were cut out of the 30 SE images and approximately 180 patches out of the 12 STEM images. The random extraction of patches from each image was repeated in every epoch. An epoch was finished after each patch of each image had been processed once by the CNN. Fig. 1 illustrates this step.
Fig. 1 Representative SEM image from the training dataset for segmentation containing ZnO nanorods (2048 × 1886 pixels; SE mode). Orange boxes show typical cut-out patches used for training.
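The patch extraction can be sketched as follows. This is a simplified, hypothetical implementation (NumPy only, with a random horizontal flip standing in for the full augmentation described above); the default of about 15 patches per image follows from the numbers given above (450/30 and 180/12).

```python
import numpy as np

def random_patches(image, mask, patch_size=512, patches_per_image=15, rng=None):
    """Cut out random patches from an image and its annotation mask (sketch).

    The full augmentation (rotation, flipping, intensity variation, shearing,
    zooming, up to 15% each) would be applied to image and mask before cropping.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    patches = []
    for _ in range(patches_per_image):
        top = rng.integers(0, h - patch_size + 1)
        left = rng.integers(0, w - patch_size + 1)
        img_patch = image[top:top + patch_size, left:left + patch_size]
        mask_patch = mask[top:top + patch_size, left:left + patch_size]
        if rng.random() < 0.5:  # simple stand-in augmentation: random horizontal flip
            img_patch, mask_patch = img_patch[:, ::-1], mask_patch[:, ::-1]
        patches.append((img_patch, mask_patch))
    return patches

# Example for an image of the size shown in Fig. 1 (2048 x 1886 pixels)
image = np.random.rand(1886, 2048)
mask = (image > 0.5).astype(np.uint8)
pairs = random_patches(image, mask)
```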
Several problems emerge when training a neural network with SE and STEM images of nanoparticles.
First, in STEM images, the particles can touch or overlap and even form continuous aggregates (agglomerates) with no separating background between them. These aggregates cannot be separated into individual particles. For a precise size analysis, agglomerates must therefore be excluded.3 We achieved this by explicitly training the classification network to identify agglomerates.
Second, in SE images, the particles can also overlap but borders of particles are usually well distinguishable. The particles are separated by a thin line, sometimes as narrow as one pixel.
Third, particles in SE images can be present in different intensities ranging from dark (overshadowed particles) to bright (particles close to the electron detector; see Fig. 1). The neural network must adapt to these differences.
The problem extends to overlapping particles. Humans tend to impose an expected geometry on a partially covered particle: a particle of which 80% is visible as a sphere is implicitly considered to be a sphere. However, if there is no information on the missing 20%, this assumption is based on subjective expectation and not on experimental data. Of course, no method can supply pixels which are not visible because they lie behind another particle in front. In that case, the true particle shape (ground truth) is unknown.
The preparation of samples is therefore decisive to obtain good segmentation and classification results. A low particle density on the sample holder usually leads to well-separated particles. Nevertheless, the assessment of a large number of particles (>1000) is necessary to ensure a reliable statistical representation of a given sample. Unfortunately, images showing many overlapping particles (often caused by solvent evaporation during sample preparation) are most common when particles are imaged by SEM (see Fig. 1).
Ronneberger et al. introduced weight loss maps to overcome the problem of narrowly separated objects.13 Weight loss maps are matrices of the same size as the image that assign each pixel an individual weight. In our case, the weight of each background pixel is given by its distance to the nearest particle edges (distance-based weight loss maps). These pixels are particularly important to separate adjacent particles. By giving these separating background pixels a higher weight, we forced the neural network to focus on the immediate background around each particle (Fig. 2). Weight loss maps were computed from the ground-truth annotation masks which were manually prepared before the network was trained. Due to the higher weights of the separating background pixels between particles, the network predominantly learned to segment those areas. Thus, when the trained network is applied to a new image, no weights are needed because the network has already been sensitized (i.e. trained) to those areas. Notably, calculating weight maps requires considerable computing time.
Fig. 2 SE image depicting SiO2 microspheres (A1) with corresponding segmentation map (B1) and distance-based weight loss map calculated by eqn (1) (C1). STEM image of Au nanoparticles (A2) with segmentation map (B2) and intensity-based weight loss map (C2). Each background pixel (black) of the segmentation map was assigned an individual weight depending on the distance to the edges of the two nearest particles. These weights varied between 1 and 11 for SE images and between 1 and 13 for STEM images. The general weight of foreground pixels (white) was 1.
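A minimal sketch of how a distance-based weight loss map can be computed from a binary annotation mask is given below, following the formulation of Ronneberger et al. The parameter values w0 and sigma are hypothetical and chosen for illustration only; they are not the values used in this work.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.measure import label

def distance_weight_map(mask, w0=10.0, sigma=5.0):
    """Distance-based weight loss map (sketch, Ronneberger-style).

    `mask` is a binary segmentation map (1 = particle, 0 = background).
    Background pixels lying between two nearby particles receive high weights.
    """
    labels = label(mask)
    n = int(labels.max())
    weights = np.ones_like(mask, dtype=np.float64)   # foreground pixels keep weight 1
    if n < 2:
        return weights
    # distance of every pixel to each individual particle
    distances = np.stack([distance_transform_edt(labels != i) for i in range(1, n + 1)])
    distances.sort(axis=0)
    d1, d2 = distances[0], distances[1]              # distances to the two nearest particle edges
    background = (mask == 0)
    weights[background] += (w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2)))[background]
    return weights
```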
In STEM images, background pixels between two touching particles can be very bright (see Fig. 2-A2). The model erroneously merges such particles because it cannot identify a separating border between them. To train the model to distinguish between bright background and particles, we included the image intensity in the training process. The combination of distance-based weight loss maps with the pixel brightness led to a model which successfully separated touching particles. We denote this approach as intensity-based weight loss maps in the following. Eqn (1) shows the modified version of the formula of Ronneberger et al.13 for the weight w(x) of each pixel:
[Eqn (1): pixel weight w(x), combining the distance-based term of Ronneberger et al. (distances d1(x) and d2(x) to the edges of the two nearest particles) with the local pixel intensity] (1)
The segmentation performance was quantified by the pixel-wise metrics given in eqn (2)–(7), computed from the numbers of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) pixels:

Precision = TP/(TP + FP) (2)

Recall = TP/(TP + FN) (3)

Accuracy = (TP + TN)/(TP + TN + FP + FN) (4)

IoU = TP/(TP + FP + FN) (5)

F1 = 2TP/(2TP + FP + FN) (6)

Pixel error = 100% − accuracy (7)
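A minimal sketch of how these pixel-wise metrics can be computed from a predicted and a ground-truth binary mask is shown below; the standard definitions matching eqn (2)–(7) are assumed.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-wise metrics corresponding to eqn (2)-(7); `pred` and `truth` are binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    pixel_error = 1.0 - accuracy          # eqn (7), expressed as a fraction instead of percent
    return dict(precision=precision, recall=recall, accuracy=accuracy,
                iou=iou, f1=f1, pixel_error=pixel_error)
```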
| CNN | Type | Precision | Recall | Accuracy | IoU | F1 | Pixel error | Rand error |
|---|---|---|---|---|---|---|---|---|
| UNet | SE (distance-based weight loss maps) | 97 ± 1% | 97 ± 2% | 97 ± 2% | 94 ± 2% | 97 ± 1% | 3 ± 2% | 2 ± 1% |
| UNet | STEM (intensity-based weight loss maps) | 96 ± 4% | 93 ± 5% | 99.5 ± 0.5% | 90 ± 6% | 94 ± 3% | 0.5 ± 0.5% | 1.0 ± 0.9% |
| UNet++ | SE (distance-based weight loss maps) | 97 ± 1% | 96 ± 5% | 97 ± 7% | 93 ± 2% | 96 ± 5% | 3 ± 3% | 2 ± 2% |
| UNet++ | SE (intensity-based weight loss maps) | 98 ± 2% | 93 ± 3% | 96 ± 2% | 91 ± 3% | 95 ± 1% | 4 ± 12% | 3 ± 16% |
| UNet++ | SE (particles only) | 98 ± 2% | 95 ± 3% | 94 ± 2% | 93 ± 3% | 96 ± 1% | 13 ± 12% | 18 ± 16% |
| UNet++ | STEM (distance-based weight loss maps) | 96 ± 4% | 91 ± 7% | 99 ± 1% | 88 ± 1% | 93 ± 7% | 1 ± 4% | 1.1 ± 0.6% |
| UNet++ | STEM (intensity-based weight loss maps) | 96 ± 6% | 96 ± 4% | 99.7 ± 0.4% | 92 ± 7% | 96 ± 4% | 0.3 ± 0.4% | 1 ± 1% |
| UNet++ | STEM (particles only) | 99 ± 1% | 96 ± 4% | 96 ± 3% | 94 ± 4% | 97 ± 2% | 13 ± 28% | 8 ± 12% |
In addition, we calculated the Rand error.24 The Rand error measures the degree to which two segmentations (the true label and the model prediction) disagree on whether a given pixel belongs to the same object in both segmentations. The Rand error therefore measures how well particles are separated, with 0% indicating a good separation and 100% an unsuccessful separation.
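One way to compute a Rand-type error between two instance segmentations is the adapted Rand error implemented in scikit-image, sketched below. Whether this exactly matches the Rand error definition of ref. 24 is an assumption; the masks used here are dummy data for illustration.

```python
import numpy as np
from skimage.measure import label
from skimage.metrics import adapted_rand_error

# Dummy binary masks standing in for the ground truth and the model prediction.
truth_mask = np.zeros((64, 64), dtype=int); truth_mask[10:30, 10:30] = 1
pred_mask = np.zeros((64, 64), dtype=int); pred_mask[11:31, 10:30] = 1

# Connected-component labelling turns the binary masks into instance maps,
# so that the error reflects how well individual particles are separated.
rand_error, _, _ = adapted_rand_error(label(truth_mask), label(pred_mask))  # 0 = perfect agreement
print(f"Rand error: {100 * rand_error:.1f}%")
```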
The introduction of intensity-based weight loss maps significantly improved the segmentation performance of UNet++ for STEM images, whereas SE images did not benefit from them. SE images usually show a wide distribution of grayscales: background and particles share the same range of pixel intensities, whereas in STEM images background and particles are strongly different (two distinct peaks of pixel intensities). The background between touching particles in STEM images can reach the same intensity as the particles. This is not the case for SE images, where particles are often surrounded by a darker rim due to lower electron excitation. The models UNet and UNet++ performed almost equally well on both types of images. UNet++ uses only 7.7 million parameters compared to 31 million parameters of UNet.25 Thus, it is much faster than UNet without compromising the segmentation ability. Overall, the segmentation procedure was very efficient. The introduction of intensity-based weight loss maps led to a significant improvement of the IoU by 4% for STEM images, due to the better separation of touching particles by segmentation of the background pixels between them.
The classification of single particles was performed by two different CNNs: AlexNet for STEM images and ResNet34 for SE images. We initialized both networks according to He et al.26 As optimization algorithm, we used ADAM27 with the default settings of TensorFlow.28 As loss function, we used the cross-entropy loss function introduced by Fisher.29 Both networks were tested against their validation datasets.
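The corresponding training setup (He initialization, ADAM with default settings, cross-entropy loss) can be sketched in Keras as shown below. Because AlexNet and ResNet34 are not shipped with Keras, a small stand-in CNN with hypothetical layer sizes is used here.

```python
import tensorflow as tf

num_classes = 5  # e.g. sphere, sphere-like, cube, rod, covered (SE case)

# Small stand-in classifier; the actual networks were AlexNet (STEM) and ResNet34 (SE).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax",
                          kernel_initializer="he_normal"),   # He initialization
])
model.compile(optimizer=tf.keras.optimizers.Adam(),           # ADAM with TensorFlow default settings
              loss=tf.keras.losses.CategoricalCrossentropy(), # cross-entropy loss
              metrics=["accuracy"])
```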
After segmentation of the particles from the background, the particle bounding boxes were slightly enlarged and cut out from the SEM image. For each particle, the cut-out area was larger than the close-fitting bounding box to provide surrounding context for the subsequent classification network (Fig. 3). As classification network for STEM images, we used AlexNet because of the limited amount of data (<1000 images per class).19 Larger networks require a more variable dataset, which was not available for STEM images. Variation in terms of particle orientation and colour distribution is absent in STEM images, which are purely 2D representations with a very limited distribution of either very dark (background) or very bright (particle) grayscales. For SE images, the larger ResNet34 was the preferred option due to its performance in the ILSVRC 2015.30 For even deeper variants like ResNet50, the available data variation was insufficient; we were not able to train ResNet50 to the same extent as ResNet34.
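The cut-out step can be sketched as follows with scikit-image. The 20% margin is a hypothetical value; the text only states that the cut-out area was slightly larger than the close-fitting bounding box.

```python
import numpy as np
from skimage.measure import label, regionprops

def cut_out_particles(image, mask, margin=0.2):
    """Cut out every segmented particle with an enlarged bounding box (sketch)."""
    crops = []
    for region in regionprops(label(mask)):
        r0, c0, r1, c1 = region.bbox                 # close-fitting bounding box
        dr = int(margin * (r1 - r0))                 # enlarge by a relative margin
        dc = int(margin * (c1 - c0))
        r0, c0 = max(0, r0 - dr), max(0, c0 - dc)    # clip to the image borders
        r1 = min(image.shape[0], r1 + dr)
        c1 = min(image.shape[1], c1 + dc)
        crops.append(image[r0:r1, c0:c1])
    return crops
```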
000 images of manually classified SiO2, ZnO, Ag, Au, and TiO2 nanoparticles in the classes sphere, sphere-like, cube, rod, and covered particle were used (80% for training, 20% for validation; ground truth). Two different classes for spherical objects were introduced: the class “sphere” comprised round, ball-shaped objects, whereas the class “sphere-like” comprised a range of deformed, elongated, and indented particles. Data augmentation and class weighting were performed as with AlexNet (see above).
| Accuracy per class | Covered | Circle/sphere | Sphere-like | Rod | Triangle | Square/cube | Pentagon | Hexagon | Agglomerate | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| STEM (2D) | 97% | 96% | — | 95% | 97% | 94% | 91% | 95% | 91% | 95% |
| SE (3D) | 93% | 86% | 89% | 99% | — | 96% | — | — | — | 93% |
Fig. 4 Confusion matrix for the shape classification of nanoparticles. Left: results for STEM images; right: results for SE images.
The classification networks were then applied to images which had been used neither in training nor in validation. Fig. 5 shows representative results. Because both training datasets contained partially covered particles, both networks were trained to identify and exclude partially covered particles. This procedure ensured that particles within one class were similar. The shape classification by CNNs gave an overall high accuracy. The application of the validated networks led to an unexpected behaviour when classifying particles whose appearance differed from the trained morphologies. Images of particles with shapes unknown from training, like stars or octahedra, showed similar probabilities for many classes, i.e. the probability distribution was spread almost evenly among a number of classes. In that case, the classification would have been made by the network on the basis of small differences between the class probabilities (e.g. 34% vs. 30% vs. 28%), i.e. decided by a few percent of probability or less. Therefore, as an additional quality control, we introduced a confidence limit of 75%, corresponding to the typical human certainty of about 75% when classifying a given particle.3 If the highest class probability was below this confidence limit, the particle was classified as “unknown”. Note that “covered” and “agglomerate” (STEM only) were defined as individual classes during training. This assignment was not perfect, which is not surprising because partially covered or agglomerated particles can take many different shapes. The incorporation of partially covered particles into the classification and the subsequent numerical particle analysis would have strongly compromised the resulting data. In a typical case, a partially covered circle could appear as a sickle-like object, leading to a classification as rod and to wrong numerical input data.
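The confidence limit amounts to a simple threshold on the softmax output of the classification network, as in the minimal sketch below (the class list and the example probabilities are illustrative only).

```python
import numpy as np

CLASS_NAMES = ["sphere", "sphere-like", "cube", "rod", "covered"]  # SE classes named above
CONFIDENCE_LIMIT = 0.75

def classify_with_confidence(probabilities):
    """Assign a class only if the highest softmax probability exceeds the 75% confidence limit."""
    probabilities = np.asarray(probabilities)
    best = int(np.argmax(probabilities))
    if probabilities[best] < CONFIDENCE_LIMIT:
        return "unknown"          # e.g. stars or octahedra that were not present in the training data
    return CLASS_NAMES[best]

print(classify_with_confidence([0.34, 0.30, 0.28, 0.05, 0.03]))   # -> "unknown"
print(classify_with_confidence([0.90, 0.05, 0.03, 0.01, 0.01]))   # -> "sphere"
```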
In general, SEM images without any covered particles are difficult to acquire, as most real images contain covered particles. Thus, some fraction of covered particles must be tolerated by any practically applicable classification model. The average false classification rate was 5% for STEM images and 7% for SE images (see Table 2). Thus, covered particles which the segmentation model had assigned to the foreground were recognized as covered particles by the classification model with a probability of 95% for STEM and 93% for SE images. We consider this an acceptable error. Furthermore, we did not find deviations in the particle diameter distributions caused by including minimally covered particles.
| | Perimeter/nm | Area/nm² | Convex hull area/nm² | Circularity/— | Minimum Feret diameter/nm | Equivalent circle diameter/nm | Circle diameter by human evaluator/nm |
|---|---|---|---|---|---|---|---|
| Average | 625 | 35 489 | 36 625 | 0.97 | 204 | 213 | 230 |
| Std. dev. | 160 | 4955 | 4630 | 0.03 | 19 | 14 | 17 |
| Number of analysed particles | 1480 | 1480 | 1480 | 1480 | 1480 | 1360 | 100 |
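Descriptors of this kind can be computed per particle from a labelled segmentation mask, for example with scikit-image as sketched below. The pixel calibration `nm_per_pixel` and the circularity definition (4πA/P²) are assumptions for illustration; the minimum Feret diameter would require an additional rotating-calipers step and is omitted here.

```python
import numpy as np
from skimage.measure import label, regionprops

def particle_descriptors(mask, nm_per_pixel=1.0):
    """Compute per-particle shape descriptors from a binary segmentation mask (sketch)."""
    results = []
    for region in regionprops(label(mask)):
        perimeter = region.perimeter * nm_per_pixel
        area = region.area * nm_per_pixel ** 2
        convex_area = region.convex_area * nm_per_pixel ** 2            # convex hull area
        circularity = 4 * np.pi * region.area / region.perimeter ** 2   # assumed definition
        equivalent_diameter = region.equivalent_diameter * nm_per_pixel # equivalent circle diameter
        results.append(dict(perimeter=perimeter, area=area, convex_area=convex_area,
                            circularity=circularity, equivalent_diameter=equivalent_diameter))
    return results
```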
The approach presented here compares well with other methods for image segmentation and classification.11 We have shown previously that a more classical machine-learning approach using a random forest classifier is not capable of segmenting nanoparticles in STEM images.3 Furthermore, a classification of shapes by principal component analysis of morphological features is not sufficient to distinguish between particle shapes.33 In general, deep neural networks are superior to shallow classification and regression algorithms like watershed segmentation34 or custom-made feature detectors35 for image segmentation or classification, as summarized in ref. 36.
: 20) before SEM analysis.
This journal is © The Royal Society of Chemistry 2023