Metrology of convex-shaped nanoparticles via soft classification machine learning of TEM images

The shape of nanoparticles is a key performance parameter for many applications, ranging from nanophotonics to nanomedicines. However, the unavoidable shape variations, which occur even in precision-controlled laboratory synthesis, can significantly impact the interpretation and reproducibility of nanoparticle performance. Here we have developed an unsupervised, soft classification machine learning method to perform metrology of convex-shaped nanoparticles from transmission electron microscopy images. Unlike existing methods, which are based on hard classification, soft classification provides significantly greater flexibility: it can classify both distinct shapes and non-distinct shapes, for which hard classification fails to provide meaningful results. We demonstrate the robustness of our method on a range of nanoparticle systems, from laboratory-scale to mass-produced synthesis. Our results establish that the method can provide quantitative, accurate, and meaningful metrology of nanoparticle ensembles, even for ensembles entailing a continuum of (possibly irregular) shapes. Such information is critical for achieving particle synthesis control and, more importantly, for gaining deeper understanding of shape-dependent nanoscale phenomena. Lastly, we present a method, which we coin the "binary DoG", that achieves significant progress on the challenging problem of identifying the shapes of aggregated nanoparticles.


S1. HU MOMENT SOFT CLASSIFICATION METHOD
In this section, we explain in detail the HuSC methodology. The methodology involves two main parts, as shown in Fig. 1 in the main text. The first part is the image preprocessing, which converts TEM images into particle contour datasets; these are then input to the classification process. The second part is the soft classification algorithm, which classifies the nanoparticle contours and ultimately assigns a set of class "responsibilities" to each contour.

A. Image preprocessing

1. Background removal

The first step of preprocessing is to separate the particles from the background of the image. The raw image is first filtered and smoothed using a Fourier filtering method to reduce image noise. In TEM images, variations of the background can arise from variations in the thickness of the TEM support film, or from non-uniform electron microscope illumination. In order to obtain reliable background-removed images, we adopt a refined approach of dynamic thresholding [1]. The use of dynamic thresholding is important in achieving accurate separation of the particles from the background. Rather than assuming a global (single-value) background for the TEM image, this method measures the local background value within sub-regions of the image before applying the background removal.

2. Binary image
The background-subtracted image is converted to a binary image by assigning a value 0 to all pixels with a value less than or equal to the background, while assigning a value of 1 to all pixels with a greater value. The resulting binary image forms an unambiguous representation of the locations of the nanoparticles for subsequent processing.
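The local background estimation and binarization steps above can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the tile size, the per-tile median background estimate, and the MAD-based offset are illustrative choices, and the Fourier pre-filtering step is omitted.

```python
import numpy as np

def binarize(img: np.ndarray, tile: int = 16, k: float = 6.0) -> np.ndarray:
    """Dynamic thresholding: estimate the background locally within each
    tile (median), then keep only pixels lying well above it, where
    "well above" is k robust deviations (median absolute deviation)."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            block = img[i:i + tile, j:j + tile]
            med = np.median(block)
            mad = np.median(np.abs(block - med))
            mask[i:i + tile, j:j + tile] = block > med + k * mad
    return mask

# Synthetic example: sloped background plus one bright "particle".
img = np.linspace(0.0, 10.0, 64)[None, :] * np.ones((64, 64))
img[20:30, 20:30] += 50.0
mask = binarize(img)
```

With a strongly non-uniform background, a single global threshold either clips bright background regions or misses dim particles; the local estimate sidesteps this, which is the motivation for dynamic thresholding given above.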

3. Particle contour identification
Using the binary image, the particles' contours (outlines) are identified using the Canny edge detection algorithm [2]. The algorithm is applied to a Gaussian-smoothed version of the binary image. Edge detection is based on the local gradients within the image, with the edges located at the pixels with locally maximum values of the magnitude of the gradient.
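The gradient-based core of this edge-detection step can be sketched as below. Note that this is a simplification of the Canny algorithm, which additionally applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding to produce clean, one-pixel-wide contours; the function name and threshold are ours.

```python
import numpy as np

def edge_pixels(binary: np.ndarray, thresh: float = 0.4) -> np.ndarray:
    """Mark pixels with a large local intensity gradient as edges.
    Only the gradient-magnitude core of Canny is implemented here."""
    gy, gx = np.gradient(binary.astype(float))  # central differences
    return np.hypot(gx, gy) > thresh            # gradient magnitude

# A 10x10 square "particle": edges appear along its boundary only.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[10:20, 10:20] = 1
edges = edge_pixels(mask)
```

On the binary image, the gradient magnitude is zero both inside the particle and in the background, and nonzero only in a narrow band at the particle boundary, which is exactly the contour sought.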

4. Filtering out overlapping particles
Filtering is applied to remove the contours of overlapping particles, as these contours do not represent the shapes of individual particles. Ensuring that all particle contours represent isolated particles (i.e., not overlapping or aggregated) is necessary for the correct operation of the classification algorithm. We use a straightforward convexity filter to remove the aggregated/overlapping particles. We define the convexity of a contour as the ratio of the area enclosed by the contour to the area enclosed by the convex hull (the closest convex approximation) of the contour. This approach assumes that the contours of isolated particles are convex (that is, that the particles themselves have a convex shape), whereas the contours of aggregated particles are non-convex. Hence, by measuring and thresholding the convexity of the particle contours, aggregates can be detected [3] and then removed from the dataset. A typical threshold value is around 95%.
If the shapes of individual particles are non-convex, then it is possible to devise other simple filters which can work well. For example, since the area enclosed by the contour of a particle aggregate tends to be significantly larger than that of an isolated particle, thresholding of the area values can form an effective filter. A more sophisticated approach using hard classification of contour shapes is also possible.
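A minimal implementation of the convexity filter described above, assuming the contour is given as a list of (x, y) vertices; we use the shoelace formula for the enclosed area and Andrew's monotone-chain algorithm for the convex hull. These are standard techniques, not necessarily the ones used in the paper.

```python
def _cross(o, a, b):
    """2D cross product of vectors OA and OB (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Convex hull of a point set via Andrew's monotone-chain algorithm."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Enclosed area of a closed polygon (shoelace formula)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def convexity(contour):
    """Area enclosed by the contour / area of its convex hull."""
    return polygon_area(contour) / polygon_area(convex_hull(contour))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]           # isolated, convex
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]  # aggregate-like
```

Here `square` has convexity 1.0 and is kept, while `notched` has convexity 0.625, below the typical 95% threshold, so it would be filtered out as an aggregate.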
In Section 3.4 of the main text, we show that the requirement of isolated particles can potentially be lifted by using a "binary DoG" method, in which the contours of overlapping particles can be correctly identified. The binary DoG method is not limited to convex shapes.

B. Hu moments as shape descriptors
For a grey-scale image (or 2D probability density function) ρ(x, y), the (p + q)th-order image moments are defined as

$$m_{p,q} = \iint x^p y^q \, \rho(x, y) \, dx \, dy. \tag{S1}$$

In the context of closed contours representing particle shapes, we can assume that ρ(x, y) is binary valued, so that the integral is restricted to the area A enclosed by the contour,

$$m_{p,q} = \iint_A x^p y^q \, dx \, dy.$$

Given a contour specified as an ordered list of coordinates, Green's theorem in the plane greatly aids the calculation of the above integral.
The Hu moments $H_1, \ldots, H_7$ are, by design, invariant with respect to translation, scaling, and rotation [4]. To obtain these, we first define centered moments $m'_{p,q}$, which are the moments of the contour translated such that its center of mass lies at the origin, i.e., $(m'_{1,0}, m'_{0,1}) = (0, 0)$ (this achieves translation invariance):

$$m'_{p,q} = \iint_A (x - \bar{x})^p (y - \bar{y})^q \, dx \, dy, \qquad \bar{x} = \frac{m_{1,0}}{m_{0,0}}, \quad \bar{y} = \frac{m_{0,1}}{m_{0,0}}.$$

Using the centered moments, we define normalized moments (this achieves scaling invariance):

$$\eta_{p,q} = \frac{m'_{p,q}}{(m'_{0,0})^{1 + (p+q)/2}}.$$

Finally, using the normalized moments, the seven Hu moment invariants are given by

$$\begin{aligned}
H_1 &= \eta_{2,0} + \eta_{0,2}, \\
H_2 &= (\eta_{2,0} - \eta_{0,2})^2 + 4\eta_{1,1}^2, \\
H_3 &= (\eta_{3,0} - 3\eta_{1,2})^2 + (3\eta_{2,1} - \eta_{0,3})^2, \\
H_4 &= (\eta_{3,0} + \eta_{1,2})^2 + (\eta_{2,1} + \eta_{0,3})^2, \\
H_5 &= (\eta_{3,0} - 3\eta_{1,2})(\eta_{3,0} + \eta_{1,2})\left[(\eta_{3,0} + \eta_{1,2})^2 - 3(\eta_{2,1} + \eta_{0,3})^2\right] \\
&\quad + (3\eta_{2,1} - \eta_{0,3})(\eta_{2,1} + \eta_{0,3})\left[3(\eta_{3,0} + \eta_{1,2})^2 - (\eta_{2,1} + \eta_{0,3})^2\right], \\
H_6 &= (\eta_{2,0} - \eta_{0,2})\left[(\eta_{3,0} + \eta_{1,2})^2 - (\eta_{2,1} + \eta_{0,3})^2\right] + 4\eta_{1,1}(\eta_{3,0} + \eta_{1,2})(\eta_{2,1} + \eta_{0,3}), \\
H_7 &= (3\eta_{2,1} - \eta_{0,3})(\eta_{3,0} + \eta_{1,2})\left[(\eta_{3,0} + \eta_{1,2})^2 - 3(\eta_{2,1} + \eta_{0,3})^2\right] \\
&\quad - (\eta_{3,0} - 3\eta_{1,2})(\eta_{2,1} + \eta_{0,3})\left[3(\eta_{3,0} + \eta_{1,2})^2 - (\eta_{2,1} + \eta_{0,3})^2\right].
\end{aligned}$$

The first two Hu moments have the simple interpretation given in the main text. Generally, we find that discrimination of more complex shapes requires the use of higher-order Hu moments.
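The translation and scale invariance of the low-order Hu moments can be illustrated numerically as follows. For simplicity, this sketch computes the moments by direct summation over a filled binary mask rather than via the contour-integral (Green's theorem) route described above; function names are ours.

```python
import numpy as np

def first_two_hu(mask: np.ndarray):
    """H1 and H2 from a filled binary mask, by direct pixel summation."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))              # zeroth moment = area in pixels
    x = xs - xs.mean()                # centering: translation invariance
    y = ys - ys.mean()
    norm = m00 ** 2                   # (m'_00)^(1 + (p+q)/2) for p+q = 2
    eta20 = (x ** 2).sum() / norm
    eta02 = (y ** 2).sum() / norm
    eta11 = (x * y).sum() / norm
    h1 = eta20 + eta02
    h2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return h1, h2

def disk(radius: int, size: int) -> np.ndarray:
    """Filled disk mask centered on a size x size grid."""
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2.0
    return (xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2

# Doubling the radius leaves H1 (almost) unchanged: scale invariance.
h1_small, h2_small = first_two_hu(disk(8, 32))
h1_big, _ = first_two_hu(disk(16, 64))
```

For a disk, the continuum values are H1 = 1/(2π) ≈ 0.159 and H2 = 0 (perfect circular symmetry); the discrete masks reproduce these up to small pixelation errors, independent of size.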
There also exist a range of other well-established shape descriptors which are compatible with soft classification, as cited in the main text.

C. Gaussian mixture model soft classification method
Given a set of N particle contours, we can represent each of them by its (possibly reduced subset of) Hu moments, as described above, so that the set of N contours is represented by a set of points x_1, ..., x_N in the multidimensional "Hu space." This concept is quite general in that it applies equally well to other shape descriptors, so that we may refer to the set of contours as represented by a set of points in a more general multidimensional "shape space." We assume that the set of (data) points x_1, ..., x_N represents some underlying probability distribution p(x) in the shape space. Following the treatment outlined in Ref. [5], soft classification using a Gaussian mixture model consists of finding the most likely superposition of K Gaussian densities, that is, the superposition that best reproduces the underlying probability distribution p(x).
The superposition of Gaussians is given by

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \,|\, \mu_k, \Sigma_k),$$

where each Gaussian density $\mathcal{N}$ has its own mean $\mu_k$, covariance $\Sigma_k$, and mixing coefficient (weighting) $\pi_k$. For a given set of data points $x_1, \ldots, x_N$, optimization of the Gaussian mixture model, that is, optimization of the quantities $\pi = (\pi_1, \ldots, \pi_K)$, $\mu = (\mu_1, \ldots, \mu_K)$, and $\Sigma = (\Sigma_1, \ldots, \Sigma_K)$, can be achieved by a maximum-likelihood solution, where the log-likelihood function is given by

$$\ln p(\mathbf{X} \,|\, \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left[ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \,|\, \mu_k, \Sigma_k) \right].$$

The essential outcome of the above procedure is that it enables a soft classification of each data point (contour) $x_n$ in terms of a set of K responsibilities $p(k \,|\, x_n)$, defined as

$$p(k \,|\, x_n) = \frac{\pi_k \, \mathcal{N}(x_n \,|\, \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \,|\, \mu_j, \Sigma_j)}.$$

The responsibility $p(k \,|\, x_n)$ is the probability of the unobserved mixture component k given the observed data point $x_n$. Being probabilities, the responsibilities satisfy $0 \le p(k \,|\, x_n) \le 1$ and $\sum_{k=1}^{K} p(k \,|\, x_n) = 1$ (which is easily seen from their definition above). In the three examples given in the main text, we use K = 2 to accomplish the soft classification of particle contours in Hu space.
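A minimal expectation-maximization fit of a one-dimensional, K = 2 Gaussian mixture, illustrating the responsibilities p(k | x_n). This is only an illustrative sketch: the paper's analysis operates on multidimensional points in Hu space with full covariance matrices (for which, e.g., scikit-learn's `GaussianMixture` is a standard implementation), and all function names and parameters here are ours.

```python
import numpy as np

def em_gmm_1d(x, mu_init, n_iter=100):
    """Fit a 1D Gaussian mixture by expectation-maximization; return
    the mixture parameters and the responsibilities p(k | x_n)."""
    x = np.asarray(x, dtype=float)
    K = len(mu_init)
    pi = np.full(K, 1.0 / K)              # mixing coefficients
    mu = np.array(mu_init, dtype=float)   # component means
    var = np.ones(K)                      # component variances
    for _ in range(n_iter):
        # E-step: responsibilities from the current parameters.
        dens = (pi / np.sqrt(2 * np.pi * var)) * np.exp(
            -(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood update of pi, mu, var.
        Nk = resp.sum(axis=0)
        pi = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        var = np.maximum(var, 1e-6)       # guard against collapse
    return pi, mu, var, resp

# Two well-separated "shape classes" along one descriptor axis.
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
pi, mu, var, resp = em_gmm_1d(data, mu_init=[1.0, 4.0])
```

Each row of `resp` sums to 1, as required of the responsibilities; with well-separated clusters the soft classification reduces to an essentially hard one, mirroring the behavior noted for the distinct-shape examples in the main text.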

S2. ANALYSIS USING MULTIPLE TEM IMAGES
Here we present an extended example of the HuSC metrology analysis that was presented in Section 3.1 of the main text. This extension is simply intended to demonstrate that the method produces consistent results when applied to multiple TEM images, as would be expected, and as would be necessary in an application of the method to perform a detailed analysis of a particular nanoparticle system. Here we show the results obtained from an analysis of four TEM images of the UCNP sample of Section 3.1 (Fig. 2). The four TEM images, shown in Fig. S1(a), were acquired from exactly the same batch of nanoparticles. As in Section 3.1, the HuSC method was applied assuming two shape classes. Once again, we see that in this case, where the particles have distinct, well-defined shapes, the soft classification reduces to an essentially hard classification of particle shapes: hexagonal (magenta) and rod-shaped (cyan). Detailed shape and size analyses, given as the particle shape eigenvalue scatter plot in Fig. S1(d) and the size histogram in Fig. S1(e), respectively, show essentially the same distributions as given in Fig. 2, as expected. Again, the particles in this example fall very close to the ellipse line (dashed line in Fig. S1(d)), with class k = 1 (magenta) focused around aspect ratios in the range 1.0-1.1, and class k = 2 spread between 1.5 and 2.0 (except for one particle). The size distribution of k = 1 is around 58-68 nm, and that for k = 2 around 46-52 nm. These distributions are similar to those presented in Fig. 2, with a minor difference of a single particle in class k = 2 that is much smaller (≈ 32 nm) and has a smaller aspect ratio of ≈ 1.25 (seen near the center of the lower-left TEM image). A summary of the statistics is given in Table S1, to be compared with Table 1 in the main text.