Abdelwahab Kawafi,*a Lars Kürten,b Levke Ortlieb,c Yushi Yang,c Abraham Mauleon Amieva,a James Hallettd and C. Patrick Royall*b
aSchool of Physiology, Pharmacology, and Neuroscience, University of Bristol, Bristol, BS8 1TD, UK
bGulliver UMR CNRS 7083, ESPCI Paris, Université PSL, 75005 Paris, France. E-mail: paddy.royall@espci.psl.eu
cH. H. Wills Physics Laboratory, University of Bristol, Bristol, BS8 1TL, UK
dDepartment of Chemistry, School of Chemistry, Food and Pharmacy, University of Reading, Reading, RG6 6AD, UK
First published on 20th May 2025
Colloidoscope is a deep learning pipeline employing a 3D residual U-net architecture, designed to enhance the tracking of dense colloidal suspensions imaged by confocal microscopy. This methodology uses a simulated training dataset that reflects a wide array of real-world imaging conditions, specifically targeting the high colloid volume fraction and low-contrast scenarios where traditional detection methods struggle. Central to our approach is the use of experimental signal-to-noise ratios (SNR), contrast-to-noise ratios (CNR), and point-spread-functions (PSFs) to accurately quantify and simulate the experimental data. Our findings reveal that Colloidoscope achieves superior recall in particle detection (it finds more particles) compared to conventional methods, while maintaining high precision (a high fraction of true positives). The model demonstrates a notable robustness to photobleached samples, thereby extending the imaging time and the number of frames that may be acquired. Furthermore, Colloidoscope maintains small-scale resolution sufficient to classify local structural motifs. Evaluated across both simulated and experimental datasets, Colloidoscope brings the advancements in computer vision offered by deep learning to particle tracking at high volume fractions, and we offer a promising tool for researchers in the soft matter community. The model is deployed and available to use pretrained at https://github.com/wahabk/colloidoscope.
Recently, the case for using smaller colloids has been made,23 which provides access to new phenomena, particularly in the context of the glass transition.21,24,25 The Crocker and Grier method has been implemented in the popular Python package Trackpy.26 From here on this will be referred to as TP.
The method of Crocker and Grier19 constitutes a simple algorithm that takes the “pseudo diameter” (known as w) to be larger than the true radius but smaller than the true diameter of the particles. This is used to preprocess the image with a boxcar filter to remove the background followed by grayscale dilation27 to detect the centroids. Then criteria such as the integrated sphere brightness, the radius of gyration of the particle image or geometric measures such as the eccentricity are used to refine particle positions.
More sophisticated approaches are also available.28,29 Among the pioneers of the field, van Blaaderen and Wiltzius30 sought to improve the tracking accuracy in the axial direction by fitting a Gaussian to the integrated intensities in each plane that constituted a given colloid. This early work examined hard-sphere glasses and crystals, i.e., colloidal solids in which diffusion could be neglected.30 Even though this method made tracking easier, the slow scan rates needed to acquire images with low noise left dynamical information beyond reach. More recently, Jenkins et al.31 pushed the limits of the technique to identify contacts between colloids through ultra-high precision coordinate location. Such precision was achieved by first determining an empirical image of a colloid, which could then be compared to the original image. Further systematic improvements were made by Gao and Kilfoil.32 Despite advances in imaging technologies such as stimulated emission depletion (STED) microscopy,33,34 the software to detect the colloids has lagged behind.
Tracking colloids is often specimen dependent,35,36 leading to fragmented tracking methods used by different researchers. A further issue regarding particle tracking, which has perhaps received less attention than it might, is that it can be something of a “dark art”. Analysis can be slow, tedious, expensive, and remains subjective due to the extensive number of user-tuned parameters in tracking software. Many datapoints are discarded because they are of insufficient quality for the tracking methodology. Improved detection algorithms allow one to use lower laser power, less line averaging and lower resolution, and to increase the frame rate; in this way, we show that smaller particles can be detected without the need for improved imaging hardware. There is a need for state-of-the-art computer vision methodologies for confocal microscopy that are as specimen agnostic as possible. The desire to produce a method which removes the subjective tuning of parameters in existing approaches is a key aim of the present work.
• Recall is the proportion of particles in the sample which are successfully detected.
• Precision is the fraction of those detected particles which correspond to (real) particles in the sample.
As proposed by Bailey et al.,37 one might imagine that machine-learning techniques could be applied to the vexing problem of obtaining particle coordinates from microscopy images. It is natural to enquire as to the opportunities offered by the explosion in machine learning (ML). ML has revolutionised image analysis in many settings from earth observation38 to molecular imaging.39 However, one area in which there is potential for development is in tracking of concentrated colloidal dispersions imaged in 3D.
Deep learning for computer vision classically involves convolutional neural networks (CNNs). Convolutions are used to replace the fully connected layers of the network by constraining neurons to a specific kernel size – e.g. 3 × 3. This induces a “receptive field” where a hierarchy of neurons analyse each section of an image separately. This reduces the total number of connections and, therefore, parameters in the model, as well as providing translational invariance. We do not consider vision transformers due to their data inefficiency.40
A common approach to detecting colloids in microscopy images is using convolutional neural networks (CNNs), particularly for semantic segmentation. In this method, each pixel of the image is classified into specific categories, such as particle (foreground) or solvent (background). Object detection techniques, such as bounding box detection and instance segmentation, are also frequently employed.36,41–44 These methods, including Mask-RCNN, detect individual instances of objects by localising them in the image. However, using Mask-RCNN for colloid detection typically requires a slice-based approach, where the model is applied to 2D slices one at a time. The individual slice predictions are then combined and used as labels for a Watershed algorithm. Benchmarks have shown that the 3D U-net outperforms this slice-based Mask-RCNN approach.45 This advantage is often attributed to the 3D U-net's ability to leverage contextual information across slices, whereas Mask-RCNN's region proposal methods, which rely on fully connected layers, lose the translational invariance inherent to fully convolutional models.
In particle tracking, two main strategies are typically used. The first is semantic segmentation, which is commonly employed in applications like cell counting.44 After segmentation, a post-processing technique, such as Watershed, can be applied to delineate distinct regions and extract particle positions. The second approach, which our work builds on, is heatmap regression. This method, widely used in dense pose prediction (e.g., detecting body poses in images),46 involves defining the problem as detecting a Gaussian distribution in the image and applying L1 regression to fit the heatmap. Different labeling methods are discussed in the Methods section 2.3.
The motion of colloids is an important feature to study. The movement of nearby particles can be correlated.47,48 Often deep learning based techniques actually use linking to improve segmentation in a similar fashion to Newby et al.43 For instance, DistNet2D49 predicts both the Euclidean Distance Map and Geodesic Distance to the Centre Map of the image. These are used as seeds for a Watershed algorithm followed by postprocessing to reduce false positives. Excitingly, DistNet2D uses temporal information by predicting 7 frames at a time (3 before and 3 after). The limitation of 7 2D frames is due to GPU memory, which highlights the difficulty in temporally tracking 3D data. Other methods exist for the detection of colloids using deep learning, but focus on different microscopy approaches such as holographic imaging or stochastic optical reconstruction microscopy (STORM).50,51
This challenge of applying ML methods to PRS has thus received some attention. DeepTrack44 has made great headway in tracking a wider variety of colloids with different shapes, such as crescents. However, this method is focused towards dilute samples in 2D. As alluded to above, for colloidal systems, 2D analysis in particular is quite well dealt with using conventional methods. We believe that the case of 3D, hampered by the relatively poor resolution in the axial direction of confocal microscopy, is a significant challenge, especially for concentrated systems. Furthermore, DeepTrack does not provide pretrained hyperparameters and takes the approach of training a unique model for each specimen. When compared to Newby et al.,43 this raises the question: how much should one generalise? Generalising to different datasets can help to avoid overfitting, but will it sacrifice precision or recall?
To the best of our knowledge, there is still no neural network approach that accurately locates individual particles in a dense colloidal suspension in 3D. This is because obtaining labelled data, where the 3D fluorescent images are annotated with the exact 3D location of each individual colloid, is difficult. The 3D locations from both existing tracking software and human operators are not accurate enough to train a satisfactory model.
The data deficiency can also be addressed by using simulated data to train the neural network. Applying molecular dynamics (MD) computer simulation, we can generate an ensemble of coordinates for densely packed particles, by repeatedly collecting particle locations from a simulated trajectory. With the simulated coordinates, we can further simulate their corresponding microscopy images, by convolving the pulse functions located at the coordinates with suitable kernels such as the point spread function (PSF).28 A simulation approach facilitates investigating the model's sensitivity to the training data parameters.
The model for this project is a simple 3D U-net.53 A U-net comprises a fully convolutional autoencoder with long skip connections from the encoder to the decoder, which exploit dense features without losing spatial information. A residual encoder is used,54 with a block approach to test model depth. Loss functions are crucial for training CNNs, particularly for detecting small and/or dense particles, both of which apply here. Since this model takes a heatmap approach, a simple L1 loss function is used.
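As a minimal sketch of these two ingredients (not the exact Colloidoscope architecture; the layer widths, depth, and the ResBlock3D name are illustrative assumptions), a residual 3D convolutional block and an L1 heatmap-regression objective in PyTorch might look as follows:

```python
# Minimal sketch: a residual 3D conv block and an L1 (heatmap regression) loss.
# Illustrative only; layer widths and depth are placeholders.
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual (skip) connection: output = activation(x + F(x))
        return self.act(x + self.body(x))

model = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1),  # 1-channel volume in
                      ResBlock3D(16),
                      nn.Conv3d(16, 1, 1))             # 1-channel heatmap out
loss_fn = nn.L1Loss()                                  # heatmap regression loss

x = torch.rand(2, 1, 64, 64, 64)                       # batch of 64^3 volumes
target = torch.rand(2, 1, 64, 64, 64)                  # Gaussian heatmap labels
loss = loss_fn(model(x), target)
loss.backward()
```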
The imaged sample volume represents a compromise between resolution, working distance, and photobleaching. A confocal image is typically 16 megavoxels (256 pixels cubed) or larger, which constitutes our input. Such a memory demand exceeds the capacity of most personal computing platforms: for computational tractability, both the model and the image need to be stored in GPU memory simultaneously. Tiling – a patch-based pipeline – is therefore crucial for this work, where each image is broken into smaller regions of interest for prediction. Tiled inference saves GPU memory and allows for a larger extent of the image in the z direction (i.e. a greater number of xy image planes) when compared to a 2D model with slice-wise inferencing. This method also permits a larger batch size during training (for improved normalisation), as well as test-time augmentation.
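A rough sketch of such tiled inference is shown below, assuming a trained PyTorch model and a hypothetical 64³ patch size; for simplicity the tiles do not overlap, whereas a production pipeline would typically overlap and blend tile borders:

```python
import numpy as np
import torch

def tiled_inference(model, volume, patch=64, device="cuda"):
    """Run a 3D model patch-by-patch so only one tile sits in GPU memory.
    Non-overlapping tiles for simplicity; use device="cpu" for testing."""
    model = model.to(device).eval()
    out = np.zeros_like(volume, dtype=np.float32)
    nz, ny, nx = volume.shape
    with torch.no_grad():
        for z in range(0, nz, patch):
            for y in range(0, ny, patch):
                for x in range(0, nx, patch):
                    tile = volume[z:z+patch, y:y+patch, x:x+patch]
                    t = torch.from_numpy(tile[None, None].astype(np.float32)).to(device)
                    pred = model(t)[0, 0].cpu().numpy()
                    out[z:z+patch, y:y+patch, x:x+patch] = pred
    return out
```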
An Attention U-net55 was tested to measure its performance against the standard U-net. The original work introduces attention as a tool to combat class imbalance – defined as an imbalance of foreground and background in the image – which leads to poor model performance because training biases towards one class. Attention aims to bias the model towards the important pixels in the image regardless of class. Attention could still be crucial for dense detection, and its usefulness would depend on the labelling method (binary segmentation vs. heatmap). For instance, if each particle prediction is a Gaussian with a small fixed width (e.g. 5 pixels), class imbalance could still be an issue where most of the volume is background when ϕ is low. Conversely, at high volume fractions there might be negligible background in the image.
We optimised the parameters for model training with a grid search, to find the best combination of learning rate, dropout fraction, activation functions and kernel sizes. Augmentation refers to supplementing the training data in a low-cost manner, e.g. by flipping or zooming. While augmentation is typically used in sparsely labelled or data-limited applications, it still serves here as a useful regularisation technique (avoiding overfitting). The combination of computationally expensive simulation with cheap training-time augmentation allows for a robust model. Finally, histogram normalisation rescales the brightness values, which are usually skewed towards the lower end to avoid photobleaching. A set of simulated and augmented images can be seen in Fig. A1 in the Appendix (ESI†).
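As a small illustration (the exact augmentations and normalisation used in training may differ; the function names and percentile limits are assumptions), random flips and a percentile-based histogram normalisation could be written as:

```python
import numpy as np

def augment(volume, rng=np.random.default_rng()):
    """Cheap training-time augmentation: random flips along each axis."""
    for axis in range(volume.ndim):
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
    return volume.copy()

def histogram_normalise(volume, low=1, high=99):
    """Stretch intensities between the low/high percentiles to [0, 1]."""
    lo, hi = np.percentile(volume, [low, high])
    return np.clip((volume - lo) / (hi - lo + 1e-8), 0.0, 1.0)
```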
We generate particle positions for amorphous colloidal systems by using a hard-sphere Monte-Carlo algorithm.56,57 To begin, many random positions are generated at a low volume fraction. The system is then “crushed”: the volume is slowly decreased, with the particles being randomly moved until no overlap is measured. This process is repeated until the desired volume fraction is reached. These coordinate sets were generated using HOOMD-blue.58,59
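A toy NumPy version of this crush procedure is sketched below purely for illustration (the production coordinates were generated with HOOMD-blue; the function name, shrink factor and target volume fraction here are assumptions):

```python
import numpy as np

def crush(n=100, phi_target=0.30, sigma=1.0, rng=np.random.default_rng(0)):
    """Toy hard-sphere "crush": start dilute, shrink the periodic box slightly,
    randomly displace overlapping particles until no overlaps remain, and
    repeat until the target volume fraction is reached."""
    phi = 0.05
    box = (n * np.pi * sigma**3 / (6 * phi)) ** (1 / 3)
    pos = rng.random((n, 3)) * box
    while phi < phi_target:
        box *= 0.995                          # shrink the box slightly
        pos *= 0.995
        phi = n * np.pi * sigma**3 / (6 * box**3)
        while True:                           # remove overlaps by random moves
            d = pos[:, None, :] - pos[None, :, :]
            d -= box * np.round(d / box)      # minimum image convention
            r = np.sqrt((d**2).sum(-1)) + np.eye(n) * box
            overlapping = np.unique(np.argwhere(r < sigma)[:, 0])
            if overlapping.size == 0:
                break
            pos[overlapping] = (pos[overlapping]
                                + rng.normal(0, 0.1 * sigma, (overlapping.size, 3))) % box
    return pos, box
```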
After the positions are generated at different volume fractions, the experimental imaging parameters must be measured before the simulation can be created. The signal to noise ratio (SNR) is a useful measure for assessing signal quality; in signal analysis it is usually defined as the mean divided by the standard deviation. Since σ refers to the particle diameter, we denote the sample mean brightness of the n foreground (particle) pixel values xi as x̄, with standard deviation s:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (1)$$

$$\mathrm{SNR} = \frac{\bar{x}}{s} \qquad (2)$$
A more pertinent measure for object detection is the contrast to noise ratio (CNR). The CNR is commonly used in biomedical imaging to analyse the quality of different imaging modalities such as CT, MRI, PET, or ultrasound. Through various simulated datasets we found that the CNR is a salient measure of image quality, specifically for confocal imaging of colloids, where the laser illumination must be tuned to limit photobleaching. The CNR involves not only the mean foreground brightness and foreground noise, but also the mean background brightness b. Not all particles in the same system have the same brightness, and the most challenging detections are particles that are extremely close to each other or touching. The separability of two such particles depends on the noise, but more so on the contrast between foreground and background.
We define the CNR as:
$$\mathrm{CNR} = \frac{\bar{x} - b}{s} \qquad (3)$$
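A brief sketch of how these two quantities could be measured from a labelled volume (assuming a boolean foreground mask; the experimental analysis described later uses medians for robustness to outliers):

```python
import numpy as np

def snr_cnr(image, particle_mask):
    """SNR (eqn (2)) and CNR (eqn (3)) from an image with a boolean mask
    marking foreground (particle) voxels; the remainder is background."""
    fg = image[particle_mask].astype(float)
    bg = image[~particle_mask].astype(float)
    x_bar, s = fg.mean(), fg.std(ddof=1)   # foreground mean and noise
    b = bg.mean()                          # background mean brightness
    return x_bar / s, (x_bar - b) / s
```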
With the CNR in hand, we proceed to generate simulated images from the coordinates obtained following the method above. We begin by convolving the “fluorescent cores” of the simulation with the point spread function to mimic the optics of the microscope. The PSF depends on many factors, including the excitation and emission wavelength, refractive-index of the particles and solution, numerical-aperture, magnification of the lens, as well as pinhole radius and shape. In practice, however, these factors have minimal effect on the size of the PSF. The ability to resolve particle coordinates ultimately rests on the size of the particle, but the PSF itself does not depend on particle size. The final blur is a function of the size of the PSF when compared to the size of the particle. The PSF of each lens system is of a constant size, but to investigate the effect of apparent particle radius in pixels, the simulated PSFs are resized by calculating the target pixels per nm.
A crude method to approximate a PSF is using a Gaussian blur [Fig. 2(B)], where the filter variance is larger in the z dimension.30 An improvement is to use a least-squares Gaussian approximation method [Fig. 2(C)].60 The same settings are used for all PSFs since parameters such as excitation wavelength contribute minimally to the PSF shape. An excitation wavelength of 488 nm was used. However, we are unaware of any methods for a least-squares approximation of a STED PSF. Instead, the PSFs were sourced from Huygens Professional version 22.10 (Scientific Volume Imaging, The Netherlands, https://svi.nl). Crucially for this work, the Huygens software provides STED PSFs depending on the power of the xy (doughnut) and z depletion lasers.
Once the SNR and CNR have been characterised, the positions generated, and the PSF prepared, the simulated images can be created. This section describes how the positions are used to draw a simulated image of size 64 pixels cubed, as seen in Fig. 2(G–L); a code sketch of the procedure follows the list below. The simulation is simple but allows for easily parameterised datapoints. To generate this kind of simulated image, we proceed as follows:
• We begin with an image twice as large as the target, later coarsening it by a factor of two to control aliasing. Padding is also added so that the Gaussian blur and point spread function can be convolved with the image without losing signal at the edges.
• Perlin noise61 is used with 4 octaves to generate the background noise, the brightness of which is derived from the CNR measurements. This corresponds to noise intrinsic to the background data, rather than in the imaging process‡.
• The positions and diameters are used to draw the particles in the image.
• The image is zoomed in or resized for aliasing (zooming is more computationally efficient than adding artificial aliasing).
• The PSF is convolved with the simulated image.
• The image padding is cropped.
• Finally, Gaussian noise is added to simulate the foreground (shot) noise. This simulates noise introduced in the image acquisition.
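The sketch below follows these steps in simplified form (an anisotropic Gaussian stands in for the measured PSF, a smoothed random field stands in for Perlin noise, and the padding/cropping steps are omitted; parameter values are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def draw_simulation(positions, radius, size=64, snr=5.0,
                    rng=np.random.default_rng(0)):
    """Draw a simulated confocal volume at double resolution, then downsample.
    In the real pipeline the background level and noise are derived from the
    measured CNR and SNR; here they are fixed illustrative values."""
    big = size * 2
    img = np.zeros((big, big, big), dtype=np.float32)

    # dim "Perlin-like" background: a smoothed random field
    bg = gaussian_filter(rng.random((big, big, big)), sigma=8)
    img += 0.3 * (bg - bg.min()) / (np.ptp(bg) + 1e-8)

    # draw solid spheres (the fluorescent cores) at the scaled coordinates
    zz, yy, xx = np.indices((big, big, big))
    for p in positions * 2:
        mask = (zz - p[0])**2 + (yy - p[1])**2 + (xx - p[2])**2 <= (radius * 2)**2
        img[mask] = 1.0

    # convolve with a PSF stand-in, broader along z as in confocal imaging
    img = gaussian_filter(img, sigma=(4, 2, 2))

    # resize back to the target size, then add foreground (shot) noise
    img = zoom(img, 0.5, order=1)
    img += rng.normal(0, 1.0 / snr, img.shape).astype(np.float32)
    return np.clip(img, 0, 1)
```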
The parameters of the training dataset are sampled from the distributions described in the ESI† (see Table S1).
To isolate the effect of each simulation parameter, a randomly generated parameter set akin to the training dataset would be insufficient, since there would be too much variation to attribute performance changes to any single parameter. A test dataset was therefore created that varies each parameter one at a time. The remaining parameters were fixed at a level that is challenging for the U-net but not sufficient to limit its performance; these values were chosen to correspond to the regions with the most dramatic fall in precision/recall of TP against each parameter (see Table S2, ESI†).
To visualise examples, we simulate a random image and show multiple methods of drawing training labels in Fig. 2(M–R). The label can either be a smooth Gaussian heatmap or a binary mask. The radius of the label can equal the radius of the particle [Fig. 2(N)]; a varying radius suits object detection problems for polydisperse suspensions. Alternatively, the label can have a smaller fixed radius for every particle, akin to a bullseye [Fig. 2(O) and (P)]. Fig. 2(Q) and (R) show the usual binary semantic segmentation labels, representing foreground and background.
However, semantic labels overlap for closely packed targets. We therefore generate a Gaussian around each particle coordinate. This heatmap regression approach (Gaussian labels with an L1 loss function) aids dense detection, and regression losses can be more robust to data imbalance, such as very high or very low volume fractions, than binary pixel-wise classification.
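As an illustration of this labelling scheme (the fixed label width and the function name are assumptions), a Gaussian heatmap label can be drawn from the particle coordinates as follows:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def heatmap_label(positions, shape=(64, 64, 64), sigma_px=2.0):
    """Gaussian heatmap label: a delta at each particle centre blurred by a
    small fixed-width Gaussian, rescaled to [0, 1] for L1 regression."""
    label = np.zeros(shape, dtype=np.float32)
    idx = np.round(positions).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < np.array(shape)), axis=1)]
    label[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    label = gaussian_filter(label, sigma=sigma_px)
    return label / (label.max() + 1e-8)
```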
As noted above, to validate the model on simulated data where the ground truth is known, we use precision and recall. Precision describes the proportion of detections that are true, while recall measures the proportion of all true particles detected.
A common metric for object detection in machine learning is average precision (AP). However, here the important prediction is the particle position rather than a bounding box, since the particle size is already known even if the colloids are polydisperse. AP uses intersection over union (IOU) – the intersection of the prediction with the ground truth divided by the union of both. Instead of IOU, we simply use the distance between the prediction and the ground truth normalised by the particle diameter in pixels. AP is usually referred to as APτ, where τ represents the percentage of the diameter at which a prediction is regarded as a match or mismatch. We therefore compute AP as follows:
Distance matrix

$$D_{ij} = \lvert p_i - \hat{p}_j \rvert \qquad (4)$$

Mismatch matrix

$$M_{ij} = \begin{cases} 0, & D_{ij} \leq \tau\sigma \\ 1, & D_{ij} > \tau\sigma \end{cases} \qquad (5)$$

Precision

$$\mathrm{Precision} = \frac{N_\mathrm{TP}}{\hat{N}} \qquad (6)$$

Recall

$$\mathrm{Recall} = \frac{N_\mathrm{TP}}{N} \qquad (7)$$

Here, $p_i$ is a true particle position out of a total of $N$ particles, $\hat{p}_j$ is a predicted particle position out of $\hat{N}$ predictions, and $N_\mathrm{TP}$ is the number of matched (true positive) detections.
Precision and recall can be easily measured using the pairwise distances between the predictions and ground truths. AP is defined as the area under the precision–recall curve. The standard threshold is 50% of the particle diameter (AP50). Where simply AP is referenced, this is the average precision combined over eleven consecutive thresholds: 0%, 10%, 20%, …, 100%.
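A sketch of this evaluation is given below (one possible one-to-one matching scheme; the greedy nearest-first assignment is an assumption for illustration rather than the exact procedure used):

```python
import numpy as np
from scipy.spatial.distance import cdist

def precision_recall(true_pos, pred_pos, diameter, tau=0.5):
    """Precision and recall in the spirit of eqn (4)-(7): a prediction matches
    a true particle if their separation is below tau * diameter, with each
    particle matched at most once (greedy, nearest pairs first)."""
    D = cdist(true_pos, pred_pos)            # distance matrix D_ij
    matched_true, matched_pred = set(), set()
    for i, j in zip(*np.unravel_index(np.argsort(D, axis=None), D.shape)):
        if D[i, j] > tau * diameter:
            break                            # remaining pairs are all too far
        if i not in matched_true and j not in matched_pred:
            matched_true.add(i)
            matched_pred.add(j)
    tp = len(matched_pred)
    precision = tp / max(len(pred_pos), 1)
    recall = tp / max(len(true_pos), 1)
    return precision, recall
```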
If this is a ground truth deficient problem, how can accuracy be validated for experimental data? For this we leverage the simplicity of colloidal systems and the long history of structural measures applied to them.8,63 The radial distribution function (RDF), g(r), is uniquely determined for an isotropic fluid with spherically symmetric pairwise interactions and can be predicted from theory with high accuracy in many cases.63
We showcase the effect of poor detection on precision, recall, and the g(r) in Fig. 3. In experiments, the g(r) provides great insight into the nature of the prediction and can make biases in detection clear. For instance, if nearby particles are labelled too close, this will show as a peak before r/σ = 1 meaning a decrease in precision (more false positives, see Fig. 3(C) and (D)). If the g(r) does not level off at unity for large r, this can hint at long length-scale inconsistencies in the detection such as clustering or boundary effects (see Fig. 3(B) and (D)). This can then be used to validate the detections of both simulated and experimental data.
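For reference, a minimal g(r) estimator for a cubic periodic box is sketched below (experimental fields of view are not periodic, so in practice an edge correction or a cropped analysis region would be needed; names and bin widths are illustrative):

```python
import numpy as np

def radial_distribution(positions, box, sigma, dr=0.05, r_max=4.0):
    """g(r) for particles in a cubic periodic box, with r in units of the
    particle diameter sigma. Simple histogram estimator over unique pairs."""
    n = len(positions)
    rho = n / box**3                              # number density
    d = positions[:, None, :] - positions[None, :, :]
    d -= box * np.round(d / box)                  # minimum image convention
    r = np.sqrt((d**2).sum(-1))[np.triu_indices(n, k=1)] / sigma
    hist, edges = np.histogram(r, bins=np.arange(dr, r_max, dr))
    r_mid = 0.5 * (edges[1:] + edges[:-1])
    shell = 4 * np.pi * r_mid**2 * dr * sigma**3  # shell volume in real units
    ideal = rho * shell * n / 2                   # expected ideal-gas pair count
    return r_mid, hist / ideal
```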
In addition to the radial distribution function g(r), finer details can be probed with higher-order structural measures, which probe the correlations between multiple particles. A suitable method here is the topological cluster classification (TCC).64 This identifies groups of particles whose bond topology (identified by a Voronoi decomposition) is identical to that of isolated clusters. When processing the experimental results, where the ground truth for particle locations is not available, the TCC acts as a more sensitive measure of the PRS results than the RDF.
Here, we consider a range of experimental samples of different densities, media, sizes, and fluorescent dyes, to provide evidence of overfitting (or lack thereof) and to support the discussion of generalisability in particle tracking applications. These suspensions are refractive index and density matched to neglect opaqueness and buoyancy effects. Table S3 (ESI†) describes the composition, size, volume fraction, and polydispersity of the experimental data used to validate the model. A Leica SP8 3D STED II confocal microscope was used to image the specimens.
Labelling the particles and background separately allows the comparison of their probability distributions (see the bottom row of Fig. 4). Since the values are being measured experimentally, false negatives are unavoidable, skewing the distributions closer together. The PSF is anisotropic so drawing perfect spheres of radius r would not capture the correct segmentation mask. Due to these challenges, we use the median of the brightness distribution which is a more robust estimator to outliers than the mean. One can find the CNR to be between 0.2 and 10. The laser illumination power is usually tuned so that particles are relatively dim, to improve the signal to noise ratio and avoid photobleaching. The gain is then adjusted to the maximum range of digitisation.
Using the heatmap approach, the model is first tested as a “de-noiser” for TP, wherein the model labels are fed to TP for post-processing, as shown in Fig. 5. Over a parameter sweep TP maintains almost perfect precision. However, TP suffers from low recall, usually detecting only 50% of the particles, due to its assumptions and refinement steps. In particular, it fails in low-brightness and dense systems where particle separability suffers.
The U-net improves the recall of TP across all simulation parameters while maintaining almost perfect precision. The volume fraction (ϕ) is the main objective, and the U-net improves recall at ϕ = 0.55 from 20% to 30% in simulations (see Section 1.2 for an explanation of recall and precision). The biggest improvements are in detecting colloids with low contrast and low brightness; this is because TP assumes the particles are in the brightest 30% of the image. While this parameter can be tuned manually and TP can usually perform well at low brightness, the U-net provides a simpler interface during inference and does not require manual tuning. Interestingly, both the U-net and TP suffer on particles smaller than 10 pixels in radius.
The limited recall of TP motivates the use of a different postprocessing method. We used a Laplacian of Gaussian (LOG) blob detection algorithm which shows a tradeoff of slightly lower precision for a pronounced higher recall across simulated image parameters (see the bottom row of Fig. 5). Most notably LOG rescues the low recall on particles smaller than 10 pixels. LOG is also less consistent than TP, with a larger spread of recall. This hints at a lower quality of segmentation labels. The LOG algorithm does not contain any refinement steps, meaning the detections are more reliant on the quality of the model's heatmap labels.
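As an indicative sketch, this LOG postprocessing of the model's heatmap could be run with scikit-image's blob detector (the sigma range and threshold here are illustrative assumptions):

```python
import numpy as np
from skimage.feature import blob_log

def detect_from_heatmap(heatmap, diameter_px):
    """Laplacian-of-Gaussian detection on the model's heatmap output.
    The blob sigma is tied loosely to the particle radius (radius ~ sqrt(3)*sigma
    in 3D); a range of sigmas lets LOG cope with polydisperse particle sizes."""
    r = diameter_px / 2
    blobs = blob_log(heatmap,
                     min_sigma=0.5 * r / np.sqrt(3),
                     max_sigma=1.5 * r / np.sqrt(3),
                     num_sigma=5, threshold=0.1)
    return blobs[:, :3]        # (z, y, x) coordinates; last column is sigma
```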
The overall results, increased number of detected particles and the physically meaningful radial distribution functions, indicate that the Image → U-net → TP pipeline improved the PRS result across different experimental conditions.
In experimental data, postprocessing with the Laplacian of Gaussian gives improved precision, but struggles on images of polydisperse systems [Fig. A2(C) and (D), ESI†] and dilute suspensions (Fig. A2E, ESI†). The radial distribution functions obtained from experimental data have higher first peaks in Fig. A2(A–C) (ESI†), which suggests the model detects more of the closely packed particles; furthermore, the g(r) stabilises around 1, indicating there are no long-range inhomogeneities in particle detection.
For the Colloidoscope approach we use only one parameter, the approximate particle diameter of 5 or 7 pixels, for extracting the coordinates using Trackpy, and no image processing before tracking. With the TP approach, on the other hand, the intensity loss in the z-direction is first compensated and then a Gaussian filter is applied to smooth the image. This increases the number of tracked particles and reduces the localisation uncertainty. In addition, an anisotropic diameter is used: either 5 pixels in the xy-direction and 7 pixels in the z-direction, or 7 pixels in the xy-direction and 9 pixels in the z-direction. This ensures that the broadened intensity distribution of a single particle in the z-direction is not misinterpreted as two particles sitting on top of each other. Such false-positive identifications would lead to a first peak in the g(r) curves at values r ≪ σ.
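For orientation, a sketch of this TP pipeline is given below, assuming a (z, y, x) image stack; the per-plane mean normalisation used to compensate the z intensity loss, the Gaussian sigma, and the diameter values are illustrative stand-ins rather than the exact settings used:

```python
import trackpy as tp
from scipy.ndimage import gaussian_filter

def trackpy_locate_3d(stack):
    """Compensate the intensity loss along z, smooth with a small Gaussian,
    then run trackpy with an anisotropic diameter (z, y, x), odd integers."""
    plane_means = stack.mean(axis=(1, 2), keepdims=True)
    corrected = stack / (plane_means + 1e-8) * plane_means.mean()
    smoothed = gaussian_filter(corrected, sigma=0.5)
    features = tp.locate(smoothed, diameter=(7, 5, 5))
    return features[["z", "y", "x"]].to_numpy()
```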
First, we investigate the influence of photobleaching, i.e. the loss of signal-to-noise ratio and decrease in the average brightness of the experimental image [see Fig. 7(a) and (b)] over the course of a 32-frame measurement (approx. 3 min). Fig. 7(d) shows the volume fraction calculated from the number of particles over the measurement period and the averaged intensity, which falls from approximately 98 to 60 for the 8-bit image. With the smaller diameter of 5 pixels, Colloidoscope overestimates the volume fraction due to some misidentifications. Colloidoscope with 7 pixels, on the other hand, tracks approximately 100% of the particles and therefore has a higher recall than TP with either of the two chosen diameters. As expected, the recall for TP is significantly better with the smaller diameter (ϕ = 0.51), with approximately 98% of the particles tracked. Compared to Colloidoscope with 7 pixels as the particle diameter, this translates to a loss of about 150 particles in each frame.
To quantify the localisation uncertainty of the tracking routines, the shape of the radial distribution functions is analysed (Fig. 7c). The height of the first peak is significantly influenced by the localisation uncertainty of the particle positions. Colloidoscope and TP with the larger diameter of 7 pixels produce almost identical g(r)'s and agree very well with the computer simulations. However, the Trackpy approach with the smaller diameter of 5 pixels has significantly lower precision, which can be seen from the smaller first peak.
So far, we have considered pair correlations of particles through the radial distribution function g(r). However, higher-order structural correlations provide a more detailed probe of the quality of particle tracking when benchmarked against simulation data. Correctly classifying these structures requires high accuracy detection with high recall. For example, in the case of the fcc cluster, a failure to identify one particle out of 13 will result in the cluster not being detected. The same applies to a particle that is localised with greater uncertainty.
We calculated the averaged population of different cluster types identified from coordinates produced by Colloidoscope and TP and compared them to hard sphere simulation data. Here we consider five clusters, each consisting of m particles, relevant to the system in question. These are the triangular bipyramid (m = 5), the octahedron (m = 6), the 7-membered pentagonal bipyramid (m = 7), the defective icosahedron (m = 10) and the face-centred cubic crystal (m = 13). We see in Fig. 7(e) that Colloidoscope with a particle diameter of 7 pixels matches the cluster population of the simulation, maintaining a closer match than TP. Both TP approaches deliver comparable results, which can be explained by the similar influence of recall and precision on the cluster population. Unidentified particles and particles tracked with high uncertainty influence the population of higher-order structures in a similar way.
Comparing the results for recall and precision derived from the quantities shown in Fig. 7(c) and (d), one can see that for Trackpy there is a trade-off between the number of particles and the localisation uncertainty with which they are tracked. This opens an optimisation space over every parameter that can be changed in the TP routine, ranging from the experimental settings during image acquisition, to image preparation, to the selected particle diameters in TP.
To summarise, it can be said that TP can probably reproduce the particle positions produced by Colloidoscope. However, this requires careful fine-tuning of all parameters and probably post-processing in the form of manual filtering. In the test case we have considered, hard spheres, it is straightforward to compare against simulation data. In general, this is not always possible. This brings us to the question of how to actually know which parameters give “more correct” coordinates in the case of TP. An obvious quantity to tune against is the volume fraction. That is to say, to adjust the parameters until the desired volume fraction is obtained. One drawback of this approach is that the volume fraction itself is typically not known to high accuracy, with relative errors of 6% being routine67,68 (although very recently, improvements have been made69). As discussed above, the radial distribution function g(r) has limitations in assessing tracking quality, although higher-order correlations can shed more insight.70 Overall then, such parameter tuning can lead to subjective results.
The findings from simulated data generally translated to experimental data, although experimental images usually contain a challenging mix of all image parameters (e.g. high density, high noise, and low contrast). This shows the challenging nature of designing simulation datasets. The simulated training dataset included volume fractions only up to ϕ = 0.55; nevertheless, the model still generalised well to experimental data up to ϕ = 0.58 [see Fig. 6(C)].
Image quality is an elusive measure and combines all the parameters discussed here. The diffraction limit has been broken for single fluorophores, which may be detected to an accuracy of 20 nm in STORM imaging.50 However, tracking in dense suspensions of colloids remains more challenging. Deconvolution can introduce artefacts which are challenging to simulate, and the model often fails to improve TP results on deconvolved experimental data [see Fig. 6(A) vs. Fig. 6(B)], where, for example, Colloidoscope detects more particles (3767) from a raw image than Trackpy does on a deconvolved image (2771).
Trackpy struggles with polydispersity since it needs the apparent diameter as an input parameter, and its underlying algorithm assumes that all particles to be detected have the same size. LOG can detect particles within a range of sizes, which explains its superior performance on large and small simulated particles, as well as on polydisperse volumes (see the right-most plots in Fig. 5). The postprocessing methods TP and LOG give further insight into the performance of the model heatmap; in this case the U-net improved the performance of both. For dense inference, the Attention U-net further matched the high precision of TP, with the benefits of LOG recall. While attention acts to increase model performance in situations with data imbalance (dilute suspensions), it also boosts performance on polydisperse data (Fig. A3, ESI†).
This model is deployed and available to use pretrained at https://github.com/wahabk/colloidoscope. We chose a standard U-net over an Attention U-net since we find it gives better results on data taken from suspensions at high volume fraction. The only parameter that is required is the diameter of the particles. Users are given the option of using TP or LOG, with the choice depending on whether high precision or high recall is required; LOG often aids in detecting polydisperse colloids. This shows how deep learning can also be more accessible in deployment than previous methods. While DeepTrack36 can detect many shapes of colloids (such as crescents), it requires the user to simulate their own data and train their own model, significantly hampering user-friendliness and adoption. This model is deployed pretrained and confidently generalises to all colloidal spheres tested. We set out to address the challenge of tracking colloids in 3D imaging at high volume fraction. However, an interesting future direction would be to train a version of Colloidoscope for 2D data. A further possibility for future work with a 2D version of Colloidoscope would be to test errors using mean-squared displacement data, in that the error leads to a baseline in the MSD; 2D would be ideal here, due to the enhanced speed of imaging.
Further network architectures are being developed to counter the difficulty in instance segmentation using embedding. The issue of determining the number of particles before prediction can be phrased in a second way. It is challenging to know which segmentations (foreground) belong to which particle. This is commonly named the data association problem, and is countered using output embedding. DSNT (Differentiable Spatial to Numerical Transform62) embeds the output by predetermining which particle class goes in which output channel during training. Further approaches counter this by semantically labelling the input, then discriminating different instances using clustering approaches.72,73
While an Attention U-net was used, this model does not benefit from full self-attention as in traditional transformers. Transformers were created to address limitations in maintaining context in recurrent neural networks. Furthermore, transformer architectures benefit from positional encoding of patches, instead of analysing patches without the context of surrounding ones. Embedding the input and output can act to alleviate many of the limitations encountered here.74 However, transformers still require large datasets, ranging from hundreds of thousands to hundreds of millions of images, which is beyond our computing budget.40
• Colloidoscope outperformed TP by maintaining high precision and improving recall on simulated and experimental data.
• Colloidoscope demonstrated the ability to improve predictions at high colloid volume fraction and low contrast, thus expanding the scope of imaging, and effectively mitigated the effect of photobleaching when images are collected over a long period of time.
• Furthermore, Colloidoscope offers a more user-friendly interface with a single parameter, compared to TP. Colloidoscope generalised easily to multiple suspensions (both silica and PMMA colloids), and using only raw images Colloidoscope outperforms Trackpy applied to deconvolved images [see Fig. 6(A) vs. Fig. 6(B)].
• Overall, the findings suggest that Colloidoscope is a promising alternative to TP for imaging colloidal suspensions with enhanced performance and user-friendliness.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4sm01307g
‡ Perlin noise corresponds to four greyscale values which are used for the image (Fig. 2(G)).61