Namhee Kang^a, Yeonseo Joo^a, Hyosung An^b and Hyerim Hwang*^a
^a Department of Chemical Engineering and Materials Science, Ewha Womans University, Seoul 03760, South Korea. E-mail: hyerimhwang@ewha.ac.kr
^b Department of Petrochemical Materials Engineering, Chonnam National University, Yeosu, Jeollanam-do 59631, South Korea
First published on 19th September 2025
Colloidal systems offer a unique experimental window for investigating condensed matter phenomena, enabling simultaneous access to microscopic particle dynamics and emergent macroscopic responses. Their particle-scale size, thermal motion, and tuneable interactions allow for real-time, real-space, and single-particle-resolved imaging. These features make it possible to directly connect local structural changes, dynamic rearrangements, and mechanical deformation with system-level behaviours. Such capabilities remain largely inaccessible in atomic or molecular systems. This review presents colloidal modelling as a predictive framework that addresses persistent challenges in materials research, including phase classification, dynamic arrest, and defect-mediated mechanics. We describe methodologies for extracting structural, dynamical, and mechanical descriptors from experimental imaging data, show how these features capture governing variables of material behaviour, and illustrate their application in machine learning approaches for phase identification, dynamics prediction, and inverse design. Rather than treating colloidal data as limited to model systems, we emphasize its value as a training ground for developing interpretable and physics-informed models. By linking microscopic mechanisms with macroscopic observables in a single experimental system, colloids generate structured and generalizable datasets. Their integration with data-driven methods offers a promising pathway toward predictive and transferable materials design strategies.
Beyond their imaging accessibility, colloidal systems offer unique tunability in terms of particle size, shape, interaction potential, and external responsiveness (Fig. 1(a)).16–21 This versatility allows for systematic exploration of parameter space under controlled experimental conditions. For instance, poly(N-isopropylacrylamide) (NIPA) microgel spheres exhibit temperature-responsive swelling, allowing fine control over volume fraction and confinement geometry to induce phase transitions.22,23 Similarly, sterically stabilized hard-sphere colloids can closely approximate idealized interparticle interactions, facilitating quantitative studies of crystallization, melting, and glass formation.2 These model systems have proven invaluable for investigating the structure and dynamics of condensed matter in experimentally accessible yet theoretically tractable forms.
Fig. 1 Colloidal systems as experimental platforms linking microscopic structure to macroscopic behaviour. (a) Tuneable interparticle interactions in colloids, ranging from hard-sphere to short-range and long-range attractions, result in diverse phase behaviour under controlled conditions.24 (b) Real-space tracking provides simultaneous access to structure, dynamics, and mechanics, enabling the extraction of descriptors that govern macroscopic properties. These data flow into a complete machine learning pipeline, including data acquisition, feature extraction, physics-based modelling, and validation.
This experimental transparency has positioned colloids not only as analogues of atomic systems, but as magnifying platforms for uncovering the microscopic origins of macroscopic behaviour.24 As illustrated in Fig. 1(b), real-space tracking provides simultaneous access to structure, dynamics, and mechanics, enabling the extraction of descriptors that directly govern macroscopic properties. Colloids can resolve structural and dynamical features such as local bond order, strain fields, dislocation networks, and nucleation precursors at single-particle precision; such features are rarely accessible in either atomic systems or coarse-grained simulations. This level of detail has transformed colloids from illustrative models to predictive testbeds for uncovering governing variables that dictate material behaviour across soft and out-of-equilibrium systems.
In parallel, the growth of data-driven approaches in materials research has elevated the importance of high-quality, structured datasets, particularly for machine learning (ML) applications.25–30 While ML has demonstrated success in extracting hidden correlations and guiding materials discovery, its predictive power depends critically on the availability of structured, physically meaningful data.31–36 For soft matter systems, conventional datasets derived from simulations often suffer from coarse-graining, idealized assumptions, or computational expense.37–39 Colloidal systems offer a complementary solution to these challenges by serving as a high-fidelity platform for generating structured datasets with experimental grounding. Real-space colloidal data—ranging from particle coordinates and local order parameters to mechanical responses and temporal trajectories—can be systematically labelled, filtered, and augmented into descriptors suitable for ML pipelines. These data serve as inputs for supervised, unsupervised, and reinforcement learning models, linking microstructure to dynamics, mechanics, and emergent behaviours. Colloidal systems thus operate not only as models of matter but also as data refineries for ML-ready, physics-informed representations. This integration is schematized in Fig. 1(b), where descriptors obtained from real-space tracking flow into a physics-informed ML pipeline: from data acquisition and processing, to feature engineering and physics embedding, to model training and validation. This pipeline underscores how colloids serve as natural data engines, transforming particle-level observables into predictive frameworks.
Recent advances have illustrated how colloidal data can enrich ML capabilities in several domains, enabling the identification of structural motifs, the prediction of dynamic responses, and the optimization of self-assembly protocols.16,40 Both supervised and unsupervised learning approaches have been applied to classify local environments and detect phase transitions.41–46 Reinforcement learning has also begun to support adaptive control of self-assembly processes.47–49 Notably, the most informative outcomes often arise when ML is integrated with experimentally curated datasets that reflect both structural fidelity and dynamic complexity.50–52
This review examines colloidal modelling as a predictive framework for condensed matter and soft materials, with a particular emphasis on its integration with ML. Rather than positioning colloids in contrast to simulations, we emphasize their role as complementary tools, uniquely suited to produce interpretable, structured, and generalizable data for machine learning applications. We survey representative case studies in which colloidal systems have uncovered governing variables that are difficult to obtain by simulation alone. These examples are organized into three thematic areas: (1) structure, (2) dynamics, and (3) mechanics. Through these cases, we argue that colloidal systems are poised to become central contributors to physics-informed ML and the broader effort to build predictive, interpretable models of complex materials.
Supervised learning has been the most widely adopted paradigm. In this approach, models are trained on labelled datasets to classify states (e.g., crystalline vs. amorphous phases) or to predict quantitative properties (e.g., diffusion coefficients).53–57 Techniques range from linear regression and decision trees to more complex deep neural networks. Unsupervised learning, in contrast, does not rely on labels but instead identifies hidden structures through clustering or dimensionality reduction. Reinforcement learning, though less common in materials research, is increasingly applied to sequential decision-making tasks, such as designing synthesis pathways or controlling adaptive experiments.58
These frameworks have already produced tangible advances. For example, supervised models have predicted alloy stability from compositional descriptors, learned formation energies of crystals from large density functional theory (DFT) datasets, and classified phases directly from diffraction patterns.59–62 Unsupervised approaches have revealed hidden classes of glassy dynamics and clustered vast libraries of polymer structures without prior labels.63,64 Reinforcement learning has been demonstrated for automating experimental design and guiding non-equilibrium process control.58,65,66 Together, these cases illustrate how ML complements theory and simulation by accelerating discovery and revealing correlations that might otherwise remain hidden.
More comprehensive reviews of ML applications across materials science, chemistry, and physics covering topics such as high-throughput screening, autonomous laboratories, and generative models for materials design are available elsewhere.67–69 In this review, we instead focus on how colloidal systems contribute uniquely to the development of physics-informed ML frameworks, offering experimentally accessible, high-fidelity datasets that link algorithmic methods with transparent and interpretable physical descriptors.
Fig. 2 Quantitative extraction of structural and mechanical descriptors from confocal microscopy and traction rheoscopy. (a) Schematic of the experimental setup of confocal microscopy.70 (b) Representative single-particle coordinate data extracted from confocal image stacks. (c) 3-dimensional reconstruction of colloidal spheres from z-stacks, allowing identification of particle centres and local environments.70 (d) Illustration of particle tracking and region-based exclusion for accurate dynamic analysis.82 (e) Application of traction rheoscopy to quantify spatially resolved strain. Left: Cross-sectional confocal image showing colloidal particles and fluorescent tracer particles embedded in the gel. Right: Height-resolved displacement data reveal strain localization near the colloidal–gel interface under shear deformation.87
Particle localization from raw image stacks is typically accomplished using robust image analysis algorithms. One widely adopted method is the approach developed by Gao and Kilfoil,82 which combines bandpass filtering and centroid fitting to achieve sub-pixel accurate localization, even in dense and noisy samples (Fig. 2(b)). In more complex experimental contexts, this base routine is augmented with image deconvolution, nonlinear least-squares fitting, and pixel-wise neural network classification. These additional steps enhance the accuracy of particle identification, particularly for distinguishing monodisperse particles from synthesis by-products such as dimers or aggregates (Fig. 2(c)).
Once particles are localized, their trajectories are constructed across frames using distance-based association algorithms.83,84 In dilute systems, where particle displacements are moderate and isolated, classical tracking methods such as the Crocker–Grier algorithm, widely implemented in Trackpy, perform reliably by minimizing frame-to-frame displacement. However, challenges arise in dense suspensions, especially those near glass or jamming transitions, where particle displacements vary broadly and neighbouring particles are in close proximity. In such systems, a single cutoff distance for trajectory linking is inadequate: a small cutoff fails to link particles undergoing large displacements, while a large cutoff introduces ambiguity among closely spaced neighbours. As shown in Fig. 2(d), a two-pass tracking strategy addresses this challenge by applying a small cutoff in the first pass to accurately capture low-displacement particles. After removing these from the dataset, the second pass uses a larger cutoff to recover particles undergoing larger displacements. This hierarchical approach resolves a wide range of mobilities while minimizing misidentification. These capabilities are essential for extracting meaningful dynamical quantities such as mean square displacement (MSD), cage rearrangements, and collective flow patterns, particularly in systems exhibiting strong dynamic heterogeneity.
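The two-pass idea can be sketched for a single pair of frames as follows. This is a minimal illustration, not the implementation used in the cited work: it uses a greedy nearest-neighbour match rather than the displacement-minimizing assignment of the Crocker–Grier algorithm, and the function names `link_once` and `two_pass_link` as well as the cutoff values are assumptions introduced here.

```python
import numpy as np
from scipy.spatial import cKDTree


def link_once(prev, curr, cutoff):
    """Greedy nearest-neighbour linking; returns {index in prev: index in curr}."""
    if len(prev) == 0 or len(curr) == 0:
        return {}
    tree = cKDTree(curr)
    # Unmatched queries come back with an infinite distance.
    dist, idx = tree.query(prev, k=1, distance_upper_bound=cutoff)
    links, taken = {}, set()
    # Visit candidates from shortest to longest jump so close matches win.
    for i in np.argsort(dist):
        j = int(idx[i])
        if np.isfinite(dist[i]) and j not in taken:
            links[int(i)] = j
            taken.add(j)
    return links


def two_pass_link(prev, curr, small_cutoff, large_cutoff):
    """Link slow particles first, then the remaining fast ones."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    # Pass 1: small cutoff captures low-displacement particles unambiguously.
    links = link_once(prev, curr, small_cutoff)
    # Remove everything already linked from both frames.
    used = set(links.values())
    rest_prev = np.array([i for i in range(len(prev)) if i not in links], dtype=int)
    rest_curr = np.array([j for j in range(len(curr)) if j not in used], dtype=int)
    # Pass 2: larger cutoff on the thinned-out remainder.
    second = link_once(prev[rest_prev], curr[rest_curr], large_cutoff)
    for a, b in second.items():
        links[int(rest_prev[a])] = int(rest_curr[b])
    return links
```

Because the second pass only sees particles left over from the first, the larger cutoff acts on a sparser set of candidates, which is what reduces misidentification in dense samples.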
To support large-scale, reproducible analysis, high-throughput pipelines built using open-source tools such as DeconvolutionLab2, TrackMate, TrackPy, and Colloidoscope have recently become standard.83,85,86 For example, TrackPy enables flexible control over linking parameters and supports custom tracking strategies, while Colloidoscope offers deep-learning-enhanced detection for dense, noisy datasets. These tools enable automated particle localization and trajectory reconstruction across large 3D datasets, facilitating consistent feature extraction for downstream structural, dynamic, and mechanical analyses.
A recent advancement in experimental methodology, traction rheoscopy, has enabled direct mechanical characterization of colloidal materials at both the macroscopic and single-particle level (Fig. 2(e)). This technique combines confocal imaging with a compliant, calibrated elastic substrate to simultaneously measure shear stress and internal strain.87 The substrate's deformation is tracked via embedded fluorescent markers, yielding millipascal-level stress resolution, while particle tracking within the colloid provides displacement fields from which local strain tensors can be computed. Using best-fit affine transformations, strain is decomposed into shear, dilation, and rotation components, enabling detailed mapping of stress evolution under load. For example, 1 μm lateral displacement at the surface of a 143 μm-thick silicone gel (G ≈ 4.9 Pa) corresponds to a shear stress of ∼ 34 mPa. Importantly, this approach resolves stresses below 1 mPa and captures spatial stress heterogeneities throughout the colloidal domain, offering rare experimental access to the internal force landscape during deformation.
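The quoted stress value can be checked with a one-line estimate, assuming the gel behaves as a uniform linear-elastic slab in simple shear (an idealization made here for illustration), with lateral surface displacement Δx and gel thickness h:

```latex
\sigma = G\,\gamma = G\,\frac{\Delta x}{h}
       \approx 4.9~\mathrm{Pa} \times \frac{1~\mu\mathrm{m}}{143~\mu\mathrm{m}}
       \approx 3.4 \times 10^{-2}~\mathrm{Pa}
       \approx 34~\mathrm{mPa}
```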
A widely used family of structural metrics is the bond-orientational order parameters ql and their rotational invariants wl, originally developed by Steinhardt, Nelson, and Ronchetti to identify local symmetries in atomic systems and later adapted for colloidal suspensions.88 Among these, q6 quantifies six-fold symmetry and is commonly used to distinguish between crystalline and liquid-like environments; high q6 values typically correspond to ordered states such as face-centred cubic (FCC), hexagonal close-packed (HCP), or body-centred cubic (BCC), while lower values indicate disorder (Fig. 3(a)).89 The third-order invariant w6 provides finer differentiation between polymorphs such as FCC and HCP (Fig. 3(b)).90 These harmonic-based descriptors are widely employed to classify crystalline motifs during nucleation and growth.
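As a minimal sketch of how ql can be evaluated for one particle, the snippet below uses the spherical-harmonic addition theorem, which reduces ql² to an average of Legendre polynomials over all pairs of bonds: ql² = (1/N²) Σij Pl(cos γij), where γij is the angle between bonds i and j. The function name `steinhardt_q` is introduced here for illustration, and the neighbour list is assumed to be supplied by the caller (for example from a distance cutoff).

```python
import numpy as np
from numpy.polynomial import legendre


def steinhardt_q(bonds, l=6):
    """Steinhardt q_l of one particle from the vectors to its neighbours."""
    bonds = np.asarray(bonds, dtype=float)
    unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
    # Cosine of the angle between every pair of bonds (including i == j).
    cos_gamma = np.clip(unit @ unit.T, -1.0, 1.0)
    coeffs = np.zeros(l + 1)
    coeffs[l] = 1.0  # pick out the single Legendre polynomial P_l
    p_l = legendre.legval(cos_gamma, coeffs)
    # q_l^2 = (1/N^2) * sum_ij P_l(cos gamma_ij); guard against round-off.
    return float(np.sqrt(max(p_l.sum(), 0.0)) / len(unit))
```

For the twelve nearest neighbours of an ideal FCC site this gives q6 ≈ 0.575, and for the six neighbours of a simple cubic site q6 ≈ 0.354, in line with the standard tabulated values.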
Fig. 3 Structural descriptors extracted from colloidal experiments. (a) Distribution of Steinhardt bond-order parameters (q4, q6, w6) used to distinguish local crystal symmetries such as face-centred cubic (fcc), body-centred cubic (bcc), hexagonal close-packed (hcp), and icosahedral structures in particle-resolved colloidal data.89 (b) Visualization of crystalline domains identified by bond-order parameters, enabling the mapping of spatial heterogeneity in nucleating systems.90 (c) Time-resolved tracking of crystal growth dynamics using local order parameters and number of crystal neighbours, with particle configurations reconstructed in real space from confocal imaging.8 (d) Detection of discontinuous solid–solid transitions in a binary colloidal monolayer driven by magnetic fields.92
In more specific experimental contexts, empirical metrics tailored to the system's geometry provide practical alternatives. For example, in studies of BCC colloidal crystallization and melting, an in-line bond angle order parameter ψi, defined as the number of neighbour pairs with bond angles within 180°, is used to distinguish crystalline from liquid particles. A threshold value of ψi > 3.5 effectively identifies ordered regions, while additional classification by the number of crystalline neighbours Z enables resolution of bulk, interfacial, and liquid phases with single-particle precision, allowing dynamic tracking of melting fronts and interface evolution (Fig. 3(c)).8,91 In another example, Alert et al. employed the lattice angle α as an order parameter in a two-dimensional (2D) dipolar colloidal crystal, where a discontinuous change in α under varying magnetic field strength signalled a mixed-order phase transition.92
While traditional and empirically tuned order parameters are effective in classifying local structures into crystalline, hexatic, or amorphous states, more sophisticated descriptors offer deeper insight into structure–property relationships.8,88,91 In addition to these traditional metrics, we discuss a representative selection of higher-level governing descriptors developed to extract deeper thermodynamic or dynamic information from particle-resolved experimental data (Fig. 4). These descriptors have proven powerful in identifying phase transitions and structural heterogeneities.
Fig. 4 Structural descriptors capturing spatial heterogeneity and anisotropy in colloidal glasses. (a) Anisotropy parameter k derived from Voronoi cell shape analysis distinguishes between isotropic and anisotropic packing in colloidal systems. The onset of glassy dynamics coincides with a transition from anisotropic to isotropic local environments.94 (b) Depth-resolved maps of local density ρ and Debye–Waller factor (DW) reveal the formation of a surface mobile layer in binary colloidal glasses. Structural relaxation is suppressed toward the interior, where glassy arrest dominates.95 (c) Local entropy s2 identifies nucleated dense phases in supercooled colloidal vapors. Snapshots at varying undercooling show the emergence and growth of clusters, with 3D reconstructions highlighting local order.98
One such descriptor is the Voronoi anisotropy parameter k = |Rmin|/|Rmax|, shown in Fig. 4(a), which quantifies the elongation of a particle's local cage based on the geometry of its Voronoi cell.93,94 A value of k ≈ 1 indicates a highly isotropic cage, while lower k values reflect anisotropic, elongated cages. As shown in Fig. 4(a), the averaged anisotropy k increases with packing fraction ϕ, capturing the isotropization of cages as systems approach the glass transition. This behaviour was observed in simulations of monodisperse hard-sphere and soft Weeks–Chandler–Andersen (WCA) particles in 2D and 3D, where softer particles form more isotropic cages due to their greater configurational flexibility. These trends were experimentally validated using binary colloidal monolayers of soft poly(N-isopropylacrylamide) (NIPA) and hard poly(methyl methacrylate) (PMMA) particles, which reproduced the behaviour of 2D WCA and hard-sphere systems, respectively.94 Because k correlates with dynamical heterogeneity, it serves as a predictive structural marker for glassy behaviour; Fig. 4(b) overlays k with the Debye–Waller factor, a dynamic descriptor that reflects particle vibrational amplitude.
Two further descriptors are the local two-body excess entropy s2 and the local density ρ, which provide thermodynamic insights into the degree of structural order around each particle.95,96 The local density ρi = πσi²/(4Ai), the cross-sectional area of particle i (diameter σi) divided by the area Ai of its Voronoi cell, captures the degree of local packing, while s2,i, computed from local pair correlation functions, quantifies the loss of configurational entropy due to structural correlations.97 Lower s2 values correspond to well-packed, ordered environments, while higher values indicate more disordered, fluid-like regions. As illustrated in Fig. 4(b), the profiles ρ(y) and s2(y) show that surface pre-melting in a binary colloidal monolayer under slow cooling occurs gradually rather than abruptly. Instead of a sharp interface, a sequence of distinct layers emerges: a dense vapor, a surface liquid, a dynamically arrested surface glassy layer, and finally the bulk glass. The interface positions y0, y1, y2 are defined by the saturation of normalized ρ and s2, highlighting how these descriptors capture subtle transitions across the interface. Because they are highly sensitive to small changes in structural order, these descriptors are widely used in studies of vitrification and confinement effects.
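The local density ρi = πσi²/(4Ai) can be sketched for a 2D monolayer with a standard Voronoi tessellation. The snippet is a minimal illustration under stated assumptions: the function name `local_density` is introduced here, particles are treated as discs of diameter σi, and open cells at the sample edge are simply returned as NaN rather than corrected.

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi


def local_density(points, diameters):
    """Per-particle area fraction rho_i = pi * sigma_i^2 / (4 * A_i) in 2D."""
    points = np.asarray(points, dtype=float)
    diameters = np.broadcast_to(np.asarray(diameters, dtype=float), (len(points),))
    vor = Voronoi(points)
    rho = np.full(len(points), np.nan)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if len(region) < 3 or -1 in region:
            continue  # open cell at the sample edge: area is undefined
        # Voronoi cells are convex, so the hull of the vertices is the cell.
        # For 2D input, ConvexHull.volume is the enclosed area.
        area = ConvexHull(vor.vertices[region]).volume
        rho[i] = np.pi * diameters[i] ** 2 / (4.0 * area)
    return rho
```

For touching discs on an ideal triangular lattice, an interior particle returns π/(2√3) ≈ 0.907, the close-packing area fraction. Closed cells adjacent to the boundary can still be distorted and are usually excluded in practice.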
A simpler yet effective geometric descriptor is the local coordination number, defined as the number of neighbouring particles within a specified radial cut-off. As shown in Fig. 4(c), this metric was used to distinguish gas, liquid, and solid phases in a colloidal system undergoing phase transitions induced by critical Casimir forces.98 Confocal imaging captured a gradual evolution from a low-density gas phase to high-density liquid clusters and crystalline domains as the temperature approached the critical point. Particles with higher coordination numbers were identified as belonging to denser, more structured regions, while low-coordination particles reflected gas-like surroundings. This simple neighbour-counting method allows for real-time, particle-resolved phase classification based on local density fluctuations and remains highly compatible with optical microscopy-based colloidal experiments.
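Neighbour counting within a radial cut-off takes only a few lines. The sketch below is illustrative: the function name `coordination_numbers` is introduced here, and the cut-off value must be chosen by the user (for example near the first minimum of the pair correlation function); it is not taken from the cited study.

```python
import numpy as np
from scipy.spatial import cKDTree


def coordination_numbers(positions, cutoff):
    """Number of neighbours within `cutoff` of each particle."""
    positions = np.asarray(positions, dtype=float)
    tree = cKDTree(positions)
    neighbours = tree.query_ball_point(positions, r=cutoff)
    # Each query also returns the particle itself, so subtract one.
    return np.array([len(n) - 1 for n in neighbours])
```

Thresholding the returned counts then assigns each particle to a gas-like, liquid-like, or solid-like environment; the thresholds themselves are system-specific.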
Together, these descriptors, from bond-orientational harmonics to entropy metrics and coordination-based order parameters, translate raw positional data into a robust, interpretable structural language. They not only support real-time classification of condensed matter states but also serve as foundational inputs for machine learning models predicting particle mobility, yielding, and self-assembly pathways.
Mean square displacement (MSD) is a fundamental dynamic descriptor that measures the average squared displacement of particles over time.101 It is defined as:
$$\mathrm{MSD}(t) = \langle \Delta r^2(t) \rangle = \left\langle \left[\mathbf{r}_i(t_0 + t) - \mathbf{r}_i(t_0)\right]^2 \right\rangle$$
Fig. 5(a) illustrates how MSD is experimentally measured using real-time tracking of colloidal particles.102,103 Sequential imaging captures particle positions at high frame rates, while trajectory maps show Brownian motion over time. From this image sequence, MSD is computed as the squared displacement between positions separated by a time lag t, averaged over many particles. The resulting MSD vs. time plot confirms the linear scaling expected for free diffusion, while velocity fluctuation analysis reveals the stochastic nature of thermal motion.8 MSD reveals distinct dynamic regimes: it grows linearly in dilute fluids (free diffusion), saturates in crystals due to confinement within lattice cages, and exhibits a two-step pattern in glassy or supercooled states. These behaviours are clearly visualized in Fig. 5(b), which shows ensemble-averaged MSDs at various volume fractions. As ϕ increases, the overall MSD decreases, and the characteristic relaxation time is delayed. The plateau in the MSD curve reflects cage trapping, and the end of the plateau, where the MSD resumes growing, marks the onset of cage rearrangements.
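The averaging described above can be sketched as follows, assuming trajectories are already linked and stored as an array of shape (frames, particles, dimensions) with uniform frame spacing. The function name `mean_square_displacement` and the array layout are choices made here for illustration.

```python
import numpy as np


def mean_square_displacement(traj, max_lag=None):
    """MSD versus lag (in frames), averaged over particles and time origins."""
    traj = np.asarray(traj, dtype=float)  # shape: (frames, particles, dims)
    n_frames = traj.shape[0]
    max_lag = n_frames - 1 if max_lag is None else max_lag
    msd = np.zeros(max_lag + 1)  # msd[0] = 0 by definition
    for lag in range(1, max_lag + 1):
        # Displacements for every time origin t0 that fits this lag.
        disp = traj[lag:] - traj[:-lag]
        msd[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return msd
```

Averaging over all time origins improves statistics at short lags but presumes stationary dynamics; for ageing samples a fixed t0 is the safer choice.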
Fig. 5 Dynamic descriptors capturing particle mobility, relaxation, and nucleation. (a) Particle trajectories, mean-square displacement (MSD), and velocity profiles extracted from confocal time-series illustrate translational dynamics in dense colloidal systems.103 (b) MSD scaling as a function of volume fraction reveals the onset of dynamical arrest, consistent with glass-like behaviour.6 (c) Self-intermediate scattering function Fs(q,t) for all particles across a series of temperatures in glass-forming liquids, illustrating the two-step relaxation typical of glass-forming systems.109 (d) Maps of the dynamic Lindemann parameter quantify spatially resolved fluctuations, allowing the detection of soft regions prior to crystallization.10 (e) and (f) Time-resolved mapping of local displacement fields identifies dynamic heterogeneity and nucleation events.111,112
Another widely used dynamic descriptor is the self-intermediate scattering function (ISF), Fs(q,t).104 It quantifies temporal correlations in particle positions at a given wavevector q. The ISF is defined as:
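A minimal numerical sketch is given below. It assumes the standard textbook form of the self part, Fs(q,t) = (1/N)⟨Σj exp{i q·[rj(t0 + t) − rj(t0)]}⟩, and approximates the isotropic average by taking q along each Cartesian axis in turn, which is a simplification. The function name `self_isf` and the (frames, particles, dimensions) array layout are choices made here.

```python
import numpy as np


def self_isf(traj, q, max_lag=None):
    """Self-intermediate scattering function F_s(q, lag) from trajectories."""
    traj = np.asarray(traj, dtype=float)  # shape: (frames, particles, dims)
    n_frames, _, dims = traj.shape
    max_lag = n_frames - 1 if max_lag is None else max_lag
    q_vectors = q * np.eye(dims)  # one wavevector of magnitude q per axis
    fs = np.ones(max_lag + 1)  # F_s(q, 0) = 1
    for lag in range(1, max_lag + 1):
        disp = traj[lag:] - traj[:-lag]  # (origins, particles, dims)
        phase = disp @ q_vectors.T  # q . dr for each axis direction
        # The imaginary part averages to zero, so keep the cosine only.
        fs[lag] = np.mean(np.cos(phase))
    return fs
```

The magnitude q is commonly chosen near the first peak of the static structure factor so that the decay of Fs probes motion on the scale of the interparticle spacing.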
Closely related to the MSD, the Lindemann parameter provides a local criterion for detecting melting and structural instability by quantifying relative vibrational amplitudes between neighbouring particles.110 It is defined as:
Recent advances in techniques like traction rheoscopy and confocal microscopy have made it possible to map local stress and strain fields with particle-level precision, offering an unparalleled view into the mechanics of soft matter systems under mechanical load.6,87,113–117 Using these methods, a range of key descriptors can be directly extracted from particle-resolved data, offering complementary perspectives on deformation, stress transmission, and bulk stability.115,118 Osmotic pressure is an important descriptor of the mechanical properties of colloidal dispersions, as it reflects how interparticle interactions and thermal motion define the system's equation of state.119 It can be reconstructed from particle-level data across many particles, for example by averaging local forces or structural correlations, thereby linking microscopic observables to bulk behaviour.120,121 In this review, however, we focus primarily on particle-resolved descriptors, since they provide direct microscopic insight into local deformation and stress. Here, we highlight representative examples, including the strain tensor, von Mises strain, and reconstructed stress, that have proven particularly powerful in revealing plastic flow, dislocation activity, and defect-mediated mechanics in colloidal systems.
Strain (γ or ε) is a fundamental mechanical descriptor that measures material deformation through relative particle displacements of neighbouring particles.87,122 In colloidal systems, strain can be computed with single-particle resolution by tracking individual displacements. The local strain tensor is calculated from changes in a particle's nearest-neighbour distances, capturing local shear deformation.
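One common way to obtain a local strain tensor is a least-squares fit of an affine transformation to the neighbour vectors of a particle before and after deformation. The sketch below illustrates that fit under the small-strain approximation; the function name `local_strain` is introduced here, and it assumes the same neighbours are listed in the same order in both configurations.

```python
import numpy as np


def local_strain(ref_bonds, cur_bonds):
    """Best-fit affine strain and rotation from neighbour vectors.

    ref_bonds, cur_bonds: (n_neighbours, dims) vectors from the central
    particle to the same neighbours before and after deformation.
    """
    ref = np.asarray(ref_bonds, dtype=float)
    cur = np.asarray(cur_bonds, dtype=float)
    # Solve ref @ F.T ~= cur in the least-squares sense for the
    # deformation gradient F (each bond maps as d' = F d).
    f_transposed, *_ = np.linalg.lstsq(ref, cur, rcond=None)
    f = f_transposed.T
    strain = 0.5 * (f + f.T) - np.eye(ref.shape[1])  # symmetric part
    rotation = 0.5 * (f - f.T)  # antisymmetric part
    return strain, rotation
```

The trace of the returned strain gives the local dilation and the off-diagonal terms the shear components, while the residual of the fit (not computed here) is often used as a measure of non-affine motion.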
A pioneering study by Schall et al. demonstrated that thermally or externally driven strain fields can be resolved at single-particle resolution in colloidal glasses.115 By tracking particle trajectories in dense colloidal suspensions and computing local strain tensors from particle displacements, they identified how structural rearrangements arise and evolve under shear. As shown in Fig. 6(a), the cumulative shear strain εyz was mapped over time during continuous shear, revealing that localised regions of elevated strain (shear transformation zones) progressively coalesce into a system-spanning network. These zones emerge intermittently and are spatially heterogeneous, suggesting the discrete nature of plastic events in amorphous solids.
Fig. 6 Mechanical descriptors extracted from particle-resolved deformation in colloidal crystals. (a) Time evolution of local shear strain during deformation. Top: 2D strain maps (εyz component) reveal spatially heterogeneous transformation zones at 20, 30, and 50 minutes of applied shear. Bottom: 3D renderings highlight particles exceeding a critical shear strain threshold, showing the progressive formation of strain-localized networks.115 (b) Colormap of cumulative strain from 3D tracking in a sheared colloidal crystal, revealing regions of persistent deformation.87 (c) von Mises strain maps capturing equivalent shear deformation. Plastic flow is visualized through slip along both classical (111) and unconventional (001) planes, revealing complex strain propagation paths under external loading.123
One scalar form used to analyse particle-level deformation is the von Mises strain (εvM), which quantifies local distortional strain and is particularly useful for detecting slip events and zones of strain localization.87 It is defined as:
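$$\varepsilon_{\mathrm{vM}} = \left[\varepsilon_{xy}^2 + \varepsilon_{xz}^2 + \varepsilon_{yz}^2 + \tfrac{1}{6}\left((\varepsilon_{xx} - \varepsilon_{yy})^2 + (\varepsilon_{yy} - \varepsilon_{zz})^2 + (\varepsilon_{zz} - \varepsilon_{xx})^2\right)\right]^{1/2}$$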
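The definition translates directly into code. The helper below, named `von_mises_strain` here for illustration, takes a symmetric 3 × 3 strain tensor such as one obtained from a per-particle affine fit.

```python
import numpy as np


def von_mises_strain(strain):
    """von Mises (equivalent shear) strain of a symmetric 3x3 strain tensor."""
    e = np.asarray(strain, dtype=float)
    shear = e[0, 1] ** 2 + e[0, 2] ** 2 + e[1, 2] ** 2
    normal = ((e[0, 0] - e[1, 1]) ** 2
              + (e[1, 1] - e[2, 2]) ** 2
              + (e[2, 2] - e[0, 0]) ** 2) / 6.0
    return float(np.sqrt(shear + normal))
```

A pure dilation gives εvM = 0, so the measure responds only to shape change, which is why it highlights slip and shear localization.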
By computing the root mean square variation of εvM across time and space, one can visualise the temporal evolution and spatial heterogeneity of shear response within the colloidal solid (Fig. 6(b)). For instance, when a hard-sphere colloidal glass is deformed under shear, traction rheoscopy reveals grid-like patterns of elevated shear strain near embedded metal meshes, highlighting regions where plastic deformation is concentrated.
A recent study resolved particle-level mechanical behaviour by directly measuring εvM at single-particle resolution during shear deformation of hard-sphere colloidal crystals.123 As shown in Fig. 6(c), this approach enabled the authors to capture the onset and evolution of plastic flow: at total strain γ ≈ 0.04, elevated εvM values appear along (111) slip planes, consistent with classical dislocation glide, and at γ ≈ 0.06, strain becomes more localized and activates additional slip along unconventional (001) planes. These spatially resolved strain fields, correlated with dislocation structures and local rearrangements, revealed that colloidal crystals exhibit Taylor-like work hardening despite their entropic elasticity.124–126
While strain captures how a material deforms, stress (σ) quantifies the internal forces that drive such deformation, typically expressed as force per unit area. In soft materials like colloidal assemblies, shear stress (σxz) is particularly relevant under external loading. Recent developments such as traction rheoscopy (Fig. 2(e)) have enabled direct, spatially resolved stress measurements in colloidal solids with millipascal sensitivity.87 These approaches extend beyond global stress–strain characterization to reveal how internal stresses vary across space, time, and microstructure.
Defect-level stress mapping in Fig. 7(a) has revealed how stress concentrates locally around microscopic imperfections.118 In a colloidal crystal with a single vacancy, Lin et al. used SALSA (stress assessment from local structural anisotropy) to reconstruct local stress tensors from confocal microscopy data (Fig. 7(a)). This method estimates the stress by integrating directional correlations in the local pair distribution function g(r), assuming hard-sphere-like interactions. They found that the stress field around the vacancy decays gradually over several particle diameters and exhibits a dipole-like shear stress pattern, consistent with predictions from continuum elasticity. Although the modulus remains nearly uniform, the vacancy induces significant local stress anisotropy and long-range distortion of the surrounding stress field. This highlights the utility of colloidal systems for directly visualising stress transmission around point defects. The same approach has been applied to visualise stress fields around extended crystalline defects such as dislocations and grain boundaries, as shown in Fig. 7(b) and (c). In Fig. 7(b), the local shear stress surrounding an edge dislocation in a two-dimensional colloidal crystal exhibits a quadrupolar symmetry, closely matching predictions from linear elasticity.127 Fig. 7(c) shows stress distributions in a polycrystalline monolayer, where strong heterogeneity in shear and pressure fields emerges near grain boundaries and triple junctions. These measurements reveal how stress localizes along misoriented domains and evolves dynamically as grains rearrange. These examples demonstrate the capability of colloidal systems to directly test continuum elasticity at the particle scale and to capture the complex, collective mechanics of defect networks in soft crystals.
Fig. 7 Stress and modulus mapping around crystal defects in colloidal solids. (a) Particle-resolved stress field in a 3D colloidal crystal containing a single vacancy. The SALSA method reconstructs the local stress tensor from confocal microscopy data, revealing dipolar shear stress patterns and long-range stress distortion consistent with continuum elasticity predictions.118 (b) Local stress analysis around an edge dislocation and associated stacking fault. The quadrupolar shear stress field surrounding the dislocation, extracted from both experiment and simulation, closely matches theoretical expectations. The shear modulus G decreases near the dislocation core, as shown by position-dependent modulus mapping.118 (c) Stress distributions in a polycrystalline colloidal monolayer. SALSA analysis captures spatial heterogeneity in pressure and shear stress, particularly at grain boundaries and triple junctions, highlighting localized stress concentrations and collective mechanical behaviour in defect-rich soft crystals.118
While the focus in this section has been on established and widely used descriptors, it is equally important to consider how new descriptors may emerge to address current limitations. Although the literature contains many specialized variants tailored to particular systems, most can be understood as tuned parameters or extensions of the fundamental quantities discussed here. The rapid progress of particle-resolved imaging and analysis ensures that this list will continue to grow. In particular, the integration of high-fidelity experimental datasets with ML is expected to accelerate the development of new, physics-informed descriptors that extend beyond traditional human-designed metrics. Such advances may uncover hidden order parameters, detect precursor dynamics in glassy or crystallizing states, and link microscopic stress transmission directly to macroscopic response.
ML mitigates these limitations by leveraging the wide range of descriptors available from colloidal experiments. Instead of relying solely on scalar metrics, full particle configurations, described through neighbour distances, angular distributions, and Voronoi volumes, can be used to detect subtle correlations and higher-order motifs that elude traditional descriptors. This makes ML-based classification especially valuable for identifying glassy, polycrystalline, or hierarchically structured phases.54,57,64,129,130
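A minimal sketch of how such a per-particle input might be assembled from two-dimensional coordinates is given below. It concatenates sorted nearest-neighbour distances with the magnitudes of four-fold and six-fold bond-orientational order. The function names, the choice of features, and the fixed neighbour count are assumptions for illustration, not the feature set of any cited study.

```python
import numpy as np

def bond_order(bond_vectors, symmetry):
    """Magnitude of the n-fold bond-orientational order for one particle."""
    angles = np.arctan2(bond_vectors[:, 1], bond_vectors[:, 0])
    return abs(np.mean(np.exp(1j * symmetry * angles)))

def local_feature_vector(points, index, n_neighbours):
    """Sorted neighbour distances followed by |psi_4| and |psi_6| (2D only)."""
    offsets = np.delete(points, index, axis=0) - points[index]
    distances = np.linalg.norm(offsets, axis=1)
    nearest = np.argsort(distances)[:n_neighbours]
    bonds = offsets[nearest]
    orders = [bond_order(bonds, 4), bond_order(bonds, 6)]
    return np.concatenate([distances[nearest], orders])

# Two idealized test environments: a hexagonal and a square neighbourhood.
hex_angles = np.arange(6) * np.pi / 3
hex_patch = np.vstack([[0.0, 0.0],
                       np.column_stack([np.cos(hex_angles), np.sin(hex_angles)])])
square_patch = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)

hex_features = local_feature_vector(hex_patch, index=0, n_neighbours=6)
square_features = local_feature_vector(square_patch, index=0, n_neighbours=4)
```

For the ideal hexagonal environment the six-fold term is one and the four-fold term vanishes, while the square environment gives the reverse. A classifier trained on such vectors sees both the distances and the symmetry information at once, rather than a single scalar.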
A prominent example is the use of convolutional neural networks (CNNs) for image-based classification.131 These networks can be trained to recognize different self-assembled structures, including crystals, liquid crystals, and quasicrystals, directly from diffraction patterns or microscopy images without needing predefined descriptors (Fig. 8(a)). Input images may be rotated or rescaled to improve robustness during training, allowing CNNs to generalize across various orientations and experimental conditions. In parallel, unsupervised learning techniques, such as clustering based on pixel-level features or autoencoder representations, have been used to sort unlabelled images into physically meaningful groups.46 This approach can uncover subtle distinctions, such as void-rich defective states versus fully ordered crystals, that are often missed by conventional bond-orientational order parameters. Once the unsupervised clusters are interpreted and assigned to meaningful structural labels, the resulting annotations can be used to train supervised CNNs. These models can then rapidly and accurately classify new structures without requiring manual tuning or handcrafted features, enabling high-throughput analysis across diverse assembly conditions.57,132,133
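The rotation step mentioned above can be sketched as follows. The source does not specify which rotations or rescalings are used, so this example assumes the simplest case: the eight rotated and mirrored copies of a square image that need no interpolation. The function name and the toy image are hypothetical.

```python
import numpy as np

def augment_with_symmetries(image):
    """Return the 8 rotated and mirrored copies of a square 2D image.

    Only 90-degree rotations and left-right flips are used, so pixel values
    are rearranged but never interpolated.
    """
    copies = []
    for quarter_turns in range(4):
        rotated = np.rot90(image, quarter_turns)
        copies.append(rotated)
        copies.append(np.fliplr(rotated))
    return np.stack(copies)

# Toy 4x4 "micrograph" with distinct pixel values.
toy_image = np.arange(16, dtype=float).reshape(4, 4)
augmented = augment_with_symmetries(toy_image)
```

Each copy would inherit the label of the original image during training. Arbitrary rotation angles or rescaling would need an interpolating image library, which is left out here.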
Fig. 8 Machine learning-based structural classification and nucleation pathway analysis of colloidal assemblies. (a) Schematic workflow of image-based classification using convolutional neural networks (CNNs).128 (b) Structure classification in binary colloidal superlattices (BSLs) using graphlet-based deep autoencoders. Local neighbourhoods are encoded as structural and compositional graphs via branched graphlet decomposition, which are then compressed into low-dimensional latent spaces using autoencoders.134
For more complex systems (Fig. 8(b)), particularly in three dimensions, deep autoencoders have been integrated with neighbourhood graph frameworks to handle structural and compositional order.134 Branched graphlet decomposition maps the local environment of each particle onto high-dimensional graphs that encode geometry and particle identity. These graph representations are then compressed into lower-dimensional latent spaces using deep autoencoders, which preserve essential features while filtering out redundancies. This method has proven especially effective for classifying binary colloidal superlattices, distinguishing ideal crystal structures such as FCC-CuAu or BCC-CsCl, as well as irregular configurations including substantial defects and intermediate states. The graphlets serve as the primary structural descriptors in this framework, simultaneously capturing local symmetry, bonding motifs, and species-specific arrangements. By embedding both spatial and compositional information, this strategy enables a detailed, particle-level view of nucleation pathways and ordering transitions, making it particularly powerful for complex or multicomponent colloidal systems.
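The compression step can be illustrated with a linear autoencoder, which is mathematically equivalent to principal component analysis. This is a deliberately simplified stand-in for the deep autoencoders of the cited work. The descriptor matrix here is synthetic, and all function and variable names are assumptions for illustration.

```python
import numpy as np

def fit_linear_autoencoder(features, latent_dim):
    """Fit a linear encoder/decoder pair by SVD (equivalent to PCA)."""
    mean = features.mean(axis=0)
    _, _, right_vectors = np.linalg.svd(features - mean, full_matrices=False)
    return mean, right_vectors[:latent_dim]

def encode(features, mean, components):
    """Project high-dimensional descriptors onto the latent space."""
    return (features - mean) @ components.T

def decode(latent, mean, components):
    """Map latent coordinates back to descriptor space."""
    return latent @ components + mean

# Synthetic stand-in for per-particle graph descriptors: 10 features that
# are secretly generated from only 2 hidden variables.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
descriptors = hidden @ mixing

mean, components = fit_linear_autoencoder(descriptors, latent_dim=2)
latent = encode(descriptors, mean, components)
reconstruction_error = np.max(np.abs(decode(latent, mean, components) - descriptors))
```

Because the synthetic descriptors have only two underlying degrees of freedom, two latent dimensions reconstruct them essentially exactly. Real graphlet descriptors are not expected to be linear in this way, which is one reason nonlinear deep autoencoders are used in practice.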
A major approach involves predicting long-term behaviour from short-time structural descriptors. For instance, supervised models trained on local particle configurations represented through radial and angular distributions, Voronoi geometry, or local entropy can predict which particles will undergo rearrangement.94 These descriptors are reshaped into feature vectors and used to train classifiers or regressors that forecast future mobility.54 Such models reveal correlations between structure and dynamics, providing access to hidden soft spots and indicators of dynamic heterogeneity.
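As a toy illustration of this workflow, the sketch below trains a logistic-regression classifier to flag particles that will rearrange. The two descriptors and the labelling rule are synthetic and invented for the example, so the numbers carry no physical meaning. The cited studies use richer descriptors and other model families.

```python
import numpy as np

def train_logistic(features, labels, learning_rate=0.5, steps=500):
    """Fit logistic regression by plain gradient descent."""
    weights = np.zeros(features.shape[1])
    bias = 0.0
    for _ in range(steps):
        probabilities = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
        error = probabilities - labels
        weights -= learning_rate * features.T @ error / len(labels)
        bias -= learning_rate * error.mean()
    return weights, bias

def predict_rearranging(features, weights, bias):
    """True where the model predicts a rearrangement."""
    return (features @ weights + bias) > 0.0

# Synthetic descriptors (assumed, for illustration only).
rng = np.random.default_rng(1)
local_order = rng.uniform(0.0, 1.0, 400)   # e.g. a bond-order magnitude
free_volume = rng.uniform(0.0, 1.0, 400)   # e.g. a normalized Voronoi volume
features = np.column_stack([local_order, free_volume])

# Toy labelling rule: loosely packed, disordered particles rearrange.
labels = (free_volume > local_order).astype(float)

weights, bias = train_logistic(features, labels)
accuracy = np.mean(predict_rearranging(features, weights, bias) == labels.astype(bool))
```

The signs of the fitted `weights` show which descriptor promotes or suppresses predicted mobility. This is the simplest version of reading a structure-dynamics correlation off a trained model.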
A significant advance in this direction is the development of machine-learned collective variables (CVs) that guide simulations of phase transitions, as shown in Fig. 9(a).135 Recent work uses graph neural networks (GNNs) trained on particle neighbourhoods to efficiently learn CVs such as local coordination or Steinhardt bond-order parameters.136,137 These models bypass the need for manual descriptor selection and offer orders-of-magnitude faster computation during enhanced sampling, enabling efficient exploration of rare events like crystallization or spinodal decomposition. Remarkably, GNNs trained on colloidal systems have been successfully transferred to simulate metallic nucleation (e.g., copper), highlighting their cross-domain generalizability.
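The basic structure of such a model (neighbour graph, edge embedding, message aggregation, and pooling) can be sketched in a few lines. The weights below are random and untrained, and the architecture, the Gaussian radial basis, and all names are assumptions for illustration rather than the cited networks.

```python
import numpy as np

def neighbour_graph(positions, cutoff):
    """Boolean adjacency matrix and pair distances for a distance cutoff."""
    deltas = positions[:, None, :] - positions[None, :, :]
    distances = np.linalg.norm(deltas, axis=-1)
    adjacency = (distances < cutoff) & ~np.eye(len(positions), dtype=bool)
    return adjacency, distances

def gnn_collective_variable(positions, cutoff, edge_weights, node_weights):
    """One message-passing layer followed by mean pooling to a scalar CV."""
    adjacency, distances = neighbour_graph(positions, cutoff)
    # Edge embedding: expand each distance on a Gaussian radial basis.
    centres = np.linspace(0.0, cutoff, edge_weights.shape[0])
    edge_features = np.exp(-((distances[..., None] - centres) ** 2) / 0.1)
    # Messages exist only along graph edges; sum them onto each node.
    messages = np.tanh(edge_features @ edge_weights) * adjacency[..., None]
    node_states = messages.sum(axis=1)
    # Node readout, then pooling over all particles.
    return np.tanh(node_states @ node_weights).mean()

rng = np.random.default_rng(2)
positions = rng.uniform(0.0, 4.0, size=(30, 3))
edge_weights = rng.normal(size=(8, 16))   # (radial basis size, hidden size)
node_weights = rng.normal(size=16)

cv_original = gnn_collective_variable(positions, 1.5, edge_weights, node_weights)
cv_permuted = gnn_collective_variable(positions[rng.permutation(30)], 1.5,
                                      edge_weights, node_weights)
cv_shifted = gnn_collective_variable(positions + 10.0, 1.5, edge_weights, node_weights)
```

Because the output depends only on pair distances and uses sum and mean aggregation, it is unchanged when particles are relabelled or the whole configuration is translated. This is a basic requirement for any collective variable. Periodic boundaries are ignored in this sketch.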
Fig. 9 Graph neural network (GNN)-based learning of nucleation pathways. (a) A GNN framework transforms particle coordinates into latent representations through neighbour-based graph construction, edge embedding, and pooling, enabling prediction of nucleation collective variables.135 (b) Application to binary colloidal superlattices reveals a classical one-step nucleation pathway via compositionally ordered (CO) clusters, contrasting the two-step route in Fig. 8.134
ML also supports the discovery of self-assembly pathways that defy conventional intuition (Fig. 9(b)). Using unsupervised learning applied to simulated trajectories, researchers have reconstructed phase transition landscapes and uncovered nonclassical nucleation routes.134 For example, in binary colloidal mixtures, graph-based latent embeddings revealed that superlattice formation often proceeds via dense amorphous precursors rather than direct crystallization. This capability to detect hidden intermediates and precursor states, especially in systems exhibiting frustration or metastability, is not easily achievable through conventional trajectory averaging.
As shown in Fig. 10(a), a typical ML-based inverse design pipeline begins with parameter sampling from a distribution and simulation of candidate systems. The resulting configurations are scored using CNNs trained to evaluate phase identity from features such as diffraction patterns. Samples with higher likelihoods of matching the target structure (QC12 quasicrystals) are ranked with higher fitness scores. The parameter distribution is then updated to focus exploration toward regions yielding high-fitness assemblies. This closed-loop optimization continues until convergence on optimal design parameters.131
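The closed loop described above can be sketched as a simple sample, score, and update iteration. In this illustration a cross-entropy-style update of a Gaussian distribution is swapped in for whatever optimizer the cited work uses. The simulation and CNN scoring steps are replaced by a toy fitness function with a known optimum. Every name and parameter value is an assumption.

```python
import numpy as np

def toy_fitness(parameters, target):
    """Stand-in for 'simulate, then score with a trained CNN'.

    Returns a score in (0, 1] that peaks when parameters equal the target.
    """
    return np.exp(-np.sum((parameters - target) ** 2, axis=1))

def inverse_design(target, n_iterations=30, n_samples=200, elite_fraction=0.2, seed=0):
    """Iteratively refocus a Gaussian parameter distribution on high fitness."""
    rng = np.random.default_rng(seed)
    mean = np.zeros_like(target)
    std = np.full_like(target, 2.0)
    n_elite = int(n_samples * elite_fraction)
    for _ in range(n_iterations):
        samples = rng.normal(mean, std, size=(n_samples, len(target)))
        scores = toy_fitness(samples, target)
        elite = samples[np.argsort(scores)[-n_elite:]]   # highest-fitness samples
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3                   # floor avoids collapse to zero
    return mean, std

# Hypothetical two-parameter interaction potential with a hidden optimum.
target_parameters = np.array([1.5, -0.7])
best_mean, best_std = inverse_design(target_parameters)
```

In a real pipeline each fitness evaluation would be a full assembly simulation followed by classification of its diffraction pattern. The number of samples per iteration is then the dominant cost.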
Fig. 10 Inverse design of self-assembly using fitness-driven learning. (a) A generative model samples assembly parameters from a probability distribution, evaluates the resulting structures via convolutional neural networks trained on diffraction patterns, and updates the distribution toward high-fitness regions.131 (b) The approach enables discovery of design parameters for complex target structures such as quasicrystals (QC12), identified through iterative refinement of the design space.131
The approach illustrated in Fig. 10(b) successfully reverse-engineers quasicrystalline order from a soft repulsive potential, with the ML loop learning interaction rules that favour QC12 phases. Representative structures and their diffraction patterns (Fig. 10(b)) confirm that ML-guided inverse design can recover complex, non-periodic assemblies that are difficult to access through intuition alone. Such frameworks are extensible to experimental colloidal systems, offering a path toward programmable materials design driven by data.
A fundamental challenge lies in the transferability of models across different experimental systems. ML models trained on a specific dataset, such as particles imaged under a particular microscope, using a specific colloidal formulation or interaction potential, often fail to perform reliably when applied to other systems. Variations in particle size, polydispersity, surface chemistry, solvent conditions, and imaging resolution all affect the features that the model learns. This lack of robustness limits the model's utility and underscores the need for more generalizable architectures or domain adaptation strategies that can bridge across experimental platforms. A further challenge arises from the limitations of a single data source. ML models trained exclusively on real-space trajectories may not be robust enough for the entire colloidal size range, particularly for systems below the optical resolution limit. To create a more complete and transferable framework, it is crucial to integrate descriptors from other techniques. For nanoscale systems where particle tracking is not possible, descriptors from methods like neutron and X-ray scattering can provide crucial structural information regarding how colloids spatially organize or aggregate. Furthermore, incorporating data from molecular simulations and theory can generate descriptors that extend beyond direct experimental accessibility and help bridge disparate length scales, leading to more comprehensive and powerful ML models.
Closely related is the issue of interpretability. Many powerful ML models, especially deep neural networks, operate as “black boxes” and learn internal representations that are difficult to map back onto physically meaningful quantities. For instance, a model might correctly classify crystalline versus amorphous regions but provide no insight into which geometric or topological features drive the decision. In colloidal science, where the goal is often to gain mechanistic understanding, not just prediction, this lack of interpretability restricts the scientific value of ML outputs. Developing interpretable models or applying techniques such as saliency mapping, feature attribution, or symbolic regression may help bridge this gap.
A further consideration is the interpretability of black-box models. Many powerful ML architectures, especially deep neural networks such as CNNs, often provide predictions without clear insight into the underlying decision process. However, in experimental colloidal science, the input variables used for training, such as local density, bond-order parameters, or strain, frequently carry direct physical meaning. This opens the door for interpretability methods that quantify the relative importance of features and connect model outputs back to physically relevant descriptors. For example, Shapley Additive Explanations (SHAP) can attribute predictive power to individual variables, thereby highlighting which structural or dynamical quantities most strongly influence classification or regression outcomes.146–148 Integrating such approaches not only enhances transparency but also ensures that ML frameworks contribute mechanistic understanding, even when built on complex architectures like CNNs.
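The idea behind Shapley-based attribution can be shown with a brute-force calculation for a model with a handful of inputs. This sketch enumerates every feature coalition exactly and replaces absent features with a single reference value. Practical SHAP implementations rely on approximations and background datasets instead. The toy model, its coefficients, and the descriptor names are assumptions for illustration.

```python
import itertools
import math
import numpy as np

def shapley_values(model, sample, baseline):
    """Exact Shapley attribution of model(sample) - model(baseline).

    Features outside a coalition are set to their baseline value. The cost
    grows exponentially with the number of features.
    """
    n_features = len(sample)
    values = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for coalition in itertools.combinations(others, size):
                members = np.array(coalition, dtype=int)
                without_i = baseline.copy()
                without_i[members] = sample[members]
                with_i = without_i.copy()
                with_i[i] = sample[i]
                values[i] += weight * (model(with_i) - model(without_i))
    return values

# Toy "mobility" model over three descriptors (coefficients are made up).
descriptor_names = ["local_density", "bond_order", "strain"]
coefficients = np.array([2.0, -1.0, 0.5])

def toy_model(x):
    return float(coefficients @ x)

sample = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attributions = shapley_values(toy_model, sample, baseline)
```

For a linear model each attribution reduces to the coefficient times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. This makes the result easy to check by hand.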
Bridging the gap between simulation-trained models and experimental data remains a major hurdle. Simulation data are typically clean and fully labelled, whereas experimental data are noisy and incomplete. Overcoming this gap may require sim-to-real transfer strategies, such as training models on hybrid datasets, applying noise injection during simulation, or using domain randomization to improve robustness to real-world variability.
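Noise injection and domain randomization can be sketched as a degradation step applied to each simulated frame before training. The two artefacts modelled here are Gaussian localization error and randomly missed particles. Their ranges are arbitrary assumptions rather than calibrated values for any microscope, and the function name is hypothetical.

```python
import numpy as np

def degrade_simulated_frame(positions, rng,
                            noise_range=(0.01, 0.05), miss_range=(0.0, 0.1)):
    """Make clean simulated coordinates look more like tracked experimental data.

    A fresh noise level and detection-failure rate are drawn for every frame
    (domain randomization), then applied to the coordinates.
    """
    noise_std = rng.uniform(*noise_range)       # localization error, in diameters
    miss_probability = rng.uniform(*miss_range)  # fraction of undetected particles
    detected = rng.random(len(positions)) >= miss_probability
    jitter = rng.normal(0.0, noise_std, size=(detected.sum(), positions.shape[1]))
    return positions[detected] + jitter, detected, noise_std

rng = np.random.default_rng(3)
clean_frame = rng.uniform(0.0, 20.0, size=(500, 3))   # stand-in for simulation output
noisy_frame, detected, noise_std = degrade_simulated_frame(clean_frame, rng)
```

Drawing new degradation parameters per frame exposes the model to a spread of imaging conditions rather than a single idealized one. Other artefacts such as drift or anisotropic axial resolution could be added in the same way.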
Finally, there is a risk of overfitting to synthetic or curated datasets, particularly when models are trained on limited simulation data that do not reflect the variability of experimental conditions. This can be mitigated through diverse training sets, data augmentation, and cross-domain benchmarking, ensuring that models capture general trends rather than overfitting to idealized scenarios.
Despite these limitations, ML opens several exciting directions for colloid science. Machine learning is uniquely suited to uncovering subtle or hidden patterns in high-dimensional experimental datasets, potentially revealing new intermediate phases, deformation mechanisms, or assembly pathways. It also enables scalable and quantitative analysis of massive datasets, especially in data-intensive experiments. Once trained, predictive ML models can forecast structural stability, self-assembly outcomes, or mechanical response under varying conditions. These models offer fast hypothesis testing and materials screening capabilities.
Furthermore, ML can provide experimental feedback to improve or challenge existing theories, especially in regimes where classical models, like nucleation theory, are inadequate. Finally, integrating ML with inverse design approaches can accelerate rational materials development by identifying interaction parameters or guiding experimental design toward desired outcomes. Such design loops become particularly valuable when they incorporate data from failed experiments. Unlike simulations that produce clean, successful outcomes, experimental data often include noise, artifacts, and results from trials that did not achieve the desired goal. By training models on these negative results, we can reduce data bias and help the model learn the boundaries of a system's behaviour, leading to more robust and generalized predictions. This approach accelerates the design of future experiments by guiding researchers away from unproductive regions of the parameter space and toward more promising conditions.
Colloidal experiments provide high-fidelity, high-dimensional datasets that capture microscale phenomena with a level of detail rarely attainable in atomic or molecular systems. Techniques such as confocal microscopy and traction rheoscopy yield reproducible, physically meaningful data, presenting an exceptional intersection between experimental observability and algorithmic inference.
Rather than viewing colloids as approximate analogues of atomic systems, their true value lies in their ability to generate structured, physically grounded descriptors. These include measures of local structural order, dynamical rearrangements, and stress distributions, resolved at the single-particle level. Such descriptors serve as robust inputs for supervised, unsupervised, and hybrid learning approaches, facilitating both prediction and mechanistic interpretation.
Importantly, colloidal systems complement existing tools in materials-focused ML by providing data that are both experimentally grounded and highly interpretable. While simulation and imaging-based approaches offer valuable insights, they often face challenges related to labelling and mechanistic transparency. Colloidal datasets, in contrast, give direct access to particle-level interactions and structure–property relationships, and therefore support model development that is not only predictive but also auditable and experimentally verifiable. This stands in contrast to many black-box models built from bulk measurements or coarse image embeddings.
The experimental versatility of colloids further enables systematic perturbation for evaluating model performance. By varying interaction strength, volume fraction, or external fields, researchers can map model sensitivity and failure modes. As such, colloids provide a controlled platform for benchmarking generalization and advancing algorithmic robustness.
In the longer term, colloids may not be the materials of final interest, but they can be indispensable in building the foundations for trustworthy, physics-aware ML. Their role is akin to that of well-designed model organisms in biology: not the endpoint of inquiry, but a fertile ground for generating hypotheses, training algorithms, and validating mechanisms under well-controlled and observable conditions. When paired with advances in differentiable programming, simulation-aware architectures, and inverse design loops, colloids could help transform ML from a black-box accelerator into a transparent, hypothesis-driven research partner.
In summary, colloidal systems represent a high-leverage intersection between experimental resolution and algorithmic learning. Their ability to deliver structured descriptors across structural, dynamic, and mechanical domains—coupled with their compatibility with real-time imaging and systematic control—makes them uniquely suited to guide the development of next-generation materials intelligence. To unlock this potential, we must recognize colloids not simply as illustrative models of matter, but as data-rich training grounds for building predictive, interpretable, and transferable frameworks that connect microscopic rules to macroscopic functionality.
This journal is © The Royal Society of Chemistry 2025