Jesse M. Hanlan‡ (a), Sam Dillavou‡ (a), Andrea J. Liu (a) and Douglas J. Durian* (a, b)
(a) Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA. E-mail: djdurian@physics.upenn.edu
(b) Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA 19104, USA
First published on 16th July 2025
The sudden arrest of flow by formation of a stable arch over an outlet is a unique and characteristic feature of granular materials. Previous work suggests that grains near the outlet randomly sample configurational flow microstates until a clog-causing flow microstate is reached. However, factors that lead to clogging remain elusive. Here we experimentally observe over 50000 clogging events for a tridisperse mixture of quasi-2D circular grains, and utilize a variety of machine learning (ML) methods to search for predictive signatures of clogging microstates. This approach fares just modestly better than chance. Nevertheless, our analysis using linear Support Vector Machines (SVMs) highlights the position of potential arch cornerstones as a key factor in clogging likelihood. We verify this experimentally by varying the position of a fixed (cornerstone) grain, which we show non-monotonically alters the average time and mass of each flow by dictating the size of feasible flow-ending arches. Positioning this grain correctly can even increase the ejected mass by 70%. Our findings suggest a bottom-up arch formation process, and demonstrate that interpretable ML algorithms like SVMs, paired with experiments, can uncover meaningful physics even when their predictive power is below the standards of conventional ML practice.
There is substantial evidence that flow microstates involving (D/d)^n relevant grains near the outlet are sampled randomly until one deterministically leads to a clog.11,12 Here, D/d is the ratio of the outlet diameter to the grain diameter, and n is the dimensionality of the system, indicating that these grains are contained in an area (n = 2) or volume (n = 3) above the outlet, not only in the arch. This model predicts a non-diverging form of average mass ejected per flow event 〈M〉 ∝ exp[(D/d)^n], as well as an exponential distribution of ejected masses, both of which match experimental data well.11,12,17–19 Signatures of these clog-forming flow microstates remain unknown, but minimal differences between clogging in air and water suggest that they are primarily determined by grain positions, rather than momenta and contact forces.12
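The random-sampling picture can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sampled microstate independently causes a clog with a fixed probability `p_clog`; the function names and the value 0.02 are hypothetical and are not taken from the experiments. Under that assumption the number of microstates sampled per flow is geometrically distributed, which is the discrete counterpart of the exponential distribution of ejected masses.

```python
import random


def simulate_flow_lengths(p_clog, n_flows, seed=0):
    """Return the number of microstates sampled in each flow before a clog.

    Each sampled microstate independently causes a clog with probability
    p_clog (an illustrative parameter, not a measured value).
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_flows):
        steps = 1
        while rng.random() >= p_clog:  # this microstate did not clog; keep flowing
            steps += 1
        lengths.append(steps)
    return lengths


def survival_fraction(lengths, t):
    """Fraction of flows that sampled more than t microstates."""
    return sum(1 for n in lengths if n > t) / len(lengths)


lengths = simulate_flow_lengths(p_clog=0.02, n_flows=20000)
mean_length = sum(lengths) / len(lengths)  # close to 1 / p_clog
```

In this toy model the survival fraction decays as (1 − p_clog)^t, so a semi-log plot of `survival_fraction` against `t` is a straight line, the signature of memoryless (Poissonian) sampling.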
This picture suggests that the structure of clogging microstates is important to the clogging process. Machine learning has been successful in identifying a link between local structure and dynamics in disordered granular systems where particle rearrangements play a key role, such as glassy liquids and granular packings,20 and several types of disordered (granular) solids.21,22 In these works, however, structure was used to predict localized grain-scale rearrangements, which occur frequently throughout the system. In contrast, clogging involves a larger number ∼ (D/d)^n of grains, and occurs only once per flow event. This makes the problem both less spatially localized and more difficult to adequately sample.
Here we use machine learning tools to predict clogs from a dataset of over 50000 flow-to-clogging events obtained using an automated hopper. We analyze positional and momentum flow microstates and find that nonlinear deep learning methods or those that include grain momenta perform only marginally better than linear, grain-position-only methods. All methods completely fail to predict clogging until only a short time prior to clogging (tens of ms, see ESI,† Appendix Predicting Individual Clog Formation), supporting the picture of Poissonian sampling of flow microstates.
Within that short time, the predictive accuracy of our simplest model, a linear Support Vector Machine (SVM) given solely positional information, is 58%. This is only marginally higher than random guessing (50%), an unsatisfactory result by prediction and benchmarking standards. Nevertheless, this model identifies the precise location of potential cornerstones of an arch as an important predictor of clogging. We confirm that this correlational observation is causal using experiments with a fixed cornerstone grain. This key grain controls the ejected mass by dictating the range of possible flow-ending arches.
To begin an experiment, an exciter (green in Fig. 1) situated near the outlet vibrates the hopper, dislodging the arch and initiating flow. The grains then flow freely under gravity until a clog spontaneously forms. The region near the outlet is monitored by a digital camera (yellow) at 130 frames per second. The system is considered stably clogged when no grains have exited the hopper for 5 continuous seconds. For each image taken, custom MATLAB code tracks each grain's size (small, medium, large) and location through time to ±σtracking = 0.14 mm precision (0.016dL). This is accomplished prior to starting the next flow, so that tracking data rather than raw video may be written to file to minimize storage requirements. A representation of this process, as well as a stable arch of grains, is shown in Fig. 1b.
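The stopping criterion described above (no grain exits for 5 continuous seconds at 130 frames per second) can be sketched as follows. This is a Python illustration, not the custom MATLAB code used in the experiment; `find_clog_frame` and its arguments are hypothetical names, and the input is assumed to be a list of frame indices at which a grain was seen leaving the hopper.

```python
FRAME_RATE_HZ = 130.0  # camera frame rate stated in the text
CLOG_WAIT_S = 5.0      # quiet time required before the system counts as clogged


def find_clog_frame(exit_frames, n_frames):
    """Return the frame of the last grain exit before a >= 5 s quiet period.

    exit_frames: frame indices at which a grain left the hopper (assumed input).
    n_frames:    total number of frames recorded so far.
    Returns None if no sufficiently long quiet period has been observed.
    """
    wait_frames = int(round(CLOG_WAIT_S * FRAME_RATE_HZ))  # 650 frames
    frames = sorted(exit_frames)
    for k, frame in enumerate(frames):
        # The next event is either the next exit or the end of the recording.
        next_event = frames[k + 1] if k + 1 < len(frames) else n_frames
        if next_event - frame >= wait_frames:
            return frame
    return None
```

For example, with exits at frames 10, 50 and 300 and 1000 recorded frames, the last exit is followed by 700 quiet frames, so the flow is taken to have ended at frame 300.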
Grains that pass through the outlet are directed into a closed loop chute with a blower attached at the base (red in Fig. 1a). An upward airflow recirculates grains to the top of the hopper, removing the need for refilling, and allowing the experiment to continue autonomously without intervention. The air flow is placed sufficiently far and shielded from the outlet such that air currents do not disturb grains in our region of interest, and vents (see Fig. 1a) are placed at the top and sides of the hopper to prevent circulating currents. We perform over 35000 experiments in this manner for a single outlet size, D = 3.86dL, and at least one thousand experiments each for D = {3.61, 3.74, 3.98, 4.15}dL, over 7000 total. We additionally perform over 13000 experiments with a fixed grain and outlet size D = 3.86dL (Fig. 4).
We confirm a variety of standard granular flow behaviors in ESI,† Appendix Hopper Phenomenology: the distribution of flow events is exponential (Poissonian), the average event size grows exponentially in (D/d)^2, and the average discharge rate follows the 2D Beverloo law. The large quantity of data captured with the autohopper presents a wide range of analysis opportunities. For instance, the dataset contains enough flow events to inform a multiplicative noise model that captures the dynamics of the flow rate and the relative stability of arches.23 However, for analysis in this work, we restrict our machine learning dataset to one outlet size, D = 3.86dL, and use the 29000 flows that last at least 0.23 seconds, or 10% of the average flow length. The data for all flows and all outlet sizes is accessible on the Dryad repository.24 We also provide a Python script to automatically create folders of the expected classes described in the following section.25
To be precise, our aim is to use only instantaneous information contained in the microstate (positions, sizes, and momenta of grains) to perform three binary classifications that distinguish the Flowing state from the Clogging, Clogged and Emptied states, respectively. Thus, our goal is to produce a binary classification function f that takes a microstate Ωi as input and produces a single number f(Ωi), which distinguishes between two classes of microstates (e.g. f(Ωi) > 0 for Clogging, Clogged or Emptied, and f(Ωi) < 0 for Flowing). We compose f with many adjustable parameters θ, which we optimize for this purpose using supervised machine learning. Here we assume familiarity with this process, but for an expanded description, see ESI,† Appendix Supervised Machine Learning.
Our trainable functions f in this work are primarily linear Support Vector Machines (SVMs),26 but we also train a Convolutional Neural Network (CNN)27,28 for comparison. We use hinge loss26,29 for the SVMs and crossentropy loss27,28,30 for the CNN, with further training details given in ESI,† Appendices Supervised Machine Learning, SVM Cost Minimization and CNN Reconstruction. We also briefly discuss analysis using Graph Neural Networks (GNNs) in ESI,† Appendix Graph Neural Networks.
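As a rough illustration of what training a linear classifier with hinge loss involves, the sketch below runs full-batch subgradient descent on a toy two-feature dataset. It is a generic textbook procedure, not the training code used in this work (those details are in the ESI); the function names, the L2 penalty `reg`, the learning rate and the toy data are all assumptions made for the example.

```python
def decision(theta, bias, features):
    """Linear decision value: positive for one class, negative for the other."""
    return sum(t * x for t, x in zip(theta, features)) + bias


def hinge_loss(theta, bias, X, y):
    """Mean hinge loss; labels y must be +1 or -1."""
    return sum(max(0.0, 1.0 - yi * decision(theta, bias, xi))
               for xi, yi in zip(X, y)) / len(X)


def train_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Full-batch subgradient descent on hinge loss plus an L2 penalty."""
    n_features = len(X[0])
    theta = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_t = [reg * t for t in theta]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(theta, bias, xi) < 1.0:  # margin violated
                for j in range(n_features):
                    grad_t[j] -= yi * xi[j] / len(X)
                grad_b -= yi / len(X)
        theta = [t - lr * g for t, g in zip(theta, grad_t)]
        bias -= lr * grad_b
    return theta, bias


# Toy, linearly separable data (invented for illustration only).
X = [[2.0, 1.0], [1.5, 2.0], [3.0, 0.5], [-1.0, -2.0], [-2.0, -1.0], [-1.5, -0.5]]
y = [1, 1, 1, -1, -1, -1]
theta, bias = train_linear_svm(X, y)
```

Only samples that violate the margin contribute to the gradient, which is why the final weights of such a model are determined by the hardest-to-classify microstates.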
In linear SVMs, f takes the form
(1) [equation image missing]
(2) [equation image missing]
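For orientation, a linear decision function consistent with the per-feature contribution θj × GDGj(Ωi) used later in the text can be written as below. This is a generic form for a linear SVM and is not a transcription of eqn (1); the bias term b and the feature count N are notational assumptions.

```latex
f(\Omega_i) \;=\; \sum_{j=1}^{N} \theta_j \,\mathrm{GDG}_j(\Omega_i) \;+\; b
```

Here each GDG_j(Ω_i) denotes one structure function evaluated on microstate Ω_i, and the sign of f(Ω_i) gives the predicted class.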
Method | Clogging (%) | Clogged (%) | Emptied (%) |
---|---|---|---|
Linear SVM, GDG | 58 | 70 | 95 |
Linear SVM, GBP | 57 | 68 | 95 |
Linear SVM, GDG (+velocity) | 59 | 78 | 99 |
Convolutional neural network | 61 | 84 | 99 |
The final weights in the linear SVM have specific spatial importance, that is, they denote the locations in which the presence of a grain correlates with increased likelihood of a given state, for example Clogging. However, to understand our solutions, we must visualize not simply the weights, but the average effect each weight has when applied to the training data. Put another way, the features with greatest variance in their contributions, σj^2 = var[θj × GDGj(Ωi)] evaluated over the training set, are those with greatest impact on the decision function, and therefore the most important. We plot feature significance αj = sign(θj)σj^2 spatially in Fig. 3b–d. A direct comparison between feature weights θ and feature significance α can be found in ESI,† Appendix SVM Cost Minimization.
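The feature significance defined above, the sign of each weight multiplied by the variance of that feature's contribution over the training set, can be computed with a short sketch like the following. The function names and the toy numbers are hypothetical, and the use of the population variance (dividing by the number of samples) is an assumption.

```python
def variance(values):
    """Population variance (divides by the number of samples)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def feature_significance(theta, features):
    """Signed variance of each feature's contribution to the decision function.

    theta:    list of weights, one per feature.
    features: features[i][j] is structure function j evaluated on microstate i.
    """
    alphas = []
    for j, theta_j in enumerate(theta):
        contributions = [theta_j * row[j] for row in features]
        sign = (theta_j > 0) - (theta_j < 0)  # +1, 0 or -1
        alphas.append(sign * variance(contributions))
    return alphas


# Toy example: the third feature has the largest weight but never varies.
theta_example = [2.0, -1.0, 5.0]
features_example = [[0.0, 1.0, 3.0], [1.0, 3.0, 3.0], [2.0, 5.0, 3.0]]
alphas_example = feature_significance(theta_example, features_example)
```

In the toy example the third feature carries the largest weight yet has zero significance because it is constant across samples, which is why weights alone can be a misleading guide to importance.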
Despite modest predictive accuracy of the SVM, the feature contributions still give insight into spatial factors of clog formation. First, the prediction of Emptied vs. Flowing states gives an unsurprising feature map in Fig. 3b, where grains (likely falling) in the outlet suggest an emptied state is extremely unlikely. The Clogged vs. Flowing feature significance map in Fig. 3c suggests a relevance of the overall grain density gradient. This may be a means of sensing a slowing flow, occurring at this stage. The fact that velocity information significantly improves the accuracy only of the Clogged prediction fits nicely with this interpretation (see Table 1).
Notably, when predicting clogging states (Fig. 3d) we see high-valued blue and red regions next to each other at the edges of the outlet. This indicates that moving a cornerstone grain slightly to the right or left might change the prediction drastically. These results suggest that the lateral movements of a single grain in this location may have outsized importance in clog formation. It is this mechanism that we confirm experimentally in the next section. Further discussion of these significance maps, as well as those using the alternative (Behler–Parrinello31) structure functions, is included in ESI,† Appendix Alternate Analyses and Fig. S4.
Guided by our machine-learned solutions, we experimentally measure the impact of ‘cornerstone’ grain position. We place a fixed grain (magnet) of diameter dFG = dM on the floor of the hopper near the outlet, as shown by the drawings in Fig. 4a. This grain is held in place by another magnet on the exterior of the hopper. We define its position x to be zero when the grain is centered over the right-hand outlet boundary, and positive when moved to the right (away from the opening). We perform over 7500 experiments with a fixed grain, excluding any flows where we detect any movement of this grain from analysis (fewer than 200).
We find a strong and non-monotonic relationship between the position of the fixed grain x and the resulting average mass flow 〈MFG〉, as shown in Fig. 4a. Strikingly, even when the grain does not obscure the outlet (x > 0.5dFG), its placement may change the average ejected mass by a factor of almost three, including increasing its value above the no fixed-grain case (dashed line in Fig. 4a) by 70%. The mechanisms underlying these effects can be understood by visualizing the average final arch grains at several values of x, as shown in Fig. 4b.
When obscuring the outlet (small x, Fig. 4b1), the fixed grain serves as the cornerstone of the final arches, which are relatively narrow. As x is increased, the region between the cornerstone and outlet becomes excluded space, unable to stably admit another grain, resulting in wider and wider arches (Fig. 4b2) and increased ejected mass. At larger distances from the outlet x > (dFG + dS)/2 ∼ 0.9dFG, the fixed grain allows for free-flowing grains to act as a stable cornerstone, resulting in narrower arches (Fig. 4b3) and reduced ejected mass once again. As x increases further, the fixed grain continues to indirectly dictate cornerstone position, even when it is multiple diameters away from the outlet (Fig. 4b4 and b5). At this stage, the effect of x is reduced, which we attribute to the random availability of differently-sized cornerstones. Overall, we find a clear correlation between average arch width and the average ejected mass, as shown in Fig. 4c. Thus, the non-monotonic dependence of average ejected mass on fixed grain position x (Fig. 4a) is explained as follows: x affects average arch width non-monotonically due to commensuration effects (Fig. 4b), and arch width monotonically affects average ejected mass 〈M〉. This observation dovetails nicely with the Thomas and Durian model,11 as wider arches require a larger area of grains to cooperate. As a result, there is a smaller likelihood of clogging per sampling time. We find that arches formed in the presence of a fixed grain are slightly wider and significantly taller than those generated without one, as shown in Fig. 4d, perhaps a result of the additional stability of the fixed grain.
Of course, our numerous attempts do not prove there is no better solution, and we encourage other researchers to try their hand in improving upon our benchmarks. To facilitate such a competition we make our data available.24 Additionally, we have detailed a variety of alternative analyses on this data and potential pitfalls in ESI,† Appendix Future Directions. One notable pitfall is the imposition of too much coarse-graining, including prematurely enforcing symmetries, even those imposed by the boundary conditions (such as left/right symmetry). In optimization problems it is often helpful to have additional degrees of freedom to find the solution, even if they are ultimately not required.32 We note that our models were trained only on one outlet size, making it prudent for them to ignore the (unchanging) outlet pixels and thus unlikely that any will generalize. However, our physical understanding of the SVM predictions (Fig. 4) suggests that models able to capture cornerstone position relative to the outlet (e.g. a CNN) could predict similarly well across outlet sizes, if provided the right training data.
In this study we ran headlong into another inherent limitation of ML analysis besides its voracious need for data. Because finding good solutions often requires over-parameterization,32 solution weights typically contain spurious variation; therefore, one can only claim that predictive information is present somewhere in the data. This type of claim is not without its scientific uses;33 however, it does not, in itself, provide mechanistic understanding. Moreover, ML analyses (Fig. 3) are correlational, meaning that even high prediction accuracy provides an insufficient basis for any causal claims.
Despite this, we have uncovered new physics. In particular, by inspecting the features of greatest significance in our simplest method, a linear Support Vector Machine (SVM), we were able to identify that grains in the region immediately adjacent to the outlet are potentially critical to the onset of clog formation. To test this hypothesis directly, we performed a series of experiments with fixed grains in this key position. While many studies have modified outlet width, angle, and/or shape,5,10–16,34,35 or added ‘floating’ obstacles above the outlet,36,37 our experiments are distinct in that they sample a subspace of plausible positional microstates when no fixed grain is present. This allows us to probe the enormously high-dimensional dynamics of clog formation efficiently. For instance, it allows us to make some rare states (e.g. the wide arches in Fig. 4b2) common, and therefore far easier to study. Further, our method allows us to make causal claims about key grains affecting clog formation, which is unlike perturbing or analyzing already-stable arches,23,38,39 where only counterfactual arguments about formation may be made (e.g. were this arch to form differently, it wouldn't clog).
These experiments showed that the position of the ‘cornerstone’ grain has a large effect on ejected mass, potentially increasing it by 70%. Finally, we found that this relationship stems from the cornerstone grain's ability to dictate the size of final arches, and thus the clogging likelihood. Our results suggest a two-step process for clog formation. First, the base grains dictate the available space of stable arches, whose ultimate widths do not vary dramatically (see Fig. 4b). Second, grain microstates are sampled until one forms a clog, with likelihood monotonically decreasing with arch width (see Fig. 4c). The first step (base width) is continually resampled during a flow, resulting in draws from the probability distributions in the second step (arch formation) at width-dependent rates.
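The two-step picture can be made concrete with a toy simulation. Everything numerical here is invented for illustration: two base widths ("narrow" and "wide"), their sampling weights, and the per-sample clog probabilities are hypothetical and are not fitted to the data. The sketch only shows how shifting the distribution of base widths, as a fixed cornerstone grain might, changes the mean flow length when wider arches are less likely to form.

```python
import random


def simulate_flow(width_weights, clog_prob, rng):
    """Sample base widths until a clog forms; return (steps, final_width)."""
    widths = list(width_weights)
    weights = [width_weights[w] for w in widths]
    steps = 0
    while True:
        steps += 1
        width = rng.choices(widths, weights=weights)[0]  # step 1: resample the base
        if rng.random() < clog_prob[width]:              # step 2: attempt an arch
            return steps, width


def mean_steps(width_weights, clog_prob, n_flows=5000, seed=1):
    """Average number of sampling steps per flow over n_flows simulated flows."""
    rng = random.Random(seed)
    total = sum(simulate_flow(width_weights, clog_prob, rng)[0]
                for _ in range(n_flows))
    return total / n_flows


# Hypothetical clog probabilities per sample: wider arches are less likely.
CLOG_PROB = {"narrow": 0.05, "wide": 0.01}
free_base = mean_steps({"narrow": 0.5, "wide": 0.5}, CLOG_PROB)
wide_base = mean_steps({"narrow": 0.1, "wide": 0.9}, CLOG_PROB)
```

If the base width is redrawn independently at every step, the mean number of steps is the inverse of the weight-averaged clog probability, so biasing the base toward wide arches lengthens the flow in this toy model.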
In sum, our results give causal insight into clogging, a rare, nonlinear, collective event that is influenced by poorly understood processes like frictional aging.33 This provides a heartening lesson for utilizing machine learning in scientific exploration: even when ML methods fail to make accurate predictions, their ability to find high-dimensional correlations can guide experiments on a broad range of complex phenomena across many fields.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5sm00367a
‡ J. M. H. and S. D. contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025