Open Access Article
Benjamin Baylis and John R. Dutcher*
Department of Physics, University of Guelph, Guelph, ON N1G 2W1, Canada. E-mail: dutcher@uoguelph.ca
First published on 25th February 2026
We use machine learning to analyze atomic force microscopy-force spectroscopy (AFM-FS) measurements of the mechanical properties of soft nanoparticles on a hard substrate. We compare two approaches based on the manual selection of features that describe various aspects of the mechanical properties measured using AFM-FS – one that uses an extreme gradient boosting algorithm (supervised learning), and the other based on k-means clustering (unsupervised learning) to classify the force–distance curves according to the features. We used these approaches to generate two machine learning (ML) classifiers – one to differentiate between the soft nanoparticles and the hard substrate, and the other to identify structure within individual nanoparticles based on nanoscale variations in their mechanical properties. After training the classifiers on data from just two AFM images, we found that both approaches were successful in correctly identifying individual soft nanoparticles within the AFM-FS scans, whereas the supervised approach was more successful in correctly identifying and quantifying stiffer inner regions within the nanoparticles. The results of our study show that ML strategies can be used to accurately and efficiently characterize nanoscale variations in the mechanical properties of soft, biological materials.
An important use of ML approaches in materials science is to classify measured data to gain useful insights into the physical properties of different materials. Applying a ML classifier model to materials science data involves several steps: processing the data to identify a set of features (feature engineering) that describes the range of properties of the material, using an appropriate ML model to classify materials according to these features, training the model using the selected features, and evaluating the model on new, unseen data.9 To be successful, it is necessary to use high-quality data, feature engineering to accurately and adequately describe the range of the properties of the material, and a sophisticated, robust ML model that can efficiently identify relationships between the input features from new, unseen data and accurately relate them to a target classification or cluster.
Unsupervised and supervised learning are two widely used ML approaches to classify large amounts of data. Unsupervised approaches such as clustering techniques like k-means clustering are very effective in identifying patterns and have the advantage that they do not require labelled training data. Supervised approaches require labelled training data but can provide better interpretability compared to unsupervised approaches.10 For example, supervised models that use decision trees can provide information on the gain, or contribution to the performance of the model, for each input feature.10,11
Atomic force microscopy (AFM) is used extensively in materials science to characterize samples at micrometer and nanometer length scales, and its use in measuring variations in nanoscale mechanical properties has resulted in a better understanding of the morphology of soft, biological materials.12,13 In AFM force spectroscopy (AFM-FS), data is collected in a grid-like pattern with each pixel in the pattern corresponding to a force–distance curve collected as the AFM tip is pressed into and retracted from the sample (Fig. 1A). The sharp AFM tip and high force sensitivity allow the topography and mechanical properties of the sample to be mapped at high spatial resolution. Analysis of different regions of the approach and retraction curves (Fig. 1B) yields a variety of mechanical properties of the sample for each pixel of the images, including stiffness, modulus, adhesion, and sample height at high applied force.
The large amount of topographical and mechanical information contained in AFM-FS images makes AFM-FS an excellent candidate for the application of ML models. Recently, ML techniques have been developed to analyze AFM-FS measurements on a variety of soft and biological samples, allowing the distinction between live and dead bacteria,14 the identification of different cancerous cells,15–17 the location of features on cell surfaces responsible for aggressive cancers,18 and the identification of different polymers in a polymer blend.19 Petrov et al.19 demonstrated that values of mechanical properties determined using Bruker's proprietary PeakForce Tapping™ mode can be used as inputs to supervised ML classifiers to predict the location of polymers on a surface. A more robust and flexible approach is to engineer features directly from AFM-FS force–distance curves. This provides the advantage that ML techniques can be applied to AFM-FS data acquired with any commercial AFM with a force spectroscopy mode, and it allows the user to fine-tune the choice of properties for use in expanded ML frameworks. In addition, user-controlled features combined with interpretable ML classifiers provide additional insights into material properties.
In the present work, we use AFM-FS to characterize the nanoscale mechanical properties of phytoglycogen (PG), a highly branched, glucose-based polysaccharide produced as soft, dense, compact, hydrated, hairy nanoparticles with a diameter of 42 nm and an underlying dendritic architecture.19–25 These physical characteristics of the PG particles lead to unique and useful properties, e.g., water within the particles that is much more ordered than bulk water,26–29 zero-shear viscosity that increases much more gradually with concentration than that for typical polysaccharides,24,30 and strong associations that can be achieved between the particles and small bioactive molecules.31,32 The morphology and functionality of PG nanoparticles can also be tuned through simple mechanical or chemical modifications.30,33–38 The properties of native and modified PG, together with their biodegradability and non-toxicity, make the particles desirable as unique additives in personal care formulations39,40 and as nanocarriers for bioactive molecules.41–43
We have recently used AFM-FS to study the nanoscale mechanical properties of PG nanoparticles in water immobilized on smooth hard gold substrates (Fig. 1).21 Measurements of high-resolution maps of force–distance curves of many individual PG nanoparticles revealed the high deformability of the hydrated nanoparticles, with the highly branched, dendritic chain architecture resulting in a stiffer inner structure surrounded by a softer outer structure within each particle. In the AFM-FS images, we observed stiffer inner regions, corresponding to the interaction of the AFM tip with the stiffer inner structure through the softer outer structure, and softer outer regions, corresponding to the interaction of the AFM tip with only the softer outer structure. AFM-FS measurements also allowed the determination of the average equivalent spherical radius (r̄ = 23.0 nm), in close agreement with values determined using small angle neutron scattering20,22 and dynamic light scattering,30 and the average Young's modulus (Ē = 688 kPa), which was consistent with values of the bulk modulus determined using osmotic pressure techniques.28,29 In this previous AFM-FS study, the PG nanoparticles on the underlying hard gold substrate were identified by examining the fit of each force–distance curve to a hard substrate model and manually selecting regions with a relatively low goodness-of-fit that likely corresponded to the measurement of soft PG nanoparticles on the hard substrate. The dendritic stiffer inner structure was observed in height images collected at high force values (∼2 nN) when the particles are compressed onto the hard substrate. The stiffer inner and softer outer regions within each nanoparticle were identified by fitting height images collected at high force to a mixed Gaussian model to determine an appropriate height threshold between the stiffer and softer regions within the particles. These manual methods of identifying the particles and the spatial extent of their stiffer inner regions are time-consuming and prone to uncertainties due to slight variations in the underlying substrate topography and the subjective choice of threshold values.
In the present study, we developed a ML approach to analyze large numbers of force–distance curves collected in AFM-FS measurements of PG nanoparticles on hard gold substrates, which allows the identification of the particles and the quantification of nanoscale variations in their mechanical properties. We began by manually selecting different measures of the mechanical properties (features) in different regimes within the force–distance curves using a feature engineering process. We then trained two supervised ML classifiers on a limited amount of data (just two 512 pixel × 512 pixel AFM images) using the selected features: the first ML classifier was trained to distinguish between the soft PG nanoparticles and the surrounding hard substrate, and the second ML classifier was trained to locate the spatial extent of the stiffer inner regions within the PG nanoparticles. We then tested the two supervised ML classifiers on new, unseen data to evaluate their accuracy by qualitatively comparing height images from the AFM-FS measurements to the classifier outputs and by quantitatively comparing the values of the particle radius and the Young's modulus to average values determined previously using the manual approach described in ref. 21. We also compared the performance of the two supervised ML classifiers with that of an unsupervised learning approach based on k-means clustering using the same features and training data. The unsupervised approach removes the need for labelled training data that is required in the supervised approach. This comparison showed that both approaches were successful in identifying the location of the soft PG nanoparticles on the hard substrate, whereas the supervised learning approach was more successful in determining small nanoscale variations in the mechanical properties within the PG nanoparticles.
The substrates for the AFM-FS experiments were prepared using the following procedure. Glass slides were cleaned by first submerging them in a sulfuric acid (H2SO4) bath for 15 min, rinsing thoroughly with Milli-Q water, then submerging them in a heated 1:1 solution of chloroform (CHCl3) and methanol (CH3OH) for an additional 15 min. Slides were removed and thoroughly rinsed with Milli-Q water and methanol before being dried with a stream of nitrogen gas. A titanium adhesion layer (∼5 nm thick) was sputter deposited onto the clean glass slides, followed by the sputter deposition of a layer of gold (∼200 nm thick) to create stable gold-coated substrates. Small sections (∼3 mm × 3 mm) were cut using a diamond tip pen and smooth terraces of gold (lateral extent ∼350 nm) were created by annealing the small sections of gold-coated glass slides at 675 °C for 65 s.44 The annealed substrates were then submerged in a 0.4% w/w solution of 4-mercaptophenylboronic acid (4-MPBA) in methanol for 24 h to allow the self-assembly of a monolayer of 4-MPBA on the gold surface via the formation of gold-thiol bonds between the gold and 4-MPBA. To measure isolated PG nanoparticles, a drop of a dilute (0.005% w/w) dispersion of PG in Milli-Q water was placed on the surface of the 4-MPBA/gold substrate for 12 min and then rinsed with water. This allowed the covalent bonding of an optimal number density of isolated PG nanoparticles to the 4-MPBA molecules (Fig. 1C). Additional details on the preparation of the samples can be found in ref. 21.
To create a robust ML classifier, it is necessary to construct a feature vector with values that represent the different distinctive mechanical properties of a material. For AFM-FS measurements, materials may have similar properties in one or more regimes of the force–distance curves but different properties in others, e.g., similar surface characteristics resulting in similar adhesion values, but different stiffness or modulus values due to different structure. The feature vector should account for these differences, and this is especially important when classifying different regions within a single object such as identifying the location of the stiffer inner regions within the PG nanoparticles investigated in this work.
To select features that describe the distinctive mechanical properties of the PG particles, we considered different portions of the approach and retraction curves. For the approach curves, we considered features that characterized three regimes: near the contact point, near the maximum applied force, and within the range from low to high forces. Features calculated near the contact point describe the material properties near the sample surface such as the height deformation at low values of the applied force and the modulus corresponding to small indentations. Features calculated near the maximum applied force describe the material properties under compressive stress such as the height deformation at high values of the applied force and the stiffness (slope of the force–distance curve near the maximum applied force). Features calculated within the range from low to high forces are influenced by the material properties near its surface in combination with properties that describe the material under compressive stress, such as height deformation and the modulus calculated within this force range. For the retraction curves, we considered features that characterized three regimes: near the maximum applied force, near the surface of the sample, and within the range from low to high forces. Features calculated near the maximum applied force include the recovery of the sample height after reducing the applied force at high forces. Features calculated near the surface of the material include the recovery of the sample height at low forces. Features calculated within the range from low to high forces include the recovery of the sample height after reducing the applied force within this force range. In addition, we characterized the adhesion between the AFM tip and the sample by calculating the adhesion force, which is the magnitude of the maximum negative force in the retraction curve, and the adhesion energy, which is the area of the retraction curve below zero force. 
These adhesion-related quantities are significantly larger for measurements on the PG particles than for those on the hard substrate. We note that the adhesion between the AFM tip and the sample is influenced by the surface chemistry of the tip. Because of this, comparisons between measurements should involve the same type of cantilever tip, as we have done in the present study. Also, all measurements in the present study were performed in water to eliminate the influence of environmental conditions on the tip-sample adhesion that are present when performing measurements in air. The different height deformation and recovery values described above were obtained by examining the difference in tip height at six different applied forces for both the approach and retraction curves: 2% (0.04 nN), 5% (0.1 nN), 25% (0.5 nN), 50% (1 nN), 75% (1.5 nN), and 100% (2 nN) of the maximum applied force.
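The adhesion and height-based quantities described above can be illustrated with a minimal sketch. It assumes each curve is available as two NumPy arrays (tip height in nm and force in nN), that the retraction samples are ordered by increasing tip height, and that the approach force varies monotonically with tip height; the function names and the data layout are hypothetical and are not taken from the analysis code used in this work.

```python
import numpy as np

# Fractions of the maximum applied force listed in the text (2% to 100%).
FORCE_FRACTIONS = (0.02, 0.05, 0.25, 0.50, 0.75, 1.00)


def _trapezoid_area(y, x):
    """Trapezoidal-rule integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def adhesion_features(tip_height_nm, force_nN):
    """Return (adhesion force in nN, adhesion energy in nN*nm) for a retraction curve.

    Assumes the samples are ordered by increasing tip height (hypothetical layout).
    """
    z = np.asarray(tip_height_nm, dtype=float)
    f = np.asarray(force_nN, dtype=float)
    # Adhesion force: magnitude of the most negative force in the curve.
    adhesion_force = max(0.0, -float(f.min()))
    # Adhesion energy: approximate area of the curve below zero force.
    negative_part = np.minimum(f, 0.0)
    adhesion_energy = -_trapezoid_area(negative_part, z)
    return adhesion_force, adhesion_energy


def heights_at_setpoints(tip_height_nm, force_nN, max_force_nN=2.0):
    """Interpolate the tip height at each force setpoint of one curve.

    Assumes the force varies monotonically with tip height (idealized curve).
    """
    z = np.asarray(tip_height_nm, dtype=float)
    f = np.asarray(force_nN, dtype=float)
    order = np.argsort(f)  # np.interp needs increasing x values
    targets = max_force_nN * np.asarray(FORCE_FRACTIONS)
    return np.interp(targets, f[order], z[order])
```

Differences between pairs of the returned heights, for example between the 2% and 100% setpoints, then give height deformation values for an approach curve or height recovery values for a retraction curve.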
For force–distance curves collected on PG nanoparticles, we note that features calculated at higher forces can be influenced by the stiffer inner structure of the particles as well as the underlying hard substrate, whereas features calculated at lower forces are only influenced by the softer outer structure of the particles. For force–distance curves collected on the periphery of the PG nanoparticles, corresponding to their softer outer region, the influence of the underlying hard substrate dominates the force on the AFM tip at lower forces than for force–distance curves collected within the PG nanoparticles due to their stiffer inner structure. This is because it requires more force to compress the particle when pressing on the stiffer inner region. For force–distance curves collected on the hard substrate, the height deformation values are much smaller than those corresponding to force–distance curves collected on the PG particles, especially at low forces.
We constructed a feature vector for each pixel in the AFM scan that described the mechanical properties of the sample at that pixel location using a set of 37 features (Table S1). We chose the engineered features manually, instead of using AI-generated features (by either generative or non-generative means), since the manually selected features had well-defined physical interpretations that allow more direct comparisons with results obtained using other techniques. In addition, the manual selection of features reduced the amount of data used as input to the machine learning classifiers, which correspondingly reduced the computational time and cost.
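As an illustration of the per-pixel feature vector, the sketch below flattens a stack of feature maps into the two-dimensional matrix that most ML libraries expect, so that a 512 × 512 scan with 37 features becomes a matrix with 262 144 rows and 37 columns. The array layout (rows, columns, features) and the function names are assumptions made for this example.

```python
import numpy as np


def to_feature_matrix(feature_maps):
    """Flatten feature maps of shape (rows, cols, n_features) to (rows*cols, n_features)."""
    feature_maps = np.asarray(feature_maps, dtype=float)
    rows, cols, n_features = feature_maps.shape
    # Row-major flattening: pixel (i, j) becomes row i * cols + j.
    return feature_maps.reshape(rows * cols, n_features)


def to_label_image(labels, image_shape):
    """Reshape one label per pixel back into an image for display as a classification map."""
    return np.asarray(labels).reshape(image_shape)
```

Because the same row-major ordering is used in both directions, classifier outputs can be mapped back onto the scan and compared directly with the height images.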
We also used two unsupervised ML classifiers to distinguish the PG particles from the surrounding hard substrate (unsupervised particle/substrate ML classifier) as well as to distinguish the stiffer inner region from the softer outer region within each particle (unsupervised inner particle structure ML classifier). In these cases, we classified individual force–distance curves using k-means clustering49 as implemented in the Python library scikit-learn (version 1.0.2).
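To make the clustering step concrete, the following sketch implements a two-cluster k-means (Lloyd's algorithm) in plain NumPy. It is a stand-in for illustration only and is not the scikit-learn implementation used in this work; the feature standardization and the deterministic farthest-point initialization are assumptions chosen to keep the example short and reproducible.

```python
import numpy as np


def two_cluster_kmeans(features, n_iter=100):
    """Assign each row of `features` to one of two clusters (labels 0 or 1)."""
    X = np.asarray(features, dtype=float)
    # Standardize each feature so that no single one dominates the distance (assumption).
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # Deterministic farthest-point initialization (assumption; simpler than k-means++).
    first = Z[np.argmax(np.linalg.norm(Z, axis=1))]
    second = Z[np.argmax(np.linalg.norm(Z - first, axis=1))]
    centers = np.stack([first, second])
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dist = np.stack([np.linalg.norm(Z - c, axis=1) for c in centers], axis=1)
        labels = dist.argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = centers.copy()
        for k in range(2):
            members = Z[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The cluster indices carry no physical meaning on their own, so a rule such as "the cluster with the larger mean deformation is the particle" is still needed to name the two classes.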
To train and test the particle/substrate ML classifiers, we used two AFM-FS images (the training/testing dataset): one that contained ∼60 PG nanoparticles and one that contained only the 4-MPBA/gold substrate. For the particle/substrate ML classifiers, we used 80% of the data contained in the two images to train the ML classifier (training dataset) and the remaining 20% of the data to test the ML classifier (testing dataset). To train and test the inner particle structure ML classifiers, we used one AFM-FS image that contained the same PG nanoparticles used in the particle/substrate ML classifier from the training/testing dataset, using only pixels corresponding to PG nanoparticles in this image. For the inner particle structure ML classifiers, we used 90% of the data contained in the determined PG nanoparticle pixels to train the ML classifier (training dataset) and the remaining 10% of the data to test the ML classifier (testing dataset). The ML classifiers were then evaluated on new, unseen data from three different AFM-FS images of PG nanoparticles on the 4-MPBA/gold substrate.
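The 80%/20% and 90%/10% partitions described above can be sketched as a random split of curve indices. The shuffling, the seed, and the choice to round the test set up to a whole number of curves are assumptions; with two 512 × 512 images (524 288 curves) this rounding gives 419 430 training curves.

```python
import math

import numpy as np


def random_split(n_samples, test_fraction, seed=0):
    """Return (train_indices, test_indices) for a random split of n_samples items."""
    # Round the test set up to a whole number of samples (assumption).
    n_test = math.ceil(test_fraction * n_samples)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return order[n_test:], order[:n_test]
```

The returned index arrays can be used to select rows of the feature matrix and the corresponding labels, which keeps the training and testing curves disjoint.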
144 input vectors from measurements of ∼60 PG particles on a hard substrate, as well as 262 144 input vectors from measurements of a bare hard substrate. We used 80% (419 430 force–distance curves) of the labelled curves to train the classifier, which resulted in 37 155 force–distance curves labelled as corresponding to PG nanoparticles and 382 275 force–distance curves labelled as corresponding to the hard substrate.
We trained the unsupervised particle/substrate ML classifier using the same selected input features (Table S1) and the same training dataset (Fig. 3) used to train the supervised ML classifier with a set number of 2 clusters.
462 input vectors from measurements of ∼60 PG nanoparticles on a hard substrate. We used 90% (41 815 force–distance curves) of the labelled curves to train the classifier, which resulted in 16 009 force–distance curves corresponding to the stiffer inner region, and 25 806 force–distance curves corresponding to the softer outer region of the particles.
We trained the unsupervised inner particle structure ML classifier using only the force–distance curves that corresponded to the PG nanoparticles as determined by the unsupervised particle/substrate ML classifier. Training was performed using the same selected input features (Table S1) used for the supervised ML classifier with a set number of 2 clusters. We used 90% (39 236 force–distance curves) of the curves to train the classifier.
858 force–distance curves used to test the supervised particle/substrate ML classifier (testing dataset), we obtained an overall pixel classification accuracy of 99.5% with a false positive percentage of 0.3% and a false negative percentage of 3.1% when classifying the force–distance curves labelled as corresponding to PG (Fig. S2A and B). The features that contributed the largest improvement in the accuracy of the supervised particle/substrate ML classifier were obtained from the low force regime of the approach portion of the force–distance curves and describe the deformations at low applied forces (deformations measured for forces between 0.04 nN and 1 nN). The top three most important features describe this low force regime and accounted for 83.7% of the improvement in the accuracy of the supervised particle/substrate ML classifier (Fig. S2C). The large improvement in the accuracy of classifying the pixels resulting from features describing the low force regime is reasonable given the large difference between the mechanical properties of the soft PG nanoparticles and the hard substrates.
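The accuracy figures quoted above can be computed from predicted and reference labels as in the following sketch. The normalization is an assumption: the false positive percentage is taken relative to the curves labelled as substrate and the false negative percentage relative to the curves labelled as PG, which may differ from the definitions used for Fig. S2.

```python
import numpy as np


def binary_classification_summary(y_true, y_pred):
    """Accuracy, false positive and false negative percentages (PG is the positive class)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # PG curves classified as PG
    tn = np.sum(~y_true & ~y_pred)  # substrate curves classified as substrate
    fp = np.sum(~y_true & y_pred)   # substrate curves classified as PG
    fn = np.sum(y_true & ~y_pred)   # PG curves classified as substrate
    return {
        "accuracy_pct": 100.0 * (tp + tn) / y_true.size,
        "false_positive_pct": 100.0 * fp / (fp + tn),
        "false_negative_pct": 100.0 * fn / (fn + tp),
    }
```

Because substrate pixels greatly outnumber particle pixels, the overall accuracy is dominated by the substrate class, which is why the two error percentages are worth reporting separately.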
To qualitatively assess the performance of the supervised particle/substrate ML classifier, we compared the contact height image of the entire scan (Fig. 5A) to the PG nanoparticle regions, as determined using the supervised particle/substrate ML classifier (Fig. 5D). From this comparison, we can see that the supervised particle/substrate ML classifier is able to accurately determine the locations of the PG nanoparticles on the substrate: PG nanoparticle regions in Fig. 5D correspond to an average probability of 98% in Fig. 5B. Fig. 5 also highlights a region in the contact height image that contains circular objects that closely resemble the PG nanoparticles (indicated by the black arrow in Fig. 5A). By comparing the contact height image of the input data in Fig. 5A to the particle regions determined by the supervised particle/substrate ML classifier output in Fig. 5D, we see that these circular objects, which could be mistaken for PG nanoparticles, were correctly classified by the supervised particle/substrate ML classifier as corresponding to the hard substrate. For completeness, the contact height image and the output from the supervised particle/substrate ML classifier for the additional new, unseen data are shown in Fig. S4A–D and S5A–D.
To quantitatively assess the performance of the supervised particle/substrate ML classifier, we compared the median of the Young's modulus values E of the pixels corresponding to the PG nanoparticles in the new, unseen data to the mean value Ē of the median Young's modulus values of isolated PG nanoparticles determined manually in previous work.21 The values of E were determined using the fitting procedure outlined in ref. 21, in which a modified Hertz model was used to account for the close proximity of the underlying hard substrate.50 A histogram of the values of E for all pixels corresponding to the PG nanoparticles (N = 112 897) as determined by the supervised particle/substrate ML classifier is shown in Fig. S7B. The median value of this distribution is 693 kPa with a standard error of the median of 2 kPa, which is in excellent agreement with the mean value Ē of the Young's modulus values of isolated PG particles reported previously (Ē = 688 kPa with a standard error of 23 kPa).21 We also compared the average equivalent spherical radius, r̄, of individual isolated PG nanoparticles in the new, unseen data to the average value r̄ of hydrated PG nanoparticles reported previously.21 The output of the supervised particle/substrate ML classifier consisted of 79 isolated PG nanoparticles (not on the boundary of the scan and not touching other particles) (Fig. S6). We calculated the equivalent spherical radius r of each of these particles using the method outlined in ref. 21, in which the volume V, corresponding to the sum of the volume of each pixel within the particle region, was calculated using the contact height image and the height of the underlying substrate. We then calculated the value of r using the relationship r = (3V/4π)^(1/3). A histogram of the resulting values of r is shown in Fig. S7A, for which the mean radius value r̄ is 20.2 nm with a standard error of 0.7 nm. This value of r̄ is in good agreement with the r̄ value of 23.0 nm (with a standard error of 0.5 nm) determined previously using manual methods21 as well as the nanoparticle radius determined using small angle neutron scattering20,22 and dynamic light scattering.30 The excellent agreement between the values of the average Young's modulus E and the equivalent spherical radius r of isolated PG nanoparticles found in the present study and those determined previously, including those using manual methods, provides confidence that the supervised particle/substrate ML classifier correctly identifies the location and lateral extent of the PG nanoparticles on the substrate.
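The equivalent spherical radius calculation can be sketched as follows: the particle volume V is the sum over the particle pixels of the height above the substrate multiplied by the pixel area, and the radius of the sphere with the same volume is r = (3V/4π)^(1/3). The function name, the boolean particle mask, and the `pixel_size_nm` parameter are hypothetical, and a single substrate height per particle is assumed for simplicity.

```python
import numpy as np


def equivalent_spherical_radius(height_nm, particle_mask, substrate_height_nm, pixel_size_nm):
    """Radius (nm) of the sphere with the same volume as one particle.

    height_nm: 2D contact height image; particle_mask: boolean image of the particle pixels.
    """
    height_nm = np.asarray(height_nm, dtype=float)
    particle_mask = np.asarray(particle_mask, dtype=bool)
    # Height of each particle pixel above the underlying substrate.
    heights = height_nm[particle_mask] - substrate_height_nm
    # Volume: sum of (height x pixel area) over all pixels in the particle region.
    volume_nm3 = float(np.sum(heights)) * pixel_size_nm ** 2
    return (3.0 * volume_nm3 / (4.0 * np.pi)) ** (1.0 / 3.0)
```

Applying the function to each isolated particle region and averaging the results would give a mean radius comparable to the value reported in the text.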
858 force–distance curves) for testing both classifiers and compared the output from the unsupervised ML classifier to the labels used to test the supervised ML classifier. For the unsupervised particle/substrate ML classifier, we obtained an overall pixel classification accuracy of 99.0% with a false positive percentage of 0.4% and a false negative percentage of 7.2% when classifying the force–distance curves labelled as corresponding to PG (Fig. S8A and B). A visual comparison between the output for the supervised and unsupervised particle/substrate ML classifiers on the training/testing dataset is provided in Fig. S9. The large difference in the mechanical properties between the soft PG nanoparticles and the hard substrate results in a high classification accuracy of the unsupervised ML classifier with results that were very similar to those obtained for the supervised particle/substrate ML classifier.
To qualitatively assess the performance of the unsupervised particle/substrate ML classifier, we compared the contact height images of the three AFM scans of new, unseen data to the PG nanoparticle regions, as determined using the trained unsupervised particle/substrate ML classifier (Fig. S10). From this comparison, we can see that the trained unsupervised particle/substrate ML classifier is able to accurately determine the locations of the PG nanoparticles on the substrate.
To quantitatively assess the performance of the unsupervised particle/substrate ML classifier and to compare its performance to the supervised particle/substrate ML classifier, we examined the number of matching pixel classifications between the outputs of the trained unsupervised (Fig. S10D, E and F) and supervised (Fig. 5C and Fig. S4C, S5C) particle/substrate ML classifiers. We found that 98.5% (774 359 force–distance curves) of all pixels in the three AFM scans of new, unseen data (786 432 force–distance curves) had the same particle/substrate classification in the outputs of the two ML classifiers. We note that we achieved similar results when applying the unsupervised classifier to the new, unseen data without training on the training/testing dataset.
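The pixel-by-pixel comparison between two classification maps reduces to counting matching labels, as in this minimal sketch. It assumes both maps use the same label convention (for example, 1 for particle and 0 for substrate); the function name is hypothetical.

```python
import numpy as np


def agreement_percentage(map_a, map_b):
    """Percentage of pixels with the same class label in two classification maps."""
    a = np.asarray(map_a)
    b = np.asarray(map_b)
    if a.shape != b.shape:
        raise ValueError("classification maps must have the same shape")
    return 100.0 * float(np.mean(a == b))
```

Since k-means cluster indices are arbitrary, the cluster labels need to be mapped onto the particle and substrate classes before such a comparison. With the numbers given in the text, 774 359 matching pixels out of 786 432 correspond to 98.5%.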
The similarity in the outputs of the unsupervised and supervised particle/substrate ML classifiers indicates that materials with large differences in mechanical properties can easily be distinguished using the methods outlined in this work and may not require supervised training using labelled data.
To qualitatively assess the performance of the supervised inner particle structure ML classifier, we compared the peak force height images (for which the PG nanoparticles are compressed onto the substrate revealing the stiffer inner structure), such as that shown in Fig. 6B, to the corresponding images of the stiffer inner regions determined by the supervised inner particle structure ML classifier in the classification map (Fig. 6D). From this comparison, we observed that the supervised inner particle structure ML classifier can accurately determine the location of the stiffer inner regions in PG (the white inner regions in Fig. 6D). For completeness, the peak force height image and the output from the supervised inner particle structure ML classifier for the additional new, unseen data are shown in Fig. S4E–G and S5E–G.
To quantitatively assess the performance of the supervised inner particle structure ML classifier, we compared the median of the Young's modulus values E of the pixels corresponding to the stiffer inner and softer outer regions in the new, unseen data to the mean value Ē of the median Young's modulus values of the stiffer inner and softer outer regions determined using a manual approach in previous work.21 A histogram of the Young's modulus E values corresponding to the stiffer inner (orange) and softer outer (blue) regions of the PG nanoparticles as determined by the supervised inner particle structure ML classifier is shown in Fig. S7C. The median E value for the stiffer inner region (N = 47 797) was 1006 kPa with a standard error of the median of 3 kPa and the median E value for the softer outer region (N = 65 100) was half that of the stiffer inner region, with a value of 503 kPa with a standard error of the median of 2 kPa. These absolute and relative values of the moduli are in excellent agreement with the average median E values of isolated PG nanoparticles determined previously for the stiffer inner region (Ē = 1053 kPa with a standard error of 33 kPa) and softer outer region (Ē = 518 kPa with a standard error of 20 kPa). This excellent agreement between the average Young's modulus E values for the stiffer inner and softer outer regions determined using the supervised inner particle structure ML classifier and those determined previously using a manual approach provides confidence that the supervised inner particle structure ML classifier correctly identifies the location and lateral extent of the stiffer inner regions within the PG nanoparticles.
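One common way to estimate a standard error of a median is bootstrap resampling, sketched below. The text does not state how the standard errors of the median were obtained, so the method, the number of resamples, and the function name are all assumptions made for illustration.

```python
import numpy as np


def median_with_bootstrap_se(values, n_boot=500, seed=0):
    """Return (median, bootstrap standard error of the median) of a 1D sample."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    boot_medians = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the data with replacement and record the median of each resample.
        resample = rng.choice(values, size=values.size, replace=True)
        boot_medians[i] = np.median(resample)
    return float(np.median(values)), float(boot_medians.std(ddof=1))
```

Applied separately to the modulus values of the inner and outer regions, such a function would return one median and one uncertainty per region, which is the form in which the values above are quoted.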
In addition to the high level of accuracy and the elimination of user bias, the supervised ML-based analysis also results in a significant reduction in the analysis time. After training the two supervised classifiers on a relatively small, labelled database (just two 512 pixel × 512 pixel AFM images), the time required to identify the particle locations and the stiffer inner and softer outer regions within the particles in new, unseen data was reduced from hours to tens of seconds.
To qualitatively assess the performance of the unsupervised inner particle structure ML classifier, we compared the peak force height images (for which the PG nanoparticles are compressed onto the substrate revealing the stiffer inner structure) for the three new, unseen AFM scans (Fig. 6B and Fig. S4E and S5E) to the corresponding images of the stiffer inner regions determined by the trained unsupervised inner particle structure ML classifier in the classification maps (Fig. S10J, K and L). From this comparison, we observed that the unsupervised ML classifier inconsistently locates the stiffer inner regions within the PG nanoparticles and is less accurate than the corresponding supervised ML classifier. This is most likely due to the relatively small difference between the modulus values of the stiffer inner and softer outer regions, which results in rather indistinct boundaries between the two regions.
This journal is © The Royal Society of Chemistry 2026