Open Access Article
Kieran
Nehil-Puleo
a and
Zhongyue John
Yang
*bcdef
aInterdisciplinary Material Science Program, Vanderbilt University, 2301 Vanderbilt Pl, Nashville, TN 37235, USA
bDepartment of Chemistry, Vanderbilt University, Nashville, TN 37235-1826, USA
cDepartment of Chemical and Biomolecular Engineering, Vanderbilt University, 2301 Vanderbilt Place PMB 351826, Nashville, TN 37235-1826, USA
dCenter for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
eVanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN 37235, USA
fData Science Institute, Vanderbilt University, Nashville, TN 37235, USA
First published on 2nd February 2026
The mechanical properties of amyloid-based materials are governed by fibril geometry, sequence, and polymorphism, yet systematic exploration of this vast design space has been limited by the lack of high-throughput modeling tools. Here we present FiberForge, an open-source workflow that automates construction of amyloid protofibrils, streamlines high-throughput simulations of amyloid deformation, and analyzes fibril trajectories to estimate mechanical properties and fracture mechanisms. Using 374 full-length amyloid crystal structures from the Protein Data Bank, FiberForge rebuilds fibrils with a mean per-chain RMSD of 1.7 Å (median 2.2 Å), demonstrating accurate structural recovery across wide sequence (18–420 aa) and symmetry ranges. Extensive SMD benchmarking on Aβ(1–42) (2MXU) yields a mean rupture force of 1.534 ± 0.164 nN from 232 replicas; bootstraping analysis shows that three replicas suffice for converged elastic-modulus and strength estimates. High-throughput screening of the amyloid fiber dataset produces elastic moduli of 0.2–20 GPa and ultimate tensile strengths of 0.1–1 GPa. Comparison with four AFM-characterized systems shows agreement within an order of magnitude, underscoring the method's predictive capability. FiberForge's screening results also enable larger-scale sequence–structure–property analysis, revealing that mechanical behavior is correlated with molecular assembly geometry, especially hydrogen-bond density. While earlier work suggested the relevance of these features for particular systems, our results demonstrate their importance across diverse fibril architectures. FiberForge thus provides an end-to-end platform for molecular modeling and design of amyloid materials, enabling physics-based identification of sequences and polymorphs with targeted mechanical behavior.
Despite the successful application of amyloids, the design of amyloid materials still faces several key challenges: the selection of the optimal fiber from the vast space of possible amyloid-forming proteins, prediction of the critical fiber length for the change of the fracture mechanism, differing properties exhibited by structural polymorphs, and lack of a curated dataset of amyloid structures relevant for mechanics investigation. On the issue of sequence selection, prior studies have shown that many proteins, even those not conventionally associated with amyloids, are capable of converting to amyloid fibers at high concentration under destabilizing conditions.9 The richness of the sequence space of possible amyloids necessitates a high-throughput (HTP), automated fiber construction, and simulation framework. In terms of structural polymorphisms, there can be a huge difference in properties between aggregate polymorphic structures for the same protein sequence.10 The sensitivity of mechanical properties to the underlying polymorphic structure highlights the importance of systematically sampling and characterizing different structural polymorphs. On the issue of fracture mechanism, insights from SMD modeling have revealed evidence of length-scale-dependent fracture mechanisms (stick-slip) that arise in both amyloids11 and silk crystals.12 As suggested by these studies, length-scale dependent mechanisms may be attributed to the fact that the fracture mechanism of a β-sheet-rich fibril is governed by the cooperative rupture of hydrogen bonds that stabilize the fibril structure. To investigate further cooperative rupture mechanisms arising in these materials, accurate construction of variable length amyloid structures is needed. Finally, we need a database that captures amyloid structural variations. Several amyloid databases have been curated, including AmyPro, AmyloGraph, and StAmP-DB.13–16 Although these databases are extremely helpful for the community interested in the pathological aspect of amyloids, they do not capture sufficient structural differences that are relevant for the design of amyloids for their mechanical applications.
One methodology to navigate the multi-factor design landscape of amyloids is high-throughput computational physics-guided protein design.17 High-throughput computational design of amyloid materials requires a framework for the autonomous building of fibril structures, the initialization of simulation systems, and the analysis of simulation results. Unfortunately, prior software focused on amyloid properties covers a broad range of applications not specifically focused on this challenge; these software include experimental image analysis of fibril microscopy,18de novo aggregate structure prediction,19,20 β-serpentine fibril structure prediction,21 and numerous examples of amylogenic region prediction.22–25
In this work, we seek to address the issues mentioned above by creating a software, FiberForge, that automates the construction of amyloid protofibrils, streamlines the deployment of high-throughput simulations of amyloid deformation, and automatically analyzes the trajectory of amyloid fibril deformation to estimate mechanical properties and fracture mechanisms. FibrilForge enables a deeper understanding of the structure–function relationships that dictate the mechanical properties of amyloid-based materials, providing critical insights for fundamental materials science. Moreover, it facilitates the in silico design and screening of amyloid mutants with tailored mechanical behaviors, accelerating the development of bioinspired materials for applications in nanotechnology, biomedical devices, and responsive materials.
After designing our foundational data-structure for the description of amyloids, we built accompanying modules and functions to utilize this structure to perform mechanical property modeling tasks. Specifically, we designed our software suite to have 3 modules: Build, Simulate, and Analyze (See Fig. 1 for depiction of FiberForge). The Build module consists of functions that enable the geometrical characterization of structures containing helical symmetry, the construction of assemblies based on their helical geometrical parameters and sequence mutation information, and finally the construction of a fully solvated amyloid fiber for simulation (see sample code in SI, Fig. S3). In our software, we are able to construct amyloids of different lengths by first learning the symmetry functions from multi-chain fibrils and then applying these symmetries to obtain the desired length fibril. The Simulate module consists of a tensile testing function and a bending testing function. These mechanics testing functions create an SMD protocol to conduct the application of an external force which depends on the testing conditions specified (see sample code in SI, Fig. S4). The Analyze module reads the output trajectories produced from the Simulate module and calculates mechanical properties of interest, such as elastic/bending modulus (see sample code in SI, Fig. S5). Together, these modules enable the construction of end-to-end workflows for fracture mechanics simulations of diverse amyloid structures.
Next, we quantified how simulation settings affect the mechanical constants extracted from SMD pulling (Fig. 4). Raising the spring constant from 5 × 102 → 1 × 104 nm ps−2 and the pulling rate from 0.01 → 1 nm ps−1 drives a clear, monotonic stiffening: the ultimate tensile strength climbs from around 5.3 × 109 Pa to 1.1 × 1010 Pa and the elastic modulus from around 3.0 × 109 Pa to 9.5 × 109 Pa, matching the behavior suggested by prior theory of SMD.27 In contrast, box size and number of strands can also perturb the calculated elastic modulus, but no significant change was observed on the scale of the estimated elastic modulus.
Lastly, we compared our estimated mechanical properties to those obtained from experimental methods (see Fig. 5 for results across the entire curated dataset). Our simulated values fall within the expected range for amyloid fibrils (roughly 109 to 1011 Pascals for elastic modulus and 108 to 109 Pascals for tensile strength).5 Furthermore, we compared our estimated properties with a single experimental study (see Table 1). Although our results showed reasonable agreement with experimental values, for example, for the insulin amyloid with PDB ID 8SBD, we obtained 4.7 GPa for the simulated elastic modulus compared to the reported experimental value of 3.2 GPa, our simulated estimates were consistently higher. This discrepancy is expected: theoretical predictions and atomistic simulations typically produce substantially larger elastic moduli than experiments because the deformation rates used in SMD are several orders of magnitude higher than those employed in AFM and related experimental techniques.28 Additionally, experimental measurements exhibit substantial variability across studies due to differences in fibril preparation, mechanical testing protocols, deformation rates, and solution conditions. For these reasons, and to mitigate the variability inherent in experimental datasets, comparison with experiment was limited to a single group. To test the generality of this approach, future work will focus on expanding comparisons across multiple experimental systems, despite the challenge posed by the relatively small number (on the order of 10) of available single-fibril mechanical measurements and their substantial inter-study heterogeneity.
| Common name | PDB entry | E exp ± SE | E sim ± SE |
|---|---|---|---|
| Insulin | 8SBD | 3.2 ± 0.6 | 4.7 ± 0.2 |
| τ-protein | 7YPG | 3.4 ± 0.7 | 11.3 ± 3 |
| β-lactoglobulin | 6GK3 | 3.7 ± 0.8 | 8.6 ± 1 |
| Aβ(1–42) | 2MXU | 3.2 ± 0.8 | 4.8 ± 0.9 |
In addition to the benchmarks, we should note the intrinsic stochasticity underlying the rupture force measurement from single-molecule AFM experiments. Early physical theories were developed to describe rupture force distributions in single hydrogen bonded systems29 (Bell–Evans model), these theories were later extended to protein unfolding, i.e. multi-hydrogen bonded system, then eventually the connection between AFM and SMD techniques.27,28 These theories were applied to describe amyloid fibril rupture.11 Essentially, amyloids are held together by a network of hydrogen bonds occurring between proteins which result in induced force being distributed across a network of bonds. The rupture of this network is a non-equilibrium, stochastic process; as such, the outcome of SMD mechanical property estimates inherently depends on many random factors, making statistical analysis essential.
000 atoms, equilibrate it, and ran a 10 ns constant-velocity pull at 0.1 nm ps−1, with post-processing handled automatically; the full set finished in 36 h on a single GPU node. The resulting Ashby plot (Fig. 5) contains 190 simulated data points: elastic moduli range from 7.0 × 107 to 1.1 × 1010 Pa (median 3.2 × 109 Pa) and ultimate tensile strengths from 9.0 × 107 to 1.0 × 1010 Pa (median 7.8 × 108 Pa). These values align with experimentally reported amyloid properties (the Young's modulus: 108–1010 Pa; tensile strength: 108–109 Pa) and sit well above those of typical structural proteins such as collagen or keratin. Wider multiprotofilament fibrils occupy the upper-right of the plot, while slimmer two-strand assemblies fall lower, reflecting the expected dependence on cross-section.5,30,31 The match to experiment and the ability to complete dozens of simulations overnight support both the reliability and scalability of our software for large-scale mapping of sequence–structure–property relationships in amyloids.
Notably, HBD showed a strong positive correlation with both elastic modulus and UTS, reinforcing earlier theories that interchain HBD is the primary factor affecting deformation resistance. Though earlier studies demonstrated this behavior in specific amyloid systems,5,11,12 our results imply that it is a general design principle of planar amyloid systems. Helical twist and rise generally showed weak correlations with mechanical properties, suggesting that local geometric offsets alone do not strongly dictate mechanical response. These two properties showed moderately strong negative correlation with each other, suggesting that a greater degree of twist in an amyloid reduces the spacing between between chains. Sequence mean hydrophobicity displayed modest (0.26) but statistically significant correlations with elastic modulus, indicating that hydrophobic packing may stabilize inter-chain interfaces and thereby alter load-bearing capacity.
Having established key correlations among the structural and sequence descriptors, we next asked whether these features could predict mechanical behavior. To evaluate their predictive power, we performed an automated screening of machine-learning algorithms using AutoML,32 testing dozens of regression models across multiple random train–test splits. The random forest regressor consistently achieves good performance and demonstrates strong robustness to dataset resampling. Based on these results, we selected the random forest as the base model for all subsequent analyses, and assessed how different feature groups contribute to predictive accuracy for elastic modulus and UTS (see SI for the parity plots of cross-validation and hold-out test, Fig. S7).
For elastic modulus, feature sets composed solely of helical parameters (magnitude of translation and rotation), sequence–distance metrics (HBD, cross-sectional area, and H-bond count), hydrophobicity metrics (fraction of hydrophobic residues, mean residue hydrophobicity, standard deviation of residue hydrophobicity), or evolutionary-sequence metrics yield near zero or even negative R2, indicating that no single class of descriptors is sufficient to predict elasticity variations. Notably, inclusion of evolutionary sequence embeddings33 alone decreases predictive accuracy relative to the geometric features, suggesting that sequence evolutionary information does not meaningfully contribute to the elastic behavior of planar amyloids. The strongest predictive performance is achieved when geometric descriptors were combined with hydrophobicity features (R2 = 0.377 on the holdout set), highlighting the interplay between amyloid structure and amino acid properties on elasticity. For UTS, hydrophobicity or helical descriptors alone produced weak predictive performance, and ESM embeddings again performs poorly in isolation. The highest R2 values are achieved using the geometric, hydrophobicity, and ESM feature set (R2 = 0.516 on the holdout set). These results emphasize that amyloid UTS is influenced by the geometry of interchain organization, particularly features such as HBD, and evolutionary sequence information. However, the gap in R2 values between cross-validation and the hold-out test suggests that the model may be overfitting due to the limited size of the training dataset. We anticipate that this issue will diminish as additional molecular simulation data become available.
Overall, both properties are driven predominantly by geometric organization—most notably HBD, which emerges as the single most informative descriptor for elastic modulus and a significant predictor for UTS. Elastic modulus depends largely on HBD and hydrophobicity with only minor gains from any additional features and little to no improvement from evolutionary embeddings. By contrast, UTS also relied heavily on HBD but benefited substantially from the inclusion of evolutionary sequence information, indicating that tensile strength is more sensitive than elasticity to sequence-encoded nuances once the primary geometric scaffold (HBD) is accounted for. This establishes a path toward rapid, simulation-free prediction and optimization of amyloid-based materials, helping to define general design principles and highlighting the promise of integrating interpretable structural features with machine learning for next-generation biomolecular material design.
Benchmarking shows that (i) reconstruction fidelity remains high across diverse polymorphs, (ii) rupture-force calculations converge within ±0.010 nN after 30–40 replicas, and (iii) the expected rate- and spring-constant-dependent stiffening follows classical non-equilibrium theory. Using FiberForge, we screened 72 experimentally resolved fibrils (18 sequences), completing 190 tensile tests in 36 h on a single GPU node. The simulated elastic moduli and tensile strengths overlap closely with single-fibril AFM measurements.
Our structure–property analyses reveal that amyloid mechanics arise from the coupled effects of molecular assembly geometry and residue-level interactions. Across fibrils, HBD emerges as the strongest determinant of elasticity and a strongly correlated with UTS. Evolutionary sequence information does not appear to aid in the prediction of elasticity but does appear to aid in the prediction of UTS. This establishes a route toward rapid, simulation-free property prediction and materials optimization.
While FiberForge provides a robust computational platform for mapping sequence–structure–property relationships in amyloids, several limitations remain. The current framework is primarily optimized for parallel, helical-symmetry protofibrils and does not yet capture anti-parallel, mixed-symmetry, or irregular architectures. Moreover, it is best suited for protofibril-scale fracture characterization, whereas the multi-stranded fibril bundles observed in mature fibers remain beyond its present modeling scope. From a simulation standpoint, the accessible timescales of classical molecular dynamics inherently limit the treatment of rate-dependent mechanical behavior.
Future developments will address these limitations by (i) extending FiberForge.Build to support anti-parallel, mixed-symmetry, and bundled fibril assemblies, (ii) coupling coarse-grained or multi-scale fracture models to capture longer timescales and larger systems, and (iii) integrating automated mutation, ranking, and machine-learning–based property prediction modules to enable high-throughput screening and accelerated design.
Together with our structure–property analysis—showing that mechanical behavior emerges from the interplay of supramolecular geometry and residue-level chemistry, and that features such as hydrogen-bond density, while important in specific amyloid systems, are not universally predictive across all fibrils, FiberForge provides a unified computational foundation for disentangling the multiscale determinants of amyloid mechanics.
FiberForge provides a computational platform for mapping sequence–structure–property relationships in amyloids and accelerates the in silico discovery of bio-inspired, high-performance protein materials. Combined with emerging single-fibril experimental techniques, FiberForge promises to bridge atomic-scale insight and macroscopic material design, paving the way for next-generation biomedical and sustainable engineering applications built upon amyloid architectures.
In SMD, an external force Fext(t) is applied to the system according to the equation:
| Fext(t) = −k(x(t) − x0(t)) |
To assess sequence contributions to mechanics, we calculated Kyte–Doolittle hydrophobicity38 values for each residue in the extracted sequence. Mean, standard deviation, and the fraction of residues with positive hydrophobicity score were used as sequence-level hydrophobicity features. To examine whether protein sequence information, beyond simple hydrophobicity metrics, improves prediction of fibril mechanical performance, we extracted primary sequences for each simulated amyloid structure and embedded them using a state-of-the-art protein language model. First, amino-acid sequences were parsed from the first chain of each PDB file using BioPython. For each valid fibril, we also retrieved precomputed structural descriptors including inter-chain hydrogen-bond count, cross-sectional area, helical rotation and translation offsets, elastic modulus, and ultimate tensile strength (UTS). To capture higher-order, nonlocal sequence information, we computed deep sequence embeddings using the ESM-2 protein transformer model (esm2_t33_650M_UR50D).33 Each sequence was batch-converted into token format, passed through the pretrained model, and the 33rd-layer per-residue representations were averaged to obtain a fixed-length embedding vector for each fibril sequence.
Finally, sequence embeddings were paired with inter-chain hydrogen-bond density, cross-sectional area, Kyte–Doolittle hydrophobicity statistics, as well as helical parameters, to construct an extended feature vector representing sequence, structure, and inter-chain interaction characteristics.
Before performing cross-validation analyses, we first identified an appropriate regression model using an automated model-selection procedure. We applied an AutoML workflow32 that screened a broad set of machine-learning algorithms across randomized train–test splits. We performed this operation several times on different train-test partitions to obtain model which we believed was robust to the train-test partitioning. This initial step enabled systematic comparison of dozens of candidate models under repeated resampling, providing a robust estimate of algorithm stability rather than performance on any single data partition. Across all tested regressors, the random forest consistently achieved strong and reproducible performance, demonstrating resilience to dataset reshuffling and minimal sensitivity to outlier splits. Based on these findings, the random forest model was selected as the base estimator for all subsequent analyses.
We trained random forest models under two complementary evaluation schemes. First, we performed 10-fold cross-validation on the full dataset to quantify model robustness and assess the relative predictive value of specific feature subsets. Feature groups included: (i) geometric offsets only, (ii) hydrogen-bonding and cross-sectional metrics, (iii) hydrophobicity features, and (iv) full sequence embeddings. Both features and targets were standardized within each training fold, and performance was assessed via the coefficient of determination R2. Second, to evaluate generalization, we performed a strict 80/20 train–holdout split, retrained models exclusively on the training portion (including normalization), and computed hold-out R2 on unseen fibrils. For both evaluation schemes, parity plots were generated for each target–feature-set combination to visualize predictive accuracy and systematic bias (See SI, Fig. S7).
![]() | (1) |
The “classical” pathogenic amyloid structure is characterized by helical symmetry (See eqn (1) for formal description). The rotational symmetry about the growth axis can be represented by a single scalar
, the translational symmetry by a translation scalar
, and the axis about which these parameters are applied (the growth axis)
.
The tuple (θ, t, a) characterizes the symmetry of the protein sheets that make up the protein fibril. It should be noted that functional amyloids, amyloids characterized by solenoid fibrils such as CsgA, can also be described using helical symmetry. In this case θ is simply 1 i.e., only translational symmetry.
We created the function identify_protofibrils to identify groups of protein chains forming protofibrils based on spatial proximity within a given PDB file. The PDB structure was first parsed using PDBParser from Biopython,35 and for each chain, the atomic coordinates were extracted. The center of mass for each chain was then computed by averaging the Cartesian coordinates of all its atoms.
To group chains into protofibrils, a clustering approach based on a predefined distance threshold was applied. Each chain's center of mass was compared against those in existing protofibril clusters. If the distance between a chain's center and any chain already assigned to a protofibril was below the threshold, the chain was incorporated into that protofibril. If no existing protofibril met the distance criterion, a new protofibril cluster was initiated.
This procedure resulted in a list of protofibrils, where each protofibril was represented as a dictionary containing chain identifiers and their corresponding centers of mass. This method facilitated the automated identification of spatially grouped chains for further structural analysis.
For a formal description, let the protein structure contain C chains, each with a center of mass
,
as the set of protofibrils where each Pj is a subset of chains, dij = ‖ri − rj‖ as the Euclidean distance between chain centers ri and rj, and δ as the distance threshold for protofibril membership.
The goal is to find a partitioning of chains into protofibrils that minimizes the maximum intra-protofibril distance:
| rn = R(θ,a)rn−1 + ta | (2) |
The protofibril structure can be constructed using either the mBuild44 or Pymol45 software packages to enable direct conversion for molecular dynamics simulations.
A key challenge in SMD of amyloids is the large system size required, given the nanometer scale of fibrils. To automate system construction efficiently, we leveraged the amyloid's helical parameters to optimize its orientation in the simulation box. Specifically, we aligned the fibril along its identified growth axis—a 3D vector—placing it in a rectangular box with its longest dimension parallel to the growth direction. Once the fibril is correctly oriented we can place solvent molecules using PackMol.46
The FiberForge package offers two modes of deformation: tensile and shear. Tensile deformation refers to a pulling force applied parallel to the growth axis, while shear deformation involves a pulling force applied perpendicular to the growth axis (see Fig. 1 for a depiction of these mechanical tests).
To initialize these simulations without human input, we utilize pre-defined data structures. Specifically, we used the translational component t of the helical symmetry to orient the fibrils in the simulation box. For tensile tests, the fibrils are aligned parallel to the applied force, and for shear tests, they are aligned perpendicular to the applied force.
Springs are attached to the center of mass of the end chains of the amyloid for a tensile simulation. For a shear simulation restraints are placed on one end of the fibril and a spring is attached to the other end.
SMD simulations are performed using GROMACS.47 Systems are solvated using Packmol. After the fibril is placed in the correct orientation in the simulation box, energy minimization is performed. Following minimization, the system undergoes equilibration under an NVT ensemble for 50 ps, followed by further equilibration under an NPT ensemble for another 50 ps. Finally, SMD pulling is performed for 500 ps.
Parameters affecting SMD simulations, such as the pull rate and the force constant, are determined manually. The optimal pull rate is estimated to be 0.01 nm/ps and the optimal force constant is 1000 kJ (mol−1 nm−2). Temperature coupling is performed using the Nose–Hoover extended ensemble under standard conditions, and pressure coupling is performed using the Parrinello–Rahman extended ensemble.
The elastic modulus is a measure of a material's resistance to elastic deformation and is generally defined as
Similarly, the UTS is defined as
| UTS = max(σ) |
that is, the maximum stress a material can withstand before failure during a tensile test.
These properties are estimated by analyzing the geometry of the protofibril and the SMD trajectory. The Fext and the corresponding deformation distance produced from the SMD simulations are extracted from the output files. To reduce the noise we apply a moving average filter to the stress–strain curve. To further process the simulation data, automated routines were implemented to smooth and fit the stress–strain relationships using cubic spline interpolation. Next, we estimate the linear elastic region of the stress–strain curve by calculating the second derivative of the stress with respect to strain and then extracting the inflections points. The elastic modulus E is then obtained as the slope of the fitted curve within the linear elastic region, which is identified between the initial point and the first inflection point of the second derivative of the spline. This inflection point also defines the yield point, corresponding to the transition between elastic and plastic deformation. To estimate the tensile and shear strength, the maximum Fext obtained during the SMD simulation is used.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5dd00307e.
| This journal is © The Royal Society of Chemistry 2026 |