Open Access Article
Mason Hooten
a and
Meenakshi Dutt
*b
aDepartment of Biomedical Engineering, Rutgers The State University of New Jersey, Piscataway, NJ 08854, USA
bDepartment of Chemical and Biochemical Engineering, Rutgers The State University of New Jersey, Piscataway, NJ 08854, USA. E-mail: meenakshi.dutt@rutgers.edu
First published on 9th March 2026
Self-assembled nanostructures comprising Phenylalanine-based oligopeptides are widely utilized in various applications across the domains of electronics, personal care, and biomedicine. Numerous factors, including sequence, determine the morphology of these nanostructures. Computational approaches such as bottom-up coarse-grained (CG) models coupled with the molecular dynamics simulation technique can be used to predict morphologies as a function of the sequence. These approaches efficiently resolve the multiple scales relevant to the self-assembly process while generating CG representations that are consistent with the all-atom representations. However, bottom-up CG models are computationally demanding to develop. This study proposes an approach to develop bottom-up CG models of Phenylalanine oligopeptides that are transferable between sequences to resolve their assembly. The approach builds upon a previously reported bottom-up CG model for the Phenylalanine tripeptide to develop parameters for new degrees of freedom associated with other oligopeptides. The performance of the new CG model is analyzed by comparing simulation results with corresponding results obtained using different versions of the Martini force field. The predictions from the new force field are also compared with available experimental results.
Design, System, ApplicationSelf-assembled nanostructures composed of phenylalanine-based oligopeptides are utilized in various domains, including electronics, biomedicine, pharmaceuticals, and personal care. The extensive use of these nanostructures stems from their multifunctionality and morphological polymorphism, which is influenced by numerous factors, including the oligopeptide sequence. However, the ability to predict the morphology of these self-assembled nanostructures as a function of the sequence remains limited. The development of this predictive capability will significantly reduce experimental effort and costs, thereby advancing numerous domains. This capability can be developed through computational approaches that efficiently bridge scales pertinent to the assembly, packing, and organization of oligopeptides into nanostructures, while simultaneously preserving the chemical details of their molecular conformations. In this study, we propose a bottom-up coarse-graining approach to develop a transferable, multiresolution, coarse-grained model for various phenylalanine oligopeptides to resolve their self-assembly. The hybrid approach employs a structural coarse-graining method for bonded interactions and a variational approach for nonbonded interactions. The approach is employed to enable bridging the scales relevant to the process of self-assembly while simultaneously resolving consistent all-atom and coarse-grained representations of the molecules. The approach proposed and used in this study can be applied to investigate self-assembly in pure and mixed molecular systems across diverse classes of biomolecules and polymers. |
There are two overarching philosophies underlying the coarse-graining of molecular representations: top-down and bottom-up.43–54 The top-down coarse-graining approaches will use macroscopic, thermodynamic, and experimental results to parameterize the CG model. Meanwhile, bottom-up CG approaches use atomistic structure and properties to parameterize the CG model. In this study, a bottom-up coarse-graining approach is used to develop a transferable CG force field for di- and tetraphenylalanine (FF and FFFF, respectively) to resolve their self-assembly in an aqueous solution. A hybrid approach, utilizing a structural coarse-graining method for the bonded interactions and a variational approach for the nonbonded interactions, is employed to enable scale bridging while simultaneously resolving consistent all-atom (AA) and CG representations30 of the molecules.
There has been considerable interest in the development of CG models with reduced degrees of freedom that can accurately simulate the self-assembly process across a range of system compositions.55 Coarse-grained models have resolved the process in systems that span a range of chemical variables, including the composition of thermodynamic states,56 local milieu,57 chain composition,58 and peptide sequence lengths.11 The tunable chemical details captured by bottom-up CG models make them attractive for simulating this process in various systems. However, the computational cost of developing such models increases exponentially with the complexity of their representation, force field parameters, and composition of the system.
More recently, Mohammadi et al. reported the development of a bottom-up nonbonded force field representing interactions between the 20 proteinogenic amino acids and a one-site water model.27 Their model optimizes Lennard–Jones parameters using artificial neural network-assisted particle swarm optimization, targeting the atomistic free energy of solvation for each amino acid. The resulting nonbonded force field is used in concert with earlier-derived bonded parameters58 to simulate self-assembly of peptide amphiphiles. Simulations of two different amphiphile systems revealed distinct nanostructure morphologies — fibers versus micelles — that matched their respective experimental observations. This ability of a single parameter set to distinguish between systems represents an improvement over an earlier version of their model. It highlights the value of optimal solvation potentials in studies of solvent-mediated self-assembly.
Extensions to the relative entropy minimization (REM)59 technique have also demonstrated the capacity to reproduce both monomer structure and supramolecular arrangement over a range of system compositions. In the regularized REM (reg-REM) method, Liebl and Voth added a regularization term to capture the binding affinity between protein monomers.60 One of their reg-REM models, designed to capture the binding affinity of CA/SP1 monomers in an 18-monomer lattice, also faithfully simulates the structure of a large virus-like particle comprising 3180 monomers, demonstrating its transferability to a larger system size. In another study, Pretti and Shell developed a procedure that includes a term for tuning the model to recover specific secondary structures.61 Training the model on an ensemble of simulations of short peptides, they found that the model recovered secondary structure characteristics of amyloidogenic sequences not included in the training set.
An earlier study34 developed a bottom-up CG model of FFF with potential energy functions derived from AA reference data. The model was used to predict the formation of self-assembled nanospheres, which agreed with corresponding experimental results. Whereas such models can potentially predict self-assembly pathways, they suffer from high computational costs arising from the generation of the AA reference sample and the routines used for extracting CG potential energy functions (namely, force-matching (FM), Boltzmann inversion (BI), and iterative Boltzmann inversion (IBI)). The rationale underlying the selection of FM to determine CG nonbonded interactions is due to its ability to capture the potential energy surface associated with multibody interactions. The BI approach was chosen to compute the CG bonded interactions because it is well-suited for determining localized interactions and creating potentials that reproduce structural properties.62,63 The adoption of the IBI approach enables the iterative refinement of the CG bonded potentials initially determined using the BI approach. The refined bonded potentials allow better reproduction of the structural properties. Furthermore, bottom-up models are typically designed to reproduce precise structural representations, such that a fully parameterized model cannot be reused for different chemical species.
To mitigate the high costs associated with developing bottom-up CG models, it is desirable that the fully parameterized models can be substantially reused in developing structurally similar models. The ability to reuse model parameters from one structure to create a model of a new structure is typically referred to as transferability. A transferred model, thus, incorporates some of the structure and parameterization details of an earlier model but applies them to a different species. This study proposes a new approach for developing transferable, multiresolution, bottom-up CG models of biomolecules to resolve the dynamics underlying their self-assembly. The approach (1) preserves the nonbonded CG potentials,34 capturing multibody interactions, while (2) refining the CG bonded potentials to accurately resolve the local structure of the molecules. The approach is sufficiently modular that the specific coarse-graining methods used to develop the nonbonded and bonded potentials can be substituted with analogous methods.
In this study, CG models of two peptides – diphenylalanine (FF) and tetraphenylalanine (FFFF) – are developed, validated, and compared with corresponding results from the Martini CG force field. Simulations using the developed models demonstrate self-assembled nanostructures that are distinct from those reported using the FFF model34 and from one another. The results suggest that a judicious process of transfer modeling can reproduce atomistic structural targets for a family of short aromatic peptides. Some portions of the FFF model34 are transferred to the new models, demonstrating a significant reduction in the cost of developing bottom-up CG models. The approach proposed and used in this study can potentially be applied to investigate the process of self-assembly in pure and mixed molecular systems, encompassing biomolecules and polymers of different chemistries. The CG approach can also be adopted to conceive designs and iteratively optimize molecular sequences with target functionalities. For example, folding, binding or assembly. This can be implemented by integrating the approach into a high-throughput workflow that identifies molecular sequences, starting from a candidate pool of sequences or blocks, with specific functionalities.
![]() | ||
| Fig. 1 Representations of (a) FF, (b) FFF, and (c) FFFF. CG representations are shown as transparent spheres overlayed on the AA representation in licorice. The FFF model, AR3, is inherited from an earlier study34 and transferred to the representations of FF and FFFF to create the F2 and F4 models, respectively. Terminal beads NH3 (blue) and COO (red) appear once in each model. Backbone beads CA (yellow) appear once for each side chain, with interspersed AMD (orange) beads representing amide groups. The PHE representation comprises three beads per side chain, PHA, PHB, and PHC (purple). | ||
The CG force field includes parameters for bonded interactions required to maintain the peptide's conformation, as well as nonbonded interactions between the peptide and solvent beads. Potential energy functions are applied to the bonds, angles, and dihedrals in the bonded structures and the interpeptide, peptide–water, and water–water nonbonded interactions. The potential energy functions are nonparametric and derived from the atomistic reference simulations using the methods described in subsequent sections. Although the atomistic reference force field separates the potential energy's electrostatic and van der Waals components, the CG force field does not explicitly represent electrostatic charge. Thus, pairwise nonbonded interactions are incorporated in a single tabulated energy function that reflects the cumulative effects of electrostatics and van der Waals interactions observed in the AA reference simulations.
The CG models for FF and FFFF utilize the same set of 7 pseudo atom types, as defined initially in the AR3 model,34 as shown in Fig. 1. The F2 model contains 11 pseudo atoms, compared to 43 atoms in the atomistic reference, resulting in a nearly 4-to-1 mapping ratio. The F2 pseudo atoms are connected with a network of 12 bonds, implying 17 angles and 16 proper dihedrals. All bonds, angles, and dihedrals are parameterized with potential energy functions. The F4 model also achieves a nearly 4-to-1 mapping ratio with 21 pseudo atoms compared with 83 atoms in the atomistic reference. The model encompasses parameters for all 24 bonds, 35 angles, and 36 dihedrals.
The AA MD simulations use the Amber99 force field and SPC/E water model. Steepest descent energy minimization and an equilibration MD simulation run in the NVT ensemble are performed before the production MD simulations. The production simulations sample the NPT ensemble and use a leapfrog integrator with a 2 fs time step. The temperature is maintained at 300 K using the Nose–Hoover thermostat, and the pressure is maintained at 1 bar using the Parrinello–Rahman barostat. The Lennard–Jones cutoff is set to 1.2 nm, with long-range dispersion correction applied for energy and pressure. The particle mesh Ewald (PME) method with a cutoff of 1.2 nm is used to calculate the electrostatic interactions. All bonds are constrained with the LINCS algorithm. All simulations in this study — AA and CG — are performed using Gromacs 2018.
The second model in that earlier study represented the aromatic ring of the side chain with three pseudo atoms and was thus designated AR3 (for aromatic ring 3 beads). This study investigates the transferability of the AR3 model for FFF to phenylalanine peptides with altered chain lengths. The F2 and F4 models inherit potentials from AR3, including the entire nonbonded force field and most of the bond, angle, and dihedral potentials. The nonbonded force field, including peptide–peptide, peptide–water, and water–water interactions, remains unchanged from the AR3 model. Only the bonded parameters are updated or iterated, as discussed below.
![]() | (1) |
![]() | (2) |
UCG(ϕ, T) = −kBT ln(PCG(ϕ, T))
| (3) |
Once initial guesses are available for all potential energy functions for the bonded DOFs, the angles and dihedrals are corrected using the IBI scheme. In IBI, trial MD simulations are performed using an initial set of potentials, and DOF distributions are subsequently calculated. The potentials are refined proportionately to the difference between the trial and reference distributions, using the iterative procedure defined in eqn (4).
![]() | (4) |
The Earth Mover's Distance (EMD), also known as the Wasserstein metric, measures the distance between two distributions. The EMD characterizes how far the masses of two distributions would need to move to resemble one another. A comprehensive analytical description of the EMD can be found in ref. 71 and 72. This study calculates the EMD between each test distribution and its respective atomistic reference after each IBI step. The EMD values for the complete set of IBI DOFs — angles or dihedrals — are summed to yield an error index in the conformational ensemble, with a lower value indicating a lower error. The IBI process is stopped the first time this value increases, and the iterated potentials corresponding to the previous step, i.e., the IBI step, are selected, which results in the first minimum of the conformational error.
The F2 and F4 models are parameterized using tabulated potentials transferred from the AR3 force field, modified as discussed in the results and discussion section, with a potential energy cutoff of 1.2 nm. Initial configurations in each CG simulation are produced by mapping a frame from an equilibrated AA reference trajectory to its CG representation. The CG MD simulations are run in a cubic box of constant volume, sized to match the equilibrated box volume in a corresponding AA reference simulation. The CG MD calculations use a leapfrog stochastic dynamics integrator with a 1 fs time step. Temperature is maintained at 300 K using an inverse friction coefficient of 0.2 ps.
The development of Martini as a general tool for biomolecular simulation has, of course, come with challenges and tradeoffs. One such tradeoff is that the current version of Martini, the most flexible to date, has been refined to reduce spurious aggregation (the famous “stickiness” problem). Still, at the same time, it has become less capable of accurately simulating the self-assembly of short peptides. Earlier studies using Martini 2 reported a variety of nanostructures in homogeneous systems of FF in aqueous solution, including tubes, vesicles, and bilayers.5 A recent systematic investigation into the evolution of Martini as a framework for the self-assembly of short peptides also found that Martini 2 simulations using FF predicted the formation of spherical or elongated vesicles. In contrast, FF simulations with Martini 3 did not yield identifiable nanostructures.73
For this study, simulations of FF and FFFF were conducted using Martini 2 and Martini 3 to compare the results with those from the bottom-up models. Initially, simulations were performed using Martini 3 to examine the formation of peptide nanostructures in the most recent iteration of the force field. The results demonstrated that the behavior of the Martini 3 models was not consistent with earlier versions of Martini, such as Martini 2.2.
Systems of 32 peptides FF or FFFF were prepared in a cubic simulation box with the exact initial dimensions as the AA simulation. The Martini 2.2 simulations are solvated with a 90/10 split of water and antifreeze particles. The Martini 3.0 simulations use the initial release water model. Simulations use the Martini 2.2 and 3.0.0 protein parameter sets, respectively.
The cubic simulation boxes, with three-dimensional periodic boundary conditions, are sized to match the peptide concentration of the high-concentration bottom-up CG MD simulations. Peptides are mapped to the Martini CG representation using the martinize2 program. The peptide molecules are randomly dispersed in the simulation box along with the water beads for the initial configuration. A short equilibration simulation is performed before production MD simulations.
The Martini 2 and Martini 3 production simulations are parameterized following the guidance of reference.74 The simulations run for 40 to 80 million iterations using a 25 fs time step (corresponding to a nominal simulation time of 1 to 2 microseconds) and a leap-frog integrator. The temperature is set to 300 K using the velocity rescale thermostat with a stochastic term,75,76 and the pressure is set to 1 bar using the Berendsen barostat.77 The Lennard-Jones potential uses a switch tapering to a cutoff of 1.2 nm. Reaction-field electrostatic interactions with a potential shift are calculated with a dielectric permittivity of 15 and a cutoff of 1.2 nm.
Initial potentials are calculated via BI using a distribution function corresponding to each new DOF. An AA MD simulation of a single peptide in water is run for 500 ns with a time step of 2 fs and a sampling rate of 2 ps. The resulting sample of 250
000 frames calculates the CG-mapped distributions of each new bonded interaction for each CG model. These distributions are inverted via BI, completing the intrapeptide potentials needed to maintain the conformation of a single peptide in CG MD simulation.
The CG test simulations for IBI are run for 10 million steps with a 2 fs time step and sampled at 2 ps intervals, resulting in a sample of 10
000 frames per iteration. At each iteration, the distributions of the DOFs corresponding to the complete set of target potentials -- angles or dihedrals -- are evaluated and compared with distributions from reference AA trajectories. While the first few steps of IBI generally improve the similarity of the CG and AA distributions, there is a point at which subsequent IBI steps increase the error. This failure to converge is attributed to computational limitations and the relatively large number of concurrently parameterized DOFs in the model. To select a set of iteratively refined potentials, after each IBI step, the EMD is calculated between the CG test and AA reference distributions for each conformational DOF (angle or dihedral), and the EMD values are summed over the set of DOF potentials being resolved (all angles or all dihedrals).
The set of potentials associated with the first minimum of the summed EMD is selected for each IBI run. In the F2 model, total angle EMD reaches its minimum after initial BI plus four rounds of iteration. Applying these angle potentials, total dihedral EMD reaches its first minimum after BI plus one iteration. In the F4 model, angle potentials are chosen after BI plus five iterations, and dihedrals are selected after BI plus one iteration.
Angles in F4 also typically match the sampled ranges of the atomistic reference, with some deviations. As with F2, the F4 angles describing the extension of the side chains away from the peptide backbone are somewhat biased but take on values generally within the range sampled by the AA reference. However, the backbone angle bridging the first and second amide groups across the second backbone alpha carbon (AMD1-CA2-AMD2) samples configurations from 1.5 to 2.0 rad, suggesting a slightly collapsed conformation in the CG simulations compared to the AA reference (see SI Fig. S7). Although IBI refines the potential in this angle, the initial estimate is inherited from the AR3 model in an earlier study.34 The reference AA distribution was used to develop the original samples in the 1.5 to 2.0 rad range; thus, the AR3 potential function achieves a local minimum. Despite several iterations of IBI, the propagation of these configurations to the F4 model highlights the importance of the initial parameterizations on the eventual shape of refined potentials.
As end-to-end distance and Rg will be influenced by nonbonded interactions, comparing these quantities between highly concentrated AA and CG systems, where nonbonded interactions are fully present, is examined. The end-to-end distance in the AA and CG representations of the 25 peptide system samples falls within the same range and shares some peak locations. Although the atomistic curve exhibits four distinct modes in the short-range sampling, the CG curve displays only two modes. This is expected, as the nonbonded interactions are resolved at the same high concentration as the 25 peptide systems. The bonded DOFs are refined using these nonbonded interactions; thus, the single peptide conformation may be interpreted as biased in their absence.
The rotameric character of individual peptides and those in the aggregate for the four models of FFFF simulated – AA, F4, Martini2, and Martini3 – was calculated. The simulations of monomers using the bottom-up F4 force field show rotamer distributions vary along the chain, indicating the incorporation of specific conformational details gleaned from the AA reference simulations. In comparison, the Martini simulations typically show a random relationship, indicating nonspecific behavior of the side chains. The F4 simulations of aggregates also show a comparative spreading of the rotamer distributions. This flexibility is reminiscent of that seen in the AA reference and is key to the formation of the branched aggregates shown in Fig. 4. In comparison, the distributions from the Martini aggregates become sharper, reflecting their apparent tendency to snap sharply into place in the bilayer-like aggregates shown in Fig. 5.
To examine the tendency of the side chains, their relative orientation (namely, isomerization or rotamerization) in a single peptide is measured. The coordination between adjacent side chains in the tetramers is calculated as the dihedral angle measured from the proximal bead in each side chain across their respective backbone beads. A value of 0 in this calculation corresponds to side chains projecting in the same direction as the backbone, i.e., a cisoid conformation. In contrast, a value of pi describes side chains which are diametrically opposite, i.e., transoid.
The parity of the CG representations in the F4 and Martini models naturally allows for direct comparison, where the differences between the two models (namely, Martini backbones do not specify explicit amide groups) will not impact measurements of the relative rotations. This feature is also calculated in simulated systems of multiple peptides, including representative aggregates from CG simulations.
Cisoid rotamerization is observed among the second, third, and fourth side chains in simulations of a single peptide using the AA, the bottom-up CG, and the Martini models, as seen in Fig. 2. However, the models differ in their results for coordination between the first and second side chains. The parity of the CG representations in the F4 and Martini models allows for direct comparison. The differences between the two models (namely, amide groups are not explicitly specified in Martini backbones) do not impact measurements of the relative rotations. This feature is also calculated in simulated systems of multiple peptides, including representative aggregates from CG simulations.
In the AA representation, an isolated peptide in water exhibits a transoid arrangement of the first two side chains (SC1-SC2) and a cisoid arrangement of the remaining three side chains (SC2-SC3 and SC3-SC4). The three rotameric measurements are more flexible in simulations with multiple peptides, where clear transoid arrangements also appear among the last three side chains —a feature that does not appear in any of the multi-peptide CG simulations.
An individual peptide simulated with the F4 model adopts a different SC1-SC2 arrangement, with the most probable orientation centered around pi/2. Similar to the more straightforward transoid arrangement of the AA model, this allows for the proximity of the side chain to the backbone, resulting in hydrophobic shielding. This level of shielding may be unavailable in a cisoid arrangement, where the side chains would become crowded and sterically interfere with each other. This effect may modulate the excluded volume of the CG beads involved. In aggregates resulting from the F4 model, all rotamers are cisoid. This is consistent with an intuitive picture of the backbone maximizing the shielding of the hydrophobic side chains by burying them in the aggregate's interior.
For the Martini models, all rotamer distributions have a single mode at 0. For single-peptide simulations using Martini 2, a distinct central peak is observed. In contrast, the single-peptide distributions for Martini 3 appear essentially uniform, albeit with a slight increase in frequency over the range of [−1, 1]. For the multiple-peptide simulations, the distributions for Martini 2 are centered around zero much more sharply with no transoid probability. The multiple peptide systems using Martini 3 also demonstrate a more pronounced cisoid mode, with some likelihood of transoid arrangements. Furthermore, the Martini 3 multi-peptide distributions are noticeably smoother than those of others and appear almost Gaussian in shape. Given the predominance of cisoid rotamerization in the Martini models of the peptides, it is essential to note that these models do not apply a potential energy function to the backbone dihedrals. Although the Martini models include zwitterionic potentials, charges are assigned as either +1 or −1 on the N- and C-terminal backbone beads, respectively. Thus, the observed configurations primarily arise from intermolecular interactions with water, with polar interactions modulating the intermolecular arrangement and reinforcing the cisoid tendency.
Structures formed in the F4 simulations are considerably more diverse, including prolate droplets, cylindrical rods, wavy rods, and branched structures. As with simulations using F2 and the earlier AR3 models, side chain beads are typically shielded at the interior of the aggregates. The F4 nanostructures are solid, allowing no water to intrude among the packed side chain beads. The minimum (transverse) diameters of the various structures are 2 to 3 nm. The structure of the cylindrical rods differs from that observed in the earlier AR3 simulations. The AR3 rods consisted of planar layers comprising three peptides each, stacked in a longitudinal direction. The rods formed using the F4 model do not create such distinct layers. Instead, the peptides are randomly oriented, shielding the side chains in the aggregate and exposing the main chain to the aqueous solvent. The 128 peptide system resolves a single branched network with continuous connections across the periodic boundaries in all directions.
The ten Martini 3 simulations comprising 32 FF peptides in aqueous solution yielded no ordered nanostructures. A representative final state of the FF systems is visualized in Fig. 5(a). In the simulations of 32 FFFF peptides, final states typically showed a stable lamella with side chains on the interior. The charged terminal beads are alternating on the upper and lower surfaces. A representative final state of the FFFF systems is visualized in Fig. 5(b) and (c).
All ten simulations using the Martini 2 model for FF resulted in rodlike morphologies, with the charged terminal backbone beads clustering on the surface. These charged beads are sometimes bunched up, visually suggesting an upper and lower layer to the aggregate, with side chain beads in the middle layer. The rod appears twisted in others, and the charged beads are more evenly distributed over the whole surface. All ten simulations using the Martini 2 model for FFFF uniformly produced layered morphologies in which consecutive peptide backbones are arranged antiparallel with side chain beads fanned out toward the interior of the layer. The consistent and close arrangement of the charged terminal beads yields a layer that appears striped, with backbone beads exposed to the solvent. Typical structures are shown in Fig. 6.
The bottom-up CG simulations for FF demonstrate the formation of spherical micelles, or nanospheres, which represent a potential transient state in the self-assembly pathway leading to the formation of nanorods, as captured by simulations using the Martini 2 model for FF. These results deviate significantly from corresponding results obtained using the Martini 3 model for FF. The formation of branched fibrils, as captured by bottom-up CG simulations for FFFF, differs from the self-assembled nanostructures observed using the two Martini models for FFFF.
As an aside, the bilayers formed in the Martini FFFF simulations can be presumed to be precursors to the formation of layered nanoplates. However, the fact that bilayers are the only morphology represented may indicate that assembly dynamics are driven by the hydrophobicity of the side chains and the length of the peptides, which allow for a preferential arrangement of the charged backbone terminals. In the Martini model, the peptides also lack explicit representation of the amino groups, such that the polar interactions of those groups and their associated effects on the formation of secondary structures cannot be determined. It is noted that these Martini simulations utilized amino acid representations generated using the defaults of the martinize2 program. Modifications to the model, including manual charge distribution, modulation of water self-interaction, and the addition of a dihedral potential, may improve the results.
The system sizes in this study are relatively small, raising the possibility that finite-size effects influence the final nanostructures and their dynamics of assembly. This may be the case despite the mitigating strategy of using potential cutoffs less than half the length of the simulation box (a common approach, in observance of the minimum image convention in MD simulations79). Regarding the hypothesis that the observed dynamics of assembly represent essential characteristics of the peptide chemistry, a favorable comparison of the particle RDFs between the CG and AA systems (Fig. S5 and S11) is observed. This comparison suggests that the CG force field accurately captures nonbonded interactions. Also, the dynamics of assembly are observed to be stable across system sizes (Fig. 9). And while the number of peptides in the simulations – both the raw numbers and the concentrations – suggests the need for discretion when interpreting the results in the context of macroscopic bench experiments, the stability of the dynamics and structures is found to be a positive indicator of the applicability of the CG models.
In the systems using the F2 model, the 32 and 64-peptide maps are nearly identical. Most contacts occur between beads of the first side chains with other first and second side chains, with fewer contacts between beads of the second side chains, a pattern similar to that of the 32-peptide AA system. The CG systems demonstrate practically no contacts involving the terminal beads. The terminal beads are pushed to the micelle surface and excluded from close contact by the backbone curvature induced by packing the side chains in the micelle interior.
In the BA maps, as in the AA maps, the N-terminal makes a moderate number of contacts with the C-terminal and backbone at the second side chain. This constitutes a rebound in these contacts after they disappeared in the CG simulation, but their numbers are in a smaller proportion than in the reference AA system. The BA system also shows contacts highly concentrated on the side chains, but now more evenly distributed among the possible configurations (i.e., intermolecular contacts between the first side chain on each monomer, the first and second side chains, and the second side chains).
The contact maps for systems of 32 and 64 peptides in the AA representation differ. For example, the 32 peptide map shows higher contact counts on the first side chain, while the 64 peptide map shows higher contact counts on the second side chain. The contact maps calculated from CG trajectories for the two system sizes are substantially similar. This is expected as the CG MD simulation for both system sizes predicted identical nanostructure morphology. However, the dissimilarity between the two system sizes in AA representation suggests an underlying behavior that the parameterization of the CG model has captured, which is not visible in either of these long AA simulations. It should be noted that the CG model converges to these stable structures for both system sizes in only tens of nanoseconds of simulated time.
The BA simulations allow the backmapped CG morphology (i.e., a set of micelles) to be relaxed by the atomistic force field. This relaxation occurs due to a combination of detailed peptide reconfiguration and increased spatial resolution, allowing the atoms in the peptides to interact more smoothly over the length of the chain. The 32- and 64-peptide BA contact maps show some differences, but they appear substantially similar nonetheless. This is despite notable dissimilarity between the respective reference AA systems. It is therefore surmised that the equilibrium configurations predicted by the CG model provide plausible conformations and arrangements of molecules within the predicted nanostructures.
To overcome the difficulty of direct interpretation among contact maps with different gross sample counts, the virtual two-dimensional free energy surfaces (FES) implied by the contact maps are compared. First, the contact maps are Boltzmann inverted to yield a two-dimensional FES. Then, using a two-dimensional implementation of the EMD algorithm, the EMD between pairs of FES is calculated at each simulated resolution and across two system sizes. This approach is adapted from an earlier study by Hunkler et al.,80 who employed a similar method to evaluate backmapping-based sampling by comparing FESs from AA, CG, and backmapped AA simulations of ASP and GLU oligomers. A lower EMD indicates greater similarity between the two sampled FESs. Comparisons of FES similarity among systems as measured by two-dimensional EMD is shown in Fig. 8.
![]() | ||
| Fig. 8 EMD values are calculated to compare the two-dimensional FES obtained by Boltzmann inversion of the contact maps shown in Fig. 7. Lower values indicate closer concordance between FES, higher values indicate divergence between FES. (a) FES are compared from AA, CG, and BA simulations in systems of 32 and 64 peptides, for peptide sequences FF and FFFF. The FES implied by AA and BA simulations are typically more similar than those implied by CG simulation. (b) FES are compared between the systems of 32 and 64 peptides at each simulated resolution, for peptide sequences FF and FFFF. FES implied by CG simulations at different system sizes are more divergent in simulations using the F4 force field than those using the F2 force field. | ||
Fig. 8(a) shows that for both peptide chain lengths, the EMD values between CG and AA and those between CG and BA are much larger than those calculated between AA and BA. This is unsurprising since the AA and BA simulations proceed under the same force field, and thus their molecules should sample similar regions. However, given that the nanostructures represented by the BA contact maps inherit some contact propensity from the evolution of the CG simulations, the fact that the AA and BA FES match well demonstrates that the molecular arrangements resolved by the CG MD simulations are, at a high level, consistent with the equilibrium conditions captured using the atomistic force field. Fig. 8(b) compares FES from the nanostructures in the 32 and 64 peptide systems for each peptide sequence, thereby gauging the consistency of the FES associated with the force fields used at each resolution across system sizes.
With that caveat, interaction count time series typically look linear at the early stages before transitioning to exponential behavior. Comparing the evolving structure during aggregation will help identify the underlying dominant processes. The dynamics of the evolving structure can be approximated as linear to understand the relative rates of change of interaction counts. This approximation enables the calculation of an intercept (minimum count, available from single-peptide simulations) and an appropriate scaling factor. In this study, the counts are scaled by the maximum possible counts, i.e., where all available particles overlap (a forbidden state under a typical MD model).
By observing the relationships between peptide clustering and counts of mutual interpeptide interactions, one can develop an understanding of the dynamics underlying the formation of nanostructures. Fig. 9 illustrates the evolution of bead-wise contacts calculated for ensembles of 32 and 64 peptides, using FF and FFFF with the F2 and F4 models, respectively. Fig. S15 shows that the relative proportions of contacts established in the first 2.5 million simulation steps remain stable through to 100 million steps. Fig. 10 depicts the average aggregate size in each ensemble, scaled by the number of peptides in the simulation, to illustrate the proportion of system peptides represented by the average aggregate size over time. This framing clearly shows in the F2 systems a tendency to form aggregates comprising only a fraction of the total peptides in simulation, compared to the F4 systems, which tend to form a single aggregate involving all peptides. Alternative framings of this tendency are shown in Fig. S16.
The FF simulations typically resolve the formation of multiple micelles within 5 ns of simulated time. These micelles do not assemble into a single nanostructure when the simulations are run for an extended duration. Fig. 9 shows that early aggregation in both systems is characterized by a rapid increase in interactions between side chains, indicative of the shielding process of the side chains due to the hydrophobic effect. Interactions between the complementary terminal groups gradually increase during the simulations, reflecting their progressive organization on the surfaces of the individual micelles. Interactions involving main chain beads remain at their minimum values throughout the FF simulations, since the tightly packed side chains bar the main chain beads from interacting with the interiors of the micelle, and most micelles do not approach each other. Interactions between amide groups are essentially zero throughout both simulations. Hence, the process of micelle formation in these simulations is driven almost entirely by the hydrophobic effect, with nominal influence of the polar groups on the evolution of the micelle surfaces.
In the simulation with 32 FFFF peptides, the system forms a single straight nanorod within 1 ns of simulated time. As in the FF simulations, early aggregation exhibits a rapid increase in interactions between side chains, which occurs much more quickly than other bead interactions. Interactions between amide groups increase gradually, followed by interactions involving main-chain beads. Complementary terminal beads have essentially no interactions. Therefore, the nanorod formation in this simulation is initially driven by the hydrophobic effect, with subsequent long-range organization into a nanostructure.
The simulation of 64 FFFF peptides revealed the formation of a highly networked three-dimensional nanostructure, where the branches are typically 2 nm in diameter, consistent with the results for the 128-peptide system shown in Fig. 4(c). The rapid increase in side chain interactions mirrors that of the 32-peptide system. However, in the 64 peptide system, the amide group interactions also rapidly increase at the same pace as the side chains, with a slight lag in main chain interactions. This relatively rapid spatial organization of the backbones in the 64 peptide system, driven by interactions between the amide groups, signals the simultaneous formation of multiple proto-rods similar to those observed in the 32 peptide system, which form throughout the simulation box and coalesce into a network-like nanostructure.
Mayans and Alemán83 have compiled data on self-assembly in di-, tri, and tetraphenylalanine with and without protective capping groups, and in a variety of solvents ranging from apolar to polar, summarized in the reprinted Fig. 11. In the highest-concentration aqueous solutions, the authors report that uncapped FF and FFFF peptides assemble into nanotubes, microtubules, and fibrils. Mayans and Alemán also note the influence of solvent polarity and terminal capping on the aggregate morphologies. Under various conditions, the morphological landscape of FF solutions grows to include plates, ribbons, and toroids, while FFFF is also seen to produce branching, flower-like, and braid-like structures.
![]() | ||
| Fig. 11 Summary of assemblies formed by phenylalanine di-, tri, and tetra-peptides as a function of solvent polarity and peptide concentration: (a) uncapped and (b) capped at both the N- and C-terminus with fluorenyl groups. Reprinted from ref. 83. | ||
There is appreciable similarity in the morphological diversity of the simulated FFFF nanostructures with the corresponding experimental results. Specifically, the networked or branched fibrils in the FFFF simulations are consistent with experimental observation under equivalent solution conditions. It is surmised that the appearance of such structures in the simulations reflects a decreased influence of the charged terminal groups in analogy to the charge shielding provided by capping groups in experiments. This decrease in influence may be due to various factors, including charge distribution over the atoms of the reference AA peptides and within the CG representation.
In the case of FF, the early phase of aggregation of the FF peptides typically begins with the formation of large incoherent aggregates before forming distinct octamers, as demonstrated by the time evolution of the average number of molecules per cluster. The formation of these predicted nano-micelles is a plausible transient state in the pathways to the formation of larger equilibrium morphologies reported in experiments.85 However, the simulations do not demonstrate the formation of larger nanostructures and therefore appear to deviate from the experimental results. This discrepancy is likely due to a combination of physical effects and limitations of the CG methodology. It is noted that the simulated concentrations are set at a high value to exceed the (unknown) critical micelle concentration and ensure aggregation, a common strategy in simulations investigating aggregation.1,5,11,20,34,86–90 The simulated peptide concentrations in this study (200–400 mg mL−1) are two to four orders of magnitude higher than the range of values reported by Mayans and Alemán. The solvent content of the simulations is therefore much lower, reducing the entropic contributions of water to the formation of the aggregates. This mitigation of entropic forces may partially explain why the micelles do not collapse into more densely packed aggregates.
It may also be that the CG model is too coarse34 to represent the chemical details required for close packing. Positing once again that micelles are genuinely on the continuum of concentration-dependent morphologies, the CG force field may be too smooth to allow relaxation into closely packed nanostructures. Experiments show that the supramolecular organization of phenylalanine dipeptides is an Ostwald process, that is, a process in which self-assembled aggregates partially dissociate, with the solution progressively “ripening” into a state where lower energy aggregate morphologies dominate.91 In this case, the micelles may represent a relatively high-energy early aggregation state, but with model conditions that do not permit the required partial dissociation. These arguments suggest improvements to the model, such as increasing the resolution of the representation. Furthermore, the detailed dynamics of the self-assembly process may reflect the physicochemical properties of high-concentration solutions, whose behavior may differ fundamentally from that of much lower in vitro concentrations used in experiments.92
Predicted self-assembled nanostructures encompassing FFFF reflect polymorphism consistent with prior experimental studies. The F2 model's predictions of the early phase of aggregation are in agreement with corresponding experimental results. The nanostructures predicted in the F2 model beyond this phase appear to deviate from those reported in experiments. However, it is surmised that concentration effects and a highly coarse representation of FF may result in structures associated with transient states in the aggregation process.
Comparable peptide systems were also simulated using two versions of the popular Martini model, allowing a performance comparison on structural and computational metrics. While Martini has an advantage over bottom-up models in computational performance, the latter can rapidly converge on equilibrium structures with reasonable accuracy, and future performance upgrades can be used to resolve larger systems.
These results are by no means an indication that Martini is not a suitable framework for investigating short peptide aggregation. For one, the total counts of peptides in the simulations presented here are quite small for Martini, which is routinely used to simulate systems hundreds or thousands of times larger. The Martini ecosystem also supports a variety of structural constraints, and several targeted refinements of the force field have been described. Rather, the comparison simply shows that a bottom-up model trained on modest reference data can resolve fine-scale system dynamics that are fundamentally distinct from those seen in a very popular top-down model.
This study demonstrates that chemically specific models may be transferred and judiciously refined to simulate various systems within a family of aromatic peptides. Notably, these CG models are parameterized solely with nonbonded interactions, as well as 2-, 3-, and 4-body bonded interactions, and the parameterizations are readily interpreted in terms of reference structures. These CG models are computationally efficient in scale bridging while providing a representation that is consistent with corresponding AA representations. These attributes enable the CG models to capture multiscale processes such as self-assembly while simultaneously generating chemically accurate molecular conformations on demand. Currently, efforts are underway to automate the demonstrated hybrid CG strategy, thereby increasing efficiency in the development process to a level comparable to that of machine learning-enabled CG strategies. The CG approach discussed in this study could be potentially adopted to investigate multiscale processes involving multiple molecules in pure or mixed molecular systems, encompassing biomolecules or polymers of different chemistries. Furthermore, the CG approach can be adopted to propose designs and iteratively optimize molecular sequences with specific functionalities, such as folding, binding, or assembly. To this end, the approach can be integrated into a high-throughput workflow93 that identifies specific molecular sequences, starting with candidates of sequences or blocks, with target functionalities.
Supplementary information (SI): containing structural distribution plots, alternative clustering profiles, performance evaluation of the CG systems, and supplementary model description. See DOI: https://doi.org/10.1039/d5me00207a.
| This journal is © The Royal Society of Chemistry 2026 |