Room-temperature stacking disorder in layered covalent-organic frameworks from machine-learning force fields †

The local structures of layered covalent-organic frameworks (COFs) deviate from the average crystal structures assigned from X-ray diﬀraction experiments. For two prototype COFs of Tp-Azo and DAAQ-TFP, density functional theory calculations have shown that the eclipsed structure is not an energy minimum and that the internal energy is lowered for an inclined stacking arrangement. Here we explore the structural disorder of these frameworks at 300 K through molecular dynamics (MD) simulations using an on-the-fly machine learning force field (MLFF). We find that an initially eclipsed stacking mode spontaneously distorts to form a zigzag configuration that lowers the free energy of the crystal. The simulated diﬀraction patterns show good agreement with experimental observations. The dynamic disorder from the MLFF MD trajectories is found to persist in mesoscale MD simulations of 155 thousand atoms, giving further confidence in our conclusions. Our simulations show that the stacking behaviour of layered COFs is more complicated than previously understood.


Introduction
Covalent-organic frameworks (COFs) are a class of porous organic materials synthesized from secondary building units through reticular chemistry. 1,2][8][9][10][11][12][13][14] One example is stacking disorder, in which layers of a COF with twodimensional (2D) connectivity are offset at different amounts relative to adjacent layers.Recent evidence for stacking disorder comes from athermal density functional theory (DFT) calculations, which revealed a variety of stacking modes can exist in contrast to eclipsed stacking (i.e.zero layer offset) typically reported in experiments. 6,8,13,15However, the stacking behaviour obtained from DFT simulations is inherently static in nature and limited by the small system size that can be studied.As a result, prior models may not be representative of the true stacking behaviour of layered COFs at ambient temperature.
The stacking disorder in COFs is challenging to probe experimentally due to the averaging effect of time and space from various probes.In 2011, Lukose et al. 16 used DFT calculations and X-ray diffraction (XRD) to investigate the formation of four different stacking sequences (eclipsed, zigzag, serrated and inclined) in a set of reported and hypothetical 2D COFs and found that these This journal is © The Royal Society of Chemistry 2023 stackings exhibit different diffraction patterns and band gaps.In 2017, Fan et al. 17 confirmed that SIOC-COF-8 and SIOC-COF-9 stack in an inclined arrangement using a combination of powder XRD, pore-size distribution analysis, and transmission electron microscopy (TEM) analysis.In 2022, Kang et al. 11 used 13 C solid-state nuclear magnetic resonance (ssNMR) to study the chemical environment of atoms in adjacent layers of 2D COFs and reported that different stacking modes can be quantitatively characterized by the split signals of the methoxy/methyl side groups.In the same year, Natraj et al. 10 employed continuous rotation electron diffraction and XRD to demonstrate that TAPPy-PDA and TAPB-DMPDA COFs are disordered along the stacking c direction while remaining ordered within the ab plane.
COFs can be viewed as organic macromolecules with typically more than 100 atoms in the crystallographic unit cell.Thus, it is unfeasible to study their dynamics and disorder over long time and length scales using ab initio molecular dynamics (MD) simulations.2][23] For classical MD simulations, force fields with fixed functional forms are required to describe the interatomic forces.Popular force fields such as OPLS 24,25 and DREIDING 12,[26][27][28][29] have been used to study the dynamics of COFs.However, the accuracy depends strongly on the particular force field parameterization and its ability to describe the local structure and extended interactions.
1][32][33][34][35][36] MLFFs trained on DFT data have demonstrated high efficiency and accuracy in predicting atomic energies and forces. 32,34,37On-the-fly learning is an efficient method for force field training.It constructs a DFT reference database by sampling an MD trajectory for structures with high predicted uncertainty. 31This approach means that during the MD simulation, only a fraction of the force evaluations are performed using DFT, with the rest provided by the MLFF.This state-of-the-art method has been used to study the dynamical properties of many materials, including the melting points of Al, Si, Ge, Sn and MgO, 31 the phase transitions of hybrid perovskites, 38 and the thermal conductivity of zirconia. 39hese previous studies indicate the potential of applying MLFFs to understand the dynamics of structurally and compositionally complex materials such as COFs.
In this work, on-the-fly MLFF models are trained for the prototype COFs, Tp-Azo 40,41 and DAAQ-TFP 42 (Fig. S1, ESI †), which have exhibited high energy storage performance as battery electrodes.We find COF structures that are initialized in an eclipsed stacking arrangement spontaneously shift to a zigzag stacking mode during MLFF-MD simulations.In contrast, the inclined structures retain their stacking throughout, suggesting that both zigzag and inclined stacking are locally stable arrangements.Using classical force fields to simulate longer time and length scales, similar, but more diverse, stacking properties are observed.These different stacking sequences in COFs lead to different radial distribution functions (RDF) and predicted XRD patterns, and we show that a zigzag motif is in best agreement with experimental observations.

Computational methods
Kohn-Sham DFT is a powerful tool for calculating atomic energies and forces from first principles.However, its computational cost generally limits its applicability to relatively small systems and short-range time scales.We employ MLFFs to model the dynamics of two COFs at room temperature for intermediate system sizes.The results obtained from the MLFF were compared to those obtained from a classical force field for large-scale simulations.

Crystal structure optimisation
The starting crystal structures for Tp-Azo 41 and DAAQ-TFP 42 COFs were taken from our previous DFT study, 15 in which the structures were relaxed from experimental reports.To further optimize the structures and determine the lattice parameters, we employed the projector augmented wave (PAW) formalism 43 within the VASP 43,44 code.The Perdew-Burke-Ernzerhof (PBE) 45 exchange-correlation functional was used in conjunction with the Tkatchenko-Scheffler method incorporating iterative Hirshfeld partitioning for the treatment of interlayer van der Waals (vdWs) interactions. 46Wave functions were expanded in a plane-wave basis with an energy cutoff of 520 eV.Gaussian broadening with a width of 0.01 eV was employed to set the electronic occupations.At each ionic step of the DFT calculation, the total energy was converged to within 10 À6 eV.These parameters were used consistently throughout the study.The optimised eclipsed and inclined Tp-Azo and DAAQ-TFP structures and their lattice parameters are shown in Fig. 1 and Table S1 (ESI †).

Machine-learning force field
Following the geometry optimization, 1 Â 1 Â 2 supercells (containing 216 atoms) were used to generate the MLFF datasets for eclipsed and inclined Tp-Azo and DAAQ-TFP.The isothermal-isobaric (NpT) ensemble with the Langevin thermostat 47 and a time step of 0.5 fs was used during the training process.We employed the on-the-fly MLFF generation scheme implemented in VASP, 31,48 which uses Gaussian approximation potentials (GAP) with the Smooth Overlap of Atomic Positions (SOAP) approach for structural features. 49adial and angular cutoffs of 7 Å and 5 Å were used to describe the local environments.Other parameters of the on-the-fly MLFF scheme were kept at their default settings.These settings were consistently applied throughout the MLFF study.
Initially, we trained the force field from the equilibrium ground state structure at 300 K. Next, we trained the model on a gradual heating run from 300 K to 500 K starting from the equilibrium structure to probe a more diverse region of the potential energy surface.The time for each training is shown in Fig. 3 for Tp-Azo and Fig. S2 (ESI †) for DAAQ-TFP.The temperature was kept below the degradation temperature of the COFs but high enough to widely sample the potential energy surface. 1,50In total, the MLFF was trained for more than 150 ps, including several restarts in each training run as shown in Fig. 3 and Fig. S2 (ESI †).
The trained force fields were validated using a test set of 500 structures, each consisting of a 1 Â 1 Â 2 supercell of each stacking mode for the two COFs.These structures were randomly sampled from 10 MD trajectories, each generated with a different random seed.Each MD trajectory was obtained within an NpT ensemble at 300 K for 50 ps, utilizing the generated MLFF.The energies, forces and stresses for these 500 structures were subsequently calculated using DFT and compared with those obtained from the MLFF, in order to investigate the accuracy of the trained force field.Large errors in the validation process indicate the training should be continued.
To study the dynamic behaviour of large COFs under ambient pressure at 300 K, validated force fields were applied to 2 Â 2 Â 8 supercells containing 3456 atoms using an NpT ensemble.The expansion of supercell along the stacking axis is to enable the description of short to medium-range fluctuations.The simulation process involved the use of separately trained force fields for each stacking mode of each COF, with production runs lasting over 100 ps with a time step of 0.5 fs.
To analyze the disorder present in the COFs, the structures from the MD trajectory were analyzed for interlayer offsets.The overall process described above is illustrated in Fig. 2.

Classical force field
Classical force field-based MD simulations were carried out using the large-scale atomic/molecular massively parallel simulator (LAMMPS) code. 51The OPLS-AA force field was chosen for its general applicability in predicting the structure of organic molecules. 25A 5 ffiffi ffi 3 p Â 3 Â 48 supercell (containing 155 520 atoms) was utilized for both the eclipsed and inclined configurations.We performed MD simulations with the NpT ensemble at 300 K and 1 atm using the Nose ´-Hoover thermostat 52 and barostat. 53he simulations were run for 300 ps with the last 100 ps used for sampling and analysis after allowing for equilibration.The initial structures for each configuration were constructed based on the DFT-optimized structures presented in Fig. 1.

MLFF molecular dynamics
The number of DFT calculations performed during on-the-fly training, along with the root-mean-squared error (RMSE) in the energies per atom, forces and stresses are summarized in Table 1.Fig. 3 illustrates the Bayesian errors, the structures that were sampled with DFT, and the force error threshold during the training process for eclipsed (a) and inclined (b) Tp-Azo.The Bayesian errors in the training of eclipsed and inclined DAAQ-TFP are provided in Fig. S2 (ESI †).
During the training process, over 99% of force evaluations were performed using the MLFF rather than expensive DFT calculations, saving a significant portion of computational time.The final models show small RMSEs in the predicted energy (0.41-0.58 meV per atom), force (0.076-0.082 eV Å À1 ), and stress (0.30-0.39 kbar).Furthermore, the maximum Bayesian errors during training remain low and stable as illustrated in Fig. 3 and Fig. S2 (ESI †).These errors are similar to other studies that have utilized MLFFs for MD. 31,38,48g. 2 The computational workflow used for on-the-fly MLFF training, validation, and production runs for the Tp-Azo and DAAQ-TFP COFs studied in this work.We validated the trained force fields to demonstrate their performance.Energies, forces, and stresses calculated from the MLFFs and DFT for the eclipsed and inclined validation structures are shown in Fig. S3 and S4 (ESI †).The RMSE of energy (0.42-0.66 meV per atom), force (0.077-0.084 eV Å À1 ), and stress (0.30-0.44 kbar) of the validation dataset, as presented in Table 1, are almost the same as the RMSEs observed during the training.These results indicate that the force fields have been trained to sufficient accuracy.
To investigate the stacking behaviour of the COFs, we performed production calculations in a 2 Â 2 Â 8 supercell (containing 3456 atoms).Calculations were performed with the NpT ensemble at 300 K for a total of 100 ps and a time step of 0.5 fs.MLFF-MD runs were performed for the eclipsed and inclined structures of both COFs.The Bayesian errors for the production MD runs remained stable throughout the entire trajectory and are illustrated in Fig. S5 (ESI †).The lattice parameters of the eclipsed and inclined Tp-Azo structures during the production runs are presented in Fig. 4 (Fig. S6, ESI † gives the results for DAAQ-TFP).The cell lengths (a, b, c) and angles (a, b, g) averaged over the whole trajectory are given in Table 2.We also report the internal energy (E 0 ) of the final structure at the end of the MD run after being relaxed using the MLFF.The average lattice parameters differ slightly from the initial equilibrium starting structures (a and b change less than 0.5%, c expands in the range of 3-6%) as expected due to the inclusion of temperature in the MD run (Table S1, ESI †).Fig. 4 (Tp-Azo) and Fig. S6, ESI † (DAAQ-TFP) reveal the structures are dynamically stable during the trajectories with lattice parameters that remain roughly constant.
The plots of total internal energy versus time, presented in Fig. S7 (ESI †), demonstrate the excellent stability of the entire MLFF-MD trajectory.The stability of the simulations indicates that both configurations can be considered metastable states, which will be further rationalized in the following section.
A comparison of the internal energy calculated from the relaxation of the final structures in the MD using MLFF also shows that the internal energy of the inclined stacking is lower than that of initially eclipsed COFs that distort to a zigzag structure during the simulation in Table 2.In Tp-Azo, the energy difference between the zigzag and inclined structures is 0.24 eV per unit cell (2.20 meV per atom), while it is 0.11 eV per unit cell (1.04 meV per atom) in DAAQ-TFP.However, the energy difference between the two stacking modes is relatively small suggesting competition between the two arrangements will be present at ambient temperatures.

Tracking disorder
3.2.1 Layer offsets in three dimensions.During the MD simulations, it was observed that the eclipsed COFs shifted into a zigzag stacking mode while the inclined COFs retained their original stacking behaviour (Fig. 5 and Fig. S8, ESI †).This indicates that eclipsed stacking is not a stable stacking mode whereas inclined stacking is locally stable.We find that the layers in the inclined COFs are more corrugated than those in the zigzag COFs.This follows similar reports of corrugated frameworks in 2D COFs, which are believed to originate from inter-layer interactions between polycyclic aromatic hydrocarbons. 54ur results reveal that the main source of disorder occurs along the c direction of stacking rather than in the ab plane.This is in agreement with continuous rotational electron diffraction experiments, which found that 2D COFs of TAPPy-PDA and TAPB-DMPDA behave as single-crystals in the ab plane but are disordered in the c dimension. 10To demonstrate this point, in Fig. 6 we plot the stacking disorder averaged over the full MD trajectories for initially eclipsed and inclined Tp-Azo (the same analysis is presented in Fig. S11 (ESI †) for DAAQ-TFP).The disorder is quantified by the offset of each layer relative to adjacent layers.The offsets are decomposed into their components along the x, y and z directions and are averaged along the MD trajectories after equilibration (Fig. S9 and S10, ESI †).For the x and y directions, an offset of 0 Å indicates a fully eclipsed structure, whereas a constant positive or negative offset indicates inclined stacking.A constant offset in the z direction indicates a fixed layer separation.
For zigzag Tp-Azo (initially eclipsed pre-equilibration), the average absolute offset for the x, y, and z directions are 0.2 Å, 2.2 Å, and 3.4 Å, respectively.The small values for the x offset Table 2 Averaged unit cell lattice parameters and total energy of eclipsed and inclined Tp-Azo and DAAQ-TFP from MLFF-MD calculations.The energy (E 0 ) is the internal energy per unit cell calculated from a relaxation of the final structures at the end of the MD simulation using the MLFF.DE 0 is the energy difference per unit cell between the zigzag and inclined structures.'Z' and 'I' correspond to zigzag and inclined configurations after MLFF-MD, respectively (close to zero) indicate adjacent layers are almost perfectly stacked in the x direction.The offsets for the y direction oscillate from BÀ2.2 to B2.2 Å, indicating each layer is shifted by an equal but opposite amount relative to adjacent layers.This oscillating offset is a characteristic signature of a zigzag stacking mode.The relatively constant offset in the z direction indicates the layer separation remains roughly fixed throughout the entire MD run.The total absolute displacement in the ab plane is 2.1 Å, highlighting the strong deviation from the idealised eclipsed structure.The eclipsed DAAQ-TFP structure also shifts to a zigzag stacking mode during equilibration.The interlayer offsets show broadly the same features as Tp-Azo, with a relatively large overall displacement distance in the ab plane of 1.8 Å (Fig. S11a, ESI †).
The inclined COF structures retain their inclined stacking throughout the entire MD run.This can be seen in the stacking disorder analysis which reveals a roughly constant negative offset of 2.6-2.8Å in the x direction and 0.6-1.1 Å in the y direction for Tp-Azo (offsets in DAAQ-TFP are 2.1-2.4Å and 0.4-0.7 Å in x and y, respectively).The overall offsets in the ab plane are 2.4 Å (Tp-Azo) and 2.5 Å (DAAQ-TFP) (Fig. 6b and Fig. S11b, ESI †).Thus, the overall offsets are marginally larger for inclined stacking than initially eclipsed stacking in both COFs.The inter-layer distances are very similar, around 3.4 Å in each case, independent of the stacking mode (Fig. 6 and Fig. S11, ESI †).The calculated offsets in the ab plane fall in the predicted energy minimum basin from static DFT calculations which span 1.7 Å to 3.5 Å. 15 Configurational entropy is a thermodynamic driving force for the observed behaviour.In the ab plane, there are multiple degrees of freedom in the zigzag configuration, but only one in the inclined configuration.As there is only a small difference in the internal energy of zigzag and inclined configurations, an entropic contribution from stacking faults in a zigzag configuration, which is not present in the inclined configuration would be sufficient to change the order of free energies at T = 300 K. Thus, we expect the zigzag mode to be the favoured structure at ambient temperature.
3.2.2Radial distribution functions.The radial distribution function (RDF) describes the probability of finding a particle at a distance r from another particle.A recent study of the imine-linked TTI-COF used RDF analysis and simulations to compare the total scattering functions of different stacking modes, revealing random layer offsets at low temperature. 14e first calculate the RDF of the eclipsed and inclined structures from static DFT relaxed structure.The O-O RDF of  the eclipsed Tp-Azo structure reveals a sharp peak at 3.2 Å, corresponding to the inter-layer distance.In the inclined stacking structure, the peak position is shifted to 4.2 Å due to additional displacement along the ab plane (Fig. 7). 15Comparable results are found for DAAQ-TFP (Fig. S12, ESI †).
We next calculate the average RDF across the MLFF-MD trajectories after the structures are equilibrated.We find that the small-distance sharp peaks are broadened and shifted to larger distances compared to the static DFT RDFs.The shift in the initially eclipsed structures is larger than that of the inclined structures, primarily as the eclipsed structures shifted to a zigzag arrangement during the simulation.There is a qualitative difference in the RDFs of the inclined and zigzag structures, indicating that RDF analysis can be used as a probe for stacking disorder.
3.2.3Simulated diffraction patterns.The stacking modes of 2D COFs can be experimentally observed through powder XRD measurements.To establish a connection between simulation results and experiments, we predict the diffraction patterns of COFs in different stacking modes.The averaged diffraction patterns of zigzag and inclined COFs were simulated by sampling the trajectory every 5 ps.We compare these to the eclipsed and inclined structures obtained from static DFT relaxations.XRD patterns were simulated using pymatgen. 55here are significant differences in the simulated XRD patterns of the zigzag and inclined structures (Fig. 8 and Fig. S14, ESI †).The zigzag structure shows a single sharp low-angle diffraction peak, whereas the inclined structure has a split low-angle peak.The peak splitting in the inclined structure is due to the reduced symmetry in slip-stacked structures and is consistent with previous simulated XRD patterns from DFT relaxed COFs. 14,56or both Tp-Azo 41 and DAAQ-TFP, 42 the simulated XRD of the zigzag structure shows a significantly improved fit with the experimental PXRD pattern compared to the inclined structure.Interestingly, the averaged zigzag XRD pattern and the XRD from the eclipsed static DFT relaxation look similar up to high angles (subtle differences are observed for the (001) reflection peaks in Fig. S13 and S14, ESI †).Many studies have concluded that the stacking of COFs is eclipsed based on good agreement between the PXRD data and the simulated eclipsed diffraction patterns.However, our work demonstrates that comparisons of experimental diffraction patterns with simulated patterns from static DFT are not sufficient to unambiguously assign the stacking mode.Indeed, the instability of the eclipsed stacking in our simulations combined with the good agreement of the zigzag structure with experimental XRD patterns, suggest that previous work may be incorrectly assigning the eclipsed configurations.

Mesoscopic dynamics
To further investigate the dynamic structures of the COFs on a mesoscopic length scale of over 15 nm, MD simulations using a classical force fields were performed (Fig. 9 and Fig. S16, ESI †).Lower accuracy is expected here as there has been no systemspecific tuning of the model parameters; however, the equilibrium structures obtained from the OPLS-AA force field show reasonable agreement with the MLFF (and underlying DFT) values (Table S2 and Fig. S15, ESI †).
Again we considered room temperature dynamics starting from the two initial stacking configurations.Qualitatively similar dynamic behaviour is observed.Importantly, the eclipsed COFs spontaneously disorder into the zigzag configuration at equilibrium, while the inclined COFs maintain their stacking mode in these large-scale MD simulations (Fig. 9, Fig. S16 and S17, ESI †).This analysis gives us confidence that our predictions with the MLFF are not limited by finite size-effects in the MD simulations.

Conclusions
To investigate the dynamic properties of the layer disorders in 2D COFs, we performed MD simulations of two prototypes of COFs: Tp-Azo and DAAQ-TFP, using on-the-fly MLFFs and the classical OPLS-AA force field.The small Bayesian errors  In the MLFF-MD simulations, the initially eclipsed configurations shift into a zigzag stacking sequence, while the inclined configurations remain in the inclined mode, showing that both are locally stable states at ambient conditions.However, due to the larger entropic contribution to the zigzag stacking, we expect this to be the favoured structure.The predicted diffraction patterns of the ''random'' zigzag mode from the MLFF exhibit significantly better agreement with the reported experimental PXRD than the (perfectly) eclipsed or inclined stacking patterns, highlighting the importance of sampling the dynamic structures.This questions the common assignment of eclipsed structures on the basis of low-quality PXRD patterns often reported for 2D COFs in the literature.The results from classical force field dynamics are qualitatively consistent MLFF-MD simulations, supporting that a zigzag stacking structure is realistic on longer lengths scales.
Overall, the results of the room temperature simulations indicate that the on-the-fly MLFF is a valuable tool for accurately describing the dynamic properties of 2D COFs and that the stacking structure of these materials is more complicated than previously reported.

Fig. 3
Fig. 3 Maximum Bayesian errors of the force for per atom for (a) eclipsed and (b) inclined Tp-Azo during the whole training process.The black dots indicate the steps where ab initio calculations were performed.The orange and red lines in each plot correspond to the threshold criterion used in the training of eclipsed and inclined Tp-Azo, respectively.The dips in the threshold curves correspond to restarts of the training runs, where the threshold was reset to its default value.The three sets of temperatures and times indicated on the plot correspond to the training, retraining, and heating runs described in the methodology.

Fig. 4
Fig. 4 Evolution of the lattice parameters of (a) a and b, (b) c, (c) a and b, (d) g in initially eclipsed and inclined Tp-Azo during the MD trajectory at T = 300 K.The x-axis represents the time in picoseconds.The y-axis represents the lattice parameters in Angstroms or degrees.'IE' and 'I' correspond to initially eclipsed and inclined configurations, respectively.In the simulation of MLFF-MD, the initially eclipsed COF distorted into zigzag stacking.

Fig. 5
Fig. 5 COF structures obtained by relaxing the final structure at the end of the MD simulations using the MLFF for (a) zigzag Tp-Azo, (b) inclined Tp-Azo.The structures are viewed along the c direction (top panel) and ab plane (bottom panel).The layers are numbered from 1 to 8.

Fig. 6
Fig. 6 Offsets in the x, y, z directions between adjacent layers in (a) zigzag and (b) inclined Tp-Azo.The layer numbers correspond to the numbers in Fig. 5.

Fig. 7
Fig. 7 RDF of O-O bonds in initially eclipsed, inclined and zigzag Tp-Azo.The orange and red dotted lines represent the RDF of the initial eclipsed and inclined Tp-Azo at the beginning of the MD simulation.The blue and green solid lines are the average RDF in the trajectory after the structures have stabilized.

Fig. 8
Fig. 8 Simulated average diffraction patterns of (a) zigzag Tp-Azo, (b) inclined Tp-Azo were generated by selecting structures every 5 ps in the MD trajectory.(a and b) Cover the 2y range of 0-301.The bottom diffraction pattern in each plot is from the initial structure of each COF.