Coarse-grained modelling to predict the packing of porous organic cages

How molecules pack has vital ramifications for their applications as functional molecular materials. Small changes in a molecule's functionality can lead to large, non-intuitive, changes in their global solid-state packing, resulting in difficulty in targeted design. Predicting the crystal structure of organic molecules from only their molecular structure is a well-known problem plaguing crystal engineering. Although relevant to the properties of many organic molecules, the packing behaviour of modular porous materials, such as porous organic cages (POCs), greatly impacts the properties of the material. We present a novel way of predicting the solid-state phase behaviour of POCs by using a simplistic model containing the dominant degrees of freedom driving crystalline phase formation. We employ coarse-grained simulations to systematically study how chemical functionality of pseudo-octahedral cages can be used to manipulate the solid-state phase formation of POCs. Our results support those of experimentally reported structures, showing that for cages which pack via their windows forming a porous network, only one phase is formed, whereas when cages pack via their windows and arenes, the phase behaviour is more complex. While presenting a lower computational cost route for predicting molecular crystal packing, coarse-grained models also allow for the development of design rules which we start to formulate through our results.

1 Parameters for the coarse-grained model  Table S3: Values of the o↵set value for the torsional modulation term used in each simulation.
Packing type O↵set angle window-to-window ⇡ 2 window-to-arene 0 S-3 Table S4: Cluster sizes for phase determination in the window-to-window simulations.
We note that due to the very directional interactions of the patches in the 0.1 simulations, two clusters had formed. Thoroughout the simulation these clusters were constantly combining and breaking apart from each other, leading to the variation in the cluster size.
S-4  Figure S1: (a) Representative cluster of the amorphous phase formed in window-to-window simulations when ang 0.7. Here the cluster is taken from the low temperature simulation when ang = 0.9. (b) Cross section of the plastic phase formed at low temperatures when ang = 1.0, evidencing orientational disorder. CIF files for both these phases are provided in the ESI.

Structural determination
Clusters formed from HOOMD-Blue simulations were transformed onto clusters of cages using custom made code. This code transformed for the distance between neighbouring octahedra from ⇡ 0.6Å as in the simulations, to ⇡ 20Å between the cages for easy visual examination of the structure. A second custom made code then picked out a unit cell of repeating cages ( Fig. S2(a)). As the unit cell contained orientations of the cages which were slightly disordered due to the nature of the simulations, the structure was coarse-grained in order to use traditional symmetry solving algorithms. The coarse graining process worked by creating a structure with only the central position of the cage and the position of three of the carbons of the arene, in order to preserve the symmetry of the cage ( Fig. S2(ac)). Using this coarse-grained structure, FINDSYM S1,S2 was employed to determine the structures space group. The same transformation as invoked by FINDSYM to the coarsegrained unit cell, was then applied to the original unit cell ( Fig. S2(e)). Using the solved space group, the symmetry conditions were then applied to the unit cell, and the sites that S-6 related to the same atomic position were merged to get the full structure of the cage under symmetry conditions. This process worked best for cages with a small patch width, but for some of the ordered phases at large patch widths for the window-to-arene simulations, the orientation of the coarse-grained structures were still too disordered to solve using this process, i.e. FINDSYMM was unable to find an ordered structure even with the coarsegrained configuration. In these cases, the clusters were instead compared visually to known solid-state structures of octahedral porous organic cages. An overview of which of the space groups of the window-to-arene structures were solved using the coarse-graining process and which were solved visually is given in Table S6.   Figure S3: Comparison of the simulated phases of the window-to-arene simulations (top) to their corresponding experimental phases (bottom) as solved visually. (a) ang = 0.6/CC9, the experimental structure is derived from CC9 with the vertex functional groups (phenyl groups) removed for easy comparison. (b) ang = 0.7/CC1 .

S-9
5 Phase diagram determination The phase diagrams for both the window-to-window and window-to-arene simulations were coloured using a continuous colour map based on their similarities to the RDF of the solved ordered structures at low temperature Fig. S6. For phases at small patch widths, the structures are relatively ordered and as such the RDFs were compared to the RDFs of the ordered structure calculated with 40 bins, i.e. a high resolution RDF ( Fig. S6(a),(c)). On the other hand, the phases at large ang are inherently more disordered, and as such so are their RDFs Fig. S4,S5. Therefore, the similarities between the simulated and solved structure was compared using an RDF with a fewer number of bins (25) i.e. an RDF lower resolution.
The comparisons between the RDFs were made by using a time series analysis, dynamic time warping, an algorithm commonly used for measuring the similarity between two temporal sequences. This analysis has been used to compare the di↵erence between two PXRDs before. S3 In this study, we used it to compare RDFs by first normalising the RDFs such that the largest value of g(r) = 1, and then calculating the dynamic time warping. The values returned by the algorithim were then normalised for both the window-to-window and window-to-arene simulation, which gave a similarity measure between 0-1 for each ordered RDF where 1 is the most similar and 0 the least. This similarity measure was then used to colour the phase diragrams. For the window-to-window simulation, given there was only one phase deemed to have both orientational and translational order, the phase diagram was coloured orange by the similarity measure to said phase ( Fig. S6(a)). For the window-toarene simulation, as there were four separate ordered phases, the similarites of the RDFs for the most disparate three were used to colour the phase diagram red, green, and blue ( Fig. S6(b-d)).

Predictive capabilities
To determine whether atomistic calculations can be used to determine the likely values of ang for known cages, we looked at how the energies of two cages slipping over one another changes as a function of position and distance. For this we focused on the energetics of CC9 and CC1 due to them both having solid-state structures which exist within the same simulations; window-to-arene. We first fully optimised both CC9 and CC1 in the mixed Gaussian and plane waves code CP2K/QUICKSTEP S4 with the PBE functional, S5 GTH-type pseudopotential, S6 TZVP-MOLOPT basis sets, S7 using a cuto↵ of 400 Ry for the plane wave grid and the Grimme-D3 dispersion correction. S8 With the optimised cages, we than ran single point calculations using the OPLS3 force field S9 as implemented using Schrödinger S10 on two cages with the same orientation which were displaced next to eachother along the h111i axis ( Fig. S10(a)). To determine the preferred orientation and displaecment between the two moleclues, we rotated one of the cages around the h111i octant at di↵erent distances from the first cage ( Fig. S10(b)). To monitor how the energy of the dimers changed with position, we translated one of the cages around the h111i octant at di↵erent distances from the first cage ( Fig. S10textcolorblue(c)). For these calculations we constrained the vertices of the cages so as to ensure we were sampling the energy of the rigid cages textcolorbluerotating/slipping, rather than possible conformational e↵ects. We then measured the interaction energy between the cages in each pair by calculating: For the rotation calculations, we plotted all points on a graph where E int < 0 (Fig. S11).
These showed that the bulkier side groups considered here would not e↵ect LJ and as such our octahedral shapes can be considered a crude representative of all the cages. For the slipping calculations, we then plotted all the energies where (E int E min int )  30 kJmol 1 where E min int is the minimum interaction energy between two cages (Fig. S12). We only looked S-17 at energies within 30 kJmol 1 as we deemed that energies higher than 30 kJmol 1 would be energetically unfavourable. These results showed that CC1 has more freedom in the slipping motion than CC9 as there are positions o↵ the central h111i axis which are within the 30 kJmol 1 limit for CC1, whereas CC9 only has on axis low energy configurations. This helps explain why CC9 is found at a lower value of ang than CC1.
S-18 The left hand side is looking down the h111i axis and the right hand side is a cross section containing the lowest energy position looking down the z axis. Points are shown when the energy of the cages slipping at each position is  30 kJmol 1 . The colour of the plot is how close to the minimum the energy is, where the closer to the minimum energy it is, the darker green it is, and the closer to the limit it is, the more yellow it is. S-21