Systematic exploration of accessible topologies of cage molecules via minimalistic models

Cages are macrocyclic structures with an intrinsic internal cavity that support applications in separations, sensing and catalysis. These materials can be synthesised via self-assembly of organic or metal–organic building blocks. Their bottom-up synthesis and the diversity in building block chemistry allows for fine-tuning of their shape and properties towards a target property. However, it is not straightforward to predict the outcome of self-assembly, and, thus, the structures that are practically accessible during synthesis. Indeed, such a prediction becomes more difficult as problems related to the flexibility of the building blocks or increased combinatorics lead to a higher level of complexity and increased computational costs. Molecular models, and their coarse-graining into simplified representations, may be very useful to this end. Here, we develop a minimalistic toy model of cage-like molecules to explore the stable space of different cage topologies based on a few fundamental geometric building block parameters. Our results capture, despite the simplifications of the model, known geometrical design rules in synthetic cage molecules and uncover the role of building block coordination number and flexibility on the stability of cage topologies. This leads to a large-scale and systematic exploration of design principles, generating data that we expect could be analysed through expandable approaches towards the rational design of self-assembled porous architectures.

Table S1: Definition of topologies studied in this work and stoichiometry (in terms of the number of constituent building blocks).* suggests a modification on the named stk class was performed or that the topology was defined from scratch for this paper; these definitions are in the code provided with this paper.topology stk S2    The role of the connectivity parameter is to handle the appropriate definition of angular terms in the four-connected nodes, where target angles between neighbouring beads in the pyramid are 0, but the target angle between beads on the opposing side of the central atom, opposite, is defined as . [S1] This definition removes the possibility of rearrangements of the bonding pattern on these building blocks.
In this work, we implement an artificial torsion (technically defined over five atoms) that mimics a form of rigidity relevant to cage molecules; all other torsions are ignored.The torsion of interest is between  beads (where the connectivity is , i.e., the  bead is ignored), which defines the alignment of the binding atoms of the ditopic linkers.This torsion is set to 0 ∘ (using a phase offset, 0 = 180 ∘ ), or it is off, allowing free rotation.

S3. Optimisation sequence
The optimisation sequence is as follows 1. Starting from geometry-optimised building block structures, we construct the cage with stk.
2. A constrained geometry optimisation is performed for 20 steps, where all bond and angle potential terms are softened by a factor of 10, all torsions are switched off, and constraints are applied to all atoms not part of bonds formed by stk.
3. A geometry optimisation is performed with the full force field applied and no constraints.
4. A series of model conformers are generated, where all instances of a single bead type are displaced away from the model's centroid by 1 Å, 2 Å, 3 Å or 4 Å.Each generated conformer is then geometry optimised.This is repeated for each distinct bead type.
5. Lowest energy structures of models with adjacent input parameters (e.g., the previous target ditopic internal angle) are collated and geometry optimised with the new force field parameters.
6.A MD simulation is performed starting from the lowest energy conformer accessed so far, where the bond and angle terms are softened by a factor of 10 and all torsions are switched off, allowing for conformer-space exploration.The simulation is performed in the NVT ensemble with Langevin dynamics for 20000 steps (collecting a conformer every 500 steps, 40 in total) with a friction coefficient of 1 ps −1 and a time step of 0.5 fs at 300 K.All bead masses were set to 10 a m u , which are small for the average building block used in realistic cage systems.Every extracted conformer is then geometry optimised with the full force field.
7. The lowest energy conformer accessed in this process is saved as the final conformer of the model.
The sequence above was developed through trial and error to maximise how consistently the lowest energy conformer was found over our cage space.We believe that the ad-hoc force field applied in this work has led to a difficult-to-traverse potential energy surface, which we have overcome through multiple approaches (e.g., applying bead translations, the soft MD step).A more robust global optimisation sequence is our goal for future work.
We show in Fig. S2 that the energies extracted from two distinct runs of this workflow for three topologies (Tri

S4. Geometry validation and comparing model input to output
In this section, we compare the input and output geometrical properties of our models.The differences between observed and target bond lengths are much smaller than those for angles.Fig. S7 shows that the angles in the tetratopic building blocks tend to be larger than their targets, while the angles in the tritopic building blocks show a mixture of larger and smaller than their targets.Additionally, Fig. S3, Fig. S4 and Fig. S7 show that larger topologies will allow for smaller deviations in the cage angles, where the strain can spread throughout the structure.Fig. S9 clearly shows the effect of the torsion restriction, where the restriction leads to higher energy structures in most cases (Fig. S10).Although there is still a preference for torsions near 180 ∘ when the torsion restriction is off.Fig. S11 shows the observed versus target angles for the  angle in the ditopic ligands.This angle corresponds to one side of the ditopic ligand, which should be symmetrical if the structure is unstrained; the target bite-angle of the ditopic ligand is 2(  − 90), which is not well-defined if the  torsion is far from 0 ∘ .This data shows that the width of the deviations decreases without the torsion restriction, implying improved structural relaxation.

61
In this section, we go through potential limitations to our approach and how those relate to application to real systems.Firstly, 62 we acknowledge that structures in our model may not have any experimental translation, e.g., the inside-out structures, like Ultimately, this model's simplicity results in a likelihood of producing nonsensical structures when there is a lot of strain in the system.This could also result in false positives (regarding stability).We find that there is one case where the torsion-restricted case is lower energy than the unrestricted case, which goes against our expectations because the models can alleviate strain easier without restriction.Looking further into this, Fig. S12 shows the above example and other selected examples from test runs of the generation algorithm (for comparison) where the unrestricted case is higher energy than the restricted case.We propose that the issue is mostly that the geometry optimisation procedure has failed to find the lowest energy minimum in the unrestricted case.This is evident for the Tet 12 Di 24 case, where protrusions are shown (arrow in Fig. S12).While we have aimed to be as general as possible, we must consider the bias in our model that stems from what is present in the literature.For example, the input, output and design of stk depends on what exists in the literature.Hence, the initial structures of topology graphs will match the geometries of building blocks similar to what has been studied before, starting lower in energy than those that deviate from the known cases.This could be overcome by using random coordinates or a distance-geometry method when applying stk topology graphs, which we aimed to recreate through the soft-potential MD step.
Because of the low-cost nature of minimalistic models, it is possible to generate many, possibly spurious, structures and explore the configuration space around them.These models could also improve the initial guesses within stk, by modifying the ideal guesses toward one matching the building block geometries.Ultimately, this model allows us to explore beyond our relatively narrow explored chemical space.
One important limitation in this work is the resolution between changes in bond and angle parameters.This raises the question of whether the resolution we have implemented here (e.g., changing the  angle in increments of 5 ∘ ) is sufficient to find a continuous phase space, or if there will be stability cliffs.Further to this, if sharp changes in stability are present in the phase space, are they topological effects that can be taken advantage of in design?These are questions we aim to explore with more focused studies.

S6. Ditopic internal angle relationships.
This section details the ditopic internal angle relationships for each topology.The structures on the right in the following figures were selected to highlight interesting relationships.The circle markers in the left-hand figure map to the structures on the right.
In summary, we see: • Without torsions, stable structures are found with a wide range of internal angles due to the ability to twist the ditopic ligand and form, for example, helical structures.
• The angles in the tritopic/tetratopic building block can greatly modify the shape of the pore in stable structures.
• Many stable structures without torsion restriction are visually collapsed or have internal facing building blocks.
• Stable structures without torsion restrictions can be visually more dynamic (Fig. S17).
• For some topologies, the effect of torsion restriction on the geometry of the stable structure is less significant (Fig. S22).

S9. Distributions of cage energetics
Here we collate information about the energy distributions in our cage data set.Firstly, Fig. S39 shows the structures of the lowest (or one of the lowest) and highest  b structures for each topology.This figure highlights that there is not a dramatic change in structure for most topologies, but simply the presence of extreme strain between beads.This is further supported by the dominant nature of the angular terms in the component energy distributions (Fig. S41).Fig. S40 shows the distribution of energies for each topology, and shows that strain can spread out in larger structures, where the distribution of energies tends to be lower.

S10. Self-sorting behaviour and accessible topology maps
Self-sorting, as we define in this work, is the thermodynamic selection of a single species over all others that can possibly form during the self-assembly pathway of a set of building blocks.This corresponds to the experimental goal of performing self-assembly and realising the clean formation of a single species after characterisation.Here, we look, very approximately, at the topological effects on self-sorting behaviour.We approximate three outcomes: i) "selected" with only one stable topology, ii) "mixed" with more than one stable topology, and iii) "unstable" with no stable topologies for a given point in the accessible topology maps.
Firstly, Fig. S42 shows that the percentage of selected topologies as a function of threshold energy depends on the tritopic or tetratopic pyramid angle.There seems to be less selectivity with intermediate pyramid angles, than the extremes, which may be an outcome of the types of topologies studied.This can be related to the angle maps (Section S7), where for some topologies, smaller pyramid angles allow for larger ditopic internal angles to fit in the topology before the stability "drop" that occurs at larger ditopic angles.It is possible that this result leads to increased competition, hence decreased selectivity, for those angle combinations.The balance between achieving stability and self-sorting is not trivial.Hence, predictive tools can be useful.Interestingly, the square planar (90 ∘ , tetratopic) systems show much higher percentages of selectivity than the trigonal planar (120 ∘ , tritopic) systems.Fig. S43 shows that the tetratopic-containing systems consistently have less "mixed" and more "unstable" isomers than the tritopic-containing isomers.Fig. S44 and Fig. S45 show the accessible topology maps with and without torsion restrictions for tritopic and tetratopic containing structures, respectively.We see a general area of instability at large ditopic angles and large pyramid angles (top right-hand corner of plots), matching the results in Section S7 and a shift to larger topologies from left-to-right.However, this shift depends on the pyramid angles and topology/cage flexibility.For example, there are also unstable points in the bottom left-hand corner when torsion restrictions are on, and in the tetratopic-containing maps (because the tritopic-containing maps include a very flexible topology, Tri 2 4 Di 6 , which is stable in most of this region).We also show the same data but only highlight the smallest stable topology at each point.It is possible that experimental conditions could be chosen to target or avoid this outcome.
the trigonal planar shape with three vertices.
Fig. S48, Fig. S49, Fig. S50 and Fig. S51 show the distribution of shape measures over all cages for each topology.Fig. S59 shows the pore sizes as a function cage size, building block properties and cage structural properties after optimisation with and without restricted torsions.The structures shown are per building block pair, where the smallest stable structure is shown if it is not expected to form a mixture.Fig. S59(a) and (d) show the relationship between pore size and cage stoichiometry (topology) and size, respectively.Interestingly, some topologies show a far wider range of pore sizes in their stable structures.For example, with ten building blocks, the Tri 4 Di 6 and Tri 2 4 Di 6 topologies show a wide range of pore sizes (similarly for the Tri 6 Di 9 topology).This suggests the chance for more tunability in properties but also may suggest more difficulties in their synthesis due to configurational flexibility.Fig. S59(b) shows that the connectivity of the building blocks has a small impact on the distribution of pore sizes, where having four-connected nodes increases the median pore size.However, it is not clear if this is a result of accessible configurations or the topologies available (tetratopic-containing topologies include the Tet 12 Di 24 topology (36 building blocks) while tritopic-containing topologies stop at Tri 8 Di 12 (20 building blocks)).Fig. S59(e) shows that cages with restricted torsions (black) deviate from the correlation between cage size (g) and pore size, which is observed to be stronger for the unrestricted case (pink).Most interestingly is that Fig. S59(c) and (f) show that the angle in the building block with the largest coordination number has a less strong relationship with pore size than the target ditopic internal angle, where target internal angles closer to 180 ∘ lead to larger pore sizes.Indeed there seems to be a switch between two regimes around 140 ∘ in (f).This follows the finding that larger internal angles tend to favour larger topologies.

Fig. S2 :
Fig. S2: Parity of  b values for a selection of cage topologies from two distinct runs (run on the same computer).Only cages with  b > 0.01 kJ mol −1 for both runs are shown, and only cages that have a change in  b > 0.01 kJ mol −1 between the two runs are shown in colour.Both axes are on a log scale.The horizontal and vertical grey lines show the energy threshold for stability of  b = 0.3 kJ mol −1 .

Fig. S5 :
Fig. S5: Distributions of the observed  bond lengths for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.The target value is 1 Å.

Fig. S10 :Fig. S11 :
Fig. S10: Distributions of the differences in  b with and without the restricted torsion for all topologies, except Tet 6 Tri 8 .

Fig. S12 :
Fig. S12: Pairs of (left) torsion-restricted and (right) unrestricted cage models for cases where the unrestricted case has higher energy. b is given above each cage structure, blue if stable, pink if unstable.Topology, target bite angle and pyramid angles are given under each pair.The first pair (Tri 6 Di 9 :170:50) are from this production run, while the other three are from unshown test runs.The arrow highlights protrusion in a high-energy state.Green are the  beads, orange are the  beads, cyan are the  beads, black are the  beads, and grey are the  beads.

Fig. S38 :
Fig. S38: (a) Energy of structures over tritopic angle range for each tetratopic angle.(b) Low energy structure examples for each row.Orange are the  beads, cyan are the  beads, and black are the  beads.
Fig. S39: Structures of the (top) minimum and (bottom) maximum  b cage for each topology (with torsion restrictions on, if applicable). b is shown above each structure in kJ mol −1 .Green are the  beads, orange are the  beads, cyan are the  beads, black are the  beads, and grey are the  beads.
Fig. S41:From top-to-bottom, distribution of energy components of bond, angle, torsion and nonbonded terms for all topologies (excluding Tet 6 Tri 8 ), with and without torsion restriction.Note the changes in the y-axis scales.

Fig. S42 :
Fig. S42: The percentage of selected topologies for all building block pairs (with torsion restrictions on) separated by the angle in the tritopic or tetratopic building block as a function of the threshold for stability.This analysis is only applied to Trip m Di n and Tetp m Di n topologies.

Fig. S43 :
Fig. S43:The percentage of mixed (dashed lines) and unstable (solid lines) topologies for all building block pairs with restrictions as a function of the threshold for stability.This analysis is only applied to Trip m Di n and Tetp m Di n topologies.

Fig. S52 ,
Fig. S52, Fig. S53, Fig. S54 and Fig. S55 show the map of building block angles to stability and ideal shape difference.

Fig. S58 :
Fig. S58:Distributions of properties for all topologies (excluding Tet 6 Tri 8 ) with and without torsion restrictions.In order from top-to-bottom: g/, pore size/g, and pore size/.

Table S2 :
The connectivity and parameter ranges of beads used in this work.

4 Di 6 , Tri 2 4 Di 6 and Tri 8 Di 12
) are mostly equivalent.However, there are some examples of changes in energy larger than 0.01 kJ mol −1 (highlighted).Most of these examples are for the larger Tri 8 Di 12 topology, suggesting that this sequence is less reliable as more degrees of freedom are present.There are few examples where cage stability (determined by a threshold of  b = 0.3 kJ mol −1 )

Fig. S3: Distributions of the
difference between observed and target  angles for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.Fig. S4: Distributions of the observed  angles for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.The target value is 180 ∘ .
Distributions of the observed  torsion angles for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.The target, when applied, is 0 ∘ .
Fig. S6: Distributions of the difference between observed and target  bond lengths in the ditopic building blocks for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.nbnor mbm angle: not restrictedFig. S7:Distributions of the difference between observed and target  (tritopic) or  (tetratopic) angles for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.Fig. S8: Distributions of the difference between observed and target  (tritopic) or  (tetratopic) bond lengths for all topologies, except Tet 6 Tri 8 , (left) with and (right) without torsion restrictions.
Distribution of  b for all topologies (excluding Tet 6 Tri 8 ), with and without torsion restriction.

Table S3 :
Definition of topologies studied in this work and their associated reference, ideal shapes.Percentage of cages with  < 2 of all cages for topologies where shape was measured, with and without restricted torsion.Plots show either the tritopic or tetratopic building block shape measures (tetratopic for Tet 6 Tri 8 topology).
Fig. S46: Fig. S57: Distributions of properties for all topologies (excluding Tet 6 Tri 8 ) with and without torsion restrictions.In order from top-to-bottom: pore size, maximum diameter  and g.