Ultrahigh binding aﬃnity of a hydrocarbon guest inside cucurbit[7]uril enhanced by strong host–guest charge matching

Cucurbit[7]uril (CB[7]) is an artificial macrocyclic molecule that can form exceptionally strong host–guest complexes with binding constants higher than that of the biotin–avidin complex. Despite notable experimental efforts, there do not exist large-scale computational investigations on finding strongly binding guests of CB[7]. Herein, we develop a computational approach based on large-scale molecular modelling to predict strongly binding hydrocarbon motifs. Our results indicate that an expanded cubane (PubChem ID 101402794) will be the most strongly binding hydrocarbon guest of CB[7] among the hundreds of thousands of hydrocarbons in the PubChem database, achieving a binding affinity significantly stronger than those reported in preceding experimental studies. Our findings highlight the important role of charge complementarity in the form of quadrupole electrostatic interactions in enabling the ultrahigh binding affinity of nonpolar guest molecules with CB[7], in addition to other known contributions such as van der Waals interactions and high-energy water release.


Introduction
Cucurbit[n]urils (CB[n], n = 5-8, 10, and 14) are a family of artificial organic macrocycles that have gained increasing attention in recent years, owing to their unique aqueous host-guest chemistry. CB[n] have a rigid molecular construction consisting of a hydrophobic cavity and two symmetric, electronrich carbonyl portals. They can form host-guest complexes with a range of small guest molecules, showing potential applications in drug delivery, 1 sensing, 2 and responsive nanomaterials, 3 as well as catalysis in solution 4 and in the gas phase. 5 Notably, CBs are known to form highly stable noncovalent complexes compared to other macrocycles such as cyclodextrins, calixarenes and pillarenes. 6 In particular, CB [7] and a congressane derivative can form complexes whose binding constant (K a = 15.3 Â 10 15 M À1 ) exceeds that of the well-known biotinavidin complex, one of the strongest noncovalent bonds found in nature. 7 A central quantity that lies at the heart of the chemical properties of CB[n] is their binding free energy DG tot bind with a specific guest candidate. The quantity DG tot bind indicates whether a guest@CB[n] complex is stable as well as its relative stability compared to the complexes formed with other competing molecules. In this context, a priori knowledge of DG tot bind across a range of host-guest complexes would provide useful insights for designing and engineering CB-based supramolecular systems and stimuli-responsive materials. They include supramolecular polymer networks, 8 noncovalent ligand immobilisation 9 and atypical adhesives, 10 all of which typically require strongly binding host-guest moieties for building robust molecular architectures.
Fast and accurate evaluation of DG tot bind is a challenge often encountered in virtual screening in the field of drug discovery and design. The binding problem can be separated into two parts. First, the optimal conformation of the guest upon complexation with the host needs to be determined, and this step is referred to as docking. Then, the strength of the intermolecular interaction needs to be quantified in an operation called scoring. In practice, some docking algorithms modify the guest's conformation and evaluate on-the-fly its scoring function in iterative steps until a best fit is found.
A method combining high speed and accuracy together with little need for oversight is however less common, and a trade-off between these desired attributes is often required. Computational methods for estimating DG tot bind include attach-pull-release, thermodynamical integration, metadynamics and computation of configurational integrals, all of which have been reviewed elsewhere. 11 Additional techniques have been used to compute host-guest free binding energies in the SAMPL challenges and in particular for CB [7] in the SAMPL4 12 challenge. Nevertheless, there do not exist large-scale computational investigations on finding strongly binding guests of CBs in the literature.
Herein, we use a classical model based on force fields to screen a large quantity of molecules to predict the most strongly binding hydrocarbon frameworks for CB [7]. Unrecorded strongly binding frameworks are predicted in addition to several derivatives of known high affinity guests such as adamantane, congressane and bicyclo [2.2.2]octane. Furthermore, ab initio calculations on the top ranked host-guest complexes predicted from the forcefield methods reveal an expanded cubane to be the most strongly binding hydrocarbon guest for CB [7] with a binding affinity significantly outranking those of all known neutral hydrocarbon frameworks. 6 Energy decomposition analysis reveals that the exceptionally strong binding can be attributed to host-guest charge complementarity in the form of electrostatic quadrupole interactions, which provide an additional boost to the binding affinity on top of the classical van der Waals forces. Notably, the strength of the quadrupole interaction (À42.82 kcal mol À1 ) within the champion complex reaches that of the dispersion component (À53.05 kcal mol À1 ), which is very rare among neutral nonpolar host-guest systems and is unprecedented for CB complexes, to the best of our knowledge.

Method
The overall procedure followed throughout this article is illustrated in Fig. 1. From a skimmed pool of hydrocarbon molecules in Pubchem (Fig. 1A), 200 conformations are generated for each candidate and minimised using MMFF94. 13 These conformations are then used to assess the flexibility of the molecules. Only rigid ones are passed to the next stage of the pipeline and other candidates are discarded (Fig. 1B). These molecules are then docked inside CB [7] using AutoDock Vina 1.1.2 (Fig. 1C). For each guest/guest@CB [7] couple, the vacuum binding enthalpy DU tot VdW + DU tot Coul + DU tot val , the configurational entropy change upon binding ÀTDS tot cfg as well as the polar DW tot elec and nonpolar DW tot np solvent contributions to binding are estimated using the Generalised Amber Force Field (GAFF), the Rigid Rotor Harmonic Oscillator (RRHO) approximation and the generalised born implicit solvent and surface area (GBSA) method, respectively (Fig. 1D). Subsequently, the molecules are ranked according to DG tot bind as given by eqn (2). Top ranked candidates are then manually inspected and unphysical molecules are discarded. Finally, the binding affinity of the top 50 molecules as ranked using the force-field based method is evaluated using a density functional theory (DFT) method (Fig. 1E). Individual contributions to binding energy in a vacuum of 10 of the most promising molecules are further characterised using symmetry adapted perturbation theory (SAPT).

Computational details
All screening operations are performed using the force field MMFF94 as implemented in the RDKit. 14 The conformers are converged using at most 100 000 steps and 10 À9 kcal mol À1 as the tolerance criterion on the energy. Molecules that could not be generated using the algorithm are put aside and are not considered further. Molecular volumes are computed using the RDKit.
Docking is performed using AutoDock Vina 1.1.2 using the maximum level of exhaustiveness (8). Up to 9 docked poses are generated for each guest. CB [7] is centred at (0, 0, 0) and the search space is centred at (0, 0, 0) with a volume of 24 Â 24 Â 24 Å 3 . All other parameters are left to default. Ligand.pdbqt files are generated using the AutoDockTools script prepare_ligand4.py using the Gasteiger charges produced by the same script. The CB [7].pdbqt file is produced using the AutoDockTools GUI.
The contributions to the free energy of binding DG tot bind À Á from van der Waals energy (DU VdW ), Coulomb energy (DU Coul ) and valence energy (DU val ) are computed using the General Amber Force Field (GAFF) 15 within AMBER 16. The GAFF is expected to perform well in describing noncovalently bound complexes. 16 The GAFF parameters are generated using the antechamber from AmberTools16 using the Gasteiger charges. The geometries are subsequently minimised to 10 À3 kcal mol À1 using the GBSA implicit solvent model and the conjugated gradient algorithm and to 10 À8 kcal mol À1 using the Newton-Raphson algorithm. The generalised Born part is evaluated using the Hawkins, Cramer, and Truhlar's form of pairwise generalised Born model for solvation as implemented in Amber16 17 with solute and solvent dielectric constants of, respectively, 1 and 78.5. A nonpolar contribution arising from the hydrophobic effect is added as g*SASA, where g = 0.005 kcal mol À1 Å À2 is the surface tension of water and SASA is the solvent accessible surface area as approximated using linear combinations of pairwise overlaps 18 as implemented in NAB.
The final energy is used in the ranking and its associated geometry is used to estimate the molecular entropy using the Fig. 1 Outline of the workflow followed in this article. First, from a list of potential hydrocarbons prefiltered based on size and the absence of reactive functional groups (A), guest candidates are labelled as rigid or flexible, based on their conformational distribution (B). Flexible guests are discarded while rigid guests are docked into CB [7] (C) and their structures are minimised using a force field. The free energy of binding is then approximated taking into account enthalpic, entropic and solvation effects. Molecules are ranked against the other candidates based on their free energy of binding with CB [7] (D). Finally, a selection of the top ranked guests is studied using DFT to obtain a high-quality estimate of the binding energy (E). A specific version of an expanded cubane is expected to bind most strongly to CB [7] (F).
RRHO model within Amber16 via the NAB interface. 19 The configurational entropy values at 298.15 K are taken as supplied by AmberTools16 and include the translational, rotational and vibrational components of molecular entropy. It is known that a variable number of degrees of freedom, in general, less than 6 including 3 degrees of freedom in translation and 3 in rotation, are lost upon binding. 20 Due to the difficulty of systematically estimating the number of degrees of freedom lost upon binding, Amber16 is trusted to select the modes relevant to the estimation of the molecular entropy. The solvent entropy, on the other hand, is mostly contained in the non-polar surface area term of the GBSA evaluation. Indeed, the hydrophobic effect that is evaluated with the non-polar term is mostly entropic in nature.
The endo and exo complexes are distinguished based on two parameters, viz. the distance between the centroid of CB [7] and the centroid of the guest molecule, and the number of atoms of the guest molecule lying within the CB [7] cavity. Hydrogen atoms are not included in the computation of the centroid coordinates. The complex is considered endo if the distance between centroids is less than 4 Å and the number of guest atoms within the CB [7] cavity is greater than or equal to 6.
The distance between centroids is computed using the built-in feature within the RDKit. A heavy atom of the guest is considered to be within the CB [7] cavity if it lies within the convex hull of the heavy atoms of CB [7]. Here, the Delaunay triangulation of CB [7]'s atomic positions is first computed using the SciPy library in Python that uses the qhull algorithm. 21 Then, the find_simplex method is used to locate the simplices containing each of the guest's atom. If the procedure is unsuccessful for a given atom, it is assumed to lie outside of CB [7]'s cavity.
Further geometry optimisations and binding energy evaluation of the top 50 ranked host-guest complexes, from the above force-field model, are performed at the oB97XD/6-31G* level of theory in a vacuum using the Gaussian16 software package 22 and cross-validated using the Spartan 16/18 parallel suite. The dispersion-corrected DFT functional oB97XD 23 is chosen to accurately estimate the van der Waals interactions, which are expected to contribute greatly to the stability of these complexes.
Decomposition of interaction energy into its constituent terms using symmetry-adapted perturbation theory (SAPT) analysis 24 is performed using PSI4 software 25 at the SAPT0 level using the jun-cc-pVDZ basis set. 26 SAPT is a perturbative approach that directly computes the interaction energy as a perturbation to the Hamiltonian of the individual monomers and provides a decomposition of the interaction energy into physically meaningful components of electrostatic, exchange, induction, and dispersion. SAPT0 is the simplest and most inexpensive SAPT method that essentially treats the monomers at the Hartree-Fock level and appends explicit dispersion terms obtained from second-order perturbation theory to the electrostatic, exchange, and induction terms from HF dimer treatment.
To obtain the binding enthalpy of a guest in water using an explicit solvent model, the guest, CB [7] and their complex are solvated in 1500 molecules of water using the TIP3P model and a periodic box. The system is first equilibrated to 298.15 K, then the box edges are adapted to equilibrate the pressure to 1 bar. The energy of each subsystem is taken as the time average of the potential energy of the system over a 500 ps simulation. The GAFF is used for the force parameters of CB [7] and the guest. The binding energy is estimated as the difference between the solvated complex and the solvent molecules alone with solvated CB [7] and the solvated guest.

Results and discussion
Among the homologues of CB[n], CB [7] is chosen in this study because of its relatively high water solubility and larger inner cavity that allows the encapsulation of a wide variety of guests. Meanwhile, CB [7] is known to form exceptionally strong hostguest complexes owing to a combination of classical and nonclassical hydrophobic effects. 27 It is interesting, and perhaps useful, to explore whether there exist other supramolecular forces that can further enhance the noncovalent interactions within CB [7] complexes, potentially leading to record-breaking binding constants.
On the other hand, hydrocarbons are chosen as a starting guest pool because of their chemical stability, ease of functionalisation and structural diversity. Their diverse molecular construction can maximise the chance of catching, using the proposed large-scale computational approach, a series of ideal candidates with best possible shape complementarity to the CB [7] cavity. This hypothesis is underpinned by the presence of a few experimentally proven high-affinity guests, such as adamantane derivatives, which are also based on hydrocarbon frameworks. Furthermore, the nonpolar nature of hydrocarbons can also reveal fundamental insights into van der Waals interactions in a CB [7]-guest complex, suggesting new design rules for better guests with exciting new applications.

Selection of candidates
The initial pool of candidate hydrocarbon guests is obtained from PubChem and contained 238 605 hydrocarbon molecules. A filter based on the number of heavy (i.e. carbon) atoms is applied and candidates containing 19 heavy atoms or fewer are retained, yielding a trimmed pool of 132 296 candidates. This prescreening step is justified by the fact that the cavity of CB [7] has total a volume of 242 Å 3 , which is smaller than the molecular volume of most molecules containing 19 heavy atoms (see the volume distribution shown in Fig. S1, ESI †). Although guests with a volume larger than the CB [7] cavity could still form a host-guest complex with parts sticking out of the cavity, the encapsulated parts can be well-represented by smaller molecular motifs already contained in the trimmed pool. Therefore, we confidently exclude guests that are too large to fit entirely inside the CB [7] cavity. Reactive hydrocarbons containing allenes (CQCQC), three membered rings and charged groups were removed, leaving 110 319 guest candidates for subsequent investigation.
We take advantage of the conformational diversity of the conformers generated by the RDKit and generate 200 conformers for each candidate from its SMILES string. The library of conformers generated, though arbitrary, is expected to sample exhaustively the conformational space of a molecule with fewer than 13 rotatable bonds. 28 All conformers are then converged using MMFF94. If a given 3D fingerprint computed for each conformer yields sufficiently close values, the molecule is labelled as rigid. In the present case, a molecule is labelled as rigid if the torsion fingerprint deviation 29 matrix of its 200 conformers has a single cluster within the Butina clustering algorithm 30 provided that a maximum deviation threshold low enough is used. Both operations are carried out using the RDKit implementation with a maximum deviation of 0.01 for the Butina clustering algorithm. It is assumed that if all the 200 conformers generated correspond to the same structure, then the molecule is labelled rigid.
We choose to restrict the search of strongly binding candidates to rigid molecules for two reasons. Firstly, many experimentally observed guests that bind tightly to CB [7] contain rigid hydrocarbon skeletons, such as bicyclo[2.2.2]octane, adamantane or congressane. 6 Rigid molecules are expected to possess higher normal mode frequencies and hence are less likely to endure a large entropic penalty upon binding. Secondly, confining the investigation to rigid molecules eliminates the need to consider multiple contributions of different conformers to the change in binding entropy and enthalpy, thereby greatly simplifying the estimation of the free energy of binding that can be computationally demanding even for small systems. 31 Determining a priori the rigidity of a molecule is not trivial. For example, an sp 3 carbon within a linear alkane can yield a rotatable bond and contribute to the molecular flexibility. On the other hand, an sp 3 carbon within a cyclic structure does not indicate the same degree of molecular flexibility. As graph based techniques could prove complex 32 with no guarantee of robustness, we choose a statistical method to estimate the rigidity of the small molecules. The method described above is able to distinguish the high flexibility of linear hexane from the limited flexibility of cyclohexane and the high rigidity of benzene. Indeed, it is expected that chair, seesaw and boat conformations will be encountered while enumerating conformers for cyclohexane, indicating its relative molecular flexibility. Cyclohexane is therefore discarded. On the other hand, only one conformer of benzene will be encountered resulting in its labelling as rigid. Moreover, flexible unsaturated molecules are appropriately labelled as flexible while polycyclic saturated structures such as cubane are labelled as rigid. After removing non-rigid molecules, a total of 8999 guest candidates remain and are subject to further study.

Binding affinity evaluation
Docking is initially performed using AutoDock Vina to obtain a range of starting configurations of each host-guest complex required to estimate the binding affinities. Different guest orientations within the cavity can strongly modulate the computed affinities. Since it is not known a priori which of the docked poses would yield the most negative force-field based DG tot bind , the free energy of CB [7] complexes are minimised and evaluated for all binding poses. No conformation analysis is required for the free guest molecules owing to the absence of a rotatable bond.
It is noted that, however, the binding affinity predicted by AutoDock Vina is somewhat unreliable for our host-guest system, especially for strongly binding guests. When benchmarking against experimental data (Fig. S2, ESI †), AutoDock Vina tends to underestimate the binding affinity over the entire range, with significant deviation in the slope of the linear fit and in absolute binding affinity for strongly binding guests. For instance, neither bicyclo[2.
with experimental binding affinities in water of respectively 13.44 kcal mol À1 , 14.23 kcal mol À1 and 17.34 kcal mol À1 , are predicted by AutoDock Vina to have a binding affinity exceeding 10 kcal mol À1 . The inability of the software to accurately predict the binding affinities for known strongly binding guests motivates us to employ a more refined force-field based approach for assessing the DG tot bind of all the docked poses generated by AutoDock Vina.
The free energy of binding DG tot bind can be decomposed generally in eqn (1) 33 and further broken down as shown in eqn (2): 34 where DU VdW corresponds to the van der Waals energy, DU Coul corresponds to the Coulomb energy, DU val corresponds to the valence energy (including bond stretches, torsions and intrinsic dihedral energy), DW np represents the non-polar solvation term, DW elec represents the electrostatic solvation term and ÀTDS cfg corresponds to the configurational entropy term.
The accuracy of the model has been evaluated by benchmarking a set of experimental binding constants collected from the literature 6 against their binding affinity predicted by using the present technique, as shown in Fig. S2 (ESI †). The benchmarking plot shows a Pearson correlation coefficient R 2 of 0.49 and a slope of 1.2 where a coefficient R 2 of 1 indicates an ideal linear correlation with the experimental data. These parameters are considered reasonable for the given level of theory. 34 Nevertheless, it should be noted that the binding affinity DG tot bind is overestimated. Indeed, the present procedure is known to overestimate the favourable contribution of DU VdW and underestimate the unfavourable contribution of À TDS cfg to the binding affinity DG tot bind , 34 but not expected to invalidate the consistency of the ranking.
The docked poses of the host-guest complexes can be separated into two groups, namely the endo complexes where the guest molecules lie inside the CB [7] cavity and exo complexes where the guest molecules significantly stick out. As mentioned in the Methods section, a complex is defined as ''endo'' if the centroid of the atomic coordinates of the guest is less than 4 Å away from the centroid of the atomic coordinates of CB [7] and if at least 6 of the guest's heavy atoms are located inside the cavity. Failure to meet one of these criteria results in the complex being labelled as ''exo''. It is observed that the guest molecules that exhibit high binding affinities are predominantly endo complexes. This is wellevident from Fig. 2, which plots the distribution of endo and exo complexes against their binding affinities with CB [7]. This finding can be rationalised by the maximisation of the van der Waals interactions of the guests with the CB [7] cavity, which is consistent with experimental and other computational studies. 4 The resultant, force-field based, DG tot bind values of the top 100 guests are plotted in Fig. 3, which shows that DU Coul + DU VdW + DU val , as computed by GAFF, together contribute the most to the binding affinity of the top ranked guests in the present model. The van der Waals contribution DU VdW is in turn the major contributor to the favourable enthalpic part of the binding affinity (Table S2, ESI †). The presence of the solvent is generally detrimental to binding due to the presence of a mostly positive contribution from DW elec . The entropic penalty to binding ÀTDS cfg as estimated here is relatively small compared to the enthalpy term. For the sake of convenient discussion, guests are labelled as Gx where x stands for their positions in the GAFF-based ranking.
The displacement of high-energy water, also known as the nonclassical hydrophobic effect, was previously shown to be the major contributor to the strong binding affinity of CB [7] and CB [n]. 27 This effect is the consequence of water molecules inside the CB[n] cavity having unsatisfied hydrogen bonds that are able to become satisfied once they leave the cavity and enter the bulk of the solvent. Such an effect is unlikely to be wellcaptured using an implicit solvent model such as GBSA, where the organisational constraints imposed on the solvent by the solute are mainly accounted for using the term DW np = g*SASA.
Indeed, DW np 's continuous description is likely inappropriate to account for the discrete nature of the water molecules that yields this effect. Nevertheless, we argue that for guests big enough to fill the CB [7] cavity, the enthalpic contribution to the removal of the high energy water, though big in absolute terms, should be relatively similar. Implicit solvent models have also been used to estimate host-guest binding affinities in the case of CB [7] with appreciable accuracy. 34 The GBSA method is selected here to estimate the solvation contributions DW elec and DW np due to its computational speed. As shown in Fig. 3, the contribution of DW elec and DW np to DG tot bind is rather small and shows little variation among the top guest candidates. This can be rationalised by the fact that all guests studied are uncharged hydrocarbons and hence generally hydrophobic. Therefore, the energy difference is insignificant when shifting the guest in a hydrophobic solvent pocket in the bulk to the hydrophobic cavity of CB [7]. The magnitude of DW elec and DW np is in reasonable agreement with previously reported values. 34 The contribution from the change in configurational entropy ÀTDS cfg in particular needs to be considered. In the case of the binding of guests with cyclodextrin 31 and CB [7], 34 it was shown that the unfavourable binding contribution of configurational entropy is of the same order as the favourable van der Waals contribution. Estimating the change in entropy upon binding is not trivial. For instance, in the SAMPL3 challenge 35 where diverse research groups attempted to predict the DG tot bind of a range of compounds, the techniques that directly attempted to model the entropic contribution (using the RRHO model for example) performed significantly worse than regressionbased approaches that accounted for it implicitly. Entropic contributions can be accounted for accurately using molecular dynamic techniques or end-point approaches similar to RRHO  View Article Online such as HA/MS, 36 at the expense of increased computation time. One known shortcoming of the RRHO approximation is that it is only exact for infinitesimally small deformations.
In the case of flat energy surfaces such as the case of freely rotatable bonds, marked anharmonicity can occur, yielding inaccurate results. 36 Since only rigid molecules are considered here, we expect the guests' potential energy surface to be well-described by a harmonic approximation. The entropic contribution to binding is defined as: ÀTDS tot cfg = ÀTDS complex cfg À (ÀTDS CB [7] cfg ) À (ÀTDS guest cfg ) As shown in Fig. 3, ÀTDS tot cfg (at 298.15 K) is relatively constant and of similar magnitude to that computed in other similar binding studies. 34 Occurrence of the so-called enthalpy-entropy compensation effect 37 was checked by plotting the entropic penalty TDS1 bind against the binding enthalpy DH tot bind , as shown in Fig. S3 (ESI †). It has been suggested that ultrahigh binding affinity in the case of CB [7] was achievable at least partially through the avoidance of a steep entropic penalty upon binding due to the high intrinsic rigidity of the macrocycle. 38 It is interesting to see in Fig. S3 (ESI †) that the highest ranked guests do not exhibit a severe enthalpy-entropy compensation effect, with a regression slope of only 0.686 instead of an expected value of 1 for an ideal manifestation of the effect. The relatively small slope indicates that a large increase in binding enthalpy is less than counterbalanced by the entropic penalty, which helps explain the overall large binding affinities. For the guests ranked between 51 and above (only guests 51 to 200 are displayed), the slope converges towards unity, indicating an increasingly strong enthalpy-entropic compensation effect. It is noteworthy that the enthalpy-entropy compensation effect reported herein arises from the direct computation of DH tot bind and DS tot bind , giving support to the hypothesis of a physical origin of the phenomenon rather than it being a statistical artefact. Indeed, unlike experimental evidence of the effect arising from computation of DG tot bind and DH tot bind where TDS1 bind is obtained a posteriori by subtraction, the present vacuum analysis is free from solvent effects and correlated errors. 39

Ranking
Ten selected guests from the 50 best candidates are shown in Fig. 4 and the references of each of the 100 best candidates are detailed in Table S1 (ESI †). It is noted that several carbon skeletons among the predicted best candidates are already known to bind strongly to CB [7], including guests G1, G9 and G39, which are adamantane 40 derivatives, and guests G30 and G10, which are congressane 7 and bicyclo[2.2.2]octane 34 derivatives, respectively. A guest with a skeleton very similar to G24 has also been reported to bind strongly CB [7]. 41 It is also worth mentioning guest G27, a cubane derivative similar to those recently reported. 42 Moreover, we have identified hydrocarbon frameworks that can potentially bind very strongly to CB [7] but have not been reported in such a context, including guests G38, G5 and G11, as shown in Fig. 4. Upon visual inspection of the docked poses, G38 is expected to bind very strongly to CB [7], due to its highly symmetric molecular structure that is complementary to that of the CB [7] cavity, which is also highly symmetric. Guest G38 43 appears to have only been studied in silico while G5 was added to PubChem as a part of a patent claim along with over 400 other molecules. To the best of our knowledge, the syntheses of G38 and G5 have not been reported in the literature.
In Section 3.2, we observe that the binding enthalpy in a vacuum (DU Coul + DU VdW + DU val ) serves as the major contributor to DG tot bind , while the contributions from solvent effects and configurational entropy remain insignificant for our hostguest system of CB [7] and rigid hydrocarbon guests (Fig. 3). In order to better evaluate the vacuum binding energy, DFT optimisation is performed on the host-guest complexes of the top 50 GAFF-based ranked guests. The resultant DFT binding energies are presented in Table S3 in the ESI, † where guests are re-sorted by decreasing magnitude of DFT binding energy. Aligned with our initial hypothesis, guest G38 exhibits the most negative binding energy (À62.78 kcal mol À1 ) at the oB97XD/ 6-31G* level of theory in a vacuum, which exceeds those of other known strongly binding guests such as adamantane (À41.18 kcal mol À1 ), congressane (À46.79 kcal mol À1 ) and bicyclo[2.2.2]octane (À34.74 kcal mol À1 ). Guests G5, G9 and G39 also exhibit strong binding with CB [7] with interaction energy values of À54.34, À48.41 and À49.47 kcal mol À1 , respectively.
To further understand the nature of the exceptionally strong binding of G38, quantitative decomposition of interaction energies into their constituent terms is performed using SAPT on a representative set of 10 top host-guest complexes involving guests G1, G5, G9, G10, G11, G24, G27, G30, G38 and G39. The individual electrostatic, exchange, induction and dispersion components are plotted in Fig. 5 and summarised in Table S4 in the ESI. † The results reveal that the interaction energy typically has the most significant contributions from the dispersion and exchange repulsion terms with the attractive dispersion term outweighing the repulsive exchange term. Notably, however, the exceptionally high binding affinity of G38 can be attributed to its strong electrostatic component in addition to the dispersion contribution, which is uncommon for nonpolar hydrocarbon motifs. The electrostatic component for G38 is À42.82 kcal mol À1 , which is comparable to the dispersion component of À53.05 kcal mol À1 . Moreover, the relatively small exchange component (+39.49 kcal mol À1 ) indicates a good shape match between CB [7] and G38.
Upon closer study of the DFT model of the G38@CB [7] hostguest complex, it can be seen in Fig. S4 (ESI †) that the positive regions of the electrostatic potential of G38 located near its hydrogen atoms overlap significantly with the negative regions of the electrostatic potential of CB [7] near its carbonyl portals, while the electron-rich region of the CRC triple bonds overlaps well with the electron-poor region of the CB cavity, revealing a high degree of electrostatic charge matching between the host and the guest.
The extent of charge distribution matching can be quantified, to a first approximation, by considering the molecular electrostatic quadrupole moment y zz , which is defined as the second moment of the charge density 44 along the main axis (as defined by component analysis) of a guest molecule or along the axis passing through the centres of both CB [7] portals in the case of CB [7] and its complexes. Despite the possibility of using higher order multipole descriptions, for uncharged, nonpolar molecules (i.e. both monopole and dipole moment = 0), such as hydrocarbons, y zz can serve as a simple and effective measure for assessing the distribution of charge density, as well as for estimating their interaction and matching with each other. As shown in Table 1, G38 has an exceptionally large intrinsic quadrupole moment of 17.69 Buckingham among other top ranked guests, which is followed by G1 (6.66 Buckingham), G11 (8.31 Buckingham), G27 (8.84 Buckingham) and G30 (7. 49 Buckingham).
Nevertheless, a large quadrupole moment of the guest alone is not sufficient to generate significant host-guest electrostatic interactions, while it is also necessary to have quadrupole (or multipole) charge matching between the host and the guest, which can be roughly estimated by the change in y zz of the host upon complexation. In this context, G38 produces the largest change in y zz (= 20.89 Buckingham) upon binding with CB [7], as shown in Table 1. It is noted that, however, the change in y zz does not directly correlate to the degree of charge complementarity. For instance, G1, G11, G27 and G30 also exhibit large intrinsic y zz and can induce significant change in y zz upon complexation despite displaying a negligible electrostatic contribution to binding, as indicated by the SAPT results (Fig. 5). This seemingly paradoxical effect can be attributed to hostguest charge mismatch or the change in y zz being mainly contributed by moieties outside the CB [7] cavity, which might likely be the case for elongated guests, such as G1, G11, G27 and G30. More sophisticated models involving higher order multipole terms would be required to accurately and reliably gauge the mode and the degree of charge complementarity between CB [7] and nonpolar guests.
In order to rule out any overlooked electronic effect during the DFT minimisation, the electronic density r of G38@ CB [7] was analysed, confirming the absence of charge delocalisation or chemical bonds between CB [7] and G38 (see Fig. S5, ESI †).
A molecular dynamics model based on GAFF with explicit solvent molecules is additionally performed, with the aim to validate the binding enthalpy between CB [7] and G38 in water (see Fig. S6, ESI † for details). The binding enthalpy DH tot bind is computed to be À36.24 kcal mol À1 , which is significantly higher than any value computed using the same technique or experimentally measured for strongly binding guests; for instance, 1-adamantanol has a binding enthalpy in water of À24.9 kcal mol À1 . 45 The binding enthalpy in water was also estimated using a DFT implicit solvent model with CPCM/ oB97XD/6-31G*, yielding a value of À46.5 kcal mol À1 , comparable to that obtained from the explicit solvent model. The total interaction energy (purple) is the sum of all contributions. The y-axis represents the contribution to DG tot bind (kcal mol À1 ). Table 1 Quadrupole moment y zz of selected guests within the top 50 GAFF-based ranked candidates. y zz , in Buckingham, is computed along the guest's axis of docking inside CB [7] and at the origin at the centre of the CB [7] cavity. G38 has a large positive y zz . Although G1, G11, G27 and G30 also have a relatively large y zz , their poor matching with CB [7]'s quadrupole moment reduces the overall electrostatic interaction

Conclusion
In summary, we have developed a force-field based computational approach for finding the most strongly binding hydrocarbon guest for CB [7] from the PubChem database. Top ranked guests are then subjected to further investigation using dispersioncorrected DFT models. An expanded cubane (G38) emerges as the champion, achieving an ultrahigh predicted binding energy of À62.78 kcal mol À1 at the oB97XD/6-31G* level of theory in a vacuum, which significantly exceeds those of other known strongly binding guests. Notably, this exceptionally strong hostguest binding interaction can be attributed to the excellent complementarity in molecular shape and electrostatic charge, as revealed by the SAPT energy decomposition and the DFT electrostatic potential. The G38@CB [7] complex has been further analysed using molecular quadrupole moment calculations, as well as a molecular dynamics model with explicit solvent molecules and a DFT implicit solvent model. Interestingly, in the champion complex, the strong quadrupole interactions arising from host-guest charge matching serve as a significant driver to supramolecular complexation. This is in contrast to, for example, the situation of the ferrocene@CB [7] system where the quadrupole interactions are only strong enough to modulate the guest's orientation within the cavity. 46 Furthermore, our finding extends the relevance of quadrupole interactions to neutral host-guest complexes, in addition to their documented significance in anion 47 and cation 48 interactions with aromatic hydrocarbons, the efficiency of small graphene sheets exfoliation 49 and in the stability of the benzenehexafluorobenzene complex. 50 The presented computational approach can be extended for investigating other rigid host-guest systems beyond CB[n] and hydrocarbons. Meanwhile, our findings highlight the unexpected, yet important, role of charge complementarity for neutral nonpolar molecules, which can potentially be used to further strengthen host-guest interactions on top of hydrophobic effects and van der Waals interactions.

Conflicts of interest
There are no conflicts of interest to declare.