Theory and Simulation of DNA-Coated Colloids: a Guide for Rational Design

By exploiting the exquisite selectivity of DNA hybridization, DNA-Coated Colloids (DNACCs) can be made to self-assemble in a wide variety of structures. The beauty of this system stems largely from its exceptional versatility and from the fact that a proper choice of the grafted DNA sequences yields fine control over the colloidal interactions. Theory and simulations have an important role to play in the optimal design of self- assembling DNACCs. At present, the powerful model-based design tools are not widely used, because the theoretical literature is fragmented and the connection between different theories is often not evident. In this Perspective, we aim to discuss the similarities and differences between the different models that have been described in the literature, their underlying assumptions, their strengths and their weaknesses. Using the tools described in the present Review, it should be possible to move towards a more rational design of novel self-assembling structures of DNACCs and, more generally, of systems where ligand-receptors bonds are used to control interactions.


I. INTRODUCTION
Colloidal suspensions are well described by the same statistical mechanical equations as systems of atoms or small molecules. As a consequence, colloids behave in many respects as 'scaled-up' models of atoms, and like atoms, colloidal suspensions can be found in phases that resemble gases, liquids or crystals. There are, however, important differences. One is that colloids move in a solvent, rather than in vacuum. The other is that, through modification of the colloid or the solvent, we can tune the interactions between colloids in a way that is not possible for atoms or small molecules. Due to our ability to control colloidal interactions, colloids exhibit a much richer phase behaviour than atomic systems. Over the past two decades, our ability to design colloidal interactions has undergone a quantum jump with the development of so-called DNA-Coated Colloids (DNACCs), colloidal particles functionalised with short sequences of single-stranded DNA (ssDNA). DNA allows the colloids to bind selectively, either directly or through singlestranded linkers, to other particles or surfaces coated with complementary ssDNA sequences.
DNA-Coated Colloids were developed by the groups of Mirkin and Alivisatos 1,2 in the 1990s. Over the past decades, DNACCs (sometimes referred to as "spherical nucleic acid" 3 ) have been studied intensively because of their promise for the development of new classes of ordered colloidal materials (see e.g. [4][5][6][7]. Furthermore, this system has interesting biomedical applications as ultra-sensitive biomarkers, detectors or efficient drugand gene-delivery carriers [8][9][10] . In all cases, applications a) These authors contributed equally to this work b) Corresponding author; Electronic mail: df246@cam.ac.uk exploit the selectivity of DNA-DNA hybridization between a complementary pair, as well as specific features of the physics of multivalent interactions. Several reviews on DNACCs have appeared in the literature, both from an experimental 11,12 and from a theoretical 13,14 perspective. Very recently, Jones et al have provided a general overview describing the use of DNA-mediated interactions for the programmable selfassembly of nanomaterials 15 , highlighting the connections, and the differences, between DNACCs and DNAbased nanotechnology.
The focus of the present Perspective is different. It is not our aim to provide a comprehensive overview of all simulation and modelling results in the field, but rather, to present a critical assessment of the various theoretical and computational approaches that have been proposed to describe DNACCs. However, this Perspective is not just a Review: to illustrate the points that we make, we also include new results. We aim to highlight the connections between different theoretical models, and explain the differences between various coarse-graining strategies used to describe DNACCs. We believe that the present perspective is timely, as the rational design of novel DNACC-based materials will require a complete integration of theory and experiment. At present, theory and computational work are mostly used to explain experimental observations a posteriori. Yet these experiments are still largely driven by empirical or semi-empirical rules 16 . Even when model predictions preceded experiments, the link between theory and experiments is often less than clear. At present, there is a bewildering variety of DNACC models in the literature, and this very fact makes it difficult for the non-expert to decide what approach to use. The net result is that most models are never used by experimentalists. More worrying is the fact that theoretical models that have been shown to be based on question-able premises continue to be used in the literature, thus carrying the risk that analysis of the experimental data is flawed. For these reasons, the aim of this review is to discuss the various models in terms of their underlying assumptions, their range of applicability and their relative strengths and weaknesses, so that experimentalists can use them in an informed way. The remainder of the paper will be structured as follows. In Sec. II, we describe the general principles common to the various analytical theoretical models, and then proceed to discuss the connections between them. In particular, we aim to establish a 'hierarchy of accuracy' in terms of the effects included. These analytical models contain input parameters that depend on the molecular structure of the system, e.g. the specific DNA-sequence grafted on the colloids, or details of the spacer used for grafting. When a systematic study of the effect of these parameters is required, and in order to relax some simplifying assumptions necessarily included in the analytical models, one should resort to computer experiments, i.e. simulations. We will deal with these "experiments" in Sec. III, where we will focus on the various coarse-graining strategies that have been proposed in the literature. In particular, we discuss which (atomistic, molecular) features are retained in the various models, and which are ignored. In particular, we will focus on the question if the model can yield meaningful predictions of the relevant experiments. In the concluding section (Sec. IV), we will identify possible new directions where the existing theoretical and computational framework could be profitably integrated with experiments to yield a more rational design of DNACC-based systems.

II. THEORY
The common idea behind basically all theoretical models to describe DNA-mediated interactions in DNACCs [17][18][19][20][21][22][23][24][25][26][27] is to account correctly for the interactions induced by the DNA strands that coat the colloids (a schematic of the system is depicted in Fig. 1). On the one hand, as in general for all colloidal systems stabilised by (non-absorbing) grafted polymers, a non-specific repulsion arises from compression of the DNA strands trapped between the colloidal cores. On the other hand, a highly specific inter-particle attraction occurs because of the decrease in free-energy due to bond formation 1,2 (see for reference Fig. 2). That bond mediated interactions are the driving force leading to aggregation is also evident from experimental studies that show no aggregation between DNACCs grafted with non-complementary strands 1,2,25 . In fact, other (non-specific) attractive interactions, in particular those arising from the van-der-Waals forces between the colloidal cores, are typically much weaker than those arising from bond formation on the length-scales at which bond-mediated binding occurs. This is mainly due to the fact that the non-specific steric repulsive forces due to DNA compression prevent the colloids to approach at short distances where dispersion forces are strong 26 . For similar reasons, the length-scale for binding, i.e. the typical distance between colloidal cores between bound particles, is dictated by the length of the spacer used to graft DNA on the surface of the colloids, which is typically in the range of nanometers to tens of nanometers 17 .
In various practical realisations of DNACCs, other polymer chains than DNA are grafted to the colloidal surface.
Using an additional different polymer for steric stabilisation makes it possible to tune repulsion and attraction separately. Both theory and experiments have shown that tuning the relative strength of attraction and repulsion can lead to self-assembly of aggregates of different types [28][29][30] . It is crucial to note that the interaction between two grafted colloid is due to both energy and entropy, the latter accounting for the fact that often not just one but several binding configurations are possible (see Fig. 1). Hence, to compute the interaction strength, we need to evaluate a free energy, rather than an energy, as would be the case for atoms in vacuum. To compute the free energy of interaction of DNACCs, we start from the statistical mechanical relation between the free energy F of a system and its partition function Z. We consider an arbitrary number of colloids, each of which has an arbitrary number of strands grafted on its surface. We choose a reference state where all colloids are at infinite distance from each other, and calculate the free-energy with respect to this state: where Z = Z 0 + Z bound is the partition function of all strands in the system (i.e. a sum over Boltzmann weights), which depends on their grafting points {r}, as well as on the positions of all colloids {R}. Z 0 is the partial sum over all states where no bond forms, Z bound the partial sum over all states with at least one bond, and finally Z ∞ ≡ Z({R = ∞}) is the partition function when all colloids are at infinite separation. With this splitting, we can identify two terms, F att and F rep , i.e. the attractive and repulsive component of the interaction, respectively. Note that when binding between strands on the same particle can occur, this identification cannot be made, although the general approach to calculate F interaction (see Eq. 1) still holds. We stress that no approximation has been made in the derivation of Eq. 1. Since Z = Z 0 + Z bound , F att is always negative (and, hence, attractive). Instead F rep is the ratio between two partition functions where bonds are never allowed, although in one case (Z 0 ) strands can feel the excluded volume due to all colloids, including the one on which they are grafted, whereas in the other (Z ∞ ) strands can only feel the latter. If we assume that only repulsive interactions between the colloidal cores and the strands occur, i.e. we consider non-adsorbing polymers, a good approximation for most type of polymers used in DNACCs, then Z 0 /Z ∞ < 1, and F rep is always positive (i.e. repulsive).
Use of Eq. 1 would allow one to calculate the interaction between DNACCs including all type of many-body interactions arising from bond formations, and for arbitrary position of the colloids. This expression does not assume that DNA-mediated interactions are pairwise additive -and, in fact, often they are not 28,31,32 . For this reason, the general expression is needed to compute the interaction free energy of dense phases of DNACCs.
Let us next consider how F att and F rep can be calculated. One route is both obvious and pointless: 'bruteforce' atomistic simulations. Simulations that would describe the colloids, the DNA and the solvent in atomistic detail are simply much too expensive to be used to predict, say, phase diagrams. Fully atomistic simulations can certainly be used to study certain aspects of DNACCs, but not to study a suspension containing hundreds or thousands of such particles. In what follows, some degree of coarse-graining is therefore essential. Typically, this means that we will use an implicit solvent model and that we will assume that the binding free energy between complementary DNA strands is known. In Figure 2. Schematic representation of the origin of attractive and repulsive forces, here shown for a simple system of strands considered as rigid rods and in the case where only a single bond can be formed. a) Upon binding, the system acquires an additional energy ∆G bond ij , which depends on both the chemical identity of the active strands (white and red circles) as well as on details of the spacer (black sticks) and position of the tethering points (see Eq. 3). In particular, note that whereas in the unbound state (left) the active strands can freely move on a semisphere, upon binding (right) they are forced to stay on a circle: This reduction in configurational space is at the origin of ∆G cnf , (see Eq. 4). b) When all colloids are far apart from each other, the active strands can span a whole semisphere. However, this space is again reduced once colloids start to get closer. This reduction is at the origin of the dominant repulsive forces Frep between DNACCs.
other words: models to describe systems consisting of many DNACCs are always coarse-grained.
To arrive at an analytically tractable coarse-grained model, the following assumptions are commonly made: 1. we usually ignore the spatial extent of the nucleotide sequences that can hybridise to bind to strands together. The hybridising strand is then represented as a point-like attractive site, also called 'active site' or 'sticky end', tethered to the colloidal surface by a long chain -the spacer or linker 17,19-27 (see also Fig. 1). Representing the 'sticky' binding sequence as a point is a reasonable approximation as long as the spacer is much longer than the sticky end, a situation typically encountered in experimental systems 1,2,5,17,21 . Considering extended sticky ends results in slightly different binding affinities between constructs 23,33 . Accounting for the finite spatial extent for sticky end does not qualitatively change the theoretical treatment of DNACCs, and hence we limit our discussion to point-like sticky ends. 2. We ignore steric interactions between different strands and, more generally, any non-selective strand-strand interaction. This approximation is very common, but it is likely to break down for dense DNA 'brushes'.
With the above assumptions, F rep can be expressed as: where the index i runs over all strands in the system, and Ω i and Ω ∞ i are, respectively, the partition function of a single linker when the colloids are in positions {R} and the partition function for the case where all colloids are at infinite separation. For specific geometries and simple polymer models for the linker, such as rigid rods or gaussian chains, there are either exact or very accurate approximate expressions to calculate F rep 21,23,26 . For more complex cases, F rep can be obtained using standard Monte Carlo techniques to sample polymer conformations 24,25 . Rogers and Crocker showed that a simple algorithm that assumes that polymers can be described as ideal, non-interacting chains that cannot intersect the colloids gave essentially quantitative agreement with the the repulsive interaction determined in experiments 25 . This agreement suggests that in this specific system, the neglect of chain-chain and attractive chain-colloid interactions is justified. In Sec. III we discuss non-specific interactions between spacers in more detail.
The calculation of F att is more challenging, since here we need to account for all possible binding configurations with an arbitrary number of bonds (see Fig. 1). In order to do that, it is useful to split the problem of calculating F att into two parts. The first stage involves calculating the bond formation energy for all individual pairs of (i, j) strands, and the second stage accounts for the fact that there may be many ways in which the DNA strands on adjacent colloids may be connected (see for reference Fig. 1). If the spacer does not influence the structure of the sticky end, one can split the bond formation free energy into two terms 21,23,26,28 : The first term on the r.h.s. of Eq. 3 is the free-energy of binding for untethered, complementary DNA strands i and j in solution: exp −β∆G 0 ij = K eq ij ρ • , where K eq ij denotes the equilibrium binding constant, and ρ • the standard concentration of 1 M. The second term represents the configurational entropic cost associated with linking two tethered strands i and j: the number of configurations of a linked ij strand is less than that of the unlinked i and j strands 34 (see Fig. 2 depends on the exact positions of the grafting points, and on the position of all nearby colloids in the system, since these exclude all strand configurations that would intersect them. Refs. 22 and 23 give an explicit expression that can be used to calculate this quantity for arbitrary linkers and colloids positions, which in its most general case reads: where Ω ij is the partition function for two bound linkers i and j, whereas Ω i,j is the partition function for the unbound case. Both terms depend on r i , r j and {R i }, and can be calculated using coarse-grained models for the linkers 23 . When strand-strand interactions can be neglected, an equivalent and possibly more illuminating form of Eq. 4 useful for calculations is: where p ee (d) denotes the fraction of all random configurations of the two chains tethered at two grafting points at separation d, that have overlapping sticky ends, i.e. it is the probability that two non-interacting theres will form a 'bridging' configuration. The terms W ij and W i(j) ) account for the effect of hard-core overlaps with the colloids: they denote the number of i, j or ij configurations that do not overlap with the colloidal cores (see Fig.2). Note that, W i(j) for the unbound linkers is equal to the Ω i(j) s appearing in the definition of F rep , Eq. 2.
∆G cnf (r i , r j , {R i }) can be either calculated analytically for simple polymer models, or computed via Monte Carlo simulations, as we discuss in Sec. III. In a small number of (important) cases, ∆G cnf can be approximated analytically 17,21,23 . The different theories that have been proposed to estimate F att make different approximations to calculate this quantity. However, all models [20][21][22]25 assume that strands are non-interacting (although the interaction can still be included in calculating the bond energy 23 ). This approximation is reasonable when different chains are not close to each other. At higher grafting densities (see e.g. 31,32 ), the simple analytical approaches break down, but the simple theories can still be used to gain insight in the effect of the structure of the spacer (e.g. rigid rods or freely jointed chains) on the interactions between DNACCs. For example, ∆G cnf accounts quite well for the experimentally observed trends in the melting properties of DNACCs 21,22 .
Under the assumption that the binding free energy ∆G bond ij between i and j does not depend on the presence of other strands (i.e. the non interaction assumption), knowledge of ∆G bond ij for all possible pairs of constructs (i, j) is sufficient to write down an exact expression for the partition function Z: where the sum runs over all binding configuration φ of the system (see Fig. 1), i.e. Z/Z 0 is simply equal to the sum of all Boltzmann weights due to bonds energies over all possible binding configurations. Note that G bond ij depends on the position of the strands via ∆G cnf , see Eq. 3, and hence depends on the spatial configuration of the colloids.
An exact enumeration of the terms in Eqn.6 becomes rapidly intractable since the number of binding configurations increases very rapidly with the number of strands in the system. Hence Eqn.6 is typically simplified under some assumptions. It is at this level of approximation that differences between the the various analytical theories show up 17,19,21,23,25,27 . Short of exact enumeration, Z/Z 0 , and hence F att , can be calculated to any given accuracy via Monte Carlo simulations using thermodynamic integration 35 , providing a route to fully include all possible competitive effects between formation of different binding configurations, as well as about the spatial distribution of the strands. The exact solution reads 23,24,26 : where p ij is the probability that a bond between strands i and j is observed, as sampled by MC simulations. In Refs. 23 and 27 we have shown that the 'exact' Monte Carlo results are approximated extremely well by a very simple expression: and where the index i runs over all possible strands in the system. In Eq. 8, p i is the probability that strand i is not bound, found by solving the set of coupled self-consistent equations represented by Eq. 9. Eqns. 8 and 9 can be derived using a recursion formula to count all possible binding configurations (see Ref. 27).
In our derivation of Eqs. 8,9 we used an approximate yet very accurate formula for counting the degeneracy of each binding configuration, accounting for the fact that no two linkers can bind to the same target 27 . From a combinatorial point of view, it turns out that our procedure can be mapped onto Wertheim's theory for the equilibrium properties of fluids with highly directional inter-molecular forces 36-38 . Wertheim's solution is a basic ingredient of the widely used SAFT theory 39 . In practice, each reactive DNA strand (or, more generally, each binder) must be considered in the Wertheim picture as a spatially and orientationally fixed particle with a single binding site, but each having a unique bond energy, which in our case must also be defined as in Eq. 3. In our system, we also have the additional benefit that some of the approximations Wertheim used are always exactly satisfied conditions, in particular the fact that if two strands are bound a third one cannot bind at the same time. Once the problem is cast in these terms, it becomes mathematically equivalent to that of Wertheim.
The approximations underlying Eqs. 8,9 become exact in the case of mobile constructs, see Ref. 28 and Sec. III for more details. Finally, we stress that although these equations provide the F att induced by the interaction between grafted strands. However, it is easy to show that they remain the same when binding between these strands is not direct, but is mediated by (or compete with), free complementary strands in solution, another widely used bonding scheme in DNACCs. In practice, Eqs. 8,9 remain valid, as well as the configurational part of the bond free-energy ∆G cnf , whereas the solution hybridisation free-energy ∆G 0 ij between strands must be shifted with a density dependent factor. Since Eqs. 8,9 agree extremely well with the corresponding MC simulations, they also properly reproduce all the (important) correlations introduced by the competition between strands for the same binding partner. Moreover, they are completely general in that they can be used to treat an arbitrary number of different binding pairs, with any spatial distribution. Hence, they can be used to calculate colloid-colloid interactions also when highly non-homogeneous DNA coatings are present, as in the experimental system reported by Pine and coworkers mimicking patchy particles 40 . Earlier expressions that have been proposed in the literature 17,19,21,22 can be derived as approximations of Eqs. 8,9 under different assumptions 27 . In particular, the aforementioned models all shared a mean-field approximation for binding, i.e. unlike Eqs. 8,9 they do neglect correlations that are due to the fact that two strands cannot bind simultaneously to the same binding partner (the 'valence constraint' 23 ). Furthermore, strands were not only treated as if they were completely independent from each other, but also as if they were all statistically equivalent. The first approximation becomes progressively worse as the binding strength of bonds increases (relative to k B T ). This drawback can be heuristically corrected by a self-consistent mean field treatment 41 , which yields the mean-field version of Eq. 9. The assumption that all bonds are statistically equivalent gets worse as the grafting of chains becomes more heterogeneous, since in this case each strand will experience a different environment. However, if we assume that both approximations are justified, then the expression for F att simplifies 17,19,21,22 : where < n > is the number of bonds formed in the system. This expression for F att is called the "weak binding" or the "Poisson" approximation 17,19,21,22,25 . It provides a lower bound for the real interaction energy, as it always overestimates the number of possible bonds: any correlation would reduce the number of bonds that can form. It has been pointed out 25 that models that are based on the Poisson approximation may overestimate the attractive interaction by as much as two orders of magnitude, corresponding to approximately ≈ 5kBT per DNA bridge. The "weak binding" approximation can be obtained formally from the expression given by Eq. 7 by a simple assumption. The integrand in Eq. 7 is nothing but the expected number of bound pairs in the system. If this number can be approximated by a Poisson distribution (an approximation that breaks down at higher binding strength 24 ), then Eq. 10 follows from Eq. 7. It can be also shown that Eq. 10 corresponds to a first-order approximation of Eqs. 8,9 (the second-order correction is positive), and its validity is limited to a region where each strand has very few binding partners or small binding energy (hence the "weak binding regime") 27 . Another possible drawback of Eq. 10 is that it cannot be generalised easily to treat an arbitrary number of binding partners with different binding energies, since Eq. 10 was derived considering not just independent, but also equivalent strands. This is a serious limitation, as the phase diagram of DNACCs can be 'designed' by making colloids with more than one type of strand (see e.g. Ref. 29).
A possible way to estimate F att for an arbitrary number of types of strands was proposed by Rogers and Crocker 25 . Heuristically keeping Eq. 10 as the expression for the attractive contribution, they proposed to use mass balance equations valid for chemical reactions in solution to describe binding between grafted strands, an approach dubbed "Local Chemical Equilibrium" (LCE). In Ref. 25 it is assumed that the binding for grafted strands can be treated as an equilibrium binding reaction in solution with a non-homogeneous distribution of active sites. This distribution is obtained by Monte Carlo sampling the position of the end-point of polymer chains. F att is then approximated as: where C α(β) (r) are the equilibrium local concentrations, of free, unbound active sites of type α(β) (note that in this formalism all strands with the same active site are considered equivalent, whereas in Eqs. 8,9 all strands have different identities) C 0 α(β) (r), the initial concentration of active sites when no binding occurs, and C αβ (r) the concentration of bound α−β pairs once binding is allowed. Note that Eq. 13 represents a very strong constraint: in general, mass conservation makes it valid when concentrations are integrated over the whole volume. Eq. 13 instead assume that this is true at each point in space, a much stronger statement. In fact, the approximation inEq. 13 is in general not justified 23 , as we discuss in more detail below.

A. Equilibrium in bulk binary reactions
In order to derive the (LCE) expression for DNA binding, it is important to realise that tethered DNA strands are distinguishable. The derivation of the condition for local chemical equilibrium is therefore different from the case where the reactants and products comprise molecules that are indistinguishable. As an illustration, we will assume that the tethering of DNA strands results in a confinement of the 'reactive' ends, but that the free-energy cost of tethering can be ignored (this assumption can subsequently be refined). We consider the hybridisation reaction α + β αβ. As we can ignore the tethering free energy, we can treat the mixture of α, β and αβ strands as an ideal gas of distinguishable particles. We assume that, α, β and αβ are confined to a volume V that is small, but large enough to contain many molecules. The partition function of a system with N α (N β ) monomers of type α (β) and N αβ complexes is then: where N 0 α (N 0 β ) denotes the original (i.e. pre-reaction) concentration of α (β) , and N 0 ≡ N 0 α + N 0 β . Note that the combinatorial factors in Eqn. 14 are not related to indistinguishability but to the number of ways to select N αβ reaction partners from N α and from N β monomers. The factor N αβ ! accounts for the number of ways in which N αβ distinguishable particles of type α and the same number of type β can be paired. The partition function is a function of N αβ . It is maximised for Hence, for large enough numbers of reactive particles in the volume V , the hybridisation of tethered strands obeys the same equilibrium expression as a binary gas mixture. However, for small particle numbers this expression fails even qualitatively (see Appendix A). This means that the LCE is reasonable in the 'mean-field' limit where any given DNA strand can bind to many others, but it fails badly when this number is O(1) or less. This is typically the case when the average distance between grating points is not small compared with the length of the tether. It may seem surprising that the approach of Ref. 25 yields results that were compatible with the experimental data, given the fact that Eq. 12 is based on an approximation, and that the mean-field assumption is not expected to be particularly good in the case studied in Ref. 25. It is probably more likely that the accuracy of the experiments reported in Ref. 25 was insufficient to discriminate between different models 23,42 .
For this reason, the best way to discriminate between various approximate models is to use computer 'experiments' in which 'exact' Monte Carlo simulations of a specific model are used to compare between different approximations. In particular, if a model cannot reproduce the behaviour of a well-characterised model system, its agreement with experiments must be fortuitous. In Ref. 24 , we compared the different analytical theories with MC calculations to test the validity of mass balance equations and found that LCE can give results that differ substantially from the 'exact' simulation results, especially at high binding strengths (see also Fig. 3). It should come as no surprise that the agreement worsens at high binding energies as Eq. 10 is only recovered from our analytical theory Eq. 8 in the weak-binding limit p i → 1 27 A more positive finding is that, in the weak-binding limit Eq. 10 remains still valid for an otherwise arbitrary distribution of binding strengths. No such good news is in store for the original LCE: it fails, even in the weak-binding limit: mass-balance equations can only be used to calculate the bond densities when considering the binding between free-strands in solution, not for grafted strands, where the local mass-conservation implicit in Eq. 12 does not hold. However, starting from our self-consistent equations, Eqs. 8,9 we can show that a better LCE approximation for the binding densities is given by: together with the self-consistent condition: N α being the number of strands of type α in the system. Once corrected, the LCE treatment gives essentially exact results, at least for weak binding where Eq. 10 is valid, when compared with full Monte Carlo simulations, unless the number of binding partners per linker is very small (see Appendix A and Fig. 3). The advantage of Eqs. 8,9 is that these equations agree with the MC simulations over a wide range of binding regimes, not just in the weak binding limit (see Fig. 3). In this figure, we also show a comparison with a form of LCE that was recently proposed by Rogers and Manoharan. The latter theory assumes local mass balance but uses an different approximation for F att (Ref. 43 -SI).
The theoretical work presented thus far is based on various assumptions, as summarised in the box at the end of this section. The validity of the approximations involved in the theory should be tested against numerical simulations for the same model system, whereas the quality of the underlying molecular model (flexibility, binding strength, inter-molecular interactions) should be checked against experiments. The most stringent experimental test to date is the ability to reproduce the inter-colloidal potentials as measured in optical-tweezer experiments. Probably even more importantly than reproducing quantitative data, the presented theoretical framework also provides important qualitative insight: it allows one to rationalize various important properties of DNACCs, such as the fact the DNA-mediated binding of DNACCs has a much sharper temperature response than free DNA in solution. As a consequence, DNACCs show higher selectivity in discriminating between slightly different target DNA strands as used in biosensing applications 8 . Moreover, the theory explains why DNACCs have a higher dissociation ('melting') temperature than free DNA in solution 21 and it accounts for the dependence the melting temperature of DNACCs on the type of spacer used for grafting DNA 21,44 . Within the model, all these effects follow from either multivalent effects due to the large number of binding configurations, or from the entropic penalty ∆G cnf arising due to grafting constrains upon bond formation.
The availability of an accurate and predictive theory (Eqs. 8,9) made it possible to explore how the competition between different strands for binding would affect the phase behaviour of DNACCs. Examples are the prediction of the design rules for mixed-strand DNACCs that can undergo re-entrant melting, and the possibility to achieve temperature independent interactions due to purely entropy-driven binding 29 . Although the subsequent experiments that reported both effects 43 used a slightly different system than the example discussed in Ref. 29, , the physical basis of the observed effects was the same. The easiest way to see this is by considering the predicted temperature dependence of the inter-colloidal potential of the systems of Refs. 29 and 43. The underlying cause of the reentrant melting is simply that the effective attraction first increases with decreasing temperature, and then drops again, whereas the reason for a widened solid-fluid coexistence stems from purely entropy-driven (and hence temperature independent, considering F att /k BT ) interactions, see Fig. 4. The first behaviour is due to a thermal control of the dominant type of bond in systems featuring competing linkages 29,41 , while the second arises from purely combinatorial factors between configurations featuring the same number of hybridised strands, but a different number of bonds between colloids (since some of the hybridise strands do not contribute to inter-particle binding) 29,43 .
The fact that seemingly different systems can be described by the same theory and exhibit the same behaviour, allows us to make an important general point: as long as we have quantitative information about the strength of the ligand-receptor interactions, the nature of the linkers is immaterial. ssDNA is a very convenient molecule to link different colloids together, but the are many other candidates (for example, the cucurbit [8] uril  (randomly grafted) strands in the high and low density regime, respectively. Color code: Red = Self-consistent theory with added spatial averaging of receptors 23 , Blue = Self-consistent theory, Green = LCE0 as in the original version 25 , Orange = LCE1, as modified by Rogers and Manoharan 43 . Note that only the self-consistent theory properly describe the Monte Carlo data in all ranges of densities and β∆G 0 . Both forms of LCE provide a decent comparison at high ∆G 0 (=high temperatures) and high grafting densities, but fail outside this regime. system 45 , or sialic acid/hemagglutinin complexes when nanocolloids/virus interactions are concerned 46,47 ): the underlying physics remains exactly the same. We stress this fact because the development of applications non-DNA-based multivalent binding is expanding rapidly 48 , and such applications can be rationalized, and designed, using the presented framework.
In the context of DNA-mediated interaction, cooperativity is often mentioned as a crucial aspect. In the theoretical description above, cooperativity plays no role; we assume explicitly that the formation of one bond does not change the binding strength of others, and yet our theoretical predictions agree very well with experiment. To avoid all confusion, we stress that, of course, the fact that one bond between two colloids has formed, makes it easier for subsequent bonds to form, simply because the loss of translational entropy of the binding partner is incurred in the first binding event only. However, that phenomenon has nothing to do with a phenomenon where, at a microscopic level, the strength of subsequent bonds is enhanced by a change in the local environment. In fact, the latter type of cooperativity is very hard to demonstrate [49][50][51][52] , and it would seem that all experiments where special, cooperative effects were invoked (in the context of an overly simplistic theory) [49][50][51][52] can, in fact, be explained within the theoretical framework that we discuss here, without cooperativity. This does not mean that there cannot be cooperative effects but simply that the evidence is lacking, if agreement with a simpler theory can be viewed as Occam's razor.
Having said this, there are definitely cases where cooperativity -but, in this case almost certainly negative cooperativity -could play a role: in nano-sized colloids, and possibly for the recently developed micron-size col-  Figure 4. Schematic representation of the pair potential first predicted by theory 29 and then supported by experiments and an LCE-based approach 43 (see Fig. 2,3 in Ref. 29 and Fig. 1 in Ref. 43 for the original references.). a) The inter-particle potential is non-monotonic, resulting in an experimental phase diagram with re-entrant melting. b) The purely entropic origin of the potential at low temperatures provide a temperature-independent interaction (once scaled for β = 1/kBT ) below a certain critical temperature 29 . This translates into a temperature-density phase diagram with a constant width for the gas-aggregate coexistence region. In Ref. 29, the potential were induced via direct hybridisation between grafted strands, whereas in Ref. 43 an indirect hybridisation to free-strands in solution was used.
Main approximations implicit in analytical theories Our key message is that the sharper melting behaviour of DNACCs, the key experimental finding that is often cited as evidence in support of the cooperativity model 18 , is perfectly well explained by our simple model that ignores cooperativity (and, in fact, by even much simpler, non-cooperative models 12 ). On the experimental front, a sharper melting is also clearly observed for systems of micron-sized DNACCs where grafting densities were too low for any cooperativity to occur 19,20 . It is appealing that a simple theory that needs not invoke specific, system-dependent cooperative effects can provide a unified explanation of DNACC melting for a wide range of systems.

III. SIMULATIONS
Any numerical model of DNACC systems must account for the multi-scale nature of the problem. The reason is that the physical properties of DNACCs depend on both microscopic and mesoscopic features, such as the sticky-end sequences, the structure of the spacers, and the grafting density of the DNA units, that enter into the design of the system at different length scales. In fact, the differences between various types of DNACCs are substantial. For instance, whilst nano-sized DNACCs may be coated with only a few dozen DNA strands, micronsized DNACCs may be coated with tens of thousands of DNA strands 53,56 , although typically the coating density in this case is much lower than for nanocolloids.
Not surprisingly, particles that are that different require different coarse-grained modelling and simulation strategies. The modelling approaches for micron and nano-sized colloids will be reviewed in Secs. III A and III C respectively. In Sec. III B we extend the discussion to micron-sized particles with mobile rather than grafted strands, e.g. suspensions of functionalised lipid-bilayers and emulsions. Although, in principle, fully atomistic modelling of DNACCs is conceivable, the cost of such an approach would be prohibitive and, as in other areas of colloid science, there is no need for such a brute-force approach. However, it should also be clear that microscopic details matter as the interaction between ssDNA strands depends sensitively on their nucleotide sequence. Any model, no matter how coarse-grained, must take this specificity into account.
Roughly speaking, there are three classes of coarsegrained models that account for the DNA specificity in different ways. The most detailed approaches explicitly account for the pairing of the individually-resolved nucleotides (see part 1 of Fig. 5) 57-59 . Recently, more portable but expensive models, also capturing the double stranded structure of DNA, have been introduced [60][61][62] .
A more coarse-grained approach uses empirical rules for the binding free-energy of DNA sequences. In this description, the binding sequences are represented as structureless points (see part 2 of Fig. 5), and the binding freeenergy is given by Eq. 3, as discussed in Sec. II. Finally, a less well-founded, but nevertheless frequently used approach is based on the 'chemical equilibrium' approximation (see part 3 of Fig. 5). The chemical equilibrium theory treats binding reaction as a dimerisation reaction in a confined gas of sticky end groups. The advantage of this approach is that it is simple and, when the balance between free and bound ideal constructs is properly implemented (see Appendix B in Ref. 23), it becomes equivalent to rigorous treatments. However these methods are practical only in the 'mean field limit', where the spatial distributions of sticky ends are not strongly dependent on the exact position of the grafting points.

A. Micron-sized particles
Micron-sized colloids may be coated with several thousands of constructs per particle 17,[20][21][22]25,53,56,63,64 . A detailed representation of such a large number of constructs is usually not computationally feasible even when the aim is only to study the effective interaction between two colloids. The problem is, however, not as serious as it seems, because large particles tend to have a radius of curvature that is much larger than the length of the binding strands. Under these conditions, one can use the Derjaguin approximation (see e.g. 65) to calculate effective pair potentials between colloids 26,29,66 to compute the interaction between two curved surfaces from knowledge of the distance-dependence of the interaction between two flat, parallel surfaces. If this approximation is not used, a local chemical equilibrium approach (or a "spatially averaged" version of the self-consistent equations, see Ref. 23 and 27, and Fig. 3) is still tractable, but for all approaches based on the explicit calculation of interaction between complementary strands, use of the Derjaguin approximation becomes imperative. With the Derjaguin approximation, the fact that micron-sized colloids can be coated with very large numbers of DNA strands does not create a technical challenge. As discussed Sec. II, most theoretical approaches developed to study DNACCs are based on the assumption that non-binding strand-strand interactions can be ignored. We stress that this assumption is not essential (and questionable for nano-sized colloids: see Ref. 31 and 32 and Sec. III C), but in many cases, the neglect of non-binding strand-strand interactions is justified, in which case an accurate estimate of ∆G cnf is all that is required to obtain quantitatively accurate potentials from computer simulations. In practice, we calculate the partition function of unbound strands and the configurational part of the binding free energy of two strands, ∆G cnf , by simulation. The strand-strand binding free energy ∆G 0 is not obtained from simulation but from the empirical SantaLucia rules 67,68 . Again, more sophisticated rules could be used where necessary. Once these two ingredients are known, and if strand-strand interactions can be neglected, the theoretical approaches described in Sec. II can be used to calculate accurate inter-particle potentials. As was demonstrated in experiments of Bracha for DNA-chips et al. 69 strand conformations are weakly affected by DNA-DNA interaction up to thousand strands per µm 2 . Moreover, a rough estimate 26 suggests that the contribution of non-bonded spacer interactions to the free energy of the system is of the order of one k B T unit per strand, which is much smaller in absolute terms than the typical values of the configurational and combinatorial free-energy that are around ≈ 10 k B T for typical spacers 21 .
A notable exception of a micron-size system where nonbonded strand interactions may be significant is provided by the densely grafted micron-sized colloids studied by Pine and co-workers 53,56 . The reason why higher grafting densities could be reached in Refs. 53 and 56 is that in these experiments, grafting was achieved by using click chemistry, rather than via the use of rather bulky streptadivin constructs that anchor the biotinylated spacers.
Having sketched the various theoretical approaches to predict the binding strength of DNACCs, we next to consider how to estimate the change in the configurational free energy ∆G cnf of the grafted strands as the distance between colloids is varied. In general, ∆G cnf can be computed using Monte Carlo simulations 23,24,70 . Practical aspects of these calculations for ssDNA constructs are discussed in Appendix B. In this modelling, it is important to account for intra-chain Coulomb interactions. For instance, ssDNA constructs have been modelled as uniformly charged, freely-jointed chains (FJC) 71 with screened Debye-Hückel interaction (see, e.g. Ref. 72). Such an approach has been used to model micronsized 23,24,70 and, as we will discuss later, nano-sized 31,32 DNACCs. When accounting for electrostatic interactions in this way, the electrostatic contribution has to be removed from the phenomenological DNA hybridisation free energies, to avoid double counting.
The numerical approach to compute the configurational contribution to the DNACC interaction described in Appendix B is efficient for short to medium length linkers that can be coarse-grained into at most a few dozen Kuhn segments. However, the same methods would be computationally unfeasible when applied to very long chains. Numerical studies of colloids functionalised by very long DNA strands such as λ-DNA 76,77 require a further simplification of the description of the DNA degrees of freedom. Bozorgui and Martinez 73-75 employed a numerical approach based on the 'blob' representation of long polymer chains 78,79 in which entire DNA constructs are lumped into sites interacting via gaussian potentials [78][79][80] . Multi-blob models have been used by Mladek et al. 31,32 , but the approach, whilst fairly accurate, quickly becomes cumbersome. The potentials between the blob and the colloids have been parametrised by using explicit 'all-monomer' simulations, while the hybridisation reactions between complementary strands is simulated using a dedicated MC move that attempts to join two complementary blobs with a cost equal to ∆G bond ij , where ∆G cnf is given by the energy of an harmonic spring connecting the centres of the two blobs of the form β∆G cnf = 0.534 (r/R g − 0.730) 2 , where R g is the gyration radius of the strand and the numerical values have also been parametrised using all monomer simulations (see part 2c of Fig. 5).
The ultimate aim of coarse-grained modelling of DNACCs is to predict their phase behaviour. Once the tools are in place to compute the DNACC interactions, standard free-energy calculations 35 can be used to compute phase diagrams. Examples of such studies can be found in Refs. [73][74][75]. These simulations showed that DNACCs exhibit very interesting phase behaviour. For example, the simulations showed that the liquid-crystalvapour triple point disappears as the number of grafted chains drops below a threshold of around 12. In that case,  the 'ground-state' at zero pressure but high densities is a liquid, not a crystal 75 . Recent simulations on 'patchy' colloids 81 , showed similar behaviour in such systems.

B. Mobile tethers
Recent experiments have studied DNACCs where the DNA strands are mobile, rather than grafted in fixed positions. Examples include emulsions 82-84 and lipid bilayers 33,85,86 functionalised by hydrophobised DNA constructs terminated by reactive sticky end sequences.
It is straightforward to extend the theoretical tools to model DNACCs with grafted DNA strands to the case where these strands are mobile: due to the translational degrees of freedom of the binders, the hybridisation freeenergy needs to include an extra configurational term (∆G trn ) that accounts for the constraint on the relative position of two tethering points when bound This confining entropic cost ∆G trn ij destabilises DNA pairing. However, the combinatorial gain is higher in system of mobile tethers because each binder can potentially bind more complementary partners than in a system with fixed tethering points. A quantitative analysis of all the entropic and configurational terms of Eq. 18 for a system containing both bridges (i.e. bonds between strands residing on different particles) and loops (i.e. between strands on the same particle, when this bears complementary sequences) has been given in Ref. 33. Interestingly, in these systems the self-consistent theory developed in Refs. 23,27,and 41 (Eqs. 8,9) becomes exact. This was demonstrated in Ref. 28, starting from an exact free energy functional F({n}, {∆G bond }) calculated as a function of the number of bridges between bound colloids ({n}) and the matrix of the hybridisation free energies ∆G bond ij (where i and j now represents a strand type, not a single specific strand). Taking the large number of binder per particle limit, a saddle-point approximation provided the most probable number of bridges between colloids (n) Remarkably Eq. 19 is equivalent to Eq. 9. Using {n} one can calculate the attractive potential between colloids and recover exactly the expression given in 8 (once the number of bridges n are expressed in term of probability of making bridges between colloids). It is possible to generalise this discussion to systems featuring different types of bonds including not only bridges but also loops 33 . Slightly different expressions are obtained in presence of infinite reservoirs of binders as shown in the study of a vesicle interacting with a large supported lipid bilayer 86 . Eqs. 19, 20 were used to simulate rigid colloids functionalised by mobile strands (a possible experimental realisation being that of Ref. 85) in which colloidal suspensions are sampled using an effective interaction free energy F att ({R i (t)}) + F rep ({R i (t)}) that only depends on the positions of the colloids {R i (t)}.
We note that the behaviour of DNACCs with mobile linkers can be qualitatively different from that of DNACCs with grafted linkers, in other words: mobile linkers introducs new physics in the system. For example, Ref. 28 showed that the inter-colloidal potential induced by DNA-hybridisation of mobile linkers is intrinsically a multi-body interaction. This is due to the fact that each construct on a given particle can bridge all the neighbouring particles. Hence, different neighbouring colloids may compete for the same linkers. As the number of bonds with different neighbours are now correlated, the free-energy cannot be decomposed into the sum of pairinteractions. The fact that DNACCs with mobile linkers have intrinsically multi-body interactions can be exploited used to control the 'valency' (preferred number of complementary neighbours) of a particle 28 . This is interesting because conventional approaches to tune the valency of colloids are based on the use of different number of interacting 'patches'. However, the preparation of well-controlled patchy colloids is non-trivial. In contrast, DNACCs with mobile linkers can have a preferred valency without the need of physically creating real patches on the surface of the colloid 40 .
Even more complex effects should be expected for systems of mobile strands where the colloid to which they are attached is deformable, as in the case of DNAfunctionalised vesicles 33 , where the flexibility of the lipid membrane plays a non-negligible role 86 in determining the bond-mediated interaction. For instance, Hu and collaborators 87 considered the interaction between two membranes that were coated with complementary proteins (rather than DNA). They found that the equilibrium constant for protein-protein binding (i.e. ∆G bond in our nomenclature) decreases by a factor that is controlled by the roughness of the membranes 87 . This effect, when translated to DNA-mediated interaction, can be expected to have important consequences for the selfassembly.
Modelling the interaction between nano-size DNACCs is more demanding than modelling their micron-sized counterparts. The underlying reason is simple: the length-scale separation that justifies the factorisation between the different driving forces and facilitates modelling of micron-sized particles is less pronounced for nano-sized particles.
There are other differences as well: many of the earlier experiments on DNACCs (although not the most recent ones 53,56 ) used a bulky streptadivin-biotin construct to tether DNA strands to the surface of micron-sized colloids. In contrast, DNA anchoring to gold nano-particles is usually achieved by bonding the (much smaller) thiol group to the gold surface. As a consequence, the density of the DNA strands on nano-sized colloids is sufficiently high that the interaction between different DNA-strands cannot be neglected for a quantitative modelling. Constructing a coarse-grained model of nano-sized DNACCs is therefore much more system-specific than for micronsized colloids. A case in point is the coarse-grained model developed by Mladek et al. in an attempt to reproduce a typical experimental system 5 . Construction of the model required three coarse-graining steps. In the first step, simulations were performed on nano-colloids (12 nm diameter) functionalised by 60 homogeneously charged freely-jointed chains representing the ∼65 basepair constructs used in experiments 5 . At this level of description, no DNA hybridisation was taken into account. Hence, the particles were simply treated as nanocolloids sterically stabilised with densely grafted chains. The simulations probed the repulsive interaction between two such colloids. Subsequently, a multi-blob model was constructed to reproduce the computed repulsive interac-tion between the nano-colloids. The ends of the coarsegrained chains were then functionalised with units that could hybridise. However, for the nano-sized colloids, the spatial extent of the 'sticky' end group could not be ignored, nor the fact that, upon hybridisation, two such groups combine to form a rigid double helix that is more rigid than the unbound sticky ends. Hence, although in this model too the hybridisation free energy is calculated using the SantaLucia nearest neighbour rules 67,68 , the conformational changes upon hybridisation are accounted for in the model (see part 2b of Fig. 5).
Using this fairly complex model, the authors computed the crystallization behaviour of the nano-sized DNACCs and they computed the structural properties of the crystalline solid. Encouragingly, the simulations predicted the correct stable crystal structure and gave very good estimates for the temperature at which the solid melts. However, the estimated crystal lattice spacing was off by some 10% (presumably because the experimental estimates of the persistence length of ssDNA show a considerable spread). The other important finding was that three-body interactions play an important role and must be correctly included in order to predict the structure and melting of the crystalline phase. As the above discussion shows, coarse-grained modelling of nano-sized DNACCs is not simple. However, with the use of a coarse-grained polymer model and the empirical SantaLucia nearest neighbour rules, it is still much more efficient than a brute force simulation of nano-size DNACCs. Indeed, both the system sizes (hundreds to thousands of colloids) and the timescale for hybridisation are such that atomistic simulations of the phase behaviour of DNACCs is out of the question for the foreseeable future, whilst calculation of the pair interaction between nano-sized DNACCs is, at lest at present, not an attractive proposition.
There are, however, good reasons to use models for DNA that, whilst not atomistic, due reproduce the structure and dynamics at the nucleotide level. In fact, much effort has been devoted to the development of coarsegrained models that are capable of reproducing the hybridisation free energy, the binding kinetics and the mechanical properties of DNA complexes [60][61][62] . As the interactions that drive DNA hybridisation are primarily the hydrophobic forces between neighbouring bases, such models require a nucleotide-level description of the biopolymer.
Although there have been extensive and very successful simulations of systems with few DNA strands, the models used are still not an attractive proposition to simulate the collective behaviour of DNACCs, as such simulations would require the study of systems of hundreds (or thousands) of colloids each coated with dozens of strands. Moreover, computing the thermodynamic properties from equilibrium simulations requires sampling times that are much longer than the time for a typical hybridisation/de-hybridisation events. Hence, at present, this is not yet possible with the models of Refs. 60-62. For this reason, even more coarse-grained -but still nucleotide-based -descriptions have been developed starting with the work of Starr and Sciortino 57,58 . In the approach of Ref. 57,58 , the nucleotides are described as (repulsive) spheres functionalised with a binding site. Neighbouring monomers are bound together by a socalled FENE potential 88 . A bending rigidity term between each three consecutive monomers is also used. A binding site, specifying the type of base, is connected to each monomer via a FENE potential. These binding sites selectively interact with complementary sites via Lennard Jones potentials. This model has been used to study the assembly of dendrimer-like structures functionalised by four DNA arms 57,58 , self-assembly mediated by linkers 89 and crystallisation of nanoparticles 90 . A model that is closely related to that of Starr and Sciortiono 57 was deployed by Knorowski and collaborators 91 (see part 1b of Fig. 5). As in Ref. 57 and 58, DNA constructs were modelled by mean of beads carrying selective binding sites. However, two flanking repulsive sites were added to every bead belonging to the sticky end part of the construct. The function of the flanking beads is to provide directionality to the interaction between complementary bases and to prevent (unphysical) multi-base interaction. This model has been used to probe the stability of different crystals 91,92 and to investigate the dynamics of crystallisation 59 . Recently the model of Starr and Sciortino 57 has been used by Theodorakis and collaborators to study the pair interactions between nanoparticles functionalised by up to eight constructs 93 . This study highlighted the need for a potential further refinement of the model, as the simulations showed the presence of spurious bound states in which the ssDNA strands that form a duplex were oriented parallel, rather than antiparallel. Parallel hybridisation is not found in real DNA, as the strongly polarity-dependent Watson-Crick pairing between complementary nucleotides uniquely favours anti-parallel hybridised sequences.
The model of Knorowski has been further developed by Li et al. 94,95 . In particular the size of the beads in the sticky part and in the spacer part of the construct have been differentiated to qualitatively account for the different mechanical properties of the ssDNA and of the dsDNA spacers. This model is parametrised in such a way that each bead corresponds to ≈ 2-3 bases. This relatively detailed model was subsequently used to parametrise an effective pair-potential that is composed of a short range repulsion due to the electrostatics of the system, and an attractive interaction provided by the hybridisation of the sticky-ends 96,97 . Using the effective model the authors rationalised experimental finding on the dynamics of crystallisation 96,98 , while the detailed model has been used to probe the stability of the crystals 94,95 and to investigate the microscopic structure of the facets of DNACCs-made crystal 98 . Detailed molecular models have also been used to investigate interaction between DNA constructs and the nanoparticle substrate (e.g. Ref. 99 and 100), and to study the changes in the structure of dsDNA when used to assemble crys-talline structures 101 . However, these studies go beyond the scope of the present review, where we focus on the physical properties of DNACCs that are related to the reversible biding of multiple DNA complementary pairs, rather than the formation of a single pair formation.

D. Modelling perspectives
There are in principle two ways to construct a proper coarse-grained model for DNACCs. One (the top-down) approach is to derive a coarse-grained model from a fully atomistic model for DNA [60][61][62]103 , and use it to simulate DNACCs. Such an approach retains a molecular based description and should be able to describe the hybridisation thermodynamics of DNA. The second approach is instead to start with the choice of the ingredients of the coarse-grained model (e.g. ideal chains, rods, monomers, blobs, hybridisation sites etc) and then to use comparison with the available experimental information to tune its parameters. At present, the top-down approach is not yet as extensively used as that comparing coarse-grained models directly with experiment. An encouraging example in this direction is the work of Lequieu et al. 104 and Ding et al. 105 , who recently reported studies of pair potentials between nanoparticles functionalised by few constructs of reactive DNA, using the 3SPN model previously used to study the dimerisation of DNA in solution (Refs.62 and 103 ). The OxDNA model of Ouldridge and coworkers 60,61 is another valuable CG model that has been tested in multiple DNA nanotechnology systems. To our knowledge, OxDNA has not yet been used to improve the modelling of DNACCs.
There are other aspects of the modelling that need to be improved. One is related to the naive use of nearestneighbour rules to compute DNA binding free energies in constructs. Recent experimental/theoretical investigation by Di Michele et al. 102 showed that the SantaLucia nearest-neighbour rules 67,68 overestimate the hybridisation free energy ∆G 0 in the case of DNA oligomers that are extended to include strings of non-complementary bases 102 . Considering for instance two complementary sequences of ssDNA decorated by two inert strings at their 5 terminals (snapshot of Fig. 6a), Ref. 102 reported that ∆G 0 increases with the length of the inert tails until reaching a plateau for a number of inert bases equal to ≈ 5−7 (plot in Fig. 6a). In particular the presence of the inert tails destabilises double helix formation and hence decreases the melting temperature of the oligomers. Such an effect is due to the electrostatic interactions between the charged backbones of the DNA that is not included in the nearest neighbour rules 106 . Quite surprisingly Ref. 102 showed that the shift in the hybridisation free energy due to the tails (at salt concentration that are typical of DNA coated colloids experiments 25 ) can be as large as the stacking contribution of the nearest-neighbour rules due to the first dangling (i.e. non-hybridised) base in the nucleotide sequence, a standard term entering the nearest-neighbour rules (see Fig. 6b). The relevance of the inert tails to the calculation of the hybridisation free energy ∆G 0 seems to limit the degree of 'responsible' coarse graining of DNA mediated interactions (also in other DNA nanotechnology systems 107 ). In particular if we consider models with explicit hybridisation thermodynamics, the importance of the electrostatic interactions between the inert tails and the DNA duplex requires an explicit description of the charged backbones in the tails and in the duplex at the atomistic level. Interestingly the latest version of OxDNA introduced Debye-Hückel interactions 72 between the backbone sites 61 .
The problem is that the binding free energy computed for a typical configurations of the dangling segments of the free oligomers 102 , may not be transferable. This is the case, for instance, if the tails are more stretched when bridging two colloids than when they are free in solution. Hence, estimates of ∆G 0 obtained from experiments for free strands might not be reliable once the same strands have been grafted. These 'minor' effects are not so minor when one realises that a bias of one k B T in ∆G 0 results in a shift in the melting temperatures of about ≈ ±3.5 K (see Fig. 6b). Hence, when analysing accurate experiments 25 , neglect of the dangling tail effects can affect the interpretation of the results, once the experimental resolution gets better than a few Kelvin 17 .
There is also the 'no-free-lunch' theorem of coarse graining. As was pointed out in a different context many years ago by Louis, the design of an optimal model that carefully reproduces the structural properties of a system may result in a model that is not representative of the thermodynamics of that system (and viceversa) 108 .
Finally we stress something that has been stressed many times: structures may be mechanically stable, but thermodynamically unstable. Demonstrating that one structure is more stable than another typically requires the calculation of the free energy (or, more precisely, the chemical potential) of both systems. Using MC or Molecular Dynamics to see what kind of structure forms by annealing from a disordered state 91,92,94-96 is interesting, but it does not demonstrate that the structure that forms is stable (in fact, Ostwald's rule 109 suggests the opposite). The problem is that the first stages of the self-assembly process are those with a smaller barrier from the starting disordered state. These states can simply represent mechanically stable metastable states, which would transform to the true thermodynamically stable state at much longer times (if at all). These longer times may not be accessible with experiments, and are certainly many orders of magnitude larger than the timescales accessible by computer simulations. We also notice that, in order to correctly reproduce the stable phases observed in experiments, studies using this simulated annealing approach typically start with simulations boxes compatible with the shape and number of particle experimentally observed, which must thus be known a priori.
Since maximal packing appears to be, at least for conventional binary DNACCs system, one of the major driv-Kcal/mol with dangling without dangling ing force behind formation of specific lattices 16 , starting with experimentally available conditions necessarily biases the simulation toward the expected structure. In short: constructing a phase diagram requires the calculation of the free energy of each possible phase 35 , unless a direct, hysteresis-free transition between the two phases can be observed. One of the main challenges in the simulation of the phase diagram of DNACCs was precisely the need to carry out free-energy calculations 31,32,74,75 that properly account for the hybridisation between the binders. Fortunately, efficient MC algorithms have been developed to calculate the density of states of an aggregate as a function of the number of bridges between particles, whose integration can then be used to calculate the free energy of the crystal 70,74,75 . These algorithms have been successfully combined with various techniques to select among the many possible candidate phases. For example, in view of the rich polymorphism offered by DNACCs, genetic algorithms 110,111 have been used to identify possible crystal structures, whose relative thermodynamic stability was then probed by freeenergy calculations, providing phase diagrams in quantitative agreement with experiments without an a priori knowledge of crystals parameters 31,32 .

IV. CONCLUSIONS
Theoretical and computational modelling have proven to be powerful tools to rationalise the observed behaviour of DNACCs 17,[21][22][23]25,91,[94][95][96]112 , and, in some cases, to predict novel phenomena 29,43 , with some theoretical predictions for unconventional yet experimentally accessible systems still waiting to be tested 28,75 . Understanding DNACCs behaviour means understanding their mutual interactions as a function of the system's parameters. In this regard, we have shown how the presently available theoretical framework can be used to design interactions of almost arbitrary complexity: How these could lead to interesting effects (see for instance Refs. 113 and 114 for some applications) is yet to be systematically investigated. An important factor to highlight in this regard is that the analytical framework we described assumes thermodynamic equilibrium for bond formation, i.e. it implicitly considers the bonding network to be equilibrated on the timescale of colloidal diffusion. Hence, in order to connect the effective inter-particle potential predicted by theory with experimental observations (that are of finite duration) one requires both the bond-formation and bond-breaking rates to be fast compared to the timescale for colloidal diffusion. Until now, this was a strong assumption. However, although this was not true for general bonding schemes previously typically employed, Rogers and Manoharan 43 have recently shown a very powerful and general scheme exploiting toe-holding mechanisms that can make such assumption essentially correct. Proof of this point is already observable in their experimental results, whose trends could be rationalised using a local chemical equilibrium approach that implicitly makes such an assumption. Hence, we would expect such scheme to lead to a more reliable comparison between theory and experiments, opening new and truly exciting perspectives for this field. Most of the focus in the DNACCs has probably been on their self-assembly property, and much remains to be done to exploit this system to its full potential, in particular in the design of functional materials for applications. In our view, however, other DNACCs based technologies will benefit from interpreting experiments in view of the presented framework, selective drug delivery and biosensing being two likely candidates. It should be stressed that, besides describing DNACCssystems that have important nanomedicine applications in their own regard 8-10 -the theoretical and computational methods outlined here apply more generally to all systems whose interactions are dominated by the formation of non-covalent, multiple and reversible ligand-receptor bonds, i.e. supramolecular, multivalent based systems. A wide variety of such systems are already under intense investigation for nanomedical applications 46,47,115,116 , and we envisage important cross-fertilization of ideas between these fields and DNACC research. In particular, understanding how to embed binding selectivity in multivalent-based systems, and not just increase their binding strength, seems to be a crucial point where theoretical work 117 might prove useful to rationalise observed effects 115 , and speed up the rational design of applications. Finally, we stress that there is much need for further theoretical and computational developments.
For applications that will require quantitative predictions, more accurate experiments are needed to inform the parametrisation of existing, or the development of, novel models. An illustration is the tail effect discussed in Sec. III. Understanding the role of non-specific attractive interactions, and also of strand-strand interactions, are further challenges that have still not been properly investigated. The need for such improved modelling has become urgent with the arrival of recent experiments of the Pine group, who was able to reach coating densities that are much higher than what was previously achievable 53,56 . Interestingly these experiments reported enhanced crystallisation rates; it would be interesting to understand these results using properly extended models. Finally, other challenges clearly arise when considering systems where the colloid itself can be deformed upon binding (hence effectively coupling ∆G cnf with the shape of the colloidal core), such as the case of functionalised vescicles. After the long gestation period that followed the early work by Mirkin and Alivisatos, the field is moving rapidly. Exciting times lie ahead of the DNACCs field. We hope that this review will have clarified some of the key unifying concepts that need to be understood in the move towards the rational design of DNACC-based structures. In this regard, we make freely available a python-based set of routines that we developed to calculate ligand-receptor mediated interaction free energies using our self-consistent equations at the following address https://github.com/sangiole/DNACC.

ADDITIONAL MATERIAL
Additional data related to this publication is available at the University of Cambridge data repository (https://www.repository.cam.ac.uk/handle/1810/xxxxxx).
To illustrate the breakdown of LCE and mean-field like approaches in the limit of small numbers of binding strands, first consider the normal dimerisation reaction between two "molecules" α and β.
The condition for chemical equilibrium is We assume that the solutions are ideal. Then the partition function of species x (x=α,β or αβ) is x where q int (x) is the contribution to the partition of x due to all internal degrees of freedom. The equilibrium condition A1 can be written as where C x is the number density of species x. To compare with experiments, it is conventional to express the equilibrium condition such that the concentrations can be expressed in Mol/liter.

[αβ] [α][β] = K
In order to convert to number densities, we note that [x] = C x /ρ 0 . where ρ 0 is the "standard" number density (1 molar = 6.022 10 26 molecules per m 3 ). Then C αβ C α C β = K/ρ 0 = exp −β∆G 0 αβ /ρ 0 Next consider the case where we still have an ideal gas mixture, but now with only very few molecules in the 'reaction volume' (in the case of tethers, this translates into: 'a tether of a given type (say α) can only bind with a small number of nearby complementary tethers.
In that case, LCE breaks down. To see this, consider the extreme (but not unrealistic) case of a volume V containing a single molecule of type α and a single copy of β. As before, α and β can react to form αβ. Let us denote by P the probability that molecule α (or β) is part of a dimer, then P/(1 − P ) is simply given by the ratio of the corresponding single-particle partition functions: Now compare this with the LCE expression. The number density of α (β) equals P/V and the number density of αβ is (1 − P )/V . Hence, LCE predicts: which is clearly not correct. Hence, we should not expect LCE and in general Mean Field theories not accounting for the exact position of the tethering points) to work if only small numbers of monomers can interact. However, this is precisely the situation if the grafting distance between strands is not small compared to the effective length of the strands. On the other hand Fig. 3 shows that the self-consistent approach performs very well also at low grafting density.
Appendix B: Numerical estimate of configuration contribution to binding free energy In this Appendix, we describe how the hybridisation free energy between two ssDNA constructs can be been calculated numerically. The method was employed in Refs. 23 and 24. Two independent Monte Carlo runs are used to generate two interacting FJCs with free and bound end points, respectively, using the Rosenbluth algorithm 35 . In the bound case one grows a single longer chain using a bias to constraint the two end-points to the tethering points 118,119 (see the top inset of Fig. 7). In Refs. 23 and 24, while growing a hybridised chain, we biased the choice of the new segments using the end-to-end distribution function of an ideal FJC of length equal to the number of missing segments 118 (notice that this is not the unique choice possible 119 ). By sampling the corresponding Rosenbluth weights 35 (see the right y-axis of Fig. 7 for the bound case) it is possible to calculate W i,j and W ij (where W (f /b) are the Rosenbluth weights calculated for free and bound constructs respectively) that can be used in Eq. 5 to calculate ∆G cnf . Notice that in this case p ee is the end-to-end distribution function of an ideal FJC with the number of segments equal to the hybridised construct (see top inset of Fig. 7) and that W (f /b) also account for the non-selective interactions between constructs. In the particular case of non-interacting constructs, a further simplification can be used since W i,j factorises into the product of two independent Rosenbluth weight (W i,j = W i · W j ), and one can thus use Eq. 5 instead of the more general Eq. 4. Notice that for interacting strands (W i,j = W i · W j ) a consistent approach would require to include the interactions between constructs also when calculating the repulsive part of the potential (see first term of Eq. 1). The accuracy of the aforementioned calculations depends on the quality of the sampling, which can be probed using the overlapping method developed in Ref. 120, a typical output being shown in Fig. 7 (see the caption for further details). Moreover Ref. 70 developed a dynamic scheme based on the configurational bias method 121 that allows to sample between hybridised and free chains on the fly, hence without the need of precomputing ∆G cnf , for every couple of constructs that could potentially bind. Inset bottom: the algorithm used in Refs. 23 and 24 generates chains with Rosenbluth weights that are distributed according to a distribution function (p0) which is different from the distribution obtained using equilibrium runs (p1). The average of the Rosenbluth weight using p0 (right y-axis), together with the corresponding result for free constructs, is used in Eq. 5 to calculate ∆G cnf . Using the overlapping between p1 and p0 and the results of Ref. 120 we can calculate, in an independent way, − log W p 0 (left y-axis and symbols). The agreement between symbols and the straight line is perfect. The algorithm of Ref. 70 samples between free and bound states in a way that the Rosenbluth weights are distributed with p1. In this case ∆G cnf can be measured comparing the time spent by the system in the free and in the bound state 70 .