Simplified computational model for generating biological networks

A method to generate and simulate biological networks is discussed. An expanded Wooten–Winer–Weaire bond switching methods is proposed which allows for a distribution of node degrees in the network while conserving the mean average node degree. The networks are characterised in terms of their polygon structure and assortativities (a measure of local ordering). A wide range of experimental images are analysed and the underlying networks quantified in an analogous manner. Limitations in obtaining the network structure are discussed. A “network landscape” of the experimentally observed and simulated networks is constructed from the underlying metrics. The enhanced bond switching algorithm is able to generate networks spanning the full range of experimental observations.

Two dimensional random networks are observed in a range of contexts across considerably different length scales in nature: from nanometres, in the form of amorphous graphene; to metres, in the form of the Giant's causeway; to tens of kilometres, in the form of geopolitical borders. 1-3 A framework for describing these continuous random networks for chemical systems was rst introduced by Zachariasen to describe silicalike glasses, and has proved to be extremely versatile in the years since, 4 being used to describe biological networks, including actin networks and basement membranes made of collagen and laminin. [5][6][7][8] Despite the physical diversity of continuous random networks, they share some common properties which arise from the fact that all planar 2D networks can be represented as a tiled arrangement of polygons, joined at their edges. 9 Viewing a network as a collection of polygons explains some restrictions on the nature of the network. For example, the angles at each node must add up to 360 and the mean number of edges connected to each node affects the mean number of edges [or ring size] of each polygon. Traditionally, semi-empirical laws provided a theoretical understanding of the complex behaviour of real networks, most commonly for inorganic networks such as MgO grains, silica, or epithelial cells. [10][11][12] Some examples of these laws include: Lewis' law, 12 governing the distribution of polygon areas; Lemaître's law, 13 relating the number of hexagons to the width of the polygon distribution; and the Aboav-Weaire law, 14 governing the preference of polygons to be surrounded by similar or dissimilar polygons. Recent work has helped clarify the latter by re-casting the problem in terms of the assortativity, a metric designed for network theorists to describe social networks. 15,16 Studying generic 2D networks in biology is a potentially difficult task. There are a variety of different techniques used, including uorescence light-, atomic force and electronmicroscopy. [17][18][19] The resolution of light microscopy is limited by diffusion, 18 and the power of electron or laser beams have to be limited to avoid damaging the delicate networks. 17 However, even when a high-quality image is obtained, there is further difficulty in analysis. For example, each edge in a network is a complex molecule made up of tens of thousands of atoms which may not lie strictly in a single plane. In addition, the most interesting dynamic behaviour oen occurs over very long timescalespotentially decades. To make this complexity scientically digestible, different simplied models have been developed for biological networks, as well as semi-empirical laws. A simplied model is useful for exploring the complex behaviour of these biological networks, and to provide greater understanding of experimental data by eliminating the effects of nite sample size and providing precise control over the conditions. These models can then be used to understand macroscopic or mechanical properties of the networks, either directly or to parameterise multiscale approaches. 6,20 Examples of existing simplied graph-based models include Erdös-Rényi random graphs in which edges are randomly placed connecting nodes, Mikado networks for many-layered systems in which edges are randomly placed across one another, and bond switching in which edges in an ordered graph are randomly exchanged. 21,22 In this work, an extension of a bond switching method, known as the Wooten-Winer-Weaire (WWW) algorithm, is presented to generate 2D networks which align with those observed in biological systems. 23,24 Bond switching methods offer good control over physical properties in the networks generated, and are sufficiently exible to reproduce a wide range of networks. To rigorously compare networks generated by bond switching to real biological networks, key metrics are extracted from a range of experimental microscopy images of networks made of collagen IV or laminin.
Bond switching is a stochastic sampling (oen described as a Monte Carlo) method, in which an initially regular network is systematically modied by a series of edge deletion and addition moves (corresponding to bond-breaking and bond-creation in chemical networks). In the WWW method, 24 the moves are chosen randomly to "switch" the coordination arrangements around two nodes. These switches preserve the coordination number k, of each node. For a more detailed discussion of the original method and gures, see Wooten et al. 24 However, in the extended bond switching method, the mean coordination number hki is xed at an initial value whilst the k of individual nodes may change. A schematic image of this is available as The steps in our algorithm are: Initialise using a network with known polygon structure, calculating the energy according to a potential model. Select one edge between two nodes and make a trial move by deleting it, and adding a new edge between the rst node and a neighbour of the second.
Locally optimise the geometry around the switching site according to the potential model and recalculate the energy (giving the energy difference DE).
Accept or reject the trial move randomly with probability p, determined by the Metropolis criterion: 25 If the trial move was rejected, return to the previous accepted state. If it was accepted, replace the network with the trial network and repeat the process.
This algorithm is quick to compute, because the geometry only needs to be optimised in a small region around the switching site. The algorithm works best for potential models with explicit bonds and well-dened neighbours. To this end, a simplied Keating potential is applied which consists of harmonic bond (of length r) and angular (of angle q) terms: 26 with the equilibrium bond length r eqm xed at 1, and the equilibrium angle q eqm chosen to be 2p k with k the coordination number of the central node. The force constants k r and K q are both free parameters. When the coordination number of two nodes changes in the simulation, the q eqm is recalculated for both of them. Fixing q eqm (i.e. not changing with k) led to the nal networks exhibiting more artefacts and oen showing signicant demixing of nodes. This simple, but physically motivated, potential model is useful to explore the range of networks that can be generated. Future developments will show how more detailed and biologically accurate potential models, such as the exponentialquadratic form used by Burd, 27 may further improve agreement with experimental observation. The simplicity of the potential model could be useful as a framework to be combined with more complex requirements. For example, a strong energetic penalty could be applied to congurations exhibiting undesirable artefacts. In this work this technique is used to reject all congurations in which any node has coordination number k $ 8 or a polygon with side count n $ 20. Ormrod Morley and Wilson extend this idea further; by using Monte Carlo to minimise a cost function instead of energy directly they demonstrate how key structural metrics (there the fraction of 6membered rings in amorphous graphene) can be targeted. 23 It is, however, useful to develop an understanding of the structures formed by simplistic models prior to considering how to effectively contrain them to eliminate unphysical properties, rather than pre-judging what is required.
A honeycomb network of hexagons with hki ¼ 3 is used as the starting lattice, to allow comparison with previous networks. 28,29 The use of a well-dened initial polygon structure, and monitoring the changes at each step, allows the algorithm to ensure that the structure generated is physically acceptable. By constraining the mean node coordination, the mean number of edges per polygon is also constrained because of Euler's characteristic equation for polyhedra. For a 2D network which is either periodic or innitely large, hki ¼ 3 corresponds to hni ¼ 6. This does not hold for nite-size aperiodic networks, such as those typically imaged experimentally. However, the approximation for aperiodic systems is small (for example, for 100 rings, hni ¼ 5.94). 16 The biological networks considered are visually extremely diverse. To demonstrate that diversity, three networks are shown in Fig. 1, which are accompanied by snapshots of the simulated networks, showing that these can qualitatively replicate the range of observed structures. The more regular networks in the gure are characteristic of those formed by inorganic materials. 16 The gure also highlights associated structural metrics, the second moment of the ring distribution, m 2 (k), and the assortativity, r, both of which are discussed below. The difference in accompanying structural metrics in the experimental and simulated congurations highlights the need for a quantitative comparison instead of qualitative similarity. The structural metrics were extracted by a combination of image analysis tools and a mathematical approach based on graph theory as described below. A microscope image can be converted into an abstract graph by extracting the positions of edges and nodes, here using the Ridge Detection plugin for the image processing package ImageJ. 30,31 The image analysis algorithms are discussed in detail in the ESI. † Obtaining a set of nodes and edges for each image allows the same analysis methods to be applied for the simulated networks.
Images of biological networks produced by Barnard Table 1, and the complete data can be found in the ESI. † The experimental networks show a mean node coordination hki ¼ 2.78, with a standard deviation of 0.237. The average number of polygon sides hni is 6.88, with a standard deviation of 1.42. The data seen in Table 1 show slightly lower hki and larger hni than for randomised hexagonal networks (where hki ¼ 3 and hni ¼ 6).
However, some caution is required in the interpretation of the results obtained with these methods. Fig. 2 demonstrates the difficulty in assigning edges to an image. The experimental image (le panel, from Yurchenco and Furthmayr 35 ) is analysed using two different manual interpretations of the network. As the assignment of the edges can be ambiguous, there are many equally valid ways to select which edges are retained, and so different polygon structures arise. Here, for example, a small change in the algorithm identies more nodes with a small associated increase in the mean node coordination, hki, and reduction in the mean polygon side count hni (Table 1). Variations in hki are emphasised by an ambiguity as to whether an edge in a biological network is best described as being one biomolecule or two, which makes describing the fraction of 2coordinate nodes problematic. hki is also affected by the nite size of an image, because where an edge is terminated by the image boundary, we remove it from the calculations, reducing k for edge nodes. Missing edges, either from low contrast or when terminated by an edge, leads polygons to be counted as too large or to not be counted at all respectively. These issues are more signicant in smaller or lower-detail images, because the contrast leads to worse ring identication and a greater fraction of polygons are on the boundary.
The data extracted from the experimental images (Table SI 1 in the ESI †) indicate that the most useful metrics for describing these networks are the width of the node distribution (characterised by the second moment, m 2 (k)), and the assortativity r, as described by Newman. 15 For hexagonal inorganic networks, m 2 (k) ¼ 0, whereas all of the simulated and imaged networks have 0.3 ( m 2 (k) ( 1.2. The assortativity represents the similarity of neighbouring polygons, and is bounded as À1 # r # 1 with a negative value corresponding to polygons being neighbours to dissimilar polygons, and positive values indicating similar neighbours. An assortativity of r ¼ 0 means that there is no preference for similar or dissimilar neighbours, and is seen in Erdös-Renyi random graphs. If the assortativity were ruled by geometrical constraints alone in a random point process, we would expect r z À0.15. 16 Hard-disk simulations of colloids have shown positive assortativities of $0.4, 38 networks of biological reactivity have small negative assortativities, 15 whilst inorganic networks show negative values whose magnitudes depend on the details of the building blocks. 16 Although the polygon side count distributions (characterised by the mean number of sides, hni and its second moment, m 2 (n)) have been extensively used to characterise more regular   Fig. 2 The polygon structure of a collagen IV basement membrane as imaged by Yurchenco and Furthmayr. 35 The two images are different ways of highlighting the polygons in the same image, according to which ambiguous edges are included. Reprinted (adapted) with permission from Yurchenco and Furthmayr. 35 Copyright 1984 American Chemical Society.
This journal is © The Royal Society of Chemistry 2020 RSC Adv., 2020, 10, 38275-38280 | 38277 networks (see, for example, Ormrod Morley and Wilson 23 and references therein) the greater disorder inherent in the networks considered here makes these measures more difficult to assess. For example, hni is highly sensitive to nite size effects, whilst m 2 (n) appears equally sensitive to the presence of (rare) large polygons. Fig. 3(a) shows the variation of r as a function of m 2 (k) for the experimental networks, an effective "network landscape". The experimental networks show signicant range of {m 2 (k), r} values, with the majority (14 of 19) in the range À0.25 # r # 0. There is a weak correlation between the two metrics with r becoming less negative as m 2 (k) increases (see below).
To develop a systematic understanding of the range of networks formed, networks were generated using a wide range of potential model parameters. Network congurations were generated with k l and K q ranging from 0.01 to 100 reduced units, changing from networks dominated by bond strain to those dominated by angle strain. The main method of network generation was simulated annealing. First, the networks were thermalised at very high temperature (corresponding to 1 Â 10 4 temperature units), effectively accepting all physical Monte Carlo moves. The thermalisation continued for 10 000 steps at that high temperature, which led to complete melting. Aer thermalisation, the networks were cooled. The cooling proceeded by making n moves at a temperature T, then reducing T by a constant multiplicative factor and making another n steps. This was repeated until a pre-set minimum temperature was reached. One example simulation had 1000 moves made at T ¼ 1 Â 10 4 , then a further 1000 moves at T ¼ 1 Â 10 3.8 , and so on until T ¼ 1 Â 10 À2 . The logarithmic nature of the temperature scaling was necessary to access the wide range of temperatures between the melted and frozen states in a computationally efficient manner. Fig. 3 shows the range of different networks which can be generated. There is a great deal of uncertainty around each experimental point, owing to the ambiguity previously discussed. Fig. 3(b)-(d) show the evolution of r as a function of m 2 (k) for the simulated networks generated under different conditions. Fig. 3(b) shows the effect of cooling the networks for different lengths of time, but at the same rate. Networks that were cooled for longer are in blue, and it shows a reversal of the entropy driven effects evident in Fig. 3(b). This demonstrates the sensitivity of the networks to the annealing procedure. As networks are cooled for longer, the assortativity remains broadly unchanged, but the node distribution histograms become narrower; the highly coordinated nodes were enthalphically unfavourable because of the angular strain they induced in the nodes around them.
Finally, Fig. 3(c) shows the effect of limiting the maximum permissible node coordination to be in the range 2 # k # k max . For a low k max ¼ 4, the networks have narrow node distributions and polygon assortativities close to the random limit. However, as k max increases to 10, the widths of the node distributions increase and the networks become assortative. Nodes with large k were oen surrounded by narrow triangles to accommodate the angular strain. Since there are k neighbouring triangles around each node this leads to a positive assortativity. Fig. 3(d) shows networks generated using an alternative method to the simulated annealing protocol described above. Here, networks are partially thermalised at a range of xed temperatures with xed potential parameters k l and K q . At low temperatures the networks formed are highly correlated (disassortative) whilst at higher temperatures the networks formed are mildly assortative, corresponding to a partial demixing. There are two main areas in this landscape, found at the bottom-le (negative r and low m 2 (k)) and the top right (positive r and high m 2 (k)). The bottom-le region is occupied by networks generated at low temperatures, and corresponds to the presence of uncorrelated defects in the regular lattice structure. The defects are similar to those described as Stone-Wales defects in carbon networks, and since they are surrounded by ordered structures this leads to disassortative mixing. 39 The concentration of independent defects is a function of temperature, and hence an increased temperature leads to a larger m 2 (k) as more defects appear. At higher temperatures, in the top-right region, the networks are dominated by entropy. The width of the node distribution function increases to a maximum, which is discussed below, and the networks end up close to the random limit. Fig. 3(a) shows the combined metrics from Fig. 3(b)-(d) along with the full sets of results obtained by varying k l and K q . Fig. 3 The "network landscape" showing both the simulated and experimental networks. In subfigure (a) different markers represent sets of simulated results, and outlined circles represent experimental data from the references indicated. [32][33][34][35][36][37] Grey and pink squares represent the networks generated by the simulated annealing method with different bond and angle strength parameters. The subfigures highlight subsections of that landscape under different conditions with consistent markers to subfigure (a)-for example, triangles in subfigure (a) represent the same data as the triangles in subfigure (c). The subfigures are coloured according to which parameter was varied in the simulation. In (b), this is the minimum temperature T min the simulation reached. In (c), this is the maximum coordination number of an individual node k max . In (d), it was the fixed temperature T fixed the networks were heated at.
The experimental data, taken from ref. [32][33][34][35][36][37], lies at relatively low m 2 (k) and negative r. This low m 2 (k) and negative r region is occupied by relatively few simulated networks, with most simulated networks showing a larger m 2 (k) value. This is a combination of multiple factors that were explored in the panels of Fig. 3. Highly coordinated tangles such as those seen experimentally in Fig. 1(b) are relatively rare, and node coordinations above 4 are geometrically unstable with respect to splitting into two 3-coordinate nodes. This naturally limits the width of the node distribution histograms, and hence m 2 (k). Next, many biological networks are constructed in an ordered manner, or enthalpically driven in their assembly. For example, collagen IV in basement membranes has been hypothesised to have a square or a regular hexagonal structure. 27,40 The images analysed here could be of regular networks with defects, either from aging or from preparation for imaging. Finally, the position of experimental data shows that real networks are not close to the random limit. This matches up with our intuition, as maximised entropy limits the amount of useful work that can be extracted from a system, which is an unhelpful property for biological networks. We anticipate that future network models which take the enthalpic factors into account more accurately (such as having terms in the potential that disfavour demixing) will ll in the "valley" observed in the landscape.
The simulated networks show interesting physical properties, including demixing, which are not observed in the experimental images. Similar polygons tended to cluster together and become more convex as the networks were cooled for longer periods of time. This demixing is driven enthalpically, as it appears stronger at low temperatures. The enthalpic drive arises from accommodating angular strain, for example, fourcoordinate nodes are least strained when in a region of squares, and two-coordinate nodes are least strained when they are part of a long boundary between two large rings. Aer demixing, the networks show a positive assortativity because of this strain-driven preference for similar shapes to be adjacent. The potential slow evolution of structure has signicant implications for the network aging process. This could be important, for example, in the collagen-IV network in the eye.
In conclusion, the bond switching method developed here can generate 2D polygonal networks similar to those found in biology. It can generate networks with a wide range of polygon side counts and node coordination distributions, mirroring the diversity exhibited by biological networks. A number of experimental biological networks were studied using methods previously used for inorganic chemical networks, and techniques from network theory were applied to understand their short range structure. The biologically-inspired networks showed the importance of letting the node distribution vary, and were almost all disassortative. However, the simulated networks show a range of assortativities approximately in the range À0.2 # r # 0.2. This is an interesting result, as the extra degree of freedom allowed in the simulations by varying k allows the polygon networks to get closer to the maximum entropy solution (with r z 0), which is not possible in traditional bond switching approaches. 16 The enthalphic drive towards disassortative mixing in real networks is an interesting topic, and could be studied in future by improvements to the simplied potential model presented here. Analysing these biological networks presented challenges because of the ambiguity in identifying edges, and it is anticipated that the simulated networks can be used to help resolve this. The simulated networks can bridge the gap between the hexagonal networks favoured by inorganic materials and more disordered biological networks, as well as showing interesting physical properties of their own, such as demixing. The additional degree of freedom allows for more entropically-dominated polygon networks, which allows the assortativity to approach zero. Future work will build on these methods utilising improved potentials, and exploring further the role of entropy, in order to understand the factors which underpin the observed network landscape.

Conflicts of interest
There are no conicts to declare.