KOBRA: a fluctuating elastic rod model for slender biological macromolecules.

KOBRA (KirchOff Biological Rod Algorithm) is an algorithm and software package designed to perform dynamical simulations of elongated biomolecules such as those containing alpha-helices and coiled-coils. It represents these as coarsely-discretised Kirchoff rods, with linear elements that can stretch, bend and twist independently. These rods can have anisotropic and inhomogeneous parameters and bent or twisted equilibrium structures, allowing for a coarse-grained parameterisation of complex biological structures. Each element is non-inertial and subject to thermal fluctuations. The speed and simplicity of the algorithm allows KOBRA rods to easily access timescales from nanoseconds to seconds. To demonstrate this functionality, a KOBRA rod was parameterised using data from all-atom simulations of the Ndc80 protein complex, and compared against these simulations and negative-stain EM images. The distribution of bend angles and principal components were highly correlated between KOBRA, all-atom molecular dynamics, and experimental data. The properties of a hinge region, thought to be found at an unstructured loop, were studied. A C++ implementation of KOBRA is available under the GNU GPLv3 free software licence.


Introduction
Biomolecules often contain long elongated sections such as a-helices, coiled-coils and DNA/RNA double-helices. These motifs behave dynamically in ways that are much more complex than polymer chain models. They can have non-straight equilibrium structures, and anisotropic, inhomogeneous material parameters. Their global dynamics occur on timescales of microseconds to seconds, and they are frequently found in large, interacting, mesoscale systems, millions of atoms in size.
An example of such a biomolecule is the Ndc80C protein complex (Fig. 1), which forms part of the kinetochore, 1 and which anchors the microtubules of the mitotic spindle to the chromatids during cell division. 2 Ndc80C is comprised mostly of an intrinsically-bent coiled-coil around 60 nm in length, with a hinge region at its centre. 3 The extreme flexibility of many such elongated biomolecules means that we do not possess their fully resolved atomistic structures, however, mesoscale structural information is available from lower resolution methods, such as electron microscopy (EM). 1 To build a model which will enable us to understand the full functionality of the kinetochore, we require a simulation algorithm that can capture how the various components interact.
There are few existing algorithms that can help us at this scale. The large size and long timescales of the complete kinetochore system means that an all-atom simulation would be prohibitively expensive computationally. Coarse-grained MD offers a potential solution, 4 however, discrete coarse-grained MD does not scale effectively to systems the size of the kinetochore. As a coarse-grained MD system gets larger, the Fig. 1 Schematic diagram of the kinetochore, an example of a mesoscale biological system containing many slender, rod-like structures. 1

1-D rod models for biological objects
In this paper, we present a model specifically designed for dynamical simulations of the full variety of thin biological structures such as a-helices and coiled-coils, using a Kirchoff rod representation 6 and Brownian dynamics.
Elastic rod models are used as coarse-grained models for static modelling of proteins. For example, Wang et al. 7 use a Cosserat model to predict the relationship between strain and chirality, and Neukirch et al. 8 compute coiled-coil pitch and secondary structure. Linka et al. 9 use a Cosserat representation for constitutive models of coiled-coils, but do not perform any explicit dynamics. Here, a Cosserat rod is commonly used to represent the backbone, and attempts are made to try to create a sequences of residues that matches this structure. 10,11 Elastic rod models are also used to represent DNA. While these models represent DNA as Cosserat curves, they do not perform full dynamical simulations. Manning et al. 12 use Cosserat rods to compute the equilibrium configuration for arbitrary DNA sequences -effectively an energy minimisation. Their more recent work has abandoned rods in favour of using interconnected 2D 'sheets'. Smith et al. 11 use a Cosserat model with anisotropic parameters, but focus almost exclusively on analytical self-contact models for supercoiling and plectoneme formation.
Kirchoff rods 13 are also used for computational models of slender biological objects. Tobias et al. 14 use Kirchoff rods to determine the thermodynamic properties of large ensembles of circular DNA segments. They find exact, analytical equations for the forces and energies in Kirchoff rods, but only for inextensible, uniformly-curved circular rods. Prior et al. 15 develop a Kirchoff rod model for finding the bending and torsional response of helical birods from the bending and torsional responses of the constituent rods.
At larger length scales, Cosserat and Kirchoff elastic rod models are commonly used for macroscopic biological objects. An example is the STRANDS package, 16 which is designed primarily for surgical simulations.
In the next section we set out a general-purpose Kirchoff rod simulation that can incorporate the properties of thin biological objects. Section 3 describes a protocol to set the physical parameters of these rods to match the dynamics of atomistic simulations. Finally, we validate these simulations against both idealised rod models and experimental and atomistic simulations for the dynamics of Ndc80C.
Our goal is to produce a model with the following specific features, in order to capture the essential dynamics of rod-like biomolecules: It must be able to incorporate thermodynamically-correct thermal fluctuations, which are essential to the dynamics of mesoscale biological systems.
The rods should also be able to undergo bending, torsion and extension.
Rods should be able to correctly represent 'hinge regions' in biomolecules, and so must also allow for elastic properties that are inhomogeneous and anisotropic.
The model must allow for a non-straight equilibrium structure. Many biological objects (such as tubulin, which is found in the spindle microtubules) are bent in their equilibrium configuration, and this is integral to their function.
It should be computationally inexpensive, allowing simulation of the equivalent of millions of atoms for timescales up to seconds.
Although, some important biopolymers, for example actin and microtubules are effectively inextensible, this limit is not straightforward because the inextensibility constraint requires that the random forces must be confined to space in which the system motion is constrained. Algorithms for achieving this have been developed by Grassia and Hinch, 17 Morse 18 and Liverpool. 19 2 Definition of the rod model

Kirchoff and Cosserat curves
Almost all models of flexible thin rods describe their configuration in terms of the Frenet triad and Frenet-Serret relations. 20,21 A rod is represented as a continuous material curve. Each point on the curve has three orthonormal vectors associated with it: the tangent vector, the normal vector, and the binormal vector (Fig. 2). A curve with normal and binormal vectors (also called director vectors) is said to be a directed curve.
Most rod models use a formulation by Kirchoff,13 or by the Cosserat brothers 6 to describe their energies. Elastic rods that are based on these curves are called Kirchoff or Cosserat rods.

Rod construction and notation
We construct a discretisation of the curve shown in Fig. 2 as a sequence of extensible straight rod sections connecting a set of discrete nodes (Fig. 3).
This discretisation is based on the discretisation from Bergou et al., 22 but substantial modifications have been made in order to better support modelling of biological molecules: the addition of extension energy, the removal of inertia, the addition of viscous drag, the addition of thermal noise, changes in the bend energy formula to support hinge regions, and changes to the twist energy formula to support arbitrary equilibrium twisting. These differences will be discussed in greater detail in Section 2.4. The nodes are located at positions r i , where i A {0, N À 1}, so that there are N À 1 segments, with endto-end vectors p i = r i+1 À r i , and the unit vector along the rod To represent the internal twisting of the rod, we define the material axes, m i and n i , to be perpendicular to l i , such that 8l i 8 = 8m i 8 = 8n i 8 = 1 and l i Ám i = m i Án i = n i Ál i = 0. The choice of initial direction of m i and n i is arbitrary except that they must obey these constraints. However, these vectors are used to represent the relative twisting of rod elements, and form the basis of a local co-ordinate system which rotates as the rod rotates. The local material properties also rotate with the rod, so the directions of m i and n i are also used to define the locally anisotropic material properties. A method to initialise a rod which is untwisted at equilibrium is provided in eqn (31).
Finally, to describe rods with an arbitrary equilibrium state, we define equilibrium values e p i and f m i , the configuration from which the rod is deformed.

Parallel transport
We use the concept of parallel transport to distinguish between changes in the orientation of the material axes arising from the curvature of the centre line and those associated with twist about the centre line. The easiest way to find the twist between two sets of material axes is via the 'parallel transport' of the material axes of one element onto the other.
For two unit vectors a and b we can construct a rotation matrix R that rotates a onto b: and To parallel transport the material axes of the ith segment onto the (i + 1)th segment, we construct the rotation matrix R, which rotates the normalised segment l i onto the normalised segment l i+1 . 23 We then apply that matrix to the material axis m i to obtain the vector m i 0 , The process of parallel transport is illustrated in Fig. 4, where m i 0 is the result of the parallel transport of the material axis m i onto the p i+1 th segment. This allows the relative rotations of the material axes for element i and i + 1 to be compared in a manner that removes the bend between the elements.

Elastic energy of deformation
We calculate the forces and torques on the rod from the gradients in the elastic energy arising from deforming the rod away from its equilibrium state. This elastic energy is   composed of three components -extensional, torsional, and bending.

Extensional deformation.
We assume that the extensional deformation is sufficiently small that extensional elastic energy remains in the linear regime, which, for a coiled-coil persists up to an extension to 120% of its original length. 24 Thus, the elastic energy of a single segment due to extension is given by: where |p i | is the length of the ith segment and |p i | is its equilibrium length (Fig. 5). The value of the spring constant k i for the ith element can be written as, where k s,i = YA, the product of the effective Young's modulus Y and cross-sectional area A. Note that k s,i is therefore a property of local molecular structure, and so is independent of the rod discretisation, whereas the spring constant k i depends on the discretised element length.

Torsional deformation.
To compute the torsional energy, we need to measure the degree of twist from one rod segment to the next, as shown in Fig. 6. We define an angle of rotation, Dy i , between the material frames of the rod segments i and i + 1, such that where m i+1 is the material axis of the (i + 1)th segment, m i 0 is the material axis of the ith segment parallel transported (eqn (5)) onto the (i + 1)th segment, and l i+1 is the unit vector along the (i + 1)th segment. We use the arctan2(y,x) function, which returns the value of arctan y x , but uses the signs of the parameters to determine which quadrant of the Euclidean plane the angle resides in, thus avoiding a potential sign error. Within the linear elastic regime, the torsional energy at node i arises from the rotation between the material axes of the adjacent segments: Here b i is the torsion constant (analogous to the stretching constant), Dy i and f Dy i are the angles between the material frames in the current and equilibrium configurations, and Note that eqn (8) differs from Bergou et al., 22 who chose to define the equilibrium such that f Dy ¼ 0. The twist energy does not account for the number of turns between two elements being 41, so the discretisation should be chosen so that Dy is confined to the range Àp o Dy o p. By choosing a periodic function in eqn (9) we ensure that the energy remains continuous.

Bending deformation.
To obtain the bending energy ( Fig. 7), we compute the curvature binormal, (kb) i , which defines the change in orientation between two segments due to bending: (kb) i is orthogonal to both of the segments, with a magnitude equal to twice the tangent of half of the angle between them: where p i and p iÀ1 are the ith and (i À 1)th segments. For a rod with an isotropic bending stiffness, the curvature binormal (kb) i is sufficient to calculate the bending energy. However, for a rod with an anisotropic bending stiffness, we need to resolve the components of bend with respect to the local material axes. In our discretisation, the material axes are properties of the elements, whereas the curvature binormal is defined at the node. To deal with this discrepancy, Bergou et al. average the energy arising from projecting the curvature binormal of the node onto the material axis of each the two adjacent rod Changing the length of an element p i creates a stretching energy E stretch . Note that the stretching energy is a property of an element, and that moving a node results in two elements having different stretching energy (e.g. moving the r i+1 th node results in changes in the stretching energy from both the p i th and the p i+1 th node).

Fig. 6
The energy of torsional deformation is defined between two elements (about a node). Therefore, applying twist to the material axes m i and n i will affect the twisting energy about the r i th and r i+1 th nodes. The rod is normally not straight (as shown here), so parallel transport (Section 2.3) is used to ensure that both material axis vectors are in the same basis.
elements. However, this has the effect of averaging the anisotropic bending modulus between the material directions of the two elements. Such averaging is undesirable if, for example, there is a local 'hinge' with strongly localised flexibility, as averaging between elements will remove the low modulus in the flexible direction. Therefore, to retain the functionality of a localised hinge, we need to define a material axis at the node. We do this by defining an intermediate segment, centred on this node, from a weighted average of the two segments on either side of this node, called the mutual segment (Fig. 8).
We define the unit vector of the mutual segment as, and l i and l iÀ1 are the unit vectors of the segments on either side of the node. We use an inverse weighting of element lengths in the average so that the orientation of the mutual material frame is more affected by the shorter of the two adjacent elements (whose centre is closer to the node). Next, we form the mutual material axis at the node by first parallel transporting the material axes of the two adjacent elements onto the mutual segment, and then by averaging these two material axes, weighted according to inverse element length, and then normalising to give: By projecting the curvature binormal into the components of the material axes of the new mutual segment, we obtain the material curvature x, the amount of bending in the local material axis frame, where n m = m m Â l m , and kb i is the curvature binormal about the ith node.
Assuming that the material properties of the rod are such that torque and curvature are linearly related, the centreline curvature can be used to define the bending energy similarly to the previous energies -as a quadratic potential centred about x m i , the equilibrium material curvature: where L i is given by eqn (10) evaluated for the equilibrium configuration, and B i is the bending stiffness matrix in the local material axis frame, a positive definite 2 Â 2 matrix. This formulation does not assume that p iÀ1 and p i are in similar directions, so it can account for arbitrarily large bend angles about a single element.

Dynamical equations
The changes to the configuration of the assembly of rod elements is defined by the changes of node positions, Dr i and rotations Dy i , which determine the rotations of the material axes orientations (m i ). The translational motion of each node is given by a stochastic equation of the form where Dt is the simulation timestep, M i is the mobility tensor of the rod segment in the fluid medium, F i is the internal elastic force, and f i is the random force from stochastic thermal noise. This equation assumes that the system is overdamped, so that the inertia of the rod is negligible. We also neglect hydrodynamic interactions between the rod elements, but include a resistive (damping) force for the motion of an isolated rod element against a background fluid medium. This is therefore a Brownian equation of motion. For a general axisymmetric body the mobility tensor will be anisotropic. However, for simplicity we have approximated the mobility as being isotropic and equal to mobility of a sphere of a radius, a i equal to half the equilibrium length of each segment: 25 Fig. 7 A bend about a node increases the length of the curvature binormal associated with that node. For an isotropic, untwisted rod, the bending energy is proportional to the square of the curvature binormal. Note that the bending energy is the property of a node, not of an element, and that moving a single node won't just affect the bending energy about that node, but the two nodes on either side of it as well. where z i = 6pma i and m is the dynamic viscosity of the medium. Similarly, for the rotational motion of the material axes, where g i is a random thermal torque, t i is the torque due to the internal elastic forces (23), and z y is the corresponding friction constant for rotation. For the rotational friction constant we use the rotational drag on cylinder of radius a and length | À p i |: For a particular configuration of nodes and material axes, the forces acting on nodes (F i ) and the torques acting on material axes (s i ) can be determined from the partial gradients of the energy: where F i is the force on the ith node at r i due to the energy gradient, and where s i is the torque resulting from the energy gradient of segment i, l i is the unit vector representing the axis of rotation, and the total internal energy, E = E stretch + E twist + E bend , where E stretch , E twist and E bend are obtained by summing the node and element contributions to stretch, twist and bend energies from eqn (6), (9) and (17). Due to the complexity of these formulas we compute the gradient numerically using central differences, where x is some co-ordinate (position or angle) in the system. When a node is translated, or when a material axis is twisted, it changes the energies of the two nodes on either side of it. Fig. 5-7 display the affected geometry in orange. Therefore, computing the gradient in energy at node i requires information about a 5-node window of the rod between r iÀ2 and r i+2 . When a node is translated, the elements and the material axes on either side of that node change. The element is updated according to the discretisation given in Section 2.2. The material axis is parallel transported (Section 2.3) from the old rod direction to the new one. More details are available in Section 1 of the ESI. † Using the drag given by eqn (19), the force acting on each node from thermal noise is given by the fluctuation dissipation theorem, 26 where T is the temperature of the system, z the friction constant, Dt the timestep, k B is Boltzmann's constant, and R is a random vector, where R x , R y and R z are independently drawn from uniform distributions in the range À0.5 r R x,y,z r 0.5, such that hRi = 0 and R i R j It would be possible to include hydrodynamic interactions by replacing eqn (18) for each segment with a single equation for the entire rod with a single mobility matrix including the hydrodynamic coupling between the elements. However, this would result in replacing z i in eqn (25) with the inverse of the mobility tensor, which is computationally more expensive. If the rod remains nearly straight that the effect of hydrodynamic interactions can be approximated through a difference in the local mobility parallel and perpendicular to the rod axis.
The rotational thermal torque is given by where T is the temperature of the system, z y is the torsional friction, Dt is the timestep, k B is Boltzmann's constant, and R is a uniform random variable in the range À0.5 r R r 0.5.
To apply the rotation Dy to the material axis, we use Rodrigues rotation formula: v rot = v cos Dy + (k Â v)sin Dy + k(kÁv)(1 À cos Dy) (27) where v rot is the resultant vector, Dy is the angle to rotate, v is the original vector and k is the axis of rotation. In this case, we would be rotating the vector m i about the axis l i .
For information regarding the structure, implementation and performance of this algorithm, see Section 2 of the ESI. †

Parameterisation of rods from atomistic molecular dynamics
In this section, we apply the general model for rod-like biomolecules described in Section 2 to the specific case of the Ndc80 protein complex. Ndc80C is a long, flexible molecule comprised of two coiled-coil sections joined by an unstructured loop. As a consequence, this molecule has both a non-straight equilibrium shape and an inhomogeneous, anisotropic bending modulus, and so allows us to test all of the features of this model. A cartoon depiction of the Ndc80C structure is shown in Fig. 9, showing the connectivity of Ndc80C's four sub-units (Ndc80 in blue, Spc24 in purple, Nuf2 in yellow and Spc25 in green) along with the positions and residue numbers of the cross-links.
First, we will construct an all-atom representation of Ndc80C. Then, we will extract coarse grained rod model parameters from the dynamics of a simulation of the all-atom Ndc80C. Finally, we will compare the results from these models to each other and to measurements acquired from negative stain EM images. 27

Creation of an all-atom Ndc80C model
All-atom molecular dynamics software packages such as AMBER 28 and GROMACS 29 are commonly used to study the dynamics of biological molecules. 30  simulations consider the position and connectivity of every atom (and hence, every residue) contained within a given molecule. The dynamics are computed using a force field, from pre-computed potentials that describe every atom and bond in the system. Therefore, to run an MD simulation of Ndc80C, we need to construct a model of the structure by combining information from experimental studies. As a starting point, we use the so-called 'Bonsai' Ndc80C molecule 3 (2VE7). This molecule contains both globular domains and heavily-truncated coiled-coil region 85 Å in length.
Ciferri et al. 3 provide the positions of cross-links between the two coiled-coils and a prediction of the entire secondary structure by type: a long coiled-coil with a small, unstructured loop at the centre. They also indicate the relative residue numbers at which these features occur as shown in Fig. 9.
To construct an all-atom model of Ndc80C, we use the spatial parameters (such as the pitch and coiled-coil radius) extracted from the Bonsai Ndc80C combined with the residue sequence from the full-length molecule. We remove the globular regions at both ends and retain the central coiled-coil region. Details of the sequences are available in the ESI. † We use ISAMBARD, a software package designed for modelling, analysis and parametric design of proteins, 31 to extract the coiled-coil parameters -such as pitch, radius, crick angles, and fC a values -from the coiled-coil present in the Bonsai molecule. The radius was found to be 7.62 AE 0.12 Å, the pitch 203 AE 26 Å. ISAMBARD is then used to create a coiled-coil backbone with these parameters out of the full Ndc80C residue sequence. This structure has no side-chains, so we recreate the side chains from the residue sequence using SCWRL4, a protein side chain predictor. 32 We use the QUARK ab inito protein structure prediction server 33 to create predictions of the structures in loop region from the residue sequence. These structures were peptidebonded together with the two coiled-coils using UCSF Chimera. 34 The overall length of the generated Ndc80C is 48 nm.
Finally, we refine the resulting model using ModRefiner, 35 and minimise it using the YASARA minimisation server. 36 This brings the protein, which has been created with somewhat arbitrary bond angles, closer to its native state. We then run simulations of this structure using AMBER 28 with an implicit (GBSA) solvent, for 50 nanoseconds. The simulation trajectories and input files are available in the ESI. †

Model building
To construct a rod model for Ndc80C, we need to obtain suitable values for the physical parameters k s , b and B for each node and element. These parameters are obtained from Ndc80C's dynamics -we run an atomistic simulation of Ndc80C and aim to select rod parameters that reproduce the same local fluctuations in shape.
We first map the all-atom MD trajectory directly onto a coarse-grained trajectory for an equivalent rod model. We define the position of nodes in this rod model by averaging over the positions of small clusters of atoms: where r r i is a rod node at index i, k min and k max are the atom indices denoting the edges of the ith cluster, and r a k is the position of the atom at index k. Using only small clusters means that the rod configuration is not unnaturally smoothed during coarse-graining. The all-atom Ndc80C model was coarsegrained to 14 nodes from 2668 atoms. The optimal length for rod segments depends on the system being discretised. It should be as coarse as possible, while being fine enough to capture information about the implicit shape and dynamics of the molecule being modelled. In the case of Ndc80C, the rod should be fine enough to resolve the molecular hinge, but each element should be no shorter than a turn of the coiled-coil, as at this resolution, it would be necessary to represent the coiledcoil explicitly using two rods.
Each node was averaged from a 10-atom wide cluster according to eqn (28) (see Fig. 10). This number is also system-dependent, for Ndc80C it was chosen such that there is more than a factor of 10 difference between number of atoms per rod element and the number of atoms being used to set a node position.
Having defined the nodes, we now need to define the material axis of each coarse-grained segment in order to measure twist. One method for doing this is to first calculate the vector q i between the opposite atoms in the two chains of the coiled-coil (Fig. 11): (29) Fig. 9 Cartoon depiction of the Ndc80C protein complex. 3 The colours distinguish the four connected proteins in the complex, and the black lines denote cross-links at their respective residue numbers.

View Article Online
where r A j is the position of the jth atom in chain A, and r B j is the position of the atom in chain B which is closest to r A j at equilibrium. In practice, for each rod we select a 10 atom pairs which are located halfway between the atoms defining two nodes at the end of the rod; we take the average q j from those atom pairs, as indicated in Fig. 11. A material axis m q,j can then be defined from the orthonormal projection of this vector from the normalised element axis, l j , However, the material axes of adjacent elements obtained from eqn (30) may point in quite different directions, and although the above equations for calculating deformation energy permit this, it is practically better (e.g. when defining the bending stiffness matrix) to maintain similar directions for the material axes of adjacent elements. The material axes of two adjacent elements can be considered to be in the 'same' direction if the material axis m i of element i, when parallel transported onto element i + 1, points in the direction of the material axis m i+1 of that element.
Consequently, we use the following procedure for calculation of material axes. We begin in the equilibrium configuration of the atomistic model. In this equilibrium configuration, we obtain the material axis m 0 of the first element according to eqn (30). We then generate the material axis for all other rod elements in the equilibrium configuration by parallel transport, by iterating along the rod: For each element i we can also calculate a candidate material axis m q,i from the local atomistic positions according to eqn (30). In the equilibrium configuration we may compare the material axis m i obtained by parallel transport with the material axis m q,i . These two vectors will in general differ by some rotation angle g Dy u i , which we then store. Then, during the subsequent dynamics of the MD trajectory, at any moment we can generate the instantaneous m q,i according to eqn (30), and calculate the instantaneous material axis m i by rotating m q,i about the rod by signed angle g Dy u i , using the Rodrigues rotation formula (eqn (27)).
The Python script that builds this model is available in the ESI, † and as part of the FFEA software package. The model, as built, is illustrated in Fig. 12.

Model parameterisation
Having mapped the MD simulations onto a trajectory for an equivalent rod model, our strategy for model parameterisation is to select parameters such that the local mean square fluctuations in bending, twisting and extension match those observed in the equivalent rod model from the MD trajectory. From this trajectory, we therefore compute (for each rod, or node, i) the mean square fluctuations in rod length h(|p i | À |p i |) 2 i, and in twisting angle hDy i 2 i where Dy i is the difference between instantaneous twisting angle and its time average value. Since bending is parameterised by a two-component vector x m i (see eqn (16)), fluctuations in bending are parameterised by a covariance matrix for bending fluctuations, for node i: where Do m i,a is the difference between a component of the bending vector x m i and its time average value (where a = 1 or 2). We aim to parameterise the dynamical rod model so as to match the above mean square fluctuations. As an initial estimate for the model parameters, we note that the energies defined in eqn (6), (9) and (17) are all quadratic in the respective parameters. We might expect fluctuations to be distributed normally and the equipartition theorem to apply. This produces initial estimates for the parameters as: However, the above results neglect the three dimensional nature of the rod model dynamics, and the non-linear  geometrical effect this has on the probability distributions of rod model variables. A uniform density of states for the x-, yand z-coordinates of node positions does not equate to a uniform density of states for rod length and bending coordinates. As a result, the probability distributions for these variables are typically perturbed from their initially expected normal distributions (one can think of this as an extra entropic contribution to the free energy of fluctuations arising from the density of states in the space of the bending or rod length coordinates).
We have found this problem to be especially prominent for the bending fluctuations. If we use eqn (35) for the bending stiffness matrix, then typically the mean square fluctuations in the resulting rod model do not match those expected. However, they are sufficiently close that an iterative scheme is able to rapidly converge on the correct parameterisation. For a given node i, suppose we are aiming to reproduce bending fluctuations C target as measured from the all atom MD simulations. Running a KOBRA model built using the bending energy matrix B old , given by application of eqn (35), will result in a trajectory with observed fluctuations C old which will in general be different from C target (even after reducing statistical errors by running a long trajectory). By considering the expected change in C from a small change in B, we can obtain a new guess for an optimal bending energy matrix as: A KOBRA simulation run using this new guess at the bending energy matrix will result in bending fluctuations closer to the target value. If necessary, the iteration can be repeated, but we have found that a single iteration is usually sufficient to satisfactorily reproduce the target bending fluctuations within typical additional error produced from statistical sampling of trajectories.
Validation of equipartition of energy, and extraction of bending parameters from dynamical simulations, are described in more detail in Sections 3 and 4 of the ESI. †

All-atom parameterisation results
Using the method described in Section 3.3, and the model from Section 3.1, we can determine a set of rod parameters for Ndc80C. The unstructured loop acts as a flexible hinge linking together stiffer coiled coil regions of the molecule. The average values of these constants for the entire molecule (including the flexible hinge) are given here in Table 1.
In addition to making it more flexible, Ndc80C's hinge region means that it is slightly more susceptible to bending in one axis than in the other. This hinge is localised to a small region in the centre of Ndc80C, so we need to examine the parameters extracted on a per-node basis. Fig. 13 indicates the two eigenvalues of B i as a function of the node number. It shows a localised region of decreased stiffness between nodes 4 and 6 in the parameterised rod model, which corresponds to the unstructured loop region (the hinge) in the atomistic model. This hinge is also observed in the negative stain EM imaging of Ndc80C, 27 which will be examined in more detail in Section 4.2.1.

Comparison of dynamics with atomistic molecular dynamics and experimental data
The parameterisation from Section 4.1 was used to create a KOBRA rod model and run a simulation comprised of 2 Â 10 7 steps, with a timestep of 10 ps, for a total simulation time of 200 ms. Note that this is more than three orders of magnitude longer than the atomistic simulations.
4.2.1 Comparison of molecular kink angles. Wang et al. 27 use negative stain EM to obtain 2D images of a variety of Ndc80C conformers. They force the Ndc80C molecules to lie on a flat carbon support, and measure the 'kink angle', defined as the angle between the two halves of the molecule, before and after the unstructured loop. This provides a distribution of angles, based on 83 observations, and so gives an experimental measurement the flexibility of Ndc80C.
From both the KOBRA rod and MD atomistic trajectories, the kink angles can be measured using the end-to-end vectors p a and p b of the two halves of the coiled-coil region (i.e. from one end of the coiled-coil to the loop/hinge region, and then from the loop/hinge region to the far end of the coiled-coil):  Fig. 13 The eigenvalues of the bending energy matrix B i for the rod calculated from the atomistic Ndc80C trajectory. Here, the two lines represent the maximum and minimum eigenvalues, not any particular axis.
This angle is computed for each frame in the trajectory, then binned in increments of 10 degrees in order to preserve parity with the experimental data, and the resulting distribution normalised (see Fig. 14). The distribution of kink angles for the KOBRA rod and atomistic trajectories are very similar. Both possess a broad maximum of bend angles between 25 and 75 degrees centered on 50 degrees, that falls off rapidly for angles above 100 degrees. This occurs because of the density of states from projection of the two possible bending directions onto a single bend angle. Since the distribution of bend angles at each node from the MD trajectory was used to parameterise the KOBRA rod simulation this can be interpreted as an indication of a successful parameterisation.
The distribution of the experimental angles retains the same general features. The experimental distribution does not replicate the maximum at 50 degrees. However, this discrepancy arises in part because the experimental resolution was not able to resolve bend angles less than 30 degrees -so all angles in this range were inserted into a single bin. Otherwise, the experimental data broadly replicates the features of the distributions found from simulation although the range of the bend angles is slightly broader. It should also be noted that the experimental distribution is constructed from only 83 samples, compared to 10 000 frames for the rod trajectory and 50 000 frames for the atomistic trajectory.
The rod simulation trajectories and input files are available in the ESI. † 4.2.2 Comparison of principle components. To understand which types of motion are conserved between the all-atom simulation and the KOBRA simulation, we can analyse the trajectories from the two simulations using principal component analysis (PCA). 37 PCA is a statistical method used to reduce the dimensionality of molecular dynamics data sets, by identifying the principal eigenmodes of motion (the principle components) resulting from thermal (or other) fluctuations. Each eigenmode represents a different structural deformation. The principal components are ordered from the largest to the smallest value of the associated eigenvalue (and thus, amplitude of the component). PCA was performed using the software package pyPcazip. 38 Fig . 15 shows the dot product matrix comparing the rod and all-atom simulations. Each cell shows the dot product of the eigenvector of the rod mode with the corresponding all-atom mode. A dot product close to one means that the eigenvectors are highly correlated. The eigenvectors are sorted by eigenvalue size, so if the matrix is diagonal, it also means that the relative magnitude of the various modes of bending are in the same order. The highest correlation is between the first modes, which correspond to bending about the hinge. Although diagonal elements of the matrix dominate, there are some significant off-diagonal components, which could be evidence of mode mixture, particularly between mode 2 and 4. For reference, when assessing correlations between eigenmodes from multiple all-atom trajectories, these normally display a significantly smaller degree of diagonal correlation between PCA eigenmodes from separate runs on the same model 5 than is shown in Fig. 15.
To observe this mode mixture, and visualise the motion represented by the first few PCA modes, we can create 'PCA animations', that show the range of motion corresponding to each eigenmode. The PCA animations are created using the following formula: where r anim i is the animated node, r i is the original node, l is the eigenvalue for that node, v the eigenvector, and f is a scalar between 0 and 1. The resulting figure shows how the nodes oscillate within that principal component.   the dot product matrix suggests there is mixing between these modes. For example, modes 1 and 2 of the atomistic trajectory show motion on opposite sides of the hinge, whereas in the corresponding rod nodes, this motion is distributed evenly between both modes.
Together, Fig. 15 and 16 indicate a strong correlation between the motions involved in the atomistic MD simulation and the KOBRA rod trajectories. The degree of correlation is typical (and in fact good) in comparison to PCA mode correlations between similar MD simulations, or between MD and coarse grained simulations. One reason for the discrepancy in correlation is the length of the MD trajectories, which are necessarily short (due to computational constraints), and therefore do not sample a large number of different global configurations (the large scale modes) of the molecule, giving rise to a statistical sampling error. It should be noted that in deriving parameter values for the coarse grained rod model we have used local information, such as the variation in local bend and twist angles, rather than the variation in global configurations. For short MD trajectories, matching the local bend and twist variation does not necessarily translate to matching the (statistically limited) variations observed in global configurations.
Animations of each normal mode and pcz-format trajectories are available in the ESI † provided with this paper.

Conclusion
The KOBRA algorithm provides an efficient simulation technique for elongated biomolecules that produces dynamical rod trajectories and conformers with bend angles that closely resemble all-atom MD and experimental data. While here we have focused only on Ndc80C, KOBRA is designed to be a general model for slender biological objects such as coiled-coils and alpha helices. The current version provides a robust coarsegraining methodology designed to capture the dynamics of slender biological objects, which can have anisotropic bending moduli, inhomogeneous bending, stretch and twisting parameters, and arbitrary equilibrium configurations. In addition, it provides tools for model building and a large number of tests for validity. KOBRA is also performant enough to handle simulations of extremely large systems (of order (100 nm) over long timescales (of order 1 ms)), such as the kinetochore.
In the future, we will couple the rod-like KOBRA objects with three-dimensional FFEA objects, in order to construct biomolecules comprised of both globular and filamentary structures. In particular, this will enable us to construct a model of Ndc80C featuring the globular domains at each end, and potentially larger systems including more kinetochore proteins. We will also intend to implement long range hydrodynamics.
The software implementing the KOBRA algorithm is available as part of the FFEA software package (ffea.bitbucket.io), and is free to use and modify under the conditions off the GNU GPLv3 software licence. This includes an implementation of the algorithm, the tools used to convert atomistic trajectories to rod structures, parameter extraction, all of the analysis tools, and a simple visualiser. Developer documentation is also available for those who want to add features or implement KOBRA in their own software.

Conflicts of interest
There are no conflicts to declare.