Carmen Domene *ab, Christian Jorgensen a and Sumra Wajid Abbasi † a
aDepartment of Chemistry, King's College London, Britannia House, 7 Trinity Street, London SE1 1DB, UK. E-mail: carmen.domene@kcl.ac.uk; Tel: +44 (0)207848754
bChemistry Research Laboratory, University of Oxford, Mansfield Road, Oxford OX1 3TA, UK
First published on 17th August 2016
Collagen is the single most abundant protein in the extracellular matrix of the animal kingdom. It shows remarkable structural and functional diversity and is regarded as one of the most useful biomaterials. Etymologically, the term collagen comes from the Greek kola, ‘glue’, and gen, ‘giving birth to’. It is therefore not surprising that the various collagens and the structures they form all serve the same purpose: to help tissues withstand stretching. Collagens are involved in functions that include cell adhesion and migration, tissue repair, scaffolding and morphogenesis. Knowledge of the structure and properties of collagen, of how these change with the local environment, and of the nature and specificity of collagen's interactions with its partners is therefore central to discerning the role of collagen in medical applications such as imaging, drug delivery and tissue engineering. It is equally central to the design and construction of synthetic collagen-like materials as tools in biomaterial science and nanotechnology. The main focus of this perspective is to review the molecular and packing structures of collagen and the computer simulation work performed to date, to further highlight the significance of collagen.
The molecular organization of the collagen fiber was determined later, and the axial repeat, usually designated the D period, was established. D-periodic fibrils contain intermolecular covalent cross-links that give them high tensile strength and mechanical stability.16
In 1994, Berman and co-workers reported the first high-resolution crystal structure of triple-helical collagen-related peptides,17–20 and several other high-resolution crystal structures of collagen-related oligopeptides and synthetic mimics were subsequently determined.
The umbrella term ‘collagen’ covers all proteins in which three polypeptide chains fold into a right-handed triple helix (Fig. 1). The name is used for all members of the collagen family, which vary in tissue distribution, size and function.1,21–25 Collagen is conserved across the animal kingdom and is a key component of most tissues, accounting for about 20–30% of total body protein.26,27 It is a primary component of the bones, muscles, skin and tendons of vertebrates, and it supports delicate organs.28–34 Collagen types are classified into several sub-families according to sequence homology and to similarities in their structural organization and supramolecular arrangement, such as fibrils, networks and filaments (Scheme 1). To date, at least 29 different types of collagen have been identified, designated I–XXIX.35 Many other proteins also have collagen-like domains, including adiponectin, C1q, macrophage receptor, acetylcholinesterase, conglutinin, ectodysplasin, collectin and ficolin (Fig. 2).1 Crucially, insight into collagen–protein interactions will facilitate progress in new approaches to drug discovery, targeting and delivery.36 Collagen types I, II and III account for the majority of collagen in the human body. About one half of total body collagen is in the skin, and collagen makes up about 70% of the non-water material in the dermis of skin and in tendon.37 In contrast, invertebrate collagen genes encode only fibrillar and basement membrane collagens.38 Over a hundred bacterial protein sequences containing the characteristic collagen domain have been reported in genomic databases, suggesting a whole new family of collagen-like proteins.39
Fig. 2 Example of homotrimeric (PDB id 4AE2) and heterotrimeric (PDB id 1PK6) collagen-like proteins. Each monomer is coloured differently.
Understanding how the relatively simple molecular building blocks of collagen self-assemble is therefore essential for artificial tissue development and for understanding growth, regeneration and disease. It is equally important for cosmetics formulation, pharmacology, plastic surgery and medicine in general.37
Collagens I, IV, V, VI, IX and XI are heterotrimers, and the remaining types are homotrimers. Among the 29 types of collagen characterised so far, types I, II, III, V and XI are fibril-forming collagens. Collagen fibrils are remarkably strong and stable because of their self-aggregation and cross-linking; for instance, type I collagen fibrils are stronger than steel. These fibrils assemble into well-structured supramolecular linear aggregates more than 1 μm long, with a characteristic suprastructure. Fibrillar collagens contain a relatively high proportion of charged residues (∼15–20%) and a small percentage of hydrophobic residues (∼6%).36 Fibers can form hydrogels, films or sponges. Not all collagens, however, occur as periodic-structured fibrils. Several models have also been proposed for the arrangement of tropocollagen within collagen fibrils. One of the first was that of Hodge and Petruska,47 in which five tropocollagen molecules are staggered side by side with an offset of 67 nm between neighbours, as revealed by transmission electron microscopy. Subsequently, Schmitt et al. proposed that the axial period corresponds to 234 amino acid residues; in other words, the tropocollagen molecule is about 4.4 times as long as the native collagen period.48 This model is known as the quarter-staggered stacking model. Its weakness is that it cannot describe the spatial extension of the quarter stagger in two or three dimensions. Afterwards,48 the Smith model was proposed, in which five tropocollagen molecules are arranged concentrically into a hollow filament known as the microfibril.49 Its limitation is that it cannot predict the organisation of fibrils with a diameter greater than 3.5 nm.
Next, Hulmes and Miller proposed the quasi-hexagonal packing model, in which the periodic tropocollagen molecules were assigned the character of a molecular crystal, without microfibrillar substructures.50 The compressed microfibril model followed.51 In this model, five-stranded microfibrils are compressed so that the molecules lie on a pseudo-hexagonal lattice in the fibril cross-section. In the longitudinal direction the molecules are supercoiled with a left-handed twist. Using X-ray diffraction, Orgel et al. presented a microfibrillar model in which each microfibril is composed of five staggered tropocollagen molecules arranged with a right-handed tilt, rather than merely axially staggered.33 This model appears to fit best the native X-ray diffraction data and other experimental observations on the organization of the molecular segments in the overlap region of the fibril.
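The Hodge–Petruska and Schmitt descriptions above reduce to simple arithmetic. The Python sketch below converts the 234-residue period into nanometres and splits each D period into overlap and gap zones for a molecule 4.4 D long. The axial rise of 0.286 nm per residue is an assumed, commonly quoted value, not a figure from this article, and the gap/overlap split assumes an idealised quarter-staggered array.

```python
# Hodge-Petruska / Schmitt arithmetic for an idealised quarter-staggered fibril.
# ASSUMPTION: an axial rise of ~0.286 nm per residue (not stated in the text).
RISE_NM = 0.286      # assumed axial rise per residue along the triple helix (nm)
D_RESIDUES = 234     # residues per D period (Schmitt et al.)
L_OVER_D = 4.4       # tropocollagen length in units of D (from the text)


def d_period_nm(residues=D_RESIDUES, rise=RISE_NM):
    """Axial length of one D period in nanometres."""
    return residues * rise


def overlap_and_gap(l_over_d=L_OVER_D):
    """Overlap and gap zone lengths, in units of D.

    A molecule of length (k + f) D covers k whole periods plus a fraction f;
    that fraction is the overlap zone, and the rest of the period, 1 - f,
    is the gap left before the next molecule in the same axial row.
    """
    overlap = l_over_d - int(l_over_d)
    return overlap, 1.0 - overlap


D = d_period_nm()
overlap, gap = overlap_and_gap()
print(f"D = {D:.1f} nm; molecule = {L_OVER_D * D:.0f} nm")
print(f"overlap = {overlap:.1f} D ({overlap * D:.0f} nm), gap = {gap:.1f} D ({gap * D:.0f} nm)")
```

With these assumed values, D comes out at about 67 nm, in line with the 67 nm offset quoted above. A 4.4 D molecule then leaves an overlap of 0.4 D and a gap of 0.6 D in each period.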
Among the fibril-forming collagens, type I makes up over 90% of the organic mass of bone. It is also the main collagen in a number of connective tissues, including skin, tendons, cornea and ligaments, as well as the vitreous body, brain, cartilage and hyaline tissues.2 Type II collagen makes up over 50% of all protein in cartilage and 85–90% of the collagen of articular cartilage.2 Type III collagen is essential for type I fibrillogenesis. Normal type I collagen is a heterotrimeric, triple-helical linear molecule consisting of two α1 chains and one α2 chain.52 In contrast, types II and III form homotrimers and can assemble into globular homotrimeric domains.
The fibril-associated collagens with interrupted triple helices (FACIT) subclass comprises collagen types IX, XII, XIV, XVI, XIX and XX, whose triple helices are interrupted by short non-helical domains.2 Type VI is a heterotrimeric collagen with short helical domains and extended globular termini.53,54 The α3 chain of type VI is about twice as long as the other chains because of its larger globular domains at the N- and C-termini. These extended collagen domains are subject to extensive intracellular and extracellular post-translational modifications.55,56
Collagen types VIII and X are formed by short chains. Type X is homotrimeric, with a long C-terminal domain and a short N-terminal domain, and in vivo studies revealed that it assembles into hexagonal networks.57 Type VIII collagen is similar in structure to type X collagen, although its different spatial distribution confers a distinctive function.58
Collagen type IV predominates in basement membranes, where it integrates nidogen, the laminins and other extracellular matrix components into a two-dimensional supramolecular aggregate. Type IV collagen has a conformationally flexible triple-helical structure with three domains. To date, six subunit chains have been recognised (α1(IV)–α6(IV)). Among these, [α1(IV)]2α2(IV) heterotrimers have been reported to be vital in forming the network found in most embryonic and adult basement membranes.
Numerous experimental studies link the misregulation of collagen to a broad range of diseases. Collagen provides binding sites for cytokines and multiple growth factors, which in turn regulate vital cell functions including survival, differentiation, motility and polarity.5,59 One therapeutic application of collagen is in drug delivery, where its binding ability makes it a promising carrier. The anchoring and network-forming ability of some collagen types has potential in tissue regeneration and repair.60–62 Furthermore, experimental studies have widely linked deficiencies in type III collagen, as well as in elastin, to cardiac aneurysm formation.63,64 Finally, collagen bioprostheses have been studied for use in surgery.65
Mutations in different regions can have different effects. Defects in the molecular structure, or in the organization of collagen into mature fibers, result in a range of connective tissue diseases and even some types of osteoporosis and arthritis.1,2 A number of excellent books and reviews describe the biochemical and biomedical aspects of collagen in detail.16,66 Others describe how understanding the key biochemical and physical properties of collagen leads to strategies to create, control and modify the structure and function of collagen-based biomaterials.36,67
Bella and co-workers19 designed and crystallised, at 1.9 Å resolution, a 30-residue peptide to model the interruption of the repeating Gly-X-Y motif. The peptide carried a single Gly→Ala substitution at its centre, because such substitutions had been identified in several diseases. This crystal structure, published in 1994, provided structural information on the effect of a glycine substitution in a triple helix, an alteration that usually leads to pathological states in fibrillar collagens.
Yonath and Traub69 reported the first structural analysis of the polypeptide (Pro-Gly-Pro)n in 1969. Fiber diffraction work by the groups of Scheraga70–73 and Blout74–78 followed. In 1981, Okuyama et al.18 presented crystallographic studies of the model polypeptide (Pro-Pro-Gly)10 and reported 7/2 helical symmetry. This contrasted with the 10/3 triple-helical symmetry of earlier studies on natural collagen and sparked a debate about the actual symmetry of natural collagen. In the 1990s, new model polypeptides were synthesised and characterised, and these continued to illustrate key features of collagen-related systems. More recently, in 2001, Berisio and co-workers reported the first full-length structure of the collagen-like polypeptide [(Pro-Pro-Gly)10]3 at 1.3 Å resolution. Model peptides have also been used to define the basic principles of collagen self-association into the supramolecular structures found in tissues.46,79
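The helical symmetries quoted above follow the usual n/m notation, in which n helical units complete m turns before the pattern repeats. The Python sketch below converts each symmetry into a twist per unit, units per turn and axial repeat. Treating each residue as one helical unit with an axial rise of 2.86 Å is an assumption made for illustration, not a value taken from the cited studies.

```python
# Helical parameters from n/m symmetry: n units complete m turns per repeat.
# ASSUMPTION: one residue per helical unit and a rise of 2.86 Å per residue.
RISE_A = 2.86


def helix_parameters(n_units, n_turns, rise=RISE_A):
    """Return (twist per unit in degrees, units per turn, axial repeat in Å)."""
    twist = 360.0 * n_turns / n_units
    units_per_turn = n_units / n_turns
    repeat = n_units * rise
    return twist, units_per_turn, repeat


for label, (n, m) in {"7/2": (7, 2), "10/3": (10, 3)}.items():
    twist, upt, rep = helix_parameters(n, m)
    print(f"{label}: {twist:.1f} deg per unit, {upt:.2f} units per turn, repeat {rep:.1f} Å")
```

Under these assumptions, 7/2 symmetry corresponds to about 102.9° per unit, 3.5 units per turn and a 20.0 Å repeat. The 10/3 symmetry corresponds to 108° per unit, about 3.33 units per turn and a 28.6 Å repeat. The two models therefore differ mainly in how tightly each chain is wound.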
Interactions between the collagen triple helix and other proteins play important roles in collagen binding and degradation and, for example, in the healing and repair of the body's tissues. Crystallography has yielded atomistic structures of a variety of collagen types, which has allowed detailed studies of collagen complexes. Over the past decades, much information has also been gained about the interactions of collagen with cell-surface receptors, extracellular matrix components and enzymes such as matrix metalloproteinases (MMPs). Over 300 crystal structures of collagen in complex with other proteins have been reported to date. Selected examples are described next to illustrate pivotal interactions and their relation to function. In addition, the triple-helix structural motif is found in a few non-collagenous proteins.
The binding of a monoclonal antibody (MAb) to the triple-helical region of type III collagen was one of the first cases in which a collagen region that binds another molecule was studied in some detail to clarify specific recognition and binding properties.80 The molecular features involved in the interaction of the triple helix with another macromolecule were characterised. Unstable Gly-Gly-Y triplets were observed adjacent to the recognition region, suggesting that some flexibility or instability near the actual binding site is involved.
Human cysteine cathepsin is a protein crucial to physiological and pathophysiological cellular mechanisms. It is a key therapeutic target for a range of diseases, as it hydrolyses various extracellular matrix components, including some types of collagen. Sage et al. described a glycosaminoglycan-mediated inhibition mechanism of this protein that modulates its collagenase activity in vivo.81
Another process of vital importance in which protein–collagen interactions are crucial is the degradation of collagen to maintain correct collagen homeostasis in tissues. In the collagenases, exosites on the hemopexin C domain bind native collagen, which is required for triple-helicase activity during collagen cleavage. The active site of collagenolytic matrix metalloproteinases can accommodate only a single collagen chain. The collagen helix must therefore first be unwound by a triple helicase to expose the scissile bonds, after which the chains are cleaved sequentially. Models have been proposed for the regulation of type I collagen levels upon stimulation of the activity of several matrix metalloproteinases. The collagen-binding properties, and the roles of the ectodomain and the hemopexin C domain of the collagenolytic membrane type-1 matrix metalloproteinase (MT1-MMP) in collagenolysis, have been characterised in detail.82 Collagen was reported to be a unique substrate for the proteases responsible for its cleavage, and these interactions recruit and regulate collagenolytic and gelatinolytic activities in a homeostatic manner.82
Molecular dynamics (MD) is a powerful computational technique that provides accurate descriptions of the structure and dynamics of biological systems, contributing to their understanding at the atomic level. In MD simulations, the motion of interacting atoms is calculated by numerically integrating Newton's equations of motion. The force on each atom is the negative gradient of the potential energy with respect to that atom's position (F_i = −∇_i U). These forces are used to propagate the system in time, producing a trajectory. Equilibrium quantities are then calculated using statistical mechanics, by averaging over trajectories long enough to have sampled a representative ensemble of states of the system. Specific MD procedures for studying tightly packed collagen have been described in the literature.83 These protocols differ notably from conventional MD simulations of proteins, which generally treat only individual, fully solvated protein molecules or complexes.83 They exploit ideas borrowed from the modelling of crystalline solids, such as periodic boundary conditions, to replicate the supramolecular arrangement of collagen molecules within fibrils.83 Numerous MD studies of collagen have provided useful information relating its diverse structural characteristics to its function.84–86 A representative simulation system of a tropocollagen–peptide complex in solution is illustrated in Fig. 3.
Fig. 3 A tropocollagen–peptide complex in solution. Each of the four strands is coloured differently.
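The MD procedure described above can be illustrated with a minimal, self-contained Python sketch. It integrates Newton's equations with the velocity Verlet algorithm and computes forces as the negative gradient of a Lennard-Jones potential. Periodic boundaries are applied through the minimum-image convention. The Lennard-Jones fluid in reduced units, the 2.5σ cutoff and the 27-particle lattice are illustrative assumptions. Production simulations of collagen use biomolecular force fields, long-range electrostatics and thermostats, none of which are shown here.

```python
import numpy as np


def lj_forces(pos, box, eps=1.0, sigma=1.0, rcut=2.5):
    """Return (forces, potential energy) for a truncated-and-shifted
    Lennard-Jones system, using minimum-image periodic boundaries."""
    forces = np.zeros_like(pos)
    energy = 0.0
    shift = 4 * eps * ((sigma / rcut) ** 12 - (sigma / rcut) ** 6)
    for i in range(len(pos) - 1):
        rij = pos[i] - pos[i + 1:]              # r_i - r_j for all j > i
        rij -= box * np.round(rij / box)        # minimum-image convention
        r2 = np.sum(rij**2, axis=1)
        mask = r2 < rcut**2
        inv6 = (sigma**2 / r2[mask]) ** 3
        energy += np.sum(4 * eps * (inv6**2 - inv6) - shift)
        # F_i = -dU/dr_i = 24 eps (2 (s/r)^12 - (s/r)^6) / r^2 * r_ij
        fij = (24 * eps * (2 * inv6**2 - inv6) / r2[mask])[:, None] * rij[mask]
        forces[i] += fij.sum(axis=0)
        forces[i + 1:][mask] -= fij              # Newton's third law
    return forces, energy


def velocity_verlet(pos, vel, box, dt=0.005, steps=200, mass=1.0):
    """Integrate Newton's equations; return final state and total energies."""
    f, u = lj_forces(pos, box)
    totals = []
    for _ in range(steps):
        vel += 0.5 * dt * f / mass              # half kick
        pos += dt * vel                         # drift
        pos %= box                              # wrap back into the box
        f, u = lj_forces(pos, box)
        vel += 0.5 * dt * f / mass              # second half kick
        totals.append(u + 0.5 * mass * np.sum(vel**2))
    return pos, vel, np.array(totals)


# 27 particles on a simple cubic lattice in a periodic box (reduced units).
rng = np.random.default_rng(0)
spacing, n_side = 1.7, 3
box = spacing * n_side
grid = np.arange(n_side) * spacing
pos = np.array([[x, y, z] for x in grid for y in grid for z in grid], dtype=float)
vel = rng.normal(scale=0.1, size=pos.shape)
vel -= vel.mean(axis=0)                         # zero total momentum
pos, vel, energies = velocity_verlet(pos, vel, box)
```

The total energy should be nearly conserved along the trajectory, which is a quick check that the integrator and forces are consistent. Averaging a quantity such as the potential energy over a sufficiently long run corresponds to the time-averaging step described in the text. The O(N²) pair loop is chosen for readability rather than speed.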
Early computational studies in this area focused on microfibril and fibril packing, such as longitudinal alignment and the stabilization of fibrils.3,87–91 Subsequently, collagen-like peptides were used in combined experimental and theoretical studies to gain insight into the structural features of collagen.92–94 Scheraga and co-workers used computational models to establish the structural importance of proline and hydroxyproline for helix and fibril stability.91 The first molecular dynamics simulations in this area, however, were the 0.5–1.0 ns simulations of collagen-like peptides performed by Klein and Huang.95 These were followed by MD simulations of the role that hydroxylated prolines play in stabilising the collagen triple helix.96
Later work focused on modelling the telopeptides, which are crucial for the formation of enzymatic covalent cross-links near the N- and C-termini of collagens. These cross-links provide structural integrity, strength and stiffness to collagenous tissues.97 One study reported conformational and packing analyses of cross-linked structures of the N-telopeptide heterotrimer of fibril-forming type I collagen.98 Because no high-resolution crystallographic structures of telopeptides were available, a triple-helical structure was built from the crystallographic coordinates of a collagen-like peptide. Its residues were then replaced with the actual bovine collagen sequence. Isolated N-telopeptides were found to adopt essentially random chain structures, whereas when docked to their helix-domain receptors they adopted highly ordered and specific conformations.98 A second study addressed the conformations of the type I collagen C-telopeptide, using all three chains of the heterotrimer, before and after docking to its receptor domain.99 The computational models showed that the N- and C-telopeptide regions have different molecular packing and intrafibrillar cross-linking patterns that control the relative azimuthal orientations of molecules in the fibril.99 In a later study, the deformation mechanisms of the N- and C-cross-links and the functional roles of the N- and C-telopeptide conformations were investigated by MD simulations.97
Other computational work has focused on mutational disruptions of collagen function and their associated pathologies.100 In one study,101 collagen-like peptides designed to mimic mutation sites in type I collagen were combined with MD simulations. The general structural properties of the peptides with and without the mutation were compared to examine the effect of the single-point mutation on the surrounding residues.
On the methodological front, Park et al. developed a new set of molecular mechanics parameters for hydroxyproline that reproduced the correct pucker preference in the collagen backbone motif. The parameters were tested in simulations of collagen-like peptides, which reproduced the role of hydroxylation in stabilising the collagen triple helix by favouring the appropriate pucker conformation.102 Several groups have also used detailed molecular simulations to investigate the relationship between interchain salt-bridge formation and triple-helical stability, with the aim of guiding the design of collagen-like peptides with specific interchain interactions.103 To clarify the stereospecificity of ion pairs further, MD simulations were performed on triple-helical peptides containing reversed sequences, for example comparing EGK with KGE. Combined with experimental studies, the results indicated that reversing the charges lowered thermal stability, highlighting the importance of cross-chain ionic interactions for the stability of the collagen triple helix in solution.104
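Salt-bridge formation in such simulations is commonly quantified as an occupancy. This is the fraction of trajectory frames in which an acidic oxygen and a basic nitrogen lie within a distance cutoff. The Python sketch below assumes a 4.0 Å cutoff and coordinates already extracted into NumPy arrays (for example Glu carboxylate oxygens and Lys NZ atoms). It also assumes that molecules have already been made whole across periodic boundaries. The function name and array layout are hypothetical, not those of the cited studies.

```python
import numpy as np


def salt_bridge_occupancy(acid_o, base_n, cutoff=4.0):
    """Fraction of frames with any acidic O within `cutoff` Å of any basic N.

    acid_o: (n_frames, n_oxygens, 3) array, e.g. Glu carboxylate oxygens
    base_n: (n_frames, n_nitrogens, 3) array, e.g. Lys side-chain NZ atoms
    Coordinates are assumed to be already made whole across periodic
    boundaries, so plain Euclidean distances are used.
    """
    diff = acid_o[:, :, None, :] - base_n[:, None, :, :]  # (frames, nO, nN, 3)
    dist = np.linalg.norm(diff, axis=-1)                    # (frames, nO, nN)
    formed = (dist < cutoff).any(axis=(1, 2))               # one flag per frame
    return float(formed.mean())


# Hypothetical four-frame example: two Glu oxygens fixed at the origin,
# one Lys NZ moving along x.
glu_o = np.zeros((4, 2, 3))
lys_nz = np.array([[[3.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]],
                   [[3.9, 0.0, 0.0]], [[10.0, 0.0, 0.0]]])
print(salt_bridge_occupancy(glu_o, lys_nz))
```

In the toy example the bridge is formed in two of the four frames, so the function returns 0.5. Comparing an EGK design with its KGE counterpart would then amount to comparing occupancies for the corresponding Glu–Lys pairs.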
MD simulations combined with experiment have been used to investigate the pathways and molecular mechanisms by which peptides assemble into triple-helical protomers. The same studies examined how these protomers subsequently organise into structurally defined linear assemblies.105 They showed that collagen-mimetic fibrils and microfibers with precisely defined periodic features, very similar to those formed in vivo, could be obtained through the electrostatically driven linear assembly of a small collagen-mimetic peptide, with potential applications in materials design. Experimental studies have shown that different amino acids have distinct positional preferences within a stable triple-helical collagen motif. The structural basis for this positional propensity was systematically investigated with computational techniques.106 Specifically, MD simulations of 39 collagen-like peptides showed that the propensity of the different amino acids to adopt collagen-like conformations depends primarily on their φ and ψ angle preferences.106
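One simple way to express a conformational propensity of this kind is the fraction of sampled (φ, ψ) pairs that fall inside a collagen-like region of the Ramachandran map. The Python sketch below computes signed backbone dihedrals from atomic coordinates and that fraction. The region centre (φ ≈ −75°, ψ ≈ +145°, close to the polyproline II region) and the ±30° tolerance are assumptions chosen for illustration. The cited study may define propensity differently.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (IUPAC sign convention).
    For phi use C(i-1), N, CA, C; for psi use N, CA, C, N(i+1)."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1   # component of b0 perpendicular to the axis
    w = b2 - np.dot(b2, b1) * b1   # component of b2 perpendicular to the axis
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def wrap(delta):
    """Map angle differences in degrees onto the interval [-180, 180)."""
    return (np.asarray(delta, dtype=float) + 180.0) % 360.0 - 180.0


def collagen_like_fraction(phi, psi, center=(-75.0, 145.0), tol=30.0):
    """Fraction of (phi, psi) samples within tol degrees of center in both angles.
    ASSUMPTION: center and tol are illustrative, not taken from the cited work."""
    dphi = wrap(np.asarray(phi) - center[0])
    dpsi = wrap(np.asarray(psi) - center[1])
    return float(np.mean((np.abs(dphi) <= tol) & (np.abs(dpsi) <= tol)))
```

In practice, φ and ψ would be computed frame by frame for the residue of interest with `dihedral` and then passed to `collagen_like_fraction`. The periodic wrapping matters because ψ values near ±180° are common in extended backbones.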
Several experimental and modelling studies have been carried out to understand the mechanical properties of bone. Bone is a biological nanocomposite with a highly optimised and complex multi-level hierarchical structure, composed primarily of type I collagen and hydroxyapatite.107 Among the computational studies, molecular dynamics and steered molecular dynamics were employed to characterise the directional dependence of the deformation response of collagen with respect to the hydroxyapatite surface.107 The same techniques were used to study collagen interactions with non-hydroxylated rutile surfaces.108 The early stages of hydroxyapatite nucleation on a collagen template were studied by immersing a triple-helical collagen molecule in a stoichiometric solution of Ca²⁺, PO₄³⁻ and OH⁻ ions. The results were compared with simulations of collagen interacting with hydroxyapatite crystal surfaces.109 In the context of drug delivery, medical diagnosis and molecular engineering, the interactions of collagen-like peptides with carbon nanotubes (CNTs) were also investigated with MD simulations.110,111 A collagen-like peptide with a hydrophobic centre and hydrophilic surfaces was found to insert into the nanotube spontaneously but slowly, and the mechanism of the encapsulation process was characterised. Two studies related to aesthetic dentistry112,113 focused on dentin collagen fibrils, which are formed during tooth development. Dentin, one of the four major components of teeth, is composed of about 45% hydroxyapatite and 33% organic material. Of this organic material, 90% is type I collagen, and the remainder consists of dentin-specific proteins, including proteases. These proteases hydrolyse specific peptide bonds to solubilise ‘insoluble’ collagen. After development, apatite crystallites replace some of the water molecules in collagen, but the mechanism by which this occurs is unclear.
Both studies characterised how collagen interacts with adhesive monomers. They also examined, using experimental or computational approaches, whether these monomers could displace all, or only some, of the water molecules from collagen intermolecular spaces. Other computational studies of the interaction of collagen with materials include those with gold nanoparticles114,115 and graphene nanoribbons,116 as well as studies aimed at understanding collagen self-assembly on substrates.117
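The stoichiometric solution used in the nucleation study above must contain Ca²⁺, PO₄³⁻ and OH⁻ in the 10:6:2 ratio of the hydroxyapatite formula unit, Ca10(PO4)6(OH)2, which is exactly charge-neutral. The short Python sketch below builds ion counts for a chosen number of formula units and checks the net charge. The system size of five formula units is purely hypothetical.

```python
# Ions per hydroxyapatite formula unit, Ca10(PO4)6(OH)2: (count, formal charge).
HAP_UNIT = {"Ca2+": (10, +2), "PO4 3-": (6, -3), "OH-": (2, -1)}


def ion_counts(n_units):
    """Ions required for n_units formula units of hydroxyapatite."""
    return {ion: count * n_units for ion, (count, _) in HAP_UNIT.items()}


def net_charge(counts):
    """Net formal charge (in units of e) of a dictionary of ion counts."""
    return sum(n * HAP_UNIT[ion][1] for ion, n in counts.items())


counts = ion_counts(5)  # hypothetical system size of five formula units
print(counts, "net charge:", net_charge(counts))
```

Any net charge on the collagen molecule itself would still have to be neutralised separately with counter-ions. That step depends on the protein model and is not shown.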
There are numerous computational studies of collagen–protein interactions, and a few representative ones are highlighted below. To start, two related studies118,119 employed MD simulations to analyse (i) the structural effect of interruptions in the Gly-X-Y repeats on heterotrimeric models of triple-helical peptides and (ii) the interactions of collagen with gelatinase A, a matrix metalloproteinase, including the role of each enzyme domain in hydrolysing collagens with and without interruptions. Matrix metalloproteinases belong to the endogenous proteases mentioned earlier that hydrolyse collagen. Collagen hydrolysis is relevant in a variety of physiological and pathological conditions. It involves unwinding the triple helix and cleaving peptide bonds within the individual collagen chains. Results from the first study118 showed the formation of a kink in the interrupted region of the triple-helical peptides. They also revealed significant differences in the hydrogen-bonding pattern due to singularities in the staggering of the chains. In the second study,119 the authors proposed that the collagen-binding domain binds to the C-terminus of collagen-like peptides with an interruption, helping to unwind the loosely packed interrupted region. They speculated that the hemopexin domain of the metalloproteinase prevents further unwinding of collagen by binding to the other end of the collagen-like peptide. They also postulated that the catalytic domain would subsequently orient itself to interact with the partially unwound triple helix and carry out hydrolysis.
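Hydrogen-bonding patterns such as those compared in the first study are usually derived from a geometric criterion applied frame by frame. The Python sketch below assumes a common choice: a donor–acceptor distance of at most 3.5 Å and a D–H···A angle of at least 150°. It reports how often a given donor–hydrogen–acceptor triple satisfies this criterion, for instance a Gly backbone N–H and a carbonyl oxygen on a neighbouring chain. The thresholds and function names are illustrative assumptions, not the criteria used in the cited work.

```python
import numpy as np


def is_hbond(donor, hydrogen, acceptor, max_da=3.5, min_angle=150.0):
    """True if D...A <= max_da (Å) and the D-H...A angle at H >= min_angle (deg).
    ASSUMPTION: thresholds are a common geometric choice, not from the text."""
    if np.linalg.norm(acceptor - donor) > max_da:
        return False
    h_d = donor - hydrogen
    h_a = acceptor - hydrogen
    cos_a = np.dot(h_d, h_a) / (np.linalg.norm(h_d) * np.linalg.norm(h_a))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return bool(angle >= min_angle)


def hbond_occupancy(donors, hydrogens, acceptors, **criteria):
    """Fraction of frames in which one D-H...A triple is hydrogen bonded.
    Each argument is an (n_frames, 3) array of coordinates in Å."""
    flags = [is_hbond(d, h, a, **criteria)
             for d, h, a in zip(donors, hydrogens, acceptors)]
    return float(np.mean(flags))
```

Computing this occupancy for every interchain donor–acceptor pair yields a hydrogen-bond map, which can then be compared between interrupted and uninterrupted peptides.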
Next, extended MD simulations were reported to determine the most likely rearrangements of the domains of metalloproteinase-2 (MMP-2) in response to the presence of the collagen triple helix.120 The authors pointed out that, despite its physiological and pathological relevance, detailed structural information about enzyme–substrate interactions during MMP-catalysed collagen hydrolysis was not available. Different models of the interaction between the full-length MMP-2 enzyme and the synthetic collagen-like peptide fTHP-5 were considered. The authors concluded that, owing to its characteristic flexibility, the full multidomain structure of MMP-2 is required to study its interactions with collagen.120 The most significant MMP-2/fTHP-5 interactions at the catalytic and non-catalytic domains were also detailed, providing clues about the roles of the different domains during collagenolysis.
The binding of endostatin, a C-terminal fragment of collagen XVIII, to heparin and heparan sulfate was studied experimentally and further characterised by docking and molecular dynamics simulations.121 Endostatin interacts with heparan sulfate chains at the cell surface, which contributes to its biological activities. The aim of this study was to determine the affinity of these interactions, to identify the structural features of heparin/heparan sulfate–endostatin complexes, and to investigate the effect of divalent cations on the interaction.121
Type II collagen is a specific target in the collagen-induced arthritis model. One study, based on a homology model of an antigen–antibody complex and a 200 ns MD trajectory, investigated the critical amino acids that confer type II collagen epitope specificity on a variety of autoantibodies. A few anchoring residues in the antibody regions were suggested to be sufficient to confer moderately high affinity and to be key for recognition.122
A combined computational and experimental study illustrated the nature of the ligand–receptor interactions between integrin and single- and triple-helical strands of collagen. Integrins are the main receptor proteins that cells use both to bind and to respond to the extracellular matrix, and these interactions regulate many different cell functions. A detailed understanding of how collagen binding to integrin is fine-tuned is therefore essential, as it could provide a tool for therapeutic purposes.123 Combined NMR and MD simulation methods addressed why single-stranded collagen fragments are unable to establish a stable, specific binding interaction with the integrin receptor and form only weak complexes in solution. In another study,124 a biomimetic strategy for designing platelet adhesion inhibitors was proposed to develop potent inhibitors of integrin α2β1–collagen binding. The strategy combined molecular docking, structural similarity analysis, MD simulations and experimental validation.
All these studies highlight the potential for combined in silico and in vitro studies for extending our understanding of collagen–protein interactions.
A wide variety of computational methods are currently used in computational chemistry. Despite the wide availability of MD algorithms and force fields applicable to macromolecules, the size of model systems and the computing resources that simulations require impose inherent limitations. Thanks to recent advances in computer hardware and high-performance computing facilities, MD simulations on the nanosecond timescale are now standard, and microsecond simulations have become attainable in recent years. One way to extend the accessible length and time scales is to use reduced representations, known as coarse-grained (CG) models. These reduce the number of degrees of freedom in a simulation system by treating a group of atoms as a single entity, significantly curtailing the computational expense. Several algorithms also exist to accelerate sampling along a predefined set of reaction coordinates and to estimate the potential of mean force. They provide a wealth of information about the simulation system at a fraction of the cost of traditional all-atom MD. Such methods suit collagen phenomena that are generally unattainable by atomistic equilibrium MD simulations, such as complex associations and conformational changes, and some have already been applied to them. Computational simulations are steadily guiding the development of promising novel imaging agents for clinical use. These agents can facilitate personalised medicine by optimising the selection and dosing of disease therapies and by improving the understanding of the underlying biology of a disease. By gleaning new insights into collagen interactions in bulk materials and in protein environments, computer simulations may accelerate our understanding of the potential role of collagen in the design of tools for medical applications and, more broadly, in biomaterial science and nanotechnology.
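The coarse-graining idea described above can be made concrete with a centre-of-mass mapping. Each bead is placed at the mass-weighted mean position of the atoms assigned to it, which reduces 3N atomic coordinates to 3M bead coordinates. The Python sketch below assumes a user-supplied grouping of atom indices, for example one bead per residue. The mapping and function name are hypothetical and do not correspond to any specific published collagen CG model.

```python
import numpy as np


def map_to_beads(coords, masses, groups):
    """Centre-of-mass mapping of atoms onto coarse-grained beads.

    coords: (n_atoms, 3) atomic positions
    masses: (n_atoms,) atomic masses
    groups: list of index lists, one per bead (hypothetical mapping scheme)
    """
    beads = np.empty((len(groups), 3))
    for k, idx in enumerate(groups):
        m = masses[idx]
        beads[k] = (m[:, None] * coords[idx]).sum(axis=0) / m.sum()
    return beads


# Toy example: four atoms mapped onto two beads.
coords = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0], [0.0, 4.0, 0.0]])
masses = np.array([1.0, 3.0, 2.0, 2.0])
beads = map_to_beads(coords, masses, [[0, 1], [2, 3]])
```

Here 12 atomic coordinates become 6 bead coordinates. A full CG model would also need effective interactions between the beads, which are usually parameterised against atomistic simulations or experiment and are not shown.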
For example, the holy grail in cardiovascular prevention is to identify individuals at risk of myocardial infarction or stroke.126 This is becoming possible through non-invasive plaque detection, for which an atomistic-level understanding of collagen–protein interactions is fundamental. Progress in these areas may allow earlier detection, facilitate monitoring of the response to treatment and, overall, enable more effective treatment.37,127 The increasing availability of high-resolution structural information, growth in computer capabilities and the development of state-of-the-art algorithms and accompanying force fields will markedly amplify the use of computational simulations to study collagen–protein interactions in the coming years.
Footnote
† Present address: Computational Biology Lab, National Center for Bioinformatics, Quaid-i-Azam University, Islamabad, Pakistan.
This journal is © the Owner Societies 2016