Roderick E. Hubbardab
aVernalis (R&D) Ltd, Granta Park, Abington, Cambridge, CB1 6GB, UK
bUniversity of York, Structural Biology Lab, York, YO10 5YW, UK
First published on 9th November 2005
In a target-focused approach, the cycle of discovery is very similar with or without a structure for the target. Initial-hit compounds are found that bind to the target and enter a medicinal chemistry cycle of making compound analogues and testing in suitable biological models. From this, the chemist builds hypotheses of what is important for the activity. Using experience (or inspired guesses) the chemist then makes changes that should improve the properties of the compound and the cycle of synthesis, testing and design begins again. These hypotheses develop a model of the conformations the compounds adopt, the chemical surfaces they project and the interactions made with the active site. For example, the optimisation of sildenafil (Viagra),1 included consideration of the electronic properties of an initial-hit compound and how it could be improved to more closely mimic the known substrate in the active site of phosphodi, many years before the structure of this enzyme was known.
Nowadays an appreciation of the 3D structure of both the compounds and their target is a part of just about every drug-discovery project. This target structure can be experimentally determined, a model constructed on the basis of homology or a virtual model of the receptor created on the basis of the chemical structure of the known active compounds. In addition, computational methods such as virtual screening and experimental methods such as fragment screening can generate many new ideas for compound templates and possible interactions with the active site. The major advantage of experimentally determining the structure of these different compounds bound to the target is to increase the confidence in the hypotheses and increase the scope of subsequent design. This encourages the medicinal chemists to embark on novel and often challenging syntheses in the search for novel, distinctive and drug-like lead compounds. Our ability to predict conformational changes in proteins and the binding energy of protein–ligand complexes remains relatively poor, so there is still plenty of scope for experience, inspiration and guess work in the details of design.
This book will provide an overview of the methods currently used in structure-based drug discovery and give some insights into their application. Essentially, all of the examples and methods focus on proteins as the therapeutic target. There has been considerable progress in the structural biology of RNA and DNA molecules and these classes of molecules are the recognised target for some successful drugs. For DNA, our understanding of the binding of compounds that intercalate or bind to the minor groove is reasonably well advanced (for an early example, see Henry;2 current perspectives are provided in Tse and Boger3 and Neidle and Thurston4). There have also been spectacular advances in determining the structure of whole ribosome sub-units5,6 and of representative portions of the ribosomal RNA7 in complex with known natural product antibiotics. These structures have led to some hope that rational structure-based methods may be applied against the ribosome and also other RNA targets where a particular conformation has a role in disease processes (Knowles et al.113). Although there has been some progress8 and it has been possible to discover compounds with reasonable affinity for RNA, there remain considerable difficulties in designing small, drug-like molecules with the required specificity to discriminate between the very similar sites presented on RNA. For these reasons, the discussions in this book focus on proteins as the therapeutic target.
Modern, target-oriented drug discovery is usually organised into a series of stages. The definitions of these stages differ from company to company and the details of the boundaries will vary from project to project. The following discussion provides an illustration of the stages, their purpose and duration and the types of resources involved. Clear criteria need to be established for moving from one stage to another as, in general, the stages become progressively more resource and expense intensive (Fig. 1).
Fig. 1 The drug-discovery process. The lightly shaded box emphasises where structure-based methods can play a significant role. The horizontal axis only approximately scales to time in each stage.
The approach to biological research has undergone dramatic changes in the past decade, with successions of omics technologies becoming available. Genomics has recorded the sequence of nucleic acid bases in many genomes, and continuing bioinformatics analyses are identifying the coding regions. Comparing the genomes of both pathogen and host organism can identify potential target genes. Transcriptomics methods monitor the identity and levels of RNA transcribed for each gene, and there have been high hopes that comparison of “normal” and diseased cells will identify targets. There is a vast literature in these areas – Egner et al.11 provide an introduction to the methods, and the recent critique by Dechering12 points out some of the pitfalls. There has been considerable interest (and investment) in applying these methods to find new targets for different diseases and conditions. As the first genomes began to appear, there was intense interest in identifying what all the genes were. An example of a target discovered in this way is the beta form of the estrogen receptor (see Manas et al. in this book).
Whatever the mechanism of identifying a target, there needs to be some level of validation before nominating it for a drug-discovery project. The phrase “target validation” is much misused – a target cannot be said to be truly validated until a drug that uniquely affects that target is on the market. Even then, there can be issues such as the recent challenges facing COX-2 as a target following adverse effects (see 24 February 2005 news item in Nature, 433, 790).
In general, the requirements for a target are to establish a biological rationale for why affecting the target will have the desired therapeutic benefit. This can include assessing the viability of the organisms produced with a particular gene removed, either through knock-out technology or through RNA interference techniques. These are not ideal methods for emulating the actual effect of a drug – with gene knock-outs, there is much redundancy and subtlety in biological pathways and the removal of a gene can often be compensated for in other ways as the organism differentiates and grows. An example here is the attempts to discover a function for the beta form of the estrogen receptor. Once the gene had been identified, there were intense efforts to ascribe a function to the gene, with considerable investment in producing and characterising knock-out animals.13 There were hints, but in the end, it took the development of isoform-specific compounds to provide chemical tools which could probe the biology and identify which diseases or conditions were associated with the receptor (again, see the chapter from Manas et al. in this book).
The best case for a target is to have a compound available that can provide the biological proof of concept. This is a compound that is sufficiently specific for the target of interest that it can be studied either in cellular assays or in animal models of disease, to demonstrate that modulating a particular target will have the desired therapeutic benefit in vivo. Such compounds could come from natural products, as in the case of antibiotics that validate the ribosome as a target5 and the geldanamycin derivatives that are demonstrating the potential of Hsp90 as an oncology target.14
In addition to biological validation, targets also need to be assessed for what is termed druggability. That is, does the protein have a binding site which can accommodate a drug-like compound with sufficient affinity and specificity? Although some experimental methods may be used to assess this,15 analyses of experience with many targets have generated some general principles, discussed in the chapter by Hann et al. later in this book. In summary, enzyme active sites tend to be highly druggable, consisting of a distinct cleft designed to bind small substrates, with defined shape and directional chemistry. In contrast, most protein–protein interactions are less druggable, as they cover quite large areas of protein surface with few shape or chemical features to which a small molecule could bind selectively. Unless particular “hot-spots” of activity can be identified, they are generally regarded as unsuitable drug targets (see Arkin and Wells, 2004, for a discussion).
Finally, for a structure-based project, there is a clear structural gate – that is, the structure of an appropriate form of the target needs to be available. Sometimes (for example, in a small structure-based company) this is set as a strict gate – that is, unless the structure is available hit identification cannot begin. There can be additional constraints. For example, if the project is relying on fragment screening using crystallography followed by soaking with compound mixtures, then the protein has to crystallise in a suitable crystal form with an open binding site.
HTS is also very expensive, consuming large quantities of target and compounds and requiring significant investment in robotic screening devices. Smaller companies that rely on screening usually work with smaller libraries of compounds, and depend on a particular “edge” over the larger companies. That distinctiveness could be either some detailed knowledge or expertise with the biology of the target class, and thus more appropriate configuring of the assay, or a small library of compounds for that particular class of target. It is in the hit-identification phase that structure-based methods have provided smaller companies an opportunity to establish rapidly effective drug-discovery projects, particularly through the use of virtual screening or fragment-based methods (see later).
In most cases, the hit-identification phase relies on configuring a particular assay to monitor binding or inhibition. Usually, a large number of compounds are being screened, so the first experiment is to measure compounds that exhibit activity (above a certain percentage inhibition) at a set concentration. This is usually followed by confirming the hits: an in vitro assay is run at varying concentrations to determine the IC50‡ or the Ki or Kd§ for the compound, and the quality of the compound sample is checked. Maintaining quality in a compound collection is a major challenge – compounds decompose over time, particularly if stored as dilute solutions in air. In addition, it is not unusual for 5–10% of compounds purchased from commercial suppliers to either be not what they claim to be, or to contain major contaminants that can give false positive (or false negative) results.
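Hit confirmation of this kind amounts to fitting a dose–response curve. The following is a minimal sketch, assuming a simple Hill model and entirely hypothetical assay data; real assay software typically fits more elaborate four-parameter models with baseline and maximum terms.

```python
# A minimal sketch of IC50 determination by curve fitting, assuming a
# simple Hill dose-response model; the data below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, ic50, slope):
    """Percentage inhibition as a function of compound concentration."""
    return 100.0 * conc**slope / (ic50**slope + conc**slope)

# Hypothetical dose-response data: concentrations in uM, % inhibition.
conc = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])
inhib = np.array([2.0, 8.0, 22.0, 45.0, 71.0, 88.0, 96.0])

params, _ = curve_fit(hill, conc, inhib, p0=[0.3, 1.0])
ic50, slope = params
print(f"IC50 = {ic50:.2f} uM, Hill slope = {slope:.2f}")
```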
An HTS campaign can require significant resources (compound, target, manpower) and last 6–12 months, depending on how long it takes to configure a robust assay. Where smaller collections of compounds are being used, or structure-based methods applied, the hit-identification phase usually lasts around 6 months and requires a relatively small team of scientists.
The output from a hit-identification campaign is a set of compounds whose chemical structures have been checked and which have reproducibly been shown to have activity.
The detailed work during the H2L phase varies with the nature of the project and, in particular, the origin of the hit compounds. Wherever the compounds come from, it is usual to re-synthesise the compounds for complete validation of the hit and to either purchase or synthesise close analogues of the compounds. In general, it is during the H2L phase that dramatic changes in chemical template are made and the essential core of the lead series established. The usual aims are to establish preliminary structure–activity relationships (SAR) within one or more series, to explore the indicative physicochemical and ADMET¶ properties of the compounds, to consider the chemical tractability or synthetic accessibility of the compounds and to understand the IP position on the compound series and target. Depending on the project (and the company policy), entry into lead optimisation can be gated by demonstrating some in vivo activity in the series. Setting the right barriers for entry into lead optimisation is one of the most challenging aspects of medicinal chemistry.
This phase usually takes around 6 months, depending on the requirements for biological testing and the degree of synthesis required to establish a lead series with appropriate properties.
The early stages of the LO process are usually focused on achieving the desired affinity and selectivity. Selectivity requirements vary from target to target and, in particular, between different therapeutic areas. For an acute condition such as cancer, where rapid intervention is required and the course of treatment is likely to be short term, side-effects can be tolerated. In fact, it appears that some oncology drugs achieve efficacy by targeting a number of pathways. For a chronic condition, such as arthritis or diabetes, where the drug will be taken for many years, the selectivity requirements can be much more stringent.
In these early stages, there can still be some modest changes in the central core of the compound. However, as LO progresses, the main changes are on the periphery of the molecule. The main driver is the biology – it is remarkable how quite small changes in the chemistry can have a large effect on the biological activity, particularly in vivo.
Lead optimisation typically takes 18–30 months, depending on the complexity of the target biology, the resources deployed and the chemistry of the lead series. The real challenge in lead optimisation is balancing when certain properties need to be introduced and deciding when to abandon a particular project or lead series.
The output from the LO is a compound (or a set of compounds) that meets the required criteria of in vivo efficacy in animal models, with a demonstrable mode of action and with acceptable PK.
The difficulty and cost of synthesising the compounds is considered throughout the discovery process, but becomes particularly important at this stage. A synthetic scheme that works in the laboratory to produce 100 mg of compound may need dramatic modification to produce the many kilograms of compound required for late stage clinical trials. Overall, the difficulty of synthesis or purification of compound will have a marked impact on the cost of goods – i.e. how much it will cost to produce the drug – and this can seriously impact the commercial viability of the project. Similarly, formulation – getting the drug into a form that can be administered both for the animal testing and for clinical trials – can have an impact on the project viability.
This phase is to prepare the way for clinical trials where the drug candidate is given to humans. This is covered by a stringent regulatory regime and many of the steps in the pre-clinical stage are covered by regulations and a need to work to certain legal guidelines.
Phase 1 studies are primarily concerned with assessing the drug candidate's safety. A small number of healthy volunteers are given the compound to test what happens to the drug in the human body – how it is absorbed, metabolised and excreted. A phase 1 study will investigate side-effects that occur as dosage levels are increased. This initial phase of testing typically takes several months. About 70% of drug candidates pass this initial phase of testing.
In phase 2, the drug candidate is tested for efficacy. Usually, this is explored in a randomised trial where the compound or a placebo is given to up to several hundred patients with the condition or disease to be treated. Depending on the condition, the trial can last from several months to a number of years. The output is an increased understanding of the safety of the compound and clear information about effectiveness. Only about one-third of the projects successfully complete both phase 1 and 2 studies, but at the end of this process, the compound can be truly considered as a drug.
In a phase 3 study, a drug is tested in several hundred to several thousand patients. This provides a more thorough understanding of the drug's effectiveness, benefits and the range of possible adverse reactions. These trials typically last several years and can include comparison with existing treatments on the market to show increased benefit. These trials provide the necessary data on which to get approval by the regulatory authorities.
As the drug comes towards, and is launched in the market, continued trials and monitoring is required. Sometimes, adverse reactions can only be picked up when a drug is given to a very large population. Problems can sometimes be dealt with by changes in prescribing practice or through defining particular patient populations. However, it is sometimes necessary to remove a drug from the market (cf. earlier reference to COX-2 inhibitors).
The attrition rates in the early stages of drug discovery are more difficult to quantify as the raw data is not in the public domain. Also, the boundaries between each step vary dramatically between targets, between disease indications and between the varying drug-discovery paradigms of different companies. The definition of success also depends on how high the criteria are set for progression. For example, the problems experienced in clinical trials in the 1990s have led to much more stringent sets of assays and thus higher rates of failure in the research and pre-clinical phases. As a general rule of thumb, the attrition rates in discovery are about the same as in clinical trials – about one in ten. This means that a pharmaceutical enterprise needs to maintain an essentially funnel-shaped pipeline to generate a sustainable business, with larger numbers of projects at the earlier stages. For this to be successful requires some difficult but clear decisions to be made on whether and how to progress the targets from one stage to the next.
The examples include genome sequencing, transcriptomics and proteomics for target identification and validation, protein engineering for biological therapeutics, combinatorial chemistry, molecular modelling as well as structure-based methods. There have been considerable investments in some of the technologies. For example, combinatorial chemistry was a revolutionary technology for synthesising massive numbers of related compounds. The first paper describing synthesis of a single combinatorial library appeared in 199218 and the most recent comprehensive survey of combinatorial library synthesis for 2003 showed 468 new methods.19 The early years of combinatorial chemistry led to massive investment in parallel synthesis and screening methods in the pharmaceutical industry. Very few compounds from this early investment have entered clinical trials as the early methods were flawed. There was insufficient appreciation that the available synthetic methods suitable for such parallel operation would sample only a relatively small chemical space and produce many compounds without the required drug-like properties. In addition, there were many issues in developing robust, reliable synthesis of individual compounds. However, many lessons were learnt and the design of focused libraries, where particular features of templates are elaborated, is now an integral part of most drug-discovery programmes.
There has been some hype associated with the availability and value of structures of therapeutic targets and the ability to use structure and modelling methods to design compounds. At times, some elements in the pharmaceutical industry and, in particular, some start-up companies have been over-optimistic on what the methods can deliver. However, there has been a steady realisation of the power of the methods for the classes of target for which structures can be determined. The evidence for this is that essentially all pharmaceutical companies have some form of modelling group that constructs models of the structure of targets and uses these in discovery and design of new compounds. And an increasing number of small companies have invested in the ability to determine the structure, particularly with X-ray crystallography.
There are three main contributions that structural methods are making to the drug-discovery process – structural biology, structure-based design and structure-based discovery.
Modern structural biology, particularly protein crystallography, is generating the structure for an increasing number of therapeutically important targets (see the chapter by Brown and Flocco). The two main issues limiting the number of structures are the ability to produce sufficient quantities of pure, soluble, functional, homogenous protein for crystallisation trials and the ability of the protein to form regular crystals suitable for diffraction experiments. This combination of limitations often means that a structure is not available for the whole therapeutic target. However, even the structure of individual domains can be sufficient to make a real impact on a discovery project, and provide a context within which to understand the overall function of the protein. The estrogen receptor (see Manas et al.'s chapter) provides one example. Although the receptor consists of a number of domains, the structure of just the ligand-binding domain is sufficient for detailed structure-based design of selective ligands. However, the subtleties of the function of the receptor in the cell can only be understood in terms of the interplay between the different domains that have an influence on receptor activity.
Another example of where drug discovery against just one domain can be successful is the molecular chaperone, Hsp90. This protein is up-regulated in cells under stress and, in complex with a varying repertoire of co-chaperone proteins, helps to stabilise the folding of a large number of proteins important for cell proliferation, growth and function, such as the estrogen receptor and key cell-signalling kinases. The real breakthrough in identifying this target came with the discovery that Hsp90 is the primary target for natural products such as geldanamycin and radicicol, derivatives of which have a viable therapeutic window, such that compounds like 17-AAG are now entering phase 2 clinical trials.14 Hsp90 contains three domains – a C-terminal domain of unknown function that is thought to be important for the formation of the functional dimer, a central domain with large hydrophobic surfaces that can stabilise nascent, unfolded peptides and an N-terminal domain that harbours the ATP-binding site. ATP hydrolysis provides the energy driver for the chaperone function. The natural products, geldanamycin and radicicol, bind to the ATP-binding site on the N-terminal domain, blocking hydrolysis and thereby inhibiting the chaperone action. A number of projects are now embarking on discovery and optimisation of compounds that can selectively inhibit this ATP site.20 However, the detailed mechanism of action has to take into account interactions between the different domains and also the effect of other co-chaperones.21
This type of analysis is now well established and has been used in many drug-discovery projects over the past 15 years. Some of the early disappointments in structure-based design arose because of the difficulty of predicting binding affinities between protein and ligand. Although the predictive power of the calculations is beginning to improve,23 there remain serious challenges in predicting binding affinities. It should be remembered that the equilibrium between target and ligand is governed by the free energy of the complex compared to the free energy of the individual target and ligand. This includes not only the interactions between target and ligand, but also the solvation and entropy of the three different species and the energy of the conformation of the free species. Overall, the equilibrium is a balance between all these different terms and a number of detailed experimental studies have demonstrated that energetically unfavourable changes in the protein, such as conformational strain or disruption of stabilising interactions, can be compensated for by interactions the protein is then able to make with the ligand.24,25 These balances are even more difficult to consider in the cellular context, with the many complicating factors of competing ligands, solvent conditions and partner proteins.
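The balance described above can be made explicit with the standard thermodynamic relations (these are textbook identities, not specific to any method cited in the text):

```latex
\[
\Delta G_{\mathrm{bind}} \;=\; G_{\mathrm{complex}} - G_{\mathrm{protein}} - G_{\mathrm{ligand}}
\;=\; \Delta H - T\Delta S,
\qquad
K_d \;=\; \exp\!\left(\frac{\Delta G_{\mathrm{bind}}}{RT}\right)
\]
```

As a worked example, a compound with Kd = 10 nM corresponds to a binding free energy of RT ln(10⁻⁸) ≈ −10.9 kcal mol⁻¹ at 298 K; the difficulty is that this modest total is the small difference between much larger, opposing enthalpic and entropic terms.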
Virtual screening uses computational docking methods to assess which compounds in a large database will fit into the unliganded structure of the target protein. Current protocols and methods can, with up to 80% success, predict the binding position and orientation of ligands that are known to bind to a protein. However, identifying which ligands bind to a particular binding site is much less successful, with many more false positive hits being identified. The major challenges remain the quality of the scoring functions – if these were more accurate, then the challenge of predicting conformational change in the protein on binding of ligand would also be more tractable.
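To make the scoring-function problem concrete, the toy sketch below scores a rigid ligand pose against a rigid pocket with a single Lennard-Jones term. It is an illustration only – real docking programs combine many more terms (hydrogen bonding, electrostatics, desolvation) and search over ligand conformations – and all coordinates and parameters here are hypothetical.

```python
# A toy illustration of pose scoring, assuming rigid ligand and protein
# given as numpy arrays of atomic coordinates; real scoring functions
# are far more elaborate than this single Lennard-Jones term.
import numpy as np

def lj_score(protein_xyz, ligand_xyz, eps=0.2, rmin=3.5):
    """Sum a 6-12 Lennard-Jones term over all protein-ligand atom pairs."""
    # Pairwise distances between every protein atom and every ligand atom.
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    ratio = (rmin / d) ** 6
    return float(np.sum(eps * (ratio**2 - 2.0 * ratio)))  # minimum at d = rmin

# Hypothetical coordinates: a three-atom pocket and a two-atom ligand pose.
pocket = np.array([[0.0, 0.0, 0.0], [3.5, 0.0, 0.0], [0.0, 3.5, 0.0]])
pose = np.array([[1.8, 1.8, 2.0], [2.5, 2.5, 3.0]])
print(f"score = {lj_score(pocket, pose):.2f}  (more negative = better fit)")
```

In a screening run, a loop over candidate ligands and their generated poses would rank the library by best score; it is the inaccuracy of functions like this that produces the false positives noted above.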
De novo design attempts to use the unliganded structure of the protein to generate novel chemical structures that can bind. There are varying algorithms, most of which depend on identifying initial hot spots of interactions that are then grown into complete ligands. As well as the ubiquitous issue of scoring functions, the major challenge facing these methods is generating chemical structures that are synthetically accessible.
Fragment-based discovery is based on the premise that most ligands that bind strongly to a protein active site can be considered as a number of smaller fragments or functionalities. Fragments are identified by screening a relatively small library of molecules (400–20,000) by X-ray crystallography, NMR spectroscopy or functional assay. The structures of the fragments binding to the protein can be used to design new ligands by adding functionality to the fragment, by merging together or linking various fragments or by grafting features of the fragments onto existing ligands. The main issues are designing libraries of sufficient diversity and the synthetic challenges of fragment evolution.
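In practice, fragment hits are often compared by normalising affinity for molecular size – the ligand-efficiency idea, which is an assumption of this example rather than something discussed in the text. A weak fragment can be a more efficient binder, per atom, than a potent lead, which is what makes it a good starting point for elaboration. A minimal sketch:

```python
# A minimal sketch of fragment triage by binding free energy per heavy
# atom ("ligand efficiency"); the metric and numbers are illustrative
# assumptions, not taken from the chapter.
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def ligand_efficiency(kd_molar, n_heavy_atoms, temp=298.0):
    """Return -dG per heavy atom (kcal/mol) for a measured Kd."""
    dg = R * temp * math.log(kd_molar)   # dG of binding, negative for binders
    return -dg / n_heavy_atoms

# A 1 mM fragment of 12 heavy atoms vs a 10 nM lead of 40 heavy atoms.
print(f"fragment LE = {ligand_efficiency(1e-3, 12):.2f} kcal/mol per atom")
print(f"lead     LE = {ligand_efficiency(1e-8, 40):.2f} kcal/mol per atom")
```

Here the millimolar fragment is the more efficient binder per atom, illustrating why weak fragment hits can still be attractive starting points.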
The above discussion raises a rather semantic question about the use of the words design and discovery. The word design implies some element of prediction – and some of the methods currently used (fragment screening, for example) are clearly not design. In addition, although it is sometimes possible to design modifications to a compound to improve its affinity or selectivity for a target, it is rarely possible to be so predictive in introducing drug-like properties into a molecule. The best that can usually be relied on is that the structure of a compound bound to its target will show where the compound should be elaborated (perhaps with a focused library), from which a compound with the desired drug-like properties (say, cellular penetration or the desired pharmacokinetics) will be found by assay of the resulting library. For these reasons, this book will use the phrase structure-based drug discovery throughout.
The description is chronological and divided into decades. As a starting point for each decade, there is a qualitative summary of the papers in the June issue of the Journal of Medicinal Chemistry (J. Med. Chem.) in 1965, 1975, 1985, 1995 and 2005. This is necessarily a snapshot, but it does give some insight into how far structural methods had affected the papers and thinking of drug-discovery scientists at the time.
The first structures (myoglobin,26 haemoglobin,27 and lysozyme28) laid the foundation of modern protein crystallography. These established that through structure it was possible to understand the mechanism of action of the proteins and relate this to their biological function. The work on haemoglobin extended to the first attempts to provide a structural understanding of genetic disease and Perutz and Lehmann29 mapped the known clinically relevant mutations in haemoglobin to the structure.
The first major developments in molecular graphics came in the mid-1960s when Project MAC at MIT produced the first Multiple Access Computer, a prototype for the development of modern computing. The computer included a high performance oscilloscope on which programs could draw vectors very rapidly, and a closely coupled “trackball” through which the user could interact with the representation on the screen. Using this equipment, Levinthal and his team developed the first molecular graphics system and his article in Scientific American30 remains a classic in the field. In this paper, he described their achievements, and laid the foundations for many of the features that characterise modern-day molecular graphics systems. It was possible to produce a vector representation of the bonds in a molecule and to rotate it in real time. The representation could be of the whole molecule, or a reduced representation such as an alpha carbon backbone. Because the computer held the atomic coordinates of the molecule, it was possible to interrogate the structure, and to use a computational model to perform crude energy calculations on the molecule and its interaction with other molecules. This work inspired various groups to begin building molecular modelling systems.31 Also during this time, scientists such as Hansch laid the foundations for modern predictive cheminformatics methods by establishing that some of the molecular properties of compounds could be computed by considering the individual fragments that make up the molecule (for a fascinating review of the development of ideas on partition coefficients see Leo et al.32).
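The fragment-additivity idea pioneered by Hansch and Leo survives directly in modern cheminformatics toolkits. As a brief illustration, RDKit's Wildman–Crippen logP is a per-atom contribution model in exactly this tradition (RDKit is assumed to be installed; the molecules are arbitrary examples):

```python
# A sketch of fragment-additive property prediction in the Hansch/Leo
# tradition, using RDKit's Wildman-Crippen logP (a sum of per-atom
# contributions); RDKit is assumed to be installed.
from rdkit import Chem
from rdkit.Chem import Crippen

for name, smiles in [("benzene", "c1ccccc1"),
                     ("phenol", "Oc1ccccc1"),
                     ("toluene", "Cc1ccccc1")]:
    mol = Chem.MolFromSmiles(smiles)
    print(f"{name:8s} predicted logP = {Crippen.MolLogP(mol):.2f}")
```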
There was a steady increase in the number of available protein structures during the 1970s. The crystallographer was limited to working on naturally abundant proteins and data collection (in general) used rather slow X-ray diffractometers. There were sufficient structures, however, for a data bank to be required and the Protein Data Bank was established in the late 1970s.34 The depository was run for many years at Brookhaven National Labs and moved to the Research Collaboratory for Structural Bioinformatics (RCSB) during the 1990s (http://www.rcsb.org).35
There are three examples of the use of structure to consider ligand or drug binding that should be highlighted. The first is the studies on dihydrofolate reductase (DHFR) summarised in Matthews et al.36 This is a fascinating paper to read. Although the description of the determination of the structure emphasises just how much the experimental methods of protein crystallography have developed, it does illustrate that many of the ideas of modern structure-based design were well established some 30 years ago. The structure of methotrexate bound to bacterial DHFR allowed quite detailed rationalisation of the differences in binding affinity of related ligands and an understanding of why, although there are sequence variations, the ligand binds tightly to all DHFRs known at that time. This type of structural insight led to structure-based design of new inhibitors.37
The second example is the work of the Wellcome group who explored various aspects of ligand binding to haemoglobin through modelling of the interactions of the ligands with the known structure.38,39 The ideas about molecular interactions generated in this work laid the foundation for Goodford's later development of the GRID program (see the 1980s).
The third example is the design of captopril,40 an inhibitor of the angiotensin-converting enzyme (ACE) and a major drug for hypertension. Although sometimes quoted as one of the first examples of structure-based design, the structure of ACE was not known in the mid-1970s. However, the design was strongly directed by constructing a crude model of the active site, based on the known structure of carboxypeptidase A.
These papers demonstrate that the central paradigm in structure-based design was well established during the 1970s. This paradigm is that the structure of a ligand bound to its target protein can be used to understand the physicochemical interactions underlying molecular recognition and binding affinity and this insight can then be used to design changes to the ligand to improve its properties.
Alongside the slow emergence of design based on the structure of the target, there were important developments in ligand-based modelling. Computational methods incorporating molecular and quantum mechanical treatments of ligand conformation and properties were being explored. This included conformational analysis to predict the 3D conformations of small molecules and the calculation of molecular properties such as hydrophobicity and electrostatic potential. Brute force methods of quantitative structure activity relationships (QSAR) were developed that considered large sets of active and inactive compounds, computed many properties and then attempted to construct a predictive correlation between some algebraic combination of computed properties and activity. Alongside this, the ideas of “virtual” receptor-based modelling emerged, where the properties of active compounds were analysed to construct a 3D pharmacophore of the features required for activity. Exploring and then applying this range of methods required the development of suites of molecular modelling methods. However, only a few, large laboratories had dedicated computing facilities and these provided the focus for the development of a number of software systems that laid the foundation for modern modelling systems.
It is possible to chart the development of the ideas and methods of molecular graphics and modelling systems in two distinct communities – protein crystallography and molecular modelling in support of ligand design. The first developments in protein crystallography were by Alwyn Jones, who developed the program FRODO41,42 (re-formulated and extended in the program O43). Protein crystallographers required powerful molecular graphics facilities to help in determining protein structures for visualisation of large electron density maps and fitting of a molecular model of the protein structure into the density. Once the structure had been determined, graphics was again vital in allowing interactive analysis of the structure to not only describe the folding of the protein, but also to understand the mechanism and thus function of the protein. Important examples were the development of the earliest space-filling representations of molecular structure by Feldman at the NIH44 and the developments of the Langridge group at UCSF.45
Most of these early developments were in the academic community, but there was also considerable interest in the potential of molecular modelling methods in the pharmaceutical industry and many of the large companies spawned their own software development efforts. The reviews by Gund et al.46 and Marshall47 provide an appreciation of the early developments. The success of these encouraged the development of a whole new industry in the 1980s.
However, the 1980s saw many important developments in the scientific disciplines that underpin structure-based drug discovery. Molecular biology and protein chemistry methods were beginning to unravel the biology of many disease processes, identifying new targets and, importantly, providing the over-expression methods with which to produce large quantities of protein for structural study. In protein crystallography, synchrotron radiation not only speeded up data collection but, because of its intensity and focus, allowed usable data to be collected from smaller, poorer crystals. This was complemented by developments in methods for refining structures, initially least-squares refinement48 and, later in the 1980s, the simulated annealing approach of X-PLOR.49,50
There were also important developments in techniques in NMR spectroscopy. Isotopic labelling of protein, instrument and method advances led to multi-dimensional NMR techniques for solving small, soluble protein structures51 (see the chapter by Davis and Hubbard in this book). The larger pharmaceutical companies invested in these methods alongside the traditional use of NMR in analytical chemistry. However, the size limitations of the technique meant there were few therapeutic targets accessible to NMR.
This decade also provided the core of the methods in computational chemistry that support analysis of protein–ligand complexes. Molecular mechanics techniques such as CHARMm52 gained wider application and the computational resources available to most groups increased steadily to allow routine use of energy minimisation and molecular dynamics methods. Of particular note are three papers specifically dealing with protein–ligand interactions. Jencks53 provided a simple but powerful analysis of the contributions made by different parts of a molecule to binding. His analysis established that the first part of a molecule overcomes many of the entropic barriers to binding, giving higher affinity for subsequent additions of functionality. This firmly established the ideas that led to fragment-based discovery in the early 2000s. In a similar vein, Andrews et al.54 analysed the contributions that different functional groups make to binding. Finally, Goodford developed the GRID approach55 that used an empirical energy function to generate a very visual analysis of where different types of functional group could interact with a binding site. This approach had a significant impact on how chemists and molecular modellers viewed protein active sites and the possibility for rational design. An important factor in their application was the availability of affordable computing. At the beginning of the 1980s, the necessary computing and graphics hardware to support structure analysis and molecular modelling cost many hundreds of thousands of dollars. By the end of the decade, graphics workstations such as the Silicon Graphics IRIS meant that essentially every scientist had access to the technology and software.
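Jencks' additivity argument described above is commonly summarised as follows (a schematic rendering, not his exact notation):

```latex
\[
\Delta G_{AB} \;=\; \Delta G_{A}^{\,i} \;+\; \Delta G_{B}^{\,i} \;+\; \Delta G_{s}
\]
```

where ΔG_A^i and ΔG_B^i are the intrinsic binding energies of fragments A and B, and ΔG_s is the connection term, dominated by the entropic cost of bringing a molecule out of solution and fixing it in the site – a cost that is paid only once. Linking two weakly binding fragments can therefore yield an affinity far greater than either fragment alone would suggest, which is the quantitative basis of the fragment-based methods discussed later.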
A development that had a major impact on the way scientists thought about protein structure was the Connolly surface. The molecular surface is a fundamental aspect of a structure as it is through the complementarity of shape and chemistry of the surface that molecules interact with each other. A variety of different representations of surfaces were developed, the most enduring and informative of which is that developed by Connolly.45,56 The molecular surface is defined by the surface in contact with a probe sphere as the sphere “rolls” over the surface of the molecule. Alternatively, the extended solvent-accessible surface can be calculated, in which the surface is traced out by the centre of the probe sphere as it rolls over the molecule. Although the initial graphics devices could only show this as an envelope of dots, it traced a smooth surface that showed where the protein met the solvent. This approach underlies essentially all the surface representations in use today. In addition, there were developments in the treatment of protein electrostatics, and the program GRASP provided a very visual presentation of the electrostatic surfaces of proteins computed using a Poisson–Boltzmann treatment.57 These surface images simplified the representation of protein chemistry and provided important insights into function.
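The solvent-accessible surface described above can be computed numerically by the classic Shrake–Rupley point-sampling scheme. The sketch below is a compact illustration of that scheme with hypothetical coordinates, not the algorithm of any particular program:

```python
# A compact sketch of solvent-accessible surface area (SASA) via
# Shrake-Rupley point sampling; coordinates and radii are hypothetical,
# and n_points trades speed for accuracy.
import numpy as np

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral method)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * i / n)            # polar angle
    theta = np.pi * (1 + 5**0.5) * i          # azimuth
    return np.c_[np.sin(phi) * np.cos(theta),
                 np.sin(phi) * np.sin(theta),
                 np.cos(phi)]

def sasa(xyz, radii, probe=1.4, n_points=960):
    """Total solvent-accessible surface area (A^2) of a set of atoms."""
    pts = sphere_points(n_points)
    area = 0.0
    for i, (ci, ri) in enumerate(zip(xyz, radii)):
        r = ri + probe                         # probe-expanded radius
        surface = ci + r * pts                 # test points on atom i
        buried = np.zeros(n_points, dtype=bool)
        for j, (cj, rj) in enumerate(zip(xyz, radii)):
            if j != i:                         # point buried inside atom j?
                buried |= np.linalg.norm(surface - cj, axis=1) < rj + probe
        area += 4 * np.pi * r**2 * (~buried).sum() / n_points
    return area

# Two overlapping carbon-like atoms, van der Waals radius 1.7 A.
print(f"SASA = {sasa(np.array([[0., 0., 0.], [2., 0., 0.]]), [1.7, 1.7]):.1f} A^2")
```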
A number of structure-based design groups began to emerge in the pharmaceutical companies. One example is the group at Merck. The paper by Boger et al.111 describes their work on the design of renin inhibitors and summarises many aspects of the discipline at the time. They used homology modelling of the protein structure, and manual docking and inspection of ligands to design peptide mimetics that would find application in many protease inhibitor projects in later years. A second example is also from Merck, where structures of carbonic anhydrase were used to successfully design more potent inhibitors that are now established as treatments for glaucoma.58 This work has been cited as one of the earliest examples of structure-based design that has resulted in a drug on the market.
Towards the end of the decade, various scientists within larger companies recognised the power of the structure-based rational approach and established new startup companies such as Vertex and Agouron, where the resources and organisation could be geared to structure-based discovery.
In addition to the continuing increase in the number of targets for which structures were available, the major change during the 1990s was that much of the equipment for X-ray structure determination and the computing and graphics equipment required for molecular modelling was available in most well-found laboratories in both academia and industry.
At the beginning of the 1990s, there was intense interest in de novo design – using the structure of a protein for ab initio generation of new ligands. The binding site of the protein was mapped with methods such as GRID55 or MCSS59 and then a variety of building methods proposed for generating new ligands, such as HOOK.60
There were two important developments for computational methods at this time. The first was the work by Böhm to analyse the growing body of experimental structures to develop the LUDI empirical scoring function for prediction of protein–ligand affinity. The second was the development of virtual screening or molecular docking methods. The pioneer in this area was Kuntz61 and a series of other programs, such as GOLD62 and FlexX,63 emerged (for a review of virtual screening, see Barril et al.64 and the chapter by Barril in this book).
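Schematically, empirical scoring functions of the LUDI type estimate affinity as a sum of weighted contributions from counts of favourable contacts. The form below is a paraphrase of Böhm's published function, with the ΔG coefficients fitted against experimental affinities:

```latex
\[
\Delta G_{\mathrm{bind}} \;=\; \Delta G_{0}
\;+\; \Delta G_{\mathrm{hb}} \sum_{\text{H-bonds}} f(\Delta R, \Delta\alpha)
\;+\; \Delta G_{\mathrm{ionic}} \sum_{\text{ionic}} f(\Delta R, \Delta\alpha)
\;+\; \Delta G_{\mathrm{lipo}}\, A_{\mathrm{lipo}}
\;+\; \Delta G_{\mathrm{rot}}\, N_{\mathrm{rot}}
\]
```

Here f(ΔR, Δα) penalises deviations from ideal hydrogen-bond distance and angle, A_lipo is the lipophilic contact area and N_rot counts rotatable bonds frozen on binding; it is the crudeness of such terms that sets the accuracy plateau discussed later in the chapter.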
For X-ray crystallography the major developments were in the speed of structure determination. Synchrotron radiation, coupled to new, faster instrumentation, was capable of rapid data collection. A particularly significant development was cryo-crystallography,65 where flash-freezing and maintaining crystals under a stream of dry nitrogen at liquid-nitrogen temperatures massively reduced the problems of crystal damage. Alongside this, there were continued improvements in methods for structure refinement66 and in semi-automated methods for fitting models of structure to the resulting electron density.67,68
The important development in the NMR field was the work of the Abbott group led by Fesik, who developed the SAR by NMR approach69 and applied it quite dramatically to develop potent, novel leads against a number of targets.70 This approach is described in more detail in the chapter by Davis and Hubbard and exploits the ability of NMR to report selectively on binding events to identify sets of small ligands that bind to the protein and that when linked together produce high affinity ligands. This approach resuscitated interest in protein NMR spectroscopy in drug discovery, but most companies found that there were few targets with appropriate multi-pocket sites and that there were too many challenges in designing appropriate chemistry to link fragments together and maintain binding affinity.
Alongside all this methodology development, there were two high-profile drug-discovery projects that validated the structure-based approach and led to increased investment in the area. The first was work by the groups of von Itzstein and Colman who used the structure of the enzyme sialidase to design potent inhibitors against the influenza virus that became the drug, Relenza71 (see the chapter by Colman in this book). This is a classic of structure-based drug discovery – the structure of a weak substrate mimic bound to the protein was used to guide lead optimisation to produce a compound with improved affinity and selectivity that may also minimise the appearance of drug resistance. The second was the many efforts in developing generations of HIV protease inhibitors. The first generation of drugs22 included the use of structures of protein–ligand complexes to identify where changes could be made on the ligand to improve bioavailability. A paper by Greer et al.72 summarises how hits were identified by screening of existing aspartyl protease libraries and the structure of these compounds bound to the enzyme was used to guide combining of features of different compounds, adding solubilising groups and making changes to affect PK properties. More recent developments have made wider use of structure-based methods, such as Salituro et al., 1998.114 Developments in this class of inhibitors are summarised in Randolph and DeGoey73 and Chrusciel and Strohbach.74
There are two other major developments of the 1990s that should be summarised – the development of fragment screening methods and the evolution of the ideas of drug and lead-likeness.
The ideas underlying fragment-based discovery can be traced back over many decades. As mentioned above, work by Andrews54 and by Jencks53 established the idea that the binding affinity of a compound arises from contributions made by different parts of the molecule. This led to the idea of mapping the binding surface of a receptor either computationally (Böhm112) or experimentally.59 The NMR methods have been mentioned above, but crystallographers also saw the potential. Work by Ringe75 and others76 characterised how different solvent fragments bound to protein active sites. Nienaber et al.77 took the approach a step further, soaking crystals with mixtures of small molecular fragments as a starting point for drug design. These ideas have been taken forward by many other groups to provide a basis for structure-based discovery78,79 described in more detail in the chapter by Hann et al. in this book.
Analysis of the successes and failures of drug discovery in the 1990s has led to some important concepts for modern and future rational drug discovery. The analysis by Lipinski et al.80 has had a profound effect on rational approaches to drug discovery by identifying some relatively simple guidelines on the properties of compounds that are orally bioavailable. This idea has been further refined81 and extended to identify the properties that allow lead compounds to be successfully optimised into drugs – lead-likeness and ligand complexity.82,83
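The Lipinski guidelines are simple enough to apply as a computational filter. A minimal sketch using RDKit (assumed installed) follows; in practice the counts are treated as soft alerts rather than hard cut-offs:

```python
# A sketch of the Lipinski "rule of five" guidelines as a filter, using
# RDKit (assumed installed); thresholds are the published ones, but
# projects typically treat violations as alerts, not rejections.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def rule_of_five_violations(smiles):
    """Count rule-of-five violations for a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return sum([
        Descriptors.MolWt(mol) > 500,        # molecular weight
        Descriptors.MolLogP(mol) > 5,        # calculated logP
        Lipinski.NumHDonors(mol) > 5,        # hydrogen-bond donors
        Lipinski.NumHAcceptors(mol) > 10,    # hydrogen-bond acceptors
    ])

# Aspirin should pass with no violations.
print(rule_of_five_violations("CC(=O)Oc1ccccc1C(=O)O"))
```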
Over the past 5 years, the increased ubiquity of structure-based methods has been built on the ideas discussed above and the increased evidence of how structural insights can not only speed up, but improve the success of drug-discovery efforts.
Along with the continuing refinements and improvements in these methods, the principal advance in the past 5 years has been the availability of an increasing number of structures of therapeutic targets. Although there remain considerable challenges, the massive investments in structural genomics are slowly providing improved methods and protocols for generating protein structures for an increased number of proteins. A potentially valuable development for drug discovery is the recently established Structural Genomics Consortium, which aims to generate structures for many hundreds of therapeutically relevant human proteins and place them in the public domain (see http://www.sgc.utoronto.ca).
The complete genome sequences are available for human and for many major pathogens, and many new targets are being identified and validated. Where there is no structure available, there has been considerable interest in using homology models to provide a starting point for structure-based discovery. The review by Hillisch et al.84 summarises the current state of the field.
An alternative strategy is to generate or identify peptide fragments that can disrupt a protein–protein interaction. Structures of the protein–peptide complex can then be used to derive peptidomimetic compounds. Recent successes include the discovery of compounds against MDM289 and XIAP.90
These developments all contribute additional methods that can be used as early filters or structural alerts to guide the design of new compounds. However, the mechanisms contributing to ADMET are clearly very complex and multi-factorial, so it will be a long time before they can replace in vivo experiments.
The following is a summary of the efforts for some of these other target classes.
The determination of the structure of the therapeutic target, ideally in complex with as many different ligand starting points as possible, is clearly central to structure-based discovery. Many major classes of therapeutic target, such as the GPCRs and ion channels, are still inaccessible to routine structure determination. In addition, many aspects of mammalian biology are governed by the transient assembly of large, multi-protein, multi-domain complexes, and these remain a formidable challenge for structural study. These prizes remain available for the ambitious structural biologist.
Our ability to predict the conformational and energetic changes that accompany binding of a ligand to a protein target remains relatively weak. The methods that can be practically applied have remained essentially on a plateau since the development of empirical scoring methods in the early 1990s. Recent advances in techniques such as MM-PBSA108 may offer the next level of improvement (see Barril and Soliva chapter). This ability to accurately determine interaction energy is the key for the next step of being able to model protein conformational change on ligand binding – a phenomenon which currently limits success (and confidence) in detailed structure-based design.
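For reference, the MM-PBSA estimate mentioned above combines ensemble-averaged molecular-mechanics and continuum-solvation terms; the following is the standard formulation, not a detail taken from ref. 108:

```latex
\[
\Delta G_{\mathrm{bind}} \;\approx\;
\langle \Delta E_{\mathrm{MM}} \rangle
\;+\; \langle \Delta G_{\mathrm{PB}} \rangle
\;+\; \langle \Delta G_{\mathrm{SA}} \rangle
\;-\; T\,\langle \Delta S \rangle
\]
```

where ΔE_MM is the molecular-mechanics interaction energy, ΔG_PB the polar solvation term from a Poisson–Boltzmann calculation, ΔG_SA the nonpolar term estimated from surface area and TΔS the solute entropy change, all averaged over a molecular dynamics ensemble.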
Finally, a major challenge is how to bring this wealth of structural, computational and assay data together to design new, improved compounds that can be readily synthesised. There are currently few computational/informatics tools available to guide this process, and successful design relies crucially on effective collaboration and understanding between the different disciplines. It is hoped that the descriptions of the methods and selected applications provided in this book will show how important this integration of the various methods is, and emphasise how structure can provide the insight and confidence to inspire and enable successful design.
Footnotes
† This is Chapter 1 of the forthcoming book Structure-Based Drug Discovery, which forms part of the RSC Biomolecular Sciences series. Structure-Based Drug Discovery is due to be published in early 2006.
‡ The IC50 represents the concentration of the drug that is required to achieve 50% reduction in activity of the target, usually in vitro. A related term is EC50, which represents the plasma concentration required for obtaining 50% of the maximum effect in vivo.
§ Ki is the inhibition constant for a reaction. The precise definition of these constants will depend on the chemical nature of the assay. When comparing values, it is important to know the precise details of the assay – variations in pH, buffer composition, ionic strength, temperature, protein activation state, competitor ligands, etc., can all have a real effect.
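For the common case of a competitive enzyme inhibitor, the IC50 of footnote ‡ and the Ki defined here are related by the Cheng–Prusoff equation (a standard relation, added here for reference), where [S] is the substrate concentration used in the assay and Km the Michaelis constant:

```latex
\[
K_i \;=\; \frac{\mathrm{IC}_{50}}{1 + [S]/K_m}
\]
```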
¶ There are a number of phrases and acronyms for these important drug-like properties. DMPK is drug metabolism and pharmacokinetics (PK). PK is the characterisation of what the body does to a drug. Conventionally, this is analysed in terms of four main processes – Absorption, Distribution, Metabolism and Excretion or ADME. This is sometimes extended to include Toxicity (ADMET). All of these processes are due to complex, interdependent factors within the body and although detailed mechanistic (and increasingly structural) information is emerging about individual components, empirically derived models are the only route to prediction. The main challenge for these models is the quantity and consistency of experimental data and the transferability of such models from one compound series to another. As many of the processes are due to interaction with and activities of many different proteins, it is often the case that models are constructed within a compound series, but will not transfer. Although some use is made of these predictive models, in most cases, experimental measurements need to be done. Most can be configured as in vitro assays.
|| The phrase, a nanomolar inhibitor, is frequently used in the literature. Usually, this refers to the dissociation constant (Kd) for the in vitro equilibrium between target–ligand complex and free target and unbound ligand. Usually (but not always), a higher affinity of a compound for a particular target will increase its selectivity over other proteins in the system.
** Pharmacodynamics (PD) is what the drug does to the body. In many drug discovery programmes, a key part of the early stages of the project is to establish pharmacodynamic markers that can be used to make the link between binding of compound to the target and the effect seen on the cell – i.e. being sure that the activity is from interaction with that particular target. As lead optimisation progresses, it is the cellular (and eventually the in vivo) activity that guides the medicinal chemistry, so it is essential to ensure that the activity being measured is due to the compound binding to the target that is being used to inform the design.