Computational ligand-based rational design: role of conformational sampling and force fields in model development

Jihyun Shim and Alexander D. MacKerell, Jr. *
HSFII, 20 Penn St., Baltimore, MD 21201, USA. E-mail: amackere@rx.umaryland.edu; Fax: +410 706 5017; Tel: +410 706 7442

Received 10th February 2011 , Accepted 17th March 2011

First published on 18th April 2011


Abstract

A significant number of drug discovery efforts are based on natural products or high throughput screens from which compounds showing potential therapeutic effects are identified without knowledge of the target molecule or its 3D structure. In such cases computational ligand-based drug design (LBDD) can accelerate the drug discovery processes. LBDD is a general approach to elucidate the relationship of a compound's structure and physicochemical attributes to its biological activity. The resulting structure–activity relationship (SAR) may then act as the basis for the prediction of compounds with improved biological attributes. LBDD methods range from pharmacophore models identifying essential features of ligands responsible for their activity, quantitative structure–activity relationships (QSAR) yielding quantitative estimates of activities based on physiochemical properties, and to similarity searching, which explores compounds with similar properties as well as various combinations of the above. A number of recent LBDD approaches involve the use of multiple conformations of the ligands being studied. One of the basic components to generate multiple conformations in LBDD is molecular mechanics (MM), which apply an empirical energy function to relate conformation to energies and forces. The collection of conformations for ligands is then combined with functional data using methods ranging from regression analysis to neural networks, from which the SAR is determined. Accordingly, for effective application of LBDD for SAR determinations it is important that the compounds be accurately modelled such that the appropriate range of conformations accessible to the ligands is identified. Such accurate modelling is largely based on use of the appropriate empirical force field for the molecules being investigated and the approaches used to generate the conformations. The present chapter includes a brief overview of currently used SAR methods in LBDD followed by a more detailed presentation of issues and limitations associated with empirical energy functions and conformational sampling methods.


1. Introduction

When ligands and data on the biological activities of those ligands are the only information available for drug development, computer-aided ligand based drug design (LBDD)1–6 is an effective method to extend the knowledge of the known ligands to design compounds with improved biological activity. The importance of LBDD is emphasized by more than 50% of current FDA-approved drugs targeting membrane proteins such as G protein coupled receptors (GPCRs), nuclear receptors, and transporters,7 for which three-dimensional (3D) structures are often not available, a necessary prerequisite for target based drug design approaches.8–10 Considering the difficulties in determining 3D structures of membrane-associated proteins, LBDD methodologies are anticipated to continue to have a significant impact on drug development into the foreseeable future.11,12

Drugs typically exert their pharmacological effects by specific interactions with their target proteins. Such unique interactions have been understood as “lock-and-key”,13 “induced-fit”,14 “conformational selection” or “population shift” hypotheses,15–19 which are based on the inherent chemical structure of molecules, their dynamic conformational properties and how those two influence the receptor. Therefore, identifying any causation or correlation between structures and activities, referred to as a structure–activity relationship (SAR),20–22 can be of utility for ligand design. LBDD based SAR identifies similarities and/or differences in structural or physicochemical properties of compounds and relates them to activity, including efficacy (eg. activation or stimulation of receptors, Vmax of enzymes), affinity (eg. Ki), selectivity (eg. Ki, isoform1/Ki, isoform 2), pharmacokinetics (ADME),23,24drug-drug interactions, or any biological properties of interest. Various descriptors of the ligands are related to biological activities through various statistical methods, for instance, regression, classification, dimension reduction, variable selection, etc. from which important features of the ligands responsible for activity are identified and used to develop new leads or to optimize known ligands.

Three major categories of LBDD are quantitative structure activity relationship (QSAR),25–27 pharmacophore modeling,28–32 and similarity searching.33–36 Over several decades, statistics, computational algorithms, and descriptors comprising the three categories and their pipelining have led to significant improvements both in efficiency and accuracy. Programs can deal with 100–1000s of molecules to build models or search molecular properties against databases of millions compounds in a short period of time. Overall improvements have been achieved by sophisticated data mining techniques and by more accurate mathematical descriptions of molecules through molecular mechanics (MM)37 and quantum mechanics (QM) methods.38

Recent advances in statistical, algorithmic and chemoinformatics in relation to LBDD have been discussed elsewhere in depth.39–42 This chapter will briefly overview LBDD followed by a detailed presentation of new developments in the areas of conformational sampling and force fields (FF) with respect to LBDD.

2. Basic components of computer-aided LBDD

2.1. Representation of molecules

Molecules may be described in different ways ranging from one to three dimensional (3D) and higher methods. For simple counting of molecular constitutions or fragments in 1D, one can use line notation such as SMILES (Simplified molecular input line entry specification)43 and SLN (SYBYL line notation)44 or chemical fingerprints, such as the MACCS representation.45 1D representations are used for fast lookup and comparison and in some cases do not yield a unique description of the molecules, as in the MACCS fingerprints. When molecules are represented as a graph,46 atoms are nodes and bonds are edges connecting the nodes, yielding a 2D description of a molecule. Information on bond or atom types, atom size, or stereochemistry and so on can be stored in the form of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
matrix
and readily accessed. Along with the graph representation, a simple connection table may be used to calculate 2D molecular properties, for example, molecular weight, molar refractivity, number of rotatable bonds, branching, number of hydrogen bond acceptors and donors, and sum of atomic polarizabilities. Such descriptors are widely used in QSAR analysis. For a more detailed and realistic representation of molecules, a 3D representation, typically in the form of atomic Cartesian coordinates, is required. Such 3D descriptions also allow for the calculations of various descriptors and, more importantly, can represent the bio-active conformations of a molecule. This is particularly important when comparing compounds with different chemical structures that may show similar biological properties by having similar 3D placement of biologically important functional groups. 3D descriptors, such as the spatial relationship between functional groups, may be calculated using semi-empirical or ab initioQM methods for small size ligands (number of atoms ≤ 100) and MM for most ligands. With growing computational power, the use of QM & MM is significantly increasing in the field of LBDD.1,38 Beyond 3D methods are 4D and higher representations. For example, the different possible conformations of the 3D structure of a molecule may be considered a 4D representation. The remainder of section 2 will present different nD representations of molecules and their utilization in LBDD.

2.2. 2D-QSAR

A large number of 1D or 2D molecular descriptors have been developed.47,48 Most software packages that include a QSAR module calculate a range of descriptors such as physicochemical, electronic, topological, and shape properties. Lipinski's rule of five49 is a classical example of a straightforward application of QSAR where bioavailability is related to descriptors including COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
octanol
/water partition coefficient (logP), molecular weight, number of hydrogen bond donors and acceptors, and number of rotatable bonds.

Development of SAR models often requires pre-processing of the descriptors prior to model development. Values are often normalized such that coefficients obtained from fitted models represent the significance of the individual descriptors. Importantly, given the large number of possible 1D/2D descriptors it is necessary that the number of descriptors used during model development be limited by selection methodologies.50,51 In simple terms highly correlated descriptors are typically removed from model development. Descriptor selection is then often linked to model development itself. Those descriptors most predictive of a target property are selected by iterative analysis (stepwise multiple linear regression (MLR),52 replacement method53) or by learning algorithms (Genetic algorithm,54 adaptive fuzzy partition algorithm,55 Gaussian processes,56 or Genetic function approximation (GFA)57). Correlated or redundant descriptors may also be eliminated by partial least square (PLS)58–61 or principle component analysis (PCA).62,63

Given a suitable training set (i.e. set of compounds with known biological activities) and descriptors, one applies statistical methods according to the characteristics of the data set. When linearity is present MLR and PLS are good choices. Nonlinearity64,65 occurs when one handles a large number of non-homologous data sets, for example in pharmacokinetic (PK) studies where multiple biological phenomena such as absorption and metabolism can impact the biological data, or when activities are influenced by many factors such as receptor dimerization, existence of receptor isoforms, and conformational changes. Selection (or design) of the experimental data for model development can minimize nonlinearity, thereby reducing possible ambiguities in the developed QSAR but users need to keep in mind the inherent complexity of biological phenomena. Selection of the appropriate statistical methods also depends on whether the goal of the study is interpretation or predictability. Classical methods such as MLR produce explicit physical meaning but predictability is not as good as using modern statistical tools that improve predictability. However, interpretability is often compromised with improved predictability. While the MLR method, which maximizes interpretability, was the most used method in 2008, it was followed by PLS & support vector machine (SVM) approaches that yield improved predictability. Newly developed statistical methods39 include Gene Expression Programming (GEP),66 Project Pursuit Regression (PPR),67 Local Lazy Regression (LLR)68 while recent variations in QSAR approaches are hologram-QSAR,69 auto-QSAR,70 and inverse-QSAR71 among others. Hologram-QSAR69 partitions molecules into smaller fragments and uses size, length, and as well as additional information on those fragments as descriptors. It is usually combined with PLS to derive a QSAR. Auto-QSAR70 is an automated QSAR where the best descriptors, the best statistical methods, and validations are chosen for given set of molecules and updated as the number of molecules in training set increases. In Inverse-QSAR71 after a QSAR model is built, distributions of descriptors yielding optimal activity are estimated and structures are generated or searched that match those distributions.

As a last step in QSAR model development, the models require validation.72,73 Approaches to do this include cross-validation, y-randomization, or external test set. It is generally perceived that leave-one-out or leave-n-out cross-validations do not necessarily indicate predictability directly and external validation using compounds not included in model development is recommended to verify predictability of the developed models. An overview of statistical methods used in QSAR analysis is given in Table S1 of the supporting information.

2.3. nD-QSAR (3 ≤ n ≤ 7)

QSAR methods based on 3D descriptors (Table 1) such as molecular volume, surface area, ΔGsolvation, dipole moments, HOMO, and LUMO, depend on the chemical and spatial features of molecules. Alternatively, 3D-QSAR74–76 may be based on the molecular interaction field mapped onto a 3D grid surrounding the molecules of interest. The descriptors are the magnitudes of the fields at the grid points, an approach used in CoMFA (comparative molecular field analysis)77 and CoMSIA (comparative molecular similarity indices).78 For example, in CoMFA polar or hydrophobic probes are placed on grid points and non-bonded interaction energies with the ligands are calculated, with the resulting values used as descriptors for each molecule, such that each molecule has, in essence, a number of descriptors that correspond to the number of grid points. The grid point values are subjected to statistical methods such as PLS or PCA and related to biological activities. Notably, these approaches require alignment of the different ligands being studied. When flexibility is added to the shape information by using multiple conformations, it is classified as 4D-QSAR.79,80 Although 3D-QSAR can use multiple conformers it means multiple model evaluations, with the input into each model being one static conformer for each ligand in the training set. 4D-QSAR, in one embodiment, overcomes this limitation by using grid cell occupancy descriptors calculated based on multiple conformers. 5D-QSAR81,82 attempts to construct a pseudo-receptor based on ligand information in combination with a GA to vary the grid point locations to produce a favorable induced-fit state. Beyond this, 6D-QSAR83 incorporates solvation energy terms. Finally, the inclusion of 3D structure of the target from X-ray crystallography or NMR in the models represents the highest dimension, 7D, applied to date, though the approach is no longer formally LBDD.
Table 1 Overview of nD-QSAR approaches (3 ≤ n ≤ 7)
Method Description
CoMFA Probes placed on grid points in the 3D field around a molecule experience an interaction energy with the ligands that defines the molecular shape and electrostatic properties in the surrounding environment.
CoMSIA It expands CoMFA by including hydrophobic and hydrogen bonding contributions and calculates how these contributions are similar between molecules.
GRIND It eliminates alignment-dependency by using distances between 3D grid points. Highly relevant regions among a set of molecules are selected as nodes and the intensity of molecular interaction field at those nodes are used as descriptors.
VolSurf Information on 3D grid voxels (shape, electrostatic, volume) are compressed into 2D numerical descriptors by image analysis tools.
4D-QSAR Multiple conformations in a grid box generate the occupancies at grid points, with those occupancies used as the descriptors.
5D-QSAR Multiple hypothetical binding pockets are generated around ligands based on a 3D grid and the receptor models are evolved by GA with the most favorable binding pocket model evaluated by relative free energy of ligand binding.
6D-QSAR Includes optimization of structures in aqueous solution and calculates solvation energy and charges by semi-empirical QM method, AMSOL.72 Ligands' arrangement in pseudo-binding pocket is determined by MC simulation.


Of the methods in Table 1, CoMFA and CoMSIA are the most widely used. Their main limitations are their dependency and sensitivity to conformations and alignments of the molecules under study. Different occupancies by different conformations or changes in molecular alignments can cause different interaction fields yielding different QSAR models. To overcome the limitations, alignment-independent 3D-QSAR was developed by transforming 3D-grid data into 2D descriptors such as GRIND (Grid independent descriptors)84,85 and VolSurf.84 The approaches are listed and summarized in Table 1.

2.4. Pharmacophore modelling

The IUPAC definition86 of a pharmacophore is “the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interaction with a specific biological target structure and to trigger (or block) its biological response.” Pharmacophore modelling is closely related to 3D or 4D-QSAR and commonly used pharmacophoric keys include hydrogen bond donors and acceptors, ionizable groups, aromatic rings, aliphatic hydrophobic groups, among others. In many cases superimposed compounds based on pharmacophore models are the starting point for 3D-QSAR analysis. Alternatively, pharmacophore features may be identified by building 3D-QSAR iteratively with different conformations or alignment attempted to increase activity prediction.28 As with 3D-QSAR methods, critical steps in pharmacophore modelling are selection of bio-active conformations and structure alignments. Conformations can be pre-calculated and saved in a database or be generated on-the-fly during alignments. Various sampling methods used to generate multiple conformations will be discussed below. Using a collection of pharmacophoric keys or points (i.e. functional groups that may contribute to biological activity) and a conformational ensemble which is expected to include the active conformation, molecular alignment (or superposition)1,28,31,32,87 is carried out by mapping fragments of the compounds with a target function being minimized. In the clique detection algorithm88 distances between features are stored in a matrix and the differences between two matrices become the target function. Another way is using GA,89 where a chromosome describes a molecule and genes encoded in it represent the ligand fragments to be matched to the features of the reference compound. The target function or fitness function in GA can include a range of information such as the similarities between features and volumes of aligned structures. The GA is often extended for on-the-fly conformational sampling by including geometric information of a molecule into the chromosomes. FlexS algorithm90 is also used in pharmacophore feature mapping. It decomposes a molecule and grows it on top of the reference compound beginning from an anchor fragment. In all cases, it is important to select a reference structure that has high experimental activity, a known 3D binding conformation or a favorable docking score to facilitate interpretation of the obtained model. Details of methods used by the major pharmacophore modelling programs and recent research on alignment methods were reviewed in Leach et al.31 and Lemmen and Lengauer.87 Such alignment methods are also of general utility for pharmacophore-based similarity searching.

2.5. Ligand based similarity searching

Similarity searching is an effective, computationally accessible method to identify compounds with qualities similar to that of an active, lead compound. For example, if the number of ligands for which biological activity is known are too few to build a QSAR model, similarity searching may be effective. Similarity searching can be based on a number of features including chemical fingerprints, physiochemical properties as well as 2D and 3D features selected by QSAR or pharmacophore models. If compounds structurally similar to active compounds are desired, searches based on chemical fingerprints are appropriate. However, if the goal of the study is the identification of ligands that have a new scaffold/chemical structure but maintain the desired biological activity (i.e. scaffold hopping91), searching based on physiochemical properties may be of utility. The most widely used descriptors in similarity searching are chemical fingerprints or large numbers of physiochemical properties. Fingerprints can have diverse properties and combinatorial or compressed fingerprints are emerging and efforts are being made to improve the fingerprint representations. To quantify the extent of similarity between compounds, different similarity measures are used alone or in combinations; these include the Tanimoto, Cosine, Hamming, Russel-Rao, and Forbes indices. Finally, it should be noted that it is useful to perform successive searches using the nearest neighbors of a query compound. A number of software packages, including MOE (Molecular Operating Environment, Chemical Computing Group92) and Discovery Suite (Acclerys Inc93) allow for similarity searching. Public small molecule databases such as PubChem94 ZINC95 and ChEMBL96 or open source software97 using those databases provide similarity searching and clustering tools. Notable are clustering techniques98 where output structures from a similarity search are further grouped into subsets to reduce redundancy and to check diversity in compounds selected, for example, for a target-based database screen.99 Results of clustering vary based on classification algorithms, descriptors, and similarity measures100–102 and there is no gold standard to obtain the best clustering. Therefore it is desirable to perform clustering with a combination of methods, descriptors, and similarity coefficients followed by manual evaluation of the results to achieve the desired outcome.

3. Conformational sampling

For the 3D and higher order methods it is essential that the appropriate, biologically relevant conformations be identified. Considering that drug-like molecules can have 10 or more rotatable bonds and each such bond may have 3 accessible rotamers a compound may have 310 conformations that must be considered. The need to access a large number of these conformations is further emphasized by studies showing that bioactive conformations of compounds in X-ray crystal structures of ligand-protein complexes can have energies 15–20 kcal mol−1 higher than the global minimum.103–106 While the value of 310 is likely an overestimation due to steric clashes, it is evident that a major concern in modern LBDD methods is securing bioactive conformations given that model assessment includes multiple conformations in a large number of studies.104,105,107–111 Furthermore, taking all conformations into account during model development can account for the dynamic nature of molecules; 4D-QSAR79,80 and the Conformationally sampled pharmacophore (CSP)112–114 are representative methods that use this information.

Various methods are employed to generate multiple conformations of a ligand. Systematic search approaches115–117 formally perform an exhaustive sampling of conformational space, thereby covering the whole energy surface. However, as the degrees of freedom increases the number of possible conformations becomes enormous and often includes non-physical structures. Systematic search procedures therefore often limit the number of conformations accessed and select conformations within a user-defined energy difference range. For example, MOE supports a maximum of up to 10000 conformers and each one is subjected to trial model buildup. An alternative to systematic sampling is Monte Carlo based approaches118–120 where random changes in structures (i.e. trial moves) are attempted by rotation about a dihedral angle or other geometric change with the new conformations associated with the trial moves accepted or rejected according to the Metropolis criterion.119,120 In the Metropolis method a conformation with an energy, ΔE, lower than that of the previous conformation is accepted while conformations with higher, less favorable energies are kept based on the acceptance probability, p = e−ΔE/kT where k is the Boltzmann constant, T is the temperature and p is compared to a random number. If p is greater than the random number the conformation is accepted, allowing higher energy conformations to be sampled. In this method energy barriers are easily overcome by increasing the effective temperature but random elements still exist during sampling leading to inefficiencies due to similar conformations being accessed. The Poling method121 adds a penalty function (poling function) to the energy during the conformational search that is inversely proportional to the root mean square distance between conformations so sampling of similar conformation is avoided while accessing new conformation is maximized. With the same goal as the Poling method, Tabu search122 keeps a record of previous sampled states thereby maximizing the exploration of previously unsampled conformations. When GA123,124 is applied for conformational sampling, each chromosome is a conformer and contains genes corresponding to structural degrees of freedom in the molecule. Chromosomes undergo mutations and crossover resulting in the sampling of diverse conformations in the descendant conformations. Molecular dynamics (MD) simulations sample conformations deterministically according to Newton's equations of motion and overcome trapping in local minima due to the inclusion of kinetic energy in the system, as described below. All of the methods mentioned in this paragraph have advantages and disadvantages. For small molecules systematic search algorithms in combination with an accurate force field, as discussed below, can assure that all relevant conformations are taken into account. With larger molecules, exhaustive sampling of all accessible conformations is not feasible, MC or MD methods allow for extensive sampling of accessible conformations, though care must be taken to assure that all the relevant conformations are being sampled. One outstanding advantage of both MD and MC methods is that a variety of methods that allow for detailed representation of the biological environment of a molecule by, for example, the explicit treatment of waters and ions have been developed for these approaches. Given the wide use of MD methods for conformational sampling as well as for studies of the dynamics of molecules ranging from small ligands to large macromolecular complexes containing 1 million or more atoms, the remainder of the discussion of conformational sampling will focus on MD based approaches.125–129

MD simulations126,130,131 are based on Newton's equations of motions. The second law F = ma states that from position ri(t), velocity vi(t), and mass mi for an atom i at time t, force Fi(t) can be calculated. In MD simulations, forces are usually obtained from analytical derivatives of the potential energy function and integration methods are used to obtain new positions ri(t + δt) and vi(t +δt) from the previous states, ri(t), vi(t), and ai(t). For example, the original Verlet integrator132 uses a Taylor series expansion of position. Summing eqn (1a) and 1b yields 1c which determines the new position. This integration minimizes memory requirements as it is not necessary to store velocities, although they can readily be calculated using eqn (1d) if required.

 
ugraphic, filename = c1md00044f-t1.gif(Equation 1a)
 
ugraphic, filename = c1md00044f-t2.gif(Equation 1b)
 
r(t + δt) = 2r(t) − r(tδt) + a(t)δt2(Equation 1c)
 
ugraphic, filename = c1md00044f-t3.gif(Equation 1d)

Leapfrog Verlet integration133,134 uses an expansion of positions and velocities to the second order and an interval 1/2δt instead of δt. Subtraction of eqn (2b) from 2a yields 2c. New positions may then be obtained by substituting v(t) from rearrangement of 2a into v(t) of 1a and truncating the expansion at the velocity term yielding eqn (2d).

 
ugraphic, filename = c1md00044f-t4.gif(Equation 2a)
 
ugraphic, filename = c1md00044f-t5.gif(Equation 2b)
 
ugraphic, filename = c1md00044f-t6.gif(Equation 2c)
 
ugraphic, filename = c1md00044f-t7.gif(Equation 2d)

Extended integration methods have been developed to enhance accuracy and provide special features for the simulation system of interest.135 For example, for simulations of aqueous solutions it may be desirable to reproduce constant pressure, temperature, or volume in accord with the specific ensemble being targeted. For example, simulations in the constant pressure, temperature and number of particles ensemble (NPT) may be used to calculate Gibbs free energies while a constant volume, temperature and number of particles ensemble (NVT) yields Helmholtz free energies. However, such simulations require the appropriate boundary conditions, such as periodic boundaries.135 While these are necessary for sampling the conformations of ligands in the presence of explicit solvent, for LBDD MD simulations are typically performed in the absence of explicit solvent. Such simulations may be performed in the “gas phase” or using implicit solvent models to treat the solvent environment, as detailed below.

High temperature MD has long been used to facilitate the crossing of high energy barriers to assure a broad sampling of conformational space. In high temperature MD, the probability of particles having the necessary velocity (or kinetic energy) to cross energy barriers is increased over room temperature simulations. While sampling in MD is driven by information on the molecular forces thereby guiding conformational sampling to physically meaningful regions, unwanted sampling may occur in high temperature MD. This is due to the high temperature leading to sampling of conformations that are inaccessible at room temperature thereby causing inefficient use of computational effort. However, care to avoid excessively high temperatures can minimize this problem and a number of protocols, referred to as simulated annealing,136 perform high temperature MD followed by room temperature MD to assure that conformations relevant to the latter are being sampled.

A simple way to improve sampling via MD simulations is to perform multiple simulations of the system starting with different initial random number seeds to assign the velocities to the particles in the system. Typically, a Gaussian distribution of velocities are randomly generated and assigned to each particle with those initial velocities satisfying a Maxwell-Boltzmann distribution defining a selected temperature. While the overall velocity distribution is approximately reproduced with the different random number seeds yielding the same macroscopic temperature, the individual atoms have different velocities, thereby directing the molecule to sample different conformations. However, this approach does not always avoid kinetic trapping for larger molecules due to large barriers often associated with large conformational changes.

Methods that go beyond the use of high temperature and multiple MD runs are referred to as generalized ensemble (GE) algorithms.137–139 These include replica exchange MD (REXMD),140meta-dynamics,141,142 accelerated MD (AMD)143 and λ-dynamics144 among others. In GE algorithms energy barriers are overcome by adding an external biasing potential(s) to the system. This may be performed by accessing additional conformations from additional simulations, as in REXMD, or by approaches that directly modify the free energy landscape of the system. Many GE MD simulation approaches sample the free energy landscape efficiently and may be used to calculate accurate free energy differences. The free energies are often calculated by thermodynamic integration (TI)145 or the weighted histogram analysis method (WHAM).146

Standard REXMD involves parallel independent simulations (replicas) at a range of temperatures and exchanges conformations between replicas according to an exchange probability (eqn (3)).

 
ugraphic, filename = c1md00044f-t8.gif(Equation 3a)
 
ugraphic, filename = c1md00044f-t9.gif(Equation 3b)
 
ugraphic, filename = c1md00044f-t10.gif(Equation 3c)

In eqn (3)Ti indicates the temperature of replica i, qi is the configuration of replica i at the point of exchange, and Ei represents Hamiltonian energy of replica i. The main idea behind REXMD is that one MD trajectory in a local minima can take conformational information from another replica which may be found in another region of conformational space (e.g. across an energy barrier) but have similar energies. The probability of exchange between replicas is such that it enforces sampling of a Boltzmann distribution of conformations, thereby satisfying a proper thermodynamic ensemble as defined by the simulation conditions. Implementation of REXMD is not straightforward, with issues including how to set up the proper temperature spacing, the number of replicas, and the exchange frequency. For large systems with explicit solvent, REXMD requires a large number of replicas and small temperature difference between adjacent replicas to achieve an acceptable exchange ratio. To overcome this implicit solvent models, as discussed below, may be used thereby allowing for a significant increase in the difference in temperature between adjacent replicas. For example, in the presence of explicit solvent replicas may have temperature differences of 10 K while when implicit solvent is used 30 K may be used. In addition, hybrid methods have been developed.147 Concerning, exchange frequency, higher exchange frequencies typically lead to enhanced sampling.148 However, care must be taken as although high temperature enhances barrier crossing, it may shift the equilibrium between two states and make high temperature states more favorable throughout all replicas.

Hamiltonian replica exchange molecular dynamics (HREXMD)149–151 overcomes drawbacks of REXMD by scaling the potential energy function (i.e. Hamiltonian) rather than the temperature (eqn (3c)). Perturbation of the Hamiltonian can involve almost any term in a force field such as the peptide backbone conformational energies, dielectric constant or ligand-solvent interactions. HREXMD needs a lower number of replicas than REXMD since the perturbation is applied locally on selected components of the system. Generally the perturbation is expressed as a function of an order parameter λ in

 
H(λi) = λiH1 + (1 − λi)H0(Equation 4)
where 0 ≤ λi ≤ 1, such that the Hamiltonian is the reference state H0 (i.e. the ground state) when λ is zero and target (i.e. fully perturbed) state H1 when λ is one. In HREXMD, simulations are carried on each replica with H(λi) and conformations exchanged between adjacent H(λi) replicas thereby facilitating the crossing of energy barriers.

There are single MD approaches using dynamic variations of λ. Lambda-dynamics144,152 implements λ as an artificial particle that is propagated during the MD simulations thereby sampling various λ values as dictated by the free energies of the system without the need of using pre-defined λ values. The Hamiltonian is expressed by adding two more terms for dynamic λ variables to H(λi), as shown below, and {λi} is used since multiple λs can be used to perturb different components of the Hamiltonian such as electrostatic or dihedral energies. In the approach the extended Hamiltonian, comprised of the standard, ground state Hamiltonian and the perturbations associated with the λ terms is defined as

 
ugraphic, filename = c1md00044f-t11.gif(Equation 5)
where mi and λi are dynamic variables which overcome the limitation of using discrete λ values and U*({λi}) is a λi-dependent biasing potential which can take various forms to sample as many states as possible.

Another single dynamics GE method, Metadynamics141,142,153 uses a history-dependent biasing potential to force selected degrees of freedom (e.g. collective variables, CV) of the system being sampled away from conformations visited frequently. This is performed by “lifting” low energy regions with a biasing potential as those regions are being sampled, thereby facilitating conformational changes away from the low energy regions. The biasing potential is the sum of Gaussian functions, VG, that are used to fill valleys of the free energy surface as defined as follows

 
ugraphic, filename = c1md00044f-t12.gif(Equation 6)
where U(q) is the potential energy of coordinates q at time t, h is Gaussian height, w is Gaussian width, and CV(t) is the value of the CV at time t. The simulation remembers information about the added biasing potentials and the final VG is a negative image of the free energy surface thereby allowing reconstruction of the original free energy surface. Metadynamics are able to run with multiple CVs such as distance between two atoms, angles, or torsion angles; the choice of CVs, and optimal h and w for each CV are user selected and optimization of these parameters for the system of interest is often required.

Taking a similar strategy, the orthogonal space random walk (OSRW) algorithm154,155 is another efficient way of conformational sampling. This strategy simultaneously perturbs the order parameter space (general term for λ or CV above) and generalized free energy space to overcome not only local minima trapping but also lagging of changes in the environment surrounding the CV or λ required for conformational changes to occur. OSRW uses 2D Gaussian-shaped repulsive potentials to flatten the free energy surface and avoid often-visited states. After searching the whole conformational space it is possible to select accessible conformations by order parameters associated with the conformational change.

Another approach is accelerated MD (AMD).143AMD is a simple but efficient sampling method that has shown good performance for biomolecules. Compared to metadynamics and OSRW, it uses a simpler form of the biasing potential (eqn (7)). When applying the method, the boost energy E and α, which is a tuning factor for the biasing potential's well-depth, need to be pre-defined and the biasing potential is applied when the potential energy is less than E.

 
ugraphic, filename = c1md00044f-t13.gif(Equation 7)

REXMD and HREXMD have been successfully used for conformational sampling of flexible ligands,113,156–158 while the other GE algorithms have been used mainly in biomolecules to date. However, considering their success in conformational sampling, problems involved in flexible protein loops and ligand passage in receptors,159,160 it is anticipated that they will be of utility to conformational sampling in LBDD.

In addition to the sampling algorithms, an important consideration is the role of solvent in conformational sampling. Conformational changes and sampling are dependent on the surrounding environment of all molecules such that energetically favorable conformations in gas phase may not be favorable in solution (or a receptor binding site) and vice versa. This occurs due to COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
water
competing for favorable intramolecular interactions, for example, by disrupting intramolecular hydrogen bonds or reduced intramolecular dipole–dipole interactions. Alternatively, COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
water
can impact the orientation of hydrophobic groups, which may remain “accessible” in the gas phase, but cluster together in the presence of solvent. Therefore sampling conformations in the presence of explicit COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
water
molecules is ideal when studying biological systems but such calculations are computationally expensive due to the presence of additional particles in the system. In addition, the viscosity of the water surrounding a molecule can slow conformational sampling during MD simulations, thereby further increasing the computational costs. An effective alternative to explicit solvent are continuum solvent models that allow for distributions of conformations to be obtained that approximate an explicit solvent environment while allowing for efficient sampling by avoiding the increased number of particles and the viscosity issues with COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
water
. A large number of implicit solvent models are available and recent reviews on the topic have been presented.161–163 Simple implicit solvent models use constant or distance-dependent dielectric constants. Another approach is using solvation parameters for each atom based on their solvent accessibility so that particles in the system have varying responses to environments. More accurate solutions to solvation effects are obtained by the Poisson–Boltzmann (PB)164,165 or Generalized Born (GB)166 models. In MD, the most widely used methods are GB models due to the high efficiency of their analytical solution and comparable accuracy with respect to PB models that typically require numerical solutions. GB models calculate the electrostatic component of the solvation free energy as shown in eqn (8), with that term added to the total energy.

 
ugraphic, filename = c1md00044f-t14.gif(Equation 8)

In eqn (8) ε is the dielectric constant of solvent, qi is the partial atomic charge of atom i, rij is the interatomic distance between atom i and j, and Ri, which is the most important parameter in GB models, is the effective Born radii of atom i. The effective Born radii can be understood as an atom's degree of burial within the solute or radius exposed to the solvent environment. A number of GB models have been developed such as GBMV167 and GBSW168 in CHARMM169,170 and GBOBC162 in AMBER171,172 which differ primarily in the way that the effective Born radii are calculated.

The final portion of this section presents a simple example of sampling the conformational space of a peptide using three MD-based sampling approaches based on the CHARMM22/CMAP protein force field.173Fig. 1 shows the extent of sampling attained by MD in both vacuum and explicit solvent (Fig. 1a and 1b) and by two sampling methods (single MD vs. HREXMD using explicit solvent, Fig. 1b and 1c) for the flexible opioid pentapeptide, Leu-enkephalin. For the explicit solvent HREXMD, λ = 0, 0.14, 0.19, 0.27, 0.37, 0.52, 0.72, and 1.0 where 1.0 represents a CHARMM phi, psi energy surface that is flat for non-glycine residues, as implemented using the CMAP tool.173 Simulations were performed using the REPDSTR module in CHARMM for 10 ns, attempting exchanges every 0.5 ps for the HREXMD. The range of conformational sampling was measured by the 2D probability distribution of the distance between two aromatic rings (A and B) and the angle between two aromatic rings and N-terminal nitrogen (N). Additional details of the simulation methodology are included in the supporting information. Comparison of Fig. 1a and 1b show that the sampling of conformational space by MD differs in gas phase vs. explicit solvent. In explicit solvent, structures with longer AB distances and larger ANB angles are being sampled. These represent more extended structures due to the presence of solvent; in the gas phase the peptide folds back on itself leading to more compact structures as no solvent is available to compete for intramolecular interactions. As expected, HREXMD (1c) samples a similar range of conformations as the standard explict solvent MD (1b), but the extent of sampling is more complete using the same amount of simulation time. The more complete sampling by HREXMD is due to the simulation overcoming an energy barrier in the vicinity of 10–12 Å for the AB distance and 60–100° for the ANB angle. Although this example is not a rigorous test from which better performance can be proved based on more efficient sampling, it points out the importance of the simulation method and solvent environment when performing conformational sampling.


2D-probability distribution of AB distance-ANB angle pair of Leu-Enkephalin. Pharmacophoric point A represents the centroid of the aromatic ring of tyrosine, B is the centroid of the aromatic ring of phenylalanine, and N is the basic nitrogen. a) through c) compares different sampling of conformational space by a) gas phase MD, b) explicit solvent MD and c) explicit solvent HREMD. Simulation details are in the supporting information.
Fig. 1 2D-probability distribution of AB distance-ANB angle pair of Leu-Enkephalin. Pharmacophoric point A represents the centroid of the aromatic ring of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
tyrosine
, B is the centroid of the aromatic ring of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
phenylalanine
, and N is the basic nitrogen. a) through c) compares different sampling of conformational space by a) gas phase MD, b) explicit solvent MD and c) explicit solvent HREMD. Simulation details are in the supporting information.

An important consideration when performing conformational sampling is the extent of convergence; have all the accessible conformations of the molecule been sampled? When conformational sampling is done by MD, convergence of sampling may be checked by continuing the simulation time and testing if additional conformations are being sampled. If additional conformations are not being sampled the sampling may be considered converged. For conventional MD, root mean square deviations of overall structure, distance between atoms or functional groups (as in Fig. 1), or torsions may be used for simple evaluation of convergence. Alternatively, differences in probability distributions between two intervals of a trajectory (e.g. the first and second half) can also indicate if the simulation has reached convergence. For GE methods, convergence of the calculated free energy surface indicates adequate sampling. However, it should be emphasized that the appearance of convergence does not necessarily mean that true convergence has been attained. There is always the possibility that a molecule, especially more complex molecules such as polypeptides, may have access to significantly different conformations than those accessed in the performed simulations.

4. Force fields (FF)

While the appropriate sampling approaches can assure that the required range of conformations is being accessed it is the underlying energy function that largely determines the probability of the conformations being sampled. While QM methods can supply this information their computational demand limits their utility for sampling large numbers of conformations for even small molecules. Accordingly, it is necessary to use molecular mechanics energy functions. While such functions are computationally efficient, they are based on simple terms that require a set of parameters to allow for the energy and forces on a molecule to be accurately calculated, as described below. These parameters, therefore, dictate the applicability and quality of the force field and a number of force fields are available for drug-like molecules. In the remainder of this article an overview of the force fields most commonly used for ligands will be presented, including examples of the ability of selected force fields to reproduce QM conformational energies of two example ligands.

A force field consists of a potential energy function and the associated parameters that allow the energy and forces to be calculated as a function of the molecular structure and conformation. Potential energy functions used in molecular mechanics typically include terms for bond stretching, angle bending, rotation around bonds (dihedral or torsion angles), out of plane motions (improper angles), and non-bonded interactions (electrostatic and van der Waal energies). Such force fields are referred to as Class I models. In Class II force fields crossterms to treat correlation between bonds and angles, angles and torsions, and so on are included and different variations of the nonbond terms may be used. Eqn (9) and 10 show examples of potential energy functions in the two classes, which are classified by their simplicity and potential transferability. Detailed descriptions of the example functional forms are found in Brooks et al.169 for the class I force field in CHARMM and in Plimpton174,175 for a class II FF. Class I FFs include those specialized for biomolecules (see below). The simple functional form shown in eqn (9) is computationally efficient allowing them to handle macromolecules in aqueous or other condensed phase environments. Eqn (10) is a typical energy function used in a Class II FF and employs more complex form which facilities (but doesn't necessarily dictate) transferability across a wider range of molecules.

 
ugraphic, filename = c1md00044f-t15.gif(Equation 9)

In eqn (9) b, θ, S, n, φ, δ, and ω represent the bond length, angle, distance between 1–3 atoms, multiplicity, torsion, phase, and improper angle, respectively, and the subscript zero indicates the equilibrium geometry parameter. Kx are the associated force constants. The Lennard-Jones 6–12 equation is commonly used to model the van der Waals energy between atom i and j, where εij, rij, and Rmin,ij are the well-depth, interatomic distance, and interaction distance at minimum the energy between atoms i and j, respectively. In the electrostatic or Coulombic term, qi and qj are partial atomic charges, and ε is the dielectric constant.

For eqn (10) the same symbols are used as in eqn (9) and additionally b′ and θ′ used in the crossterms represent the second bond or angle associated with the cross interaction.

 
ugraphic, filename = c1md00044f-t16.gif(Equation 10)

To move from a potential energy function to a FF requires determination of the values of the parameters, a process referred to a parameter optimization or parametrization. Parameters to be optimized in eqn (9) and (10) include force constants, equilibrium geometries, partial atomic charges, well depth and interaction distance at minimum energy (Rmin,ij) and so on. The goal of parameter optimization is reproducing a collection of quantum mechanical and/or experimental observables for the ligands of interest.

Parameters for macromolecules such as proteins, nucleic acids, lipids and carbohydrates have been paid special attention and optimized extensively in Class I FFs such as AMBER(Assisted Model Building with Energy Refinement),171,172 CHARMM(Chemistry at Harvard Molecular Mechanics),169,170 GROMACS176 and OPLS(Optimized Potential for Liquid Simulations).177,178 Subsequently, parameters for drug-like molecules were added to be compatible with the individual biomolecular FFs while maintaining the accuracy of the “parent” biomolecular FF. To achieve this, the parent and related small molecule parameters use the same form of the potential energy function and strategy used for optimization of the FF. This is necessary to provide consistent and balanced energy and force evaluations during simulations of small molecule-biomolecular complexes. For example, the CHARMM General FF (CGenFF)179 follows the standard optimization procedure of the CHARMM additive biomolecular FF.180

As an example of parameter optimization the approach used in the CHARMM additive force field for small molecules, with which we are intimately familiar, will be used. Typically, optimization of CGenFF parameters is performed as follows. Force constants, equilibrium bond lengths and valence angles are parametrized to reproduce experimental or QM vibrational frequencies and geometries. Dihedral angle force constants, phases, and multiplicities are optimized targeting QM potential energy scans or spectroscopic data such as NMR J coupling constants. Charges are optimized by evaluating optimal distances and interaction energies of water interaction with the drug-like molecule based on QM calculations as well as dipole moments and optimization of LJ parameters is guided by the reproduction of pure solvent or crystal experimental data. To date, CGenFF includes approximately 150 atom types, 400 bond, 1200 angle, 3000 torsion parameters explicitly optimized based on 500 model compounds. When extending the force field to new chemical entities, parameters for the new molecules not already available in the CGenFF may be assigned by analogy for the bond/angle/dihedral and LJ terms, while determination of the partial atomic charges is based on a bond-charge increment algorithm extended to included angle- and dihedrals increments that has been trained to reproduce CGenFF charges for over 500 model compounds (K. Vennomeslaegh and A.D. MacKerell, Jr., work in progress). A web-based utility in the context of the ParamChem project is available to perform these functions. An important feature of CGenFF when automatically assigning parameters is information about the quality of the assigned parameter based on a penalty score. This is important as the ability of parameters to be transferred between molecules in the context of empirical force fields is limited, as shown below, and it allows users to know which parameters require validation and further optimization to obtain the required level of accuracy. However, as with conformational sampling, even in cases where the parameters are directly transferred to a new molecule, the possibility that those parameters may not perform with adequate accuracy exists, such that the aware user is advised to perform validation tests of the transferred parameters, as previously described.179

Beyond CGenFF there are a number of other small molecule FFs designed to be compatible with biomolecular FFs. GAFF (General AMBER FF)181 was developed for the simulation of pharmaceutical compounds with the AMBER biomolecular FF. It is based on QM optimization of about 3000 model compounds and geometric information from the Cambridge structure database (CSD).182 It has 57 atom types, 700 bond length parameters, 3000 angle parameters, and 500 dihedral angle parameters. Beyond these available parameters, the Antechamber toolkit183 is used to assign parameters for novel molecules. SwissParam is a web-based utility used to generate CHARMM consistent parameters for ligands. It takes internal energy parameters and charges from MMFF (Merck Molecular FF)184,185 while cubic and quadratic terms for bond, angle, and improper energies that are present in the Class II force field are truncated as required for use with the Class I CHARMM additive FF. In addition, van der Waals energy parameters are from the CHARMM additive FF based on atom type similarity. However, it should be noted that the nonbond parameters being derived in a different manner than that of the parent biomolecular FF make them formally incompatible with the CHARMM biomolecular FF. OPLS-AA (All Atom) emphasizes parameters to reproduce the conformational energetic and condensed phase properties of small molecules for use in biological environments. Initial parameters were adopted from the OPLS-UA (united atom), AMBER, and CHARMM FFs and 50 model compounds were optimized focusing on torsion and non-bonded parameters.177 OPLS-AA uses experimental liquid properties as target data during parameter optimization. Thus, Class I biomolecular FFs have been extended to include parameters for a range of small molecules though the extent of chemical space covered and the quality of the parameters for those molecules vary significantly.

Class II FFs were initially designed to treat a wide range of small molecules. Examples include CFF/CVFF (consistent valence FF),186MM2 (molecular mechanics),187MM3,188MM4,189MMFF94,184 and Tripos 5.2 FF.190 These FFs are typically not optimized with respect to interactions with the environment, with the exception of MMFF, limiting their applicability. In general, Class II force fields were optimized to reproduce geometries, vibrational spectra and conformational energies in the gas phase, with the various cross and higher order terms in the energy functions (eqn (10)) included to allow for both better reproduction of those properties as well as facilitate transferability of the parameters to a wider range of compounds. MMFF94 is currently one of the most widely used FFs for small molecules and is available in numerous LBDD software packages for small molecule simulations. Its goal is broad applicability and QM data for over 3000 molecules and condensed phase data for 2800 CSD compounds were used to optimize and validate the parameters. Allinger and coworkers have developed the MM1-4 FF series achieving high accuracy for organic molecules with respect to geometries, conformational properties and heats of formation. Upon going from MM1 to MM2, the MMx series shifted to a simpler form of Class II FF and MM3 lead to further improvements by including more model compounds, additional experimental data, and higher energy conformations during parameter optimization. MM4 represents a further extension of a Class II FF due to the inclusion of four-fold torsional energy terms, torsion-improper-torsion cross terms, bond-torsion-bond cross terms, two torsion-bond cross terms for central and terminal bonds each and so on. MM4 was optimized targeting thermodynamic quantities ΔH, ΔS, ΔG, and geometries from QM calculations or experimental spectroscopic data. All of these class II FFs are primarily utilized for organic compounds in the gas phase, though MMFF94 has seen limited use in macromolecular condensed phase simulations.

Use of a FF for energy evaluation, energy minimization, MD simulation or other sampling approach represents a significant, important step forward in most modern LBDD studies and the quality of the FF plays an important role in the outcomes of such studies. As emphasized in the preceding paragraph, the various FFs were optimized targeting a training set of molecules. Accordingly, each FF may be anticipated to reproduce the energies and forces of the molecules in the respective training sets with reasonable accuracy. However, the question of transferability remains such that how accurate is the treatment of a molecule not in the training set originally used to optimize the FF. While a full investigation to address this issue represents a significant challenge, two examples of the transferability of selected FFs will be given targeting QM dihedral potential energy scans of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
dimethyltryptamine
and dimethylamino[1,4]diazepine which are analogues of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
serotonin
and COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
clozapine
.

Fig. 2 shows dihedral potential energy surface generated by MMFF, SwissParam, CGenFF, and, in 2b, CGenFF after additional parameter optimization along with QM data obtained at the MP2/6-31G* level using the Gaussian03 program.191 Dihedral angles, shown as curved arrows in the figure, were rotated in 15° increments and the geometries were optimized at each step. MMFF, SwissParam and CGenFF parameters were input into CHARMM and geometries were minimized to an RMS force of <10−6 kcal mol−1/Å. With COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
dimethyltryptamine
, MMFF and SwissParam underestimated the height of energy barriers and CGenFF had a different peak shape at 45° and 330°. Since SwissParam adopted parameters from MMFF, its energy surfaces were similar to that of MMFF, though not identical. This is due to the different representations of the nonbond terms, which contribute to the energy surfaces and emphasize the problem with mixing parameters from different force fields. Overall, the shape of the surfaces for COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
dimethyltryptamine
are acceptable for all three FFs. However, results for dimethylamino[1,4]diazepine emphasize that caution needs to be taken as when transferring parameters to new compounds (Fig. 2b and Table 2). The MMFF energy surface has local minima around 150° and 270° which will lead to errors in conformational sampling; similar problems are present with the parameters generated by SwissParam. When energies as a function of conformation are incorrectly represented the conformations selected from the sampling approach will typically be incorrect or the probabilities of those conformations improperly represented. Initial parameters from CGenFF showed poor agreement with the QM PES, but the FF is in significantly better agreement following optimization of selected dihedral angle parameters. The results with all the tested FFs indicate the limited ability to transfer parameters to new molecules. An advantage of CGenFF is that penalty values are provided for each parameter assignment. This alerts the user to possible limitations in the FF, such as occur in dimethylamino[1,4]diazepine's conformational energy. In such cases validation of the parameters and additional optimization should be performed as required. Efforts to extend the ParamChem web server to include an automated interface for parameter validation and optimization are ongoing (K. Vennomeslaghe, S. Pamidighantam, M. Sheetz and A.D. MacKerell, Jr. Work in progress).


Comparison of conformational energy surface. A) is potential energy surface (PES) of dimethyltryptamine and B) shows that of dimethylamino-dibenzo[1,4]diazepine.
Fig. 2 Comparison of conformational energy surface. A) is potential energy surface (PES) of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
dimethyltryptamine
and B) shows that of COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
dimethylamino-dibenzo[1,4]diazepine
.
Table 2 Root mean square deviation from QM potential energy surface
FF RMSD
COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
dimethyltryptamine
COMPOUND LINKS

Read more about this on ChemSpider

Download mol file of compound
clozapine
MMFF 0.87 5.04
SwissParam 1.42 5.42
CGenFF 0.62 12.84
CGenFF* n.d. 3.59


5. Conformationally sampled pharmacophore (CSP)

Leveraging the ability to perform extensive conformational sampling of small molecules using a properly optimized FF for the ligands of interest facilitated the development of a novel approach in our laboratory, the conformationally sampled pharmacophore (CSP).112–114,158,192,193CSP is a LBDD approach based on extensive sampling of conformational space, under the assumption that such sampling will lead to inclusion of the bioactive conformation being sampled despite that conformation not being known. The use of all accessible conformations in the CSP approach allows for probability distributions of different geometric features and/or physical properties to be determined, as shown in Fig. 1 for Leu-Enkephalin. As 4D-QSAR uses occupancy of lattice points on a 3D grid by conformations of the ligands being studied as descriptors, CSP uses probability distributions of pharmacophoric features (e.g. distances, angles and dihedrals) as descriptors for model development. The use of all accessible conformations in CSP has allowed it to be applied successfully to highly flexible molecules such as peptidic opioids112–114 and bile acids.158,192,193 A strength of the method is the ability to connect the pharmacophore models to molecular details of the ligands being studied thereby facilitating physical interpretation of the models and applying the knowledge for ligand optimization, including rational drug design. By including all conformations, CSP can often recognize subtle differences among structurally similar compounds as well common pharmacophore features among diverse compounds. By using the quantitative overlap of pharmacophoric feature probability distributions of different ligands rather than conformations of the ligands themselves during model fitting, the molecular alignment problem is eliminated. However, once a suitable model is developed the conformational distributions from the MD simulations used in CSP model development may be used to guide possible superposition thereby identifying the biologically relevant conformations of the ligands. For example, CSP model for δ-opioid receptor ligands demonstrated how flexible peptidic opioids can be superimposed with non-peptidic opioids.113

For proper CSP modelling, accurate FF and efficient conformation searching are needed. To date, MMFF and CGenFF have been used successfully. Conformational sampling has used extended MD simulations alone at both room and high temperatures112–114 and temperature REXMD simulations in implicit solvent.113,158,192Fig. 3 shows the general procedure used in CSP modelling. Once conformations are pre-enumerated for the training set of compounds, which may be performed using any of the above sampling methods, pharmacophore development is performed in an automated, computationally feasible fashion. As for identification of pharmacophore features, aromatic ring, ionizable groups, or hydrogen bond donors and acceptors can be identified, the associated probability distributions between the features calculated and, based on the extent of overlap of those distributions, the various combination of overlaps iteratively regressed against biological data, with those features yielding the best correlation with experimental data used for further model development. Notably, the CSP method can be readily combined with physicochemical descriptors to further facilitate model development.158,192


Schematic diagram of CSP procedures.
Fig. 3 Schematic diagram of CSP procedures.

6. Conclusions

Presented is an overview of computational ligand-based drug design approaches currently in use in rational drug design. Over the last 2–3 decades, a large number of methods have been developed and many of these have been implemented in readily accessible software packages. While this convenience is important for utilization of these methods, it is essential that users understand the assumptions and limitations in those methods allowing for decision on the suitability of the methods for a given project, what kind of knowledge one can obtain through the study, and which aspect is the limiting factor with respect to producing accurate SAR models. As many LBDD approaches require extensive sampling of conformational space emphasis in this article was placed on recent FF development and the use of MD simulations and related techniques for conformational sampling.

Acknowledgements

This work was supported in part by National Institutes of Health grants DK67530, DA13583, and DA19634, Accelrys Inc., and the University of Maryland Computer-Aided Drug Design Center for computational resources.

Notes and references

  1. A. R. Leach and V. J. Gillet, An Introduction to Chemoinformatics, Springer, 2007 Search PubMed.
  2. K. M. Merz, D. Ringe and C. H. Reynolds, Drug Design: Structure- and Ligand-Based Approaches, Cambridge University Press, 2010 Search PubMed.
  3. C. G. Wermuth, The practice of medicinal chemistry, Elsevier/Academic Press, 2008 Search PubMed.
  4. D. C. Young, Computational drug design: a guide for computational and medicinal chemists, Wiley-Interscience, 2009 Search PubMed.
  5. A. D. Andricopulo, Curr. Top. Med. Chem., 2009, 9, 754–754 CrossRef CAS.
  6. I. M. Kapetanovic, Chem.-Biol. Interact., 2008, 171, 165–176 CrossRef CAS.
  7. J. P. Overington, B. Al-Lazikani and A. L. Hopkins, Nat. Rev. Drug Discovery, 2006, 5, 993–996 CrossRef CAS.
  8. P. J. Gane and P. M. Dean, Curr. Opin. Struct. Biol., 2000, 10, 401–404 CrossRef CAS.
  9. H. Jhoti and A. R. Leach, Structure-based drug discovery, Springer, 2007 Search PubMed.
  10. T. J. Marrone, J. M. Briggs and J. A. McCammon, Annu. Rev. Pharmacol., 1997, 37, 71–90 Search PubMed.
  11. S. Costanzi, I. G. Tikhonova, T. K. Harden and K. A. Jacobson, J. Comput.-Aided Mol. Des., 2009, 23, 747–754 CrossRef CAS.
  12. A. Tropsha and S. X. Wang, Ernst Schering Foundation Symposium Proceedings, 2006, 49–73 Search PubMed.
  13. E. Fischer, Ber. Dtsch. Chem. Ges., 1894, 27, 2985–2993 CrossRef CAS.
  14. D. E. Koshland, Proc. Natl. Acad. Sci. U. S. A., 1958, 44, 98–104.
  15. D. D. Boehr, R. Nussinov and P. E. Wright, Nat. Chem. Biol., 2009, 5, 789–796 CrossRef CAS.
  16. S. Kumar, B. Ma, C. J. Tsai, N. Sinha and R. Nussinov, Protein Sci., 2000, 9, 10–19 CAS.
  17. B. Ma, S. Kumar, C. J. Tsai and R. Nussinov, Protein Eng., Des. Sel., 1999, 12, 713–720 CrossRef CAS.
  18. C. J. Tsai, S. Kumar, B. Ma and R. Nussinov, Protein Sci., 1999, 8, 1181–1190 CrossRef CAS.
  19. C. J. Tsai, B. Ma and R. Nussinov, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 9970–9972 CrossRef CAS.
  20. S. M. Free and J. W. Wilson, J. Med. Chem., 1964, 7, 395–399 CrossRef CAS.
  21. C. Hansch and T. Fujita, J. Am. Chem. Soc., 1964, 86, 1616–1626 CrossRef CAS.
  22. C. Hansch and C. Selassie, Quantitative structure–activity relationship-a historical perspective and the future, Elsevier, Oxford, 2007 Search PubMed.
  23. S. Ekins, B. Boulanger, P. W. Swaan and M. A. Z. Hupcey, Mol. Diversity, 2002, 5, 255–275.
  24. S. Ekins, G. F. Ecker, P. Chiba and P. W. Swaan, Xenobiotica; the Fate of Foreign Compounds in Biological Systems, 2007, 37, 1152–1170 Search PubMed.
  25. D. A. Winkler, Molecular analysis and genome discovery, John Wiley and Sons, 2004 Search PubMed.
  26. P. Gedeck and R. A. Lewis, Curr. Opin. Drug Discovery Dev., 2008, 11, 569–575 CAS.
  27. T. Scior, J. L. Medina-Franco, Q. T. Do, K. Martínez-Mayorga, J. A. Yunes Rojas and P. Bernard, Curr. Med. Chem., 2009, 16, 4297–4313 CrossRef CAS.
  28. Y. C. Martin, B. T. John and J. T. David, in Comprehensive Medicinal Chemistry II, Elsevier, Oxford, 2007, pp. 119–147 Search PubMed.
  29. O. F. Güner, Pharmacophore perception, development, and use in drug design, Internat'l University Line, 1999 Search PubMed.
  30. C. G. Wermuth, Pharmacophores and pharmacophore searches, Wiley-VCH, 2006 Search PubMed.
  31. A. R. Leach, V. J. Gillet, R. A. Lewis and R. Taylor, J. Med. Chem., 2010, 53, 539–558 CrossRef CAS.
  32. G. Wolber, T. Seidel, F. Bendix and T. Langer, Drug Discovery Today, 2008, 13, 23–29 CrossRef CAS.
  33. V. J. Gillet and P. Willett, Compound selection using measures of similarity and dissimilarity In Comprehensive medicinal chemistry II, Elsevier, 2007 Search PubMed.
  34. P. Willett, Drug Discovery Today, 2006, 11, 1046–1053 CrossRef CAS.
  35. H. Geppert, M. Vogt and J. r. Bajorath, J. Chem. Inf. Model., 50, pp. 205–216 Search PubMed.
  36. K. Shanmugasundaram and A. C. Rigby, Comb. Chem. High Throughput Screening, 2009, 12, 984–999 CrossRef CAS.
  37. A. R. Leach, B. T. John and J. T. David, in Comprehensive Medicinal Chemistry II, Elsevier, Oxford, 2007, pp. 87–118 Search PubMed.
  38. T. Zhou, D. Huang and A. Caflisch, Curr. Top. Med. Chem., 10, pp. 33–45 Search PubMed.
  39. P. Liu and W. Long, Int. J. Mol. Sci., 10, 1978–1998 Search PubMed.
  40. C. W. Yap, H. Li, Z. L. Ji and Y. Z. Chen, Mini-Rev. Med. Chem., 2007, 7, 1097–1107 CrossRef CAS.
  41. O. Güner, O. Clement and Y. Kurogi, Current Medicinal Chemistry, 2004, 11, 2991–3005 Search PubMed.
  42. J. Gasteiger and D. T. Engel, Chemoinformatics: a textbook, Wiley-VCH, 2003 Search PubMed.
  43. D. Weininger, J. Chem. Inf. Comp. Sci., 1988, 28, 31–36 CrossRef CAS.
  44. S. Ash, M. A. Cline, R. W. Homer, T. Hurst and G. B. Smith, J. Chem. Inf. Comp. Sci., 1997, 37, 71–79 CrossRef CAS.
  45. J. L. Durant, B. A. Leland, D. R. Henry and J. G. Nourse, J. Chem. Inf. Comp. Sci., 2002, 42, 1273–1280 CrossRef CAS.
  46. D. Bonchev and D. H. Rouvray, Chemical graph theory: introduction and fundamentals, Taylor & Francis, 1991 Search PubMed.
  47. R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2002 Search PubMed.
  48. A. M. Helguera, R. D. Combes, M. P. González and M. N. D. S. Cordeiro, Curr. Top. Med. Chem., 2008, 8, 1628–1655 CrossRef CAS.
  49. C. A. Lipinski, F. Lombardo, B. W. Dominy and P. J. Feeney, Adv. Drug Delivery Rev., 2001, 46, 3–26 CrossRef CAS.
  50. M. P. González, C. Terán, L. Saíz-Urra and M. Teijeira, Curr. Top. Med. Chem., 2008, 8, 1606–1627 CrossRef CAS.
  51. A. J. Miller, Subset selection in regression, CRC Press, 2002 Search PubMed.
  52. N. R. Draper and H. Smith, Applied regression analysis, Wiley, 1981 Search PubMed.
  53. A. G. Mercader, P. R. Duchowicz, F. M. Fernández and E. A. Castro, Chemom. Intell. Lab. Syst., 2008, 92, 138–144 CrossRef CAS.
  54. J. Devillers, Genetic algorithms in molecular modeling, Academic Press, 1996 Search PubMed.
  55. M. Pintore, H. van de Waterbeemd, N. Piclin and J. R. Chrétien, Eur. J. Med. Chem., 2003, 38, 427–431 CrossRef CAS.
  56. O. Obrezanova, G. Csányi, J. M. R. Gola and M. D. Segall, J. Chem. Inf. Model., 2007, 47, 1847–1857 CrossRef CAS.
  57. D. Rogers and A. J. Hopfinger, J. Chem. Inf. Comp. Sci., 1994, 34, 854–866 CrossRef CAS.
  58. P. Geladi and B. Kowalski, Anal. Chim. Acta, 1986, 185, 1–17 CrossRef CAS.
  59. R. Rosipal and N. Krämer, in Subspace, Latent Structure and Feature Selection, 2006, pp. 34–51 Search PubMed.
  60. H. Wold, Partial least squares, Wiley, New York, 1985 Search PubMed.
  61. S. Wold, A. Ruhe, H. Wold and W. J. Dunn III, SIAM J. Sci. Stat. Comput., 1984, 5, 735–743 CrossRef.
  62. I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002 Search PubMed.
  63. S. Wold, K. Esbensen and P. Geladi, Chemom. Intell. Lab. Syst., 1987, 2, 37–52 CrossRef CAS.
  64. L. Michielan and S. Moro, J. Chem. Inf. Mod., 50, pp. 961–978 Search PubMed.
  65. Y. Sakiyama, Expert Opin. Drug Metab. Toxicol., 2009, 5, 149–169 Search PubMed.
  66. H. Z. Si, T. Wang, K. J. Zhang, Z. D. Hu and B. T. Fan, Bioorg. Med. Chem., 2006, 14, 4834–4841 CrossRef CAS.
  67. J. H. Friedman and W. Stuetzle, J. Am. Stat. Assoc., 1981, 76, 817–823.
  68. A. J. Kulkarni, V. K. Jayaraman and B. D. Kulkarni, Comb. Chem. High Throughput Screening, 2009, 12, 440–450 CrossRef CAS.
  69. W. Heritage Trevor and R. Lowis David, in Rational Drug Design, American Chemical Society, 1999, pp. 212–225 Search PubMed.
  70. J. Cartmell, S. Enoch, D. Krstajic and D. E. Leahy, J. Comput.-Aided Mol. Des., 2005, 19, 821–833 CrossRef CAS.
  71. W. W. Wong and F. J. Burkowski, J. Cheminf., 2009, 1, 4–4.
  72. A. Golbraikh and A. Tropsha, J. Mol. Graphics Modell., 2002, 20, 269–276 CrossRef CAS.
  73. R. Kohavi, in IJCAI, 1995, pp. 1137–1145 Search PubMed.
  74. R. D. Clark, Curr. Top. Med. Chem., 2009, 9, 791–810 CrossRef CAS.
  75. S. Cross and G. Cruciani, Drug Discovery Today, 15, pp. 23–32 Search PubMed.
  76. J. Verma, V. M. Khedkar and E. C. Coutinho, Curr. Top. Med. Chem., 10, pp. 95–115 Search PubMed.
  77. R. D. Cramer, D. E. Patterson and J. D. Bunce, J. Am. Chem. Soc., 1988, 110, 5959–5967 CrossRef CAS.
  78. G. Klebe, U. Abraham and T. Mietzner, J. Med. Chem., 1994, 37, 4130–4146 CrossRef CAS.
  79. C. H. Andrade, K. F. M. Pasqualoto, E. I. Ferreira and A. J. Hopfinger, Molecules (Basel, Switzerland), 15, pp. 3281–3294 Search PubMed.
  80. C. D. Klein and A. J. Hopfinger, Pharm. Res., 1998, 15, 303–311 CAS.
  81. J. Polanski, Curr. Med. Chem., 2009, 16, 3243–3257 CrossRef CAS.
  82. A. Vedani and M. Dobler, J. Med. Chem., 2002, 45, 2139–2149 CrossRef CAS.
  83. A. Vedani, M. Dobler and M. A. Lill, J. Med. Chem., 2005, 48, 3700–3703 CrossRef CAS.
  84. G. Cruciani, M. Pastor and W. Guba, Eur. J. Pharm. Sci., 2000, 11(Suppl. 2) Search PubMed , S29-39-S29-39.
  85. M. Pastor, G. Cruciani, I. McLay, S. Pickett and S. Clementi, J. Med. Chem., 2000, 43, 3233–3243 CrossRef CAS.
  86. C.-G. Wermuth, C. R. Ganellin, P. Lindberg, L. A. Mitscher and A. B. James, Academic Press, 1998, pp. 385–395.
  87. C. Lemmen and T. Lengauer, J. Comput.-Aided Mol. Des., 2000, 14, 215–232 CrossRef CAS.
  88. A. T. Brint, J. Chem. Inf. Comput. Sci., 1987, 27, 152–158 CrossRef.
  89. G. Jones, P. Willett and R. C. Glen, J. Comput.-Aided Mol. Des., 1995, 9, 532–549 CrossRef CAS.
  90. C. Lemmen, T. Lengauer and G. Klebe, J. Med. Chem., 1998, 41, 4502–4520 CrossRef CAS.
  91. G. Schneider, P. Schneider and S. Renner, QSAR Comb. Sci., 2006, 25, 1162–1171 CrossRef CAS.
  92. Chemical Computing Group, http://www.chemcomp.com.
  93. Acclerys, http://accelrys.com.
  94. PubChem, http://pubchem.ncbi.nlm.nih.gov/.
  95. J. J. Irwin and B. K. Shoichet, J. Chem. Inf. Model., 2005, 45, 177–182 CrossRef CAS.
  96. ChEMBL, http://www.ebi.ac.uk/chembldb.
  97. B. O. Villoutreix, N. Renault, D. Lagorce, O. Sperandio, M. Montes and M. A. Miteva, Curr. Protein Pept. Sci., 2007, 8, 381–411 CrossRef CAS.
  98. J. M. Barnard and G. M. Downs, J. Chem. Inf. Comp. Sci., 1992, 32, 644–649 CrossRef CAS.
  99. L. C. Cerchietti, A. F. Ghetu, X. Zhu, G. F. Da Silva, S. Zhong, M. Matthews, K. L. Bunting, J. M. Polo, C. Farès, C. H. Arrowsmith, S. N. Yang, M. Garcia, A. Coop, A. D. MacKerell, Jr., G. G. Privé and A. Melnick, Cancer Cell, 17, pp. 400–411 Search PubMed.
  100. Y. C. Martin, J. L. Kofron and L. M. Traphagen, J. Med. Chem., 2002, 45, 4350–4358 CrossRef CAS.
  101. H. Matter, J. Med. Chem., 1997, 40, 1219–1229 CrossRef CAS.
  102. J. H. Nettles, J. L. Jenkins, A. Bender, Z. Deng, J. W. Davies and M. Glick, J. Med. Chem., 2006, 49, 6802–6810 CrossRef CAS.
  103. E. Perola and P. S. Charifson, J. Med. Chem., 2004, 47, 2499–2510 CrossRef CAS.
  104. D. K. Agrafiotis, A. C. Gibbs, F. Zhu, S. Izrailev and E. Martin, J. Chem. Inf. Model., 2007, 47, 1067–1086 CrossRef CAS.
  105. N. Foloppe and I. J. Chen, Curr. Med. Chem., 2009, 16, 3381–3413 CrossRef CAS.
  106. M. C. Nicklaus, S. Wang, J. S. Driscoll and G. W. Milne, Bioorg. Med. Chem., 1995, 3, 411–428 CrossRef CAS.
  107. S. Günther, C. Senger, E. Michalsky, A. Goede and R. Preissner, BMC Bioinformatics, 2006, 7, 293–293 CrossRef.
  108. J. Kirchmair, C. Laggner, G. Wolber and T. Langer, J. Chem. Inf. Model., 2005, 45, 422–430 CrossRef CAS.
  109. J. Kirchmair, G. Wolber, C. Laggner and T. Langer, J. Chem. Inf. Model., 2006, 46, 1848–1861 CrossRef CAS.
  110. M. J. Loferer, I. Kolossváry and A. Aszódi, J. Mol. Graphics Modell., 2007, 25, 700–710 CrossRef CAS.
  111. T. E. Moock, D. R. Henry, A. G. Ozkabak and M. Alamgir, J. Chem. Inf. Comp. Sci., 1994, 34, 184–189 CrossRef CAS.
  112. D. Bernard, A. Coop and A. D. MacKerell, Jr., J. Am. Chem. Soc., 2003, 125, 3101–3107 CrossRef CAS.
  113. D. Bernard, A. Coop and A. D. MacKerell, Jr., J. Med. Chem., 2007, 50, 1799–1809 CrossRef CAS.
  114. D. Bernard, A. Coop and A. D. MacKerell, Jr., J. Med. Chem., 2005, 48, 7773–7780 CrossRef CAS.
  115. D. D. Beusen, E. F. Berkley Shands, S. F. Karasek, G. R. Marshall and R. A. Dammkoehler, THEOCHEM, 1996, 370, 157–171 CrossRef CAS.
  116. M. Lipton and W. C. Still, J. Comput. Chem., 1988, 9, 343–355 CrossRef CAS.
  117. Garland R. Marshall, C. D. Barry, Heinz E. Bosshard, Richard A. Dammkoehlerand Deborah A. Dunn, in Computer-Assisted Drug Design, American Chemical Society, 1979, pp. 205–226 Search PubMed.
  118. Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci. U. S. A., 1987, 84, 6611–6615 CAS.
  119. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller, J. Chem. Phys., 1953, 21, 1087–1087 CrossRef CAS.
  120. N. Metropolis and S. Ulam, J. Am. Stat. Assoc., 1949, 44, 335–341.
  121. A. Smellie, S. L. Teig and P. Towbin, J. Comput. Chem., 1995, 16, 171–187 CrossRef CAS.
  122. D. Cvijovicacute and J. Klinowski, Science, 1995, 267, 664–666 CAS.
  123. N. Nair and J. M. Goodman, J. Chem. Inf. Comp. Sci, 1998, 38, 317–320 CrossRef CAS.
  124. A. L. Parrill, Drug Discovery Today, 1996, 1, 514–521 CrossRef CAS.
  125. C. Brooks and D. A. Case, Chem. Rev., 1993, 93, 2487–2502 CrossRef.
  126. C. L. Brooks, M. Karplus and B. M. Pettitt, Proteins: A Theoretical Perspective of Dynamics, Structure and Thermodynamics, John Wiley and Sons, 1990 Search PubMed.
  127. M. Karplus and G. A. Petsko, Nature, 1990, 347, 631–639 CrossRef CAS.
  128. J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, 1988 Search PubMed.
  129. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem., Int. Ed. Engl., 1990, 29, 992–1023 CrossRef.
  130. D. Frenkel and B. Smit, Understanding molecular simulation: from algorithms to applications, Academic Press, 2002 Search PubMed.
  131. D. C. Rapaport, The art of molecular dynamics simulation, Cambridge University Press, 2004 Search PubMed.
  132. L. Verlet, Phys. Rev., 1967, 159, 98–98 CrossRef CAS.
  133. D. E. Potter, Computational Physics, Books on Demand, 1988 Search PubMed.
  134. R. W. Hockney, Methods Comput Phys, 1970, 9, 136–211 Search PubMed.
  135. M. E. Tuckerman and G. J. Martyna, J. Phys. Chem. B, 2000, 104, 159–178 CrossRef CAS.
  136. S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, Science, 1983, 220, 671–680 CrossRef.
  137. C. Chipot and A. Pohorille, Free energy calculations: theory and applications in chemistry and biology, Springer, 2007 Search PubMed.
  138. A. Mitsutake, Y. Sugita and Y. Okamoto, Biopolymers, 2001, 60, 96–123 CrossRef CAS.
  139. Y. Okamoto, J. Mol. Graphics Modell., 2004, 22, 425–439 CrossRef CAS.
  140. Y. Sugita and Y. Okamoto, Chem. Phys. Lett., 1999, 314, 141–151 CrossRef CAS.
  141. A. Laio and M. Parrinello, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 12562–12566 CrossRef CAS.
  142. C. Micheletti, A. Laio and M. Parrinello, Phys. Rev. Lett., 2004, 92 Search PubMed.
  143. D. Hamelberg, J. Mongan and J. A. McCammon, J. Chem. Phys., 2004, 120, 11919–11929 CrossRef CAS.
  144. X. Kong and C. L. Brooks, J. Chem. Phys., 1996, 105, 2414–2414 CrossRef.
  145. J. Hermans, R. H. Yun and A. G. Anderson, J. Comput. Chem., 1992, 13, 429–442 CrossRef CAS.
  146. S. Kumar, J. M. Rosenberg, D. Bouzida, R. H. Swendsen and P. A. Kollman, J. Comput. Chem., 1992, 13, 1011–1021 CAS.
  147. A. Okur, L. Wickstrom, M. Layten, R. Geney, K. Song, V. Hornak and C. Simmerling, J. Chem. Theory Comput., 2006, 2, 420–433 CrossRef CAS.
  148. D. Sindhikara, Y. Meng and A. E. Roitberg, J. Chem. Phys., 2008, 128, 024103–024103 CrossRef.
  149. H. Fukunishi, O. Watanabe and S. Takada, J. Chem. Phys., 2002, 116, 9058–9067 CrossRef CAS.
  150. Y. Sugita, A. Kitao and Y. Okamoto, J. Chem. Phys., 2000, 113, 6051 , 6042-6051, 6042.
  151. Y. Sugita and Y. Okamoto, cond-mat/0009119, 2000.
  152. J. L. Knight and C. L. Brooks, J. Comput. Chem., 2009, 30, 1692–1700 CrossRef CAS.
  153. A. Laio and F. L. Gervasio, Rep. Prog. Phys., 2008, 71, 126601–126601 CrossRef.
  154. L. Zheng, M. Chen and W. Yang, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 20227–20232 CrossRef CAS.
  155. L. Zheng, M. Chen and W. Yang, J. Chem. Phys., 2009, 130, 234105–234105 CrossRef.
  156. K. Y. Sanbonmatsu and A. E. García, Proteins: Struct., Funct., Genet., 2002, 46, 225–234 CrossRef CAS.
  157. L. Su and R. I. Cukier, J. Phys. Chem. B, 2007, 111, 12310–12321 CrossRef CAS.
  158. P. M. González, C. Acharya, A. D. MacKerell, Jr. and J. E. Polli, Pharm. Res., 2009, 26, 1665–1678 CAS.
  159. S. Lee, M. Chen, W. Yang and N. G. J. Richards, J. Am. Chem. Soc., 132, pp. 7252–7253 Search PubMed.
  160. D. Provasi, A. Bortolato and M. Filizola, Biochemistry, 2009, 48, 10020–10029 CrossRef CAS.
  161. M. Feig and C. L. Brooks, Curr. Opin. Struct. Biol., 2004, 14, 217–224 CrossRef CAS.
  162. A. Onufriev, D. Bashford and D. A. Case, Proteins: Struct., Funct., Bioinf., 2004, 55, 383–394 Search PubMed.
  163. P. Koehl, Curr. Opin. Struct. Biol., 2006, 16, 142–151 CrossRef CAS.
  164. B. Honig and A. Nicholls, Science, 1995, 268, 1144–1149 CrossRef CAS.
  165. J. D. Jackson, Classical electrodynamics, Wiley, 1999 Search PubMed.
  166. C. Still, A. Tempczyk, R. Hawley and T. Hendrickson, J. Am. Chem. Soc., 1990, 112, 6127–6129 CrossRef CAS.
  167. M. S. Lee, M. Feig, F. R. Salsbury and C. L. Brooks, J. Comput. Chem., 2003, 24, 1348–1356 CrossRef CAS.
  168. W. Im, M. S. Lee and C. L. Brooks, J. Comput. Chem., 2003, 24, 1691–1702 CrossRef CAS.
  169. B. R. Brooks, C. L. Brooks, A. D. MacKerell, Jr., L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York and M. Karplus, J. Comput. Chem., 2009, 30, 1545–1614 CrossRef CAS.
  170. A. D. MacKerell, Jr., B. Brooks, C. L. I. Brooks, L. Nilsson, B. Roux, Y. Won and M. Karplus, CHARMM: The Energy Function and Its Parameterization with an Overview of the Program, John Wiley & Sons: Chichester, 1998 Search PubMed.
  171. D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M. Merz, A. Onufriev, C. Simmerling, B. Wang and R. J. Woods, J. Comput. Chem., 2005, 26, 1668–1688 CrossRef CAS.
  172. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell and P. A. Kollman, J. Am. Chem. Soc., 1995, 117, 5179–5197 CrossRef CAS.
  173. A. D. MacKerell, Jr., M. Feig and C. L. Brooks, J. Comput. Chem., 2004, 25, 1400–1415 CrossRef CAS.
  174. S. Plimpton, J. Comput. Phys., 1995, 117, 1–19 CrossRef CAS.
  175. LAMMPS, http://lammps.sandia.gov.
  176. B. Hess, C. Kutzner, D. van der Spoel and E. Lindahl, J. Chem. Theory Comput., 2008, 4, 435–447 CrossRef CAS.
  177. W. L. Jorgensen, D. S. Maxwell and J. Tirado-Rives, J. Am. Chem. Soc., 1996, 118, 11225–11236 CrossRef CAS.
  178. G. A. Kaminski, R. A. Friesner, J. Tirado-Rives and W. L. Jorgensen, J. Phys. Chem. B, 2001, 105, 6474–6487 CrossRef CAS.
  179. K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov and A. D. MacKerell, Jr., J. Comput. Chem., 2009, 31, 671–690.
  180. A. D. MacKerell, Jr., J. Comput. Chem., 2004, 25, 1584–1604 CrossRef.
  181. J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, J. Comput. Chem., 2004, 25, 1157–1174 CrossRef.
  182. F. H. Allen, Acta Crystallogr., Sect. B: Struct. Sci., 2002, 58, 380–388 CrossRef.
  183. J. Wang, W. Wang, P. A. Kollman and D. A. Case, J. Mol. Graphics Modell., 2006, 25, 247–260 CrossRef CAS.
  184. T. A. Halgren, J. Comput. Chem., 1996, 17, 616–641 CAS.
  185. T. A. Halgren, J. Comput. Chem., 1996, 17, 490–519 CrossRef CAS.
  186. S. Lifson, A. T. Hagler and P. Dauber, J. Am. Chem. Soc., 1979, 101, 5111–5121 CrossRef CAS.
  187. N. L. Allinger, J. Am. Chem. Soc., 1977, 99, 8127–8134 CrossRef CAS.
  188. N. L. Allinger, Y. H. Yuh and J. H. Lii, J. Am. Chem. Soc., 1989, 111, 8551–8566 CrossRef CAS.
  189. N. Nevins, J.-H. Lii and N. L. Allinger, J. Comput. Chem., 1996, 17, 695–729.
  190. M. Clark, R. D. Cramer and N. Van Opdenbosch, J. Comput. Chem., 1989, 10, 982–1012 CrossRef CAS.
  191. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople, Gaussian, Inc., Wallingford CT, 2004.
  192. R. Rais, C. Acharya, G. Tririya, A. D. MacKerell, Jr. and J. E. Polli, J. Med. Chem., 2010, 53, 4749–4760 CrossRef CAS.
  193. R. Rais, C. Acharya, A. D. MacKerell, Jr. and J. E. Polli, Mol. Pharmaceutics, 2010, 7, 2240–2254 CAS.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/c1md00044f

This journal is © The Royal Society of Chemistry 2011