Konstantinos
Konstantinidis
a,
Ioannis
Karakasiliotis
a,
Kostas
Anagnostopoulos
b and
Georgios C.
Boulougouris
*c
aLaboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
bLaboratory of Biochemistry, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
cLaboratory of Computational Physical Chemistry, Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece. E-mail: gbouloug@mbg.duth.gr
First published on 12th August 2021
In this paper, a novel approach is proposed based on the accurate computation of the inaccessible volume and the corresponding surface area which is defined by the locus of points where a ligand molecule can be placed so that it “touches” a protein molecule at a preset minimum interatomic distance without resulting in overlaps between the atoms of the protein and the atoms of the ligand. The proposed approach can be considered an extension of the widely used concept of the solvent accessible surface area (SASA). The SASA is defined as the surface where a solvent molecule can be in contact with the initial one without any overlaps. This excluded volume interaction is evaluated by treating atoms as hard core spheres, with the limitation of the solvent molecule being represented as a single sphere. In the proposed concepts of the molecular accessible surface (MASA) and the molecular inaccessible volume (MIV) we have practically removed this limitation and all atoms, both in the initial and the “inserted” molecules, are represented as hard spheres. In this paper we focus our examples on biological systems, especially on studying protein–ligand systems, since we expect that this will be one of the most promising fields of applications where the MASA and MIV extensions of the SASA will be of practical and immediate use. Therefore, the MASA and MIV are evaluated based on the surface generated by the ligand while it is being rolled over on all the atoms of the protein without penetrating them. Identification of the inaccessible volume of each candidate protein–ligand pair is also provided in the context of this study, along with the boundary surface where the ligand can be placed so as to be in “contact” with the protein. The proposed concepts of the MASA and MIV are expected to significantly enhance the ability to investigate specific protein–drug interactions by explicitly taking into account the polyatomic nature of a ligand. Several trials have been conducted using the analytical method of Dodd and Theodorou leading to accurate volume and surface area measurements of an arbitrary set of fused spheres in systems of various scales.
Design, System, ApplicationThe concept of the solvent accessible surface (SAS) has been widely used in order to depict the surface that bounds the excluded volume around a biomolecule. Despite the fact that the SAS treats all potential ligand molecules as a single sphere, it is a powerful tool that is widely used especially in the field of drug design, where identification of possible binding sites of a potential ligand to a protein is essential. The proposed extension of the concept to a molecular accessible surface allows the estimation of the surface that is accessible to any polyatomic ligand molecule by estimating the locus of points where the placement of the ligand will result in an overlap with the biomolecule. As in the case of the SAS, we expect that the proposed estimation of the molecular accessible surface will prove to be a powerful tool not only in the fields of drug discovery and drug design, but also in the field of material design. Potential applications of the proposed method are expected to include development of new sampling algorithms that further facilitate: a) identifying potential docking sites, or b) performing accurate estimations of binding affinities via novel free energy schemes. |
In this work, our aim is to extend the notion of the solvent accessible surface area (SASA) by explicitly taking into account the confirmation of potential ligands and thus provide an alternative tool for methodologies, that up to now had to rely on a spherical approximation of the contact molecule in order to estimate the accessible surface around a protein. The concept of the accessible surface area was firstly described by Lee & Richards1 in 1971 as the solvent accessible surface area (SASA). The SASA traces the geometrical locus derived from the centre of a hypothetical probe-sphere rolling on the van der Waals surface of the molecule without penetrating its atoms. It is also equivalent to the van der Waals surface, with the difference that the atomic radii ri have been substituted with the sum of ri + rp (rp is equal to the atomic radius of the hypothetical probe-sphere, typically 1.4 Å). Various approaches have been developed for calculating accessible surface areas, with the “rolling ball” algorithm by Shrake–Rupley2 being one of the earliest and most popular methods among others. Additional improvements to these methods delimited the solvent-excluded surface (SES) or widely known molecular/Connolly surface3,4 which consists of two segments. The first one is the contact surface (part of the van der Waals surface of the atoms), tangent to the hypothetical “rolled-over” probe sphere. The second one is the reentrant surface, which comprises the inward-facing surface of the probe sphere when it is simultaneously tangent to two or more atoms. Analytical calculation of Connolly surfaces is founded upon numerical algorithms retrieving data from the atomic coordinates, van der Waals radii and probe radius, thus generating finite sets of points constructing a network of convex, saddle-shaped and concave faces defined in terms of vertices, circular arcs, spheres and tori so as to compute the solvent-excluded surface.
Apart from the classical Lee–Richards1 and Shrake–Rupley2 approaches, the solvent accessible surface area (SASA) can also be calculated analytically5 using analytical equations plus their first and second derivatives6–8 or by various other approximations.9–12 Additionally, computational tools are able to predict the SASA in the unfolded state of the examined molecules incorporating methods such as the artificial neural network (ANN),13–15 Markov chain model,16 PredAcc17 and PSAIA (protein structure and interaction analyzer).18 In Table 1, a list summarizing some of the available computational tools and online servers that provide SASA calculations based on Lee and Richards' fundamental definition1 is presented.
Program | Specification | URL |
---|---|---|
PoreBlazer19,20 | Fortran implementation for structural characterization of porous materials including calculation of SASA | https://github.com/SarkisovGroup/PoreBlazer |
Molecularvolume21 | Voxel-based volume calculations for molecular systems via Python interface | https://github.com/ajd98/molecularvolume |
Jmol22 | Free open source viewer of molecular structures providing SASA and molecular volume calculations | http://jmol.sourceforge.net/ |
VMD23 | Molecular visualization program for analyzing biomolecular systems including calculation of SASA | https://www.ks.uiuc.edu/Research/vmd/ |
DSSP24,25 | Database of secondary structure assignments for Protein Data Bank (PDB) entries providing SASA estimation | http://swift.cmbi.ru.nl/gv/dssp/ |
GETAREA5 | Analytical calculation of SASA, atomic solvation energies and their gradients based on Monte Carlo simulation | http://curie.utmb.edu/getarea.html |
TRIFORCE10 | Semi-analytical tessellation approach of SASA and derivatives | http://dillgroup.io/ |
PDBePISA26 | Interactive tool for the exploration of macromolecular interfaces including SASA calculation | https://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver |
NACCESS27 | Fortran implementation of the Lee & Richards approximation for SASA calculation of atoms and residues constituting proteins and nucleic acids | http://www.bioinf.manchester.ac.uk/naccess/ |
FreeSASA28 | Command line tool incorporating C-library and Python module for macromolecular SASA calculation | https://freesasa.github.io/ |
CCP4-AreaMol29–31 | Supported program of CCP4 for SASA calculation of individual residues and proteins | https://www.ccp4.ac.uk/html/areaimol.html |
ProtSA32,33 | Web application for sequence-specific SASA calculation in the unfolded state | http://webapps.bifi.es/protSA/ |
Estimations of the SASA, using tools such as the ones described in Table 1, serve as the basis for several computational tools and methods that have been designed to assist in a variety of more complex problems, with the most prominent being that of estimating free energy differences and protein structure-folding prediction,34 or even aid in simulations and design of novel molecular structures by predicting physical and chemical properties of polymers prior to synthesis.35 To a large extent, applications of the SASA contribute to characterizing relationships between the structural and biological properties of chemical compounds via quantitative structure–activity relationship (QSAR) models as well as quantifying molecular lipophilicity (logP), a highly significant pharmacokinetic factor in medicinal chemistry, essential for drug discovery.36–38 Another important application of the SASA has been visualization, with molecular visualization tools like Jmol,22 VMD23 and PyMOL39 capable of providing visual representations of “cavities” and “pockets”, as potential candidates for binding sites in proteins. Estimation of the SASA can also be used as part of a “docking” strategy, with docking computation being considered a significant approach for studying protein–protein or protein–ligand interactions, guided by several theories behind binding phenomena, such as the “lock-key” model,40 the “induced-fit” theory,41 the “conformational selection” mechanism42 and similar established approaches. Development of structure-based virtual screening and construction of novel therapeutic agents via computer-aided drug design (CADD) have all been achieved by molecular docking software applications.43 Sampling algorithms implemented in docking software programs like DOCK,44 FLOG,45 FlexX,46 Hammerhead,47 SLIDE,48 DIVALI49 or DARWIN50 are intended to predict the structure via conformational ensemble. Scoring algorithms can also predict the binding affinity of the tested biomolecules during their interactions by scoring functions under certain docking methodologies such as GOLD,51 AutoDock,52 LUDI,53 PLP,54–56 DrugScore57 or CScore58 programs. These algorithms rely on a variety of theoretical, chemical and geometrical approaches to visualize molecular structures and processes. Interactions are handled based on the properties of the amino acid residues found on the surface of molecules. Examination of amino acid charge, polarity, shape, potential for intercalation with other molecules, high evolutionary conservation of surface amino acids and estimated energy of molecular interactions constitute the primary elements for the functional interpretation and calculation of molecular surfaces. Molecular surfaces may have dual use; their graphical representations can provide a prediction of the possible functions and interactions which may take place by visualizing the shape, electron distribution or evolutionary conservation of molecular surface sequences. Moreover, quantification of surfaces is mainly used as a descriptor in an attempt to quantify the binding Gibbs free energy.
As it turns out, the problem of estimating the locus of points where a molecule can be inserted without overlapping is very similar to the estimation of the SASA with the main difference being that in the SASA one has to consider where to insert a single sphere by estimating the surface and the volume of a set of “inflated” fused spheres (one for each atom in the system), whereas when a polyatomic molecule is considered, one has to estimate the surface and not the volume of a set of auxiliary fused spheres like we describe in the next paragraph. Once the set of auxiliary spheres has been created the problem of estimating the surfaces and volumes of fused spheres can be handled by any program that has been developed for the SASA. On the other hand, what is probably one of the most accurate ways of estimating the surface and the volumes of any set of fused spheres is the method of Dodd and Theodorou8 that we have chosen to implement in our calculations.
In this work we propose the extension of the notion of the SASA around a protein molecule to the proposed notions of the molecular accessible surface area (MASA) and the corresponding molecular inaccessible volume (MIV) by explicitly taking into account the polyatomic nature of a ligand molecule that is to be placed in contact with the protein without overlapping. To achieve this, one has to create an “auxiliary” sphere for each intermolecular pair of atoms and define the range of the overlap by setting the minimum intermolecular distance that this pair of atoms can reach without overlapping. This minimum intermolecular distance is then set at the radius of the auxiliary sphere creating a set of fused spheres. The difference with the SASA is that the number of fused spheres is no longer equal to the number of atoms in the protein, but is equal to the product of the number of atoms in the protein and the number of atoms in the ligand, and that the radii of the spheres now depend on a pair of atoms, with one belonging to the protein molecule and one belonging to the ligand molecule. As is described in the following paragraph, the process of creating the necessary set of auxiliary spheres whose surface and volume correspond to the proposed notions of the molecular accessible surface area (MASA) and the corresponding molecular inaccessible volume (MIV) is relatively simple and can be summarized in the following steps as have been used in all calculations reported in this work:
Step 1. Define the minimum intermolecular distance dij between a possible pair of atoms:
For a given pair of protein–ligand configurations (i.e. the set of positions r and types of all atoms) both the MASA and MIV are a function of the minimum interatomic distance dij between the possible pairs, consisting of protein atom i and ligand atom j. In our paper we have chosen to express this minimum interatomic distance dij as a function of the type of atom i, j. By assigning a value for the hard core radius Rα of each atom α based on the types of both protein and ligand molecules, we express dij as the sum of the hard core radii scaled by a common factor fR (i.e. dij = fR × (Ri + Rj)). In all calculations reported in this paper, the hard core radius Rα for each type has been based on the van der Waals radius used in Jmol in order to have a common reference. Future application may require to either extend the set of types or even to define the minimum distance between pairs of atoms dij differently. This can be achieved by changing the values of the radius for the auxiliary spheres and should be reported along with the estimation of the MASA and MIV.
Step 2. Generate the set of auxiliary spheres whose surface and volume are the MASA and MIV:
Given a relative orientation of the two molecules, the positions ri and the type of atom in the system (protein atoms) and the position rj and the type of inserted atom (atoms in the ligand) create an auxiliary sphere at ri − (rj − rref) with radius dij for each pair of atoms i, j, where rref can be chosen as an arbitrary reference point. The operation ri − (rj − rref) practically translates each protein atom i, by a vector −(rj − rref) for each ligand atom j. Although the choice of the reference point can be arbitrarily selected in the local coordinate framework of the ligand, it is preferable to be either the center of the mass of the ligand molecule or the position of one of the atoms in the molecule. Another way of selecting the reference point is by setting the origins for a local coordinate framework of the ligand configuration. In the protein coordinate framework, the proposed molecular accessible surface is defined as the locus of points where, placing this referee point and positioning all ligand atoms relative to the point, the ligand will touch without overlapping the protein. In this work we have chosen as a reference position rref, the position of the first atom in the ligand (highlighted in Fig. 1) and as a consequence, the surface of auxiliary spheres corresponds to the locus of points for which the placement of the first atom of the ligand will guarantee that all possible distances between any pair of protein (i) and ligand (j) atoms will be bigger or equal to dij, with at least one distance being exactly equal. Finally, different relative orientations can be examined via rigid rotation of either the protein or the ligand model before the creation of the auxiliary spheres.
![]() | ||
Fig. 1 (a) Graphical representation of the molecular accessible surface volume methodology in Jmol. An illustration of the test system consisting of two molecules, an ethane60 representing the protein molecule of the system and a methane61 displaying the inserted ligand molecule distinguished by its cyan haloed atoms. (b) The generated pink auxiliary spheres surrounded by their created 3D isosurface. (c) An illustration of the excluded volume around the ethane molecule of the test system, where the generated gray surface points coincide with the center of the first atom (yellow highlighted sphere) of the inserted methane ligand in opaque color and cyan halos. Two additional ligand molecules colored semi-transparently while maintaining all the same orientations are positioned according to their first atom also highlighted in yellow at the minimum interatomic distance. |
Step 3. Evaluate the surface and the volume of the set of fused auxiliary spheres:
The molecular inaccessible volume i.e. the volume of the locus of points where the ligand cannot be placed due to the presence of the protein can be evaluated from the set of auxiliary spheres defined in steps 1 and 2 using any algorithm capable of estimating the volume of fused spheres. Similarly, the molecular accessible surface can then be estimated using any algorithm capable of estimating the surface of fused spheres. Although we strongly recommend the use of the Dodd and Theodorou approach in estimating both the surfaces and volumes, we also provide as part of the ESI† scripts that can utilize the available “approximate” tools in Jmol and VMD for the cases that accuracy is not essential. Following the proposed steps, the problem of estimating the molecular inaccessible volume (MIV) and accessible surface area (MASA) is now being expressed as a problem of evaluating the volume and surface of a set of fused spheres similar to that of the SASA with the main difference being in the number of fused spheres that one has to consider, which is now equal to the number of atoms in the system, times the number of atoms of the inserted molecule.
In Fig. 1, a graphical representation of the basic concept is depicted, referring to a simple molecular system of ethane60 and methane,61 with the former acting as the protein molecule of the system and the latter as the inserted ligand molecule. The developed method is founded upon creating multiple images of the inserted atoms by maintaining the internal degrees of freedom and relative orientation. The algorithm generates 40 (40 = 8 “protein” atoms × 5 “ligand” atoms) auxiliary spheres (32 of them depicted in pink plus the remaining 8 which are placed at the same position as the protein molecule in the system). The gray 3D surface created by the 40 auxiliary spheres delineates the geometrical locus where the center of the first atom, as ordered in the inserted molecule (here the hydrogen atom as a yellow highlighted sphere), can be placed so that the two molecules of the system can be in “touch”. Additionally, placing the center of the first atom of the inserted methane molecule (translucent methane molecules with cyan halos) in different positions on the generated gray 3D surface surrounding the auxiliary atoms brings the hypothetical ligand and protein molecules in contact without overlap. According to step 1, the annotated distances are equal to the sum of the corresponding atomic radii multiplied by the algorithm's scaling factor fR (here adjusted at 1.0). This scaling factor is used to describe the excluded volume interactions of the closest atoms between the inserted and native molecule of the system. In this example, hydrogen–carbon and carbon–carbon atoms are found in the minimum interatomic distance and those distances are equal to the sum of Rhydrogen (= 1.2 Å) + Rcarbon (= 1.95 Å) and Rcarbon (= 1.95 Å) + Rcarbon (= 1.95 Å) atomic radii times the scaling factor fR (= 1.0), respectively. Notably, the connectivity between the inserted atoms does not add significant complication at this computational stage. This allows the insertion of two or more molecules simultaneously, as long as the relative position between atoms is maintained during the geometrical calculation and the relative intermolecular degrees of freedom are sampled in an outer loop. Furthermore regarding the SASA, the proposed method is expected to be used in ensembles, where the system configurations are created based on desirable statistical ensembles. Similarly, the internal degrees of freedom of the ligand could be sampled by simulating the inserted molecule under ideal gas conditions. The geometrical calculation would then be performed over a double nested loop over the configurations of the ligand and the system ensembles.
In most calculations reported in this work, the relative protein–ligand orientation has been kept to its original value based on the pdb configuration file downloaded from the web. In the cases that we examine the effect of the relative orientation in our calculations, we have performed random rigid body relations using quaternions. More precisely, the generation of random molecular orientations has been based on the Marsaglia G. method,62 implemented as follows:
- First, two numbers x1 and y1 are selected from a random uniform distribution between (−1, 1), until s1 = x12 + y12 < 1 is satisfied.
- Similarly, two more numbers x2 and y2 are selected respectively from a random uniform distribution between (−1, 1), until s2 = x22 + y22 < 1 is satisfied.
- The generated values of s1 and s2 are used for the production of a random unit quaternion q as
With the previous steps, a set of unitary quaternions is generated. By applying rigid rotations to the ligand molecules using these unitary quaternions, a set of protein–ligand relative orientations is created.
From a technical perspective, the greatest challenge and major concern in developing a computational tool capable of estimating the molecular inaccessible volume and molecular accessible area in protein–ligand systems has been the memory usage due to the large size of the resulting system. In order to be able to use Dodd and Theodorou's analytical approach8 as a black-box library, the distribution of the computational load using message passage interface (MPI)63 over a number of processors was compulsory. This was to ascertain the efficient handling of the memory load. As a result, the user can perform analytical calculations of the molecular inaccessible volume and molecular accessible area in realistic protein–ligand systems using reasonable computational resources.
Most of the calculations reported in this work have been based on PDB files that contain both ligand and protein molecules and are available via Protein Data Bank. Given such a PDB file, we proceed by first parsing the PDB file in order to acquire the protein and ligand molecule configurations from PDB along with the type of each atom. Based on whether we want to examine the MASA and MIV at a relative orientation different from the original PDB file (as we do in Fig. 2), we choose whether we are going to perform a random rigid rotation of the ligand molecule. Having assigned an atomic radius to each type of atom of both the ligand and protein molecules, and the atomic position at the desired relative orientation, we perform calculations for the MASA and MIV. In all calculations used in this work for the validation of our code, we used as default for the hard core radius in each atom Ri the values of van der Waals radii defined for each atom type in Jmol (version 14.6.1), which can be retrieved by executing the command “show vdw” and defined by typing “set defaultVDW Jmol” on a Jmol console. In order to investigate the effect of the hard core radius length in sum calculations, we performed uniform scaling in all hard core radii using a common scaling factor fR (see Fig. 6). Depending on the practical application, the potential user of our computational tool may choose to alter the assignment of each atomic radius, taking into consideration the difference between ions and uncharged atoms for instance. Nevertheless, in this study, given that the main concern is to provide validation of our approach, the simplest reproducible cases were selected while the option of changing the values of atomic radii was deferred for future version purposes. Due to this reason, no further processes were performed on protein molecules extracted from the downloaded PDB files, like restoring missing atoms or imposing the suitable protonated state under a given pH.
![]() | ||
Fig. 2 Analytical calculation of the ligand–protein molecular accessible surface area (a) and molecular inaccessible volume (b) of the HIV-1 protease and AB-2 inhibitor complex retrieved from the 1zp8 PDB entry,70 at different orientations relative to the ligand–protein orientation of the original PDB file configuration (marked as original orientation). In both charts, the original input molecular configuration is shown at x = 0, followed by 50 random ligand–protein orientations sampled by Marsaglia's method.62 The estimations are plotted as a function of the quaternion distance (in radians) between each orientation and the relative ligand–protein orientation of the original PDB molecular configuration. The labeled data points A, B, C and D help the reader compare the plotted information against the corresponding modelled structures shown in Fig. 4. |
Despite the development of this computational tool taking advantage of the analytical calculation of Dodd and Theodorou8 to a large extent, the proposed calculations of the molecular accessible surface and molecular inaccessible volume can also be carried out by making use of any other computational tool capable of calculating the SASA. To achieve this, one has to generate the set of auxiliary spheres in the same way as described in the previous paragraph (Fig. 1) and then perform the calculations with the tool of choice. In the context of this research, visual representation was accomplished by using Jmol and its ability to draw 3D isosurfaces. It should be noted that visual rendering of the aforementioned isosurfaces is an arduous computational task, with memory requirements increasing significantly as the size of the molecular systems grows. Nevertheless, most of the available visualization tools output significantly less accurate results when compared to the analytical estimation of Dodd and Theodorou.8 However, due to graphical representation necessities in many studies, the best strategy is to combine both approaches. Subsequently throughout this work, we report our estimations using the analytical method of Dodd and Theodorou8 whereas Jmol is used for visualization purposes. Finally, in order to provide better insight into molecular accessible surfaces, ligand placement on a point of the protein surface is also presented, highlighting the contact between the test molecules given the selected relative ligand–protein orientation.
In Fig. 2, the estimations of the molecular accessible surface area (Fig. 2a) and molecular inaccessible volume (Fig. 2b) are presented for various ligand–protein orientations of the 1zp8 test system (HIV-1 protease with its AB-2 inhibitor73). Relative orientations were randomly produced via quaternion formulation of Marsaglia62 on the ligand–protein pair found in the 1zp8 PDB file downloaded from the Protein Data Bank. The estimated values of the molecular accessible surface area (Fig. 2a) and molecular inaccessible volume (Fig. 2b) are plotted versus the quaternion distance. The baseline against which the quaternion distance was calculated is the ligand–protein orientation of the original input 1zp8 PDB file configuration. Since plotting against the quaternion distance constitutes projection onto one-dimensional space, the reader should bear in mind that only distance relevant to the original orientation retains the properties of distance, meaning that any of the expressed orientations depicted as triangles or star points in close proximity in Fig. 2 may actually be far apart. Nonetheless, the above representation style was selected since the deviation of 50 sampled ligand orientations relative to the original one found in the 1zp8 PDB file is better illustrated. Similarly for the 1zp8 test system, an additional 1000 sampled ligand orientations relative to the original one found in the 1zp8 PDB file were generated and the analytical area and volume calculated values of each ligand–protein sampled orientation were plotted as in Fig. 3.
![]() | ||
Fig. 3 Histograms showing the distribution of the estimated values of the molecular accessible surface area (a) and molecular inaccessible volume (b) for a total of 1000 sampled orientations of the 1zp8 test system.70 |
The effect of sampling random relative orientations can be seen in more detail in Fig. 3, where the values of the inaccessible volume and accessible surface area appear to be normally distributed around an average value. At this point the reader should note that the actual shape of such distributions is probably driven towards a “normal” like distribution via the center limit theorem, as is the case for many physical quantities. On the other hand, both the volume and area are bounded continuities and therefore the distributions can never become truly normal. It is therefore advised that the type of the distribution is not taken for granted but considered verified for the particular case of interest. In Fig. 4a, samples of 3D representations of the protein–ligand molecular accessible surface are shown, mainly for configurations retrieved from the minima or maxima of Fig. 2a and b using Jmol. According to the displayed molecular states, the inhibitor can “fit” in the original binding site of the protein with significant changes in the relative orientation. Notably, several of the sampled ligand orientations could potentially bind in the opposite direction, reverse to the ligand configuration of the original 1zp8 PDB file (Fig. 4b). We should point out that calculations in Fig. 2 pertain solely to excluded volume interactions. Therefore such observations may serve exclusively for initial screening. Moreover, in the calculations of Fig. 2, we do not distinguish between placing the ligand into pocket cavities or onto the outer surface of the protein. Nevertheless, once the total molecular accessible surface is evaluated, it is also possible to partition the area based on concavity, charge, polarity, or hydrophobicity of the protein contact atom utilizing the tools which have been developed for the SASA and are available in visualization software like Jmol. It should be noted that in the molecular accessible surface, each point corresponds to a specific atom–atom interaction between the ligand and the protein molecule, with more than 3 body contacts, mapped onto lines and points which form from the intersections of spheres at the surface. It should also be noted that although identification of the hydrophobic part of the accessible surface is straightforward in the proposed methods, by simply identifying auxiliary spheres based on the characterization of the corresponding protein–ligand atom pair, studying “hydrophobicity” as a phenomenon requires much more than simply measuring the amount of the hydrophobic molecular accessible surface since there are many aspects behind the term “hydrophobicity”, with some of them being non-local76 and a strong function of the unique properties of water as a solvent. On the other hand, measuring the amount of the hydrophobic molecular accessible surface appears as a promising potential candidate for developing descriptors in QSAR studies similar to the ones performed using the SASA.77
![]() | ||
Fig. 4 (a) 3D illustrations of the 1zp8 PDB70 protein–ligand molecular accessible surfaces using Jmol, extracted from a selected set of sampled relative orientations presented in Fig. 2 (labeled as A, B, C and D). The protein molecule is displayed according to its secondary structure (yellow b-sheets and pink a-helix) while the ligand molecule holds the typical ball & stick representation style. Ligand and protein molecules are presented at the corresponding relative orientation by placing the sampled ligand configuration onto the molecular accessible surface close to the original binding cavity. Unlike the SASA, our molecular accessible surface is a function of both the actual ligand and the ligand–protein relative orientation. (b) A more detailed view of the sampled ligand configuration C versus the original ligand configuration of the 1zp8 PDB file. Interestingly in this sampled orientation, the sampled ligand configuration C (bright green color) can “fit” in the binding site in the opposite direction contrary to the original ligand configuration of the 1zp8 PDB file depicted transparently in red. |
In an effort to verify and validate the accuracy of the proposed approach, the estimate of the molecular accessible surface area and a numerical finite difference estimate of the inaccessible volume are shown in Fig. 5. The numerical derivative has been estimated by performing volume calculations over slight increments of the radius parameter δR incorporated in the algorithm. Confirming the consistency between our estimations of the molecular accessible surface and inaccessible molecular volume, the analytical calculation of the molecular accessible surface can be estimated using finite differences provided that the alteration in the radius parameter is neither too small nor too big as it is the case with most numerical estimations based on finite differences.
![]() | ||
Fig. 5 Comparison of the analytical estimation of the molecular accessible area (black continuous line) with estimates based on numerical forward finite differences of the inaccessible molecular volume (red crosses). The estimations based on finite differences are presented as a function of the discrete increase in the radius of all auxiliary spheres by the same parameter δR (i.e. area ∼ (volumedij+δR − volumedij)/(δR)). The estimations have been performed upon the original input configuration of the ligand molecule inside the 1zp8 PDB70 file acting here as the protein and the water molecule62 as the “ligand”. |
Having established the consistency between the molecular inaccessible volume and accessible molecular surface, in Fig. 6 we demonstrate the validity of the molecular inaccessible volume calculation by comparing the proposed analytical calculation with the estimation based on random “Widom”-like test insertions57 under the original input relative orientation regarding a simple test system, where methane61 and caffeine68 act as ligand and protein molecules, respectively. To perform the stochastic estimation, we initially enclosed the molecule of caffeine in a box and then measured the ratio of attempts which failed to place the methane molecule without overlap in the box, given the original relative orientation. An estimate of the inaccessible volume was produced after multiplying the volume of the box by the ensemble average of the ratio of failed “test” insertions. In Fig. 6, the stochastic estimation is reported as a function of the number of insertions, alongside the analytical volume estimation at the same original relative ligand–protein orientation, where the results from the stochastic method coincide with our analytical calculation output.
![]() | ||
Fig. 6 Comparison of analytical inaccessible volume calculation (blue circular points) versus the stochastic evaluation based on test “Widom”-like insertions in a simple molecular system, consisting of methane61 and caffeine68 as ligand and protein molecules, respectively (blue triangular points). The stochastic estimation results were acquired after 5 repetitive runs on the aforementioned test system at the original relative orientation with different seed numbers for each given number of insertions. All calculations coincided with the analytical estimation of the inaccessible volume, within the 95% confidence interval (depicted as error bars in the above graph). |
In Fig. 7, we observe the effect of scaling all interatomic contact distances by a common factor fR, regarding 3 molecular test systems of different sizes based on the 1zp8,702bpw,714wtg72 files downloaded from PDB. More specifically, the smallest 1zp8 system consists of 812 atoms in total, 765 of which form the HIV-1 protease while the remaining 47 atoms form its ligand inhibitor AB-2.73 The mid-sized scale 2bpw system contains 1559 atoms, 1514 constituting the HIV-1 protease and 45 its potent ligand inhibitor.74 Lastly, the largest 4wtg system consists of 4357 atoms, 4327 of which belong to the modified version of HCV RdRp and the remaining 30 atoms are found within its ligand, the clinically active metabolite formed by sofosbuvir, Mn2+ and a primer-template RNA.75 Examination of the accessible surface dependency on the algorithm's parameter fR promotes an interesting perspective. There is a certain range where increasing the scaling factor fR leads to reduction of the accessible area, strongly indicating the presence of concave parts on the protein surface which shrink as the radius expands (Fig. 8). However, one may conceive of an approach that uses such observations to identify the presence of cavities but to our knowledge, there is no such method. This is probably due to the usual alternative methods being quite sufficient in identifying cavities or due to the fact that similar calculations would require significant accuracy in the estimation of accessible surfaces. This would not be a practical choice since most of the available methods are of stochastic nature. On the other hand, implementing the analytical calculation of Dodd and Theodorou8 leads to accurate estimations which can be used to estimate partial differences from finite differences. Finally, for users that would like to use our approach in combination with existing (or newer methods) ones for partitioning the surface area based on concavity, we should note that the correlation between accessible areas of a concave cavity formed out of spheres can be affected by the actual definition of the criteria used to separate concave from convex regions.
![]() | ||
Fig. 7 Estimations of the accessible surface area (a) and inaccessible volume (b), both expressed as functions of the algorithm's parameter scaling factor fR. The radii of the auxiliary spheres which determine the range of hard core inter-atomic interactions have been estimated by scaling a common factor, the sum of the van der Waals radii for each atom pair that is used in the formation of the auxiliary sphere. Tests were performed upon 3 molecular systems of varying size (1zp8,70 2bpw,71 and 4wtg72). |
![]() | ||
Fig. 8 Investigation of concave surfaces of the 1zp8 test system70 in relevance to fR changes and visualization by Jmol.22 The translucent orange isosurfaces surrounding (a) and (b) models are created using the “isosurface pocket cavity” Jmol command and are generated according to the set of auxiliary spheres at the specified fR value respectively. In (a), the fR is adjusted to 0.15 and 37 isosurfaces are created with a total accessible surface area of 5008.8 Å2. Increasing fR to 0.20 (b) seems to decrease the amount of isosurfaces created to 34 as well as the overall accessible surface area which drops to 3807.8 Å2. Increments of the scaling factor fR lead to reduction of the overall accessible area as the concave parts on the protein surface shrink and eventually vanish as the radius expands. |
Finally, as mentioned previously, a considerable amount of effort has been put into decomposing analytical calculations into independent sub-calculations which can be performed in parallel, since dealing with all of the auxiliary spheres using a single processor may not be feasible for most of the protein–ligand complexes of interest. Aiming to distribute the memory load into multiple processors even at the expense of performing more arithmetic calculations, in Fig. 9 we present the algorithm's execution time as a function of the number of processors used in our parallel decomposition (Fig. 9a) as well as a function of fR alterations utilizing all processors of our computational nodes through MPI63 (Fig. 9b). The system examined in Fig. 9a consists of the protein–ligand complex retrieved from the 1zp8 PDB entry where all radii have been scaled at half of their van der Waals value by setting the fR parameter to 0.5, while in Fig. 9b, the largest in size 4wtg system (4357 atoms) is tested with increasing fR values exploiting plenty of computational resources. As the size of the molecular system increases relevant to the available resources per processor, the necessity and parallel efficiency of actual calculations may differ, but our approach is expected to be applicable provided that sufficient computational resources are allocated. Subsequently, parallelization through MPI makes our approach suitable both for supercomputers and homemade clusters alike.
![]() | ||
Fig. 9 Inaccessible volume and accessible surface area calculation completion time as a function of the number of processors tested on a small size-scale molecular system (1zp8)70 (a) as well as a function of fR alterations upon the largest in size 4wtg72 molecular system utilizing 38 processors of our computational nodes through MPI (b). |
For all of the above tests, we implemented our developed algorithm and proposed a methodology which mainly relies on the generation of a set of auxiliary spheres according to the examined molecular system for the analytical calculation of the molecular accessible surface area (MASA) and molecular inaccessible volume (MIV). The user can also perform MASA and MIV calculations by simply inputting the generated set of auxiliary spheres to the computational tool of preference. However, this comes with a considerable trade off in the accuracy of the estimated area and volume values if not calculated by our developed algorithm. In order to assess the validity and robustness of our proposed method, analytical volume and area calculations were carried out on the simplest possible molecular test systems. For this reason, two test systems were formed consisting of either monatomic hydrogen65 (test 1) or diatomic hydrogen66 (test 2) as hypothetical protein molecules, while the ligand molecule was always a monatomic hydrogen65 in both cases. The estimated volume and area after the execution of the developed algorithm were compared with the respective calculation results of other available computational tools applied on the same test systems. Due to this fact, the van der Waals (vdW) radii of each hydrogen atom in ligand and protein molecules were adjusted equally to 1.2 Å among the examined programs for a fair comparison. Several tests were performed utilizing the incorporated volume and area features where applicable of Jmol22 and VMD23 molecular visualization software. Apart from that, analytical volume and area calculations of our algorithm were verified by the online partial sphere volume and area calculator78 as well as checked against Poreblazer19,20 and Molecularvolume21 software. The yielded results are summarized in Table 2.
Area (Å2) | Test 1 | Test 2 | Volume (Å3) | Test 1 | Test 2 |
---|---|---|---|---|---|
MASIV | 72.382300 | 83.541238 | MASIV | 57.905843 | 71.190481 |
Online calculator78 | 72.382295 | 83.541232 | Online calculator78 | 57.905836 | 71.190473 |
Jmol22 | 72.187240 | 83.328275 | Jmol22 | 57.589081 | 70.826505 |
VMD23 | 72.382301 | 82.805351 | Molecularvolume21 | 56.000000 | 68.500000 |
Poreblazer19,20 | 72.320000 | 83.170000 |
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1me00053e |
This journal is © The Royal Society of Chemistry 2021 |