Yibo
Li
a,
Jianfeng
Pei
*b and
Luhua
Lai
*abc
aCenter for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China. E-mail: ybli@pku.edu.cn
bCenter for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China. E-mail: jfpei@pku.edu.cn
cBNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China. E-mail: lhlai@pku.edu.cn
First published on 9th September 2021
Deep generative models are attracting much attention in the field of de novo molecule design. Compared to traditional methods, deep generative models can be trained in a fully data-driven way with little requirement for expert knowledge. Although many models have been developed to generate 1D and 2D molecular structures, 3D molecule generation is less explored, and the direct design of drug-like molecules inside target binding sites remains challenging. In this work, we introduce DeepLigBuilder, a novel deep learning-based method for de novo drug design that generates 3D molecular structures in the binding sites of target proteins. We first developed Ligand Neural Network (L-Net), a novel graph generative model for the end-to-end design of chemically and conformationally valid 3D molecules with high drug-likeness. Then, we combined L-Net with Monte Carlo tree search to perform structure-based de novo drug design tasks. In the case study of inhibitor design for the main protease of SARS-CoV-2, DeepLigBuilder suggested a list of drug-like compounds with novel chemical structures, high predicted affinity, and similar binding features to those of known inhibitors. The current version of L-Net was trained on drug-like compounds from ChEMBL, which could be easily extended to other molecular datasets with desired properties based on users' demands and applied in functional molecule generation. Merging deep generative models with atomic-level interaction evaluation, DeepLigBuilder provides a state-of-the-art model for structure-based de novo drug design and lead optimization.
More recently, various deep generative models have been introduced to the field of de novo design. Compared to traditional methods, which usually require expert crafted rules to guarantee that the output molecules are practical, deep generative models are mostly “rule-free” and can be trained in a fully data-driven manner with minimal requirement for expert knowledge. A wide range of deep learning architectures, including SMILES-based17 and graph-based18 language models, VAE,19 and GAN,20 have been used. Various methods have been developed to address different needs during the drug discovery process, such as property-based,19,21 target-based,17,21,22 pharmacophore-based,23 and scaffold-based molecular design approaches.18,24,25
However, unlike the majority of traditional de novo design methods, which construct 3D ligands directly inside the 3D structure of binding pockets, most deep learning-based methods only generate 1D (SMILES) or 2D (graph) representations of molecules without using the structural and interaction information of targets. As a result, most of those methods rely on ligand-based objectives,21,22,26 which may be subject to biases related to the ligands in the training set. This also makes the models difficult to be applied to cases with limited bioactivity data. Structure-based information provides direct guidance to optimize the interactions between the ligand and the target and reduces the dependency on existing bioactivity information. It is therefore highly desirable to include 3D conditions in deep molecule generative models.
Recently, Nigam et al. developed algorithms to explore the topological space of molecules with a docking-based scoring function for structure-based de novo drug design. They developed STONED27 which performs molecule optimization by manipulating SELFIES, a sequential representation of molecular structures similar to SMILES but is guaranteed to be 100% valid. An advantage of STONED is that it does not require deep learning or expert knowledge to explore the chemical space. This method can be used in structure-based drug design by combining it with the docking score. They further developed JANUS28 which combines the genetic algorithm with a docking program to optimize the topological structures of molecules. However, both methods still operate on topological structures of molecules and require an external docking program to optimize the 3D poses and obtain docking scores.
In this work, we developed DeepLigBuilder, a novel program for de novo molecular design that enjoys the benefits of both traditional and deep learning-based methods. Specifically, DeepLigBuilder uses a deep generative model to construct and optimize the 3D structures of ligands directly inside 3D binding pockets. We divided this goal into the following two tasks. First, we trained a deep generative model that can produce drug-like molecules with valid 3D structures. Second, we introduced target-based information into the model so that molecules with good predicted binding affinity could be obtained.
For the first task, although there are existing methods that use deep learning to generate 3D molecular structures,29–31 most of them focus on structurally simple molecules like those in the QM9 dataset. Those molecules are usually not sufficient for applications in drug discovery. We introduced Ligand Neural Network (L-Net), a graph generative model that is specifically designed to generate 3D drug-like molecules. L-Net contains two major features that are important for its performance. First, it is built from a new graph convolution architecture, which combines features like graph pooling and rotational covariance, increasing the size of the receptive field for the network while making it more efficient for training and sampling. Second, L-Net is trained using a novel scheme that makes it resistant to 3D errors during generation. As a result, the model can generate chemically correct, conformationally valid, and highly drug-like molecules. Compared to G-SchNet, a previous state-of-the-art model in 3D generation tasks, L-Net achieves significantly better chemical validity while maintaining an improved quality for the generated conformers.
To accomplish the second task, L-Net was combined with Monte Carlo tree search (MCTS), a widely applied technique in reinforcement learning, to optimize molecules directly inside the binding pocket. To our knowledge, this work presents the first attempt to combine deep generative models with MCTS for 3D ligand generation. There are other deep learning-based methods for the conditioned generation of 3D molecules based on binding pocket structures, most notably liGAN.32,33 However, liGAN generates molecules indirectly using atomic density maps, and subsequent rule-based inference of atom location and bond types is necessary. It also requires protein–ligand complexes as training data. On the other hand, DeepLigBuilder only needs a structure-based scoring function and can generate full 3D structures of ligands, including hydrogens, without additional processing, making it much easier to use and extend.
DeepLigBuilder is unique in that it directly operates on 3D molecular structures and optimizes both the topological and 3D structures of molecules at the same time directly inside the binding pocket using MCTS. Due to its ability to directly operate 3D structures, DeepLigBuilder is more flexible and can easily achieve more advanced capabilities, such as anchoring the generated molecules in a spatial location or performing generation based on a privileged 3D fragment.
As a case study, we used DeepLigBuilder to design molecules targeting the main protease of SARS-CoV-2, a key target for anti-COVID-19 drug design.34,35 DeepLigBuilder recaptures the structural and chemical features of known inhibitors and generates potential drug-like inhibitors with novel chemical structures.
The architectures of the state encoder and the policy network are shown in Fig. S3 and S7,† respectively. The state encoder is mainly composed of MPNN layers36 organized into DenseNet blocks,37 which use graph pooling and unpooling layers (see Section S1.5†) to reduce the memory cost during training. The policy network uses MADE (masked autoencoder for distribution estimation38) to efficiently model various decision variables, which explicitly considers the valence constraints and local geometries of molecules to improve the performance of L-Net. The policy network is also rotationally covariant (or equivariant) thanks to the introduction of a local coordinate system (see Section S1.3†). The architectures of the state encoder and the policy network are discussed in detail in Sections S1.2 and S1.6.†
The following aspects of L-Net are examined during the evaluation:
(1) The ability to generate chemically valid compounds. This was measured using the percentage of valid outputs.
(2) The diversity of the generated samples. This was measured by the percentage of unique outputs, as well as internal diversity values of 2D and 3D structures, which were calculated using the Morgan fingerprint and USRCAT (Ultrafast Shape Recognition with CREDO Atom Types),44 respectively.
(3) The ability to correctly learn the distributions of important molecular properties. We report the distributions of several important 2D and 3D molecular properties of the generated molecules, including molecular weight (MW), logP, the number of hydrogen bond donors (HBD) and acceptors (HBA), the number of rotatable bonds (ROT), QED,39 the normalized principal moment of inertia ratios (NPR1 and NPR2)45 and solvent accessible surface areas (total and polar SASA46), and compare the results to those of the test set.
(4) Whether L-Net can correctly model the chemical space (2D and 3D) of the training set. This is measured using 2D and 3D maximum mean discrepancy (MMD) calculated using the Morgan fingerprint and USRCAT. We also introduced precision and recall47 for a more interpretable evaluation. Precision measures the quality of the samples, while recall measures the number of samples in the target distribution that can be covered by the generative model.
(5) The ability to generate high-quality conformers. The quality of local structures is checked by examining the distribution of bond lengths, bond angles, and torsion angles. MMD is used to quantitatively measure whether the distribution of torsion angles is correctly modeled. Global conformation quality is measured using RMSD of heavy atoms after optimization using MMFF94s.48
Detailed descriptions of each metric are given in Section S1.11.† We also trained L-Net using QM9 to enable comparison to G-SchNet on the percentage of valid output and the median RMSD after optimization (details are given in Section S1.11†).
Briefly, MCTS finds promising solutions of a reward function by iteratively constructing a search tree (see Fig. 1b). Each node in the tree represents an intermediate state during molecule generation. At each iteration, the model first chooses a promising state from the search tree (the selection step), enumerates possible actions for that state (the expansion step), and performs a rollout to generate the rest of the molecular structure (the simulation step). The reward is collected for that structure, and the information is backpropagated through the tree to update the Q-values of each node. More technically, our implementation of MCTS adopts the framework of MENTS (Maximum Entropy for Tree Search49), with custom modifications detailed in Section S1.12.† The L-Net has two roles during this process. First, it is used as a rollout policy during the simulation step. Second, it is used to compute a “prior” Q-value for each node in the search tree. In other words, MCTS is responsible for searching for molecules with high binding affinity, while L-Net is used to encourage the structure to be valid, drug-like, easy to synthesize, and diverse.
We used the docking score provided by smina50 as the reward function. A major advantage of DeepLigBuilder is that the docking score can be evaluated much faster than previous 2D MCTS models (such as ChemTS51) because 3D structures of molecules are already generated inside the binding pocket. This attribute allows the use of the fast local optimization capability provided by smina to calculate the docking score (using the minimize option, with a time cost of less than 0.1 seconds per evaluation) without the need for global optimization (costing on average several minutes per evaluation), which affords an orders of magnitude increase in speed.
For the case study, DeepLigBuilder was used to design potential inhibitors for the main protease (Mpro, also referred to as 3CL protease) of SARS-CoV-2, the virus causing the COVID-19 pandemic.52 We first investigated the ability of DeepLigBuilder to perform lead optimization based on a fragment derived from existing peptidomimetic covalent inhibitors. We then tested its performance on de novo designs of noncovalent inhibitors, with the goal of generating highly potent inhibitors with novel scaffolds.
Finally, we discuss the ability of L-Net to generate ring structures with valid conformations. We place our focus on aromatic rings. Most aromatic rings in test set molecules adopt a strict planar structure. We first investigate the issue related to the generation of non-cyclic atoms inside aromatic rings. Non-closed aromatic rings will be marked as invalid molecules by RDKit, and since the percentage of invalid output molecules is low (5.7% as shown in Section 3.1.1), we assume that this issue is relatively minor for L-Net. The occurrence of the noncyclic conformation of rings can also be indicated by incorrect bond lengths. The average and standard deviation for the length of aromatic bonds from test set molecules is 1.39 Å and 0.051 Å, while the values for the generated molecules are 1.39 Å and 0.067 Å. The average bond length matches well between the generated and the test set molecules. Although the standard deviation is a little bit larger among the generated samples, the difference of 0.016 Å is small and would not cause significant problems for applications in de novo drug design.
To evaluate the severity of wrong torsion angles, the standard deviation of torsion angles for aromatic rings is also calculated. The standard deviation is 1.72° for the generated molecules and 0.46° for the test set molecules. Again, the standard deviation is slightly larger for molecules generated by the model. However, the increased deviation of 1.27° is tolerable and could be minimized by the following relaxation, indicating that for most of the time the model correctly generates a planar structure for aromatic rings.
To summarize, although the ring structures generated by L-Net may contain some imperfections, the problem is relatively small and should have a limited impact on the application of the model. If we need to correct the structures, the fast local relaxation utility provided by RDKit can be used. This would improve the quality of ring conformation without causing significant changes to the global conformation of the molecule (with the average RMSD equal to 0.613 Å as shown in Section 3.1.2).
Fig. 3 Lead optimization using DeepLigBuilder. (a) The topological structure of MI-23, a reported covalent inhibitor targeting SARS-CoV-2 Mpro.54 The blue part of the molecule is used as a seed structure by DeepLigBuilder for molecule generation. (b) The range of the scores for the 10 best molecules sampled at each step during the Monte Carlo tree search. Blue: MCTS turned on; grey: MCTS turned off. (c) The distribution of the smina score, QED, and SAscore of high-quality samples. (d) The distribution of hydrophobic groups in high-quality samples. (e) The distribution of hydrogen bond acceptors in high-quality samples. (f–h) Examples of molecules generated by DeepLigBuilder. The binding poses of the generated molecules (grey) are aligned with MI-23 (light grey) for comparison. The residue Glu166 is shown in blue. Molecular properties and predicted binding affinities are listed in the grey boxes. |
The structure of Mpro complexed with MI-23 was used for molecular generation (PDB ID: 7D3I). We only kept the P1 fragment (Fig. 3a, blue) as the starting point for generation. Note that MI-23 is not contained in the training set of L-Net, so the model does not know the structure in advance. A total of 64 MCTS runs were performed to ensure the diversity of the generated molecules. After the MCTS runs, we applied the following operations to filter for high-quality compounds. The general idea of the workflow is similar to that used in CovDock.56 (1) First, Cys145 was mutated to alanine, and the generated molecules were locally relaxed and scored using smina. (2) Second, molecules with a smina score worse than −9.0 kcal mol−1 were removed. This procedure retained molecules with high predicted binding affinity. (3) Third, molecules with warhead groups displaced by more than 1 Å after optimization were removed. (4) Finally, the molecules were clustered by their BM-scaffold.57
We first investigated whether MCTS had helped to find molecules with better scores compared with the random search. Fig. 3b shows the range of the smina scores (before structure relaxation) for the 10 best molecules sampled at each step, with MCTS turned on (blue area) and off (grey area). Molecules sampled using MCTS showed significantly lower smina scores than those sampled with the random search (for smina scores, the lower the better), indicating that MCTS can indeed help with the exploration of molecules with a higher predicted binding affinity compared with the random search.
Next, we investigated the distribution of the high-quality molecules generated with DeepLigBuilder. A total of 7,508 high-quality molecules were obtained, with an overall total of 4,016 BM-scaffolds.57 None of these molecules were present in our training and testing datasets, demonstrating the ability of DeepLigBuilder to find novel structures. The distributions of the smina score (after local relaxation) and QED are shown in Fig. 3c, and the color of each record corresponds to the synthetic accessibility score (SAscore,58 the lower the better). All molecules selected had an improved smina score compared with MI-23 (−9.0 kcal mol−1). The mean QED value of the generated molecule was 0.41, which was lower than that of MI-23 (0.57). However, the value was higher compared to those of several other wide-spectrum, peptide-like inhibitors of coronavirus Mpro, such as N1 (ref. 59) (QED = 0.131), N3 (ref. 34 and 59) (QED = 0.123), and N9 (ref. 59) (QED = 0.127). The average synthetic accessibility score was 4.26, which is similar to that of MI-23 (SAscore = 4.20).
The MI-23 molecule contains three features that are relevant to its high binding affinity: (1) a hydrophobic group filling the S2 site, (2) a hydrogen bond acceptor (the CO group in the linker) interacting with the backbone NH group of Glu166, and (3) a hydrophobic group filling the S3 and S4 sites. To examine whether the generated molecules have captured these features, pharmacophore models were generated using the software Align-It.60 The spatial distributions of hydrophobic groups and hydrogen bond acceptors are shown in Fig. 3d and e. Visual inspection of Fig. 3d shows a dense cluster inside the S2 site, indicating that most generated molecules occupy the S2 site with a hydrophobic group, similar to that of MI-23. Although the distribution of hydrophobic groups extends to the S4 site, their distribution at this site is more spread out than that inside S2, possibly because the S4 site is more open than the S2 site. The distribution of hydrogen bond acceptors (Fig. 3e) showed a cluster (although smaller) around the carbonyl group, indicating that the interaction with Glu166 can be captured by some of the generated molecules.
Several generated molecules (compounds 2–4) are shown in Fig. 3f–h. Searching the full PubChem61 and ChEMBL datasets did not reveal any existing molecules similar to compounds 2–4 (Tanimoto score > 95%), indicating that they are indeed novel structures. The interaction patterns of these molecules with Mpro resemble that of MI-23. Notably, compound 3 has recovered the fluorine substituted phenyl ring in M-23 with a similar position and orientation.
To summarize, we show that given a known privileged fragment, DeepLigBuilder was capable of generating molecules with good predicted binding affinity, reasonable drug-likeness, and binding features similar to known inhibitors. These attributes demonstrate the practical applicability of DeepLigBuilder to lead optimization problems.
DeepLigBuilder succeeded in generating molecules with good binding affinities (smina score < −9 kcal mol−1) in 50 of the 64 runs. This result translates to a success rate of 78.1% (standard error 7%). When turning off the MCTS algorithm, the success rate drops to 0% (in 64 runs), demonstrating the ability of MCTS to improve the efficiency of finding molecules with high scores. Fig. 4d shows how the cumulative number of generated samples with good smina scores changed as MCTS proceeded for successful runs, with the blue line indicating the median and the light blue areas indicating the 10–90th and 25–75th percentiles. A typical MCTS run will produce about 129 unique molecules with good binding affinity within 400 runs, which may take 3–5 h to complete on an NVIDIA GeForce RTX 3070 graphics card.
Although the MCTS already includes an entropy-based penalty that encourages exploration, we found that multiple runs are still necessary to achieve better diversity. Fig. 4e shows the t-SNE visualization of the distributions of scaffolds generated from the first run (red), the 2nd to 32nd runs (dark blue), and the 33rd to 64th runs (light blue). Visual inspection shows that there was a large portion of chemical space unexplored after the first run, indicating that multiple runs are needed. We also found that a major portion of the chemical space of the generated molecules was visited within the first 32 runs, as indicated by the large overlap between the light blue and dark blue points. This analysis shows that 32 runs are generally sufficient for this system.
A total of 19014 molecules with smina scores better than −9.0 kcal mol−1 were obtained; none of these molecules were present in our training and testing datasets. The distributions of the smina score, QED, and SAscore are shown in Fig. 4f. The MCTS model manages to maintain an average QED of 0.53, which is higher than the QED value of compound 5. The average SAscore is 2.75, at a similar level to that of compound 5. This shows that MCTS can properly balance between the requirement for a high binding affinity and the requirement for high drug-likeness and synthesis accessibility, even without explicitly including QED and SAscore in the reward function (these properties have already been captured by L-Net during its training using the drug-like ChEMBL subset).
Compound 5 contains several major features contributing to its high inhibitory activity (as shown in Fig. 4g), namely, (1) five rings in the compound occupy the five sub-binding sites (S1′, S1, S2, S3, and S4) in the pocket, providing good steric complementarity between the ligand and the protein; (2) the carbonyl group in the pyridinone ring forms a hydrogen bond with the backbone NH of Glu166; and (3) the pyridine ring forms a hydrogen bond with His163. We investigated the spatial distribution of pharmacophores of the generated molecules to check whether they also contain these structural features. The distributions of hydrophobic groups and hydrogen bond acceptors are shown in Fig. 4h and i, respectively. For hydrophobic groups, it can be seen that there are dense clusters inside the S1′, S1, S2, and S3, with a similar location to the four corresponding rings inside compound 5, showing that those sub-pockets have been well occupied. From the distribution of hydrogen bond acceptors, we can see that the interactions with Glu166 and His163 are well covered by the generated molecule, and the location of the corresponding hydrogen bond donors aligns well with that of compound 5. To summarize, important pharmacophore features for protein–ligand interaction can be observed in the generated molecules.
We then analyzed the chemical scaffolds of the generated molecules. The 19014 generated molecules contained 2,984 BM-scaffolds, examples of which are shown in Fig. 5a. A full list of the top 50 scaffolds is given in the ESI (Fig. S17 and S18†). These scaffolds highly resemble compound 5, with the basic structure of an aromatic core connected with three other rings, filling sites S1′, S1, and S2. This result indicates that DeepLigBuilder can recover the privileged structural patterns. Encouragingly, we also found that the model recalls fragments of compound 5 that are important for binding. For example, scaffold 2 contains a pyridazinone core that mimics the pyridinone ring in compound 5 that interacts with Glu166. Scaffold 3 contains the uracil-like ring in compound 5 that interacts with Thr26 and Cys145.
We selected three compounds with improved QED, SAscore, and smina score compared to compound 5 for further analysis (Fig. 5b). These are novel structures, and searches in the PubChem and ChEMBL databases do not result in molecules with Tanimoto scores larger than 95%. All three compounds show similar interaction patterns to that of compound 5. Interestingly, compounds 6 and 8 contain an additional hydroxyl group in the S2 site phenyl ring that forms hydrogen bonds with the backbone CO groups of Met49 and Asp187, which also makes the phenyl ring dive deeper into the S2 site to offer better steric complementarity. These findings highlight the ability of DeepLigBuilder to discover novel interactions with the target protein. Of course, determining whether these designed compounds potently inhibit Mpro needs further experimental study.
L-Net provides a powerful tool for 3D molecule deep generation, which can be extended or fine-tuned with other datasets. For example, L-Net could be trained with organic material datasets to generate compounds with desired properties. L-Net could also be used to explore the drug-like chemical space. A previous study63 shows that SMILES-based generative models trained on a small subset of GDB-13 can cover a major portion of the GDB-13 chemical space. Similarly, we expect that L-Net can be used to enumerate molecules in drug-like and synthetically accessible chemical space with 3D structures, thereby enabling a better understanding of the scale of the chemical space and its underlying structure.
In the present study, we used docking scores that reflect the interactions between the target and the ligand as the reward function in the MCTS step, which can be changed to address a variety of design needs, such as similarity and pharmacophore-based rewards. Conditional generative models can also be built using L-Net to further reduce the search time of MCTS.
We used DeepLigBuilder to design potential inhibitors for the main protease of SARS-CoV-2 as a case study. DeepLigBuilder was able to produce promising drug-like compounds with novel chemical structures and high predicted binding affinity, which capture important pharmacophore features of known inhibitors for both lead optimization and de novo generation tasks. DeepLigBuilder is flexible enough to accept a user-defined seed structure, even starting from one atom. Automatic seed generation methods could be included in the future to further extend its capability. Graphical interfaces for user interactions during the generation process may also be developed. To summarize, L-Net and DeepLigBuilder provide promising new tools for structure-based de novo drug design, lead optimization, and design of other functional molecules.
Footnote |
† Electronic Supplementary Information (ESI) available: Supplementary methods, figures and tables. See DOI: 10.1039/d1sc04444c. |
This journal is © The Royal Society of Chemistry 2021 |