Deep-learning-based target screening and similarity search for the predicted inhibitors of the pathways in Parkinson's disease

Herein, a two-step de novo approach was developed that first predicts the targets of piperine and then predicts similar (piperine-like) compounds from small-molecule libraries using a deep-learning method. Deep-learning and neural-network approaches were used for target prediction, similarity searches, and validation. The model was trained and tested on a dataset of 57 423 compounds, split 70%/30% into training and test sets (40 197/17 226, respectively), and attained an overall accuracy of around 87.5%. This method predicted two targets (MAO-A and MAO-B) and 101 compounds as piperine derivatives. MAO-A and MAO-B are important drug targets in Parkinson's disease. The method was validated for piperine and its targets (monoamine oxidase A and B) using molecular docking, molecular dynamics simulation, and post-simulation analysis of all the selected compounds. Rasagiline, lazabemide, and selegiline, which are already FDA-approved drugs against these targets, were selected as controls. Molecular docking of the FDA-approved drugs and of the compounds predicted using deep learning and neural networks was carried out against MAO-A and MAO-B. Using the docking scoring function, molecular dynamics simulation, and free energy calculations as extended validation methods, it was observed that the predicted compounds bound the selected targets more strongly than the controls. Thus, deep learning may play an effective role in predicting potential compounds and their targets, and can play an expanded role in computer-aided drug design.


Introduction
The remarkable growth of structure-based virtual screening techniques, and confidence in these approaches, have accelerated the drug-discovery process. 1,2 These methods depend on binding affinity scores between a target and a candidate molecule, computed from the 3D structure of their complex, to predict the top hit molecules for further processing and subsequent experimental investigation. The available scoring schemes are based on "statistical or expert" analysis of available protein-ligand structures. 3,4 Drug-discovery studies have made ever-increasing use of machine learning (ML) methodologies [5][6][7] to identify relationships in protein-ligand complexes. ML models convert these relationships into a scoring scheme (binding affinity scores), which provides a simple alternative to inferences based on statistics and expert knowledge. In ML approaches, the input data and expected results are provided to the model during training, and the model then predicts the outcome for new inputs. In ML scoring systems such as RF-Score, 8,9 which is based on random forests, and NNscore, 10,11 which primarily uses neural networks, a small change in the model parameters has an incremental effect on the scores. In virtual screening, such scoring systems can yield more active compounds than classical approaches. 9,12 The growth of structural and affinity data has spurred researchers to explore them via deep-learning approaches. In deep learning, the information contained in the data is used to learn a meaningful relationship with the output. Therefore, the representation of the input data and its relationship with the output have a significant impact on the model's predictions. 13 Continuous and prolonged research efforts in this field have enabled feature extraction to be incorporated into ML models.
Thus, in this approach, the molecule representation is treated as the first part of the model. The molecule representation, coupled with the predictive part, is then used to extract features for solving specific tasks. This mechanism has proved useful for finding unknown and novel relationships. 14,15 Deep learning is widely applied by bioinformaticians 16,17 and computational biologists. 18 In recent years, deep-learning methods have shown promise in computer-aided drug design (CADD), where first structure-based approaches and then ligand-based models have been used. In the simplest deep-learning models, structure-based designs encode molecular information as vectors and build connected neural networks on top of them. These approaches return promising results for predicting the bioactivity, 19 aqueous solubility, 20 and toxicity 21 of structures. Additionally, a multitask neural network model can predict the activities against multiple targets, and the results of such QSAR models are usually better than those from single-task networks owing to better representation, training on the data, and recognition of general patterns in the data. 7,19,[21][22][23] Neural networks are flexible and can thus provide a suitable representation of the data to the model, e.g., by using a convolutional or recurrent neural network to extract patterns, or an acyclic representation of the molecular graph. 24,25 Numerous deep-learning studies have used auto-encoders or recurrent neural networks to propose new molecules with the desired properties. [26][27][28][29] The application of deep-learning approaches to ligand-based and structure-based analysis has resulted in the development of various neural networks, such as AtomNet 30 and the models proposed in ref. 31 and 32.
In AtomNet, a molecular complex (the input) is fed into a convolutional neural network, which recognizes the interacting atoms and assigns a score of 1 to active ligands and 0 to inactive ligands. The model proposed in ref. 31 was based on activity and prediction, while that in ref. 32 was based on the energy gap between a protein-ligand complex and the apo states. Compared with other existing methods, deep learning offers a flexible architecture with which a problem-specific neural network (NN) can be designed. Determining the protein-ligand interaction is the fundamental part of a molecular docking program, and for this, many scoring functions have been developed on the basis of either force fields or knowledge of existing protein-ligand complex structures. 33 Considering the findings and approaches above, we developed a two-step de novo approach implemented as a PERL script, where specific inputs are used in order to achieve a good output efficiency. The overall dataset was split in a 70/30 ratio into training and test sets using the PERL script. Piperine and eight experimentally reported targets were considered for training and testing our deep neural network for the prediction of piperine targets. Piperine and its compounds in the PubChem and ZINC databases were used for training and testing for the prediction of similar compounds from these small-molecule libraries, with 101 compounds studied as potential inhibitors and then prioritized as piperine derivatives. The predicted top five compounds were then validated by comparison with experimentally reported FDA-approved drugs (lazabemide, rasagiline, and selegiline) using rational docking, molecular dynamics simulation, and free energy calculations.
Overall these methods reported that the compounds predicted by our methodology possessed a higher potential than those of the drugs experimentally reported to be active.

Deep learning approach
Step 1: A dataset was normalized using the PERL script, where specific inputs were used in order to achieve a good output efficiency. The dataset was divided into a 70% training set and a 30% test set. The accuracy observed on the test dataset was about 87.5%. The overall flow of the work is given in Fig. 1.
Step 2: Piperine and eight targets were used for training and testing the dataset for the prediction of piperine targets. Piperine and 57 423 compounds were used for training and testing for the prediction of similar compounds from the small-molecule libraries (ZINC and PubChem), with 101 compounds studied as potential inhibitors; the dataset was split in a 70/30 ratio into training and test sets using the PERL script. These steps are given below:
Step 1: Normalization of the 57 423 compounds in the dataset.
Step 2: Input the data for training:
(1) Prediction of the piperine targets: the interrelated input and output values are supplied for training.
(2) Prediction of similar (piperine) compounds: the interrelated input and output values are supplied for training.
Step 4: Calculate the output signal of every neuron.
Step 5: Calculate the signal for the output layer.
Step 6: Compute the error of each neuron, and repeat steps 3 to 6 until the network has converged.
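The normalization and 70/30 split in the steps above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' PERL script: min-max scaling, the random shuffle, and the names `min_max_normalize` and `split_70_30` are assumptions made for the example. Taking the test size as the integer part of 30% gives 40 197 training and 17 226 test entries for 57 423 compounds, which matches the counts quoted in the abstract.

```python
import numpy as np

def min_max_normalize(features):
    """Scale each descriptor column to [0, 1] (assumed normalization scheme)."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # Constant columns would divide by zero, so give them a span of 1.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (features - lo) / span

def split_70_30(n_samples, test_fraction=0.3, seed=0):
    """Return shuffled (train_indices, test_indices) for a 70/30 split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(n_samples * test_fraction)  # integer part of 30%
    return order[n_test:], order[:n_test]

# Usage with the dataset size quoted in the text.
train_idx, test_idx = split_70_30(57423)
print(len(train_idx), len(test_idx))
```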
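RMSE (root mean square error) and MAE (mean absolute error) were used to measure the prediction error. The correlation between the predicted and measured values was assessed by using Pearson's correlation coefficient (R) and the standard deviation in regression (SD), eqn (i):
$$\mathrm{SD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left[t_i - \left(a y_i + b\right)\right]^2} \quad \mathrm{(i)}$$
where $t_i$ and $y_i$ are the measured and predicted affinities for the $i$th complex, $N$ is the number of complexes, and $a$ and $b$ are the slope and the intercept of the regression line between the measured and predicted values, respectively.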
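The four quality measures named above can be computed in a few lines. The sketch below assumes the usual textbook definitions of RMSE, MAE, and Pearson's R, and takes SD as the residual standard deviation around the regression line of the measured on the predicted values; the function name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(measured, predicted):
    """Return RMSE, MAE, Pearson R, and SD for measured (t) vs predicted (y)."""
    t = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    err = y - t
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(t, y)[0, 1])
    # Slope a and intercept b of the regression line t ~ a*y + b.
    a, b = np.polyfit(y, t, 1)
    sd = float(np.sqrt(np.sum((t - (a * y + b)) ** 2) / (t.size - 1)))
    return {"rmse": rmse, "mae": mae, "r": r, "sd": sd}

# Hypothetical affinities: every prediction is 0.5 units too high.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

A constant offset such as this one gives a perfect correlation and a vanishing SD while RMSE and MAE stay at 0.5, which is why the error and correlation measures are reported together.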
where $w_{0j}$ is a bias.
$$a^{l}_{j} = \sigma\left(\sum_{k} w^{l}_{jk}\, a^{l-1}_{k} + b^{l}_{j}\right)$$
where $\sigma$ is the activation function and the sum is over all the neurons $k$ in the $(l-1)$th layer. To rewrite this expression in matrix form, we defined a weight matrix $w^{l}$ for each layer $l$. The entries of the weight matrix $w^{l}$ are the weights connecting to the $l$th layer of neurons; that is, the entry in the $j$th row and the $k$th column is $w^{l}_{jk}$. Similarly, for each layer $l$, we defined a bias vector $b^{l}$ whose components are the values $b^{l}_{j}$, i.e., one component for each neuron in the $l$th layer. Finally, we defined an activation vector $a^{l}$ whose components are the activations $a^{l}_{j}$.
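With the weight matrices, bias vectors, and activation vectors defined above, the layer-by-layer computation reduces to one matrix-vector product per layer. The following NumPy sketch illustrates this; the sigmoid activation is an assumption (the text does not name the activation function), and the function names and layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation (assumed; not specified in the text)."""
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases, activation=sigmoid):
    """Propagate input x through the network, one layer at a time.

    weights[i] has shape (neurons in layer i+1, neurons in layer i), so row j,
    column k holds the weight from neuron k of the previous layer to neuron j.
    """
    a = np.asarray(x, dtype=float)
    for w, b in zip(weights, biases):
        a = activation(w @ a + b)
    return a

# Hypothetical 2-3-1 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
print(feed_forward([0.2, 0.7], weights, biases))
```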

Validation
Molecular docking. Docking was performed on the retrieved protein structures, namely monoamine oxidase A (PDB ID: 2BXS) and monoamine oxidase B (PDB ID: 1GOS), obtained from the RCSB Protein Data Bank (http://www.rcsb.org). 34 The MMFF force field was used to optimize the structure of piperine. An energy minimization step was performed using Powell's method with the default settings. The binding potential of piperine with each protein was estimated by using the Lamarckian genetic algorithm in AutoDock 4.0. 35 The binding energies between the protein and ligand were estimated on a grid map generated by the AutoGrid program. The compounds with the lowest energy values were selected for further processing.
Interaction pattern and poses analysis. The interactions of the selected ligands with MAO-A and MAO-B were examined by using the PyMOL visualization tool 36 and the Protein-Ligand Interaction Profiler (PLIP) (https://projects.biotec.tu-dresden.de/plipweb/plip/index). 37 Hydrogen bonds, electrostatic interactions, hydrophobic contacts, and other interactions were visualized.
All atoms simulations. The AMBER 14 molecular dynamics package 38 was used to conduct the MD simulations for all the selected complexes. Hydrogen atoms and Na+ counterions were added with the "tleap" program in AMBER to neutralize the systems. A TIP3P water box of 8.0 Å was used. Energy minimization of the complexes was carried out in AMBER 14 using the SANDER module in two stages (each of 6000 steps) in order to remove all the atomic constraints in the systems. PMEMD.cuda 39 was used for the MD simulations. The SHAKE algorithm and the particle-mesh Ewald (PME) method, with a non-bonded cutoff radius of 10 Å, were used, the latter for the long-range interactions. The systems were equilibrated for 10 000 ps at 310 K (Langevin temperature) and a constant pressure of 1 atm with isotropic molecule-based scaling, followed by a total simulation of 20 ns. The MD trajectory was sampled every 2.0 ps. RMSD, RMSF, and hydrogen bonding were calculated by using CPPTRAJ and PYTRAJ. 40 The following equation was solved to calculate the stability of the complexes after 100 ns.
$$\mathrm{RMSD} = \sqrt{\frac{\sum_{i=1}^{N} m_i \left(X_i - Y_i\right)^2}{M}}$$
where $N$ is the total number of atoms, $m_i$ is the mass of atom $i$, $X_i$ and $Y_i$ are the coordinate vectors of the target and reference atom $i$, and $M$ is the total mass.
Fig. 3 Depiction of the experimentally reported targets for piperine. The deep-learning-based scoring to predict the targets for piperine was applied to filter the top targets for piperine. Ranking of each target was carried out by the scoring given against each target.
This journal is © The Royal Society of Chemistry 2019
Binding free energy calculation. The binding of ligands to MAO-A and MAO-B could be quantitatively measured by using MM-GBSA combined with MD simulation. 41 For each molecular species, apo and holo, the binding free energy ($\Delta G_{\mathrm{bind}}$) was calculated by using the following equation:
$$\Delta G_{\mathrm{bind}} = G_{R+L} - \left(G_{R} + G_{L}\right) \quad \mathrm{(v)}$$
The different components ($G_{R+L}$, $G_{R}$, and $G_{L}$) required for the free energy calculation of the apo and holo states are given in eqn (v). In the MM/GBSA and MM/PBSA methods, each free energy term in eqn (v) is calculated using the following equation:
$$G = E_{\mathrm{bond}} + E_{\mathrm{vdw}} + E_{\mathrm{elec}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS_{S} \quad \mathrm{(vi)}$$
In eqn (vi), $E_{\mathrm{bond}}$, $E_{\mathrm{vdw}}$, and $E_{\mathrm{elec}}$ are the bonded energy (bonds, angles, and dihedrals), the van der Waals energy, and the electrostatic energy, respectively; $G_{\mathrm{PB}}$ and $G_{\mathrm{SA}}$ are the polar and non-polar contributions to the solvation energy; and $T$ and $S_{S}$ are the absolute temperature and the solute entropy. The optimized parameters and MIEC model, as proposed recently, work for calculating the free energies between protein-protein interfaces, 42-45 but here we used the MMPBSA.py method with constant interior (solute) and exterior (solvent) dielectric values 46 to calculate the free energy.
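As a rough illustration of how a binding free energy is assembled from the free energies of the complex, the receptor, and the ligand, the sketch below averages the per-frame difference over a set of snapshots and reports a standard error. The function name, the per-frame averaging, and the numbers in the usage example are hypothetical and are not taken from the study; in practice, a tool such as MMPBSA.py reports these quantities directly.

```python
import numpy as np

def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Mean and standard error of G(complex) - [G(receptor) + G(ligand)].

    Each argument holds one free energy per trajectory frame (kcal/mol).
    At least two frames are needed for the standard error to be defined.
    """
    g_complex = np.asarray(g_complex, dtype=float)
    g_receptor = np.asarray(g_receptor, dtype=float)
    g_ligand = np.asarray(g_ligand, dtype=float)
    dg = g_complex - (g_receptor + g_ligand)
    sem = float(dg.std(ddof=1) / np.sqrt(dg.size))
    return float(dg.mean()), sem

# Hypothetical per-frame energies for two snapshots.
mean_dg, sem_dg = binding_free_energy([-150.0, -152.0],
                                      [-100.0, -100.0],
                                      [-10.0, -10.0])
print(mean_dg, sem_dg)
```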

Deep-learning-based target screening and similarity search
This study was divided into two parts. The first part predicted targets for piperine, and the second predicted potential compounds using piperine as the input. A deep-learning model was used for the prediction of piperine's targets and its derivatives. The potential of the final predicted compounds was tested by using rational docking, molecular dynamics simulation, and free energy calculations. The pipeline was supplemented with FDA-approved drugs as controls. The approach was implemented as a PERL script, and the prediction accuracy achieved by the deep-learning network was 87.5% (Fig. 2). Piperine and eight targets (Fig. 3) were taken for training and testing of the dataset for the prediction of piperine targets, while piperine and 57 423 compounds were taken for training and testing of the dataset for the prediction of similar compounds from the small-molecule libraries (ZINC and PubChem), where 101 compounds were studied as potential inhibitors.
(1) Initialize the weights and the parameter $\mu$ ($\mu = 0.01$).
(2) Compute the sum of the squared errors over all inputs, $F(w) = e^{T}e$, where $w = [w_1, w_2, w_3, \ldots, w_n]$ is the weight vector of the network and $e$ is the error vector for the network.
(3) Solve to obtain the weight increment $\Delta w = [J^{T}J + \mu I]^{-1} J^{T} e$, where $J$ is the Jacobian matrix and $\mu$ is the learning rate, which is multiplied by the decay rate $\beta$ ($0 < \beta < 1$).
(4) Using $w + \Delta w$, if $F(w + \Delta w) < F(w)$, then go back to step 2.
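The update rule in these steps has the form of a Levenberg-Marquardt iteration, which can be sketched on a small least-squares problem as follows. This is an illustrative Python version, not the authors' PERL implementation. Two details are assumptions: the rejection branch, which divides the damping parameter by the decay rate when a step fails to reduce the error (the usual convention, not spelled out in the text), and the sign of the step, which is negative here because the Jacobian is taken as the derivative of the error vector with respect to the weights. The function and variable names are hypothetical.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, w0, mu=0.01, beta=0.1,
                        max_iter=100, tol=1e-12):
    """Minimise F(w) = e(w)^T e(w) with damped Gauss-Newton steps."""
    w = np.asarray(w0, dtype=float)          # step 1: initial weights, mu
    e = residual(w)
    f = float(e @ e)                         # step 2: sum of squared errors
    for _ in range(max_iter):
        J = jacobian(w)                      # J[i, j] = d e_i / d w_j
        # step 3: weight increment from the damped normal equations
        dw = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
        e_new = residual(w + dw)
        f_new = float(e_new @ e_new)
        if f_new < f:                        # step 4: error decreased, accept
            improvement = f - f_new
            w, e, f = w + dw, e_new, f_new
            mu *= beta                       # trust the quadratic model more
            if improvement < tol:
                break
        else:                                # assumed branch: reject the step
            mu /= beta
    return w, f

# Usage: recover slope 2 and intercept 1 from noise-free data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
w_fit, f_fit = levenberg_marquardt(
    lambda w: w[0] * x + w[1] - y,
    lambda w: np.column_stack([x, np.ones_like(x)]),
    [0.0, 0.0],
)
print(w_fit, f_fit)
```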

Validation of the predicted targets and compounds
Scaffold evaluation. The scaffolds of the predicted compounds and of the FDA-approved control drugs were compared to identify similarities among these compounds. Fig. 4 shows the structural models of all the predicted and control compounds. The scaffolds show that some rings that form hydrophobic interactions are similar to those of the control compounds. The backbones of the predicted compounds contain variable atoms that contribute significantly to the formation of hydrogen bonds and thus produce strong inhibitory effects.
Ranking the interaction poses. Our best predicted compounds were validated against the selected targets by using the rational docking approach. To sample the best conformations of the predicted inhibitors in the active site, the 3D coordinates of the screened and selected targets (monoamine oxidase A and B) were retrieved from RCSB and prepared for docking simulation. For each ligand, the lowest-energy conformation among the five allowed poses was selected by analyzing the docking scores. For validation of our predicted compounds, we selected the top three active drugs against these targets as controls. The dataset containing the control and test compounds was docked into the active pockets of the selected targets. The results showed that our selected compounds bound more strongly than the three control drugs. Table 1 summarizes the scores of each compound against the defined targets. Our induced-fit docking approach revealed that lazabemide, rasagiline, and selegiline possessed lower binding affinities against MAO-A, specifically −6.06, −5.96, and −5.90 kcal mol⁻¹, respectively, than the predicted compounds. Among the deep-learning-based predicted compounds, compound 2 possessed the highest binding affinity (−9.8 kcal mol⁻¹) against MAO-A, followed by compound 3 (−9.5 kcal mol⁻¹), while compounds 1 and 4 (−8.5 kcal mol⁻¹) and compound 5 (−8.1 kcal mol⁻¹) showed lower binding affinities that were still better than those of the controls. These results suggest that our predicted compounds possess better inhibitory properties than the experimentally reported active compounds.
On the other hand, lazabemide, rasagiline, and selegiline showed docking scores of −5.7, −6.1, and −6.3 kcal mol⁻¹, respectively, against MAO-B. Docking of our deep-learning-based predicted compounds resulted in higher binding affinities than those of the control compounds. The binding scores for compounds 1 to 5 were predicted to be −9.6, −9.3, −8.8, −9.5, and −9.3 kcal mol⁻¹, respectively. These results suggest that the compounds predicted by our deep-learning-based method have a higher inhibitory potential than the experimentally reported active compounds. It is also worth noting that the predicted compounds scored better, on average, against MAO-B than against MAO-A.
Interactions of the top-ranking poses (Fig. 5 and 6) also showed that, besides sharing the benzene ring responsible for hydrophobic interactions, the predicted compounds differ in their backbones, and these differences are strongly associated with the formation of hydrogen bonds with the active-site residues. The docking scores of all the control and predicted compounds are summarized in Table 1. These results indicate that our deep-learning-based methodology predicted compounds with better docking scores than the already approved drugs, suggesting that this technique could be applied to other targets for the discovery of potential drug candidates.
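The comparison between predicted and control compounds can be checked with simple arithmetic on the docking scores quoted above. The sketch below transcribes those scores (kcal mol⁻¹) and computes, for each target, the gap between the best-scoring control and the worst-scoring predicted compound; the dictionary layout and the function name `weakest_margin` are hypothetical.

```python
# Docking scores in kcal/mol, transcribed from the text above.
docking_scores = {
    "MAO-A": {
        "controls": {"lazabemide": -6.06, "rasagiline": -5.96, "selegiline": -5.90},
        "predicted": {"compound 1": -8.5, "compound 2": -9.8, "compound 3": -9.5,
                      "compound 4": -8.5, "compound 5": -8.1},
    },
    "MAO-B": {
        "controls": {"lazabemide": -5.7, "rasagiline": -6.1, "selegiline": -6.3},
        "predicted": {"compound 1": -9.6, "compound 2": -9.3, "compound 3": -8.8,
                      "compound 4": -9.5, "compound 5": -9.3},
    },
}

def weakest_margin(scores):
    """Gap between the best control and the worst predicted compound.

    More negative scores mean stronger predicted binding, so a positive
    margin means every predicted compound outscores every control.
    """
    best_control = min(scores["controls"].values())
    worst_predicted = max(scores["predicted"].values())
    return round(best_control - worst_predicted, 2)

for target, scores in docking_scores.items():
    print(target, weakest_margin(scores))
```

A positive margin for both targets means that even the weakest predicted compound docks more favourably than the strongest control.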
Stability analysis of the bound complexes. Post-simulation analysis, such as root mean square deviation (RMSD), of all the selected complexes was carried out to test the stability of our predicted compounds in the active pockets. Both apo and holo systems were subjected to 100 ns of simulation. An initial analysis revealed that the average RMSD for all the systems lay between 1 Å and 2.5 Å, which confirmed the dynamic stability of all the systems. Acceptable fluctuations were observed in some systems, but the production stage was subsequently stable until 100 ns. Fig. 7 and 8 show the RMSD graphs of all the systems. The protein-ligand complexes attained the equilibrium state in the first 10-20 ns. The RMSD increased up to 2.1 Å and then decreased to 1.5 Å. Afterward, the RMSD remained constant at around 1.5 Å with acceptable fluctuation. In the case of the selegiline-MAO-A complex, the system attained a weak equilibrium state around 2 Å at 70-80 ns as compared to the apo system. The lazabemide-MAO-A complex also lost its stability from 70 ns onwards, with small fluctuations. The compound 3-MAO-A system lost its equilibrium state at 52-60 ns and remained stable for the rest of the MD simulation. The RMSD analyses showed stable behavior of the predicted ligand complexes, consistent with strong binding and thus an inhibitory influence on the receptor.
In the case of the monoamine oxidase B systems, the selegiline-MAO-B complex attained equilibrium soon after reaching 15 ns, but fluctuations up to 3 Å were also observed between 70 and 80 ns. In the case of compound 4-MAO-B, the complex showed higher fluctuations, up to 4 Å, until 8 ns, but the system later attained the equilibrium state and remained stable until the end. On the other hand, the compound 4-MAO-B complex was unstable from the very beginning until 40 ns; afterwards, the system remained stable for the rest of the simulation time. In the other systems, small fluctuations within the acceptable range were observed, but overall the binding of ligands in the active site stabilized the systems by contributing a different bonding energy. These results suggest that our predicted compounds tightly occupied the binding sites of MAO-A and MAO-B and thus produced a strong inhibitory effect as compared to the other systems.
In order to find the residual fluctuations in the MAO-A and MAO-B systems, in both the apo and complex states, root mean square fluctuation (RMSF) values of the Cα atoms were calculated. Fluctuations in both the apo and holo states of MAO-A were negligible. In all the systems, most of the fluctuations occurred in the N-terminal part. However, it was also observed that the binding of the inhibitor in the active site stabilized the systems by decreasing the residual fluctuation. In the MAO-B complexes, the C-terminal part also showed higher fluctuations than in the apo state. It is clear from the RMSF graphs (Fig. 9 and 10) that the binding of our predicted inhibitors significantly affected the residual fluctuation of the complexes.
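The two stability measures discussed above can be illustrated with a short NumPy sketch. It assumes that all frames have already been superimposed on the reference structure (CPPTRAJ normally performs this fitting itself) and uses the mass-weighted RMSD described in the methods; the function names and the toy coordinates are hypothetical.

```python
import numpy as np

def mass_weighted_rmsd(coords, reference, masses):
    """Mass-weighted RMSD between one frame and a reference, both (N, 3)."""
    coords = np.asarray(coords, dtype=float)
    reference = np.asarray(reference, dtype=float)
    masses = np.asarray(masses, dtype=float)
    sq_dist = np.sum((coords - reference) ** 2, axis=1)
    return float(np.sqrt(np.sum(masses * sq_dist) / np.sum(masses)))

def rmsf(trajectory):
    """Per-atom RMSF for a pre-aligned trajectory of shape (frames, N, 3)."""
    trajectory = np.asarray(trajectory, dtype=float)
    mean_pos = trajectory.mean(axis=0)
    sq_dev = np.sum((trajectory - mean_pos) ** 2, axis=2)
    return np.sqrt(sq_dev.mean(axis=0))

# Toy example: two atoms, two frames; only the first atom moves.
reference = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
frames = np.array([
    [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
])
print(mass_weighted_rmsd(frames[1], reference, masses=[12.0, 12.0]))
print(rmsf(frames))
```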
Binding free energy analysis. MM/PBSA and MM/GBSA methods are popular approaches for estimating the free energy of binding of small ligands to biological macromolecules. In order to validate the accuracy of our method, the predicted top five ligands and the controls were subjected to free energy calculations (Fig. 11). The results of the calculations on the MAO-A-ligand and MAO-B-ligand complexes suggested that our predicted compounds were stronger inhibitors than those already reported. Energy calculations on the MAO-A-ligand complexes indicated that the predicted compounds bound more strongly than the controls. Considering the total free energy ($\Delta G_{\mathrm{bind}}$) tabulated in Table 2, compound 1 possessed the strongest binding free energy (−59.24 kcal mol⁻¹), followed by compound 2 (−53.31 kcal mol⁻¹), compound 3 (−51.32 kcal mol⁻¹), compound 4 (−49.08 kcal mol⁻¹), and finally compound 5 (−43.63 kcal mol⁻¹).
On the other hand, the compounds taken here as the controls produced weaker binding energies than our predicted compounds. Specifically, the total free energies for rasagiline, lazabemide, and selegiline were −33.00, −31.72, and −39.42 kcal mol⁻¹, respectively. The total free energy for piperine, which was used as the input for the similarity search, was −51.77 kcal mol⁻¹, which is better than that of each of the three selected controls. It can be inferred from these binding energies against MAO-A that our predicted compounds could inhibit MAO-A more efficiently than the already experimentally reported compounds.
The binding affinities of the ligands for MAO-B were also calculated from the last 10 ns of the MD trajectory. As can be seen from Table 2, the total $\Delta G_{\mathrm{bind}}$ values of rasagiline, lazabemide, and selegiline were −38.60, −33.19, and −38.23 kcal mol⁻¹, respectively. Our top predicted compounds showed values of −59.81, −51.90, −52.57, −53.95, and −55.17 kcal mol⁻¹, respectively, which confirmed the strong inhibition properties of these compounds. Piperine also showed strong binding affinity for MAO-B, with a total energy of −52.69 kcal mol⁻¹. These results support our prediction method and thus the reported novel ligands, which could robustly inhibit these targets.

Discussion
The discovery of novel small molecules with strong inhibitory potential is a common practice used by researchers. Essential drug features, such as HBD, HBA, and others, are used by computational chemists to find novel drug candidates based on these defined features. Machine-learning methods, such as ANN, have long been used in the prediction of molecule activity. Generally, DL strategies are applied in the first place to handle the problem of activity prediction. When compounds are described with the same number of molecular descriptors, researchers use fully connected DNNs to build models, which is considered a straightforward method. 47 Evaluating the interaction between a protein and a ligand is the key element of a molecular docking program, and many scoring functions have been built either from force fields or from knowledge of existing protein-ligand complex structures to assist this process. 48 A typical example is given in the investigation by Ragoza et al. 31 In this work, a deep neural network combined with an ML approach was used as a scoring function in the virtual screening or as an affinity predictor for novel molecules after a complex is generated. It can be applied either to test multiple compounds against a single protein or to test multiple proteins against a single compound. The model was applied to a single drug, namely piperine, and its experimental targets. A general docking approach and molecular dynamics simulation approaches were used as supplementary validation methods to investigate the potential of the predicted compounds against the prioritized targets. A total of eight experimental targets were selected, including TRPV1, 49 nuclear factor-κB, 50 monoamine oxidase A, monoamine oxidase B, 51 carbonic anhydrase I, carbonic anhydrase II, 52 lipoxygenase, 53 P-glycoprotein I and CYP3A4, 54 which were reported to be inhibited by piperine.
Our deep-learning-based approach predicted that piperine could efficiently inhibit MAO-A and MAO-B. Monoamine oxidase (MAO) catalyzes the oxidation of primary, secondary, and tertiary amines and is considered one of the essential enzymes in neurotransmitter metabolism. Its physiological roles and its inhibitors are important for understanding the functional roles of dopamine (DA), norepinephrine, and serotonin (5-HT) neurotransmission in the central nervous system (CNS). It is, therefore, an essential drug target for the treatment of Parkinson's disease.
FDA-approved experimental ligands, such as rasagiline, lazabemide, and selegiline, were compared to piperine by using a conventional docking approach, which indicated that piperine binds better than all of them. Using a machine-learning approach, piperine was then used as the input for a similarity search considering its inhibitory features. The PubChem and ZINC databases were subjected to similarity searches to obtain the top 100 hits. Using the ML scoring function, only the top five compounds were selected for further analysis to evaluate the prediction power and accuracy of our method. Molecular docking, molecular dynamics simulations, post-simulation analyses, and free energy calculations confirmed that the compounds we predicted based on piperine were more potent inhibitors of MAO-A and MAO-B. Interaction pattern evaluation helped in understanding the bonding pattern. It was observed that the extra ring in the structures of the predicted compounds and the different atoms in their backbones potentially formed hydrogen bonds with the active-site residues. Overall, the compounds predicted by our method outperformed the controls. The method can, therefore, help not only in discovering new potential drugs but also in investigating the side effects of bioactive molecules. By anticipating the potential impact of new drugs on the biology of the cell, deep-learning approaches may contribute to such disciplines as systems medicine and systems biology. Careful analysis of the results obtained revealed reliable predictions based on relevant features. Thus, deep learning and ML-based features can significantly increase the reliability and accuracy of predicting novel inhibitors.

Conclusion
This study was based on deep-learning and machine-learning approaches to determine the impact of these state-of-the-art methods in predicting novel compounds against disease-causing targets. The prediction of targets and then similarity searches predicted potential compounds based on already approved drugs. Integrated MD simulations and free energy calculations revealed that the predicted compounds possessed stronger inhibitory potential than the already FDA-approved compounds, thus showing the enhanced reliability and accuracy of our method.
Fig. 11 Total free energy of the controls and predicted compounds against monoamine oxidase A and B are given in the graphs. All the energies are calculated in kcal mol⁻¹.
Table 2 Binding free energies of the predicted inhibitors to monoamine oxidase A and monoamine oxidase B calculated by using the MM-GBSA approach. Rasagiline, lazabemide, selegiline, and piperine were grouped as the controls. Compounds 1-5 are our predicted compounds based on the similarity search.