Open Access Article
Sandhi Kranthi Reddy (*a), S. V. G. Reddy (a) and Syed Hussain Basha (b)
(a) Department of CSE, GST, GITAM (Deemed to be University), Visakhapatnam, A.P, India
(b) Innovative Informatica Technologies, Hyderabad, Telangana, India
First published on 10th January 2025
Non-Small Cell Lung Cancer (NSCLC) is a formidable global health challenge and a leading cause of cancer-related deaths worldwide. The Platelet-Derived Growth Factor Receptor (PDGFR) has emerged as a promising therapeutic target in NSCLC, given its crucial involvement in cell growth, proliferation, angiogenesis, and tumor progression. Among PDGFR inhibitors, avapritinib has garnered attention due to its selective activity against mutant forms of PDGFR and KIT, particularly PDGFRA D842V and KIT exon 17 D816V, which are linked to resistance against conventional tyrosine kinase inhibitors. In recent years, Machine Learning has emerged as a powerful tool in pharmaceutical research, offering data-driven insights and accelerating lead identification for drug discovery. In this research article, we focus on the application of Machine Learning, alongside the RDKit toolkit, to identify potential anti-cancer drug candidates targeting PDGFR in NSCLC. Our study demonstrates how such algorithms can efficiently narrow down large screening collections to target-specific sets of just a few hundred small molecules, streamlining the hit discovery process. Employing a Machine Learning-assisted virtual screening strategy, we preselected 220 compounds with potential PDGFRA inhibitory activity from a library of 1.048 million compounds, representing roughly 0.02% of the original library. To validate these candidates, we employed traditional genetic algorithm-based virtual screening and docking methods. We found that ZINC000002931631 exhibited inhibitory potential against PDGFRA comparable or even superior to that of Avapritinib, which highlights the value of our Machine Learning approach. Moreover, as part of our lead validation studies, we conducted molecular dynamics simulations, revealing critical molecular-level interactions responsible for the conformational changes in PDGFRA necessary for substrate binding.
Our study exemplifies the potential of Machine Learning in the drug discovery process, providing a more efficient and cost-effective means of identifying promising drug candidates for NSCLC treatment. The success of this approach in preselecting compounds with potent PDGFRA inhibitory potential highlights its significance in advancing personalized and targeted therapies for cancer treatment.
Despite advances in early detection and treatment modalities, the majority of NSCLC patients are diagnosed at an advanced stage, when curative treatment options are limited. 6 Standard treatments include surgical resection, radiation therapy, and chemotherapy, often used in combination depending on the stage and extent of the disease. 7 In the last decade, targeted therapies and immunotherapies have emerged as game-changing options for subsets of NSCLC patients, offering improved survival rates and fewer side effects compared to conventional chemotherapy. 8 However, challenges persist, such as resistance to therapies, disease relapse, and the identification of novel targets for those without actionable mutations. Consequently, ongoing research efforts focus on identifying new therapeutic targets, understanding the mechanisms of resistance, and developing innovative treatment strategies to combat NSCLC effectively. 9
The Platelet-Derived Growth Factor Receptor (PDGFR) has emerged as a promising target in NSCLC due to its critical role in various cellular processes, including cell growth, proliferation, angiogenesis, and tumor progression. 10 PDGFR belongs to the receptor tyrosine kinase family and is activated by binding to its ligands, primarily platelet-derived growth factors (PDGFs). 11 Upon activation, PDGFR triggers intracellular signaling cascades, such as the MAPK and PI3K pathways, which play pivotal roles in cell survival, migration, and angiogenesis. 12
Increasing evidence suggests that PDGFR signaling is dysregulated in NSCLC and is associated with aggressive tumor behavior, metastasis, and therapeutic resistance. Overexpression and activation of PDGFR have been observed in NSCLC patient samples and cell lines, making it an attractive therapeutic target. 13 Preclinical studies investigating the inhibition of PDGFR signaling in NSCLC have demonstrated promising results. 14 Inhibitors targeting PDGFR have shown antitumor activity and suppressed tumor growth in experimental models of NSCLC. 15 Moreover, PDGFR inhibitors have demonstrated the potential to enhance the efficacy of existing therapies and overcome resistance to conventional treatments, making them an appealing addition to current treatment regimens. 16
Clinical trials evaluating PDGFR-targeted therapies in NSCLC patients have shown encouraging outcomes, further supporting the potential utility of PDGFR inhibition in the clinical setting. 17 Among the PDGFR inhibitors, avapritinib has garnered increasing attention for its potent and selective activity against mutant forms of PDGFR, particularly PDGFRA D842V and KIT exon 17 D816V, which are associated with resistance to conventional tyrosine kinase inhibitors. 18
Avapritinib is an orally available small molecule kinase inhibitor developed by Blueprint Medicines Corporation. It is designed to specifically target and inhibit the activity of PDGFR and KIT receptors, which play critical roles in cell signaling and cancer development. Avapritinib is specifically intended for adults with unresectable or metastatic gastrointestinal stromal tumor (GIST). 19 Avapritinib's exceptional specificity for PDGFR and KIT mutant forms makes it a promising candidate for targeted therapy in NSCLC patients harboring these mutations. 20 Preclinical studies have demonstrated avapritinib's efficacy in inhibiting tumor growth and reducing angiogenesis in NSCLC models expressing PDGFRA D842V and KIT exon 17 D816V mutations. 21 Moreover, avapritinib's ability to overcome resistance to other tyrosine kinase inhibitors provides a unique advantage in treating NSCLC patients with limited therapeutic options, as a repurposed drug. 22 Clinical trials evaluating avapritinib in NSCLC patients with PDGFR alterations have shown promising results, with significant antitumor activity and improved overall survival rates. Notably, avapritinib's targeted approach minimizes off-target effects on normal cells, potentially reducing treatment-related toxicities compared to traditional chemotherapies. 23 Despite these encouraging findings, challenges remain in understanding the optimal patient selection, defining the appropriate combination therapies, and addressing potential resistance mechanisms.
In recent years, the convergence of computational approaches and molecular biology has led to groundbreaking advancements in drug discovery, particularly in the identification of potential anti-cancer agents. Machine Learning, a subset of artificial intelligence, has emerged as a powerful tool in the pharmaceutical industry for lead identification, offering accelerated and data-driven insights into novel drug candidates. Coupled with RDKit, a widely-used open-source cheminformatics toolkit, 24 these methodologies hold immense promise in the quest for more effective and targeted therapies for complex diseases like non-small cell lung cancer (NSCLC) through PDGFR targeting.
Machine Learning offers the advantage of processing large-scale datasets, uncovering intricate patterns, and predicting molecular properties with remarkable accuracy. In the context of lead identification, Machine Learning models are trained using diverse chemical libraries and biological data to discern molecular features associated with desired therapeutic activities. 25 These models can then be applied to screen vast chemical databases to prioritize promising drug candidates, accelerating the drug discovery process and reducing the associated costs and time. RDKit, on the other hand, provides a versatile and efficient toolkit for cheminformatics, allowing researchers to manipulate chemical structures, calculate molecular descriptors, and perform virtual screening and structure–activity relationship (SAR) analysis. Its seamless integration with Machine Learning workflows facilitates the rapid exploration of chemical space and the generation of predictive models that lead to rational drug design.
In this research paper, we focus on the application of Machine Learning and RDKit in lead identification for anti-cancer drug discovery, with a specific emphasis on PDGFR targeting in NSCLC. This is a preliminary step towards repurposing an existing PDGFR inhibitor designed for GIST as an NSCLC inhibitor, along with finding similar or better molecules.
Molecular fingerprints were generated using the open-source cheminformatics library RDKit. The libraries required for this work are pandas, numpy, matplotlib, and RDKit: pandas is used for reading CSV files, numpy for converting RDKit objects to numeric arrays, matplotlib for plotting, and RDKit for molecular fingerprint generation.
The K-Means clustering algorithm was chosen for its efficiency and its ability to handle high-dimensional, unlabelled data. K-Means assigns data points (here, compounds) to clusters such that the compounds within each cluster are more similar to each other than to those in other clusters. Its iterative procedure is computationally faster than other clustering algorithms such as Hierarchical Clustering and DBSCAN, making it well suited to high-throughput virtual screening.
The elbow method is used together with the K-Means clustering algorithm. K-Means is a widely used unsupervised machine learning method that divides a dataset into K separate, non-overlapping clusters, where K is chosen in advance. Each data point belongs to the cluster with the closest mean (centroid), and the objective is to minimize the sum of squared distances between the data points and their cluster's mean. Unlike supervised learning, K-Means does not need labeled data. The K centroids are initialized randomly and adjusted iteratively until they stop moving. The steps involved in K-Means clustering are as follows:
• Choose the number of clusters, K, for the dataset.
• Select K centroids at random from the dataset.
• Assign each point to its nearest centroid, using the Euclidean or Manhattan distance as the metric, to form K clusters.
• Compute the new centroid of each resulting cluster.
• Reassign every data point to its nearest new centroid and recompute the centroids, repeating until the centroids no longer move (convergence) or a predetermined number of iterations is reached.
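The steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the study: the function names, the fixed random seed, and the six one-dimensional Tanimoto-like values are all assumptions made for the example, and the Euclidean distance is used as the metric.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into `k` groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # pick K initial centroids at random
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:           # assignments unchanged: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                    # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids

# Hypothetical 1-D similarity values forming two obvious groups.
values = [(0.05,), (0.06,), (0.07,), (0.30,), (0.32,), (0.35,)]
labels, centroids = k_means(values, k=2)
```

For a library of about a million compounds, an optimized library implementation would normally be preferred; the sketch is only meant to make the assignment and update loop explicit.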
The key to this approach is determining the ideal number of clusters, and the elbow method is a frequently employed technique for finding the ideal K value. In the elbow method, the number of clusters (K) is varied, here from 1 to 20 depending upon the dataset, and the WCSS (Within-Cluster Sum of Squares) is calculated for each value of K. WCSS is the sum of the squared distances between each point and the centroid of its cluster. WCSS is largest when K = 1 and decreases as the number of clusters increases. When WCSS is plotted against K, the curve drops rapidly up to a certain point and then runs almost parallel to the x axis, producing an elbow shape. The K value at this bend is taken as the optimal number of clusters. The quality of the clusters produced by unsupervised methods such as K-Means is commonly assessed with the silhouette coefficient, or silhouette score. The silhouette score measures how similar each data point is to the other points in its own cluster compared with the points in the nearest neighboring cluster. It ranges from −1 to 1, with higher values indicating better-defined clusters.
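Both quantities described above can be computed directly from a set of points and their cluster labels. The sketch below is illustrative only: the helper names and the four sample values are assumptions, and scoring singleton clusters as 0 is one common convention rather than something stated in the text.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def wcss(points, labels):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(euclidean(p, centroid) ** 2 for p in members)
    return total

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)             # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical values: one cluster (K = 1) versus two clusters (K = 2).
points = [(0.05,), (0.07,), (0.30,), (0.34,)]
wcss_k1 = wcss(points, [0, 0, 0, 0])
wcss_k2 = wcss(points, [0, 0, 1, 1])
score_k2 = silhouette_score(points, [0, 0, 1, 1])
```

Evaluating `wcss` for a range of K values and plotting the results gives the elbow curve; the drop from `wcss_k1` to `wcss_k2` here is the kind of steep decrease seen before the bend.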
After generating molecular fingerprints, the application of the K-Means clustering algorithm is followed by validation to determine the primary cluster, with a focus on identifying lead compounds exhibiting the highest binding energy.
A Convolutional Neural Network (CNN) is structured with several layers, including the input layer, Convolutional layer, Pooling layer, and fully connected layers. 34
These layers work together in a hierarchical manner, with deeper layers learning increasingly abstract and complex features. Deep Convolutional Neural Networks (DCNNs) are particularly well suited for tasks like image recognition and computer vision due to their ability to automatically learn and represent hierarchical features within data. KDEEP, a protein–ligand affinity predictor based on DCNNs, was used for the virtual screening process. 33 Its results on the core test set of the standard PDBbind (v. 2016) 34 are state-of-the-art, with a Pearson's correlation coefficient of 0.82 and an RMSE of 1.27 pK units between experimental and predicted affinity. KDEEP is made available through https://PlayMolecule.org so users can quickly test their own protein–ligand complexes. Each prediction only takes a brief amount of time. 35 KDEEP is already in use as a scoring function in contemporary computational chemistry pipelines because of its speed, performance, and simplicity. For protein–ligand binding affinity prediction, the KDEEP program voxelizes the given compound structure into 8 different pharmacophoric-like features (hydrophobic, aromatic, hydrogen-bond donor and acceptor, positive and negative ionizable, metallic and total excluded volume). The voxelized representation is then used as input for a DCNN model, pre-trained on the PDBbind v.2016 database, to predict the binding poses and binding energies.
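The two benchmark metrics quoted above, Pearson's correlation coefficient and the RMSE in pK units, can be computed as in the following sketch. The five experimental and predicted pK values are invented for illustration and are not taken from PDBbind or from KDEEP output.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Made-up affinities in pK units, for illustration only.
experimental_pk = [4.0, 5.5, 6.2, 7.8, 9.1]
predicted_pk = [4.6, 5.1, 6.9, 7.2, 8.5]

r = pearson_r(experimental_pk, predicted_pk)
error = rmse(experimental_pk, predicted_pk)
```

A correlation close to 1 means the predictions rank compounds in nearly the same order as experiment, while the RMSE reports the typical size of the error in pK units.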
000 cubic Å s for PDGFRA in its apo state, complexed with Avapritinib and in complex with ZINC000002931631 at its binding site. During the equilibration process, van der Waals and short-range electrostatic interactions were cut off at 9 Å, while long-range electrostatic interactions were computed using the Particle Mesh Ewald method. 48 A RESPA integrator 49 with a time step of 2 fs was used, and long-range electrostatics were computed every 6 fs. The equilibration was carried out using the Desmond program in the NPT ensemble 50 at a temperature of 300 K and a pressure of 1 bar, using the Nose–Hoover chain thermostat together with the Martyna–Tobias–Klein barostat 51 with isotropic coupling, at relaxation times of 1 ps and 2 ps, respectively. Throughout the simulated timescale, all the simulations were conducted under the same temperature, pressure, and volume conditions. 52,53 As part of the simulation quality analysis, it was observed that the average total energy of the simulated system remained constant at −100500 kcal mol−1 in all simulations.
The Tanimoto coefficient, also known as the Jaccard similarity coefficient, is efficient for similarity comparisons involving binary data or binary feature vectors. Its main advantage is simplicity: it measures the similarity between two sets as the size of their intersection divided by the size of their union. This makes it inexpensive to compute, which matters when dealing with large datasets.
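As a minimal sketch of this definition, a binary fingerprint can be represented by the set of its on-bit positions. The bit positions below are hypothetical, and returning 0 for two empty fingerprints is a convention chosen for the example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0          # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical on-bit positions of a query and a candidate fingerprint.
query_bits = {1, 4, 7, 9, 12}
candidate_bits = {1, 4, 8, 12, 15, 20}

# 3 shared bits out of 8 distinct bits in total gives 0.375.
similarity = tanimoto(query_bits, candidate_bits)
```

The value is 1 for identical fingerprints and 0 when no bits are shared, so it can be used directly to rank library compounds against a query such as avapritinib.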
P (partition coefficient), and many others that describe the physical and chemical characteristics of a molecule. Quantum chemical descriptors: these are derived from quantum mechanical calculations and provide detailed electronic and structural information about a molecule. Pharmacophore fingerprints: these describe the molecular features necessary for a compound to interact with a biological target, such as protein–ligand interaction patterns. Molecular fingerprints are used in a wide range of applications, including drug discovery (virtual screening and QSAR modeling), similarity searching, clustering, and cheminformatics. In hybrid screening, we used the Morgan fingerprint and the 2D pharmacophore fingerprint because their ability to capture local structural information makes them useful for similarity searching and molecular clustering. Morgan fingerprints, 60 often referred to as Morgan circular fingerprints, are a type of molecular fingerprint used in cheminformatics and computational chemistry. These fingerprints are a variation of the Extended Connectivity Fingerprints (ECFP) and are designed to represent the structural features and connectivity patterns of molecules in a binary format, as shown in Fig. 2.
Pharmacophore fingerprints 61–63 are a type of molecular fingerprint used in cheminformatics and drug discovery. They are designed to capture the essential features of a molecule that are critical for its interaction with a biological target, such as a protein or enzyme as shown in Fig. 3. Pharmacophore fingerprints are widely used for ligand-based virtual screening, similarity searching, and pharmacophore modeling.
Here are some key points about pharmacophore fingerprints.
• A molecular fingerprint is calculated for every molecule (SMILES) in the dataset; two kinds of fingerprint data are generated, i.e., the Morgan fingerprint and the 2D pharmacophore fingerprint.
• Similarity metrics are used to measure the similarity between the query molecule and the list of compounds.
• The K-Means clustering algorithm is applied to filter out the required compounds.
• A Convolutional Neural Network (KDEEP) is applied to the filtered lead compounds to predict binding energy.
• Finally, molecular dynamics simulation is performed to propose lead compounds.
The main advantage of hybrid screening is that the time required to screen compounds is drastically reduced: traditional virtual screening takes about 100 times longer than the hybrid method to screen the same compounds. The hybrid approach could therefore become a standard method for identifying novel inhibitors for other diseases.
Molecular fingerprint data refers to a representation of a molecule's structure and properties in a format that can be used for various computational and analytical purposes. These fingerprints are essential in the fields of chemistry, bioinformatics, and drug discovery, among others. There are many types of molecular fingerprint; the two used in this study are the Morgan fingerprint and the pharmacophore fingerprint.
K-Means clustering was then applied to the Morgan and pharmacophore Tanimoto values to select the most similar SMILES. The elbow method was first applied to the Morgan fingerprint Tanimoto values to determine the number of clusters. In Fig. 6c, the x axis shows the number of clusters and the y axis the WCSS. The curve is nearly flat between 18 and 22 clusters, so 20 clusters were chosen. K-Means was then applied to the Morgan Tanimoto values with K = 20, generating the clusters shown in Fig. 6d. Fig. 7a shows a silhouette score of 0.57 for this clustering, which is considered an efficient clustering. The cluster number, count, and minimum and maximum Tanimoto values are shown in Table 1 and in Fig. 7b. The sixth cluster (cluster no. 5) has the highest Tanimoto values, ranging from 0.29 to 0.38, and contains 2197 entries, which were carried forward for further screening.
| Cluster | Count | Min | Max |
|---|---|---|---|
| 0 | 84 622 | 0.12 | 0.13 |
| 1 | 73 855 | 0.18 | 0.19 |
| 2 | 26 780 | 0.22 | 0.23 |
| 3 | 39 151 | 0.08 | 0.1 |
| 4 | 73 587 | 0.15 | 0.16 |
| 5 | 2197 | 0.29 | 0.38 |
| 6 | 34 411 | 0.21 | 0.22 |
| 7 | 59 637 | 0.1 | 0.11 |
| 8 | 83 599 | 0.13 | 0.14 |
| 9 | 16 624 | 0.23 | 0.25 |
| 10 | 87 920 | 0.17 | 0.18 |
| 11 | 21 203 | 0.06 | 0.08 |
| 12 | 91 496 | 0.14 | 0.14 |
| 13 | 59 584 | 0.19 | 0.2 |
| 14 | 76 334 | 0.11 | 0.12 |
| 15 | 7938 | 0.26 | 0.28 |
| 16 | 85 612 | 0.14 | 0.15 |
| 17 | 76 067 | 0.16 | 0.17 |
| 18 | 42 065 | 0.2 | 0.21 |
| 19 | 5889 | 0 | 0.06 |
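A per-cluster summary like Table 1, and the choice of the cluster with the highest Tanimoto values, can be derived from the similarity values and their cluster labels roughly as follows. The function names and the six sample values are hypothetical; the cluster labels merely echo the numbering style of the table.

```python
def summarize_clusters(values, labels):
    """Return {label: (count, min, max)} for the values in each cluster."""
    summary = {}
    for value, label in zip(values, labels):
        count, low, high = summary.get(label, (0, value, value))
        summary[label] = (count + 1, min(low, value), max(high, value))
    return summary

def most_similar_cluster(summary):
    """Label of the cluster whose maximum similarity value is highest."""
    return max(summary, key=lambda label: summary[label][2])

# Hypothetical Tanimoto values and the cluster label assigned to each.
values = [0.05, 0.12, 0.13, 0.31, 0.36, 0.02]
labels = [19, 0, 0, 5, 5, 19]

summary = summarize_clusters(values, labels)
best = most_similar_cluster(summary)   # cluster 5 holds the highest values here
```

Because K-Means on one-dimensional similarity values produces non-overlapping ranges, ranking clusters by their maximum or by their minimum selects the same cluster.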
K-Means was similarly applied to the pharmacophore fingerprint data, after first identifying the number of clusters with the elbow method. In Fig. 7c, the x axis shows the number of clusters and the y axis the WCSS. The curve is nearly flat between 15 and 19 clusters, so 17 clusters were chosen. K-Means was then applied to the pharmacophore Tanimoto values with K = 17; the resulting clustering is shown in Fig. 7d.
Fig. 7e shows a silhouette score of 0.57 for this clustering, which is considered an efficient clustering (the silhouette score ranges from −1 to 1, with higher values indicating better-defined clusters).
The K-Means clustering algorithm generated seventeen clusters for the pharmacophore data. The cluster number, count, and minimum and maximum Tanimoto values are shown in Table 2 and in Fig. 7f. The seventh cluster (cluster no. 6) has the highest Tanimoto values, ranging from 0.27 to 0.40, and contains 1958 entries, which were carried forward for further screening.
| Cluster | Count | Min | Max |
|---|---|---|---|
| 0 | 93 397 | 0.03 | 0.04 |
| 1 | 73 569 | 0.1 | 0.11 |
| 2 | 43 931 | 0.14 | 0.16 |
| 3 | 89 486 | 0.07 | 0.09 |
| 4 | 18 249 | 0.2 | 0.23 |
| 5 | 86 610 | 0 | 0.01 |
| 6 | 1958 | 0.27 | 0.4 |
| 7 | 93 278 | 0.05 | 0.06 |
| 8 | 53 014 | 0.13 | 0.14 |
| 9 | 35 441 | 0.16 | 0.18 |
| 10 | 82 135 | 0.09 | 0.1 |
| 11 | 92 748 | 0.01 | 0.03 |
| 12 | 9554 | 0.23 | 0.27 |
| 13 | 93 130 | 0.04 | 0.05 |
| 14 | 92 014 | 0.06 | 0.07 |
| 15 | 26 909 | 0.18 | 0.2 |
| 16 | 63 148 | 0.11 | 0.13 |
Using K-Means clustering, two clusters were selected: the sixth Morgan fingerprint cluster (cluster no. 5 in Table 1), containing 2197 SMILES, and the seventh pharmacophore fingerprint cluster (cluster no. 6 in Table 2), containing 1958 SMILES. Both are summarized in Table 3, where clusters are numbered starting from 1.
| S. no | Fingerprint | Tanimoto range | Number of SMILES | Cluster number |
|---|---|---|---|---|
| 1 | Morgan fingerprint | 0.29 to 0.38 | 2197 | 06 |
| 2 | Pharmacophore fingerprint | 0.27 to 0.40 | 1958 | 07 |
The graph, presented in Fig. 8a, provides a comparative analysis of Tanimoto similarity values between two distinct clusters: Morgan and Pharmacophore. The horizontal axis represents the index, sequentially identifying data points, while the vertical axis quantifies the Tanimoto similarity values. This visualization offers insights into the degree of similarity or dissimilarity between compounds within these clusters, aiding in the assessment of their structural relationships.
Fig. 8 (a) Comparative Tanimoto similarity analysis of Morgan and Pharmacophore clusters. (b) Comparative common SMILES Tanimoto similarity analysis of Morgan and Pharmacophore.
In this comparative analysis, we have examined two distinct clusters of chemical compounds. The first cluster comprises 2197 entries, each characterized by their SMILES representations and associated Tanimoto similarity values. The second cluster consists of 1958 entries, exclusively associated with the Pharmacophore fingerprint category. Importantly, a subset of common SMILES entries was identified from both the Morgan and Pharmacophore fingerprint datasets which is shown in Table 4 and Fig. 8b. The motivation behind this selection was to pinpoint the most relevant compounds, particularly those with potential for high binding affinity or biological activity. By focusing on these shared SMILES entries, this analysis seeks to highlight and prioritize compounds of significant interest for further exploration and investigation, potentially accelerating the process of identifying promising candidates in chemical and drug discovery research.
| S. no | Fingerprint | Range (min & max) | Number of common SMILES |
|---|---|---|---|
| 1 | Morgan fingerprint | 0.29 to 0.38 | 220 |
| 2 | Pharmacophore fingerprint | 0.27 to 0.40 | 220 |
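Selecting the SMILES common to the two clusters amounts to a set intersection. The sketch below uses four short, hypothetical SMILES strings and assumes that both lists hold SMILES in the same canonical form, since the comparison is a plain string match.

```python
def common_smiles(morgan_hits, pharmacophore_hits):
    """SMILES strings present in both hit lists, in sorted order."""
    shared = set(morgan_hits) & set(pharmacophore_hits)
    return sorted(shared)

# Hypothetical hit lists from the two selected clusters.
morgan_cluster = ["CCO", "c1ccccc1", "CC(=O)O"]
pharmacophore_cluster = ["c1ccccc1", "CCN", "CCO"]

common = common_smiles(morgan_cluster, pharmacophore_cluster)
```

If the two datasets were written by different tools, canonicalizing every SMILES with the same toolkit before intersecting would avoid missing matches that differ only in notation.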
Traditional virtual screening methods often require a significant amount of time to sift through large datasets, typically reducing compounds from hundreds of thousands to just thousands. However, the integration of similarity metrics and machine learning has revolutionized this process, significantly reducing the time required to filter compounds to a minimum.
The comprehensive list of compounds resulting from the virtual screening process can be found in the supplementary material. Among the 220 compounds evaluated, we identified the top 30 compounds which, along with the control compound avapritinib, displayed a binding affinity of −9.8 kcal mol−1 or stronger (more negative) during molecular docking studies. These top-ranking compounds were selected for further in-depth evaluation.
According to the docking results, Avapritinib docked at the binding site with a binding energy of −10.69 kcal mol−1 and a predicted IC50 value of 14.49 nM. ZINC000002931631 also bound successfully to the PDGFRA binding site, occupying the available space with a binding energy of −10.58 kcal mol−1 and a predicted IC50 value of 17.60 nM, suggesting that the inhibitory potential of our proposed compound ZINC000002931631 is very close to that of Avapritinib.
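The text does not state how the predicted IC50 values were derived from the binding energies. One standard relation that docking programs commonly use to estimate an inhibition constant is ΔG = RT ln Ki, and the sketch below applies it under the assumption of T = 298.15 K. The function name is illustrative, and the result is strictly an inhibition constant rather than an IC50.

```python
import math

GAS_CONSTANT = 1.987204e-3   # kcal mol^-1 K^-1

def inhibition_constant_nM(delta_g_kcal_mol, temperature_k=298.15):
    """Estimate Ki (in nM) from a binding free energy via dG = RT ln(Ki)."""
    ki_molar = math.exp(delta_g_kcal_mol / (GAS_CONSTANT * temperature_k))
    return ki_molar * 1e9    # convert mol/L to nmol/L

# Binding energies reported in the docking results above.
ki_avapritinib = inhibition_constant_nM(-10.69)
ki_zinc = inhibition_constant_nM(-10.58)
```

Under these assumptions the two estimates come out at about 14.6 nM and 17.6 nM, close to the reported 14.49 nM and 17.60 nM; the small differences are consistent with rounding of the binding energies to two decimal places.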
Fig. 9 shows that Avapritinib formed direct hydrogen bonds with LEU615, TYR676, SER917 and ASP973, while engaging in pi-stacking interactions with residues GLU675, ASP973, LEU615, LEU661, ARG585, TRP586, TYR676 and LYS623. Furthermore, van der Waals interactions involving the important residues SER616 and ASP973 were observed. In contrast, ZINC000002931631 formed a direct hydrogen bond with ASP973, while engaging in pi-stacking interactions with residues GLY652, PRO653, SER972, LEU615, LEU660, LEU661, ARG585 and TRP586. Its van der Waals interactions involved the important residues SER616, ARG585, THR649, MET648, ASN659, PHE969 and ASP973.
The protein's backbone RMSD fluctuated between 1.5 and 3.0 Å, with average values of 2.7, 2.5 and 2.2 Å for PDGFRA in its apo state, PDGFRA in complex with Avapritinib, and PDGFRA in complex with ZINC000002931631 at the binding site, respectively. Significant conformational changes in PDGFRA were observed during the initial and final 20 nanoseconds, particularly in the presence of the ZINC000002931631 compound complex at the binding site and in the apo state. These observations suggest that the ZINC000002931631 molecule induced conformational alterations in PDGFRA. Notably, at approximately 20 nanoseconds, a sudden increase in RMSD from 1.5 to 2.5 Å in the PDGFRA backbone was observed when complexed with the ZINC000002931631 molecule. Subsequently, the RMSD stabilized at around 2.5 Å with an average of 2.2 Å. It is interesting to note that the rise in PDGFRA RMSD, reaching up to 3.5 Å during the apo state simulation, was significantly attenuated when PDGFRA was complexed with Avapritinib and ZINC000002931631.
The protein's radius of gyration (ROG) exhibited fluctuations within the range of 20.1 to 20.9 Å. The average ROG values for PDGFRA in its apo state, PDGFRA in complex with Avapritinib, and PDGFRA in complex with ZINC000002931631 at the binding site were measured as 20.6 Å, 20.5 Å, and 20.4 Å, respectively. Notably, significant conformational changes in PDGFRA were observed during the initial and final 20 nanoseconds, particularly in the apo state. However, PDGFRA in combination with Avapritinib and ZINC000002931631 demonstrated relative stabilization, showing no significant fluctuations. These findings suggest that the flexibility of PDGFRA to undergo conformational changes is attenuated upon substrate molecule binding and stabilization within its binding pocket.
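For readers unfamiliar with the two stability metrics discussed above, the following sketch shows how RMSD and the radius of gyration are defined for a set of atomic coordinates. It is a simplified illustration with made-up coordinates: the RMSD function assumes the frame has already been superimposed on the reference, and equal masses are used when none are supplied.

```python
import math

def rmsd(reference, frame):
    """RMSD between two equally ordered coordinate sets (no superposition)."""
    total = sum(
        sum((r - f) ** 2 for r, f in zip(atom_ref, atom_frame))
        for atom_ref, atom_frame in zip(reference, frame)
    )
    return math.sqrt(total / len(reference))

def radius_of_gyration(coords, masses=None):
    """Mass-weighted root-mean-square distance of atoms from their center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)       # equal masses if none are given
    total_mass = sum(masses)
    center = [
        sum(m * atom[i] for m, atom in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    weighted = sum(
        m * sum((atom[i] - center[i]) ** 2 for i in range(3))
        for m, atom in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)

# Made-up coordinates in Å: every atom of the frame is shifted by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
backbone_rmsd = rmsd(reference, frame)

# Four atoms at unit distance from the origin.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
rog = radius_of_gyration(square)
```

In a real trajectory analysis each frame is first aligned to the reference structure, so that rigid-body translation such as the shift in this toy example does not contribute to the RMSD.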
During the course of the simulation, notable variations in the total number of intramolecular hydrogen bonds were observed. These hydrogen bonds play a pivotal role in governing the rigidity of the protein, thereby influencing its ability to undergo conformational changes, which is crucial for the protein's activity. Specifically, a higher number of intramolecular hydrogen bonds generally leads to increased rigidity in the protein structure, limiting its flexibility to adopt different conformations. This flexibility is known to be of critical importance for the proper functioning of the protein in its specific biological context. In our simulations, PDGFRA in its apo state maintained an average of approximately 100 intramolecular hydrogen bonds throughout the simulated timescale. In contrast, when PDGFRA was complexed with Avapritinib and ZINC000002931631, the average number of intramolecular hydrogen bonds increased significantly, with Avapritinib maintaining an average of 170 hydrogen bonds and ZINC000002931631 maintaining an average of 180 hydrogen bonds. These findings suggest that the presence of Avapritinib and ZINC000002931631 in complex with PDGFRA induces an increase in intramolecular hydrogen bonding, potentially leading to a higher level of rigidity in the protein's structure, thus limiting its functionality. Such stabilization of the protein's conformation may have functional implications, as it could modulate the activity of PDGFRA in response to ligand binding.
To further validate the inhibitory potential of Avapritinib and ZINC000002931631 against PDGFRA, we examined energy as a key parameter throughout the simulated timescale. Our analysis revealed that the average energy of PDGFRA in its apo state during the simulation was approximately −4300 kcal mol−1. In the presence of Avapritinib, PDGFRA exhibited a different energy profile, with the average energy remaining at approximately −5500 kcal mol−1. PDGFRA was found to be strongly inhibited by ZINC000002931631, as evident from its still lower energy, with an approximate average of −5800 kcal mol−1 (Fig. 11a). Moreover, the binding energy profiles of these compounds during the simulated timescale reveal that Avapritinib maintained an average binding energy of −35 kcal mol−1, whereas ZINC000002931631 exhibited −55 kcal mol−1 (Fig. 11b).
These findings indicate that our proposed lead compound ZINC000002931631 has strong inhibitory potential against PDGFRA and is, at the least, comparable with the FDA-approved PDGFRA inhibitor Avapritinib. The lower average energy observed in the presence of Avapritinib and ZINC000002931631 suggests that the binding of these compounds stabilizes the PDGFRA conformation or affects its interactions with the surrounding environment. The altered energy landscape resulting from compound binding could have functional implications, possibly influencing the functional activity or substrate binding capability of PDGFRA. Further investigations are warranted to elucidate the precise mechanisms underlying the observed energy changes and their relevance to the enzymatic function of PDGFRA. The analyses of RMSD, ROG, and intramolecular hydrogen bonds provided compelling evidence that specific conformational changes in the PDGFRA protein underlie the inhibitory potential of Avapritinib and ZINC000002931631. However, the precise regions of the protein responsible for these inhibitory conformational changes remain uncertain. To address this question, we conducted a thorough investigation of the secondary structural elements (SSE) of PDGFRA throughout the simulated timescale.
Fig. 12 illustrates the changes in a few alpha-helices and beta-sheets, particularly in regions adjacent to the initial 80–100, 150, and 270 residues. The significant changes observed in these SSEs suggest their involvement in the inhibitory action of the compounds. Notably, the overall SSE percentage increased from 40.89% in the apo state of the protein to 41.10% and 41.94% for PDGFRA in complex with Avapritinib and ZINC000002931631, respectively. These alterations in secondary structural elements are believed to be responsible for the observed fluctuations in RMSD, ROG, and intramolecular hydrogen bonds, particularly in the case of PDGFRA complexed with Avapritinib and ZINC000002931631 at the binding site. The conformational changes induced by these compounds seem to affect specific regions of the protein, leading to increased rigidity and inhibitory effects.
By shedding light on the dynamic changes in secondary structural elements of PDGFRA, our findings contribute to a deeper understanding of the molecular interactions and mechanisms underlying the inhibitory action of Avapritinib and ZINC000002931631 compounds on PDGFRA activity. These results may have implications for the design and development of novel targeted therapies against PDGFRA-related disorders. Further studies focusing on the specific interactions within the identified SSEs could provide valuable insights into the precise mode of inhibition and guide the rational design of more effective therapeutic agents.
Fig. 13 Molecular interactions observed between PDGFRA and Avapritinib during the simulated timescale.
On the other hand, in the simulation where ZINC000002931631 was complexed with PDGFRA, the compound remained tightly bound at the binding site throughout the 100 ns simulated timescale. Hydrophobic and water-bridged interactions played a more important role than direct hydrogen bonds (Fig. 14), which we believe led to conformational changes in PDGFRA, especially near the binding site, possibly facilitating substrate binding. For example, TRP586 interacted with ZINC000002931631 for about 90% of the simulation time, and other residues, especially PHE969, LEU661, SER972, LEU615, ARG617 and PRO653, interacted with ZINC000002931631 for approximately 20% of the simulation time on average. Among these, PHE969, SER972 and LEU661 formed water-mediated interactions with ZINC000002931631 for about 40% of the simulation time on average. These observations are in accordance with an earlier study showing that serine/threonine kinases have evolved large free-energy penalties (4–6 kcal mol−1) for adopting an inactive state relative to the active conformation when compared to tyrosine kinases (PDGFR and KIT), which suggests challenges in designing selective type-I tyrosine kinase inhibitors. 64 Thus, such computational methods can help to design highly selective inhibitors for challenging drug targets.
Fig. 14 Molecular interactions observed between PDGFRA and ZINC000002931631 during the simulated timescale.
This journal is © The Royal Society of Chemistry 2025