Open Access Article
This Open Access Article is licensed under a Creative Commons Attribution-Non Commercial 3.0 Unported Licence

Comprehensive report on biochemical, pharmacological, and pharmacokinetic properties of tool compounds relevant to human pathologies

Tikam Chand Dakal a, Joice K. Joseph b, Deepika c, Pawan Kumar Maurya c, Narendra Kumar Sharma d and Senthilkumar Rajagopal *b
a Genome and Computational Biology Lab, Mohanlal Sukhadia University, Udaipur 313001, Rajasthan, India
b Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru, Karnataka, India. E-mail: senthilanal@yahoo.com; Tel: +91-9566860390
c Department of Biochemistry, Central University of Haryana, Mahendergarh 123031, Haryana, India
d Department of Bioscience and Biotechnology, Banasthali Vidyapith, Tonk, Rajasthan, India

Received 15th December 2024, Accepted 15th March 2025

First published on 21st March 2025


Abstract

A tool compound is a reagent that acts as a selective small-molecule modulator of a protein's activity. It enables researchers to investigate the mechanistic and phenotypic aspects of the molecular target through various experimental approaches, such as biochemical analyses, cell-based assays, or animal studies. Life science research stands to gain significantly from research tools that are more accessible and more engaging to use, thereby facilitating hypothesis formation. Target identification and efficacy prediction require novel methodologies because new drug approvals are becoming less frequent while the expense of drug development is rising. In this review, we emphasize that collected chemical probe data offer researchers a comprehensive compilation of tool compounds. We also discuss the currently available tool compounds and highlight limitations in our capacity to selectively target specific biochemical processes by pharmacological means.


1. Introduction

The process of finding new drugs is drawn out and very complex; it requires significant financial resources and has a poor success rate. The efficiency of chemical compound synthesis has recently increased, enabling chemical libraries to generate and store vast amounts of diverse data. Despite tremendous progress in identifying compounds with therapeutic efficacy against targets and pathways through high-throughput screening of diverse molecules, the number of successfully identified molecular medications has not increased considerably over time.1 Pharmaceutical companies prioritize research based on several factors, including unmet medical needs, the patient populations that a new product may treat, ways to differentiate the new product from competitors, the price that a new product would command in various markets (mainly developed nations), the amount of money needed to bring the product to market, and advertising expenses. In recent years, the time and cost of bringing new products to market have increased for the pharmaceutical industry. In addition, factors such as complex regulatory processes, high development costs, and increased scrutiny from regulatory agencies have led to a reduction in new drug approvals.2

The pharmaceutical industry therefore faces significant challenges, which can be addressed by leveraging natural products and advanced technologies. More than one hundred novel products, mostly anti-infectives and anti-cancer medicines, are in clinical development. Molecular biological techniques are making novel compounds that can be produced quickly in bacteria or yeasts more accessible. Together with combinatorial chemistry approaches, these techniques have enabled researchers to create and access a wide range of natural products more effectively. Furthermore, incorporating these natural products into drug discovery efforts offers a way to accelerate the identification of novel therapies. Databases containing natural product information are also subjected to data mining and virtual screening techniques. It is envisaged that using natural products more effectively and efficiently can enhance the drug discovery process.3 Research on small molecules has made significant strides in the biological sciences. These substances are useful for examining biological processes because of their potency, cellular activity, and selectivity. To be of any utility, these compounds need to be properly characterized and known to researchers; they can then offer more cost-effective and targeted solutions to the barriers of traditional drug discovery, which has been hindered by inefficiencies in chemical synthesis and a poor success rate in bringing products to market.4

The pharmaceutical industry faces an ever-increasing financial burden for each product brought to market because of regulatory authorities' growing requirements for production, safety, and efficacy in product development.2 Natural products alone have been the most effective source of leads for drug discovery. Combinatorial chemistry approaches also build on natural product scaffolds to create screening libraries of drug-like compounds, and various screening strategies are being developed to make natural products easier to use in drug discovery campaigns.3

For much of the 20th century, drug development did not require knowledge of the molecular identity of the drug target, as it depended on observing the pharmacological effects of compounds. The current approach, by contrast, usually begins with identifying putative targets involved in the disease process. The number of possible therapeutic targets, roughly 500 in 1996, has risen significantly since the completion of the Human Genome Project.

Small drug-like compounds have therapeutic uses and make excellent reagents for life science research. One of the main reasons small molecules are such good research tools is that they are simple to work with and usually require little optimization in experiments. A molecule's usefulness as a research tool depends on its bioactivity and selectivity; additionally, it must elicit a strong, targeted cellular response. If a molecule is promiscuous or generically reactive, the induced phenotype cannot be reliably linked to a particular biological target, and the conclusions drawn will be erroneous.5 Chemical probes, also known as small-molecule tool compounds, are excellent research instruments that exhibit potent and selective biological effects. Some chemical probes, including rapamycin and JQ-1, have revolutionized our knowledge of the mechanistic target of rapamycin (mTOR) signaling pathway and the epigenetic regulation of gene expression, respectively.6

JQ-1, a thienotriazolodiazepine-based molecule, shows strong binding affinity for and selectively inhibits BRD4, a protein that binds acetylated lysine residues and belongs to the bromodomain and extra-terminal (BET) family, and it is widely used in cancer research. JQ-1 exhibits antitumor activity against NUT midline carcinoma (NMC), myeloid leukemia, lymphoblastic leukemia, multiple myeloma, solid tumors, lung adenocarcinoma, neuroblastoma and medulloblastoma. JQ-1 acts by occupying the acetyl-lysine binding pocket of BRD4, which displaces BRD4 from chromatin and leads to the downregulation of cancer-associated genes.7

Rapamycin, a bacterially derived natural product, has a remarkable history both as a medication with proven clinical effects in a range of disease settings and as a chemical probe for studying pathways related to cell growth control. It is a potent antifungal and immunosuppressive agent. Studies show that rapamycin binds to FK506-binding protein 12 (FKBP12) and that the resulting complex inhibits mTOR, which in turn reduces protein phosphorylation, cell cycle progression and cytokine production.8

The literature contains a wealth of knowledge about the actions of small molecules and biotherapeutics, and access to this knowledge can support many kinds of drug discovery analysis and decision-making. For instance, one could select compounds that are potentially active against a new target, choose tool compounds for probing targets or pathways of interest, identify potential off-target activities that could raise safety concerns, explain known side effects, or suggest new uses for old compounds. Other options are to analyze structure–activity relationships (SAR) for a compound series of interest, evaluate absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties in vivo, or build predictive models. Because basic research on disease mechanisms continues to shift from the commercial to the public sector, access to this information is particularly crucial.9 Phenotypic screening is a popular method for finding targets and leads and for linking pathway regulation to chemistry. Although it has demonstrated its worth by identifying more than half of small-molecule new molecular entities, it makes target identification difficult once hits are found. Individual small molecules, or designed sets of molecules, with well-understood mechanisms of action (MoA) can be employed as chemical probes to monitor the phenotypic effect of target modulation, thereby facilitating phenotypic drug discovery and systematic target validation. A high-quality tool compound can also be extremely important as a positive control for assay development, such as signal-to-noise optimization, or to support preclinical in vivo target validation in a drug discovery project.10

Phenotypic tool compounds should meet the following requirements: (a) they should be potent and selective on target in both cell-free and cell-based assays; (b) they should show exposure at the site of action or cell permeability; (c) they should have proven utility as a probe, i.e., phenotypic relevance shown via a demonstrated proximal biomarker; and (d) they should be available. Extracting the information needed to choose the best tool compound from large databases is challenging.11 Tool compounds relevant to human pathologies and their modes of action are listed in Table 1.

Table 1 Tool compounds relevant to human pathologies and mode of action
S. no. | Description | Name | Associated diseases | Mode of action | Ref.
1 | Anticancer | Antitumoral phortress | Breast, ovarian and renal cancer | Activates AhR signaling and causes induction of cytochrome P450 activity | 48 and 49
2 | Immunomodulation | Tryptophan-based IDO1 inhibitors | Cancer | N-Formyl-L-kynurenine by indoleamine-2,3-dioxygenase 1 (IDO1) has tumor-mediated immune suppression | 50
3 | Antibacterial | Inhibitors of ECF transporters |  | Targeting the energy-coupling factor (ECF) transporters | 51
 |  |  | Obesity and metabolic disorders | Targeting free fatty acid receptors, FFA1-4 in hypothalamus | 52


Like any reagent used in experiments, small-molecule tool compounds must undergo quality control (QC) before use. Tool compounds are valuable only if they possess high potency, exhibit established selectivity, and have a well-documented mechanism of action (MOA). Continued study of a target's pharmacology, and the verification of its therapeutic potential, rely heavily on the suitability and quality of the available pharmacological probes or tool molecules. Following this overview of the functions and desirable properties of pharmacological tool compounds, this article examines the specific compounds that have been, or are presently being, used as tools to investigate targets in diverse in vitro and in vivo contexts. A small-molecule tool compound must satisfy the following criteria to be deemed suitable for supporting target validation (Fig. 1).


Fig. 1 Applications of tool compounds. A small-molecule tool compound suitable for supporting target validation satisfies criteria such as efficacy, selectivity, target engagement within cellular systems (shown, for example, by Cellular Thermal Shift Assay, CETSA), target validation, and defined involvement in biological, cellular and signaling pathways.

A. Efficacy: A tool compound should be sufficiently potent to allow the experimental hypothesis to be tested empirically. Its potency should be determined by at least two orthogonal methods, such as biochemical assays and surface plasmon resonance (SPR).

B. Selectivity: Selectivity for the target of interest can be assessed by screening against closely related members of the target family and by using broad pharmacology panels, such as those provided by Eurofins, DiscoveRx, and CEREP. Off-target screening must be conducted at a concentration consistent with that used in the target validation experiments. Assessing the degree of selectivity poses a significant challenge in classifying chemical probes: novel targets are continually being discovered for well-established medications, and most bioactive compounds probably bind several targets with strong affinity.12 The requirement that chemical probes be more than 30-fold selective over alternative targets can be seen as a disadvantage for compounds that have been tested extensively across many proteins. However, this requirement effectively eliminates promiscuous compounds and makes it more reliable to link a phenotype to the modulation of a specific target. More advanced selectivity metrics will make it easier to identify valuable chemical research tools.13

C. Target Engagement: The compound shows evidence of target engagement within cellular systems, for example in a Cellular Thermal Shift Assay (CETSA).

D. Target Validation: The compound is used in target validation experiments at a concentration appropriate to its half-maximal inhibitory concentration (IC50) or half-maximal effective concentration (EC50) at the specific target under investigation. The experimental design includes an inactive control, i.e., a structurally related compound that is inactive at the target. The purity and structural integrity of the compound batch are verified.

E. Defined Involvement in Biological, Cellular and Signaling Pathways: There is empirical evidence of biochemical consequences further downstream in the biological pathway that result from modulating the specific target.
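The 30-fold selectivity requirement described under criterion B can be illustrated with a short calculation. The sketch below is a minimal illustration, assuming that selectivity is expressed as the ratio of the off-target IC50 to the on-target IC50; the function names, the panel entries and the numerical values are hypothetical and are not taken from any cited study.

```python
def selectivity_fold(target_ic50_nm, off_target_ic50_nm):
    """Fold selectivity: off-target IC50 divided by on-target IC50 (same units)."""
    if target_ic50_nm <= 0 or off_target_ic50_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return off_target_ic50_nm / target_ic50_nm


def passes_selectivity(target_ic50_nm, off_target_ic50s_nm, min_fold=30.0):
    """True if the compound is more than `min_fold`-fold selective over every off-target.

    Note: an empty panel trivially passes, so absence of data is not evidence
    of selectivity and should be checked separately.
    """
    return all(
        selectivity_fold(target_ic50_nm, ic50) > min_fold
        for ic50 in off_target_ic50s_nm.values()
    )


# Hypothetical panel data in nM, for illustration only.
panel = {"related_protein_A": 900.0, "related_protein_B": 4500.0}
print(passes_selectivity(10.0, panel))                          # 90-fold and 450-fold
print(passes_selectivity(10.0, {"related_protein_A": 200.0}))   # only 20-fold
```

A ratio of IC50 values is only one possible metric; as noted under criterion B, more advanced selectivity measures may discriminate better between compounds.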

The challenges faced in pharmacology push the boundaries of current technologies and demand holistic solutions grounded in artificial intelligence (AI), synthetic biology and personalized therapies. In addressing these challenges, future pharmacology should ensure that these innovations are sustainable and available to diverse populations at an affordable cost. By 2050, gold standards for using tool compounds in pharmacology are expected to be sophisticated, data-driven and highly patient-centric across the stages of drug design, development, and drug response assessment, and in accelerating novel treatments. These standards will leverage the integration of AI, machine learning (ML), quantum computing and personalized medicine. The use of advanced tools is intended to ensure the safety, reliability, efficacy and accuracy of new therapies.

2. Challenges in identification, characterization and applications for intended use

Without a doubt, the process of finding new drugs is labor-intensive, costly, time-consuming, and complex. As a result, significant work has been done to expedite and facilitate the underlying procedures. Notably, the development of computational approaches and scientific advances in protein structure elucidation paved the way for the emergence of virtual screening and rational drug discovery. Virtual screening is an in silico method that searches huge chemical libraries for compounds likely to be active on a specific target. It can be carried out using structural information about the target of interest (structure-based virtual screening) or known ligands for that target (ligand-based virtual screening). Structure-based virtual screening uses modeled or experimental 3D protein structures to look for compounds whose shapes and other characteristics are complementary to protein binding sites. Ligand-based virtual screening instead relies on the similarity principle, which states that similar compounds are likely to have similar bioactivities, to find small molecules that resemble the known active ligands.14
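The similarity principle behind ligand-based virtual screening can be sketched in a few lines. The example below is a minimal illustration, assuming that each compound is represented by a binary fingerprint (here a set of integer feature bits) and that similarity is measured with the Tanimoto coefficient, a commonly used choice. The fingerprints, compound names and the 0.5 threshold are hypothetical; a real workflow would compute fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_library(query_fp, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature bit.
known_active = {1, 4, 7, 9, 12, 15}
library = {
    "cmpd_A": {1, 4, 7, 9, 12, 20},   # shares 5 bits; 7 distinct bits overall
    "cmpd_B": {2, 3, 5, 8},           # shares no bits with the query
    "cmpd_C": {1, 4, 7, 9, 12, 15},   # identical to the query
}
for name, score in rank_library(known_active, library):
    print(f"{name}: {score:.2f}")
```

In this toy library, cmpd_C and cmpd_A are retained as candidates and cmpd_B is discarded; the threshold is a tunable assumption rather than a fixed standard.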

The virtual screening process uses computational methods to predict potentially bioactive compounds from large libraries of small molecules. Owing to the ongoing development, improvement, and availability of in silico methodologies, virtual screening is growing in popularity in the drug discovery sector.15 Both public and private organizations use virtual screening to reduce laboratory resource consumption, as many of these techniques are user-friendly. Nonetheless, the methods used in virtual screening workflows are frequently limited to those with which the research team is familiar. Additionally, even though the software is often simple to use, each methodology has several pitfalls that must be avoided to prevent erroneous results or artefacts.

The generation of quality tool compounds requires substantial resources and expertise. It is therefore crucial to acknowledge that many tool compounds documented in the literature have not been verified, meaning that they may lack selectivity for the intended target or may not bind effectively to the claimed target. When assessing the validity of a literature tool compound, it is advisable to adopt a skeptical stance if any of the following characteristics are observed:

(a) There is no evidence of orthogonal confirmation of binding.

(b) The provided information solely consists of results from a proliferation assay.

(c) SAR (structure–activity relationships) is absent or merely tracks lipophilicity.

(d) The compound contains functionality that is well documented and widely recognized as characteristic of pan-assay interference compounds (PAINS).16

Frequent-hitter structures, i.e., structures that appear as hits at high frequency across a given dataset, are a further warning sign, as is a lack of selectivity data or of other supporting information.

3. In silico ADME/T study for drug discovery

There is no guarantee that a molecule will work as intended in vivo, even if it binds the target of interest specifically and its activity is verified in vitro. The substance must be absorbed by the organism and transported to the target tissue before it is metabolized and eliminated. The term ADME (Absorption, Distribution, Metabolism, and Excretion) properties refers to the characteristics of a compound that modulate one of these stages and affect its in vivo activity. These characteristics can be used to assess a compound's drug-likeness, that is, how closely it resembles a real drug and can therefore be handled by the organism as one.17 It is increasingly understood that a successful medicine is one that appropriately balances potency, efficacy, safety, and good pharmacokinetics. One difficult problem in drug development is that, because of the intricacy of the drug-body interaction and individual variation in the response to drug perturbation, the knowledge needed to design medications is never sufficient. ADME/T studies, which cover absorption, distribution, metabolism, excretion, and toxicity, are now performed at a much earlier stage of the discovery process to prevent late-stage failures of new drug candidates.18

To assess the ADME/T characteristics of a single molecule in advance, faster, easier, and more affordable in-silico approaches must be developed. Because it offers a simple, high-throughput approach to enhance screening and testing capabilities by concentrating only on promising compounds, the in-silico prediction of ADME/T characteristics presents an appealing substitute for experimental measurements. This helps to reduce the time and cost associated with the drug discovery process.19

These computational approaches can be used to predict and optimize pharmacokinetics and pharmacodynamics during drug development and thereby improve the success rate of preclinical and clinical trials. These studies help to identify potential issues such as poor absorption, metabolic instability and drug interactions. Drug candidates can be optimized for oral bioavailability by applying Lipinski's Rule of Five, and further optimized for tissue distribution and metabolic stability and to minimize toxicity.20 Lipinski's rule of five is a pioneering physicochemical filter that relates a drug molecule's physicochemical parameters to its pharmacokinetic properties. It is therefore a method for estimating the likely oral bioavailability of a drug, and it offers a convenient first check on whether a bioactive molecule is a plausible oral drug candidate. The rule states that a molecule is more likely to be an orally active drug if it meets the following criteria:

1. Not more than five hydrogen bond donors.

2. Not more than 10 hydrogen bond acceptors.

3. Molecular mass is less than 500 Da.

4. log P (octanol–water partition coefficient) of less than 5.

Compounds that violate these rules often have issues with solubility, permeability, or absorption, and thus may not be suitable for oral administration.21–23
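The four criteria listed above translate directly into a simple filter. The sketch below is a minimal illustration that takes precomputed descriptors as input and uses the thresholds exactly as stated in this section; the class and function names are hypothetical, and the descriptor values in the example are approximate or invented for illustration rather than taken from a cited source.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass_da: float
    log_p: float


def lipinski_violations(d):
    """Return a list describing which of the four criteria the compound violates."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if d.molecular_mass_da >= 500:
        violations.append("molecular mass not below 500 Da")
    if d.log_p >= 5:
        violations.append("log P not below 5")
    return violations


# Approximate descriptor values for a small aspirin-like molecule (illustrative).
small_molecule = Descriptors(h_bond_donors=1, h_bond_acceptors=4,
                             molecular_mass_da=180.16, log_p=1.2)
# Invented values for a large, lipophilic compound (illustrative).
large_molecule = Descriptors(h_bond_donors=6, h_bond_acceptors=12,
                             molecular_mass_da=650.0, log_p=5.8)

print(lipinski_violations(small_molecule))       # no violations expected
print(len(lipinski_violations(large_molecule)))  # violates all four criteria
```

Returning the list of violated criteria, rather than a single pass/fail flag, makes it easier to see which property would need to be optimized.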

4. Evidence-based and quantitative prioritization

Potent and selective chemical agents with clearly defined targets can help elucidate the biological mechanisms underlying phenotypes observed in phenotypic screens. Nevertheless, identifying such compounds in numbers large enough to assemble screening sets is challenging. To avoid the repeated use of non-selective published compounds, a systematic approach to prioritizing probes is imperative. Wang and coauthors conducted a meta-analysis of a comprehensive collection of diverse bioactivity data to develop a quantitative criterion, the tool score (TS), that can be used to rank tool compounds for specific targets systematically. The TS was subsequently evaluated on many compounds by analyzing their activity patterns in a panel of 41 cell-based pathway assays. Their study shows that high-TS tools exhibit more consistent and specific phenotypic profiles than compounds with lower TS values. In addition, they examined commonly tested compounds that are non-selective and distinguished polypharmacology within a particular target family from promiscuity across several target families. The TS can therefore be used to prioritize compounds from diverse databases for phenotypic screening.

5. Database

5.1. ChemBank

ChemBank (https://chembank.broad.harvard.edu) is a public web-based informatics environment created jointly by the Chemical Biology Program and Platform at the Broad Institute of Harvard and MIT. This knowledge environment contains publicly available data obtained from small molecules and small-molecule screens, together with resources for analyzing these data. ChemBank stands out among small-molecule databases for its meticulous definition of screening experiments in terms of statistical hypothesis testing, its metadata-based organization of screening experiments into projects comprising collections of related assays, and its commitment to preserving raw screening data.24 One of ChemBank's goals is the adoption of uniform terminology and standards to facilitate the exchange and management of chemical genetic data. The expanding community of industry and academic scientists interested in exchanging data will be invited to provide data on the structures and functions of small molecules using predetermined criteria. ChemBank is being developed to investigate the fundamental principles of biological networks and to make it easier to identify the proteins with which small molecules found in cellular and organismal experiments interact.25

ChemBank contains raw experimental data from high-throughput biological assays, chemical structures and names, computed molecular descriptors, human-curated biological information about small-molecule activities, and a wealth of metadata about screening studies. Although there are numerous other freely accessible small-molecule and drug databases, three key features set ChemBank apart: (i) its commitment to preserving raw screening data; (ii) its exact description of screening experiments in terms of statistical hypothesis testing; and (iii) its hierarchical metadata-based grouping of similar assays into screening projects.26

5.2. ChEMBL bioactivity database

One of the most important of the many publicly accessible databases of chemical structure and bioactivity is the ChEMBL database, which is offered as part of the extensive suite of life-science informatics resources at EMBL-EBI. Crucially, ChEMBL enables the scientific community to address significant scientific questions, many of which concern health.27 ChEMBL is a manually curated database of bioactivity information about small drug-like compounds. The data held in ChEMBL are publicly accessible and updated regularly.28 Related databases include PubChem BioAssay, BindingDB, GuideToPharmacology, and DrugBank. One may argue that the growth and popularity of these databases have democratized drug discovery, as well as chemical biology and computational medicinal chemistry. A select few commercial organizations are no longer the only ones with access to high-quality data on a large scale for data-driven assessments of polypharmacology, bioisosteric replacements, chemogenomics, drug repurposing, and predictive modelling.29

ChEMBL is a sizable, publicly accessible drug discovery database that aims to gather medicinal chemistry information and knowledge from across the pharmaceutical research and development process. Information on small molecules and their biological activity is taken from full-text articles in several leading medicinal chemistry journals. This information is combined with information on approved drugs and clinical development candidates, including their mechanisms of action and therapeutic indications. Additionally, bioactivity data are exchanged with other databases, such as BindingDB and PubChem BioAssay, so that users can access even more data. The resulting database can be used for many practical purposes, such as finding chemical tools for a target of interest, evaluating compound selectivity, training machine learning models (for target prediction, for example), helping to develop hypotheses for new drug uses, determining target tractability, and integrating with other drug discovery tools.30

Small-molecule tool compounds have facilitated significant advances in life science research. These compounds possess high potency, cellular activity, and selectivity, making them well suited for investigating biological processes. To be useful, they must be accurately characterized, and researchers must be aware of their properties. The ChEMBL bioactivity database has been mined in an unbiased manner to extract high-quality tool compounds, yielding an annotated data set of 407 best-in-class compounds for 278 protein targets.31 Furthermore, informatics functionalities were established alongside a web application to facilitate data visualization and automate the generation of pharmacological hypotheses. These functionalities were used to predict inhibitors of the Chromobox Protein Homologue 5 (CBX5)-mediated gene repression pathway, which currently lacks suitable inhibitors. The accuracy of the predictions was then confirmed using a highly specific cell-based assay, which identified novel chemical modulators of CBX5-mediated heterochromatin formation. This dataset and its corresponding services will help researchers make the most of these valuable compounds.

5.3. PubChem BioAssay

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public information resource that archives the chemical structures and biological properties of small molecules and siRNA reagents. Since 2004, it has been hosted by the National Center for Biotechnology Information (NCBI), a branch of the National Library of Medicine at the National Institutes of Health. The PubChem BioAssay database houses bioactivity data from medicinal chemistry research and high-throughput screening, as well as several dozen high-throughput RNAi screens against entire genomes. Because these data are linked to the other NCBI resources, PubChem is a popular public information platform for chemical biology and drug discovery research.32 The database aims to give the scientific community an open-access resource for finding experimental high-throughput screening (HTS) bioactivity data on chemical substances. The National Institutes of Health (NIH) initially provided the small-molecule HTS data. Today, the database collects information from over 700 sources, including governmental bodies, prestigious research institutions, chemical suppliers, and other biochemical databases. These sources account for over 260 million bioactivity data points reported in small-molecule assays and RNA interference reagent screening projects.33

Since its launch, PubChem BioAssay has proven to be a dependable and frequently searched public database that offers easy access and direct downloads of information on every biological assay, including the chemical characteristics and bioactivities of every tested molecule, as well as comprehensive screening protocols, input data, and assay results. The two search options, search and advanced, provide a useful tool for gathering and analyzing data by enabling a methodical and comprehensive examination of the assays submitted to the database based on various criteria, such as assay type, target type, or quantity of highlighted compounds.34,35

PubChem is a highly popular chemistry information portal for biomedical research communities in many disciplines, including cheminformatics, chemical biology, medicinal chemistry, and drug development, with millions of unique users each month. Significantly, PubChem is a source of big data in chemistry and is used in numerous machine learning and data science initiatives for drug repurposing, computational toxicology, virtual screening, and other applications. The information contained in PubChem is gathered from hundreds of data sources and arranged into several data collections, such as Pathway, Gene, Protein, Substance, Compound, BioAssay, and Patent. Substance archives the chemical data provided by the individual data sources, while Compound stores the unique chemical structures derived from Substance through chemical structure standardization. Assay data providers deposit descriptions of biological assays and their test results in BioAssay. Substance ID (SID), Compound ID (CID), and Assay ID (AID) are the record identifiers (IDs) used in Substance, Compound, and BioAssay, respectively.36 BioAssay, Compound, and Substance are the three main public databases that make up PubChem, an open archive. It includes details on various chemical entities, such as siRNA and miRNA, small molecules, lipids, carbohydrates, and chemically modified amino acid and nucleic acid sequences.37
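The record identifiers described above (CID and AID) are what programmatic queries are keyed on. The sketch below is a minimal illustration that only builds request URLs for PubChem's PUG REST interface and performs no network access; the helper function names are hypothetical, and the CID and AID in the example are used purely for illustration (CID 2244 is the PubChem record for aspirin). Fetching and parsing the responses is left out.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(cid, properties):
    """Build a PUG REST URL requesting computed properties for one Compound ID (CID)."""
    props = ",".join(quote(p) for p in properties)
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{props}/JSON"


def assay_description_url(aid):
    """Build a PUG REST URL requesting the description of one Assay ID (AID)."""
    return f"{PUG_REST}/assay/aid/{int(aid)}/description/JSON"


# Properties relevant to the rule-of-five criteria for CID 2244.
print(compound_property_url(
    2244, ["MolecularWeight", "XLogP", "HBondDonorCount", "HBondAcceptorCount"]))
# Description of an arbitrary assay record, chosen only as an example.
print(assay_description_url(1000))
```

Keeping URL construction separate from the network call makes the identifiers explicit and lets the requests be rate-limited or cached as the service's usage policy requires.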

5.4. BindingDB

BindingDB (https://www.bindingdb.org), launched on the web in 2000, was the first publicly available database of measured protein–ligand affinity data. It is intended to facilitate both broad analyses that capitalize on the comprehensiveness of a sizable and expanding data set and access to focused data sets, such as affinity data linked to a specific drug target.38 Initially, the design of BindingDB, which came from an academic setting, concentrated on small molecules reported to be active against targets for which 3D structural data were available.39 It contains about 20 000 measurements, making it one of the largest publicly available datasets of protein–ligand binding affinities, and it keeps growing. Currently, data collection focuses on targets whose three-dimensional structures are listed in the Protein Data Bank (PDB) or can be modeled accurately. These data are particularly interesting because they lend themselves to structural analysis and can be used to create and validate computer models of binding.40
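
Affinity measurements of the kind collected in BindingDB are usually converted to a logarithmic scale or to a free energy before they are used to build or validate binding models. The following Python sketch illustrates the two standard conversions; it is not taken from BindingDB itself. It assumes that values are supplied in nanomolar units, that the standard state is 1 M, and that the temperature is 298.15 K, and the function names are hypothetical. The free-energy relation ΔG° = RT ln(Kd) applies to dissociation constants only, not to IC50 values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def p_affinity(value_nm: float) -> float:
    """Negative log10 of an affinity value given in nM (e.g. pKd or pKi)."""
    if value_nm <= 0:
        raise ValueError("affinity values must be positive")
    return -math.log10(value_nm * 1e-9)  # convert nM to M before taking the log


def binding_free_energy(kd_nm: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol: dG = R*T*ln(Kd / 1 M)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_nm * 1e-9)


if __name__ == "__main__":
    # Illustrative inputs, not measurements from any database.
    print(round(p_affinity(10.0), 2))          # pKd for Kd = 10 nM
    print(round(binding_free_energy(1.0), 2))  # dG for Kd = 1 nM, in kcal/mol
```

Under these assumptions a tenfold change in Kd shifts the p-scale value by one unit and the free energy by about 1.4 kcal/mol, which is why the logarithmic forms are preferred when comparing ligands.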

5.5. DrugBank

DrugBank (https://www.drugbank.ca) is a web-enabled database that offers extensive molecular data on drugs, their mechanisms, interactions, and targets. Since its initial release in 2006, DrugBank has evolved alongside significant changes in drug research and development requirements and advances in online standards, and it has been used extensively for in silico drug target identification, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction, and general pharmaceutical education. As a clinically focused drug encyclopedia, DrugBank offers comprehensive, current, quantitative, analytical, or molecular-scale information on pharmaceuticals, therapeutic targets, and the physiological or biological effects of drug action. As a chemically oriented drug database, it offers numerous built-in tools for viewing, sorting, searching, and extracting text, image, sequence, or structure data.41 DrugBank is an extensive, publicly accessible online database that includes complete drug, drug–target, drug action, and drug interaction data on both FDA-approved drugs and experimental drugs going through the FDA approval process. Its comprehensive, high-quality, primary-sourced content has made it one of the world's most popular drug reference sites, used frequently by educators, pharmacologists, medicinal chemists, pharmacists, pharmaceutical researchers, and the pharmaceutical industry.42 As a web-based bioinformatics/cheminformatics resource, DrugBank integrates extensive drug target data with thorough drug data. Its main purpose is to aid computer-based drug and target discovery. Pharmacists and pharmaceutical researchers also use DrugBank as a comprehensive online reference because it electronically catalogs nearly all known drugs and therapeutic targets.43

5.6. ChemSpider

ChemSpider is owned by the Royal Society of Chemistry (RSC). Searching and extracting small result sets are free, but the whole database can be downloaded only under license. ChemSpider provides services to enhance submitted data through user corrections, additional annotations, and integration with user applications. The same applies to ChemSpider SyntheticPages, which offers peer review, semantic enhancement, and reactions with citable URLs. Another distinctive feature is its direct links to structures in RSC journals.44 ChemSpider is a highly useful online database of known compounds that can be used to identify such compounds in natural product, commercial, environmental, and forensic samples. In addition to being a search engine built on top of terabytes of chemical data, ChemSpider is a community of chemists who work together to improve and curate the database by sharing their knowledge, expertise, and data. In this respect, ChemSpider resembles Wikipedia in promoting community involvement and contributions.45

The databases reviewed above hold most of the records related to chemical structures and constitute a wide range of resources for drug discovery and chemical biology. The National Center for Biotechnology Information hosts PubChem, a portal focused on chemistry. Drawing on hundreds of data sources worldwide, PubChem offers information on chemicals as well as proteins, genes, pathways, and more.46 ChemBank is an openly accessible cheminformatics database. It houses data on small molecules and small-molecule screens, together with resources for analyzing these data, and was created by the Chemical Biology Program and Platform of the Broad Institute of Harvard and MIT. ChEMBL is a database of bioactive, drug-like small molecules. It includes 2D structures, computed properties (such as log P, molecular weight, and Lipinski parameters), and abstracted bioactivities (such as binding constants, pharmacology, and ADMET data). PubChem is a free database of small molecules and their biological activities. It is maintained by the National Center for Biotechnology Information, a division of the National Library of Medicine at the National Institutes of Health (NIH) in the United States, and it is linked to NIH PubMed/Entrez data. The DrugBank database integrates extensive drug target information with detailed drug (i.e., chemical, pharmacological, and pharmaceutical) data.47
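
The Lipinski parameters mentioned above for ChEMBL refer to the rule of five (ref. 21 and 22), which flags likely poor oral absorption when a compound has a molecular weight above 500, a log P above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The following Python sketch shows how such a check could be applied to precomputed descriptors. The class and function names are hypothetical, the descriptor values are invented for illustration rather than taken from any database, and this is not how ChEMBL itself computes its properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one compound (illustrative container)."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient (log P)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """List which of the four rule-of-five thresholds a compound exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.log_p > 5:
        violations.append("log P > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


if __name__ == "__main__":
    # Hypothetical descriptor values, chosen only to exercise the thresholds.
    small = Descriptors(mol_weight=320.4, log_p=2.1, h_bond_donors=2, h_bond_acceptors=5)
    large = Descriptors(mol_weight=780.9, log_p=6.3, h_bond_donors=3, h_bond_acceptors=12)
    print(rule_of_five_violations(small))
    print(rule_of_five_violations(large))
```

Returning the list of violated criteria, rather than a single pass/fail flag, leaves the caller free to decide how many violations to tolerate.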

The number of openly available online databases that support the chemistry community has increased dramatically in recent years. Chemistry data are available online for computer modeling, mining, and system integration in support of drug discovery. However, the data must be of high quality so that time is not wasted on pointless searches, models rest on reliable data, and inaccurate records do not undermine the improved discoverability of online resources.

6. Conclusion

The chemical probe data collection described in this study offers researchers a comprehensive compilation of tool compounds. The study also discusses the currently available tool compounds and highlights the limits of our capacity to selectively target specific biochemical processes by pharmacological means. Together with the computational tools described in this review, this dataset will help researchers maximize the efficacy of these valuable chemical compounds.

Author contributions

T.C.D. – review conception, draft manuscript preparation, manuscript reviewing and editing. J.K.J. – draft manuscript preparation. D.K. – draft manuscript preparation. Pawan Kumar Maurya – draft manuscript preparation, reviewing. Narendra Kumar Sharma – draft manuscript preparation, reviewing and editing. Senthilkumar Rajagopal – review conception, manuscript reviewing and editing.

Data availability

The data that support the findings in this study are available upon reasonable request.

Conflicts of interest

The authors declare no conflict of interest.

References

  1. P. Li, Y. Fu and Y. Wang, Network based approach to drug discovery: a mini review, Mini-Rev. Med. Chem., 2015, 15(8), 687–695,  DOI:10.2174/1389557515666150219143933.
  2. F. Destro and M. Barolo, A review on the modernization of pharmaceutical development and manufacturing – Trends, perspectives, and the role of mathematical modeling, Int. J. Pharm., 2022, 620, 121715,  DOI:10.1016/j.ijpharm.2022.121715.
  3. K. Dzobo, The Role of Natural Products as Sources of Therapeutic Agents for Innovative Drug Discovery, Comp. Pharmacol., 2022, 408–422,  DOI:10.1016/B978-0-12-820472-6.00041-4.
  4. M. W. Y. Southey and M. Brunavs, Introduction to small molecule drug discovery and preclinical development, Front. Drug Discovery, 2023, 3,  DOI:10.3389/fddsv.2023.1314077.
  5. C. H. Arrowsmith, J. E. Audia, C. Austin, J. Baell, J. Bennett, J. Blagg, C. Bountra, P. E. Brennan, P. J. Brown, M. E. Bunnage, C. Buser-Doepner, R. M. Campbell, A. J. Carter, P. Cohen, R. A. Copeland, B. Cravatt, J. L. Dahlin, D. Dhanak, A. M. Edwards, M. Frederiksen, S. V. Frye, N. Gray, C. E. Grimshaw, D. Hepworth, T. Howe, K. V. Huber, J. Jin, S. Knapp, J. D. Kotz, R. G. Kruger, D. Lowe, M. M. Mader, B. Marsden, A. Mueller-Fahrnow, S. Müller, R. C. O'Hagan, J. P. Overington, D. R. Owen, S. H. Rosenberg, B. Roth, R. Ross, M. Schapira, S. L. Schreiber, B. Shoichet, M. Sundström, G. Superti-Furga, J. Taunton, L. Toledo-Sherman, C. Walpole, M. A. Walters, T. M. Willson, P. Workman, R. N. Young and W. J. Zuercher, The promise and peril of chemical probes, Nat. Chem. Biol., 2015, 11(8), 536–541,  DOI:10.1038/nchembio.1867.
  6. M. J. Kling, C. N. Griggs, E. M. McIntyre, G. Alexander, S. Ray, K. B. Challagundla, S. S. Joshi, D. W. Coulter and N. K. Chaturvedi, Synergistic efficacy of inhibiting MYCN and mTOR signaling against neuroblastoma, BMC Cancer, 2021, 21(1), 1061,  DOI:10.1186/s12885-021-08782-9.
  7. S. De, R. Sahu, S. Palei and L. N. Nanda, Synthesis, SAR, and application of JQ1 analogs as PROTACs for cancer therapy, Bioorg. Med. Chem., 2024, 112, 117875,  DOI:10.1016/j.bmc.2024.117875.
  8. R. T. Abraham, J. J. Gibbons and E. I. Graziani, Chapter 17 – Chemistry and Pharmacology of Rapamycin and Its Derivatives, Enzymes, 2010, 27, 329–366,  DOI:10.1016/S1874-6047(10)27017-8.
  9. A. Gaulton, L. J. Bellis, A. P. Bento, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, D. Michalovich, B. Al-Lazikani and J. P. Overington, ChEMBL: a large-scale bioactivity database for drug discovery, Nucleic Acids Res., 2012, 40(D1), D1100–D1107,  DOI:10.1093/nar/gkr777.
  10. D. C. Swinney, Phenotypic vs. target-based drug discovery for first-in-class medicines, Clin. Pharmacol. Ther., 2013, 93(4), 299–301,  DOI:10.1038/clpt.2012.236.
  11. Y. Wang and J. L. Jenkins, Quantitative Prioritization of Tool Compounds for Phenotypic Screening, Methods Mol. Biol., 2018, 1787, 195–206,  DOI:10.1007/978-1-4939-7847-2_15.
  12. R. S. Jha, A. K. Shukla, A. Kumari and A. Kumar, Virtual screening of potential orally active anti-bacterial compounds of finger millet, Int. J. Plants Res., 2024,  DOI:10.1007/s42535-024-01051-7.
  13. F. Miljković and J. Bajorath, Data-Driven Exploration of Selectivity and Off-Target Activities of Designated Chemical Probes, Molecules, 2018, 23(10), 2434,  DOI:10.3390/molecules23102434.
  14. M. E. Bragina, A. Daina, M. A. S. Perez, O. Michielin and V. Zoete, The SwissSimilarity 2021 Web Tool: Novel Chemical Libraries and Additional Methods for an Enhanced Ligand-Based Virtual Screening Experience, Int. J. Mol. Sci., 2022, 23(2), 811,  DOI:10.3390/ijms23020811.
  15. A. Mokashi and N. M. Bhatia, Integrated Network Ethnopharmacology, Molecular Docking, and ADMET Analysis Strategy for Exploring the Anti-Breast Cancer Activity of Ayurvedic Botanicals Targeting the Progesterone Receptor, BIO Integr., 2024, 5(1),  DOI:10.15212/bioi-2024-0066.
  16. J. Sun, H. Zhong, K. Wang, N. Li and L. Chen, Gains from no real PAINS: Where ‘Fair Trial Strategy’ stands in the development of multi-target ligands, Acta Pharm. Sin. B, 2021, 11(11), 3417–3432,  DOI:10.1016/j.apsb.2021.02.023.
  17. T. Scior, A. Bender, G. Tresadern, J. L. Medina-Franco, K. Martínez-Mayorga, T. Langer, K. Cuanalo-Contreras and D. K. Agrafiotis, Recognizing pitfalls in virtual screening: a critical review, J. Chem. Inf. Model., 2012, 52(4), 867–881,  DOI:10.1021/ci200528d.
  18. Y. Wang, J. Xing, Y. Xu, N. Zhou, J. Peng, Z. Xiong, X. Liu, X. Luo, C. Luo, K. Chen, M. Zheng and H. Jiang, In silico ADME/T modeling for rational drug design, Q. Rev. Biophys., 2015, 48(4), 488–515,  DOI:10.1017/S0033583515000190.
  19. W. Zhou, Y. Wang, A. Lu and G. Zhang, Systems Pharmacology in Small Molecular Drug Discovery, Int. J. Mol. Sci., 2016, 17(2), 246,  DOI:10.3390/ijms17020246.
  20. S. S. Khuzwayo, M. Selepe, D. Meyer and N. H. Gama, The synthesis and investigation of novel 3 benzoylbenzofurans and pyrazole derivatives for anti-HIV activity, RSC Med. Chem., 2025, 1–36,  DOI:10.1039/D4MD00844H.
  21. C. A. Lipinski, F. Lombardo, B. W. Dominy and P. J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Delivery Rev., 2001, 46(1–3), 3–26,  DOI:10.1016/s0169-409x(00)00129-0.
  22. C. A. Lipinski, Lead- and drug-like compounds: the rule-of-five revolution, Drug Discovery Today: Technol., 2004, 1(4), 337–341,  DOI:10.1016/j.ddtec.2004.11.007.
  23. S. Dewanjee, P. Paul, T. K. Dua, S. Bhowmick and A. Saha, Big Leaf Mahogany Seeds: Swietenia macrophylla Seeds Offer Possible Phytotherapeutic Intervention Against Diabetic Pathophysiology, Nuts and Seeds in Health and Disease Prevention, 2020, ch. 38, pp. 543–565,  DOI:10.1016/B978-0-12-818553-7.00038-3.
  24. K. P. Seiler, G. A. George, M. P. Happ, N. E. Bodycombe, H. A. Carrinski, S. Norton, S. Brudz, J. P. Sullivan, J. Muhlich, M. Serrano, P. Ferraiolo, N. J. Tolliday, S. L. Schreiber and P. A. Clemons, ChemBank: a small-molecule screening and cheminformatics resource database, Nucleic Acids Res., 2008, 36, D351–D359,  DOI:10.1093/nar/gkm843.
  25. R. L. Strausberg and S. L. Schreiber, From knowing to controlling: a path from genomics to drugs using small molecule probes, Science, 2003, 300(5617), 294–295,  DOI:10.1126/science.1083395.
  26. N. Tolliday, P. A. Clemons, P. Ferraiolo, A. N. Koehler, T. A. Lewis, X. Li, S. L. Schreiber, D. S. Gerhard and S. Eliasof, Small molecules, big players: the National Cancer Institute's Initiative for Chemical Genetics, Cancer Res., 2006, 66(18), 8935–8942,  DOI:10.1158/0008-5472.CAN-06-2552.
  27. B. Zdrazil, E. Felix, F. Hunter, E. J. Manners, J. Blackshaw, S. Corbett, M. de Veij, H. Ioannidis, D. M. Lopez, J. F. Mosquera, M. P. Magarinos, N. Bosc, R. Arcila, T. Kizilören, A. Gaulton, A. P. Bento, M. F. Adasme, P. Monecke, G. A. Landrum and A. R. Leach, The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods, Nucleic Acids Res., 2024, 52(D1), D1180–D1192,  DOI:10.1093/nar/gkad1004.
  28. M. M. Nowotka, A. Gaulton, D. Mendez, A. P. Bento, A. Hersey and A. Leach, Using ChEMBL web services for building applications and data processing workflows relevant to drug discovery, Expert Opin. Drug Discovery, 2017, 12(8), 757–767,  DOI:10.1080/17460441.2017.1339032.
  29. G. Papadatos, A. Gaulton, A. Hersey and J. P. Overington, Activity, assay and target data curation and quality in the ChEMBL database, J. Comput. Aided Mol. Des., 2015, 29(9), 885–896,  DOI:10.1007/s10822-015-9860-5.
  30. D. Mendez, A. Gaulton, A. P. Bento, J. Chambers, M. De Veij, E. Félix, M. P. Magariños, J. F. Mosquera, P. Mutowo, M. Nowotka, M. Gordillo-Marañón, F. Hunter, L. Junco, G. Mugumbate, M. Rodriguez-Lopez, F. Atkinson, N. Bosc, C. J. Radoux, A. Segura-Cabrera, A. Hersey and A. R. Leach, ChEMBL: towards direct deposition of bioassay data, Nucleic Acids Res., 2019, 47(D1), D930–D940,  DOI:10.1093/nar/gky1075.
  31. K. V. Butler, I. A. MacDonald, N. A. Hathaway and J. Jin, Report and Application of a Tool Compound Data Set, J. Chem. Inf. Model., 2017, 57(11), 2699–2706,  DOI:10.1021/acs.jcim.7b00343.
  32. Y. Wang, J. Xiao, T. O. Suzek, J. Zhang, J. Wang, Z. Zhou, L. Han, K. Karapetyan, S. Dracheva, B. A. Shoemaker, E. Bolton, A. Gindulyte and S. H. Bryant, PubChem's BioAssay Database, Nucleic Acids Res., 2012, 40, D400–D412,  DOI:10.1093/nar/gkr1132.
  33. V. K. Tran-Nguyen and D. Rognan, Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement, Int. J. Mol. Sci., 2020, 21(12), 4380,  DOI:10.3390/ijms21124380.
  34. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne, The Protein Data Bank, Nucleic Acids Res., 2000, 28(1), 235–242,  DOI:10.1093/nar/28.1.235.
  35. V. D. Hahnke, S. Kim and E. E. Bolton, PubChem chemical structure standardization, J. Cheminf., 2018, 10(1), 36,  DOI:10.1186/s13321-018-0293-8.
  36. S. Kim, Exploring Chemical Information in PubChem, Curr. Protoc., 2021, 1(8), 217,  DOI:10.1002/cpz1.217.
  37. S. Kim, P. A. Thiessen, T. Cheng, B. Yu, B. A. Shoemaker, J. Wang, E. E. Bolton, Y. Wang and S. H. Bryant, Literature information in PubChem: associations between PubChem records and scientific articles, J. Cheminf., 2016, 8, 32,  DOI:10.1186/s13321-016-0142-6.
  38. M. K. Gilson, T. Liu, M. Baitaluk, G. Nicola, L. Hwang and J. Chong, BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology, Nucleic Acids Res., 2016, 44(D1), D1045–D1053,  DOI:10.1093/nar/gkv1072.
  39. A. M. Wassermann and J. Bajorath, BindingDB and ChEMBL: online compound databases for drug discovery, Expert Opin. Drug Discovery, 2011, 6(7), 683–687,  DOI:10.1517/17460441.2011.579100.
  40. T. Liu, Y. Lin, X. Wen, R. N. Jorissen and M. K. Gilson, BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities, Nucleic Acids Res., 2007, 35, D198–D201,  DOI:10.1093/nar/gkl999.
  41. D. S. Wishart, C. Knox, A. C. Guo, D. Cheng, S. Shrivastava, D. Tzur, B. Gautam and M. Hassanali, DrugBank: a knowledgebase for drugs, drug actions and drug targets, Nucleic Acids Res., 2008, 36, D901–D906,  DOI:10.1093/nar/gkm958.
  42. D. S. Wishart and A. Wu, Using DrugBank for In Silico Drug Exploration and Discovery, Curr. Protoc. Bioinf., 2016, 54, 14,  DOI:10.1002/cpbi.108.
  43. D. S. Wishart, In silico drug exploration and discovery using DrugBank, Curr. Protoc. Bioinf., 2007, ch. 14, Unit 14.4,  DOI:10.1002/0471250953.bi1404s18.
  44. C. Southan, Caveat Usor: Assessing Differences between Major Chemistry Databases, ChemMedChem, 2018, 13(6), 470–481,  DOI:10.1002/cmdc.201700724.
  45. J. L. Little, A. J. Williams, A. Pshenichnov and V. Tkachenko, Identification of “known unknowns” utilizing accurate mass data and ChemSpider, J. Am. Soc. Mass Spectrom., 2012, 23(1), 179–185,  DOI:10.1007/s13361-011-0265-y.
  46. T. Cheng, T. Ono, M. Shiota, I. Yamada, K. F. Aoki-Kinoshita and E. E. Bolton, Bridging glycoinformatics and cheminformatics: integration efforts between GlyCosmos and PubChem, Glycobiology, 2023, 33(6), 454–463,  DOI:10.1093/glycob/cwad028.
  47. K. Azzaoui, E. Jacoby, S. Senger, E. C. Rodríguez, M. Loza, B. Zdrazil, M. Pinto, A. J. Williams, V. de la Torre, J. Mestres, M. Pastor, O. Taboureau, M. Rarey, C. Chichester, S. Pettifer, N. Blomberg, L. Harland, B. Williams-Jones and G. F. Ecker, Scientific competency questions as the basis for semantically enriched open pharmacological space development, Drug Discovery Today, 2013, 18(17–18), 843–852,  DOI:10.1016/j.drudis.2013.05.008.
  48. H. Ohashi, K. Nishioka, S. Nakajima, S. Kim, R. Suzuki, H. Aizaki, M. Fukasawa, S. Kamisuki, F. Sugawara, N. Ohtani, M. Muramatsu, T. Wakita and K. Watashi, The aryl hydrocarbon receptor-cytochrome P450 1A1 pathway controls lipid accumulation and enhances the permissiveness for hepatitis C virus assembly, J. Biol. Chem., 2018, 293(51), 19559–19571,  DOI:10.1074/jbc.RA118.005033.
  49. B. Itkin, A. Breen, L. Turyanska, E. O. Sandes, T. D. Bradshaw and A. I. Loaiza-Perez, New Treatments in Renal Cancer: The AhR Ligands, Int. J. Mol. Sci., 2020, 21(10), 3551,  DOI:10.3390/ijms21103551.
  50. K. Tang, Y. H. Wu, Y. Song and B. Yu, Indoleamine 2,3-dioxygenase 1 (IDO1) inhibitors in clinical trials for cancer immunotherapy, J. Hematol. Oncol., 2021, 14(1), 68,  DOI:10.1186/s13045-021-01080-8.
  51. A. Shams, S. Bousis, E. Diamanti, W. A. M. Elgaher, L. Zeimetz, J. Haupenthal, D. J. Slotboom and A. K. H. Hirsch, Expression and characterization of pantothenate energy-coupling factor transporters as an anti-infective drug target, Protein Sci., 2024, 33(11), e5195,  DOI:10.1002/pro.5195.
  52. N. R. V. Dragano, E. Milbank, R. Haddad-Tóvolli, P. Garrido-Gil, E. Nóvoa, M. F. Fondevilla, V. Capelli, A. M. Zanesco, C. Solon, J. Morari, L. Pires, Á. Estevez-Salguero, D. Beiroa, I. González-García, O. Barca-Mayo, C. Diéguez, R. Nogueiras, J. L. Labandeira-García, E. Rexen Ulven, T. Ulven, M. Claret, L. A. Velloso and M. López, Hypothalamic free fatty acid receptor-1 regulates whole-body energy balance, Mol. Metab., 2024, 79, 101840,  DOI:10.1016/j.molmet.2023.101840.

This journal is © The Royal Society of Chemistry 2025