Sara Masarone,†a Katie V. Beckwith,†b Matthew R. Wilkinson,†a Shreshth Tuli,†a Amy Lane,a Sam Windsor,†a Jordan Lane†*a and Layla Hosseini-Gerami†*a
aIgnota Labs Ltd, Cambridge, UK. E-mail: jordan.lane@ignotalabs.ai
bYusuf Hamied Department of Chemistry, University of Cambridge, UK
First published on 6th January 2025
Modern drug discovery projects are plagued with high failure rates, many of which have safety as the underlying cause. The drug discovery process involves selecting the right compounds from a pool of possible candidates to satisfy some pre-set requirements. As this process is costly and time consuming, finding toxicities at later stages can result in project failure. In this context, the use of existing data from previous projects can help develop computational models (e.g. QSARs) and algorithms to speed up the identification of compound toxicity. While clinical and in vivo data continue to be fundamental, data originating from organ-on-a-chip models, cell lines and previous studies can accelerate the drug discovery process, allowing for faster identification of toxicities and thus saving time and resources.
Safety concerns halt 56% of projects which, after efficacy, makes safety the largest contributor to project failure.6 Despite safety being the single most important factor in determining a drug's chances of approval, safety assessment is often neglected until the late stages of the discovery timeline.7 There are significant barriers preventing safety from becoming an early priority. For example, although safety assessments can be carried out using in vitro systems, the cost and time burden associated with these experiments, as well as the sheer number of potential toxicity endpoints to screen against, makes conventional large-scale testing impossible. This is especially true for small or medium BioTech companies with limited resources. Instead, strategic decisions must be made, selecting limited numbers of compounds and endpoints for testing. This narrow approach increases the risk of overlooking toxic effects that will ultimately halt the project further down the development timeline. Furthermore, in vitro tests do not fully capture the interactions a drug makes in living organisms (in vivo).8 In vivo models offer better translation to clinical observations compared to in vitro, but translation from pre-clinical species to human findings is still far from perfect.9 In addition, in vivo studies' inherent reliance on animal testing is expensive and raises significant ethical concerns. In a bid to enable large-scale yet clinically relevant toxicity screening, in silico approaches offer a promising solution to address the limitations of wet lab and animal testing. To fully realise their potential, in silico solutions require careful implementation to foster widespread adoption and trust.
Artificial Intelligence (AI) has seen a surge in popularity, with data-driven models delivering state-of-the-art performance across tasks previously thought to be possible only with manual involvement. In drug discovery, this unlocks a vast wave of potential across the entire lifecycle of pharmaceuticals.10 By definition, AI learns from prior experience to make informed predictions on a given task. In traditional wet lab experimentation, negative data from failed projects is typically archived and ignored. There is value in this failed project data, however: the experience and relationships it uncovered can be carried forward, and integrating it with AI can inform decisions on where to target practical efforts in the future. Crucially, to be useful in practice, these models must be robust enough to accurately generalise to novel chemical structures.
This perspective article highlights the recent advancements in predictive toxicology and their potential impact on safety assessments in drug research and development. The article explores the concept of in silico toxicology and the benefits it brings compared to traditional approaches. To present a comprehensive perspective on the field, the utilisation of AI and Machine Learning (ML) is examined specifically focusing on its integration with systems biology, ‘omics’ data, and cell painting techniques for advancing predictive toxicology. In addition, the challenges that limit the applicability of these methods in practice are discussed. This includes limited data availability, representative chemical space coverage, and difficulties in predicting in vivo responses. This article also provides perspectives on how the challenges can be best addressed to advance the field.
The financial burden is also an important driver, with costs escalating due to the need for extensive laboratory testing, large-scale clinical trials, and the deployment of specialised personnel and resources. This results in the cost of failure being substantial; for every successful drug, numerous candidates fail at various stages, leading to sunk costs that must be absorbed by pharmaceutical companies. These failures often occur late in the development process, particularly during clinical trials, where safety and efficacy issues frequently emerge, leading to the termination of projects after significant investment.
Incentives and decisions from governments and policy makers are also driving the adoption of data-driven technology. One such example is the FDA's forward-looking initiative – FDA 2.0.12 This encourages the adoption of advanced technologies to streamline drug approval processes. The initiative aims to modernise regulatory frameworks, making them more adaptable to innovative methodologies like AI, ultimately facilitating faster and more efficient drug development cycles. One key focus is not only to embrace new technologies, but also to address the ethical issues surrounding drug discovery, with a particular focus on animal testing.
A paper recently released by the FDA discussed the use of ML to screen and design compounds to accelerate de novo drug design and to elucidate drug-target interactions. The Center for Drug Evaluation and Research (CDER) AI Steering Committee was also established to facilitate and coordinate the use of AI in the pharmaceutical industry. The committee aims to facilitate the creation of frameworks in collaboration with other partners or companies and ultimately guide the use of ML in this field. The FDA discussion paper also touches upon important aspects to consider when developing ML models, such as data bias, the ethics around the use of AI in the clinic, transparency and explainability.13
From a legislation perspective, another crucial motivator for the use of AI in drug discovery is the Inflation Reduction Act, which imposes cost containment measures on pharmaceutical companies. This legislation enforces controlled price inflation, which has a profound impact on the pharmaceutical landscape. Although a welcome relief for patients, control of drug pricing has a knock-on effect for research and development efforts within pharma, an area with $83 billion of spend in 2019.14 Pricing controls will impact R&D spending as well as stakeholder decisions around market strategy and intellectual property controls. This is especially impactful for early-stage assets, where the risk of failing to achieve a significant return on investment is much larger. With more constrained budgets, the opportunity for both risk and cost reduction from AI methods is an ever more critical lifeline for pharmaceutical development. AI technologies offer a tangible solution by accelerating the drug discovery process, reducing costs, as compounds can be screened using in silico technologies and predictive modelling, and making the entire process more efficient.10
Recent advances in in vitro toxicity assessment aim to improve physiological relevance and include the use of spheroid and organ-on-a-chip technologies. Growing cells in 3D environments (rather than as a 2D layer on a plate) allows the cells to develop better intercellular and cell-matrix communication, which strongly influences the physiological attributes of individual cells in vivo.15 For example, a study compared the response of 2D HepG2 (an immortal human hepatocyte cell line) and 3D cultured spheroids to a range of liver toxicants, finding that the 3D system was more representative of the in vivo liver response.16
Despite their improved in vivo relevance over 2D cultures, 3D cultures lack the microenvironmental complexity and precise control over physiological conditions that organ-on-a-chip systems offer. These microfluidic devices replicate the structural and functional units of human organs, allowing for accurate simulation of human responses to drugs and chemicals under physiologically relevant conditions.17 This technology also enables real-time analysis of cellular responses which is important as toxicity responses are often time dependent.18
Although in vitro assays ideally aim to represent the underlying biology, there are frequently complexities and dynamics that they cannot capture, despite the advances in spheroid and microfluidics technology. Complexities around the cell line background, species, immortalisation of cancer cell lines and lab-to-lab variability can all affect the quality and reproducibility of these in vitro results.19 Despite these limitations, without the reductionist approach of in vitro assays, determining a mechanism of action would be impossible. Mitochondrial toxicity is a prominent example of this. A range of clinical toxicity issues are caused by mitochondrial toxicity,20 but without probing the underlying cellular processes, attributing mitochondrial toxicity as the root cause is impossible.
In vivo studies are crucial because they provide a more comprehensive and realistic view of how a compound behaves in a complex biological system. Unlike in vitro assays, which are optimised for speed and reproducibility but may lack certain biological complexities, in vivo models offer data that is more predictive of human responses. This makes them an essential step in the drug development process, as they can uncover potential issues that might not be evident in simpler models.21
Despite their advantages, in vivo studies come with significant ethical and practical limitations. There is a strong consensus that animal testing should be minimised, and all efforts should be made to find alternatives – termed the “3Rs” of reduction, replacement and refinement.22 Additionally, while in vivo models provide valuable insights, they often face criticism for not perfectly mimicking human diseases or toxicological responses due to differences in species homology.23 An analysis of pre-clinical and phase I trial data on 108 oncology drugs showed a poor correlation between animal and human outcomes (positive predictive value = 0.65).24 The FDA's mandate for testing in two non-human species underscores the uncertainty regarding the relevance of animal models to human biology. Furthermore, the complexity of whole-organism studies makes it challenging to pinpoint specific mechanisms of toxicity, necessitating more extensive studies with increased animal and compound numbers, thus increasing time and costs. These limitations highlight the need for clinical data, which provide the highest level of relevance and accuracy in assessing human responses to new drugs.
Human clinical trials are indispensable because they offer the most accurate and direct assessment of how a drug will perform in the target population. While in vitro and in vivo models are essential for preliminary testing and risk reduction, they cannot fully replicate the complexity of human biology. Clinical trials provide comprehensive data on human-specific factors such as co-morbidities, interactions with other medications, and individual variations in metabolism, sex, ethnicity, and lifestyle. This level of detail is crucial for determining the real-world safety and efficacy of a new drug.
Despite being the gold standard, the complexity of clinical trials,25 with varied patient-specific factors and population-level differences, makes it challenging to identify root causes of observed effects without the support of in vitro and in vivo experiments. The interplay of these factors underscores the necessity of preceding preclinical studies to support and interpret clinical data accurately.
AI models built from in vitro data can create a platform that facilitates mechanistic understanding via in silico analysis. This approach, despite being limited by the lack of extensive data, allows for a rapid, iterative assessment of potential toxicophores across a broad range of specific biological endpoints. By using this broad assessment sweep, areas of interest can be identified for further focus. For compounds with notable effects predicted in silico, predictions can be supplemented by in vivo data, allowing for a more comprehensive picture of a drug's toxicity profile. This step-by-step progression ensures judicious use of resources whilst upholding the rigour of analysis that traditional lab-based testing offers.
More recent works have improved the throughput of in vivo systems, offering an opportunity to build AI models on data that was previously too limited in size. The literature has also shown an increased capacity to harness data from novel platforms such as organ-on-a-chip and 3D cell culture systems. Leveraging these approaches allows users to merge the richness of in vivo system data with the scalability of in vitro studies.26 By validating the outputs of more controlled models against in vivo-translatable biology, more robust, translatable AI models can be developed for efficient screening with high-quality validation.
Owing to the swift insights from in vitro data and the comprehensive perspective from in vivo data, establishing a dynamic in silico feedback system allows researchers to iterate and refine their hypotheses and experiments in near real-time. This will diminish redundancy and enhance the pace of safety evaluation and consequently, the wider drug discovery process.
(1) Read across – uses toxicity data from well-characterised compounds to directly infer the effects of structurally related, untested compounds, based on the principles of chemical similarity.
(2) Structural alerts – identifies specific chemical structures/substructures (called ‘alerts’) that are known to be associated with toxicological outcomes. If a compound contains one of these alerts, it may exhibit the associated toxicity.
(3) Quantitative structure-activity relationships (QSAR) – establish a relationship between chemical features and observed biological activity to construct a mathematical model that predicts compound effects (a minimal sketch follows this list).
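To make approaches (2) and (3) concrete, the sketch below pairs a SMARTS-based structural alert check with a minimal fingerprint-based QSAR classifier, assuming RDKit and scikit-learn are available. The alert pattern, SMILES strings and toxicity labels are illustrative placeholders rather than curated data.

```python
# Minimal sketch of a structural alert check (2) and a QSAR classifier (3).
# The alert pattern, molecules and labels below are illustrative placeholders.
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

# (2) Structural alert: flag nitroaromatic substructures (illustrative pattern)
NITROAROMATIC = Chem.MolFromSmarts("c[N+](=O)[O-]")

def has_alert(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.HasSubstructMatch(NITROAROMATIC)

# (3) QSAR: Morgan fingerprints as features, hypothetical toxicity labels as target
def featurise(smiles_list):
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        fps.append(list(AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)))
    return fps

train_smiles = ["CCO", "c1ccccc1[N+](=O)[O-]", "CC(=O)Nc1ccc(O)cc1", "CCCCCl"]
train_labels = [0, 1, 0, 1]  # hypothetical toxic (1) / non-toxic (0) labels

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(featurise(train_smiles), train_labels)

query = "O=[N+]([O-])c1ccc(Cl)cc1"  # illustrative query compound
print("alert fired:", has_alert(query))
print("predicted P(toxic):", model.predict_proba(featurise([query]))[0][1])
```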
Although it is important to consider that in silico toxicology includes all of these methods, this article will focus specifically on the use of AI for predicting toxicity. ML exists as a subsection of AI; in this article, the term ML will refer to any algorithm that uses data to learn a specific task. Although ML technically also encompasses Deep Learning (DL) – an area of ML that specifically uses neural networks – the term ML in this article will not be used to refer to neural network-based methods, and DL will be referred to separately. When AI is discussed, it will encompass both ML and DL methods.
Both ML and DL have seen an explosion of interest across the drug discovery process. Successful applications include molecular property prediction,27 synthesis design,28 protein structure elucidation29 and smart manufacturing of pharmaceutical products.30 Although the applications beyond toxicity are out of scope for this article, it is important to consider how significant adopting AI methods will be for the drug discovery industry as a whole.31 The primary value added when utilising AI models for predicting toxicity comes from unlocking safety evaluation data from the moment the structure of a drug is chosen. Unlike in vitro and in vivo tests, in silico models do not require compound synthesis. While these models require high-quality, curated data for both training and validation, they can still be particularly advantageous in early drug development where sample quantities are at a premium due to synthesis being challenging and expensive. In silico screening from the beginning of the drug discovery timeline also allows for significant cost savings as molecules likely to exhibit toxic behaviour are not progressed through the necessary development stages to reach the point of wet lab pre-clinical toxicity assessments.32 Thus, there is a clear financial advantage to avoiding sunk costs due to failed projects at this stage.
Aside from financial incentives, AI methods offer far superior throughput compared with laboratory experimentation. Collecting results from an in vitro assay for a single compound can take several days, compared with inference times on the scale of seconds for in silico tools. By assessing a greater number of compounds, trends can be better explored, which helps inform structural changes to compounds during development. Interpretability methods offered by ML and DL models further assist this, as discussed in the Interpretability and trust section.
Despite these advantages, the use of AI for toxicity prediction is still an emerging technology. Clear barriers that limit adoption persist, making the use of AI models challenging. Research efforts targeting these areas are critical to enabling the industry to unlock the vast potential on offer. Toxicity is a complex, multivariate problem, which makes training accurate AI models highly challenging. Asking a model to predict the overall clinical toxicity of a compound would require a significantly more information-rich input than what is available from just the molecular structure. It is well established that clinical toxicity results from a wide variety of factors. Where model systems are used to make assessments, AI practitioners must also begin with a simplified system, inspired by the same wet lab testing done practically. This is important because fully describing an in vitro assay requires significantly less information than accurately representing an in vivo system. In fact, in vitro assays are a useful starting point for modelling, as by design they aim to control variables such that only differences in the chemical structure give rise to the observed effect. Factors such as bioavailability, delivery routes and patient-level differences are not accounted for and hence do not affect in vitro assays. These assays are routinely used to gather toxicity data during drug discovery, meaning training data is available in quantities suitable for applying AI methods. It must be noted that, although common practice in drug discovery settings, there is limited translation of in vitro outcomes to clinical toxicity.23 As such, extending the predictive power of AI models beyond digitalised in vitro twins remains an open research challenge.
Once a model is trained, it can be called upon to make inferences, and its ability to do so must be assessed. Properly assessing the performance of chemical models is difficult and requires careful consideration to ensure misleading indicators of success are avoided. To further the evaluation process, exploring interpretability methods is vital to foster trust and draw useful conclusions about how the model navigates high-dimensional data.
The concept of “chemical space” encompasses all potential chemical structures, with estimates suggesting the number of “drug-like” molecules is on the order of 10^60.35 Given its immense size, it is not feasible to gather toxicity data for every compound. As a result, in silico models are typically limited to specific sections of this space. This limitation restricts a model's applicability domain (AD), making predictions for unfamiliar chemicals potentially unreliable. This constraint is of particular concern when models inform critical decisions. Additionally, the presence of activity cliffs, where structurally similar compounds have vastly different toxicity levels and activity profiles (Fig. 3), poses a modelling challenge, as models relying on structural similarities can be misled by these nuances.36 In the example in Fig. 3, the addition of a hydroxy group increased the inhibitory activity of a compound by almost three orders of magnitude.37 Expanding the training dataset to include a broader range of chemical structures enhances a model's predictive capabilities. In certain scenarios, data augmentation can also be performed, being careful not to introduce additional bias, especially with very small datasets. Transfer learning, which relies on transferring knowledge from a pre-trained model, can also be considered, although large pretrained models can be hard to come across in cheminformatics.38
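As a hedged illustration of managing the applicability domain, the sketch below flags a query compound as out-of-domain when its nearest-neighbour Tanimoto similarity to the training set falls below a threshold; the 0.4 cut-off and the training SMILES are arbitrary assumptions for demonstration, and real AD definitions would be tuned per dataset.

```python
# Simple applicability-domain (AD) check: compare a query compound's nearest-
# neighbour Tanimoto similarity against the training set. Threshold is arbitrary.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles: str):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

train_fps = [morgan_fp(s) for s in ["CCO", "CCN", "c1ccccc1O", "CC(=O)O"]]  # placeholder training set

def in_domain(query_smiles: str, threshold: float = 0.4):
    sims = DataStructs.BulkTanimotoSimilarity(morgan_fp(query_smiles), train_fps)
    nearest = max(sims)
    return nearest >= threshold, nearest

print(in_domain("CCCO"))          # close analogue of the training compounds
print(in_domain("C1CC2CCC1CC2"))  # structurally unrelated bicyclic -> likely out-of-domain
```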
Incorporating domain-specific knowledge, such as mechanistic/pathway-based information, or higher order data (non-chemical structure-based, e.g., omics or cell painting data39) can refine predictions, especially when navigating the challenges posed by activity cliffs.40
The push towards open access and the Findable, Accessible, Interoperable and Reusable (FAIR) principles aims to address these challenges.42 Open access advocates for free and unrestricted access to research outputs, ensuring that this data can be readily accessed and utilised by researchers, ultimately benefitting all. The FAIR principles further strengthen this approach by ensuring that data is not only accessible, but presented in such a way that it can be easily found, integrated with other datasets, and reused for various research purposes.
However, while these principles are promising, their implementation is not without challenges. Concerns regarding IP, competitive advantage and data misuse are significant barriers to broader data sharing. To harness the benefits of open access and FAIR data, there is a need for collaborative efforts between academia, industry and regulatory bodies. By creating frameworks that protect proprietary interests while promoting data sharing (for example, through appropriate incentives), the scientific community can work towards more robust and reliable ML models.
Two positive examples of FAIR practices come from Roche and AstraZeneca, which have implemented these principles to enhance the use and sharing of clinical trial data for scientific insights. Roche focuses on a “learn-by-doing” approach and prospective “FAIRification”, while AstraZeneca uses scientific use cases and iterative data modelling to drive translational medicine research and foster data stewardship. Both initiatives highlight the importance of cultural shifts and structured processes to achieve scalable, reusable data systems.43
Choosing appropriate metrics to evaluate a model is vital. In the field of toxicity, metric choice must reflect the target application and the distribution of samples across the dataset used to train the model. Class imbalance is incredibly common in toxicity prediction tasks. Biologically, it is much more likely that a compound will be inactive with respect to a particular target, and so datasets combining active and inactive compounds regularly have many more inactive compounds. Although not inherently limiting, users must select evaluation metrics that are not misrepresented when working with imbalanced data. To illustrate this, consider a dataset with a 90:10 ratio of inactive to active compounds. A model can achieve 90% accuracy by always assigning the inactive label. In this case the classifier has no skill, yet 90% accuracy is a seemingly impressive performance statistic. The same is true for AUROC, which is regularly reported in the literature for model evaluation.45 In addition, the F1 score can be effective for evaluating imbalanced datasets as it balances precision and recall, but it can obscure class-specific performance and is less informative when class distributions are highly skewed. The Matthews Correlation Coefficient (MCC) is also a robust metric for imbalanced data: it provides a single value that considers all confusion matrix components, allowing for a more balanced view of model performance across classes.
In general, all of these metrics must be interpreted with proper consideration of the datasets they represent if they are to be meaningful.
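The 90:10 example above can be reproduced with a few lines of scikit-learn: a "model" that always predicts the inactive class reaches 90% accuracy, while MCC and the active-class F1 score expose its complete lack of skill (a minimal sketch, assuming scikit-learn and NumPy are installed).

```python
# A majority-class "classifier" on a 90:10 imbalanced dataset: high accuracy,
# but MCC, F1 and AUROC reveal that it has no discriminative ability.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef, roc_auc_score

y_true = np.array([0] * 90 + [1] * 10)  # 90 inactive, 10 active compounds
y_pred = np.zeros(100, dtype=int)       # always predict "inactive"
y_score = np.full(100, 0.1)             # constant predicted probability

print("accuracy:", accuracy_score(y_true, y_pred))   # 0.90 despite no skill
print("F1 (active):", f1_score(y_true, y_pred))      # 0.0
print("MCC:", matthews_corrcoef(y_true, y_pred))     # 0.0
print("AUROC:", roc_auc_score(y_true, y_score))      # 0.5 (random ranking)
```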
In addition to model validation “per se” (as described in this section), new models can also be evaluated against pre-existing models through benchmarking. This can help gain an understanding of whether a new technique has improved upon commonly used methods and what its strengths and weaknesses are compared to other models.
Initiatives have emerged over recent years to try and tackle the benchmarking challenge. The most well-adopted examples of this are the Therapeutic Data Commons (TDC)47 and the Tox21 dataset.48 TDC offers a variety of cheminformatics benchmarks including toxicity; however, the datasets included are limited in size and are not accompanied by relevant scientific context regarding how they were generated. Despite being a promising initiative, Tox21 has been widely criticised for its data and metadata quality, and literature has documented its ineffectiveness for modelling.49,50 It must also be considered that both tools only offer in vitro data. This means that the challenge of comparing performance on in vivo data remains unsolved at the time of writing.
More recently, a platform called Polaris51 was launched to implement, host and run benchmarks in computational drug discovery. The aims of this platform are to address the performance gap seen between test set metrics and applications to real-life drug discovery projects, and to close the gap between modellers and downstream users. This provides a valuable resource to the community for the development of toxicity models that are practically useful and relevant in drug discovery.
Without suitable benchmarks, assessing the performance of different modelling techniques is not possible. During the AI lifecycle, performance changes can be attributed to data quality and size as well as ML or DL model design and hyperparameter tuning. To truly assess and compare model performance, the effects of the training data must be minimised by keeping consistent cross-validation splits, labels and the number of data points. Only by doing so can meaningful performance conclusions be drawn.
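A minimal sketch of this principle is shown below: one set of stratified cross-validation folds is pre-computed and every candidate model is scored on exactly those folds, so differences reflect the models rather than the data partitioning. The random features and labels are placeholders standing in for a real toxicity dataset.

```python
# Compare two models on identical, pre-computed cross-validation splits.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(0)
X = rng.random((200, 64))        # placeholder molecular descriptors
y = rng.integers(0, 2, 200)      # placeholder toxicity labels

splits = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y))

for name, model in [("random forest", RandomForestClassifier(random_state=0)),
                    ("logistic regression", LogisticRegression(max_iter=1000))]:
    scores = [
        matthews_corrcoef(y[test], model.fit(X[train], y[train]).predict(X[test]))
        for train, test in splits  # the same folds are reused for every model
    ]
    print(name, "mean MCC:", round(float(np.mean(scores)), 3))
```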
For ML researchers, the goal is to develop or refine algorithms to achieve the best possible performance metrics. This can lead to the use of complex “black-box” DL methods. While such approaches might squeeze out an additional fraction of accuracy compared to simpler, inherently interpretable methods such as tree-based algorithms, they can be obscure in their operation, making it challenging for non-experts to understand or trust. The goal of the medicinal chemist or toxicologist is to obtain clarity and reliability. An incremental increase in accuracy is of secondary importance if they cannot discern why a prediction was made, the nature of the data on which the model was trained, or its relevance and reliability for their specific chemical series.
Addressing this challenge requires a shift in focus. While technical advancements are essential, equal weight should be given to the communication and presentation of model results. Efforts should be channelled towards creating interfaces and explanations that translate the complexities of ML into insights that are meaningful to toxicologists and chemists, as only then will there be a genuine alignment between computational advancements and practical toxicological applications, fostering greater confidence and integration of in silico methodologies within the field.
Methods to increase interpretability and trust in predictive toxicology models include the use of permutation feature importance and SHapley Additive exPlanations (SHAP).52 These assign importance scores to individual features, enabling an explanation of predicted outcomes. Structural features of high importance for predicting a toxicity outcome can be mapped back onto the original compound structure to produce ‘toxicophores’: chemical moieties implicated in causing the unwanted interaction.53 Other examples of SHAP for drug discovery include its application to compound potency and multitarget activity prediction and its use for metabolic stability analyses.54,55 For example, Rodríguez-Pérez identified crucial groups for B-cell lymphoma 2 protein (Bcl-2) inhibition such as 2-amino-3-chloro-pyridine, and Wojtuch presented a case study showing that an aromatic ring with a chlorine atom attached increases metabolic stability.
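As a hedged sketch of this workflow (placeholder molecules and labels; the exact output format of shap.TreeExplainer varies between shap versions, which the code accounts for), a fingerprint-based random forest is explained with SHAP and the highest-impact fingerprint bits are mapped back to atom environments in the query molecule as a rough toxicophore highlight.

```python
# SHAP explanation of a fingerprint/tree QSAR model, with the top-ranked bits
# mapped back to atom environments as a rough toxicophore highlight.
import numpy as np
import shap
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def fp_with_bitinfo(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    bit_info = {}
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024, bitInfo=bit_info)
    return mol, list(fp), bit_info

train = [  # placeholder molecules with hypothetical toxicity labels
    ("CCO", 0),
    ("c1ccccc1[N+](=O)[O-]", 1),
    ("CC(=O)Nc1ccc(O)cc1", 0),
    ("O=[N+]([O-])c1ccc(Cl)cc1", 1),
]
X = np.array([fp_with_bitinfo(s)[1] for s, _ in train])
y = [label for _, label in train]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

mol, query_fp, bit_info = fp_with_bitinfo("c1ccc(cc1)[N+](=O)[O-]")
sv = shap.TreeExplainer(model).shap_values(np.array([query_fp]))
# shap returns either a per-class list or a 3-D array depending on version
sv_active = sv[1] if isinstance(sv, list) else sv[..., 1]

# Rank bits by their contribution to the "toxic" prediction and map to atoms
top_bits = sorted(range(len(query_fp)), key=lambda b: sv_active[0][b], reverse=True)[:3]
for bit in top_bits:
    print("bit", bit, "-> atom environments (atom index, radius):",
          bit_info.get(bit, "bit not set in this molecule"))
```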
Once these explanations have been generated, they should be reviewed and assessed by an expert human, such as a medicinal chemist. For example, a tertiary amine moiety is a known driver of hERG inhibition,56 and hence a predictive hERG model which gives a high importance to this chemical feature indicates that the model has learned the underlying causes of the molecular interaction, increasing trust and confidence in its predictions.
Beyond these PK considerations, physiological interactions intrinsic to living organisms further complicate the extrapolation. The interplay between organs, systemic responses, and immune-mediated reactions can markedly modulate a compound's toxicological profile.59 This biological complexity is why it is a challenge to predict direct in vivo or clinical endpoints based on chemical structure alone; a chemical representation is not sufficient information to predict idiosyncratic responses such as DILI. This is exemplified in a review of computational models for DILI prediction by Vall et al., who remark that higher order data types such as genomics, gene expression or imaging data may improve predictability of in vivo responses.60 The task is to bridge the observational gap between controlled in vitro environments and the dynamic realities of in vivo systems.
Fig. 4 Different methods, data and heterogeneous relationships are required to elucidate the biological complexity of toxicity.
One strategy for predicting in vivo organ-level toxicity is to integrate results from in silico predictions across multiple in vitro endpoints. For example, the prediction of DILI can benefit from combining predictions from models focused on bile salt export pump inhibition, mitochondrial toxicity, and liver (e.g., HepG2 cell) cytotoxicity. As each of these models addresses specific mechanisms that may contribute to DILI, their combined predictions can offer a holistic understanding of liver injury risk. This has been exemplified in work by Seal et al.,61 where this combined predictive approach outperformed direct predictions of DILI based on chemical structure alone. However, a prevailing challenge is mapping these discrete endpoints to organ-level responses. Recent efforts have aimed at statistically assessing the likelihood of adverse events arising from off-target or secondary pharmacology effects.62,63 As research progresses in this domain, the aim is to better understand the relationships and synergies between different endpoints to discern which combinations offer the most informative insights into risk.
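The sketch below illustrates the general idea of combining endpoint-level predictions into an organ-level risk estimate; it is not the method of Seal et al., and all probabilities and DILI labels are illustrative placeholders. Predicted probabilities from hypothetical BSEP-inhibition, mitochondrial-toxicity and HepG2-cytotoxicity models feed a logistic meta-model.

```python
# Stack endpoint-level model outputs into a single DILI risk estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: P(BSEP inhibition), P(mitochondrial toxicity), P(HepG2 cytotoxicity)
endpoint_probs = np.array([
    [0.85, 0.70, 0.60],
    [0.10, 0.05, 0.20],
    [0.90, 0.15, 0.80],
    [0.05, 0.10, 0.10],
    [0.60, 0.80, 0.75],
    [0.20, 0.25, 0.15],
])
dili_labels = np.array([1, 0, 1, 0, 1, 0])  # hypothetical clinical DILI outcomes

meta_model = LogisticRegression().fit(endpoint_probs, dili_labels)

new_compound = np.array([[0.75, 0.55, 0.40]])  # endpoint predictions for a new compound
print("combined DILI risk:", round(float(meta_model.predict_proba(new_compound)[0, 1]), 2))
```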
Due to advancements in high-throughput technologies, the integration of ‘omics’ data has gained traction as a method for modelling compound responses. By considering the interplay between different biological processes, this approach captures a closer approximation of the system's response, offering a depth that complements traditional compound structure-based assessments.64,65 For example, genomics information can be used to understand the genetic predispositions that may influence a compound's effects, informing toxicity risk on a personalised level (pharmacogenomics).66 Transcriptomics data provides a snapshot of cellular response to compound perturbation and can be used to understand the mechanisms of toxicity of a compound (toxicogenomics).65 Such data is available on a large scale in the public domain through platforms such as LINCS L1000,67 and the datasets Open TG-GATEs68 and DrugMatrix69 link compound-induced gene expression data to in vivo findings (in rat) such as clinical chemistry, histopathology and toxic effects. Other ‘omics’ modalities, such as metabolomics and phosphoproteomics, offer views on compound metabolic pathways and protein signalling activity, respectively. The strength of ‘omics’ lies not in these individual datasets but in their integration. By combining these modalities, researchers can attain a layered, comprehensive view of compound-induced changes, from the genetic level to the functional metabolic outcomes. This allows for a more granular prediction and understanding of toxicities, facilitating a holistic approach to risk assessment. However, this integration is not without challenges. ‘Omics’ data often come with high biological variability and noise, making the extraction of meaningful signals a complex task – and the signal-to-noise ratio varies greatly across different modalities.70
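One simple form of such integration is early, feature-level fusion, sketched below with random placeholder data: structural fingerprints are concatenated with compound-induced gene-expression signatures before a single classifier is trained. Real pipelines would add per-modality normalisation, batch correction, feature selection and noise handling.

```python
# Early integration: concatenate structural and transcriptomic features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_compounds = 100
fingerprints = rng.integers(0, 2, (n_compounds, 2048))      # structural fingerprint bits
gene_expression = rng.normal(0.0, 1.0, (n_compounds, 978))  # e.g. L1000 landmark genes
labels = rng.integers(0, 2, n_compounds)                    # placeholder toxicity labels

X = np.hstack([fingerprints, gene_expression])              # fused feature matrix
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print("fused feature matrix shape:", X.shape)
```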
The integration of ADME and PK/PD data, and predictions built on such data, can also aid in assessing in vivo risk.71 Such insights are critical for bridging the gap between in vitro findings and in vivo implications. When combined with toxicological predictions based on in vitro data, ADME and PK/PD data provide a clearer picture of the real-world exposure scenarios. For instance, while an in vitro assay might indicate hepatotoxicity, PK data might reveal that the compound doesn't reach the liver in significant concentrations, adjusting the perceived risk.
Knowledge graphs and network-based data structures have aided in this data integration challenge. These structured data representations capture intricate relationships between various biological entities, from genes and proteins to metabolic pathways. One example of a pre-made knowledge graph tailored for computational toxicity is ComptoxAI,72 providing links between chemical exposures, pathways and systems nodes that explain toxic outcomes (780,038 distinct chemicals included as of July 2022). Beyond mere data storage, knowledge graphs facilitate efficient data retrieval and serve as robust platforms for advanced computational analyses. One of their significant strengths is the ability to integrate the results of ML models, such as predictions based on in vitro endpoints, into a coherent and interconnected framework. By doing so, they provide an enriched environment where predictions from different models can be combined with higher-order data modalities, offering a holistic understanding of potential toxicological risks. This has been exemplified by Hao et al., who used data from ComptoxAI to predict causal chains of compound-gene, gene-pathway and pathway-toxicity interactions with a graph-based DL approach called AIDTox.73 An example of a prediction output from ref. 73 is shown in Fig. 5: the predicted important biological processes leading to dasatinib causing HEK293 cell death, such as interaction with CYP1A2 leading to metabolism of lipids. This allows for a more granular understanding of drug-induced toxicity and the potential elucidation of new mechanisms.
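To illustrate the data structure (a toy graph built with networkx, not ComptoxAI's actual schema or API), the sketch below encodes compounds, genes, pathways and toxicity outcomes as nodes, with directed edges for known or predicted relationships, and then enumerates candidate mechanistic chains from a compound to an adverse outcome.

```python
# Toy toxicology knowledge graph with path queries for mechanistic chains.
import networkx as nx

kg = nx.DiGraph()
edges = [  # illustrative relationships only
    ("compound:drug_X", "gene:CYP1A2", "inhibits"),
    ("gene:CYP1A2", "pathway:lipid_metabolism", "member_of"),
    ("pathway:lipid_metabolism", "toxicity:hepatotoxicity", "associated_with"),
    ("compound:drug_X", "gene:BSEP", "inhibits"),
    ("gene:BSEP", "toxicity:cholestasis", "associated_with"),
]
for source, target, relation in edges:
    kg.add_edge(source, target, relation=relation)

# Enumerate candidate mechanistic chains linking the compound to a toxicity
for path in nx.all_simple_paths(kg, "compound:drug_X", "toxicity:hepatotoxicity"):
    print(" -> ".join(path))
```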
A particularly promising area is the application of causal reasoning techniques on knowledge graphs across multiple ‘omics’ layers.74 This approach can uncover the sequence of molecular events leading to a toxic outcome, providing insights that are more nuanced and closer to the real-world biological intricacies. For example, Trairatphisan et al.75 used a causal reasoning approach, CARNIVAL,76 to uncover aberrant cell signalling in DILI, leveraging Open TG-GATEs repeat-dosing transcriptional and in vivo histopathological data to identify a regulatory pathway among liver fibrosis-inducing compounds. By deciphering the intricacies of molecular pathways, causal reasoning can identify potential intervention points for mitigating adverse effects, or even reveal previously unknown off-target effects of compounds.
Despite the evident advantages, several challenges must be addressed to fully realize the potential of AI in predictive toxicology. These include data availability, the need for comprehensive and representative datasets, and the difficulties in translating in vitro and in silico findings to in vivo contexts. Overcoming these hurdles requires collaborative efforts across academia, industry, and regulatory bodies to promote data sharing and develop robust, interpretable models that can be trusted by practitioners. By nature, in silico toxicology is a highly interdisciplinary field and collaboration between wet lab scientists and AI developers is critical in building useful tools with maximum impact.
The future of drug discovery and toxicology will be increasingly data-driven, with AI and ML playing a central role in navigating the complexities of biological systems and predicting pharmaceutical outcomes. By integrating diverse data modalities and refining computational methods, we can move towards more accurate and efficient toxicity assessments, ultimately improving the safety and efficacy of new therapeutic candidates.
Footnote
† Indicates equal contribution.