Systematic generation and analysis of counterfactuals for compound activity predictions using multi-task models
Abstract
Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions causes reluctance to accept predictions for experimental design. Limited trust in predictions presents a substantial problem for ML and continues to hamper its impact in interdisciplinary research, including early-phase drug discovery. As a remedy, approaches from explainable artificial intelligence (XAI) are increasingly applied to shed light on the ML black box and help rationalize predictions. Among these is the concept of counterfactuals (CFs), which are best understood as test cases with small modifications that yield opposing prediction outcomes (such as different class labels in object classification). For ML applications in medicinal chemistry, for example compound activity predictions, CFs are particularly intuitive because these hypothetical molecules enable immediate comparisons with actual test compounds; such comparisons require no expert ML knowledge and are accessible to practicing chemists. They often reveal the structural moieties in compounds that determine their predictions and can be further investigated. Herein, we adapt and extend a recently introduced concept for the systematic generation of molecular CFs to multi-task predictions of different classes of protein kinase inhibitors, analyze CFs in detail, rationalize the origins of CF formation in multi-task modeling, and present exemplary explanations of predictions.
- This article is part of the themed collection: AI in Medicinal Chemistry
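As context for the CF concept outlined in the abstract, the sketch below illustrates, in minimal form, what a molecular counterfactual is: a structurally similar analog whose predicted class flips relative to a test compound. The classifier `predict_active`, the candidate analog list, and the fingerprint settings are illustrative assumptions for this sketch and are not taken from the reported workflow.

```python
# Minimal sketch of counterfactual (CF) selection for a compound activity classifier.
# Assumptions (not from the paper): `predict_active` is a hypothetical callable mapping
# a SMILES string to a binary activity label (e.g., one task of a multi-task kinase
# inhibitor model); candidate analogs are supplied as SMILES; structural similarity is
# measured as Tanimoto similarity on Morgan fingerprints (RDKit).
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def morgan_fp(smiles: str):
    """Return a Morgan fingerprint (radius 2, 2048 bits) for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)


def find_counterfactuals(test_smiles, candidate_smiles, predict_active, top_n=5):
    """Rank candidate analogs whose predicted class flips relative to the test compound.

    Candidates with opposing predictions are sorted by decreasing structural
    similarity, so the most similar class-flipping analogs (the most informative
    CFs for inspecting which moieties drive the prediction) come first.
    """
    ref_fp = morgan_fp(test_smiles)
    ref_label = predict_active(test_smiles)
    cfs = []
    for smi in candidate_smiles:
        if predict_active(smi) != ref_label:  # prediction flips -> counterfactual
            sim = DataStructs.TanimotoSimilarity(ref_fp, morgan_fp(smi))
            cfs.append((sim, smi))
    return sorted(cfs, reverse=True)[:top_n]
```

Comparing the top-ranked CFs with the test compound highlights the small structural changes that alter the model's decision, which is the kind of explanation the abstract refers to.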