Open Access Article
Aishvarya Tandon a, Anna Santura b, Herbert Waldmann *a, Axel Pahl *a and Paul Czodrowski *b
aDepartment of Chemical Biology, Max-Planck-Institute of Molecular Physiology, Otto-Hahn-Str. 11, Dortmund, Germany. E-mail: herbert.waldmann@mpi-dortmund.mpg.de; axel.pahl@mpi-dortmund.mpg.de
bDepartment of Chemistry, Johannes Gutenberg University Mainz, Mainz, Germany. E-mail: czodpaul@uni-mainz.de
First published on 24th May 2024
Lysosomotropism is a phenomenon of diverse pharmaceutical interests because it is a property of compounds with diverse chemical structures and primary targets. While it is primarily reported to be caused by compounds having suitable lipophilicity and basicity values, not all compounds that fulfill such criteria are in fact lysosomotropic. Here, we use morphological profiling by means of the cell painting assay (CPA) as a reliable surrogate to identify lysosomotropism. We noticed that only 35% of the compound subset with matching physicochemical properties show the lysosomotropic phenotype. Based on a matched molecular pair analysis (MMPA), no key substructures driving lysosomotropism could be identified. However, using explainable machine learning (XML), we were able to highlight that higher lipophilicity, basicity, molecular weight, and lower topological polar surface area are among the important properties that induce lysosomotropism in the compounds of this subset.
Fig. 1 Lysosomotropism is primarily believed to be driven by lipophilicity and protonation state of the compound. In this diagram, “B” is a lysosomotropic compound and “BH+” is its protonated state. Illustration inspired by Kuzu et al. (2017).4
Recently lysosomotropism has especially drawn attention in multiple drug repurposing studies targeting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This stems from the crucial involvement of cathepsin L, a lysosomal protease, in cleaving the SARS-CoV-2 spike protein and facilitating virus entry into host cells. By the accumulation of lysosomotropic drugs in the lysosome, the compartment's pH rises, rendering proteolytic enzymes inactive and impeding viral replication.5–7
Consequently, well-established lysosomotropic drugs, e.g. chloroquine and hydroxychloroquine, were initially promising drug repurposing candidates against SARS-CoV-2. However, these drugs failed to demonstrate significant clinical benefits.6,8–13
In a more general sense, lysosomotropism is also a phenomenon of various drugs. It occurs for structurally different compound classes, modes of action and targets, and is independent of species and cell-type.3,14 For instance, lysosomotropic properties have been observed in anticancer compounds (tamoxifen, doxorubicin, daunorubicin, mitoxantrone), tyrosine kinase inhibitors (imatinib, dasatinib, sunitinib, sorafenib), β-blockers (propranolol), antihistamines (promethazine, astemizole, dimebon, desloratadine), and selective serotonin reuptake inhibitors (sertraline, paroxetine, fluoxetine, fluvoxamine).4,15–17
Given such a broad range of interests, several groups have contributed towards the measurement, quantification and prediction of lysosomotropism, see e.g. Nadanaciva et al. (2011), Ufuk et al. (2017), Schmitt et al. (2018) and Norinder et al. (2019).15,18–20 Highlighting the importance of identifying lysosomotropism early in the development process, Hu et al. (2023) recently developed and published models on phospholipidosis, a process related to lysosomotropism, using compound literature data. They also validated their results using a live-cell imaging assay.21
Compounds with a calculated log P (clog P) value greater than 2 and a basic pKa (bpKa) value between 6.5 and 11, representing lipophilicity and basicity, respectively, are likely to be lysosomotropic.15 Here, we will refer to this cross-section of properties as the physicochemical window (“PhysChem window”). While lysosomotropism at first seems to be primarily driven by lipophilicity and protonation state of compounds, it has been established that not all molecules which have the suitable physicochemical properties are in fact lysosomotropic.15,22
The cell painting assay (CPA) is an unbiased, image-based phenotypic assay where morphological profiles consisting of hundreds of features are generated from the images of compound-treated and control cells.23 One typical use case of the CPA is the formation of target hypotheses for test compounds with unknown biological activity. Here profiles of test compounds are compared to those of reference compounds with annotated targets or pathways. However, many compounds with lysosomotropic properties induce a distinct phenotype in the CPA which is independent of their target activity.3 We have observed that comparing morphological profiles of test compounds to that of a known lysosomotropic agent – smoothened agonist (SAG), is a reliable surrogate for determining lysosomotropism.
Machine learning (ML) methods have become an integral part of drug discovery. Some prominent methods are QSAR, prediction of chemical reactions and retrosynthesis, and the generation of novel chemical structures.24,25 Programming packages such as LIME and SHAP offer “explainability” of a model, enabling the interpretation of its predictions.26,27 The transparency about a model's predictions inspires confidence in researchers to trust them, which is why these packages are gaining popularity and application in drug discovery.28–30 Similarly, input features found important by a bioactivity prediction model can be determined using such packages, and this information can be used to develop a hypothesis of the underlying mechanism of the bioactivity.
Herein, we investigate the lysosomotropism observed by the CPA using matched molecular pair analysis (MMPA) and explainable machine learning (XML) to understand which physicochemical descriptors and/or chemical substructures affect lysosomotropism in compounds with suitable basicity and lipophilicity values.
With the MMPA, we aim to identify key substructures which are responsible for transformation of a lysosomotropic compound to a non-lysosomotropic compound, and vice versa. Similarly, by interpreting tree-based machine learning (ML) models with molecular fingerprints we addressed the identification of important substructures whose presence affects lysosomotropism. Finally, by interpreting ML models with molecular descriptors as input, the determination of physicochemical parameters which affect lysosomotropism is attempted.
In our implementation of post-imaging analysis following the feature calculation by the open-source software CellProfiler, 579 Z-scores of morphological features are deduced per compound. The Z-score of a morphological feature represents the difference between a morphological feature and its relative DMSO control. A compound's morphological profile (or simply its CP profile) is thereby a list of its Z-scores.34
Our processed CP data represents a total of 13 450 compounds. In this data, 3114 are reference compounds, whose biological activities are annotated, and 10 336 are internal research compounds. The internal research compounds primarily consist of natural products-inspired compounds and pseudo-natural products.38,39 2065 compounds are present in the PhysChem window, and thereby are relevant to this study.
Induction, a measure of bioactivity, is the percentage of significantly altered features. Compounds with the induction value greater than or equal to 5 are considered bioactive.34 In the PhysChem window, 1196 compounds are bioactive, while 869 are not.
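The induction measure described above can be sketched as follows. This is a minimal illustration, not the in-house implementation: the cutoff at which a feature counts as "significantly altered" is not stated in this section, so it is left as an explicit parameter (`z_cutoff`, a hypothetical name), and the value used in the usage lines is purely illustrative.

```python
def induction(z_scores, z_cutoff):
    """Percentage of features whose absolute Z-score exceeds z_cutoff.

    z_cutoff is an assumption of this sketch; the text does not state
    the significance criterion used in the assay.
    """
    if not z_scores:
        raise ValueError("profile must contain at least one feature")
    altered = sum(1 for z in z_scores if abs(z) > z_cutoff)
    return 100.0 * altered / len(z_scores)


def is_bioactive(z_scores, z_cutoff, min_induction=5.0):
    """Bioactive if induction >= 5, following the threshold in the text."""
    return induction(z_scores, z_cutoff) >= min_induction


# Toy 10-feature profile (real CP profiles hold 579 Z-scores).
toy_profile = [0.1, -0.4, 5.2, 0.0, -6.1, 0.3, 0.2, -0.1, 0.5, 0.9]
toy_induction = induction(toy_profile, z_cutoff=3.0)  # illustrative cutoff
```

With this toy profile, two of ten features exceed the illustrative cutoff, so the compound would count as bioactive under the induction threshold of 5.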
The CPA is a routine in-house screening assay at the Compound Management and Screening Center, Dortmund, and is used in identifying numerous biological clusters and pathways, among them lysosomotropism. The similarities between CP profiles can be measured by Pearson's similarity. The CP profiles are considered similar if their Pearson's similarity values are greater than 75%. Schneidewind et al. (2021) identified a biocluster in the CPA data whose mode of action is likely due to disturbed cholesterol homeostasis caused by lysosomotropism.3
Smoothened agonist (SAG), a well-established lysosomotropic compound, is present as a reference compound in the dataset and can be found in the reported biocluster. Because of its pronounced profile in the CPA, SAG was used as the reference compound for defining the lysosomotropic phenotype. The similarity of a compound's CP profile to that of SAG, termed the lyso score, ranges from 0 (indicating no biosimilarity) to 100 (indicating full biosimilarity). Compounds with a lyso score of 75 or above were annotated as lysosomotropic, given the high biosimilarity of their profile to that of SAG; the rest were labelled as non-lysosomotropic. Out of the 2065 compounds present in the PhysChem window, 1327 were labelled as non-lysosomotropic and the remaining 738 as lysosomotropic.
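The labelling rule can be illustrated with a short sketch that computes a Pearson-based similarity to a reference profile and applies the cutoff of 75. Two points are assumptions of this sketch rather than statements from the text: negative correlations are clipped to 0 to keep the score in the 0 to 100 range, and all profile values are invented toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def lyso_score(profile, sag_profile):
    """Similarity to the SAG profile on a 0-100 scale.

    Assumption: negative correlations are clipped to 0 so that the
    score stays within the 0-100 range given in the text.
    """
    return max(0.0, 100.0 * pearson(profile, sag_profile))


def lyso_label(score, cutoff=75.0):
    """Label from the lyso score, using the >= 75 cutoff from the text."""
    return "lysosomotropic" if score >= cutoff else "non-lysosomotropic"


# Toy profiles (hypothetical values; real profiles hold 579 Z-scores).
sag = [4.0, -3.0, 6.0, 0.5, -5.0]
similar = [3.5, -2.0, 5.0, 1.0, -4.0]
opposite = [-4.0, 3.0, -6.0, -0.5, 5.0]
```

Calling `lyso_label(lyso_score(similar, sag))` labels the first toy compound as lysosomotropic, whereas the anti-correlated profile receives a score of 0.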
Fig. 2 exemplarily shows three lysosomotropic reference compounds – imatinib, toremifene, and clozapine – and their CP profiles in comparison to SAG. The morphological profiles of these three compounds and that of SAG are very similar, although these compounds have different primary targets and chemical structures. Imatinib, a tyrosine kinase inhibitor, is primarily used to treat chronic myeloid leukemia,40 whereas toremifene, a selective non-steroidal estrogen receptor modulator, is administered to treat breast cancer.41 Clozapine is an anti-psychotic drug used in the treatment of severely ill patients with schizophrenia. While clozapine's mode of action is unknown, it is proposed to be an antagonist of dopamine and serotonin receptors.42
Furthermore, it is anticipated that the effect of a substituent on the respective physicochemical/biological property can be generalized, i.e. that its contribution is transferable across compound series.48
Given that the method captures the implicit knowledge contained in the chemical dataset in a systematic and automated manner, the emergence of the rules is fully explainable (as it is easy to trace back to the underlying compound pairs), resembles the intuitive way of a chemist's thinking, and lacks the “black box” character, which is frequently raised as a point of critique concerning machine learning or other in silico methods utilized for (Q)SAR analysis.45,46
The MMP concept has been widely employed.43–51 However, to the best of our knowledge, the application of this approach to a CP data set and with a view to lysosomotropism has not been reported to date.
Altogether, 4441 unique transformations were found regardless of the lysosomotropism classification. However, >99% of them occur fewer than 7 times (Fig. S1, ESI†). Similar numbers, as well as the Zipfian-shaped distribution of counts (number of occurrences), have already been described by Hussain and Rea (2010),51 among others.
In summary, only 27 transformations occur ten times or more – of which the top 10 transformations with the highest numbers are shown in Fig. 3. Unsurprisingly, most of the “high-count” transformations found either resemble “simple” terminal group substitutions, where only a single atom is replaced, or functional group substitutions.
The Δ value is calculated by subtracting the lyso score of the “from” compound from that of the “to” compound. In other words, if the Δ lyso score distribution is shifted to the right, the transformation is accompanied by an increase in lysosomotropism. However, for none of the “high-count” transformations shown in Fig. 3 does the lyso score change in one direction only. The MMPA performed in this work therefore indicates that no dominant structural feature determines lysosomotropism in the compound library under investigation.
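The Δ lyso score bookkeeping can be sketched as below. The sketch groups the score change of each matched pair by its transformation and checks whether all changes point in one direction. The transformation strings and scores are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def delta_lyso_by_transformation(pairs):
    """Group the lyso score change of matched molecular pairs by transformation.

    Each pair is (transformation, lyso_from, lyso_to); the change is
    computed as lyso_to - lyso_from, as defined in the text.
    """
    deltas = defaultdict(list)
    for transformation, lyso_from, lyso_to in pairs:
        deltas[transformation].append(lyso_to - lyso_from)
    return dict(deltas)


def is_unidirectional(deltas):
    """True if every change has the same sign (all increases or all decreases)."""
    return all(d > 0 for d in deltas) or all(d < 0 for d in deltas)


# Hypothetical pairs; the transformation strings are placeholders only.
toy_pairs = [
    ("H>>F", 20.0, 80.0),
    ("H>>F", 70.0, 30.0),
    ("H>>CH3", 10.0, 40.0),
    ("H>>CH3", 50.0, 85.0),
]
toy_deltas = delta_lyso_by_transformation(toy_pairs)
```

In this toy data the first transformation changes the score in both directions, which is the pattern reported for the high-count transformations; the second, unidirectional case is included only for contrast.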
We used RDKit to generate one- and two-dimensional molecular descriptors of the compounds.56 Around 200 molecular descriptors can be generated by the RDKit. However, since we were aiming for explainability, we manually selected the descriptors which are intuitive, such as NumHAcceptors, TPSA, FractionCSP3, fr_nitro, etc. Some examples of unintuitive descriptors, which were removed, are PEOE_VSA7, VSA_EState8, SMR_VSA6, etc.
In total, 107 intuitive molecular descriptors were selected. These selected descriptors are listed in Table S1.† log P and bpKa, calculated by ChemAxon cxcalc, were additionally used for one of the models.
We prepared XGBoost binary classifiers as described below. Except for the scale_pos_weight hyperparameter, which was used to provide the weights of the “Non-Lysosomotropic” and “Lysosomotropic” classes to control the class imbalance, default hyperparameters were used for training. The performances of the models trained with the default hyperparameters and with the hyperparameters optimized by the package Optuna59 were found to be similar, and thereby the default hyperparameters were used. All the models were trained on the internal data. Stratified 5-fold cross-validation, with the balanced accuracy and the Cohen's kappa score as model performance metrics, was used to validate the models. The libraries present in the scikit-learn package were used for these calculations.60
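For readers less familiar with the two metrics, the sketch below computes balanced accuracy and Cohen's kappa for a binary problem directly from the confusion counts. It is a plain-Python illustration, not the scikit-learn code used in the study. The helper `scale_pos_weight` uses the common negative-to-positive ratio heuristic; the value actually used by the authors is not stated here, so that choice is an assumption.

```python
def scale_pos_weight(n_negative, n_positive):
    """Negative-to-positive class ratio, a common heuristic for XGBoost's
    scale_pos_weight. Assumption: the text does not state the value used."""
    return n_negative / n_positive


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = lysosomotropic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (both classes must be present)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): 4 lysosomotropic, 6 non-lysosomotropic compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
weight = scale_pos_weight(1327, 738)  # class counts from the PhysChem window
```

Both metrics correct for class imbalance in different ways, which makes them more informative than plain accuracy for a roughly 65:35 class split.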
Due to the numerical nature of the descriptors, TreeExplainer could be used directly on the models trained on the molecular descriptors, and various SHAP plots can be employed to study the descriptors and their importance on a data set. However, molecular fingerprints are binary in nature and, especially in the case of Morgan fingerprints, multiple substructures can be encoded in the same bit. Thus, highlighting bits as important is unintuitive unless the substructures they encode are known. We used X-FP, a Python library, to compute the substructures that the bits of the Morgan fingerprints encode. We then used X-FP's functionality to calculate feature importance by SHAP TreeExplainer and visualized these important bits and the substructures they encode.64
Fig. 4 The 5-fold cross-validation results of all the models. The black line on top of the bars indicates the standard deviation.
The molecular descriptor model, “Select_RDKit_desc_with_logP_bpKa1_unscaled model”, and the molecular fingerprint model, “Morgan_FP_radius2_model” were selected as the representatives of their respective model types. For convenience, these models are referred to as the “Descriptor model” and the “Fingerprint model”, respectively.
The descriptor model has the average balanced accuracy of 0.79 with the standard deviation of 0.02, and the average Cohen's kappa score of 0.59 with the standard deviation of 0.05. Similarly, the fingerprint model has the average balanced accuracy of 0.77 with the standard deviation of 0.03, and the average Cohen's kappa score of 0.54 with the standard deviation of 0.05.
The descriptor model's balanced accuracy is 0.68 while its Cohen's kappa score is 0.3. The Fingerprint model's balanced accuracy is 0.51 and its Cohen's kappa is 0.01. The confusion matrices of these models' performances are shown in the Fig. 5.
The color gradient from blue to red indicates the value of a feature from lower to higher. In our case, positive Shapley values correspond to the lysosomotropic class, and negative Shapley values to the non-lysosomotropic class.
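To illustrate what a signed Shapley value attributes to a feature, the sketch below computes exact Shapley values for a toy scoring function by enumerating feature coalitions, with "absent" features set to a baseline value. This is not the TreeExplainer algorithm used in the study; the toy model, its coefficients, and the baseline compound are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this is for toy cases only.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


def toy_model(x):
    """Hypothetical score: rises with logP and bpKa1, falls with TPSA."""
    return 0.5 * x["logP"] + 0.3 * x["bpKa1"] - 0.01 * x["TPSA"]


baseline = {"logP": 3.0, "bpKa1": 8.0, "TPSA": 60.0}
compound = {"logP": 5.0, "bpKa1": 9.0, "TPSA": 40.0}
phi = shapley_values(toy_model, compound, baseline)
```

The values in `phi` sum to the difference between the model output for the compound and for the baseline, which is the additivity property that SHAP summary plots rely on. A positive value means the feature pushed this prediction towards the higher-scoring class.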
Fig. 7 and 8 are the SHAP summary plots for the descriptor model on the training dataset and the validation datasets, respectively. These plots show the descriptors which are found important in each of the SHAP analyses.
SHAP dependence plots are scatter plots between a feature and its Shapley values. The dependence plots of the top 10 features of the descriptor model for all three datasets are present in the ESI† (Fig. S2–S4).
In the SHAP analysis of the training dataset, log P and bpKa1 are found to be the most important descriptors. It can be observed in Fig. 7 that higher log P and bpKa1 values push the model output towards the lysosomotropic class, and vice versa. Similar observations are noted in both SHAP analyses of the validation sets, the only exception being that bpKa1 is the third most important descriptor in the time-split dataset.
The descriptor fr_NH1, which describes the number of secondary amines, is found important in the SHAP analyses of the validation sets. Here, higher numbers of secondary amines have positive Shapley values, indicating that they push model outputs towards the lysosomotropic class.
Higher topological polar surface area (TPSA) is associated with poor cell membrane permeability. The inverse relationship between the TPSA values and their corresponding Shapley value indicates that the model outputs are driven towards non-lysosomotropic class when the TPSA of the compounds is higher. This can be justified if it is hypothesized that the non-lysosomotropic compounds have poor lysosome and/or cell membrane permeability.
Descriptor HeavyAtomMolWt calculates the average molecular weight of compounds while ignoring the hydrogen atoms. Across all three datasets, especially in the training and the external datasets, it can be observed that lower heavy atom molecular weights have negative Shapley values. Interestingly, the magnitude of negative Shapley values is higher than the positive Shapley values, indicating that the model finds lower molecular weights more important in classifying compounds as non-lysosomotropic.
The X-FP reports of the SHAP analysis of the fingerprint model on the different datasets are present in the ESI.† Bit 2049, primarily encoding the sp3-hybridized carbon atom, is found important across all three datasets. When this bit is switched on, indicating the presence of the encoded substructure, the Shapley values are positive. This means that the presence of this substructure pushes the model prediction towards the lysosomotropic class. Similarly, bit 3959, mainly encoding a secondary carbon across all the datasets, is found important in the training set and the external set. The positive Shapley values when this bit is switched on show that the presence of this substructure shifts model predictions towards the lysosomotropic class.
Another bit encoding sp3-hybridized carbon is bit 1028, which encodes a carbon atom in an aliphatic ring. This bit is found important in the training dataset and the external dataset. Here too, the presence of such a substructure favors model predictions towards the lysosomotropic class. Interestingly, FractionCSP3 is a descriptor which describes the fraction of sp3-hybridized carbons present in a compound, and this descriptor is found important in the SHAP analysis of the descriptor model on all the datasets. Thus, both models find the sp3-hybridized carbon substructures important and therefore shift predictions towards the lysosomotropic class in the presence of such substructures.
Bit 2715, depending on its neighboring groups, might be encoding a secondary amine. This bit is important across all three datasets. Bit 3200, encoding an aliphatic nitrogen atom, is also important across all three datasets. In both cases, the presence of these substructures pushes the model output towards the lysosomotropic class. This is in line with the finding that the basic pKa, which is mostly driven by amine moieties, is consistently found to be a relevant descriptor in the SHAP analysis.
The exemplar top bits and their substructures are shown in Table 1.
Fig. 9 ECDF plots of the maximum chemical structure similarities of time-split dataset (A) and the external dataset (B) with the training dataset.
80% of the time-split dataset show a maximum Tanimoto similarity of less than 0.4 to the training set, whereas for the external dataset it is 0.3. This shows that the chemical spaces of both of the validation sets are diverse in comparison to the training set.
We also performed principal component analysis (PCA) of the input descriptors of the combined datasets. However, only 19% explained variance ratio was observed in the first three principal components.
log P, one of the two criteria for defining the PhysChem window used here, is the top descriptor in all three SHAP analyses of the descriptor model performed on the three data sets (training, time-split, and external validation). All of these analyses show a common trend: higher log P values tend to have higher SHAP values, and vice versa. This relation can be interpreted to mean that the model favors the lysosomotropic class at higher log P values and the non-lysosomotropic class at lower log P values.
This relation between log P values and lysosomotropism can also be noticed in the violin plot of the original lysosomotropic class distribution versus the log P values (Fig. 10). Here, across all three datasets, the lysosomotropic compounds tend to have higher log P values compared to the non-lysosomotropic ones.
Fig. 10 Violin plot of log P values across all datasets. The lysosomotropic classes are based on the cutoff of the compounds' lyso score. The violins are scaled based on the number of observations.
This relationship between molecular weight and non-lysosomotropism can be observed in the violin plot of the original lysosomotropic class distribution versus the molecular weight of the compounds (Fig. 11).
The compounds were selected using the PhysChem window (log P > 2, bpKa between 6.2 and 11); this led to a restriction of the entire data set towards basic moieties. The XML models might not have been capable of identifying this bias, but they correctly identified that basic moieties are relevant for lysosomotropism.
Many important bits encode substructures of the Morgan fingerprint radius of 0. This means that these substructures are single atoms, and thereby too unspecific to hypothesize any chemical structure-based activity.
First, as highlighted in the previous section, due to the absence of any key substructures inducing lysosomotropism, the substructures found important by the fingerprint model might not be specific enough to differentiate between lysosomotropic and non-lysosomotropic compounds. Furthermore, if lysosomotropism is mainly driven by log P and bpKa, then it will be hard to find any substructure motifs that drive a log P/bpKa change which in turn modulates lysosomotropism. This is because the very same substructure detected by a fingerprint (small, because it only considers neighbors up to two bonds apart) can have very different log P/bpKa values depending on its decoration.
The chemical diversity of the training and the test sets poses a challenge to the performance of the fingerprint model. This is especially pronounced when the impact of the presence or absence of an individual substructure (encoded as a bit in the fingerprint) on the biological readout is influenced by the neighboring groups, the position in the molecule, or the stereochemistry, which is ignored by the fingerprints employed here.
The reasonable performance of the descriptor model compared to the fingerprint model further supports the notion that the lysosomotropism is primarily caused by physicochemical properties of a compound.
Out of 1369 non-lysosomotropic compounds present in the combined training set and time-split set, 682 were also tested in the CPA at a higher compound concentration of either 30 μM or 50 μM in addition to the standard 10 μM compound concentration. While approximately 40% of these compounds (277 compounds) show lysosomotropism at the higher concentration, the remaining 405 compounds remain non-lysosomotropic. Moreover, out of these 405 compounds which stay non-lysosomotropic at both standard and higher concentrations, 255 are bioactive (induction >5).
Nadanaciva et al. also reported that the two non-lysosomotropic compounds present in the PhysChem window in their analysis, especially risperidone, did not show lysosomotropism even at the highest tested concentration of 150 μM.15
Further investigation in this area might offer insight on this observation.
Measuring the log P values and the bpKa values of the compounds in the wet lab at a larger scale is challenging, at least in an academic setup. Thus, in our study the values of these descriptors are predicted. These descriptors are influential in the study. First, the compound selection is based on the compounds' presence in the PhysChem window, which is the cross-section of the log P values and the bpKa values. Second, these descriptors were found to be the most important descriptors by the descriptor model. Therefore, our analyses might be affected by differences between the true and the predicted values of these descriptors.
While our internal data is a compilation of diverse compounds over many years, it cannot cover the full chemical space. Our models are trained on this finite data and the hypotheses derived from these results are therefore limited to this representation.
The lyso score of 75 or above, indicating biosimilarity, is a hard cut-off value to determine the lysosomotropic class of the compounds. Compounds just above and just below this cut-off value will be biosimilar, yet their lysosomotropic classes will be different. This could affect our analyses.
Lastly, the SHAP analyses do not show causation for the ground truth, rather they show what features were found important by a model in a dataset. Since our models' performances are limited, the SHAP analyses are also not definitive.
Having suitable log P and bpKa values does not suffice for a compound to be lysosomotropic. By performing an MMPA, we were not able to detect key substructures that can be made responsible for a lysosomotropic effect.
Our ML models were trained on the internal dataset and tested on diverse validation sets. This ensured that these models' predictions and, by extension, the interpretable analyses done on them are general and not specific to the training set. These models revealed that the lysosomotropic effect is favored when the compounds have high lipophilicity, basicity, high number of basic amines and high number of sp3-hybridized carbons, and low TPSA.
In total, about 13 000 compounds were measured in the cell painting assay, out of which roughly 450 compounds are known to be lysosomotropic as reported in the literature. These compounds are used as a reference system for comparison of the cell painting profiles of the remaining compounds. Out of the overall 13 000 tested compounds, 738 compounds are inside the PhysChem window and lysosomotropic, whereas 1327 compounds inside the PhysChem window are not lysosomotropic.

A filtering strategy was applied to the internal CP data set up to 2022. The 10 μM compound concentration was selected. Toxic compounds and those flagged with purity alerts were excluded. Compounds with a heavy atom count exceeding 50 were removed. Compounds whose calculated log P and bpKa1 values were empty were removed. The final selection was then made on the compounds present in the PhysChem window: log P value greater than 2 and bpKa1 value greater than 6.2 and less than 11. apKa1 and apKa2 were removed since they were redundant in the context of this study. In total, 2065 compounds were available for MMPA, model training, and cross-validation.
The bpKa1 limit of 6.5 from literature was lowered to 6.2 to enable the inclusion of a larger group of lysosomotropic compounds (53 compounds) present in the bpKa1 range between 6.2 and 6.5.
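The property-based part of this filtering can be sketched as follows. This is a minimal illustration rather than the published filtering code: the record keys (`id`, `heavy_atoms`, `log_p`, `bpka1`) and the toy records are hypothetical, and the toxicity and purity flags are not modeled.

```python
def in_physchem_window(log_p, bpka1, log_p_min=2.0, bpka_min=6.2, bpka_max=11.0):
    """True if log P > 2 and 6.2 < bpKa1 < 11, the window used in this study."""
    return log_p > log_p_min and bpka_min < bpka1 < bpka_max


def filter_compounds(records, max_heavy_atoms=50):
    """Apply the property-based filtering steps described above.

    `records` is a list of dicts with the hypothetical keys "id",
    "heavy_atoms", "log_p" and "bpka1"; missing values are None.
    Toxicity and purity flags are not modeled in this sketch.
    """
    kept = []
    for rec in records:
        if rec["heavy_atoms"] > max_heavy_atoms:
            continue  # too large
        if rec["log_p"] is None or rec["bpka1"] is None:
            continue  # property calculation returned no value
        if in_physchem_window(rec["log_p"], rec["bpka1"]):
            kept.append(rec)
    return kept


# Toy records with invented property values.
toy_records = [
    {"id": "A", "heavy_atoms": 30, "log_p": 3.5, "bpka1": 8.9},
    {"id": "B", "heavy_atoms": 30, "log_p": 1.2, "bpka1": 9.0},   # log P too low
    {"id": "C", "heavy_atoms": 62, "log_p": 4.0, "bpka1": 7.5},   # too large
    {"id": "D", "heavy_atoms": 25, "log_p": 2.8, "bpka1": None},  # missing bpKa1
    {"id": "E", "heavy_atoms": 28, "log_p": 2.4, "bpka1": 6.3},   # kept by the 6.2 limit
]
kept_ids = [rec["id"] for rec in filter_compounds(toy_records)]
```

Compound E shows the effect of the lowered limit: it would fall outside a window starting at 6.5 but is retained with the limit of 6.2.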
The same steps were repeated on the 2023 dataset to obtain compounds for the time-split validation set.
Details about selected compounds from the time-split validation and the training data can be found here: ref. 34, 37, 39, 67 and 68.
The corresponding code for the filtering strategy is available on GitHub.69
All molecular descriptors, except log P and bpKa, were calculated by the RDKit. log P and bpKa were calculated by ChemAxon Marvin cxcalc version 22.22.0 (https://www.chemaxon.com).
All the molecular fingerprints were generated by the RDKit. The Morgan fingerprints were calculated as bit vectors and the option to save the bit info was enabled to perform the SHAP analysis of the substructures using X-FP.
For each compound present in a query dataset, chemical structure similarity against all of the compounds in the training dataset was calculated. The compound present in the training data which is structurally most similar to the query compound would therefore give the highest similarity score, and this score would be the query compound's maximum similarity score against the training dataset.
By plotting the maximum similarity scores of all the compounds of the query dataset as an ECDF plot, the percentage similarity to the training dataset can be observed.
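The maximum-similarity and ECDF calculation described above can be sketched in plain Python. Fingerprints are represented here as sets of on-bit indices, Tanimoto similarity is used as named in the results, and the toy fingerprints are invented; in practice the bit vectors would come from the RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


def max_similarity(query_bits, training_fps):
    """Highest similarity of one query compound to any training compound."""
    return max(tanimoto(query_bits, fp) for fp in training_fps)


def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


def fraction_below(values, threshold):
    """Fraction of query compounds with maximum similarity below threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


# Toy fingerprints (hypothetical on-bit indices).
training_fps = [{1, 2, 3, 4}, {5, 6, 7}]
query_fps = [{1, 2, 3}, {5, 9}, {10, 11}]
max_sims = [max_similarity(q, training_fps) for q in query_fps]
xs, ys = ecdf(max_sims)
```

Reading the ECDF at a similarity threshold, for example via `fraction_below(max_sims, 0.4)`, gives the share of the query set that is at most that similar to any training compound.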
log P and bpKa were calculated by ChemAxon Marvin cxcalc version 22.22.0 (https://www.chemaxon.com), and the compounds outside the PhysChem window were removed. To remove promiscuous compounds from the data in the following step, the filtering proposed by the Novartis Institutes of BioMedical Research (NIBR), which includes filters for the PAINS filter families A and B, was performed.72,73
The purchased compounds have at least a purity of 90 percent, measured by liquid chromatography–mass spectrometry (LC–MS). More details can be found in the ESI.†
Maximum similarity calculation was performed against the internal MPI dataset and diverse compounds were short-listed. The aim was to maintain the original lysosomotropic class ratio of the training dataset (65% non-lysosomotropic and 35% lysosomotropic). Therefore, 127 compounds were short-listed and ordered, of which the descriptor model predicted 81 as non-lysosomotropic and 46 as lysosomotropic. For these 127 compounds, the fingerprint model classified 70 as non-lysosomotropic and 57 as lysosomotropic.
| bpKa | Basic pKa |
| clog P | Calculated log P |
| CP | Cell painting |
| CPA | Cell painting assay |
| LC–MS | Liquid chromatography-mass spectrometry |
| log P | Octanol/water-partition coefficient |
| LTR | LysoTracker Red DND-99 |
| ML | Machine learning |
| MMP | Matched molecular pair |
| MMPA | Matched molecular pair analysis |
| PCA | Principal component analysis |
| QSAR | Quantitative structure activity relationship |
| SAG | Smoothened agonist |
| SAR | Structure activity relationship |
| SARS-CoV-2 | Severe acute respiratory syndrome coronavirus 2 |
| SHAP | Shapley additive explanations |
| SMILES | Simplified molecular input line entry system |
| SPR | Structure property relationship |
| TPSA | Topological polar surface area |
| XGBoost | Extreme gradient boosting |
| XML | Explainable machine learning |
Footnote |
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4md00107a |
| This journal is © The Royal Society of Chemistry 2024 |