Marja A.
Heiskanen
ab and
Tero
Aittokallio
*abc
aBiomathematics Research Group, Department of Mathematics, University of Turku, FI-20014, Finland
bFinnish Doctoral Programme in Computational Sciences (FICS), Aalto University, School of Science, FI-00076, Finland
cInstitute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00014, Finland. E-mail: tero.aittokallio@fimm.fi
First published on 31st January 2013
Chemical-genomic and genetic interaction profiling approaches are widely used to study mechanisms of drug action and resistance. However, there exist a number of scoring algorithms customized to different experimental assays, the relative performance of which remains poorly understood, especially with respect to different types of chemogenetic assays. Using yeast Saccharomyces cerevisiae as a test bed, we carried out a systematic evaluation among the main drug target analysis approaches in terms of predicting global drug–target interaction networks. We found drastic differences in their performance across different chemical-genomic assay types, such as those based on heterozygous and homozygous diploid or haploid deletion mutant libraries. Moreover, a relatively small overlap in the predicted targets was observed between those approaches that use either chemical-genomic screening alone or combined with genetic interaction profiling. A rank-based integration of the complementary scoring approaches led to improved overall performance, demonstrating that genetic interaction profiling provides added information on drug target prediction. Optimal performance was achieved when focusing specifically on the negative tail of the genetic interactions, suggesting that combining synthetic lethal interactions with chemical–genetic interactions provides highest information on drug–target interactions. A network view of rapamycin-interacting genes, pathways and complexes was used as an example to demonstrate the benefits of such integrated and optimized analysis of chemogenetic assays in yeast.
A number of experimental–computational strategies have been introduced over the past decade for systematic analysis of drugs and their targets.1–3 In particular, by taking advantage of the genome-wide deletion-mutant collections in yeast Saccharomyces cerevisiae, it has become possible to measure genome-scale growth phenotype responses to combined genetic and chemical perturbations, thereby enabling systematic means to identify candidate drug targets in vivo. Such chemical-genetic profiling approaches to discovering molecular targets and mechanisms of action of compounds involve various experimental assays, such as those based on drug-induced haploinsufficiency profiling (HIP) or homozygous deletion profiling (HOP), as well as their integration with genetic interaction profiling.2,3 These profiling assays generate a rich source of high-dimensional datasets, and custom-designed computational data analysis methods have been tailored for mining the data from different experimental assays.4–7
Despite the widespread application of the chemical-genomic and genetic profiling approaches, there is only rather scattered information available on the performance of various experimental assays and customized computational solutions in terms of their relative accuracy at predicting drug–target interactions. For instance, the heterozygous deletion strain collection has been considered as being more effective for identification of direct targets of chemicals, while the homozygous diploid or haploid deletion mutant collections are being used to identify genes involved in buffering the drug target pathways.2,3,5 However, to our knowledge, there are no systematic studies of their relative performance on a common set of shared chemicals and gene deletion mutants. Another open question, directly related to the experimental costs of conducting such assays, concerns whether the genetic interaction profiling of double-deletion mutants can really provide any added information on drug target prediction beyond that obtained from the HIP or HOP approaches alone.
In addition to such experimental design questions, there is no general consensus on a computational scoring approach that, on the basis of the experimental data, would most accurately rank the genes according to their likelihood of being a target of a particular compound. For instance, target identification in the HIP approach is typically based on the fitness defect, the so-called FD-score, which implies that the heterozygous mutant deleted for the drug target shows an increased sensitivity to the particular compound.4,6,8 In the HOP approach, on the other hand, the haploid or homozygous diploid deletion of the target or another gene in the same pathway may also lead to increased resistance.9 Moreover, despite the development of customized methods based on profile correlation,7 or the so-called I-score,9 it remains unclear what is the most effective way to compare the drug–mutant relationships (either sensitivity or resistance) with the double-mutant fitness phenotypes obtained from genetic interaction profiling.
In the present work, we systematically compared various drug target profiling and scoring strategies using genome-wide datasets on S. cerevisiae, with the aim of evaluating their relative merits and potential limitations, especially with respect to scoring positive and negative fitness responses to chemical treatments. We also introduce here a novel scoring approach, named SR-score, which combines the rankings from two scoring methods in such a manner that it places special emphasis on the early target recognition. The systematic evaluation was carried out using the curated STITCH database of known and predicted drug–gene interactions as an external benchmark set. We further investigate the reproducibility of the target gene rankings using the replicate measurements from the chemical-genomic assays as an internal control. Statistical inference was used to assess the significance of the differences observed. Finally, the optimized methods were applied to construction of an integrated drug–target interaction network for rapamycin treatment.
The third chemical-genomic dataset consists of 4111 yeast haploid deletion strains grown in 82 different chemicals or natural product extracts.11 After including those gene deletion strains present in the genetic interaction dataset, the dimension of the haploid dataset is 1256 × 82, with 2% missing data values.
All of these scoring methods result in a unique ranking of each drug–target pair. There are also other approaches, such as hierarchical clustering or the factorgram method,11 which allow visualizing and clustering chemicals and gene deletion strains into functionally relevant groups with similar biological effects in an efficient way. However, such exploratory approaches were not considered in the present study.
εij = wij − wiwj, |
We also tested alternative variants of the ρ-score, based either on the Spearman correlation coefficient or a simple overlap statistic7,10 of the most sensitive (or resistant) gene deletion strains in the chemical-genomic and genetic interaction profiles. However, the Pearson correlation-based ρ-score was selected since it showed the best performance (data not shown).
Iic = |Zic(FD)| + Zic(ρ). |
The positive tail of the I-score distribution identifies the potential drug–target interactions. A missing value in either of the scores FDic or ρic results in a missing value in Iic as well.
The operation of the SR-score is illustrated in Fig. S2 (ESI†). The positive tail of the SR-score identifies potential drug–target interactions.
There are many fundamental differences between the SR- and the I-score. First, since the Savage-score is based on ranks, the ranges of the Savage-scores are always equal for both of the FD- and ρ-scores, making it robust against possible outliers. In the I-score, either one of the Z-score-normalized FD- or ρ-distributions may spread wider than the other, thus putting more emphasis on the tail of the wider distribution. Second, the FD- and ρ-scores often result in somewhat complementary rankings. Whereas the I-score puts equal weights on all observations, the SR-score stresses more the top-ranking targets and gives only minor emphasis on later ranks. Thus, a poor performance of either one of the FD- or ρ-score does not mask the good performance of the better approach, while good performance of both of the approaches is further enhanced in the calculation of the SR-score. Finally, the SR-score can be calculated even if one or both of the FD- or ρ-scores is missing; in such cases, a gene deletion strain with a missing value is just ranked last in the first step.
Thus, under a random scoring, the proportion of successes for rank r is In the external evaluation, M corresponds to the total number of STITCH links used in the evaluation, whereas for replicates M = rC/2. In the internal evaluation, the ideal scoring corresponds to the case in which the proportion of successes is 0.5 for every rank r, since ideally both of the replicates are ranked similarly. For the STITCH links, the performance curve of the ideal scoring is obtained by assuming that for every drug treatment c, the corresponding numbers of the STITCH links Mc are ranked first.
In the case of the STITCH links, the performance curve of a scoring method decreases with larger ranks, and here the whole area under the curve is reported; similar results are obtained when considering only the ranks smaller than 150 (Table S1, ESI†). However, in the case of replicates, the performance curve increases constantly with larger ranks, while only the early ranks are often interesting in practice. Therefore, the partial area under the curve for ranks smaller than 150 was calculated and reported here.
In order to enable a direct comparison of the datasets, we tested their performances using the shared set of genes and drugs present in each dataset pair (Fig. 1). Interestingly, the haploid dataset performed significantly better than the other two datasets (Fig. 1A and C), whereas the homozygous gene deletion collection was more accurate compared to the heterozygous dataset (Fig. 1B). The same trends were also observed when directly comparing all the three datasets (Fig. 1D), although the number of overlapping drugs in this analysis was relatively small. The overlap in the STITCH links when using the technically similar homozygous and haploid datasets was relatively large compared to the overlaps between the other dataset pairs, suggesting that the homozygous diploid and haploid deletion collections provide to some degree redundant findings (Fig. 1A).
![]() | ||
Fig. 1 Comparison of the homozygous, heterozygous and haploid datasets when predicting drug–target interactions using shared sets of STITCH links. The curves illustrate the number of STITCH links relative to the total number of drug–gene pairs at varying ranks when considering only the common gene deletion strains, drugs and STITCH links present in the datasets under comparison. The pairwise dataset comparisons of (A) homozygous and haploid, (B) homo- and heterozygous, and (C) haploid and heterozygous datasets are performed using the number of STITCH links and drugs denoted in each figure. In figures (A)–(C), the curves for overlapping STITCH links of the FD- and ρ-scores for the two datasets are also shown along with a random overlap. In (D), the shared STITCH links in all three chemical-genomic datasets are considered. |
To make the evaluations as comprehensive as possible, we also repeated these analyses using all the available drugs and STITCH links within each dataset separately. In line with the previous results, the homozygous and haploid datasets performed again significantly better than the heterozygous dataset (Table S2 and Fig. S3, ESI†). In general, the FD-score seemed to be more accurate than the ρ-score in recovering STITCH links, suggesting that these two scoring approaches may recover somewhat different drug–target interactions, which motivates their integration in the following sections.
Taken together, these results demonstrate the relatively poor performance of the heterozygous diploid deletion dataset, compared to the homozygous diploid or haploid deletion datasets. As the haploid dataset showed the best performance, we concentrate our further analysis on this dataset. The corresponding results for the homo- and heterozygous datasets are provided as ESI.†
In the haploid dataset, each of the scoring approaches performed significantly better than a random scoring (Fig. 2). The ρ-score showed somewhat poorer performance compared to the FD-score, but the difference was not statistically significant (Table 1). However, the target prediction was improved when applying the integrated scoring approaches, with the SR-score leading to the significantly best overall performance (Table 1). Interestingly, about half of the interactions found by the FD- and ρ-scores were overlapping, suggesting that these two approaches lead to somewhat complementary results. The benefit of the integrated scoring approaches is that they capture relatively large portions of the interactions found by either the FD- or ρ-score alone (Fig. 2, inset).
![]() | ||
Fig. 2 Performance of the different scoring approaches in the haploid dataset. The curves illustrate the number of recovered STITCH links relative to the total number of drug–gene pairs at varying ranks when the gene targets are ranked for each drug separately using different scoring methods. The Venn diagrams in the inset describe the number of overlapping STITCH links from the different scoring approaches at rank 50. |
Diagonal: area under the curve (AUC) and p-values compared to random. Above diagonal: pair-wise differences of AUCs (row–column). Below diagonal: p-values for the pair-wise score differences. |
---|
![]() |
The results obtained from the homo- and heterozygous datasets further support the good overall performance of the SR-score (Fig. S4, ESI†); in the homozygous dataset the improvement provided by the SR-score compared to all the other scoring methods was statistically significant, whereas in the heterozygous dataset only the difference between the SR-score and the poorest performing ρ-score was significant (Table S2, ESI†). The overlap between the FD- and ρ-scores is relatively small in the homozygous dataset and even smaller in the heterozygous dataset, further implying that these complementary approaches find different drug–target interactions (Fig. S4, insets ESI†).
Table S3 (ESI†) provides more detailed information on a selected collection of direct drug–target interactions;17,18 for example, in the heterozygous dataset, the interaction lovastatin–HMG1 has ranks 1 and 874 when using the FD- and ρ-scores, respectively. Integration of these ranks led to ranks 1 (I-score) and 2 (SR-score). On the other hand, in the haploid dataset, camptothecin–TOP1 interaction was ranked as poorly as 1252 using the FD-score and 185 with the ρ-score. When integrating these through the I-score, the ranking was improved to 29. In general, the SR-score in the haploid dataset results in the best average ranking of the known drug–target interactions (Table S3, ESI†).
Finally, we also tested how the confidence level of the STITCH links affects the results; in all cases, the SR-score provided consistently the best performance in all datasets when considering all ranks (Table S1, ESI†). This indicates that the good performance of the SR-score does not originate from any subset of the STITCH links used in the evaluation.
![]() | ||
Fig. 3 The effect of using negative, positive or both tails of the genetic interaction profiles (ε-score) or chemical–genetic interaction profiles (FD-score). The coloured traces describe the predicted accuracy for the ρ-score when using different percentages of non-missing FD- and ε-score pairs selected according to the respective tail of the SGA or haploid datasets in the calculation of correlations. The traces for the I- and SR-scores are calculated using the negative tail of the SGA dataset, which shows the best performance. Since the FD-scores alone do not rely on correlations, their AUC values are not affected by the percentage of the pairs used. |
Perhaps a more interesting question is whether the ρ-score can be improved by calculating the correlations over only certain pairs of the chemical-genomic and genetic interaction profiles, instead of using all the pairs with non-missing values in both the FD- and ε-scores. We addressed this question by including different quantiles of pairs in the calculation of correlations based on the negative, positive or both tails of the genetic interaction as well as the chemical-genomic profiles (see Fig. S6, ESI† for an example).
We observed that synthetic lethal genetic interactions (negative tail of the ε-score) provide most information on the prediction of drug–target interactions (Fig. 3). More specifically, the optimal performance of the ρ-score was achieved when correlations are based on the most negative portion of genetic interactions in conjunction with the FD-score distribution ranked according to increased sensitivity. In the haploid dataset, the optimal percentage was 75% (referred to as ρ*-score), after which the performance starts again to decrease. The ρ*-score showed significant improvement in AUC-values compared to the normal ρ-score (p < 0.001). For example, the rank of the gene CTA1 under hydrogen peroxide treatment decreased from 207 (ρ-score) to 26 (ρ*-score) in the haploid dataset (Table S3, ESI†).
We then studied how the optimal ρ*-score affects the I- and SR-scores (Fig. 3 and Fig. S5, ESI†). The shape of the curve obtained with the SR-score resembles closely the curve of the ρ-score, which implies that the ρ-score has, in general, a larger effect on the SR-score than on the I-score (most notably seen in the heterozygous dataset, Fig. S5, ESI†). However, both the I- and SR-scores can be enhanced by applying the optimized ρ*-score in their calculations.
Interestingly, calculating correlations according to the positive tail of the genetic interaction dataset immediately decreased the performance of the ρ-score, with AUC being eventually even worse than that of random scoring. This suggests that leaving out even a small fraction of the most synthetic lethal genetic interactions has clearly a negative effect on the ρ-score. On the other hand, concentrating on the most sensitive (negative) fitness defects in conjunction with the genetic interaction profiles had hardly any effect on the ρ-score. While leaving out the most sensitive drug–target interactions decreased the ρ-score, the effect was smaller compared to excluding the most negative genetic interactions. Taken together, these results suggest that the synthetic lethal interactions correspond to sensitive fitness defects in respective genes more often than observed vice versa. Similar observations can be made also in the homo- and heterozygous datasets (Fig. S5, ESI†), further emphasizing the significant role that synthetic lethal genetic interactions have in drug target prediction.
The differences between the basic versions of each of the four scoring approaches were relatively small, suggesting that all the approaches result in equally reproducible rankings (Table 2). When considering the best variation of each scoring approach, however, the I*-score seems to be the one providing the most coherent rankings (Table 2 and Fig. S7A, ESI†). Notably, using the ρ*-scores in the calculations of the I*- and SR*-scores improved significantly the performances compared to those versions calculated with the normal ρ-scores (Table 2). Hence, choosing the pairs for computing the correlations according to the negative tail of the SGA dataset does not only improve the recovery of the STITCH links, but also improves the consistency of the potential target gene ranking. The complete results obtained using the different tails of each score distribution are shown in Table S4 (ESI†).
Diagonal: area under the curve (AUC); all AUCs are better than random (p < 0.001). Above diagonal: pair-wise differences of AUCs (row–column). Below diagonal: p-values for the pair-wise score differences. ρ*-score: correlation based on 75% of pairs using the negative tail of the SGA dataset. I* and SR*-scores: calculated using the negative tail of the FD-score and the ρ*-score. Basic versions of each of the scoring approach are in italics; the best variations are in bold. |
---|
![]() |
The corresponding results for the heterozygous dataset are provided in ESI† (Table S5 and Fig. S7B). Here, the AUC values are, in general, again much smaller compared to those of the homozygous dataset, suggesting that besides recovering the STITCH links better, the homozygous dataset also provides more coherent results. Interestingly, the positive tail, along with the absolute values, of the FD-scores clearly outperformed all the other methods when considering reproducibility of the results. This indicates that the selection of the distribution, the tail and the percentage of the pairs used in the calculation of the ρ*-, I*- and SR*-scores according to the best performance at recovering the STITCH links does not lead to the optimal reproducibility in the heterozygous dataset.
![]() | ||
Fig. 4 An integrated network of the rapamycin-interacting genes, pathways and complexes. The network is based on the genes most related to rapamycin by the SR-score in the homozygous dataset. Thin edges correspond to genetic interactions and thick edges to mixed genetic and physical interactions, where the color of the edge indicates whether the genetic interaction is negative or positive. Two parallel lines indicate pure physical interactions. The sensitivity or resistance of each gene deletion strain was assigned based on the sign of the FD-score. The gray nodes indicate those genes which are not present in the homozygous dataset (e.g. essential genes). The nodes with bolded border lines indicate those gene deletion strains having evidence for rapamycin sensitivity in SGD (http://www.yeastgenome.org) database.19 The protein complexes are shaded, and the background color indicates which GO process a gene is related to. The network was constructed using Cytoscape.20 The interactions between genes were retrieved from the BioGRID,21 version 3.1.85. |
A total of 14 of the top 50 genes were linked to either one of these selected GO terms. Four of these genes (SYS1, VPS5, VPS29, VPS35) were linked to more than 20% of the different chemical treatments in the homozygous dataset according to the SR*-score, hence defining the corresponding gene as multi-drug resistant (MDR).4 Thus, the remaining 10 genes were defined specific to rapamycin. The sensitivity or resistance of the corresponding gene deletion strain was assigned based on the sign of the FD-score. For example, target of rapamycin, TOR1, had ranks 7 and 4 when using the SR*- and FD-scores, respectively (ranks 92 and 14 for the ρ*- and I*-scores; Table S3, ESI†). Thus, TOR1 appears in the network as a sensitive node.
Notably, each of the protein complexes present in the network included at least one of the 14 rapamycin-interacting genes (Fig. 4). For instance, the two targets in the RIC1p–RGP1p and GARP complexes, RGP1 and VPS51, respectively, were found early when using the SR*-score. While missing in STITCH, these gene deletion strains have in fact been reported as sensitive to rapamycin in Saccharomyces Genome Database (SGD).19
In the retromer complex, all of the four genes present in the homozygous dataset were linked to rapamycin using the SR*-score. Interestingly, one of these genes, VPS17, is not related to rapamycin according to SGD or STITCH. However, this gene shares both genetic and physical interaction partners with the other members of the retromer complex, the deletion of which is known to lead to rapamycin sensitivity.19 Also, the SR*-score assigns an early rank for this gene, suggesting a potential interaction between rapamycin and VPS17 for further study.
One of the genes, namely YPT6, seems to be a central pathway hub (i.e. highly connected network node). While there is evidence for this gene deletion strain being sensitive to rapamycin in SGD, it was not among the top-ranked targets by the SR*-score. However, the rapamycin-dependence can be predicted based on the network's connectivity structure, further demonstrating the importance of such integrative analysis.
In general, most of these selected gene deletion strains present in the network were sensitive to rapamycin, whereas there were only a few resistant strains, YPT32, SFT2 and GGA1, which all seem to reside in the non-central part of the network not belonging to any of the protein complexes. As expected, physical interactions are common within complexes, while between-complex interactions tend to be genetic, with the negative type being the dominant one. These results suggest that protein complexes have a central role when modelling polygenic response patterns to chemical treatment.
Surprisingly, the heterozygous profiling (HIP) turned out to be less effective at recovering drug–gene interactions reported in STITCH compared to the haploid and homozygous diploid deletion profiling (HOP). The underlying technical similarities between the HOP chemical-genomic assays and the haploid double-mutant genetic interaction assays cannot account for the better performance of the fitness defect-based method (FD-score), since this score does not rely on the genetic interactions. Neither does the lack of most of the essential genes explain the poor performance of the heterozygous dataset compared to the HOP assays (Fig. S9, ESI†). Perhaps more likely, this result could reflect the fact that in the HIP profiling only direct targets of the chemical under analysis are recovered, whereas in the HOP profiling also genes involved in buffering the drug target pathway can be detected.2,3,5 Indeed, when studying the set of well-established direct drug–target interactions, the HIP FD-score performed as expected (Table S3, ESI†). For instance, the HIP profiling recovered as its top-rank HMG1, the target of anticholesterol drugs atorvastatin and lovastatin. As another example, TUB3, the target of antifungal drugs thiabendazole and nocodazole, was also top-ranked (ranks 1 and 12 with the FD-score, respectively). Interestingly, the recovery of the interaction between TUB3 and nocodazole can further be improved by the I-score (rank 1) and the SR-score (rank 4).
In general, the target rankings based on the fitness defects (FD-score) were more accurate at recovering the STITCH links compared to the approach utilizing the conventional profile correlations (ρ-score). However, the target detection could be improved by applying the integrative scoring approaches (I- and SR-scores). An advantage of the SR-score is that it captures a wide range of drug–target links recovered either by the FD- or ρ-score, with an overall improved accuracy compared to the previously proposed integration approach (I-score). Moreover, being rank-based statistic, the SR-score is relatively insensitive to both outliers and missing data points, which are quite frequent in high-throughput screening datasets. Here, for instance, the homozygous chemical-genomic screening dataset had a missing value rate of 10%, making the traditional data mining approaches, such as the FD-score, vulnerable to unobserved interactions. This may also partly explain why the heterozygous dataset has conventionally been considered more informative for drug target analysis.
A limitation of any correlation-based approach is that they cannot determine whether the gene or mutation is associated with compound sensitivity or resistance. However, by combining the information obtained from the SR- and FD-scores enables one to construct drug–target networks and to analyse the gene nodes and their interactions with respect to sensitivity and resistance in the context of selected biological processes (Fig. 4). In the interpretation of such drug–gene-pathway relationships, physical interactions and complexes were found to provide useful information by which to decompose the complex network into cross-connected sub-network modules. This is in line with recent studies, which show that functional modules, such as protein complexes and biological pathways, are central in explaining the genetic landscape of yeast.10,13,23 Moreover, protein complexes are relatively stable in response to chemical perturbation, even when their functional connections are reorganized,24 suggesting that protein complexes could serve as robust processing units when modelling, explaining and predicting drug responses on a global network level.25
It was also found out that the pure correlation-based approach, such as the ρ-score used in previous studies,3,7,10,11 seems to be sub-optimal when identifying targets of bioactive compounds. Interestingly, the performance of the correlation-based scoring approaches could further be enhanced by focusing on the negative tail of the genetic interactions (i.e. synthetic lethal/sick interactions). This supports the earlier results,7,11 also when using the recent quantitative SGA assays which allow the detection of both the positive and negative ends of the genetic interaction spectrum.10 While the positive genetic interactions often connect functionally distinct protein complexes,13 we showed here that the negative genetic interactions were especially useful for predicting drug–target interactions. In line with this observation, pathway-specific hubs in the synthetic lethal genetic interaction network were recently used to predict compounds that would target a given pathway of interest.26 However, since there may be both negative and positive genetic interactions within and between functional modules,10,22,27,28 it seems likely that the whole spectrum of quantitative genetic interactions is needed when eventually moving from drug–target interactions toward predicting drug target pathways and networks.29
Once carefully evaluated in the high-quality yeast datasets, the same principles can later be applied to drug and target discovery in human diseases. In particular, the concept of synthetic lethality has recently gained much interest as a principled strategy to develop more effective and selective cancer treatments.30–32 However, despite the advances in biotechnologies, such as RNA interference and high-throughput chemical screening, which enable systematic detection of synthetic lethal interactions in human cells, there remain experimental and computational challenges in the discovery of new drug targets for personalized therapies.14,33 Computational scoring approaches, such as those evaluated here, play a major role in the drug discovery process by identifying the most promising chemical compounds and their cellular targets. These results should therefore prove useful also for future developments in network pharmacology,34e.g., for explaining observed polypharmacology of single drugs or predicting effective drug combinations to fight treatment resistance.35
Footnote |
† Electronic supplementary information (ESI) available: Supplementary Fig. S1–S9 and supplementary Tables S1–S5. See DOI: 10.1039/c3mb25591c |
This journal is © The Royal Society of Chemistry 2013 |