Inferring gene functions through dissection of relevance networks: interleaving the intra- and inter-species views
Abstract
Inference of accurate gene annotations requires integrating existing biological knowledge, structured in the form of an ontology, with data from high-throughput transcriptomics technologies. This undertaking requires developing algorithms that integrate genome-scale data, even for model organisms. Gene relevance networks have emerged as a powerful representation of the structure of the data. Such networks can be used for intra-species transfer of gene annotations following the guilt-by-association principle. An analogous principle can serve as a basis for inter-species transfer of gene annotations by comparing well-defined subnetworks. In this review, we compare and contrast the concepts of relevance and proximity networks and briefly review the concept of semantic similarity. We then provide a detailed account of quantitative guilt-by-association inference in the setting of genome-scale relevance networks. Moreover, we systematically survey the existing network-based approaches for automated gene function annotation and categorize them within a single framework according to the methodology they employ. Furthermore, we discuss the data selection strategies required for deriving meaningful and unbiased genome-scale networks from large transcriptomics compendia. Lastly, by simulating gene function prediction with a classical network-based algorithm, we show how the number of genes of unknown function influences prediction within a species, and we pinpoint the need and the requirements for inter-species knowledge transfer.