Issue 7, 2012

Is newer better?—evaluating the effects of data curation on integrated analyses in Saccharomyces cerevisiae


Recent high-throughput experiments have produced a wealth of heterogeneous datasets, each of which provides information about different aspects of the cell. Consequently, integration of diverse data types is essential in order to address many biological questions. The quality of any integrated analysis system depends upon the quality of its component data and upon the Gold Standard data used to evaluate it. It is commonly assumed that the quality of data improves as databases grow and change, particularly for manually curated databases. However, the validity of this assumption can be questioned, given the constant changes in the data coupled with the high level of noise associated with high-throughput experimental techniques. One of the most powerful approaches to data integration is the use of Probabilistic Functional Integrated Networks (PFINs). Here, we systematically analyse the changes in four highly curated and widely used online databases and evaluate the extent to which these changes affect the protein function prediction performance of PFINs in the yeast Saccharomyces cerevisiae. We find that, globally, network performance improves over time. Where individual areas of biology are concerned, however, the most recent files do not always produce the best results. Individual datasets have unique biases towards different biological processes, and performance can be improved by selecting and integrating relevant datasets. When using any type of integrated system to answer a specific biological question, careful selection of raw data and Gold Standard is vital, since the most recent data may not be the most appropriate.

Publication details

The article was received on 30 Sep 2011, accepted on 20 Mar 2012 and first published on 23 Apr 2012

Article type: Paper
DOI: 10.1039/C2IB00123C
Citation: Integr. Biol., 2012, 4, 715-727