

An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features

Abstract

Predicting essential genes helps to identify the minimal set of genes that a cell absolutely requires for proper functioning and survival. Available machine learning techniques for essential gene prediction suffer from inherent problems such as imbalanced training datasets, choice of a best model that is biased toward a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. Here, we propose a simple support vector machine (SVM)-based learning strategy for predicting essential genes in Escherichia coli K-12 MG1655 metabolism. It integrates a non-conventional combination of an appropriately sample-balanced training set, unique organism-specific genotype and phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate a best machine learning model (the model with the highest accuracy among all models trained on different sample training sets) for essential gene identification. For the first time, we also introduce flux-coupled metabolic subnetwork-based features to enhance classification performance. Our strategy proves superior to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. The methodology was also trained on the datasets of other recent supervised classification techniques for essential gene classification and tested on their reported test datasets. Its testing accuracy was always higher than that of the known techniques, indicating that the method outperforms them. Observations from our study indicate that essential genes are conserved among homologous bacterial species, are characterized by codon usage bias, GC content, and high gene expression, and predominantly tend to form physiological flux modules within metabolism.
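The model-selection idea in the abstract (train one model per sample-balanced training set, then keep the one with the highest accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The balancing scheme (random undersampling of the majority class), the function names, and the `train_fn` interface are all assumptions made for illustration. The classifier itself is left abstract, so an SVM or any other learner could be plugged in.

```python
import random


def balanced_subsets(labels, n_sets, seed=0):
    """Yield index lists with equal numbers of essential (1) and
    non-essential (0) genes. Assumed scheme: keep the minority class
    whole and randomly undersample the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_sets):
        yield sorted(minority + rng.sample(majority, len(minority)))


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


def select_best_model(train_fn, features, labels, test_x, test_y,
                      n_sets=5, seed=0):
    """Train one model per balanced subset; return the (model, metrics)
    pair with the highest test accuracy. `train_fn(X, y)` is a
    hypothetical hook that must return a callable `model(x) -> 0 or 1`."""
    best = None
    for idx in balanced_subsets(labels, n_sets, seed):
        model = train_fn([features[i] for i in idx],
                         [labels[i] for i in idx])
        metrics = confusion_metrics(test_y, [model(x) for x in test_x])
        if best is None or metrics["accuracy"] > best[1]["accuracy"]:
            best = (model, metrics)
    return best
```

In practice `train_fn` might wrap an SVM from a library such as scikit-learn (`sklearn.svm.SVC`). The feature vectors would hold attributes of the kind the abstract names: conservation, codon usage, GC content, expression, and flux-coupling features.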


Publication details

The article was received on 19 Apr 2017, accepted on 14 Jun 2017 and first published on 14 Jun 2017


Article type: Paper
DOI: 10.1039/C7MB00234C
Citation: Mol. BioSyst., 2017, Accepted Manuscript

    An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features

    S. Nandi, A. Subramanian and R. Sarkar, Mol. BioSyst., 2017, Accepted Manuscript, DOI: 10.1039/C7MB00234C
