LASSO modeling of the Arabidopsis thaliana seed/seedling transcriptome: a model case for detection of novel mucilage and pectin metabolism genes
Abstract
Whole-genome transcript correlation-based approaches have proven highly useful for candidate gene detection, and simple Pearson correlation has consequently been adopted in several web-based tools. More sophisticated methods, based on e.g. mutual information or Bayesian network inference, have been developed and shown to be theoretically superior, but they are not yet commonly applied. Here, we propose the application of a recently developed statistical regression technique, the LASSO, to detect novel candidates from high-throughput transcriptomic datasets. We apply the LASSO to a tissue-specific dataset in the model plant Arabidopsis thaliana to identify novel players in seed coat mucilage synthesis. We built LASSO models based on a list of genes known to be involved in a sub-pathway of Arabidopsis mucilage synthesis. After identifying a putative transcription factor, we verified its involvement in mucilage synthesis by obtaining knock-out mutants for this gene. We show that a loss of function of this putative transcription factor leads to a significant decrease in mucilage
- This article is part of the themed collection: Molecular BioSystems Emerging Investigators 2012
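
The general idea summarized in the abstract, regressing the expression of a known pathway gene on all other genes with an L1 penalty and treating genes with nonzero coefficients as candidates, can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the gene indices (3 and 17), the penalty value, and all function names are hypothetical, and the solver is a plain coordinate-descent LASSO written in pure Python for self-containment.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def standardize(column):
    """Center a profile to mean 0 and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    scale = (sum(c * c for c in centered) / n) ** 0.5
    return [c / scale for c in centered] if scale > 0 else centered

def lasso_coordinate_descent(X_cols, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X_cols is a list of predictor columns (one expression profile per gene).
    """
    n = len(y)
    coefs = [0.0] * len(X_cols)
    residual = list(y)  # residual = y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j, col in enumerate(X_cols):
            z = sum(v * v for v in col) / n
            if z == 0:
                continue
            # Correlation of gene j with the residual, excluding its own effect
            rho = sum(col[i] * (residual[i] + col[i] * coefs[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                coefs[j] = new
    return coefs

# Synthetic stand-in for a transcriptome: 30 genes measured in 60 samples.
random.seed(0)
n_samples, n_genes = 60, 30
expression = [[random.gauss(0, 1) for _ in range(n_samples)]
              for _ in range(n_genes)]

# Hypothetical "bait" (known pathway gene) driven by genes 3 and 17 plus noise.
bait = [2.0 * expression[3][i] - 2.0 * expression[17][i] + random.gauss(0, 0.1)
        for i in range(n_samples)]

X_cols = [standardize(col) for col in expression]
y = standardize(bait)

coefs = lasso_coordinate_descent(X_cols, y, lam=0.3)  # lam chosen by hand

# Genes whose coefficient survived the L1 penalty are the candidates.
candidates = [j for j, b in enumerate(coefs) if b != 0.0]
print(candidates)
```

Unlike pairwise Pearson correlation, which scores each gene against the bait independently, the L1 penalty fits all genes jointly and sets most coefficients exactly to zero, so the output is a short candidate list rather than a ranking. In practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.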