

Choosing proper normalization is essential for discovery of sparse glycan biomarkers


Abstract

Rapid progress in high-throughput glycomics analysis enables researchers to conduct large-sample studies. Typically, between-subject differences in the total abundance of raw glycomics data are very large, and these differences must be reduced to make measurements comparable across samples. There are essentially two ways to approach this issue: row-wise and column-wise normalization. In glycomics, the per-subject differences are usually forced to be exactly zero by scaling each sample so that the sum of all glycan intensities equals 100%. This total area (TA) normalization, a row-wise method, results in so-called compositional data, rendering many standard multivariate statistical methods inappropriate or inapplicable. Moreover, ignoring the compositional nature of the data may lead to spurious results. Alternatively, the raw data can be log-transformed prior to column-wise normalization and the application of standard statistical tools. To date, there is no clear consensus on the appropriate normalization method for glycomics data, nor is a systematic investigation of the impact of TA on downstream analysis available to justify its choice. Our motivation lies in efficient variable selection to identify glycan biomarkers, with regard to both accurate prediction and interpretability of the chosen model. Via extensive simulations, we investigate how different normalization methods affect variable selection and compare their performance. We also address the effect of various types of measurement error in glycans: additive, multiplicative and two-component error. We show that when sample-wise differences are not large, row-wise normalization (such as TA) can have deleterious effects on variable selection and prediction.
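As a rough illustration of the two normalization routes contrasted in the abstract, the following Python sketch applies row-wise TA normalization and a log-transform followed by column-wise normalization to simulated intensities. It is not the authors' simulation code. The function names, the simulated data and the error standard deviations are all hypothetical; column-wise normalization is assumed here to mean centering and scaling each glycan, and the two-component error is assumed to take the common form of a multiplicative log-normal term plus an additive normal term.

```python
import numpy as np


def total_area_normalize(X):
    """Row-wise (TA): scale each sample so its glycan intensities sum to 100."""
    X = np.asarray(X, dtype=float)
    return 100.0 * X / X.sum(axis=1, keepdims=True)


def log_column_normalize(X):
    """Log-transform, then center and scale each glycan (column).

    Assumption: 'column-wise normalization' is taken to be per-glycan
    standardization to zero mean and unit sample variance.
    """
    log_x = np.log(np.asarray(X, dtype=float))
    return (log_x - log_x.mean(axis=0)) / log_x.std(axis=0, ddof=1)


def add_two_component_error(mu, sd_mult, sd_add, rng):
    """Assumed two-component model: y = mu * exp(eta) + eps.

    eta ~ N(0, sd_mult^2) is the multiplicative part and
    eps ~ N(0, sd_add^2) is the additive part.
    """
    eta = rng.normal(0.0, sd_mult, size=mu.shape)
    eps = rng.normal(0.0, sd_add, size=mu.shape)
    return mu * np.exp(eta) + eps


# Hypothetical data: 20 samples (rows) by 6 glycans (columns).
rng = np.random.default_rng(0)
true_abundance = rng.lognormal(mean=3.0, sigma=0.5, size=(20, 6))
raw = add_two_component_error(true_abundance, sd_mult=0.1, sd_add=0.01, rng=rng)

ta = total_area_normalize(raw)   # compositional: every row sums to 100
z = log_column_normalize(raw)    # each glycan has mean 0 and variance 1
```

After TA normalization every row sums to 100, so the glycan columns are linearly dependent; this is the compositional constraint the abstract refers to. The log-transformed, column-normalized data carry no such sum constraint.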




Article information


Submitted: 29 Nov 2019
Accepted: 10 Mar 2020
First published: 10 Mar 2020

This article is Open Access

Mol. Omics, 2020, Advance Article
Article type: Research Article


H. Uh, L. Klarić, I. Ugrina, G. Lauc, A. K. Smilde and J. J. Houwing-Duistermaat, Mol. Omics, 2020, Advance Article, DOI: 10.1039/C9MO00174C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Material from this article can be used in other publications provided that the correct acknowledgement is given with the reproduced material.


