Robust determination of differential abundance in shotgun proteomics using nonparametric statistics†
Label-free shotgun mass spectrometry enables the detection of significant changes in protein abundance between conditions. Limited cohort sizes or replication, large ratios of potential protein markers to samples, and multiple null measurements pose important technical challenges to conventional parametric models. From a statistical perspective, a similar scenario is encountered in genomics when looking for differentially expressed genes. Still, detecting a large fraction of the true positives without incurring a high false discovery rate is arguably more difficult in proteomics, because of even smaller sample sizes and peptide-to-peptide variability in detectability. These constraints argue for nonparametric (distribution-free) tests on normalized peptide values, which minimize the number of free parameters, and for measuring significance with permutation testing. We propose such a procedure with a class-based statistic, no parametric assumptions, and no parameters to select other than a nominal false discovery rate. The method was tested on a new dataset, available via ProteomeXchange with identifier PXD006447, that was prepared using a standard proteolytic digest of a human protein mixture at 1.5-fold to 3-fold protein concentration changes and diluted into a constant background of yeast proteins. We demonstrate the superiority of the proposed method relative to other approaches in terms of realized sensitivity and realized false discovery rate, both determined from ground truth, and recommend it for detecting differentially abundant proteins from MS data.
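To make the general idea concrete, the following is a minimal Python sketch of a rank-based two-class comparison whose significance is assessed by permuting class labels, followed by a Benjamini-Hochberg step at a nominal false discovery rate. It is an illustration under stated assumptions, not the class-based statistic or the procedure proposed in this paper: the mean-rank-difference statistic, the Benjamini-Hochberg correction, and the function names `average_ranks`, `permutation_pvalue`, and `benjamini_hochberg` are all hypothetical choices made for this example.

```python
import numpy as np


def average_ranks(x):
    """Rank values from 1..n, assigning tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks within each group of tied values.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for one feature.

    Illustrative statistic (an assumption, not the paper's): absolute
    difference in mean rank between the two classes.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    ranks = average_ranks(values)

    def statistic(lab):
        return abs(ranks[lab].mean() - ranks[~lab].mean())

    observed = statistic(labels)
    # Count label shufflings at least as extreme as the observed labeling.
    hits = sum(
        statistic(rng.permutation(labels)) >= observed - 1e-12
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals, fdr=0.05):
    """Boolean mask of features called significant at the nominal FDR."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= fdr * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        largest = np.max(np.nonzero(passed)[0])
        reject[order[: largest + 1]] = True
    return reject
```

In use, `permutation_pvalue` would be called once per peptide or protein on its normalized values across samples, and the resulting p-values passed together to `benjamini_hochberg`. Because only ranks and label shufflings enter the calculation, no distributional form is assumed, and the nominal false discovery rate is the only setting that affects which features are called.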