Issue 14, 2019

Informatics analysis of capillary electropherograms of autologously doped and undoped blood

Abstract

An ‘Autologous Blood Transfusion’ (ABT) is the reinjection of blood previously taken from an athlete, performed to increase the oxygen transport capacity of the athlete's blood. Despite the World Anti-Doping Agency's ban on such practices, ABT abuse continues. Autologous blood doping (ABD) is challenging to detect because an individual's doped and undoped blood are so similar. Recently, Harrison et al. reported that high-speed capillary electrophoresis may identify ABD. In their work, first-order derivatives of the electropherograms were used to identify doping. However, this method suffered from false negatives due to the subjective nature of the analysis. Here, we provide an informatics analysis of the data from that study, contrasting the results of traditional statistical methods with those of less traditional mathematical techniques. First, three well-known multivariate statistical tools (cluster analysis, principal component analysis (PCA), and partial least squares (PLS)) were applied to develop calibrations and/or group electropherograms of undoped (0%) and doped (5% and 10%) blood samples. (These doping levels were chosen because doping below 5% has little physiological effect, while 10% corresponds to the approximate ‘gain’ derived from the transfusion of a single unit of blood into an adult.) Different preprocessing and variable selection methods were considered. Because of variation in the electropherograms and the limited sample size, these methods were inadequate. We next considered four less commonly used mathematical/informatics tools: pattern recognition entropy (PRE), the Euclidean distance between vectors, a peak fitting/integration method, and the second moment (SM). Each of these techniques showed some ability to differentiate between the 0%, 5%, and 10% doped samples. We then evaluated the prediction capabilities of inverse least squares (ILS) models based on these summary statistics. An ILS calibration based on PRE, the Euclidean distance, and peak fitting/integration proved more successful than the PLS model at predicting levels of blood doping from the corresponding electropherograms; the ILS model distinguished between doped (5% and 10%) and undoped (0%) blood. This methodology may be applicable to other challenging informatics problems, such as determining risk factors for genetically linked diseases, robust pattern finding in peak-like data such as ChIP-seq, or other genomic sequencing for understanding the 3D genome.
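The summary-statistic approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give formulas, so the sketch assumes PRE is the Shannon entropy of a trace normalized to unit sum, the second moment is the intensity-weighted variance along the time axis, the Euclidean distance is measured to the mean undoped trace, and ILS is an ordinary least-squares regression of doping level on the summary statistics. The peak fitting/integration step is omitted. All traces are synthetic Gaussian peaks with made-up parameters, and every function name is hypothetical.

```python
import numpy as np


def pattern_recognition_entropy(signal):
    """Shannon entropy (bits) of a trace normalized to unit sum (assumed PRE definition)."""
    p = np.abs(signal) / np.sum(np.abs(signal))
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))


def second_moment(t, signal):
    """Intensity-weighted variance of the trace along the time axis (assumed SM definition)."""
    p = np.abs(signal) / np.sum(np.abs(signal))
    mean_t = np.sum(p * t)
    return float(np.sum(p * (t - mean_t) ** 2))


def euclidean_distance(a, b):
    """Euclidean distance between two traces treated as vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def fit_ils(features, levels):
    """Least-squares fit of level = b0 + features @ b (inverse least squares)."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, levels, rcond=None)
    return coef


def predict_ils(coef, features):
    """Predict doping levels from summary statistics with fitted coefficients."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef


# Synthetic stand-in data: NOT real electropherograms.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 400)


def synthetic_trace(level):
    """One main peak plus a second peak whose height grows with the doping level."""
    main = np.exp(-0.5 * ((t - 4.0) / 0.3) ** 2)
    extra = (level / 10.0) * 0.5 * np.exp(-0.5 * ((t - 6.0) / 0.3) ** 2)
    noise = 0.01 * rng.standard_normal(t.size)
    return main + extra + noise


levels = np.repeat([0.0, 5.0, 10.0], 4)  # four replicates per doping level
traces = np.array([synthetic_trace(lv) for lv in levels])
reference = traces[levels == 0].mean(axis=0)  # mean undoped trace (assumed reference)

# One row of summary statistics per trace: PRE, distance to reference, SM.
features = np.array([
    [pattern_recognition_entropy(x),
     euclidean_distance(x, reference),
     second_moment(t, x)]
    for x in traces
])

coef = fit_ils(features, levels)
pred = predict_ils(coef, features)  # in-sample predictions, not a validation

for lv in (0.0, 5.0, 10.0):
    print(f"true level {lv:4.1f}%  mean predicted {pred[levels == lv].mean():6.2f}%")
```

Reducing each electropherogram to a few scalar statistics keeps the regression small relative to the number of samples, which matters when the sample size is limited. The predictions above are computed on the training data only; a real assessment would need held-out samples or cross-validation.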

Article information

Article type: Paper
Submitted: 24 Jan 2019
Accepted: 22 Feb 2019
First published: 28 Feb 2019

Anal. Methods, 2019, 11, 1868-1878

S. Chatterjee, S. C. Chapman, G. H. Major, D. L. Eggett, B. M. Lunt, C. R. Harrison and M. R. Linford, Anal. Methods, 2019, 11, 1868 DOI: 10.1039/C9AY00192A
