Marfran C. D. Santos a,
Yasmin M. Nascimento bc,
Josélio M. G. Araújo bc and
Kássio M. G. Lima *a
a Biological Chemistry and Chemometrics, Institute of Chemistry, Federal University of Rio Grande do Norte, Natal 59072-970, Brazil. E-mail: kassiolima@gmail.com; Tel: +55 84 3342 2323
b Laboratory of Molecular Biology for Infectious Diseases and Cancer, Department of Microbiology and Parasitology, Federal University of Rio Grande do Norte, Natal 59072-970, Brazil
c Laboratory of Virology, Institute of Tropical Medicine, Federal University of Rio Grande do Norte, Natal 59072-970, Brazil
First published on 12th May 2017
In most viral infections, the viral load is directly related to the severity of the disease. Current routine diagnostic tests for dengue fever are only qualitative: they indicate whether or not the patient has dengue fever. However, it is important to know the patient's viral load so that proper care can be given. In this study we used attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR) coupled with multivariate analysis techniques to identify and discriminate dengue serotype 3 (DENV-3) diluted at different concentrations in serum and blood samples, with the purpose of developing a simple, fast and non-destructive methodology for quantitative analysis of the dengue virus. Principal component analysis-linear discriminant analysis (PCA-LDA), successive projection algorithm-linear discriminant analysis (SPA-LDA) and genetic algorithm-linear discriminant analysis (GA-LDA) were applied to this classification problem. Forty samples of each matrix (40 for serum and 40 for blood) were infected with DENV-3 at different concentrations (ten samples for each concentration) and analyzed by IR spectroscopy. The models were successful in classifying the virus by concentration, with the best results obtained for blood samples. The results of the multivariate classification were evaluated based on sensitivity, specificity, positive and negative predictive values, Youden's index and positive and negative likelihood ratios, suggesting that ATR-FTIR spectroscopy coupled with multivariate analysis algorithms is an effective tool for quantifying the dengue virus that provides rapid results and is non-destructive to the sample.
Clinical manifestations of dengue virus infection range from mild or asymptomatic infection to dengue fever (DF) or more severe forms such as dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS).5–7 In most cases, patients have no symptoms or present an undifferentiated fever with or without redness of the skin. The usual symptoms of DF are high fever, headache, muscle pain (myalgia), joint pain (arthralgia), pain behind the eyes and redness of the skin. Patients with DHF and DSS present high fever, bleeding, thrombocytopenia and hemoconcentration as the main symptoms, and may develop other complications such as pleural effusion and gastrointestinal or gingival bleeding.7,8
Viral isolation, RNA detection, antigen detection and serological methods for detecting IgM and IgG are among the most widely used methods in laboratory diagnosis and in virological studies,9,10 but all of these methods have limitations: they involve sample handling, require samples collected in the acute phase and are time-consuming, among others. The methods most widely used in hospitals and diagnostic clinics are serological. These methods are qualitative, that is, they only indicate whether or not the patient is infected. However, knowledge of the patient's viral load is of great importance, since in most cases the viral load is directly related to the severity and stage of the disease. The more virus in circulation, the more severe the disease normally is and the greater the care required. In addition, for some viruses (mainly HIV), the viral load can be used to evaluate whether treatment is effective (if the viral load decreases, the treatment is assumed to be working). In the case of dengue, rapid determination of the viral load can be used as a parameter to decide whether or not treatment is needed. With this in mind, a quantitative test that is inexpensive, quick and requires no sample handling is needed so that clinical treatment can be started as quickly as possible.
Spectroscopy studies the interaction of radiation with matter. Spectroscopic techniques are known to provide fast and reliable results, and their use in biological studies has been termed biospectroscopy.11 Several studies have used spectroscopy for virology purposes, such as detection and quantification of the poliovirus using FTIR spectroscopy,12 detection of hepatitis C virus infection using NMR spectroscopy,13 and diagnosis of HIV-1 infection using near-infrared spectroscopy,14 among others, with promising results regarding the ability of these techniques to identify the presence of the virus.
The mid-infrared region comprises the 400 to 4000 cm−1 range of the electromagnetic spectrum. This radiation is absorbed by molecules present in biological samples. The 900 to 1800 cm−1 range is known as the biomolecule fingerprint region, because it contains spectral bands assigned to lipids (∼1750 cm−1), carbohydrates (∼1155 cm−1), proteins (amide I, ∼1650 cm−1, amide II, ∼1550 cm−1, amide III, ∼1260 cm−1) and DNA/RNA (∼1225 cm−1, ∼1080 cm−1).15,16 In general, attenuated total reflection Fourier-transform infrared spectroscopy (ATR-FTIR) can be used for collecting spectra in this range.15
When different biological samples are interrogated with FTIR, the generated spectra contain a large amount of information, which can make interpretation difficult; it is therefore convenient to use algorithms that aid spectral interpretation. In this study we used principal component analysis (PCA, which reduces data dimensionality by using the components that explain the data variability),16 successive projection algorithm and genetic algorithm (SPA and GA, which reduce the size of the data by selecting the variables that discriminate classes)17,18 and linear discriminant analysis (LDA, which provides maximum separation of classes through the ratio of the variance between classes to the variance within classes) as chemometric algorithms of multivariate classification for discriminating classes via spectral information.
Classification of the dengue virus by concentration from ATR-FTIR spectra using the PCA-LDA, SPA-LDA and GA-LDA algorithms has never been studied. This study aims to quantitatively discriminate DENV-3 samples in serum and blood. The results were encouraging and show the potential of the technique to identify and quantify DENV-3; it may be used in the future as an ancillary tool for clinical diagnostics.
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E} \tag{1}$$
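For orientation, eqn (1) can be annotated with matrix dimensions. The reading below follows the conventional PCA notation (T for scores, P for loadings, E for residuals) and is stated as an assumption, since the symbols are not defined in the surrounding text; n, m and A are illustrative names for the number of spectra, wavenumbers and retained components.

```latex
\underbrace{\mathbf{X}}_{n \times m}
  = \underbrace{\mathbf{T}}_{n \times A}\,
    \underbrace{\mathbf{P}^{\mathrm{T}}}_{A \times m}
  + \underbrace{\mathbf{E}}_{n \times m},
\qquad A \ll m
```

Under this reading, each spectrum (a row of X with m = 235 wavenumbers after pre-processing) is summarized by A score values, for example A = 4 for the PCA-LDA discriminant functions reported in Fig. 3.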
Successive projection algorithm (SPA) is a variable selection method (in this case, the variables are the wavenumbers) that uses simple operations in a vector space to minimize variable collinearity. SPA is a forward selection method: it starts with one variable and incorporates another variable at each iteration until a specified number N of variables is reached.20
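The projection step of SPA can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine used in the study: the function name `spa_select`, the list-of-rows data layout and the fixed starting variable are hypothetical, and the choice of the best starting variable and of N through the cost function is not shown.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(X, start, n_select):
    """Pick n_select column indices of X (rows = samples) with low collinearity.

    Hypothetical helper: at each step every unselected column is projected onto
    the subspace orthogonal to the last selected column, and the column with the
    largest remaining norm is added to the selection.
    """
    n_vars = len(X[0])
    # Work on copies of the columns so the caller's data are not modified.
    cols = [[row[j] for row in X] for j in range(n_vars)]
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_norm2 = dot(ref, ref)
        if ref_norm2 == 0:
            break  # nothing left to project on; stop early
        best, best_norm2 = None, -1.0
        for j in range(n_vars):
            if j in selected:
                continue
            coef = dot(cols[j], ref) / ref_norm2
            cols[j] = [c - coef * r for c, r in zip(cols[j], ref)]
            norm2 = dot(cols[j], cols[j])
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected


# Toy data: column 1 is exactly twice column 0, column 2 is independent.
X = [[1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
print(spa_select(X, start=0, n_select=2))  # [0, 2]
```

In the toy example the collinear column is skipped because its projection has zero norm, which is the behavior the text describes as minimizing variable collinearity.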
Genetic algorithm (GA) was another technique used for variable selection. This algorithm makes use of techniques based on biological genetics and evolution. A population is created with n subsets, where each subset is a random combination of variables (wavenumbers). Each subset is encoded as a string of length m (the total number of variables that can be chosen) made up of 1's (variables selected by the model) and 0's (unselected variables). Therefore, in genetic terms, each variable represents a gene, and a set of variables represents a chromosome. For example, for a selection problem with 10 variables, a chromosome could be 1001010110, where variables 1, 4, 6, 8 and 9 would be selected for the model, and variables 2, 3, 5, 7 and 10 would be left out.21 In this study, the GA routine used a population of 24 individuals per generation and 12 generations. The mutation and crossover probabilities were held constant at 10% and 60%, respectively. GA was repeated three times and the best result was used. To select the optimal number of variables for SPA and GA, we used a cost function calculated on the validation set,22 as shown in eqn (2):
(2) |
(3) |
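The binary chromosome encoding described for the GA can be illustrated with a short Python sketch. The helper names (`decode`, `crossover`, `mutate`) are hypothetical, and one-point crossover and per-gene bit flipping are assumed operators chosen for illustration; the text specifies only the mutation and crossover rates, not the operators themselves.

```python
import random


def decode(chromosome):
    """Return the 1-based indices of the variables flagged with '1'."""
    return [i for i, gene in enumerate(chromosome, start=1) if gene == "1"]


def crossover(parent_a, parent_b, point):
    """One-point crossover (assumed operator): swap the tails at `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate` (assumed operator)."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[g] if rng.random() < rate else g for g in chromosome)


# The 10-variable example from the text.
print(decode("1001010110"))  # [1, 4, 6, 8, 9]

# Illustrative use of the operators with the 10% mutation rate from the text.
rng = random.Random(0)
child_a, child_b = crossover("1001010110", "0110101001", point=5)
mutant = mutate(child_a, rate=0.10, rng=rng)
```

In a full GA run, each decoded subset would be scored with the cost function and the fittest chromosomes carried into the next generation; that loop is omitted here.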
Linear discriminant analysis (LDA) was another technique employed. It is a supervised technique based on the discriminant procedure developed by Fisher in 1936. LDA maximizes the ratio of between-class variance to within-class variance in a given data set, thereby ensuring maximum separability. LDA is efficient when combined with dimensionality reduction techniques (such as PCA, SPA and GA).
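The ratio that LDA maximizes can be made concrete for a single variable. The sketch below is an illustration only: the function name `fisher_ratio` and the toy numbers are hypothetical, and it computes the between-class to within-class sum-of-squares ratio for one variable rather than the full multivariate discriminant functions.

```python
def fisher_ratio(groups):
    """Between-class over within-class sum of squares for one variable.

    `groups` is a list of classes, each a list of values of the same variable.
    Raises ZeroDivisionError if every class has zero internal spread.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        between += len(g) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in g)
    return between / within


# Two well-separated classes give a large ratio, overlapping classes a small one.
print(fisher_ratio([[1, 2, 3], [7, 8, 9]]))  # 13.5
print(fisher_ratio([[1, 2, 3], [2, 3, 4]]))  # 0.375
```

A variable (or linear combination of variables) with a larger ratio separates the classes better, which is why LDA benefits from a preceding step that keeps only a few informative scores or wavenumbers.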
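The results of the multivariate classification for PCA-LDA, SPA-LDA and GA-LDA were tested based on the following figures of merit, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.

Sensitivity (confidence in obtaining a positive result for a truly positive sample):
$$\mathrm{sens} = \frac{TP}{TP + FN} \tag{4}$$
Specificity (confidence in obtaining a negative result for a truly negative sample):
$$\mathrm{spec} = \frac{TN}{TN + FP} \tag{5}$$
Positive predictive value (PPV):
$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{6}$$
Negative predictive value (NPV):
$$\mathrm{NPV} = \frac{TN}{TN + FN} \tag{7}$$
Youden's index (YOU):
$$\mathrm{YOU} = \mathrm{sens} - (1 - \mathrm{spec}) \tag{8}$$
Positive likelihood ratio (LR+):
$$\mathrm{LR}(+) = \frac{\mathrm{sens}}{1 - \mathrm{spec}} \tag{9}$$
Negative likelihood ratio (LR−):
$$\mathrm{LR}(-) = \frac{1 - \mathrm{sens}}{\mathrm{spec}} \tag{10}$$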
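The figures of merit listed above can be computed from a confusion-matrix count with a few lines of Python. This is a sketch under stated assumptions: it uses the standard definitions of sensitivity, specificity, PPV, NPV and likelihood ratios together with eqn (8) for Youden's index, the function names are hypothetical, and the counts in the example are invented for illustration rather than taken from the study.

```python
import math


def safe_div(num, den):
    """Divide, returning inf (or nan for 0/0) instead of raising on a zero denominator."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def figures_of_merit(tp, tn, fp, fn):
    """Return the figures of merit as fractions (multiply by 100 for %)."""
    sens = safe_div(tp, tp + fn)
    spec = safe_div(tn, tn + fp)
    return {
        "sens": sens,
        "spec": spec,
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "you": sens - (1 - spec),            # eqn (8)
        "lr_pos": safe_div(sens, 1 - spec),  # unbounded when spec = 1
        "lr_neg": safe_div(1 - sens, spec),
    }


# Hypothetical counts: one of two positives detected, no false positives.
m = figures_of_merit(tp=1, tn=2, fp=0, fn=1)
print(m["sens"], m["spec"], m["you"], m["lr_neg"])  # 0.5 1.0 0.5 0.5
```

Note that the positive likelihood ratio has a zero denominator when the specificity is 1; this sketch returns infinity in that case, which is one possible convention.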
MATLAB R2012b software (MathWorks, Natick, USA) was used for data import, pre-treatment, and construction of the chemometric classification models. The raw spectra were pre-processed by cutting to the 900 to 1800 cm−1 region (235 wavenumbers at 4 cm−1 spectral resolution), followed by baseline correction and Savitzky–Golay smoothing (15-point window). The samples were divided into three sets for the PCA-LDA, SPA-LDA and GA-LDA models (60% for training, 20% for validation and 20% for prediction) using the classic Kennard–Stone (KS) uniform sampling algorithm.23 The KS algorithm was applied separately to each class to maximize the minimum Euclidean distances between selected and unselected samples.
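The Kennard–Stone selection can be sketched in Python as follows. This is a minimal illustration, not the MATLAB routine used in the study: the function name `kennard_stone` and the toy data are hypothetical, and the sketch uses the common formulation in which the two most distant samples seed the selection and each further pick is the sample farthest from its nearest already-selected neighbor.

```python
from math import dist


def kennard_stone(X, n_select):
    """Return indices of n_select samples spread uniformly over X (list of points)."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Seed with the pair of samples separated by the largest Euclidean distance.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist(X[p[0]], X[p[1]]))
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    while len(selected) < n_select:
        # Pick the sample whose nearest selected neighbor is farthest away.
        nxt = max(sorted(remaining),
                  key=lambda i: min(dist(X[i], X[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy one-dimensional example with four samples.
X = [[0.0], [1.0], [2.0], [10.0]]
print(kennard_stone(X, 3))  # [0, 3, 2]
```

To mirror the per-class 60/20/20 split described above, the function would be run on the spectra of each concentration class separately, with the samples not chosen for training passed on to the validation and prediction sets.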
Fig. 1 Raw spectra for each original class: (a) DENV-3 in serum (1 × 10³, 1 × 10², 1 and 0.1 PFU mL−1); (b) DENV-3 in blood (1 × 10³, 1 × 10², 10 and 0.1 PFU mL−1). |
Due to the great similarity between the spectra of the different classes, it is necessary to use algorithms (PCA-LDA, SPA-LDA and GA-LDA, in this case) that mathematically find spectral information capable of discriminating one class from another. The models were applied to classify the serum and blood samples by the concentration of DENV-3 present, and finally their performances were compared. The results are discussed below.
Fig. 3 DF1 × DF2 discriminant function values calculated with 4 PCs for: (a) DENV-3 in serum (1 × 10³, 1 × 10², 1 and 0.1 PFU mL−1); (b) DENV-3 in blood (1 × 10³, 1 × 10², 10 and 0.1 PFU mL−1). |
For the infected serum samples, SPA-LDA selected 20 wavenumbers, while for the infected blood samples it selected 19 wavenumbers (Table 1). The selected wavenumbers act like biological markers: they are the variables that the models identified as best discriminating one class from another. Fig. 4a and b show the variables selected by SPA-LDA for serum and blood DENV-3 samples, respectively. Using these selected variables, the Fisher scores were calculated (shown in Fig. 5). As can be seen, the Fisher scores for SPA-LDA show better segregation of classes than those calculated for PCA-LDA, and the classes are visually better separated for DENV-3 samples in blood than in serum.
Chemometric analysis | Wavenumber selected (cm−1) |
---|---|
SPA-LDA for DENV-3 in serum | 908, 943, 989, 1041, 1080, 1124, 1144, 1194, 1302, 1315, 1360, 1441, 1477, 1500, 1541, 1578, 1632, 1695, 1724, 1801 |
SPA-LDA for DENV-3 in blood | 918, 989, 1049, 1076, 1105, 1151, 1227, 1304, 1317, 1356, 1414, 1489, 1524, 1564, 1618, 1653, 1682, 1718, 1801 |
Fig. 4 Graph of the variables selected by SPA-LDA, marked in the average spectrum of: (a) DENV-3 in serum and (b) DENV-3 in blood. |
In analyzing the wavenumbers selected by SPA-LDA for the contaminated serum samples, we observed that the main biological changes of interest that discriminate the different concentrations are related to amide II of proteins (≈1500 cm−1) and RNA (≈1080 cm−1) vibrations. When examining the wavenumbers selected by SPA-LDA for infected blood samples, the main changes are related to carbohydrate (≈1151 cm−1), protein structures (amide I, ≈1653 cm−1, amide III, ≈1317 cm−1), and RNA (≈1227 cm−1, ≈1076 cm−1) vibrations.
In the case of the GA-LDA, 17 wavenumbers were selected for the infected serum samples and 11 wavenumbers for the infected blood samples (Table 2). The graph of the selected variables can be seen in Fig. 6. As can be seen in Fig. 7, Fisher scores showed good visual segregation between classes, mainly in the case of infected blood samples (as in the case of SPA-LDA).
Chemometric analysis | Wavenumber selected (cm−1) |
---|---|
GA-LDA for DENV-3 in serum | 905, 926, 949, 984, 1047, 1055, 1176, 1196, 1277, 1311, 1321, 1396, 1448, 1554, 1603, 1753, 1761 |
GA-LDA for DENV-3 in blood | 986, 1053, 1065, 1225, 1337, 1352, 1529, 1541, 1666, 1686, 1689 |
Fig. 6 Graph of the variables selected by GA-LDA, marked in the average spectrum of: (a) DENV-3 in serum and (b) DENV-3 in blood. |
Among the 17 wavenumbers selected by GA-LDA that best discriminate infected serum samples, we can highlight changes related to protein structures (amide III, ≈1311 cm−1, amide II, ≈1554 cm−1), COO− symmetric stretch in fatty acids (≈1396 cm−1) and lipid (≈1753 cm−1) vibrations. Among the 11 wavenumbers selected by GA-LDA in the case of infected blood samples, those that appear to be of major biological interest are related to RNA (≈1238 cm−1), protein structures (amide III, ≈1329 cm−1, amide I, ≈1661 cm−1) and lipid (≈1743 cm−1) vibrations. The changes associated with RNA make sense, since even within a single serotype there are minimal differences found between the RNA of one viral particle and another.
The good segregations shown for SPA-LDA and for GA-LDA give us an indication that these chemometric models will provide good results for figures of merit [sensitivity, specificity, PPV, NPV, YOU, LR(+) and LR(−)].
The performance of the method was evaluated through validation measures. Tables 3 and 4 present the performance measures for PCA-LDA, SPA-LDA and GA-LDA for serum and blood samples with different concentrations of DENV-3, respectively. As can be seen, the models made some errors for the serum samples, but they obtained excellent classification performance for the blood samples (in this case, PCA-LDA, SPA-LDA and GA-LDA obtained 100% sensitivity and specificity). This demonstrates that ATR-FTIR spectroscopy together with classification techniques has the potential to quantitatively discriminate the dengue virus (in this case DENV-3) in serum and blood, and may, in the near future, assist in the more detailed diagnosis and correct treatment of patients.
DENV-3 in serum | |||
---|---|---|---|
Stage performance features | PCA-LDA | SPA-LDA | GA-LDA |
1 × 10³ PFU mL−1 | |||
Sensitivity (%) | 100.0 | 100.0 | 50.0 |
Specificity (%) | 100.0 | 100.0 | 100.0 |
Positive predictive values (PPV) | 100.0 | 100.0 | 100.0 |
Negative predictive values (NPV) | 100.0 | 100.0 | 66.6 |
Youden index (YOU) | 100.0 | 100.0 | 50.0 |
Positive likelihood ratios (LR+) | 0.0 | 0.0 | 0.0 |
Negative likelihood ratios (LR−) | 0.0 | 0.0 | 0.5 |
1 × 10² PFU mL−1 | |||
Sensitivity (%) | 100.0 | 100.0 | 100.0 |
Specificity (%) | 100.0 | 100.0 | 50.0 |
Positive predictive values (PPV) | 100.0 | 100.0 | 66.6 |
Negative predictive values (NPV) | 100.0 | 100.0 | 100.0 |
Youden index (YOU) | 100.0 | 100.0 | 50.0 |
Positive likelihood ratios (LR+) | 0.0 | 0.0 | 2.0 |
Negative likelihood ratios (LR−) | 0.0 | 0.0 | 0.0 |
1 PFU mL−1 | |||
Sensitivity (%) | 0.0 | 50.0 | 50.0 |
Specificity (%) | 0.0 | 50.0 | 100.0 |
Positive predictive values (PPV) | 0.0 | 50.0 | 100.0 |
Negative predictive values (NPV) | 0.05 | 0.0 | 66.6 |
Youden index (YOU) | 100.0 | 0.0 | 50.0 |
Positive likelihood ratios (LR+) | 0.0 | 1.0 | 0.0 |
Negative likelihood ratios (LR−) | 0.0 | 1.0 | 0.5 |
0.1 PFU mL−1 | |||
Sensitivity (%) | 50.0 | 100.0 | 0.0 |
Specificity (%) | 50.0 | 100.0 | 100.0 |
Positive predictive values (PPV) | 50.0 | 100.0 | 0.0 |
Negative predictive values (NPV) | 50.0 | 100.0 | 50.0 |
Youden index (YOU) | 0.0 | 100.0 | 0.0 |
Positive likelihood ratios (LR+) | 1.0 | 0.0 | 0.0 |
Negative likelihood ratios (LR−) | 1.0 | 0.0 | 1.0 |
DENV-3 in blood | |||
---|---|---|---|
Stage performance features | PCA-LDA | SPA-LDA | GA-LDA |
1 × 10³ PFU mL−1 | |||
Sensitivity (%) | 100.0 | 100.0 | 100.0 |
Specificity (%) | 100.0 | 100.0 | 100.0 |
Positive predictive values (PPV) | 100.0 | 100.0 | 100.0 |
Negative predictive values (NPV) | 100.0 | 100.0 | 100.0 |
Youden index (YOU) | 100.0 | 100.0 | 100.0 |
Positive likelihood ratios (LR+) | 0.0 | 0.0 | 0.0 |
Negative likelihood ratios (LR−) | 0.0 | 0.0 | 0.0 |
1 × 10² PFU mL−1 | |||
Sensitivity (%) | 100.0 | 100.0 | 100.0 |
Specificity (%) | 100.0 | 100.0 | 100.0 |
Positive predictive values (PPV) | 100.0 | 100.0 | 100.0 |
Negative predictive values (NPV) | 100.0 | 100.0 | 100.0 |
Youden index (YOU) | 100.0 | 100.0 | 100.0 |
Positive likelihood ratios (LR+) | 0.0 | 0.0 | 0.0 |
Negative likelihood ratios (LR−) | 0.0 | 0.0 | 0.0 |
10 PFU mL−1 | |||
Sensitivity (%) | 100.0 | 100.0 | 100.0 |
Specificity (%) | 100.0 | 100.0 | 100.0 |
Positive predictive values (PPV) | 100.0 | 100.0 | 100.0 |
Negative predictive values (NPV) | 100.0 | 100.0 | 100.0 |
Youden index (YOU) | 100.0 | 100.0 | 100.0 |
Positive likelihood ratios (LR+) | 0.0 | 0.0 | 0.0 |
Negative likelihood ratios (LR−) | 0.0 | 0.0 | 0.0 |
0.1 PFU mL−1 | |||
Sensitivity (%) | 100.0 | 100.0 | 100.0 |
Specificity (%) | 100.0 | 100.0 | 100.0 |
Positive predictive values (PPV) | 100.0 | 100.0 | 100.0 |
Negative predictive values (NPV) | 100.0 | 100.0 | 100.0 |
Youden index (YOU) | 100.0 | 100.0 | 100.0 |
Positive likelihood ratios (LR+) | 0.0 | 0.0 | 0.0 |
Negative likelihood ratios (LR−) | 0.0 | 0.0 | 0.0 |
This journal is © The Royal Society of Chemistry 2017 |