ATR-FTIR and multivariate analysis as a screening tool for cervical cancer in women from northeast Brazil: a biospectroscopic approach

Ana C. O. Neves^a, Priscila P. Silva^a, Camilo L. M. Morais^a, Cleine G. Miranda^b, Janaina C. O. Crispim^b and Kássio M. G. Lima*^a
^a Institute of Chemistry, Biological Chemistry and Chemometrics, Federal University of Rio Grande do Norte, Natal 59072-970, RN, Brazil. E-mail: kassiolima@gmail.com; Tel: +55 84 3342 2323
^b Healthy Sciences Center, Federal University of Rio Grande do Norte, Natal 59010-180, RN, Brazil

Received 25th August 2016, Accepted 7th October 2016

First published on 14th October 2016


Abstract

Cervical cancer is the fourth most frequent cancer in women worldwide and the third in Brazil. Screening methods can substantially reduce new cases of cervical cancer by identifying pre-cancerous lesions, making it possible to offer correct management and treatment. For this purpose, this work reports the use of attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy coupled with principal component analysis (PCA) and variable selection techniques, namely the successive projections algorithm (SPA) and genetic algorithm (GA), associated with linear or quadratic discriminant analysis (LDA/QDA), to classify samples as negative for intraepithelial lesion or malignancy (NILM), n = 43, or squamous intraepithelial lesion (SIL), n = 40, directly from blood plasma. Furthermore, the possibility of categorizing SIL subclasses as low-grade squamous intraepithelial lesion (LSIL) or high-grade squamous intraepithelial lesion (HSIL) was evaluated. Application of variable selection algorithms, especially GA, considerably improved the classifications by choosing spectral variables that reflect the chemical differences between a healthy and a pre-cancerous plasma sample. This method was able to classify NILM vs. SIL with sensitivity and specificity of around 77% for both classes using LDA. With QDA, the results improved to a sensitivity of around 90% and a specificity of 83%. NILM vs. LSIL presented sensitivity and specificity ranging between 67–94% and 82–94%, respectively. In addition, NILM vs. HSIL presented sensitivity and specificity of 76–97% and 73–100%, respectively, with QDA providing substantially better classifications. These findings highlight the potential of ATR-FTIR spectroscopy combined with multivariate analysis as a screening tool for pre-cancerous cervical lesions, which could contribute to reducing cervical cancer incidence.


Introduction

Cervical cancer is the fourth most frequent cancer in women, with a global estimate of 528 000 newly diagnosed cases and 266 000 deaths in 2012. This disease is considerably more common in less developed regions, where it is responsible for 12% of all female cancers, accounting for almost 85% of all the cervical cancer statistics worldwide.1 The Brazilian National Cancer Institute (INCA) expects 16 340 new cases of cervical cancer in 2016, making it the third most frequent cancer type in Brazilian women.2 It is now well known that the human papillomavirus (HPV) plays a very important role in the development of cervical cancer.

HPV is a small non-enveloped virus, a member of the family papilloma viruses, with a circular double-stranded DNA genome, which infects the epithelia of skin and mucosa. More than 180 HPV types have been identified, and can be separated into high-risk, intermediate-risk and low-risk, according to their potential to induce cancer in infected tissues. The high-risk HPVs most related to cervical carcinogenicity are HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59 and 66, where HPV 16 and 18 are the most prevalent types and responsible for more than 70% of all cases of invasive cervical cancer.3–7 Although HPV infection is the most frequent sexually transmitted disease worldwide, approximately 90% of all infected women are able to clear the virus within 2 years after infection by natural action of their immune system. If the immune system does not properly fight against the virus, the infection can develop to cause cervical intraepithelial neoplasia (CIN 1, CIN 2, CIN 3, according to the severity of the lesions), which is initially an asymptomatic condition that can either spontaneously regress to normal without any treatment, or can progress to invasive cervical cancer in 5–20 years.8,9 These pre-cancerous lesions have different rates of progression to invasive cancer, where CIN 1 has a low rate, and CIN 3 has a high rate if left untreated.10,11 In the Bethesda system of classification of cervical cytology, CIN 1 is classified as low-grade squamous intraepithelial lesion (LSIL), and CIN 2 and CIN 3 are grouped together as high-grade squamous intraepithelial lesions (HSIL).12–16 In this context, it is essential to identify the occurrence/recurrence of these cervical lesions early to guarantee correct treatment and to avoid the risk of developing invasive cancer in the following years.

Screening methods commonly used today include tests for HPV, cytology tests to detect cervical lesions (Pap smear) and unaided visual inspection with acetic acid (VIA), with the Pap smear being the most widely employed in developing countries.17–19 Implementation of the Pap smear as a screening method worldwide has substantially decreased the morbidity and mortality from squamous carcinoma of the cervix. In the UK, the screening program using the Pap smear has considerably reduced the incidence of cervical cancer, which is now the eleventh most common female cancer in this region.20,21 However, given the human subjectivity of this method, in which sampling, sample handling and interpretation depend on the cytologist, the sensitivity (the percentage of truly positive cases that are detected as positive) and specificity (the percentage of truly negative cases that are detected as negative) of the Pap smear are 51% (30–87%) and 98% (86–100%), respectively. Sensitivity is particularly affected by inter-observer variability, and this lack of accuracy can lead to high false-negative rates and thus to failures in preventing cervical cancer, mainly in women who do not follow the correct periodicity of the screening programs.7,20 Furthermore, issues such as poorly developed healthcare services, cultural and religious factors, and limited resources and information can act as barriers to implementing the Pap smear as a screening method, especially in developing or rural regions where these issues are still a strong reality.21

Infrared spectroscopy is a vibrational technique that has the capacity to analyze biological systems, since complex molecules such as proteins, lipids, carbohydrates and nucleic acids exhibit distinct vibrational behaviors according to their molecular structure and conformation.22 ATR-FTIR spectroscopy is a powerful alternative to be employed in resolving biological issues, considering its ability to reflect on the composition and variability of samples, and especially in the region of the “bio-fingerprint” (1800–900 cm−1), where many important biomolecules have individual absorbing frequencies, thereby allowing scientists to search for biomarkers and metabolic profiles.23 Remarkably, ATR-FTIR is a fast, non-destructive and clean method, making it possible to analyze a considerable number of samples in a day and to reuse them after spectra acquisition, thus avoiding the necessity of many reagents and sample handling steps, and promoting a reduction of waste generation and making the experiment more simple and cost-effective.20,22 ATR-FTIR has been attracting great attention in cancer research as a powerful tool, leading to relevant publications over the last few years.24 Theophilou and co-workers have used this technique to analyze ovarian tissues and to discriminate them between normal, borderline or malignant,22 while Lima and co-workers have applied ATR-FTIR to classify blood plasma or serum samples according to their ovarian cancer stage.23 Moreover, Purandare and co-workers have showed the capability of vibrational spectroscopy to segregate low-grade cervical cytology-based samples considering their potential to regress, remain static or progress.25 Also in this field, Lima and co-workers have successfully classified cervical cytology specimens between high-risk or low-risk HPV infection.26

ATR-FTIR is a remarkable tool for studying chemical species because it provides a large amount of relevant information. However, when biological samples are considered, the technique by itself may not provide enough specificity in the search for biomarkers, since many biomolecules contribute to the overall signal, leading to a large amount of complex data. Multivariate analysis has proven effective in overcoming this drawback, allowing the successful use of ATR-FTIR for biological purposes. This is especially evident in the possibility of extracting essential information related to biomarkers, which reflects the particularity of each chemical system. In this context, SPA and GA make it possible to select the most significant variables from complex spectral data, which can be associated with biomarkers. For classification, these algorithms are commonly combined with LDA, so that samples are separated into groups based on their spectral similarities and the resulting classification model is used to predict unknown samples.26

The 21st century has been characterized by the search for alternative tools in several medical fields, and in this context the combination of inexpensive spectroscopic techniques and computational treatments emerges as a very promising strategy for screening cervical cancer. In this paper, we report our findings in the application of ATR-FTIR spectroscopy and multivariate analysis to differentiate NILM and SIL classes directly from blood plasma. Furthermore, we investigated the ability of this method to separate cervical squamous intraepithelial lesions into low-grade and high-grade lesions (LSIL and HSIL), respectively. Chemometric approaches were based on the use of PCA, SPA and GA algorithms associated to linear and quadratic discriminant analysis (LDA and QDA, respectively). To the best of our knowledge, this is the first work involving screening cervical pre-cancer stages in Brazilian women using ATR-FTIR and chemometrics. In addition, PCA-QDA, SPA-QDA and GA-QDA have never been reported in literature for this purpose. Considering the high incidence of cervical cancer all over the world and the relevance of its early detection, this fast, simple and inexpensive methodology may substantially contribute to cancer prevention, especially in developing countries.

Methods

Collection and preparation of specimens

This study involved women living in the state of Rio Grande do Norte/Brazil, attending the Maternidade Escola Januário Cicco (MEJC) of the Public Health System for cervical pathology screening consultations and reference services for colposcopy, from July 2014 to January 2016. All experiments were performed in compliance with the guideline “Biomedical research ethics review method involving people” (Brazil), and approved by the medical ethics committee at Hospital Universitário Onofre Lopes (HUOL), Brazil (protocol # 526/11), Federal University of Rio Grande do Norte. Informed consents were obtained from human participants of this study.

Fasting blood samples (collected in tubes containing the anticoagulant EDTA) were taken from each patient prior to cytology smears or large loop excision surgery of the transformation zone (LLETZ), accounting for a total of 83 blood samples. In this study, atypical squamous cells (ASC) of undetermined significance (ASC-US and ASC-H) were excluded. Within two hours after blood collection by venipuncture, blood plasma was separated by density gradient, and aliquots were transferred into cryogenic tubes and stored at −80 °C until analysis. Before analysis, cytological samples (Pap smear) were obtained from women assigned to either the NILM or SIL group. For women undergoing LLETZ surgery, histopathological analysis was performed on 4 μm thick sections from paraffin blocks stained with hematoxylin/eosin. Cytology and histopathology are reported according to the Bethesda system:27 43 patients (NILM), 16 patients (LSIL) and 24 patients (HSIL).

ATR-FTIR spectroscopy

ATR-FTIR spectra [n = 830, 83 samples (NILM (n = 43), LSIL (n = 16), HSIL (n = 24))] were collected on a Bruker VERTEX 70 FTIR spectrometer (Bruker Optics Ltd., Coventry, UK) with a Helios ATR attachment containing a diamond crystal internal reflective element, using a 45° incidence angle of the IR beam. The instrument was set up to perform a total of 16 scans with 4 cm−1 spectral resolution on both background and sample. Frozen plasma samples were thawed at room temperature for 30–40 min, and 10 μL of each sample was transferred onto each of 10 different IR-reflective glass slides (Low-E; Kevley Technologies), so that 100 μL of each plasma was used. The deposited plasma was air-dried for approximately 30 min until homogeneous dried films formed,23,28 and all samples were then immediately submitted to ATR-FTIR spectra acquisition. The ATR crystal was washed with 70% v/v alcohol and a background was collected before each sample spectrum acquisition.

Data analysis

MATLAB® R2010a software (MathWorks Inc., Natick, MA, USA) was used for data import, pre-treatment and construction of multivariate classification models (PCA-LDA, SPA-LDA, GA-LDA, PCA-QDA, SPA-QDA and GA-QDA). ATR-FTIR spectra were pre-processed by cutting the region of interest between 1800 and 900 cm−1 (450 wavenumbers), followed by baseline correction and normalization to the amide I peak (i.e., ≈1650 cm−1). Samples were divided into training (70%), validation (15%) and prediction (15%) sets for all classification models by applying the Kennard–Stone (KS) algorithm to the IR spectra.29 Training samples were used for model construction and optimization (variable selection by the SPA and GA algorithms), while the prediction set was used only to evaluate the classification models built with the LDA and QDA discrimination approaches. The optimal number of variables for SPA-LDA/QDA and GA-LDA/QDA was determined from the average risk G of LDA/QDA misclassification. This cost function is calculated in the validation set as:
 
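$$G = \frac{1}{N_V} \sum_{n=1}^{N_V} g_n \qquad (1)$$
where $N_V$ is the number of validation objects and $g_n$ is defined as

$$g_n = \frac{r^2(x_n, m_{I(n)})}{\min_{I(m) \neq I(n)} r^2(x_n, m_{I(m)})} \qquad (2)$$
where $I(n)$ is the index of the true class for the $n$th validation object $x_n$; $r^2(x_n, m_{I(n)})$ is the squared Mahalanobis distance between object $x_n$ (of class index $I(n)$) and the sample mean $m_{I(n)}$ of this true class; and the denominator, the minimum of $r^2(x_n, m_{I(m)})$ over the wrong classes, is the squared Mahalanobis distance between object $x_n$ and the center of the closest wrong class.26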
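
As an illustration of eqn (1) and (2), the average risk can be sketched in a few lines of Python. This is a minimal sketch rather than the MATLAB routine used in this work: it assumes that the risk of each validation object is the ratio of the two Mahalanobis distances defined above, and the function names, the toy class means, the identity inverse covariance matrix and the three validation objects are all hypothetical values chosen for illustration.

```python
def sq_mahalanobis(x, mean, inv_cov):
    """Squared Mahalanobis distance (x - mean)^T inv_cov (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d[i] * inv_cov[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))


def average_risk(samples, labels, class_means, inv_cov):
    """Average risk G over a validation set, following eqn (1) and (2)."""
    risks = []
    for x, true_class in zip(samples, labels):
        own = sq_mahalanobis(x, class_means[true_class], inv_cov)
        # Distance to the closest wrong class (assumed non-zero here).
        closest_wrong = min(
            sq_mahalanobis(x, mean, inv_cov)
            for name, mean in class_means.items() if name != true_class
        )
        risks.append(own / closest_wrong)
    return sum(risks) / len(risks)


# Hypothetical two-variable toy data (not taken from the study).
class_means = {"NILM": [0.0, 0.0], "SIL": [2.0, 0.0]}
inv_cov = [[1.0, 0.0], [0.0, 1.0]]  # assumption: identity inverse covariance
samples = [[0.0, 1.0], [2.0, 1.0], [1.0, 0.0]]
labels = ["NILM", "SIL", "NILM"]

G = average_risk(samples, labels, class_means, inv_cov)
print(round(G, 4))
```

A small G indicates that validation objects lie much closer to their own class mean than to any wrong class; an object halfway between two classes, such as the third toy object, contributes a risk of 1.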

To obtain a discriminant profile, the LDA classification score ($L_{ik}$) is calculated for a given class $k$ by the following equation:

$$L_{ik} = (x_i - \bar{x}_k)^{T} \Sigma_{\text{pooled}}^{-1} (x_i - \bar{x}_k) - 2 \log_e \pi_k \qquad (3)$$
where $x_i$ is an unknown measurement vector for sample $i$; $\bar{x}_k$ is the mean measurement vector of class $k$; $\Sigma_{\text{pooled}}$ is the pooled covariance matrix; and $\pi_k$ is the prior probability of class $k$.30

On the other hand, the QDA classification score ($Q_{ik}$) is estimated using the variance–covariance matrix of each class $k$ and an additional natural logarithm term, as follows:

$$Q_{ik} = (x_i - \bar{x}_k)^{T} \Sigma_k^{-1} (x_i - \bar{x}_k) + \log_e |\Sigma_k| - 2 \log_e \pi_k \qquad (4)$$
where $\Sigma_k$ is the variance–covariance matrix of class $k$; and $\log_e |\Sigma_k|$ is the natural logarithm of the determinant of the variance–covariance matrix $\Sigma_k$.
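
The practical difference between the two scores can be seen in a small numerical sketch. The Python code below is not the implementation used in this work; it assumes the usual Mahalanobis-type form of the LDA and QDA scores (a quadratic form plus the logarithmic terms described for eqn (3) and (4)), with a sample assigned to the class with the lowest score. The two-variable class means, covariance matrices and priors are hypothetical.

```python
import math


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def quad_form(x, mean, cov):
    """(x - mean)^T cov^{-1} (x - mean) for two variables."""
    (a, b), (c, d) = cov
    det = det2(cov)
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mean[0], x[1] - mean[1]]
    return sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))


def lda_score(x, mean, pooled_cov, prior):
    """LDA score: one pooled covariance matrix shared by all classes."""
    return quad_form(x, mean, pooled_cov) - 2.0 * math.log(prior)


def qda_score(x, mean, class_cov, prior):
    """QDA score: class-specific covariance plus its log-determinant."""
    return (quad_form(x, mean, class_cov)
            + math.log(det2(class_cov))
            - 2.0 * math.log(prior))


# Hypothetical classes: SIL is assumed to be four times more dispersed.
means = {"NILM": [0.0, 0.0], "SIL": [3.0, 0.0]}
class_covs = {"NILM": [[1.0, 0.0], [0.0, 1.0]],
              "SIL": [[4.0, 0.0], [0.0, 4.0]]}
pooled_cov = [[2.5, 0.0], [0.0, 2.5]]  # average of the two, equal class sizes
priors = {"NILM": 0.5, "SIL": 0.5}

x = [-4.0, 0.0]  # hypothetical unknown sample
lda_scores = {k: lda_score(x, means[k], pooled_cov, priors[k]) for k in means}
qda_scores = {k: qda_score(x, means[k], class_covs[k], priors[k]) for k in means}
lda_label = min(lda_scores, key=lda_scores.get)
qda_label = min(qda_scores, key=qda_scores.get)
print(lda_label, qda_label)
```

In this toy case the two rules disagree: LDA assigns the sample to the nearer mean, whereas QDA accounts for the larger dispersion of the second class and assigns the sample to it.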

Additionally, the main differences between these discrimination methods are that QDA forms a separated variance model for each class and does not assume classes having similar variance–covariance matrices; whereas LDA does not take into account different variance structures in each class, assuming that the analyzed classes have similar variance–covariance matrices.31 The GA-LDA/QDA calculations were performed during 40 generations with 80 chromosomes each. One-point crossover and mutation probabilities were set to 60% and 10%, respectively. Moreover, the algorithm was repeated three times, starting from different random initial populations. The best solution (in terms of the fitness value) resulting from the three GA repetitions was employed.
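
The GA settings listed above can be pictured with the following Python sketch, which uses only the standard library. It is a simplified illustration and not the GA-LDA/QDA routine used in this work: the fitness function is a hypothetical toy (the actual fitness depends on the cost function G), binary tournament selection is assumed for parent choice, the 10% mutation probability is assumed to apply per chromosome, and the number of variables and the set of "informative" variables are invented for the example.

```python
import random

N_VARS = 20                 # hypothetical number of spectral variables
INFORMATIVE = {3, 7, 12}    # hypothetical variables rewarded by the toy fitness


def toy_fitness(mask):
    """Toy fitness: reward informative variables, penalize model size."""
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


def one_point_crossover(a, b, rng):
    """Swap the tails of two chromosomes at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(mask, rng):
    """Flip one randomly chosen gene."""
    i = rng.randrange(len(mask))
    flipped = mask[:]
    flipped[i] = 1 - flipped[i]
    return flipped


def tournament(pop, rng):
    """Binary tournament selection (an assumption of this sketch)."""
    a, b = rng.sample(pop, 2)
    return a if toy_fitness(a) >= toy_fitness(b) else b


def run_ga(seed, generations=40, pop_size=80, p_cross=0.6, p_mut=0.1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_VARS)] for _ in range(pop_size)]
    best = max(pop, key=toy_fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            if rng.random() < p_cross:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                if rng.random() < p_mut:
                    child = mutate(child, rng)
                children.append(child)
        pop = children[:pop_size]
        best = max(pop + [best], key=toy_fitness)  # keep the best seen so far
    return best


# Three repetitions from different random initial populations; keep the best.
best_mask = max((run_ga(seed) for seed in (0, 1, 2)), key=toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print(selected, round(toy_fitness(best_mask), 2))
```

Each chromosome is a binary mask over the variables, so the selected wavenumbers are simply the positions set to 1 in the best mask.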

The classification models were built for ATR-FTIR spectra pooled into three different cases:

(1) NILM (430 spectra) vs. SIL (LSIL and HSIL) (400 spectra);

(2) NILM (220 spectra) vs. low-grade lesions (LSIL) (160 spectra);

(3) NILM (220 spectra) vs. high-grade lesions (HSIL) (240 spectra);

Calculations of sensitivity (probability that a test result will be positive when disease is present) and specificity (probability that a test result will be negative when disease is not present) were performed for this study as important quality measures of model accuracy. Both parameters have a maximum value of 1 and a minimum of 0, and can be obtained by using the following equations:

 
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (5)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (6)$$
where FN is defined as a false negative and FP as a false positive. TP and TN are defined as true positive and true negative, respectively.26
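
For illustration, both figures of merit can be computed from predicted and reference labels as in the Python sketch below. It assumes the standard definitions TP/(TP + FN) and TN/(TN + FP); the helper name and the ten labels are hypothetical and are not data from this study.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FN, TN and FP for one class treated as 'positive'."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Fraction of truly positive samples predicted as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative samples predicted as negative."""
    return tn / (tn + fp)


# Hypothetical prediction set of ten spectra.
y_true = ["SIL"] * 5 + ["NILM"] * 5
y_pred = ["SIL", "SIL", "SIL", "SIL", "NILM",
          "NILM", "NILM", "NILM", "SIL", "NILM"]

tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive="SIL")
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)
```

Multiplying the two values by 100 gives the percentages reported for the prediction samples.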

Results and discussion

Blood plasma ATR-FTIR mean spectra of NILM, LSIL and HSIL categories are shown in Fig. 1. A total of n = 83 specimens were collected generating 830 spectra to be analyzed, where the NILM class accounted for 43 samples, and LSIL and HSIL pre-cancerous lesions represented 16 and 24 samples, respectively. In the region of interest between 900 and 1800 cm−1, called the “bio-fingerprint region”, some characteristic IR absorption bands can be observed in the spectra, such as the major peaks at ≈1650 cm−1 (amide I) and 1550 cm−1 (amide II) of aminoacids and proteins, as well as methylene groups of lipids at ≈1400–1470 cm−1. Other important bands (although less intense) include the asymmetric and symmetric phosphate stretching vibrations at ≈1225 cm−1 and 1080 cm−1, respectively, and also peaks at ≈1155 cm−1 corresponding to C–OH and C–O groups present in some aminoacids (such as serine, tyrosine, threonine) and carbohydrates, and the smooth band at ≈1030 cm−1 related to glycogen.23
Fig. 1 ATR-FTIR mean spectra of NILM (blue), LSIL (red) and HSIL (green) samples, in the region of 900–1800 cm−1.

The spectra show strongly similar absorption bands and are highly overlapped, so that it is difficult to categorize samples from the raw spectral information alone. Multivariate algorithms are therefore an essential strategy to extract the important spectral information, enabling the discrimination of samples between NILM and SIL classes based on their pathophysiological condition as reflected in the spectral bands. Furthermore, variable selection algorithms such as SPA and GA are powerful tools in the search for biomarkers in blood plasma, allowing less complex models to be obtained. In this study, all spectra were pre-processed by applying normalization (amide I band) and baseline correction, and the classification models (PCA-LDA/QDA, SPA-LDA/QDA and GA-LDA/QDA) were built using both the processed and the raw data in order to compare results. In general, sensitivity and specificity values were higher when classification was performed using the raw data, and the best results are presented in the following discussion. Considering the importance of screening methods for reducing new cases of cervical cancer, the main objective of this study was to apply chemometric tools to extract the biochemical information of samples representing women with or without cervical lesions, making it possible to separate samples into the two classes of NILM and SIL. Additionally, more specific models were investigated in an attempt to show the potential of the proposed classification method, taking into account the LSIL and HSIL subgroups of the cervical lesion (SIL) class. This approach could be of great interest in clinical routine, since medical conduct is entirely different for a patient with a low-grade lesion and a patient with a high-grade lesion. Therefore, the whole NILM dataset (430 spectra) was divided approximately in half for this purpose, and a NILM dataset of only 220 spectra (22 samples) was used for these models in order to have a data size similar to that of the LSIL and HSIL datasets. In all cases, LDA and QDA models were compared by analyzing the sensitivity and specificity values obtained for both linear and quadratic models.

NILM vs. SIL

PCA-LDA/QDA, SPA-LDA/QDA and GA-LDA/QDA were used to classify NILM vs. SIL. In this case, models using the non-processed dataset achieved better results for all algorithms, and especially for GA-LDA/QDA. Using 6 component scores (which account for >90% of the explained variance of the whole data), PCA-LDA produced a sensitivity and specificity of 37% and 38%, respectively, for the NILM class, while the sensitivity and specificity for the SIL class were 80% and 75%, respectively. Application of QDA improved the sensitivity for NILM to 74%; however, the specificity (26%) was lower than with LDA, and QDA also gave inferior classification rates for the SIL group (see Table 1). SPA-LDA and SPA-QDA were applied to the dataset in order to obtain a classification model using a considerably reduced number of variables, chosen by the minimum of the cost function G. Using only three selected wavenumbers (1404, 1508 and 1637 cm−1), Fisher scores were obtained and the classification indexes (sensitivity and specificity, shown in Table 1) were very similar to those of PCA-LDA/QDA, both for NILM and SIL classes. Another variable selection strategy was the application of GA-LDA and GA-QDA to build classification models, considering only the variables most related to the chemical information present in the dataset.
Table 1 Results (sensitivity and specificity) of prediction samples for the models PCA-LDA/QDA, SPA-LDA/QDA and GA-LDA/QDA: NILM vs. SIL; NILM vs. LSIL; NILM vs. HSIL. Each entry is given as NILM/lesion class

Comparison      Model   LDA sensitivity (%)   LDA specificity (%)   QDA sensitivity (%)   QDA specificity (%)
NILM vs. SIL    PCA     37/80                 38/75                 74/52                 26/52
NILM vs. SIL    SPA     40/78                 40/78                 61/42                 37/58
NILM vs. SIL    GA      77/78                 75/78                 89/83                 90/82
NILM vs. LSIL   PCA     60/75                 60/75                 79/37                 21/62
NILM vs. LSIL   SPA     63/71                 60/71                 76/47                 24/58
NILM vs. LSIL   GA      76/83                 82/87                 94/67                 94/83
NILM vs. HSIL   PCA     54/97                 54/97                 67/44                 76/42
NILM vs. HSIL   SPA     45/94                 45/94                 48/28                 51/72
NILM vs. HSIL   GA      76/94                 73/86                 88/97                 91/100


It is possible to observe from Table 1 that the GA-LDA model using the 68 selected wavenumbers from a whole 450 spectral variables improved classification rates for prediction samples when compared to PCA-LDA and SPA-LDA results. The GA-LDA model presented sensitivity of 77 and 78% for NILM and SIL, respectively, and also maintained very similar specificity results for both classes (75 and 78% for NILM and SIL classes, respectively). Using quadratic discriminant analysis associated to GA algorithm provided even better classification models, according to Table 1.

The wavenumbers selected by GA are highlighted in Fig. 2A. The GA-QDA model presented sensitivity and specificity values of 89 and 90%, respectively, for the NILM class, and 83 and 82%, respectively, for the SIL class, maintaining agreement between the classification indexes for both classes. Sample separation into the two categories is shown for GA-LDA and GA-QDA in Fig. 2B and C, respectively. Two clusters can be visualized, and samples are more correctly grouped into their own classes with the GA-QDA model. GA-LDA/QDA selected particularly interesting wavenumbers (Fig. 2A), namely the variables at 1747 and 1724 cm−1, associated with C=O stretching vibrations of lipids and aldehydes, respectively. The major peaks at 1639 cm−1 (amide I), from the C=O stretching vibration of the amide group coupled to N–H bending and C–N stretching, and at 1539 cm−1 (amide II), from C–N stretching and N–H deformation, were also selected. Finally, methyl and methylene groups of lipids and proteins at 1400 and 1454 cm−1, respectively, asymmetric and symmetric stretching vibrations of phosphate at 1219 and 1080 cm−1, respectively, and C–O groups of carbohydrates at 1155 cm−1 were also observed.


Fig. 2 Segregated categories based on the presence or absence of intraepithelial lesion of cervix, and selected wavenumbers by GA-LDA/QDA models. (A) 68 selected wavenumbers by GA-LDA/QDA models for NILM vs. SIL. (B) DF1 × specimens calculated by using the variables selected by GA-LDA from plasma samples segregated into NILM vs. SIL. (C) DF1 × specimens calculated by using the variables selected by GA-QDA from plasma samples segregated into NILM vs. SIL. (D) 41 selected wavenumbers by GA-LDA/QDA models for NILM vs. LSIL. (E) DF1 × specimens calculated by using the variables selected by GA-LDA from plasma samples segregated into NILM vs. LSIL. (F) DF1 × specimens calculated by using the variables selected by GA-QDA from plasma samples segregated into NILM vs. LSIL. (G) 45 selected wavenumbers by GA-LDA/QDA models for NILM vs. HSIL. (H) DF1 × specimens calculated by using the variables selected by GA-LDA from plasma samples segregated into NILM vs. HSIL. (I) DF1 × specimens calculated by using the variables selected by GA-QDA from plasma samples segregated into NILM vs. HSIL.

NILM vs. LSIL (low-grade)

In this case, the classification between NILM and LSIL samples was investigated. As can be seen for the raw dataset in Table 1, PCA-LDA with 6 components presented sensitivity and specificity of 60% for NILM, while these values were higher (75%) for LSIL. When QDA was applied, the sensitivity for NILM increased to 79%; however, a loss in specificity was observed (21%), and the LSIL classification was also impaired. Four variables were selected for SPA-LDA (1459, 1531, 1583 and 1641 cm−1), and the sensitivity and specificity values for NILM and LSIL were very similar to those for PCA-LDA (around 60 and 70%, respectively). QDA did not give better results. According to Table 1, GA-LDA and GA-QDA led to the best classification for this case using 41 selected wavenumbers, as shown in Fig. 2D. Using the GA-LDA model, sensitivity and specificity were 76 and 82%, respectively, for NILM, and 83 and 87% for LSIL. For the NILM class, GA-QDA gave better results, with sensitivity and specificity of 94%. Separation of samples with the GA-LDA/QDA models can be viewed in Fig. 2E and F, where both models show well-defined clusters corresponding to NILM and LSIL samples.

In this case, GA-LDA/QDA selected some interesting variables (see Fig. 2D), namely the wavenumbers at 1724 and 1461 cm−1 associated with C=O stretching vibrations of aldehydes and methylene lipid groups; amide III from proteins at 1334 cm−1; asymmetric and symmetric stretching vibrations of phosphate at 1221 and 1089 cm−1; and out-of-plane C–H bending at 960 cm−1. It is worth mentioning that some of these variables coincide with those selected for the NILM vs. SIL classification, as described above.

NILM vs. HSIL (high-grade)

Lastly, the classification of NILM vs. HSIL was evaluated. Table 1 shows that both PCA-LDA and SPA-LDA gave poor sensitivity and specificity for NILM, while these values were above 90% for HSIL prediction samples. However, calibration and validation sets were found to be poorly classified. PCA-LDA used 6 components and SPA-LDA selected 3 variables: 1483, 1535 and 1641 cm−1. A slight improvement in results was achieved for the NILM class with the PCA-QDA model (see Table 1). As observed in the previous cases, the best classification results were obtained with the GA-LDA/QDA models, using 45 selected wavenumbers (see Fig. 2G). With the GA-LDA model, sensitivity and specificity for NILM were 76 and 73%, respectively, while for HSIL these parameters had higher values of 94 and 86% (Table 1). In this particular case, application of QDA considerably improved results, leading to more accurate classifications for both classes. With the GA-QDA model, sensitivity and specificity values for NILM were 88 and 91%, respectively, while HSIL had 97% sensitivity and 100% specificity. Fig. 2H and I show the classification of samples. A clear separation may be observed, with two very well-defined clusters representing the analyzed classes.

In this case, GA-LDA/QDA selected some interesting variables (see Fig. 2G), namely: variables at 1758 and 1729 cm−1, associated with C=O stretching vibrations of lipids and aldehydes, respectively; the major peak at 1639 cm−1 (amide I), from the C=O stretching vibration of the amide group coupled to N–H bending and C–N stretching; the right-hand side of the amide II band at 1531 cm−1; methylene lipid groups at 1467 cm−1; amide III from proteins at 1342 cm−1; out-of-plane C–H bending at 968 cm−1; and the variables at 1043 and 1063 cm−1, representing the glycogen band due to OH stretching coupled with bending and the CO–O–C symmetric stretching of phospholipids and cholesterol esters, respectively.23

Conclusions

Results obtained in this study present the potentiality of ATR-FTIR spectroscopy associated with multivariate classification models (PCA-LDA/QDA, SPA-LDA/QDA and GA-LDA/QDA) as an alternative approach for cervical cancer screening. This method was able to correctly classify specimens between NILM and SIL (LSIL and HSIL) directly in blood plasma with sensitivity and specificity varying between 80 and 100% using a fast, clean, minimally invasive and cost-effective methodology. Application of variable selection algorithms (especially GA) considerably improved the classifications by choosing spectral variables that reflect the chemical differences between a healthy and pre-cancerous plasma sample. In addition, this study demonstrated utilization of quadratic discriminant analysis (QDA) and compared its results to those provided by LDA. These findings are very encouraging to be tested with much larger numbers of samples in order to robustly validate this method and better associate spectral variables to biomarkers of cervical intraepithelial lesions. In this sense, this biospectroscopic approach could contribute to traditional screening methods for early detection of pre-cancerous lesions of cervix, and then reduce the number of new cases of this invasive disease worldwide.

Acknowledgements

A. C. O. Neves and C. L. M. Morais would like to acknowledge the financial support from PPGQ/UFRN/CAPES for the research grant. The authors would like to acknowledge the financial support from the Brazilian National Council for Scientific and Technological Development (CNPq). K. M. G. Lima acknowledges the CNPq Grant (305962/2014-0) for financial support. This work was funded by a grant from CNPq/Capes project (Grant 070/2012).

This journal is © The Royal Society of Chemistry 2016