ATR-FTIR spectroscopy coupled with chemometric analysis discriminates normal, borderline and malignant ovarian tissue: classifying subtypes of human cancer

Surgical management of ovarian tumours largely depends on their histopathological diagnosis. Currently, screening for ovarian malignancy with tumour markers in conjunction with radiological investigations has a low specificity for discriminating benign from malignant tumours. Also, pre-operative biopsy of ovarian masses increases the risk of intra-peritoneal dissemination of malignancy. Intra-operative frozen section, although sufficiently accurate in differentiating tumours according to their histological type, increases operation times. This results in increased surgery-related risks to the patient and additional burden to resource allocation. We set out to determine whether attenuated total reflection Fourier-transform infra-red (ATR-FTIR) spectroscopy, combined with chemometric analysis, can be applied to discriminate between normal, borderline and malignant ovarian tumours and classify ovarian carcinoma subtypes according to the unique spectral signatures of their molecular composition. Formalin-fixed, paraffin-embedded ovarian tissue blocks were de-waxed, mounted on Low-E slides and desiccated before being analysed using ATR-FTIR spectroscopy. Chemometric analysis in the form of principal component analysis (PCA), successive projection algorithm (SPA) and genetic algorithm (GA), followed by linear discriminant analysis (LDA) of the obtained spectra, revealed clear segregation between benign versus borderline versus malignant tumours as well as segregation between different histological tumour subtypes, when these approaches are used in combination. ATR-FTIR spectroscopy coupled with chemometric analysis has the potential to provide a novel diagnostic approach in the accurate diagnosis of ovarian tumours, assisting surgical decision making to avoid under-treatment or over-treatment, with minimal impact to the patient.


Introduction
Ovarian cancer is the 5th most common gynaecological cancer (incidence of 18 per 100 000) and the 4th most common cause of cancer death (mortality of 8.8 per 10 000) in women in the UK. 1 In 2009, 5900 women were diagnosed with ovarian cancer; 3500 died from the disease the year after. 2 The high related mortality is a consequence of late presentation and diagnosis at stage III or IV, resulting in five-year survival rates of 20% and 6%, respectively. 2
Ovarian cancer encompasses a heterogeneous group of tumours, as indicated by differences in epidemiological and genetic risk factors, precursor lesions, patterns of spread, molecular events during oncogenesis, response to chemotherapy and prognosis. 3 Ninety percent of ovarian cancers are malignant epithelial tumours termed carcinomas, the remainder being germ cell and sex cord-stromal tumours. 4 The commonest types of ovarian carcinomas are high-grade serous carcinoma (HGSC), low-grade serous carcinoma (LGSC), mucinous carcinoma (MC), endometrioid carcinoma (EC), clear cell carcinoma (CCC), carcinosarcoma (CS) and mixed tumours (MT). Carcinomas are graded according to their cellular differentiation from normal tissue. This does not apply to HGSC and LGSC as they are considered different entities. 5 Borderline epithelial tumours comprise approximately 15% of epithelial ovarian tumours and have a good prognosis. With the implementation of the "international classification of diseases for oncology (ICD-O-3)", these tumours are no longer considered malignant. 6
The complexity and heterogeneity of ovarian cancer with regard to risk factors, precursor lesions, and morphological and clinical manifestations has hindered the development of robust population-based screening programs. Currently, in the UK, the assessment for ovarian cancer is based on the "Risk Malignancy Index" (RMI 1 or 2), which encompasses menopausal status, ultrasonographic ovarian presentation and blood levels of the tumour marker Ca125 (see ESI Table S1 †). 7,8 Other blood-derived biomarkers with similar accuracy to RMI have been suggested, for example HE4 and ROMA, but have not been established in practice. 9][12]
Prognostic factors for ovarian cancer include stage and grade of cancer at diagnosis and residual disease after primary staging surgery. 13 Histopathological tumour type is important when considering personalised treatment options. This plays a major role in chemotherapy responsiveness and therefore in overall survival rate. 13,14 For example, HGSC demonstrates a much better response to platinum-based chemotherapy than CCC. This results in CCC having a lower 5-year survival than HGSC. 15 Current methods of ovarian cancer diagnosis and management therefore have significant limitations. This is a field that can benefit from research to identify novel methods of detecting and categorizing ovarian cancer to aid personalized intra- and post-operative management while also minimizing patient risk and resource expenditure.
Vibrational spectroscopy is a bio-analytical tool that has the potential to classify normal and pathological tissue according to their chemical and molecular differences. 168][19][20][21] The resulting spectral differences may be used to distinguish benign from cancerous processes and classify cancer subtypes. Examples of areas studied by these methods include breast, 22 endometrial, 23 cervical, 24 prostatic 25 and brain cancers. 26
We utilised ATR-FTIR spectroscopy to interrogate ovarian tissue harvested from women undergoing oophorectomies for several reasons, including pelvic pain, postmenopausal bleeding, menorrhagia or dysfunctional uterine bleeding, premenstrual tension, risk reduction due to breast cancer or positive family history, and imaging revealing ovarian cysts/masses. We hypothesized that spectrochemical analysis of ovarian tissue will allow diagnostic segregation of benign, borderline and cancerous tumours. Additionally, this method will allow classification of epithelial ovarian cancer subtypes. Resulting spectral datasets were analysed using multivariate analysis in the form of principal component analysis followed by linear discriminant analysis (PCA-LDA) and variable selection techniques in the form of successive projection algorithm (SPA) and genetic algorithm (GA), again followed by LDA (SPA-LDA, GA-LDA). These chemometric techniques reduced the complexity of the spectral datasets and allowed visual representation. They were combined to form a classification machine. Additionally, we analysed our spectral datasets using multivariate control charts based on PCA 27 to examine whether biospectroscopy could correctly classify normal, borderline and cancerous ovaries.

Tissue collection and preparation
Ovarian specimens were acquired from the Royal Preston Hospital bio-bank with appropriate ethics clearance (REC reference 10/H0308/75). They included n = 35 histologically benign ovarian samples, n = 30 samples containing borderline ovarian tumours and n = 106 samples with a diagnosis of epithelial carcinoma. The ovarian carcinomas were further subdivided into HGSC (n = 46), LGSC (n = 9), EC (n = 15), MC (n = 12), CCC (n = 13), CS (n = 7) and MT (n = 4). Table 1 lists the specific histological diagnoses for these samples. The tissue samples were embedded in paraffin. Ten-μm-thick tissue sections were floated onto Low-E IR reflective slides (Kevley Technologies, Chesterland, OH, USA). These were de-waxed by serial immersion in three sequential fresh xylene baths for 5 min and washed in an acetone bath for a further 5 min. The resulting samples were allowed to air dry and placed in a desiccator until analysis. Four-μm-thick parallel tissue sections were floated onto glass slides and stained with H&E for histological comparison, where needed.

Classification of ovarian tissues according to histopathological characteristics
Fig. 1 shows a benign ovarian tumour (mucinous cystadenoma) (Fig. 1a), a borderline tumour (Fig. 1b) and different ovarian carcinoma subtypes (Fig. 1c-i) following H&E staining. The World Health Organization (WHO) criteria for classification of epithelial ovarian tumours are based on optical microscopy after H&E staining. They describe the tissues these carcinomas resemble and how they differ from each other in general terms. The WHO lists general criteria to assist the differentiation between the different subtypes (see ESI Table S2 †). 28,29

ATR-FTIR spectroscopy
IR spectra were obtained using a Bruker Vector 27 FTIR spectrometer with a Helios ATR attachment containing a diamond crystal (≈250 μm × 250 μm sampling area) (Bruker Optics Ltd, Coventry, UK). Spectra were acquired from 10 randomly selected locations across the specimen to minimize bias arising from the selection of areas with a particular histopathological phenotype. A new background measurement was taken for every sample processed. The ATR crystal was cleaned with distilled water and dried with dry tissue paper before the acquisition of the spectral background. The spectral resolution was 8 cm⁻¹ with 2× zero filling of the interferogram, giving a data spacing of 4 cm⁻¹. Spectra were co-added for 32 scans; these were converted into absorbance by Bruker OPUS software. Absorbance spectra were converted to suitable digital files (.txt) for input to MATLAB software.

Computational analysis
The ATR-FTIR datasets were processed using an in-house toolbox (iRootlab) 30 and PLS toolbox 7.8 (Eigenvector Research, Inc., 3905 West Eaglerock Drive, Wenatchee, WA 98801) within a MATLAB R2014a environment (Mathworks Inc, Natick, MA, USA). The wavenumber region inputted was between 4000 cm⁻¹ and 600 cm⁻¹. Spectra were then cut to include the region between 1800 and 900 cm⁻¹. PCA-LDA reduces the complex spectral dataset so that each spectrum is represented by a single point in hyperspace, while maximizing inter-class variation and minimizing intra-class variation. The disadvantage of this method is the potential overfitting of spectra, causing arbitrary separation and therefore spuriously positive results. This can be counteracted by using large spectral datasets, with more than five times as many spectra as variables.
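As a rough illustration of the region cut and the five-to-one rule of thumb described above, the following minimal sketch counts the variables retained between 1800 and 900 cm⁻¹. It assumes an idealised, exactly regular 4 cm⁻¹ grid and that all 10 spectra from each of the 171 specimens were kept; the function names and the placeholder spectrum are illustrative only and are not part of the toolboxes used in the study.

```python
def wavenumber_grid(high, low, spacing):
    """Descending wavenumber axis (cm^-1) from `high` to `low`, both inclusive."""
    n_points = int(round((high - low) / spacing)) + 1
    return [high - i * spacing for i in range(n_points)]


def cut_region(wavenumbers, absorbances, high=1800.0, low=900.0):
    """Keep only the points whose wavenumber lies inside [low, high]."""
    kept = [(w, a) for w, a in zip(wavenumbers, absorbances) if low <= w <= high]
    return [w for w, _ in kept], [a for _, a in kept]


# Acquired range 4000-600 cm^-1 on an idealised 4 cm^-1 grid (assumption).
full_axis = wavenumber_grid(4000.0, 600.0, 4.0)
placeholder_spectrum = [0.0] * len(full_axis)  # stand-in absorbances, not real data

cut_axis, cut_spectrum = cut_region(full_axis, placeholder_spectrum)
n_variables = len(cut_axis)

# Rule of thumb from the text: more than five spectra per variable.
n_spectra = 171 * 10  # 171 specimens x 10 locations, assuming none were discarded
enough_spectra = n_spectra > 5 * n_variables

print(n_variables, 5 * n_variables, n_spectra, enough_spectra)
```

Under these assumptions the cut region holds 226 variables, so the rule asks for more than 1130 spectra, which 1710 spectra would satisfy.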
For the PCA-LDA, SPA-LDA and GA-LDA models, the samples were divided into training (70%), validation (15%) and prediction (15%) sets by applying the classic Kennard-Stone (KS) uniform sampling algorithm to the IR spectra. 31 Training samples were used in the modelling procedure (including variable selection for LDA), whereas the prediction set was only used in the final evaluation of the classification. The optimum number of variables for SPA-LDA and GA-LDA was determined from the minimum cost function G calculated for a given validation dataset:
$$G = \frac{1}{N_V} \sum_{n=1}^{N_V} g_n$$
where $g_n$ is defined as
$$g_n = \frac{r^2\left(x_n, m_{I(n)}\right)}{\min_{I(m) \neq I(n)} r^2\left(x_n, m_{I(m)}\right)}$$
and $I(n)$ is the index of the true class for the $n$th validation object $x_n$. Here $N_V$ is the number of validation objects, and $r^2(x_n, m_{I(n)})$ is the squared Mahalanobis distance between $x_n$ and the mean $m_{I(n)}$ of its true class; the denominator is the corresponding distance to the centre of the nearest wrong class. The GA routine was carried out over 100 generations with 200 chromosomes each. Crossover and mutation probabilities were set to 60% and 10%, respectively. Moreover, the algorithm was repeated three times, starting from different random initial populations. The best solution (in terms of the fitness value) resulting from the three realizations of the GA was employed. For this study, LDA scores, loadings, and discriminant function (DF) values were obtained for the specimens. The first LDA factor (LD1) was used to visualize sample alterations in 1-dimensional (1D) score plots that indicate the main biochemical alterations.
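The Kennard-Stone partition described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: it assumes Euclidean distances between spectra and assumes the validation set is drawn by a second Kennard-Stone pass over the spectra left after the training selection, a detail the text does not specify. The function names `kennard_stone` and `split_70_15_15` and the toy grid of points are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of `n_select` points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed with the two mutually most distant points.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected point is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def split_70_15_15(points):
    """Split indices into training, validation and prediction sets (70/15/15)."""
    n = len(points)
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    train = kennard_stone(points, n_train)
    rest = [k for k in range(n) if k not in train]
    # Second pass on the leftover points (assumption, see lead-in).
    val_local = kennard_stone([points[k] for k in rest], n_val)
    val = [rest[k] for k in val_local]
    pred = [k for k in rest if k not in val]
    return train, val, pred


# Toy stand-in for spectra: 20 points on a 5 x 4 grid.
toy_points = [(float(i % 5), float(i // 5)) for i in range(20)]
train_idx, val_idx, pred_idx = split_70_15_15(toy_points)
print(len(train_idx), len(val_idx), len(pred_idx))
```

Because the rule always adds the point farthest from those already chosen, the training set tends to span the extremes of the data, leaving more central spectra for validation and prediction.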
Multivariate control charts were based on PCA. When the PCA model is fitted to data collected when only common-cause variation is present, future data behaviour can be referenced against this "in-control" model. In this sense, new multivariate observations can be projected onto the plane defined by the PCA loading vectors to obtain their scores ($t_{i,\mathrm{new}} = p_i^{T} y_{\mathrm{new}}$) and the residuals $e_{\mathrm{new}} = y_{\mathrm{new}} - \hat{y}_{\mathrm{new}}$, where $\hat{y}_{\mathrm{new}} = P_A t_{A,\mathrm{new}}$, $t_{A,\mathrm{new}}$ is the ($A \times 1$) vector of scores from the model and $P_A$ is the ($q \times A$) matrix of loadings. A Shewhart control chart is then built from the relevant PC scores, and samples are checked against its ±2 s control limits. Trends and systematic behaviours in the scores plot are clear indications of "out-of-control" processes (i.e., normal ovarian tissue, borderline tissue and carcinoma subtypes).
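The projection and control-limit steps above can be illustrated with a short sketch. It assumes the loading vectors are orthonormal, that observations have already been centred on the in-control mean, and that the control limits are placed at the in-control mean score plus or minus two sample standard deviations. The loadings, scores and function names below are invented for illustration and are not data from the study.

```python
import statistics


def project(loadings, y):
    """Scores t = P^T y and residual e = y - P t.

    `loadings` is a list of loading vectors (the columns of P), each the
    same length as the observation `y`.
    """
    scores = [sum(p_j * y_j for p_j, y_j in zip(p, y)) for p in loadings]
    y_hat = [sum(t * p[j] for t, p in zip(scores, loadings)) for j in range(len(y))]
    residual = [y_j - yh_j for y_j, yh_j in zip(y, y_hat)]
    return scores, residual


def shewhart_limits(control_scores, k=2.0):
    """Lower and upper limits at mean -/+ k sample standard deviations."""
    centre = statistics.mean(control_scores)
    s = statistics.stdev(control_scores)
    return centre - k * s, centre + k * s


def out_of_control(scores, limits):
    """Flag each score that falls outside the control limits."""
    low, high = limits
    return [not (low <= t <= high) for t in scores]


# Hypothetical two-component model in a three-variable space.
toy_loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_scores, toy_residual = project(toy_loadings, [2.0, -1.0, 0.5])

# Hypothetical PC scores for "in-control" (benign) tissue and for new samples.
control_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
limits = shewhart_limits(control_scores)
flags = out_of_control([0.3, 2.0, -1.7], limits)
print(toy_scores, toy_residual, flags)
```

In this toy case the second and third new scores fall outside the ±2 s band and would be flagged as departing from the in-control model.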
To examine these visible differences and attempt classification of the three categories, three types of chemometric analysis were used. Classification was achieved using PCA-LDA, SPA-LDA and GA-LDA. Seventy percent of the spectra were used to train the algorithm, 15% to test it internally and 15% to validate it externally (see ESI Table S3 †). On comparing the spectra using PCA-LDA, seven PCs were used as this number provided significant classification (P < 0.001) without the introduction of arbitrary separation. Fig. 3c shows the 2-D scores plot derived by PCA-LDA. It reveals segregation of cancerous tissue from normal and borderline tumours, with the latter classes overlapping. The majority of differences between the normal and cancerous ovaries were attributed to Amide I (1674 cm⁻¹), nucleic acids (1620 cm⁻¹), different conformations of phenyl rings (1585 cm⁻¹, 1504 cm⁻¹), polysaccharides (1431 cm⁻¹) and symmetric phosphate stretching vibrations (νₛPO₂⁻; 1096 cm⁻¹) (Fig. 3a). The chemometric technique that classified the three classes most successfully [66.4%] was GA-LDA using 29 variables determined from the minimum cost function G (Fig. 3g and h). The related 2-D scores plot illustrates that spectral points from different classes dissociate while spectral points from the same class co-cluster (Fig. 3i). SPA-LDA also achieved considerable classification [55.9%] with separation of classes on a 2-D scores plot (Fig. 3f) when applied using 23 variables (Fig. 3d), again using the minimum cost function G (Fig. 3e) (see ESI Table S4 †). All three techniques identified differences that aided classification within similar spectral regions. These differences were tentatively identified in the spectral regions of ≈1400 cm⁻¹ (protein), ≈1740 cm⁻¹ (lipid), ≈1045 cm⁻¹ (phosphate) and ≈1545 cm⁻¹ (carbohydrate).
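For readers unfamiliar with the PCA-LDA step used in this three-class comparison, the following is a minimal sketch written with NumPy rather than the MATLAB toolboxes used in the study. It compresses the data to seven PCs and then finds discriminant directions from the within-class and between-class scatter of the PC scores. The synthetic data, the noise level and the function names are assumptions made purely for illustration and do not reproduce the reported results.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column mean and the (variables x components) loading matrix."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the mean-centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def lda_fit(T, y):
    """Return discriminant directions (as columns) for PC scores T and labels y."""
    classes = np.unique(y)
    overall = T.mean(axis=0)
    k = T.shape[1]
    Sw = np.zeros((k, k))  # within-class scatter
    Sb = np.zeros((k, k))  # between-class scatter
    for c in classes:
        Tc = T[y == c]
        mc = Tc.mean(axis=0)
        Sw += (Tc - mc).T @ (Tc - mc)
        d = (mc - overall)[:, None]
        Sb += len(Tc) * (d @ d.T)
    # Eigenvectors of Sw^-1 Sb, ordered by decreasing eigenvalue.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[: len(classes) - 1]]


def pca_lda_scores(X, y, n_pcs):
    """Project spectra onto n_pcs principal components, then onto LD axes."""
    mean, P = pca_fit(X, n_pcs)
    T = (X - mean) @ P
    return T @ lda_fit(T, y)


# Synthetic stand-in data: 3 classes x 30 "spectra" x 50 variables.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(0.0, 0.1, size=(90, 50))
X[y == 1, 10:20] += 1.0  # class 1 differs in one block of variables
X[y == 2, 30:40] += 1.0  # class 2 differs in another block

scores = pca_lda_scores(X, y, n_pcs=7)
print(scores.shape)
```

With three classes the sketch returns two discriminant axes (LD1 and LD2), which is what a 2-D scores plot of this kind displays.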

Lipid-to-protein ratio, phosphate-to-carbohydrate ratio and RNA-to-DNA ratio
To further evaluate the importance of the above spectral regions in classifying the ovarian tumours into benign, borderline and malignant specimens, intensity ratios of spectral areas important for classification were measured (Table 2). Fig. 4a shows the lipid-to-protein ratio, which is obtained by calculating the ratio of the band intensity at 1750 cm⁻¹ to 1730 cm⁻¹ (lipids) to that at 1410 cm⁻¹ to 1390 cm⁻¹ (protein). The lipid-to-protein ratio was higher in neoplastic tissue and lower in borderline and benign tissue (Table 2). Normal and benign tissue exhibited similar ratios. The parameters used for the tentatively-assigned phosphate-to-carbohydrate ratio in each IR spectrum were derived from the intensity of phosphate at 1055 cm⁻¹ to 1045 cm⁻¹ and of carbohydrate at 1555 cm⁻¹ to 1535 cm⁻¹. The phosphate-to-carbohydrate ratio is mildly increased in ovarian carcinomas relative to borderline and benign tissue, and the difference is also significant (P < 0.0001) (Fig. 4b; Table 2). When comparing the intensity ratios of RNA (1111 cm⁻¹ to 1131 cm⁻¹) to DNA (1010 cm⁻¹ to 1030 cm⁻¹), the ovarian carcinomas exhibited slightly lower ratios (Fig. 4c).
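The band ratios described above could be computed along the following lines. The text does not state whether band intensity was taken as a mean absorbance, an integrated area or a peak height, so this sketch assumes the mean absorbance over each window. The band limits come from the text, while the function names and the six-point toy spectrum are hypothetical.

```python
# Band windows in cm^-1 (low, high), taken from the text.
LIPID = (1730.0, 1750.0)
PROTEIN = (1390.0, 1410.0)
PHOSPHATE = (1045.0, 1055.0)
CARBOHYDRATE = (1535.0, 1555.0)
RNA = (1111.0, 1131.0)
DNA = (1010.0, 1030.0)


def band_intensity(wavenumbers, absorbances, band):
    """Mean absorbance over the points inside `band` (an assumed definition)."""
    low, high = band
    values = [a for w, a in zip(wavenumbers, absorbances) if low <= w <= high]
    if not values:
        raise ValueError("no data points fall inside the band")
    return sum(values) / len(values)


def band_ratio(wavenumbers, absorbances, numerator_band, denominator_band):
    """Ratio of two band intensities, e.g. lipid-to-protein."""
    return (band_intensity(wavenumbers, absorbances, numerator_band)
            / band_intensity(wavenumbers, absorbances, denominator_band))


# Toy spectrum with three points in each of the lipid and protein windows.
toy_wavenumbers = [1750.0, 1740.0, 1730.0, 1410.0, 1400.0, 1390.0]
toy_absorbances = [0.10, 0.20, 0.10, 0.30, 0.50, 0.40]

lipid_to_protein = band_ratio(toy_wavenumbers, toy_absorbances, LIPID, PROTEIN)
print(round(lipid_to_protein, 3))
```

The same `band_ratio` call with `PHOSPHATE` and `CARBOHYDRATE`, or `RNA` and `DNA`, would give the other two ratios for a spectrum that covers those windows.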

Classification of normal ovaries, ovaries with borderline tumours and those with ovarian carcinoma using multivariate control charts based on PCA
Multivariate control charts are commonly used in industry for quality control of chemical substances. A similar approach may be used in biospectroscopy. Tissue from benign ovaries can act as "control tissue" against which borderline and neoplastic tissues are compared. The control is represented by a line at zero, and another line is usually drawn at two standard deviations (SDs). How far this line lies from zero depends on the variability that exists within the examined tissue. When comparing borderline and malignant tissue with benign control tissue, everything outside the SD lines is considered abnormal. Interestingly, control charts derived from the PCA already performed are able to distinguish between normal and neoplastic ovaries (Fig. 5a) and between normal ovaries and those with borderline tumours (Fig. 5b).

Classification of carcinoma subtypes using PCA-LDA, SPA-LDA and GA-LDA
Similar chemometric techniques have been used to classify epithelial ovarian carcinomas according to their subtypes. The aforementioned pre-processing of the spectral datasets was applied in this case also. Seven PCs were used for PCA (Fig. 6b), 23 wavenumbers for SPA (Fig. 6d) and 44 wavenumbers for GA (Fig. 6g) (see ESI Table S5 †). The number of wavenumbers to be used was again determined by the minimum cost function G (Fig. 6e and h). PCA, SPA and GA followed by LDA were not adequately successful when comparing the spectral datasets of all cancer subtypes together, as revealed by the associated 3-D scores plots (Fig. 6c, f and i). There was, however, visible separation between the clear cell carcinoma (cyan), carcinosarcoma (pink) and high-grade serous carcinoma (blue) subtypes when analysed by SPA-LDA, and separation between the clear cell carcinoma (cyan) and carcinosarcoma (pink) spectral classes when analysed by GA-LDA. Unfortunately, there was not adequate visual separation between classes with PCA-LDA.
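To make the minimum-cost criterion used for choosing the number of wavenumbers more concrete, the sketch below evaluates a cost G on a validation set. It assumes the form commonly used in the SPA-LDA and GA-LDA literature: for each validation object, the squared Mahalanobis distance to the mean of its true class divided by the squared distance to the nearest wrong class mean, averaged over all validation objects, with a pooled covariance estimated from the training set. The helper names and the two-variable toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def class_means(X, labels):
    """Mean vector of each class."""
    means = {}
    for c in sorted(set(labels)):
        rows = [x for x, l in zip(X, labels) if l == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def pooled_covariance(X, labels, means):
    """Pooled within-class covariance with (N - number of classes) degrees of freedom."""
    k = len(X[0])
    cov = [[0.0] * k for _ in range(k)]
    for x, l in zip(X, labels):
        d = [xi - mi for xi, mi in zip(x, means[l])]
        for i in range(k):
            for j in range(k):
                cov[i][j] += d[i] * d[j]
    dof = len(X) - len(means)
    return [[v / dof for v in row] for row in cov]


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance between x and mean."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))


def cost_G(X_val, labels_val, means, cov):
    """Average ratio of own-class to nearest-wrong-class squared distance."""
    total = 0.0
    for x, true in zip(X_val, labels_val):
        own = mahalanobis_sq(x, means[true], cov)
        wrong = min(mahalanobis_sq(x, m, cov) for c, m in means.items() if c != true)
        total += own / wrong
    return total / len(X_val)


# Toy training data: two classes measured at two selected wavenumbers.
X_train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0),
           (5.0, 5.0), (7.0, 5.0), (5.0, 7.0), (7.0, 7.0)]
y_train = ["A"] * 4 + ["B"] * 4
means = class_means(X_train, y_train)
cov = pooled_covariance(X_train, y_train, means)

G = cost_G([(1.0, 2.0), (6.0, 4.0)], ["A", "B"], means, cov)
print(round(G, 4))
```

A validation object lying closer to a wrong class than to its own contributes a ratio above 1, so candidate variable subsets that separate the classes well yield a small G, and the subset size with the lowest G would be retained.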

Two-category discriminant analysis of carcinoma subtypes using PCA-LDA, SPA-LDA and GA-LDA
To increase the classification success rate, spectral datasets representing different epithelial tumour subtypes were compared in pairs. The three chemometric techniques previously mentioned were utilised again. Similar validation methods were used, with 70% of the data being used to train the system, 15% for internal validation and 15% for external validation. The optimum number of PCs for PCA and of variables for SPA-LDA and GA-LDA was determined by power versus cost calculation using the minimum cost function G (see ESI Fig. S1 †). ESI Fig. S2-S4 † represent graphically the 2-D scores plots derived by PCA-LDA, SPA-LDA and GA-LDA, respectively, following comparison of all the carcinoma subcategories after ATR-FTIR spectroscopy. The three analytical techniques were not equally successful at distinguishing between the categories compared. Fig. 7 presents the percentage success for classification with each method. In general, distinguishing between the different carcinoma subclasses was more successful when using GA-LDA.

Table 2 Statistical significance of classification of normal, borderline and malignant ovarian tumours based on ratios of lipid-to-protein, phosphate-to-carbohydrate and RNA-to-DNA spectral intensities. Statistical significance was calculated using one-way ANOVA followed by Tukey's test.

Discussion
Our study demonstrates that ATR-FTIR spectroscopy in conjunction with powerful chemometric approaches has the potential to distinguish between normal, borderline and neoplastic ovarian tissue. It also has the potential to distinguish between different ovarian epithelial carcinoma subtypes. The most conspicuous differences identified by the chemometric techniques employed are between normal ovaries and overt carcinoma. This finding is significant due to its potential for translation into clinical practice. Currently, histological identification is the gold standard in the diagnosis of ovarian cancer and therefore essential for surgical decision-making. Benign ovarian masses do not require extensive surgery, while ovarian carcinomas will usually be managed by "staging" surgery involving a bilateral salpingo-oophorectomy, hysterectomy, omentectomy and pelvic lymphadenectomy. Pre-operative biopsy methods have been suggested to obtain a histological diagnosis before embarking on major surgery. For example, image-guided fine needle aspiration cytology (FNAC) and core biopsy using ultrasound, CT or MRI imaging have been shown to be effective, with a diagnostic accuracy of 80.9% and 93%, respectively. 32,33 These diagnostic modalities are usually reserved for women with co-morbidities that prohibit primary staging surgery or where imaging has revealed potentially inoperable disease. The reason for this is the risk of upstaging the disease by causing intra-peritoneal spillage of cancerous cells. Where there is a high clinical suspicion of ovarian cancer, the staging procedure described above is performed.
In cases where clinical suspicion alone is not enough to embark on staging surgery, intra-operative consultation by a pathologist is pursued. This utilises "frozen section" of the specimen, which is then stained, usually with H&E, and is examined by optical microscopy. Frozen section distinguishes benign from malignant tumours very accurately, but is less accurate for borderline tumours. 34,35 It prevents morbidity associated with surgical staging procedures in benign cases and under-treatment of malignant tumours, which would otherwise require restaging surgery or chemotherapy. Frozen section has several limitations that include sampling difficulties, interpretation errors and communication breakdown. 36 It also increases surgical times, with resultant morbidity to the patient. ATR-FTIR spectroscopy in conjunction with chemometric analysis allows the identification of biomarkers that can be adapted for easy discrimination between benign and neoplastic tissue during surgery. Indeterminate ovarian masses that would otherwise require a frozen section may be processed. The processing of ovarian tumour samples may be performed in an area adjacent to an operating theatre using desktop ATR spectroscopy instruments, or even within an operating theatre using handheld devices. In our experiment the spectral dataset for each sample was acquired in approximately seven minutes (approximately 1 second per scan, 32 scans for each of 10 areas). The remaining time was spent readjusting the stage to a different area within the sample. Our spectral analysis was performed in a stepwise manner; therefore, timing for the analysis is difficult to estimate. Our in-house tools for MATLAB, iRootlab, 30,37,38 can be adjusted to analyse spectral datasets using analysis cascades that would complete the procedure within minutes, reducing surgical duration. They would also require minimal operator training. Spectral datasets of control tissue could be pooled retrospectively and used for the calibration of the different analyses performed, increasing the accuracy of the tests. 39 For example, multivariate control charts may be used to distinguish malignant ovarian tumours that would require extensive surgery from benign or borderline tumours that will not.
The clinical importance of diagnosing ovarian carcinoma subtypes lies in their implications for immediate and subsequent management, medical or surgical, follow-up and genetic counselling. Patients with early-stage (1a) mucinous or endometrioid carcinoma can be treated with surgery alone. Patients with high-grade serous carcinoma will routinely have adjuvant chemotherapy. Those with mucinous, endometrioid, and clear cell carcinomas may have adjuvant or neoadjuvant combination radiotherapy and chemotherapy. High-grade serous adenocarcinoma is also associated with BRCA mutations; therefore, patients may be referred for genetic testing and, if proven positive, their families would be screened. ATR-FTIR spectroscopy coupled with a chemometric machine has the potential to be adopted as an additional tool for pathological interpretation of ovarian carcinomas.

Conclusion
The purpose of this study was to identify spectral differences within ovarian tissues capable of classifying them according to their histopathological status. Utilising ATR-FTIR spectroscopy, n = 171 ovarian tissues were examined. Morphological and molecular alterations within these tissues have already been associated with neoplasia. Spectroscopic analysis of these tissues reveals specific alterations linked to malignancy. The changes responsible for this segregation were primarily alterations in the tentatively-assigned lipid (1740 cm⁻¹)-to-protein (1400 cm⁻¹) ratio, with a marked increase associated with carcinomas. IR spectroscopy coupled with chemometric analysis has the potential not only to differentiate neoplastic from borderline and benign tissues but also to distinguish between different carcinoma subtypes. Further validation of these approaches, exploiting other biospectroscopy techniques and using larger and architecturally more robust datasets, is required.

Fig. 2 Analysing ovarian tissues by ATR-FTIR spectroscopy and pre-processing resulting datasets. (a) 10 μm-thick ovarian tissue sections; (b) sample in close proximity with the ATR diamond; (c) unprocessed spectra; and (d) resulting spectra after pre-processing.

Fig. 3 Classification of benign, borderline and malignant ovarian tissue by spectral analysis using PCA-LDA, SPA-LDA and GA-LDA. (a) Loadings plot identifying the major discriminant wavenumbers for the three classes; the x-axis is wavenumber (cm⁻¹) and the y-axis represents absorbance coefficient. The five wavenumbers contributing to the most segregation were derived from the points furthest away from the x-axis. (b) Cost/function plot identifying the optimal number of PCs to be used for PCA. (c) Scores plot graphically representing classification by PCA-LDA; x-axis represents LD1 and the y-axis LD2. (d) Wavenumber selection for SPA-LDA. (e) Cost/function plot identifying the optimal number of wavenumbers to be used for the SPA algorithm. (f) Scores plot graphically representing classification by SPA-LDA; x-axis represents LD1 and the y-axis LD2. (g) Wavenumber selection for GA-LDA. (h) Cost/function plot identifying the optimal number of wavenumbers to be used for the GA algorithm. (i) Scores plot graphically representing classification by GA-LDA; x-axis represents LD1 and the y-axis LD2 (red = cancer, green = borderline, blue = benign).

Fig. 4 Classification of ovarian tumours into benign, borderline and malignant using spectral intensity ratios. (a) Intensity ratio of lipid-to-protein; (b) intensity ratio of phosphate-to-carbohydrate; and (c) intensity ratio of RNA-to-DNA.

Fig. 5 Classification of ovarian tumours into benign, borderline and malignant using Shewhart control charts after PCA. (a) Benign ovarian tissue vs. malignant tissue; and (b) benign ovarian tissue vs. borderline tissue.

Fig. 6 Classification of ovarian carcinoma subtypes by spectral analysis using PCA-LDA, SPA-LDA and GA-LDA. (a) Pre-processed spectral dataset. Each colour represents a particular neoplastic subtype. (b) Cost/function plot identifying the optimal number of PCs to be used for PCA. (c) Scores plot graphically representing classification by PCA-LDA; x-axis represents LD1 and y-axis LD2. (d) Wavenumber selection for SPA-LDA. (e) Cost/function plot identifying the optimal number of wavenumbers to be used for the SPA algorithm. (f) Scores plot graphically representing classification by SPA-LDA; x-axis represents LD1 and y-axis LD2. (g) Wavenumber selection for GA-LDA. (h) Cost/function plot identifying the optimal number of wavenumbers to be used for the GA algorithm. (i) Scores plot graphically representing classification by GA-LDA; x-axis represents LD1 and y-axis LD2 (blue = high grade serous, red = low grade serous, black = endometrioid carcinoma, yellow = mixed, green = mucinous, cyan = clear cell, pink = carcinosarcoma).

Fig. 7 Percentage successful classification of ovarian carcinoma subtypes when compared in pairs using three chemometric analyses: PCA-LDA, SPA-LDA, GA-LDA. Red boxes represent the most successful technique for each particular pair analysed. Amber colour represents the second most successful technique and green the least.

Table 1
Histopathology of the ovarian tissues interrogated: Borderline tumours and malignant epithelial carcinomas are similarly staged according to FIGO Ovarian Cancer Staging 2014