Evaluation of grade and invasiveness of bladder urothelial carcinoma using infrared imaging and machine learning †

Urothelial bladder carcinoma (BC) is primarily diagnosed by histopathologists through the subjective examination of biopsies, but diagnosis remains time-consuming and of limited accuracy, especially for low grade non-invasive BC. We propose a novel approach for high-throughput BC evaluation that combines infrared (IR) microscopy of bladder sections with machine learning (partial least squares-discriminant analysis) to provide an automated prediction of the presence of cancer, its invasiveness, and its grade. Cystoscopic biopsies from 50 patients with clinical suspicion of BC were histologically examined to assign grades and stages. Adjacent tissue cross-sections were IR imaged to provide hyperspectral datasets, and cluster analysis segregated the IR images to extract the average spectra of epithelial and subepithelial tissues. Discriminant models, validated using repeated random sampling double cross-validation, showed sensitivities of ca. 85% (AUROC 0.85) for the identification of cancer in the epithelium and subepithelium. The diagnosis of non-invasive and invasive cases showed sensitivity values of around 80% (AUROC 0.84–0.85) and 76% (0.73–0.80), respectively, while the identification of low and high grade BC showed higher sensitivity values of 87–88% (0.91–0.92). Finally, models for the discrimination between cancers of different invasiveness and grades showed more modest AUROC values (0.67–0.72). These results demonstrate the high potential of IR imaging for the development of ancillary platforms to screen bladder biopsies.


Introduction
Bladder urothelial carcinoma (BC) constitutes 3% of cancer diagnoses worldwide. [7][8][9] Appropriate patient treatment requires morphological verification of the tumour and recognition of its type, stage, and grade. The most common bladder cancer is urothelial bladder carcinoma, which constitutes 90% of cases at this site. Histological examination is particularly difficult for small biopsies, especially if they are thermally damaged during their preparation. Moreover, equivocal tissue morphology requires additional immunohistochemistry (IHC). The reproducibility of the histological recognition of BC features is approximately 60%, while that of its invasiveness amounts to 75%, owing to intra- and inter-observer variation in the subjective judgement of pathologists. 4,5 Misdiagnosis also results from the histological similarity of some tumours or from poor cell differentiation in the tissue.
In the last decade, Fourier transform infrared (FTIR) microspectroscopy has been proposed as an alternative technique for the diagnosis of cancer in tissue. The IR spectrum shows absorption bands from different biochemical components, including proteins, lipids, carbohydrates, and DNA. [12][13] The main advantage of this technique is that it can be fast and easily automated, without requiring staining or inspection by a trained pathologist. 14 Infrared (IR) spectral histopathology offers information at a microscopic scale. Chemical changes identified in cancers can indicate novel biomarkers and then serve as ancillary diagnostic tools. IR datasets are analysed mathematically using well-developed algorithms, first for unsupervised image segmentation and then for blind classification. 14 So far, IR spectroscopy and microscopy have been used for BC diagnosis by analysing urothelial cell lines, 15,16 urine sediment, [17][18][19] cytology, 20 and bladder tissues. 21,22 A few studies showed differences in IR spectra between BC subtypes, but from only nine patients, or between control and BC when the bladder was probed by fibre optics. 22,23 Other modalities have also been tested for the detection of BC, including Raman 24,25 and coherent anti-Stokes Raman scattering (CARS) 24 spectroscopy, as well as second-harmonic generation (SHG) imaging. 25 A report by Demos et al. demonstrated the detection of bladder carcinoma from the autofluorescence signal in Raman spectra of fresh bladder tissues, but without recognition of cancer aggressiveness. 24 Raman, CARS, and SHG imaging have also differentiated urothelial cells in urine with excellent performance, classifying high grade BC versus control patients. 25
The most important predictive factor in BC diagnosis is staging, which refers to the degree of tumour infiltration into the bladder wall and to metastases. Non-invasive BC (Tni here) includes in situ and papillary BC (Tis and Ta, respectively). Invasive stages are denoted as follows: T1, tumour present in subepithelial tissue only; T2, tumour invading the muscularis propria; T3, tumour invading the perivesical connective tissue; and T4, tumour invading other organs. 6,26,27 A grading system was also introduced by the International Society of Urological Pathology and the WHO to express the low (LG) and high (HG) malignant potential of urothelial carcinoma. 6,28 The currently used WHO system aims to assign tumours to different prognostic groups, which then allows appropriate treatment by transurethral resection of the tumour or cystectomy. 6 Nowadays, a significant number of patients are diagnosed with non-invasive BC (51%), followed by localized (T1, 34%) and regional BC (T2 and T3, 7%). Approximately 5% of cases are urothelial BC with metastasis (T4). 1 The survival rate is ca. 95% for Tni and lower for higher stages, for example, 69.5% and 36.3% for T1 and T2, respectively. 1 Since the clinical classification of BC is not trivial, this study evaluates FTIR microspectroscopy as a digital pathology tool to discriminate BC stage and grade in a large cohort of biopsies from 50 patients. IR imaging identified not only the tumour region, as in HE staining, but also other tissue types, as confirmed by certified pathologists. Next, we examined each of them to select promising candidates for further discriminant analysis of clinical importance. Finally, we determined receiver operating characteristic (ROC) curves to assess accuracy. Our FTIR-based models are directly related to clinical histopathology procedures.
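The staging and grading groups defined above translate directly into the binary labels used for the comparisons modelled later (N vs. BC, Tni vs. Tinv, LG vs. HG). The sketch below is a minimal illustration under stated assumptions: the helper names, the dictionary keys of a "case", and the choice of coding the more advanced class as 1 are all hypothetical, not taken from the study.

```python
# Hypothetical helpers: turn pathology annotations into 0/1 labels for PLS-DA.
NON_INVASIVE = {"Tis", "Ta"}          # grouped as Tni in the text
INVASIVE = {"T1", "T2", "T3", "T4"}   # grouped as Tinv in the text


def invasiveness_group(stage):
    """Return 'Tni' or 'Tinv' for a T stage, following the definitions in the text."""
    if stage in NON_INVASIVE:
        return "Tni"
    if stage in INVASIVE:
        return "Tinv"
    raise ValueError(f"unrecognised T stage: {stage!r}")


def binary_label(case, comparison):
    """Assumed coding: 1 marks BC, invasive BC, or high grade BC; 0 the other class."""
    if comparison == "N_vs_BC":
        return int(case["diagnosis"] == "BC")
    if comparison == "Tni_vs_Tinv":
        return int(invasiveness_group(case["stage"]) == "Tinv")
    if comparison == "LG_vs_HG":
        return int(case["grade"] == "HG")
    raise ValueError(f"unknown comparison: {comparison!r}")


example_case = {"diagnosis": "BC", "stage": "T1", "grade": "HG"}  # illustrative only
print(binary_label(example_case, "Tni_vs_Tinv"))
```

Keeping the mapping in one place makes it easy to see that each binary model only ever uses the cases belonging to its two groups.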

Experimental
The study was approved by the First Local Ethics Committee of the Jagiellonian University Medical College in Krakow (UJ CM; no. 1072.6120.100.2018). Patient consent was not necessary because only residual material remaining after standard diagnostics was used. Permission to use human tissues for research was obtained under the supervision of clinically qualified staff in accordance with Polish law.

Sample collection and clinical classification
Clinical characteristics of the fifty patients are listed in Table 1. Bladder excisions were collected during cystoscopy according to a standard operating procedure in the Department of Urology of UJ CM after a standard clinical diagnosis of suspected bladder cancer. The inclusion criterion was adult age, and the exclusion criteria were pregnancy, a history of radiotherapy in the pelvic region, and bladder cancer other than the urothelial type. Biopsy samples were fixed in 4% buffered formalin and embedded in paraffin in the Department of Pathomorphology, UJ CM. Adjacent cross-sections were placed on CaF2 and glass slides for FTIR imaging and staining, respectively. Sections were cut at thicknesses of 7 and 3.5 μm, respectively. Expression of glucose transporter 1 protein (Glut-1) was determined according to a standard IHC protocol employed at UJ CM (1 : 200 rabbit polyclonal antibody; Bioassay Technology Laboratory, Shanghai, China). Photographic documentation was collected using an Olympus BX53 light microscope equipped with an Olympus DP27 digital camera. A histopathological assessment was performed according to the guidelines of the WHO 2017 and the International Society of Urological Pathology. 6
The imaged area was adjusted to the size of the tissue cross-section (approx. 4900 μm × 3500 μm), and one IR image was acquired from this area (Fig. 1). We also applied high spatial sampling with 5× enhancing optics, giving a projected FPA pixel size of 1.1 μm × 1.1 μm (high definition (HD) FTIR imaging). All transmission FTIR spectra were recorded by co-adding 32 (SD) and 128 (HD) scans in the 3700–900 cm⁻¹ range with a spectral resolution of 4 cm⁻¹.
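To give a sense of the data volume these settings imply, the sketch below counts the pixels needed to cover a ~4900 μm × 3500 μm section. Two inputs are assumptions not stated in this section: the SD pixel size is taken as 5.5 μm (5 × 1.1 μm, from the 5-fold resolution difference mentioned later in the text), and one spectral point per 4 cm⁻¹ is assumed, whereas real instruments often store spectra on a finer grid. The helper names are hypothetical.

```python
import math


def image_grid(width_um, height_um, pixel_um):
    """Pixels (nx, ny, total) needed to cover a rectangle with square pixels."""
    nx = math.ceil(width_um / pixel_um)
    ny = math.ceil(height_um / pixel_um)
    return nx, ny, nx * ny


def n_spectral_points(wn_max, wn_min, spacing):
    """Number of spectral points between wn_max and wn_min at a given spacing."""
    return int(round((wn_max - wn_min) / spacing)) + 1


hd = image_grid(4900, 3500, 1.1)   # HD pixel size from the text
sd = image_grid(4900, 3500, 5.5)   # SD pixel size: ASSUMED 5 x 1.1 um
n_pts = n_spectral_points(3700, 900, 4.0)  # ASSUMED one point per 4 cm^-1
values_hd = hd[2] * n_pts          # absorbance values in one HD image of the whole section

print(f"HD: {hd[2]:,} spectra, SD: {sd[2]:,} spectra, ratio {hd[2] / sd[2]:.1f}")
```

Under these assumptions, an HD image of the whole section would hold about 14 million spectra against roughly 0.57 million for SD, which is why whole-excision imaging is practical only in SD mode.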

Data acquisition and pre-treatment
Pre-processing and chemometric analysis of the collected FTIR images were performed using CytoSpec (ver. 2.00.01), 33 MATLAB (R2021a, MathWorks, Natick, MA, USA), Unscrambler X (v. 10.5, Camo, Montclair, NJ, USA), and Origin 9.1 (ver. 2020b, OriginLab Corporation, Northampton, MA, USA) software. Before image segmentation (CytoSpec, MATLAB), the following steps were performed: removal of water vapour lines, a quality test (based on the signal-to-noise ratio, with the signal and noise regions at 1700–1600 and 1800–1900 cm⁻¹, respectively), PCA-based denoising (10 PCs), calculation of second derivative spectra (Savitzky–Golay; 13 points), and vector normalization (spectral region: 1780–1000 cm⁻¹). Unsupervised hierarchical cluster analysis (UHCA) was executed in the 1780–1000 cm⁻¹ region using the second derivative FTIR spectra. Spectral distances were calculated as D-values, and individual clusters were extracted using Ward's algorithm. The number of UHCA classes was adjusted according to HE tissue morphology. In this way, IR images were clustered into classes of tissue structures (UHCA maps), and their mean FTIR spectra were used for model building and validation. In some cases, more than one class had to be assigned to epithelial and subepithelial tissues due to the complexity of tissue organisation.
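The pre-processing chain and the UHCA step can be approximated with NumPy and SciPy, as in the sketch below. Water vapour correction is omitted. Several details are assumptions rather than the CytoSpec implementation: the exact SNR definition and its threshold, a Savitzky–Golay polynomial order of 3, and D-values taken as 1000 × (1 − Pearson r) between spectra. Pairwise clustering of every pixel is only feasible for small images, so the demonstration uses 40 synthetic "pixels" from two hypothetical tissue types.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.signal import savgol_filter
from scipy.spatial.distance import pdist


def in_region(wn, lo, hi):
    return (wn >= lo) & (wn <= hi)


def passes_quality(X, wn, min_snr=10.0):
    """Assumed SNR test: amide I peak height vs. noise std in 1800-1900 cm^-1."""
    signal = X[:, in_region(wn, 1600, 1700)].max(axis=1)
    noise = X[:, in_region(wn, 1800, 1900)].std(axis=1)
    return signal >= min_snr * noise


def pca_denoise(X, n_pcs=10):
    """Reconstruct the spectra from the first n_pcs principal components."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(n_pcs, s.size)
    return mean + (U[:, :k] * s[:k]) @ Vt[:k]


def second_derivative_vn(X, wn, window=13, polyorder=3, lo=1000, hi=1780):
    """Savitzky-Golay 2nd derivative, cropped to lo-hi, then vector-normalised."""
    d2 = savgol_filter(X, window_length=window, polyorder=polyorder, deriv=2, axis=1)
    keep = in_region(wn, lo, hi)
    d2 = d2[:, keep]
    return d2 / np.linalg.norm(d2, axis=1, keepdims=True), wn[keep]


def uhca(D2, n_classes):
    """Ward clustering on D-values (assumed here as 1000 * (1 - Pearson r))."""
    d_values = 1000.0 * pdist(D2, metric="correlation")
    return fcluster(linkage(d_values, method="ward"), t=n_classes, criterion="maxclust")


# Synthetic, illustrative spectra: two hypothetical tissue types, 20 pixels each
rng = np.random.default_rng(0)
wn = np.arange(900.0, 1902.0, 2.0)
amide = np.exp(-((wn - 1650) / 20) ** 2)
X = np.vstack([
    amide + 0.3 * np.exp(-((wn - 1080) / 15) ** 2) + rng.normal(0, 0.005, (20, wn.size)),
    amide + 0.3 * np.exp(-((wn - 1240) / 15) ** 2) + rng.normal(0, 0.005, (20, wn.size)),
])
X = X[passes_quality(X, wn)]
D2, wn_fp = second_derivative_vn(pca_denoise(X), wn)
labels = uhca(D2, n_classes=2)
print(labels)
```

In the study, the number of classes was chosen to match HE morphology rather than being fixed in advance as it is here.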

Model building and testing
Standard normal variate (SNV) correction and mean centring were performed before modelling in the 950–1800 cm⁻¹ region. Repeated random sampling double cross-validation (DCV) was used to (1) optimize the model and (2) provide a realistic estimate of classification errors when the model is applied to external patients. In the outer loop, the spectra were randomly split into calibration (66%) and test (34%) sets, keeping all the spectra of a patient in either the calibration or the test set. Then, PLS-DA models were created and optimized with the calibration dataset, using leave-one-patient-out cross-validation to select the optimal number of latent variables (from a maximum of 10). The selected model was then used to predict the corresponding test set, and the classification figures of merit, i.e. AUROC (area under the receiver operating characteristic curve), specificity, sensitivity, and accuracy, were calculated. This process was repeated 100 times with different calibration and test sets to obtain a distribution of the classification parameters, which was used to study the generalization capability of the model. To study the regression vector, a model was constructed with the entire dataset and validated by leave-one-patient-out cross-validation. The model was validated using a permutation test, as in ref. 34, to ensure the significance of the spectral markers obtained. The classifications considered were BC versus normal, invasive BC versus non-invasive BC, and HG BC versus LG BC. A schematic of our approach is presented in Fig. 1.

Morphological and IHC assessment
The inner layer of the urinary bladder wall is lined with a transitional epithelium called the urothelium. The deeper structure, the subepithelial connective tissue, includes the lamina propria and muscularis propria. The outer structure, the adventitia, is composed of fat and fibrous tissue and leads to adjacent organs. Herein, taking into account the morphology of typical excisions, which rarely contain muscularis propria, the segmentation of SD IR images gave classes attributed to the epithelium (blue shades of UHCA classes) and subepithelium (pink shades of UHCA classes); see Fig. 2A and D. The epithelium class contains both normal urothelium and epithelium subjected to the neoplastic process.
The subepithelial class with lamina propria includes fibrous tissue, blood and lymph vessels, muscularis mucosae, and muscularis propria (Fig. 2C and F). Muscularis propria can be difficult to identify in HE images, even for a trained pathologist, since muscle fibres are often irregular in direction and embedded in connective fibrous tissue, which is itself heterogeneous. 35 Clustering all IR spectra into the epithelium and subepithelium was important to answer the question of whether the spectra of the deeper tissue, which does not contain BC, could be useful in spectroscopic BC diagnosis. Pathologists cannot recognise BC based only on changes in subepithelial tissue morphology without BC cells.
The normal epithelium is composed of 5 to 7 cellular layers that often form invaginations called von Brunn nests. The most important morphological features of BC are an increased nuclear-to-cytoplasmic ratio, altered cell stratification, increased epithelial thickness, mitoses, irregular nuclear shape, and hyperchromasia. 6 Papillary structures are features of bladder cancer, hyperplasia, and papillary urothelial neoplasm of low malignant potential (PUNLMP). Representative HE images illustrating the complexity of the histological features are shown in Fig. 2.
Analysing these large hyperspectral datasets yields several valuable observations. One concerns the usefulness of standard (SD) and high definition (HD) IR imaging for mirroring tissue structure. Higher spatial resolution provides better discrimination of tissue structures and their borders than SD imaging (Fig. 2). Irregular contours show the invasion of carcinoma by small nests or cells, whereas smooth contours result from the tangential arrangement of structures. In particular, HD FTIR images reveal irregular patches of different classes in invading BC, in contrast to the smooth epithelium borders in PM and Tni cases. From a histopathological point of view, the smoother the epithelium borders, the lower the expected possibility of BC invasion. In turn, SD FTIR microscopy is more efficient than HD because of its shorter data collection time, which makes imaging of the whole excision possible. Bearing in mind that SD FTIR imaging has a 5-fold lower spatial resolution, we estimate that it measures the same area 200 times faster than HD FTIR imaging. The question arises whether whole-sample imaging is required to obtain valuable information about BC. If early BC foci are to be detected and the heterogeneity of advanced BC needs to be investigated, the answer is that the SD FTIR approach is more robust than HD FTIR imaging (Fig. 2).
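The order of magnitude of this speed-up can be checked with a simple scaling estimate. The estimate assumes that acquisition time for a fixed area is proportional to the number of FPA tiles (equivalently, pixels) multiplied by the number of co-added scans. It also treats the 5-fold resolution difference as a 5-fold larger pixel pitch $p$ for SD imaging, with $N$ the number of pixels covering the area and $n$ the number of co-added scans from the Experimental section:

```latex
\frac{N_\mathrm{HD}}{N_\mathrm{SD}} = \left(\frac{p_\mathrm{SD}}{p_\mathrm{HD}}\right)^{2} = 5^{2} = 25,
\qquad
\frac{t_\mathrm{HD}}{t_\mathrm{SD}} \approx \frac{N_\mathrm{HD}}{N_\mathrm{SD}} \cdot \frac{n_\mathrm{HD}}{n_\mathrm{SD}} = 25 \times \frac{128}{32} = 100
```

This idealised scaling gives a speed-up of order 10². Any remaining difference from the ~200-fold figure stated above presumably comes from instrument-specific overheads, which this simple estimate does not model.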
The IHC detection of glucose transporter 1 (Glut-1) can support the correct recognition of malignant epithelium in the bladder. Membranous positive Glut-1 expression around single cells co-occurs with BC morphological changes and is associated with increased glucose uptake, which supports the proliferation and survival of cancer cells. Its overexpression is also associated with tumour hypoxia and indicates a worse prognosis and chemotherapy resistance. We compared Glut-1 immunostaining patterns in the investigated cases with the carbohydrate bands that are well detected in the 1000–1200 cm⁻¹ region of FTIR spectra (Fig. 3; detailed band assignments in Table S1 †). 36 BC cases show a membranous positive Glut-1 reaction with a patchy/focal pattern (Fig. 3). A cytoplasmic reaction was also observed, but it is not useful for diagnosis.
In Tni, most of the Glut-1-positive area coincides with the regions of highest carbohydrate level in the IR images, in contrast to Tinv cases (Fig. 3.1A–C).
This finding results from the fact that the IR-based level of carbohydrates does not reflect only the distribution of the glucose–protein transporter complex. The IR profile in the 1000–1200 cm⁻¹ region shows variation in signal shape between normal epithelium and BC cases (Fig. 3.2). Furthermore, in some cases the accumulation of carbohydrates is more sensitive to early neoplasia than the expression of Glut-1 (Fig. S1 †). This implies that IR distribution patterns of carbohydrates have potential as a label-free glycomic tool. The correlation between the FTIR-based distribution of carbohydrates and IHC-based Glut-1 expression at different stages of BC remains enigmatic: the high carbohydrate level and membranous Glut-1 expression share a similar location in T0 but not at higher stages, where this is accompanied by changes in the IR band shape (Fig. 3). In addition, we demonstrate that the IR map of carbohydrate distribution localised small foci of early BC, even though immunohistochemistry does not indicate them (Fig. S1 †).
Fig. 3 Comparison of the IR distribution of carbohydrates and IHC staining. (1) (A) IR images for integration in the 1000–1200 cm⁻¹ region, (B and C) Glut-1 IHC images; the darker the colour, the higher the transporter density (magnification 2× and 20×, respectively). (2) FTIR spectra from pixels marked with a white star in (1.A). I–V: normal urothelium (N), potentially malignant urothelium (PM), and BC groups (T0 LG, T1 HG, and T2 HG).
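Carbohydrate maps like those in Fig. 3 are integrated band-area images. The sketch below shows one way to compute such a map from a hyperspectral cube with NumPy. The function name is hypothetical, and the optional linear baseline between the band limits is an assumption, because the text does not state how the integration baseline was handled. The trapezoidal rule is written out explicitly, so it does not depend on any particular NumPy version.

```python
import numpy as np


def band_area_map(cube, wn, lo=1000.0, hi=1200.0, baseline=False):
    """Integrated absorbance between lo and hi (cm^-1) for every pixel.

    cube: array of shape (ny, nx, n_wn); wn: wavenumbers of the last axis,
    in ascending or descending order. If baseline is True, a straight line
    joining the absorbances at the two band limits is subtracted first.
    """
    mask = (wn >= lo) & (wn <= hi)
    order = np.argsort(wn[mask])
    w = wn[mask][order]
    a = cube[..., mask][..., order]
    if baseline:
        frac = (w - w[0]) / (w[-1] - w[0])
        a = a - (a[..., :1] + frac * (a[..., -1:] - a[..., :1]))
    # Trapezoidal rule along the spectral axis
    return np.sum(0.5 * (a[..., 1:] + a[..., :-1]) * np.diff(w), axis=-1)


# Illustrative 2 x 3 pixel cube: flat absorbance, plus one hypothetical
# "high-carbohydrate" pixel carrying a band at 1080 cm^-1
wn = np.arange(900.0, 1802.0, 2.0)
cube = np.full((2, 3, wn.size), 0.1)
cube[0, 0] += 0.5 * np.exp(-((wn - 1080.0) / 20.0) ** 2)
carb_map = band_area_map(cube, wn, baseline=True)
print(np.round(carb_map, 2))
```

Whether to subtract a baseline changes the absolute values noticeably. With the baseline, pixels without the band map to zero; without it, their flat background contributes its full area.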
IR imaging-based segmentation of urothelial tissues and classification model
SD FTIR imaging enables scanning of large areas, and our primary goal was to extract unique spectral profiles of the epithelium and subepithelium annotated in HE images (blue and pink classes in UHCA maps, Fig. 2A). Next, high definition (HD) FTIR imaging was applied to selected regions of the bladder wall to examine whether additional classes with a unique spectral signature could be discriminated (Fig. 2D).
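After UHCA, the clusters are assigned to tissues by comparison with HE images, and one tissue may collect several classes, as noted in the Experimental section. Each tissue's mean spectrum then becomes one input to the models. The sketch below assumes that this assignment is stored as a plain dictionary from cluster number to tissue name. The helper name, the dictionary, and the toy values are hypothetical.

```python
import numpy as np


def tissue_mean_spectra(spectra, cluster_labels, class_to_tissue):
    """Average the pixel spectra of all UHCA classes assigned to each tissue.

    spectra: (n_pixels, n_wn) array; cluster_labels: (n_pixels,) UHCA class ids;
    class_to_tissue: {class id: tissue name}; unassigned classes are ignored.
    """
    means = {}
    for tissue in sorted(set(class_to_tissue.values())):
        classes = [c for c, t in class_to_tissue.items() if t == tissue]
        mask = np.isin(cluster_labels, classes)
        if mask.any():
            means[tissue] = spectra[mask].mean(axis=0)
    return means


# Toy example: 6 pixels, 3 spectral points; classes 1 and 2 are both epithelium,
# class 3 is subepithelium, class 4 (e.g. background) is left unassigned
spectra = np.arange(18, dtype=float).reshape(6, 3)
cluster_labels = np.array([1, 1, 2, 3, 3, 4])
mapping = {1: "epithelium", 2: "epithelium", 3: "subepithelium"}
means = tissue_mean_spectra(spectra, cluster_labels, mapping)
print(means)
```

Because the mapping is explicit, classes that do not correspond to either tissue, such as background, are simply left out of the averages.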
Using spectra extracted from the UHCA analysis, a supervised machine-learning method, PLS-DA, was employed for the classification of bladder carcinoma; PLS-DA is often used in spectral discrimination. 37 Here, the generalization power of the models was established by repeated random sampling, in which 100 splits into calibration and independent test sets were considered. In all cases, the splits were performed at the patient level, i.e. FTIR spectra of the epithelial and subepithelial tissues from the same patient were always included in either the calibration or the test set. The parameters of the PLS-DA models for various pairs of experimental groups show a high power to differentiate BC patients from patients with normal urothelium (N) (see Table 2 and Fig. S2 †).
Here, FTIR spectra of both layers of the bladder wall can be used to recognise any BC case with an accuracy above 85%. Attempts to classify the PM cases did not produce any usable model. Next, the cancer cases were classified to recognise the invasiveness of urothelial cancer into deeper tissue layers (Tni versus Tinv) and its grade (LG BC versus HG BC), cf. Table 2. The AUROC, accuracy, sensitivity, and specificity for Tni and Tinv indicated good performance of the test when IR spectra of the subepithelial tissue were considered. This also indicated significant spectral alterations in this bladder tissue due to the heterogeneous invasion of cancer cells. Attempts to discriminate between the T1 and T2 spectra failed. The low and high grade groups, which included both non-invasive and invasive urothelial carcinomas, were well classified according to the IR signature of the epithelium. Moreover, the comprehensive investigation showed that dividing the N vs. BC comparison into more specific models, for example, N vs. Tni and N vs. Tinv or N vs. LG BC and N vs. HG BC, is beneficial, as these models show higher performance (Table 2). Averaged second derivative FTIR spectra are consistent with the PLS-DA regression vectors calculated in the developed models (Fig. 4). FTIR spectra of the detailed groups are collected in Fig. S3 and S4.†
The IR spectra displayed in Fig. 4, S3 and S4 † show that high intensities of the protein bands (1652 and 1544 cm⁻¹) and nucleic acid bands (995 and 967 cm⁻¹), together with low-intensity collagen features (1393, 1335, and 1200 cm⁻¹), indicate BC. For instance, unordered protein structures (1637 cm⁻¹) become more pronounced in the HG BC tissue matrix than α-helical conformations and the collagen level. In turn, the glycogen spectral signature (1154, 1081, and 1026 cm⁻¹) does not show a clear linear relationship with the malignancy of BC, and its highest level is observed in early LG BC. Compared with non-invasive BC, the invasive stages are characterised by increased absorbances at 1637, 1393, and 1233 cm⁻¹ and decreased absorbances at 1544 and 1200 cm⁻¹. This could be associated with the epithelial–mesenchymal transition. Moreover, carcinogenesis caused band shifts indicating changes in the structures of biocomponents, e.g. 1543 vs. 1547 cm⁻¹ (proteins), 1028 vs. 1025 cm⁻¹ (carbohydrates), and 1120 vs. 1117 cm⁻¹ (nucleic acids) in N and BC, respectively. The most dominant effect on cancer metabolism is the Warburg effect. 38 Cancer cells need more glucose, and this is accomplished by up-regulation of Glut-1, which is in turn induced in the early stages of many cancers. 39,40 The potential IR markers of BC invasiveness and grading were summarised on the basis of the most dominant vectors in the plots in Fig. 4B (the vectors are detailed in Table S2 †). The epithelium and subepithelium IR features in the N vs. BC model show the main differences in the 1700–1400 cm⁻¹ range (the protein region), whereas the Tni and Tinv groups exhibit several peaks across the entire fingerprint region. This comparison of vector positions shows that the IR profile of the studied cases is unique for each patient group.
The PLS-DA models reveal similar accuracy when epithelial and subepithelial spectra are used for the general BC classification (Table 2 and Fig. S2 †). Whereas the subepithelial IR spectra give better classification parameters for the recognition of BC infiltration, the epithelial profiles are suitable for grade definition. The performance (AUROC) of the epithelium- and subepithelium-based classification for the N vs. Tni model is similar to that of the N vs. BC model (ca. 0.86), whereas for N vs. Tinv it is lower (0.70–0.80). In the case of grading, the AUROC of the epithelial model for the N vs. LG BC and N vs. HG BC groups is higher than for N vs. BC (0.91, 0.92, and 0.86, respectively). Interestingly, we achieved similar classification accuracy for LG BC and HG BC with IR imaging of cytological samples. 20 This indicates the high power of this molecular tool for the prediction of BC grade. The weaker assignment of BC stage may be explained by the fact that invasion is often accompanied by inflammatory cells, which could affect the IR spectra of the tissue. In this work, the T1 group could not be distinguished from T2 and higher stages because of the large inter-patient variability of spectra in the advanced BC epithelium and the pronounced spectral changes in the subepithelial class for deep infiltration. In turn, the potentially malignant (PM) cases studied here, assigned to the N or BC groups, indicate the need for a long-term prospective study, with patient follow-up and determination of molecular and metabolomic features, to confirm their correct classification. This is not surprising, since very early neoplastic changes can stop, progress, or regress.
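The per-split figures of merit discussed above can be computed from the continuous PLS-DA outputs of one test set and then summarised over the repeated splits. The sketch below computes AUROC as the Mann–Whitney probability that a positive case outscores a negative one, with ties counted as one half. The 0.5 decision threshold and the median/2.5–97.5 percentile summary are assumptions; the text does not specify how the distributions were summarised. The helper names are hypothetical.

```python
import numpy as np


def figures_of_merit(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy and AUROC for one test set.

    y_true: 1 = positive class (e.g. BC), 0 = negative class (e.g. N).
    scores: continuous PLS-DA predictions; threshold is an assumed cut-off.
    """
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pred = s >= threshold
    tp = np.sum(pred & y)
    tn = np.sum(~pred & ~y)
    fp = np.sum(pred & ~y)
    fn = np.sum(~pred & y)
    # AUROC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5
    diff = s[y][:, None] - s[~y][None, :]
    auroc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / y.size,
        "auroc": auroc,
    }


def summarise(runs, key):
    """Median and 2.5-97.5 percentile range of one metric over repeated splits."""
    values = np.array([r[key] for r in runs])
    return np.median(values), np.percentile(values, [2.5, 97.5])


fom = figures_of_merit([0, 0, 0, 1, 1, 1], [0.1, 0.4, 0.6, 0.35, 0.8, 0.9])
print(fom)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUROC uses only the ranking of the scores. This is one reason the two kinds of values reported above need not move together.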

Conclusions
FTIR imaging combined with clustering of the epithelium and subepithelium provides a powerful tool for the identification of BC and its invasion. This approach has detection and prognostic capability. Our work demonstrates that BC and its invasion can be recognised from IR spectra of the superficial and deep layers of the bladder wall, with specific features for grades and stages. The determined classification parameters are satisfactory and comparable to clinical standards. An added value of our findings is that the spectroscopic analysis can be limited to the surface layers of the bladder wall, reducing the time of data collection and analysis. Therefore, our work gives "the green light" to establishing automated spectral histopathology of bladder excisions supported by machine learning approaches.

Fig. 1
Fig. 1 The data collection and analysis scheme.

Fig. 2
Fig. 2 IR imaging results for normal (N), potentially malignant (PM), and BC groups (Tni LG, T1 HG, and ≥T2 HG). (A and D) UHCA segmentation, (B and E) distribution of carbohydrates, (C and F) HE microphotographs (magnification 2× and 20×, respectively). Colour code in (A and D): blue, epithelial tissue (light and dark blue correspond to high and low carbohydrate content, respectively); pink, subepithelial fibrous tissue; red, subepithelial muscular tissue. The green rectangle in (C) marks the area of the HD FTIR image.

Table 1
Clinical characteristics of 50 patients with pathological classification

Table 2
Results of the Monte Carlo double cross-validation cancers.