Sebastian Berisha a, Shengyuan Chang b, Sam Saki c, Davar Daeinejad a, Ziqi He d, Rupali Mankar a and David Mayerich *a
aDepartment of Electrical and Computer Engineering, University of Houston, Houston, TX, USA. E-mail: mayerich@uh.edu
bSchool of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, Hubei, China
cDepartment of Electronics, Information and Bioengineering (DEIB), Polytechnic University of Milan, Milan, Italy
dMing Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
First published on 23rd November 2016
There has recently been significant interest within the vibrational spectroscopy community in applying quantitative spectroscopic imaging techniques to histology and clinical diagnosis. However, many of the proposed methods require collecting spectroscopic images that have a similar region size and resolution to the corresponding histological images. Since spectroscopic images contain significantly more spectral samples than traditional histology, the resulting data sets can approach hundreds of gigabytes to terabytes in size. This makes them difficult to store and process, and the tools available to researchers for handling large spectroscopic data sets are limited. Fundamental mathematical tools, such as MATLAB, Octave, and SciPy, are extremely powerful but require that the data be stored in fast memory. This memory requirement becomes impractical for even modestly sized histological images, which can be hundreds of gigabytes in size. In this paper, we propose an open-source toolkit designed to perform out-of-core processing of hyperspectral images. By taking advantage of graphical processing unit (GPU) computing combined with adaptive data streaming, our software alleviates common workstation memory limitations while achieving better performance than existing applications.
In addition to improvements in optics and detector systems, IR spectroscopy may benefit extensively from the availability of quantum cascade laser (QCL) sources.8 At the time of writing this manuscript, discrete frequency infrared (DFIR) systems using QCL sources are commercially available. These systems provide fast absorbance imaging at individual wavelengths and can significantly reduce imaging time when the desired spectral components are known beforehand.9 With further advances in IR instrumentation on the horizon, spectroscopic imaging systems are approaching the data throughput and robustness necessary for practical clinical applications.
However, a major bottleneck remains in the area of image processing and analysis. In particular, spectroscopic data sets contain hundreds of times more spectral content than traditional color histology images. This can result in data sets exceeding a terabyte of storage for a high-resolution (e.g. 1.1 μm per pixel) image of a tissue microarray (≈2 cm²). Data sets of this size are difficult to manage using the consumer hardware generally available in laboratory workstations. Alternatives include the use of high-performance computing (HPC) on clusters or supercomputers. However, these options generally require specialized software development, and the data sets must be transferred over Ethernet connections with limited bandwidth.
The problem of data maintenance is amplified in research applications where data often undergoes a range of processing steps, with experiments being performed at each output stage. These processing algorithms can range from basic piecewise linear baseline correction to more complex scattering inversions based on Mie theory.10,11 However, the tools for managing large data sets on standard workstations are severely limited. While standard mathematical packages, such as MATLAB (Mathworks), Octave, and SciPy, are robust and flexible, they require that data fit in a fraction of the available memory for processing. This is generally because the underlying functions rely on highly optimized algorithms that are memory intensive. Commercial applications such as ENVI (Harris Geospatial) provide the ability to manage large data sets, but these packages are expensive and difficult to extend without expertise in IDL software development.
In this paper, we describe the implementation of an open-source software framework for hyperspectral image processing, with a focus on biomedical image analysis of large data sets. All algorithms are implemented “out-of-core” by streaming from secondary storage (e.g. hard drives, RAID, NAS). In this way, computation can be hidden within data fetches, ideally allowing a data set to be processed in the same amount of time required to copy it. In cases where processing time significantly exceeds streaming time, GPU-based algorithms are used to accelerate the computation and thereby minimize overall processing time. GPU processing is natively supported and transparent to the user (if appropriate hardware is available). Our algorithms and all required libraries are distributed under the BSD license, and can therefore be included in other open-source applications and even commercial software. Data structures are designed for extensibility and are easily included in other software packages.
In general, the community has established a set of common processing protocols that have been shown to be effective for biological samples.12 In this paper, we will discuss our results in implementing a broad base of algorithms that generally fall into the following categories:
• Noise reduction – FTIR images often undergo a limited amount of noise reduction, broadly divided between per-pixel methods, such as apodization and Savitzky–Golay13 filters, and image-wide algorithms, most commonly the maximum noise fraction (MNF) transform.14,15
• Baseline correction/normalization – IR images are prone to effects caused by light scattering and density changes within heterogeneous samples. The proposed corrections include piecewise linear baseline correction or derivative calculations. These are generally followed by normalization, either to the vector (Euclidean) norm or to a single common band (e.g. Amide I at ≈1650 cm−1); a minimal sketch of both steps is given after this list.
• Dimension reduction – IR absorbance spectra generally contain information that is considered sparse in some subdomains. Identifying the areas of sparsity can significantly simplify downstream processing by reducing the number of image bands. Commonly used techniques include principal component analysis (PCA), end-member estimation such as vertex component analysis (VCA),16 or various chemometric techniques.17,18
• Classification – The goal of most histological studies is to characterize the distribution of various tissue types within a tissue section. The final step in the spectroscopic imaging process is therefore most commonly pixel classification based on spectral features. This can take the form of unsupervised techniques, such as k-means clustering or hierarchical cluster analysis (HCA). Alternatively, supervised techniques can be used in combination with tissue annotations19–21 or even tissue overlays to duplicate classic stains.22
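As a concrete illustration of the baseline correction and normalization steps above, the following is a minimal NumPy sketch (not SIproc's implementation); the wavenumber axis, anchor points, and placeholder spectrum are hypothetical:

```python
import numpy as np

def baseline_correct(wavenumbers, spectrum, anchors):
    """Subtract a piecewise linear baseline defined by anchor wavenumbers."""
    # Baseline values are the spectrum values at the anchor points,
    # linearly interpolated across the full wavenumber axis.
    anchor_values = np.interp(anchors, wavenumbers, spectrum)
    baseline = np.interp(wavenumbers, anchors, anchor_values)
    return spectrum - baseline

def normalize(wavenumbers, spectrum, band=None):
    """Normalize to a single band (e.g. Amide I) or to the vector (L2) norm."""
    if band is not None:
        ref = spectrum[np.argmin(np.abs(wavenumbers - band))]
    else:
        ref = np.linalg.norm(spectrum)
    return spectrum / ref

# Example: correct and normalize one absorbance spectrum to Amide I (1650 cm-1)
wn = np.linspace(900, 3800, 726)            # hypothetical wavenumber axis
spec = np.random.rand(wn.size)              # placeholder spectrum
corrected = baseline_correct(wn, spec, anchors=np.array([900, 1800, 2800, 3800]))
amide_normalized = normalize(wn, corrected, band=1650)
```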
The primary strategy behind the implementation of our framework is twofold: (a) hide the computational cost of algorithms within data fetches from secondary storage and (b) minimize the time spent on these data fetches. When the computational cost of an algorithm cannot be hidden behind data fetches, we rely on graphics hardware to further reduce it.
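A minimal sketch of the first strategy, assuming a BIP-interleaved float32 cube on disk: a background thread prefetches the next batch while the current batch is processed, so computation overlaps IO. The file layout and helper names are illustrative, not SIproc's internals.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def stream_process(path, n_pixels, n_bands, batch_pixels, process):
    """Process a BIP-interleaved float32 cube in batches, overlapping IO with compute."""
    data = np.memmap(path, dtype=np.float32, mode='r', shape=(n_pixels, n_bands))

    def fetch(i):
        # Copying forces the read from disk, so it happens in the background thread.
        return np.array(data[i:i + batch_pixels])

    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(fetch, 0)
        for start in range(0, n_pixels, batch_pixels):
            batch = pending.result()              # wait for the prefetched batch
            nxt = start + batch_pixels
            if nxt < n_pixels:
                pending = io.submit(fetch, nxt)   # start fetching the next batch
            process(batch)                        # compute overlaps the next fetch
```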
In SIproc, we implement an adaptive scheme that uses asynchronous streaming while automatically selecting streaming and processing parameters at runtime using gradient descent. Our software samples various parameters, particularly the streaming batch size, and maximizes data throughput over time. For large data sets this technique improves performance by adjusting the batch size to take advantage of disk caching and buffering at the beginning of a run, and by dynamically changing it to compensate for slower throughput as processing continues.
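A simplified sketch of such a tuner: it perturbs the batch size after each batch and keeps the direction that improves measured throughput. The step sizes and bounds are illustrative rather than SIproc's actual parameters.

```python
class BatchSizeTuner:
    """Adapt the streaming batch size toward maximum throughput (MB/s).

    A simple hill-climbing scheme that approximates the gradient of
    throughput with respect to batch size: keep moving in the current
    direction while throughput improves, reverse when it drops.
    """
    def __init__(self, batch_mb=64, step_mb=16, min_mb=8, max_mb=1024):
        self.batch_mb, self.step_mb = batch_mb, step_mb
        self.min_mb, self.max_mb = min_mb, max_mb
        self.last_throughput = 0.0
        self.direction = 1

    def update(self, bytes_processed, seconds):
        throughput = bytes_processed / (1024 ** 2) / seconds
        if throughput < self.last_throughput:
            self.direction *= -1       # last adjustment hurt: reverse direction
        self.batch_mb = min(self.max_mb,
                            max(self.min_mb,
                                self.batch_mb + self.direction * self.step_mb))
        self.last_throughput = throughput
        return self.batch_mb

# Usage: after each batch, call tuner.update(bytes_read, elapsed_seconds)
# and size the next read with the returned batch size.
```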
We have developed specialized versions of each algorithm for the three most common interleave formats: BSQ, BIL, and BIP (Fig. 1). In cases where an interleave format makes an algorithm prohibitively time consuming, conversion is performed automatically.
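The three interleaves describe the same data cube in different axis orders, as the following NumPy sketch illustrates (the dimensions are hypothetical):

```python
import numpy as np

# Hypothetical cube dimensions: Y lines, X samples per line, B spectral bands
Y, X, B = 4, 5, 3
cube = np.arange(Y * X * B, dtype=np.float32).reshape(Y, X, B)   # (line, sample, band)

# The three ENVI interleaves are simply different axis orders of the same values:
bip = cube                        # band-interleaved-by-pixel: (Y, X, B)
bil = cube.transpose(0, 2, 1)     # band-interleaved-by-line:  (Y, B, X)
bsq = cube.transpose(2, 0, 1)     # band-sequential:           (B, Y, X)

# On disk, each format is this array flattened in C order, so a BSQ -> BIP
# conversion is conceptually a transpose of the full cube -- trivial in memory,
# but an out-of-core conversion must gather samples scattered across the file.
assert np.array_equal(bsq.transpose(1, 2, 0), bip)
```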
For example, we have implemented PCA using the covariance matrix method, which is less stable than the alternatives but does not require random access across the entire data set. This algorithm requires three steps:
• Calculation of the mean spectrum, which can be done efficiently with any interleave format.
• Calculation of the mean covariance matrix, which requires an O(b²) tensor product calculation for each spectrum, where b is the number of bands.
• Eigendecomposition of the mean covariance matrix.
Since the eigendecomposition is independent of the number of pixels, this operation is relatively efficient with standard libraries like LAPACK. The most intensive algorithm in this example is computing the mean covariance matrix. When the data are optimally interleaved (BIP), the O(b²) calculation can be done in main memory. On most systems, this becomes the bottleneck. Fortunately, the tensor product is highly data parallel and amenable to GPU implementation. We use the cuBLAS matrix library to perform a fast rank-2 update, significantly improving performance (Fig. 4). Even after this optimization, PCA calculation is still compute limited and could be further parallelized on a multi-GPU system. The same principles apply to other common statistical methods, such as the MNF transform.14
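The following NumPy sketch outlines the same covariance-matrix approach for a BIP-interleaved float32 image, holding only one batch of spectra in memory at a time; SIproc's GPU implementation differs in detail (e.g. cuBLAS updates), and the batch size here is illustrative:

```python
import numpy as np

def streamed_pca(path, n_pixels, n_bands, batch_pixels=100_000):
    """PCA via the covariance-matrix method on a BIP-interleaved float32 cube.

    Only one batch of spectra is held in memory at a time; the b x b
    covariance accumulator is small regardless of image size.
    """
    data = np.memmap(path, dtype=np.float32, mode='r', shape=(n_pixels, n_bands))

    # Pass 1: mean spectrum (efficient for any interleave)
    mean = np.zeros(n_bands, dtype=np.float64)
    for i in range(0, n_pixels, batch_pixels):
        mean += data[i:i + batch_pixels].sum(axis=0, dtype=np.float64)
    mean /= n_pixels

    # Pass 2: accumulate the covariance matrix one batch at a time
    cov = np.zeros((n_bands, n_bands), dtype=np.float64)
    for i in range(0, n_pixels, batch_pixels):
        centered = data[i:i + batch_pixels].astype(np.float64) - mean
        cov += centered.T @ centered
    cov /= n_pixels - 1

    # Eigendecomposition is independent of the number of pixels
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]
```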
Fig. 4 Performance improvements using asynchronous adaptive streaming with GPU computing. Data processing throughput (in MB s−1) is shown as a function of data size (in GB). Dotted lines with circle markers indicate the performance of MNF (forward and inverse algorithms), while solid lines with triangle markers show the performance of the PCA (statistics computation and projection) implementations. Note that, due to memory limitations, performance results for MATLAB are shown only for a data size of 11.5 GB. The LAPACK matrix libraries are used by all tested applications, including MATLAB,28 IDL,29 and SIproc. However, MATLAB uses more stable and efficient methods that are impractical to apply to data sets processed out-of-core. SIproc attempts to mitigate these instabilities by using 64-bit floating point operations, which reduces GPU performance by ≈50%.
Other algorithms, such as the BSQ → BIP interleave conversion, are less clear-cut. On almost every system we tested, interleave conversions were IO limited. However, our SSD RAID0 system (see Results and discussion) provided sufficient IO throughput to see a significant performance gain (≈2×) using CUDA (Fig. 3).
In addition to standard pre-processing and statistical methods, we have also included supervised and unsupervised learning methods such as k-means clustering, random forests,25–27 and artificial neural networks designed to implement stain-less staining.22
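As an illustration of the supervised case, the following scikit-learn sketch trains a random forest on annotated pixel spectra and classifies an image; the data, band count, and labels are placeholders rather than results produced with SIproc:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: annotated pixels as (spectrum, tissue label) pairs.
# In practice the spectra would come from baseline-corrected, normalized images
# and the labels from annotations made on adjacent stained sections.
train_spectra = np.random.rand(5000, 16)       # e.g. 16 bands after dimension reduction
train_labels = np.random.randint(0, 3, 5000)   # e.g. epithelium / stroma / background

clf = RandomForestClassifier(n_estimators=100)
clf.fit(train_spectra, train_labels)

# Classify every pixel of a reduced image (Y x X x 16) into a tissue map
image = np.random.rand(128, 128, 16)
tissue_map = clf.predict(image.reshape(-1, 16)).reshape(128, 128)
```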
The workstation profiles shown in Fig. 3 were generated using a single 64 GB bone biopsy tissue microarray (TMA) image, processed on four systems:
• System A (high-performance workstation) – 2× Intel Xeon E5-2643 processors, 64 GB RAM, 3 TB SSD RAID0, Tesla K40 GPU.
• System B (mid-range workstation) – Intel i7-4790, 32 GB RAM, 1 TB SSD, GeForce GTX 970 GPU.
• System C (personal computer) – Intel i5-4690, 16 GB RAM, 3 TB HDD, GeForce GTX 970 GPU.
• System D (laptop) – Intel i7-4702HQ, 8 GB RAM, 3 TB USB 3.0 external drive, GeForce GTX 970M GPU.
Data sets used for analysis were collected using an imaging spectrometer consisting of a Cary 670 Series FTIR Spectrometer coupled to a Cary 620 Series FTIR Microscope (Agilent Technologies, Santa Clara, CA). The system was equipped with a 128 × 128 pixel focal plane array (FPA) detector and a 0.62 NA objective, providing a pixel size of 1.1 μm in high-magnification mode and 5.5 μm when mapping the FPA to the objective FOV. Spectra were recorded at 4 cm−1 spectral resolution.
We demonstrate results from SIproc on the pre-processing and k-means clustering of spectra as well as MNF noise reduction. Both a kidney and a breast biopsy array (Fig. 5 and 6) were purchased from Amsbio (AMS Biotechnology Europe). The samples were formalin-fixed, paraffin-embedded (FFPE) tissue sections cut at 5 μm thickness and mounted on calcium fluoride (CaF2) slides. Adjacent sections were mounted on standard glass histology slides and stained with hematoxylin and eosin (H&E). The IR slides were imaged on a Cary Series FTIR microscope using a single co-addition, resulting in relatively noisy images. The images were pre-processed to remove noise using the MNF transform, keeping 10 signal bands. Spectra from the kidney images are shown in Fig. 5 both before and after the MNF processing step. The images were then baseline corrected using a piecewise linear correction and normalized to the Amide I band (1650 cm−1). k-Means clustering was performed using k = 3 and k = 6 clusters. The results were color-mapped and overlaid onto the Amide I band of the baseline-corrected IR images so that structural features could be seen. Note that increasing the number of clusters provides significantly more class specificity, allowing the separation of ductal epithelium at k = 6, which is barely visible at k = 3 (Fig. 6b). Intra-lobular stroma, which is confounded with the adjacent epithelium at k = 3, is also more clearly separated at k = 6. Colors were manually specified based on the class most closely corresponding to epithelium (red/orange) and stroma (green/cyan).
For the majority of data processing algorithms on most systems, data processing throughput is limited by IO. In these cases, we see a ≈40% increase by adaptively searching for the optimal batch size on the specific workstation hardware. In cases where throughput is limited by data processing, GPU computation significantly increases performance over the corresponding CPU-based algorithm (Fig. 4). Comparisons with CPU-based code were performed using ENVI 8.2 and MATLAB 2016a.
Note that GPU-based plugins are available for MATLAB and IDL through third-party extensions. We would expect these to perform similarly to SIproc for algorithms that are not IO limited. However, MATLAB is generally optimized for the analysis of data sets that can be stored in main memory. SIproc is optimized for data streaming and long-term efficiency during data processing. While this comes at the expense of flexibility for smaller data sets, we also provide functions to reduce and manipulate the data to make it easier to import into standard packages such as MATLAB.
We have implemented algorithms to manage data from several commercial vendors, including FFT and mosaic construction for Agilent Cary FTIR imaging systems and mosaic assembly for Daylight Solutions SPERO QCL imaging systems. Our FFT is implemented using the cuFFT GPU-accelerated FFT library and performs ≈10× faster than the implementation provided with the instrumentation. Other algorithms implemented in SIproc include: spectral baseline correction, normalization, image classification using random forests,27,30 digital staining,22 MNF noise reduction,14 dimension reduction, masking and thresholding, and standard image manipulation tools such as merging and cropping. SIproc also includes a tool for streaming visualization of hyperspectral images. The source code, testing data, and a detailed tutorial are available online.31
For example, standard processing for a biopsy TMA can be completely scripted. This includes the assembly of raw data, FFT, baseline correction, normalization, PCA, and k-means clustering (Fig. 6).
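A toy in-memory version of such a pipeline is sketched below using NumPy and scikit-learn; the real workflow streams each stage out-of-core, and the dimensions and data here are synthetic placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Toy version of the scripted pipeline on synthetic data; a real TMA would run
# the same stages out-of-core, each stage streaming the previous stage's output
# from disk. Spectra are assumed to already be baseline corrected and
# normalized (see the earlier sketch).
Y, X, B = 64, 64, 400
spectra = np.random.rand(Y * X, B).astype(np.float32)   # placeholder spectra

reduced = PCA(n_components=10).fit_transform(spectra)   # dimension reduction
labels = KMeans(n_clusters=6, n_init=10).fit_predict(reduced)
class_image = labels.reshape(Y, X)                       # cluster map to overlay
```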
The primary feature of SIproc is that the code is open source and available under the BSD license. The code is free to use in open-source and commercial software and all libraries used in our software follow the same license. We have designed the software to be simple to integrate into other applications. All data are stored as raw binary files with spatial and spectral information encoded using the publicly available ENVI header format.32
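Because the format is public, such files can be read with only a few lines of code. The following sketch memory-maps a raw ENVI image from a minimal subset of header fields; it assumes single-value header lines and one common file-naming convention, and is not SIproc's reader:

```python
import numpy as np

def read_envi(header_path):
    """Memory-map a raw ENVI image given a minimal subset of its header fields."""
    fields = {}
    with open(header_path) as f:
        for line in f:
            if "=" in line:
                key, _, value = line.partition("=")
                fields[key.strip().lower()] = value.strip()

    samples = int(fields["samples"])      # pixels per line (X)
    lines = int(fields["lines"])          # number of lines (Y)
    bands = int(fields["bands"])          # spectral bands (B)
    interleave = fields["interleave"].lower()
    # ENVI data type 4 is 32-bit float, 5 is 64-bit float; other codes omitted for brevity.
    dtype = {"4": np.float32, "5": np.float64}[fields["data type"]]

    shape = {"bsq": (bands, lines, samples),
             "bil": (lines, bands, samples),
             "bip": (lines, samples, bands)}[interleave]
    # The image itself is assumed to be the header path without the .hdr
    # extension (one common naming convention).
    return np.memmap(header_path.rsplit(".hdr", 1)[0], dtype=dtype, mode="r", shape=shape)
```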