Laura Barsanti, Lorenzo Birindelli and Paolo Gualtieri*
CNR, Istituto di Biofisica, Via Moruzzi 1, Pisa, 56124, Italy. E-mail: paolo.gualtieri@ibf.cnr.it; Tel: +39 050 315 3026
First published on 31st August 2021
Marine and freshwater microalgae belong to taxonomically and morphologically diverse groups of organisms spanning many phyla with thousands of species. These organisms play an important role as indicators of water ecosystem conditions since they react quickly and predictably to a broad range of environmental stressors, thus providing early signals of dangerous changes. Traditionally, microscopic analysis has been used to identify and enumerate different types of organisms present within a given environment at a given point in time. However, this approach is both time-consuming and labor-intensive, as it relies on manual processing and classification of planktonic organisms present within collected water samples. Furthermore, it requires highly skilled specialists trained to recognize and distinguish one taxon from another on the basis of often subtle morphological differences. Given these restrictions, a considerable amount of effort has recently been funneled into automating different steps of both the sampling and classification processes, making it possible to generate unprecedented volumes of plankton image data and obtain an essential database to analyze the composition of plankton assemblages. In this review we report state-of-the-art methods used for automated plankton classification by means of digital microscopy. The computer-microscope system hardware and the image processing techniques used for recognition and classification of planktonic organisms (segmentation, shape feature extraction, pigment signature determination and neural network grouping) will be described. An introduction and overview of the topic, its current state and indications of future directions the field is expected to take will be provided, organizing the review for both experts and researchers new to the field.
Laura Barsanti, graduated in Natural Sciences from the University of Pisa, is a scientist at the Biophysics Institute of the Italian National Council of Research (CNR) in Pisa (Italy).
Lorenzo Birindelli, graduated from a technical high school, is a technician at the Biophysics Institute of the Italian National Council of Research (CNR) in Pisa (Italy).
Environmental significance
The monitoring of environmental water quality is essential for the appropriate management of water resources, for their governance and for their restoration through prompt action in case of alert events. An effective water management procedure should take into account the microalgae present in water samples, because they reflect the overall water quality, integrating in their metabolism the effects of physical and chemical changes over time. In this tutorial review, we will focus on available digital microscopy systems, i.e. automatic systems based on a microscope interfaced with a personal computer equipped with an image processing unit, which have been developed for the identification and taxonomic classification of microalgae. The goal of automated systems is to combine a level of accuracy and precision higher than that of an expert taxonomist with a minimum analysis time.
The term plankton refers to all organisms that live suspended in the water column and drift with the currents, because they are entrained by the prevailing movement of water. Plankton is the sustaining base of food chains in water bodies; its distribution and abundance play an essential role in the ecological balance of this environment, and can give reliable signals of its changes. Hence, the analysis of planktonic organisms is essential for an early alert of low water quality as prescribed, for example, by the European Water Framework Directive.2
Plankton can be divided into broad trophic groups: phytoplankton, consisting of photo-autotrophic algae; zooplankton, consisting of small heterotrophic protozoans or metazoans such as crustaceans; nutrient re-cycling bacterioplankton and mycoplankton (fungus-like organisms); and virioplankton, i.e. floating viruses. Planktonic organisms can be identified according to the size of their components: megaplankton, organisms of about 10 cm (e.g. jellyfish); macroplankton, organisms of about 1 cm (e.g. krill); mesoplankton, organisms of about 1 mm (e.g. copepods); microplankton, organisms in the size range 5–100 μm (e.g. microalgae and cyanobacteria); nanoplankton, organisms of about 1 μm (e.g. small eukaryotic protists); picoplankton, organisms of about 100 nm (e.g. bacteria); femtoplankton, organisms of about 10 nm (e.g. marine viruses).3
Thanks to their short lifespan (on average seven weeks) and generation time (on average one day), micro-phytoplankton organisms (microalgae from now on) are capable of fast, strong and predictable responses to different ecological and toxicological factors, which modify the composition and density of their population.4
Microalgae are routinely examined by means of a wide-field optical microscope, one of the most commonly used laboratory tools, because it allows shape recognition and reveals internal details of organisms in the size range of these microorganisms (5–100 μm). These features, together with the color conferred by the photosynthetic pigments, are essential for human-based taxonomic recognition and classification.
This first analytical step could be sped up by making it as automated as possible, to improve its reproducibility and effectiveness for water monitoring and protection purposes. Many automated microscope systems for the identification and subsequent statistical analysis of the microalgae population have been implemented so far. These systems are effective in assessing the condition of water bodies, even though they have to deal with hindering factors such as the very different sizes and the morphological and physiological features of the thousands of existing species of microalgae.5
According to the literature, identification of microalgae can achieve an accuracy between 67% and 83% for trained but not routinely engaged personnel, which increases to about 84–95% for routinely engaged personnel.6–8 This variation is due to the lack of unanimity in the classification, even when the inspected microalgae possess a very distinct morphology. The goal of automated systems is to combine a level of accuracy and precision higher than that of an expert taxonomist with a minimum analysis time.
In the following section, after an overview of the topic, we will focus on available digital microscopy systems, i.e. automatic systems based on a microscope interfaced with a personal computer equipped with an image processing unit, which have been developed for the identification and taxonomic classification of microalgae, with or without limitation to relatively narrow taxonomic groups.
Recently Lee et al.13 used an Imaging FlowCytobot to acquire in situ high-frequency microalgae images. This automated, submersible equipment is based on flow cytometry and hydrodynamic focusing, can work underwater for months, and is able to capture up to 30 000 high-resolution images per hour. This set-up provides a desirable improvement of flow cytometry methodologies.
Metagenomic analysis begins with the isolation and selection of the cells from the environmental sample by means of size fractionation, using filters with different porosities, or by flow cytometry. DNA is then extracted for sequencing by shotgun metagenomics (i.e. random sequencing of the whole DNA) or by barcoding gene amplification (i.e. the search for ubiquitous genes such as the 16S ribosomal DNA for prokaryotic algae and the 18S ribosomal DNA for eukaryotic algae). Due to the high target number of these RNA molecules in the cells, it is possible to design probes 18–25 base pairs long with a very high taxonomic specificity.17 This specificity must be tested by comparing the nucleotide sequences against sequence databases to find regions of similarity and by calculating their statistical significance (BLAST).
For shotgun metagenomics or barcoding analysis, DNA fragments (up to 800 nucleotides) can be directly sequenced using next-generation sequencing technologies or cloned in a vector for amplification and subsequent sequencing. Taxonomic assignment of shotgun metagenomic sequences is a challenging task because of the highly fragmented nature of the sequences and the unbalanced set of reference genomes. Bioinformatics analysis is the main bottleneck for metagenomic projects. Annotation is a time-consuming task requiring comprehensive bioinformatics skills and highly trained experts.18,19
Rapid target identification of single toxic algal species such as the dinoflagellates Alexandrium minutum and Gymnodinium catenatum can be performed by means of sandwich hybridization, which is another barcoding analysis technique.20 A capture probe bound to a solid surface immobilizes the target ribosomal RNA and forms a hybrid complex with a second signal probe. When the solid surface is an electrochemical biosensor, the detection event is transformed into a measurable electrical current.20
Genomic analysis possesses high taxonomic resolution and can be applied also to preserved environmental samples.21 High taxonomic resolution is mandatory when toxic and non-toxic strains are morphotypes of the same species, and hence identification is very difficult by optical microscopy (e.g., the Alexandrium tamarense species complex).20 Quantitative real-time PCR-based assay, which simultaneously amplifies and quantifies the DNA, is a more sophisticated technique that increases the accuracy of metagenomic analysis.22
The composition and densities of the microalgae population are difficult to estimate by metagenomic analysis because this methodology quantifies DNA and/or RNA rather than single cells, and nucleic acid content is a species-specific trait that can vary depending on the growth phase. Moreover, DNA can properly identify only species already studied, whose sequenced reference genome has been deposited in a database. Generally, only a single species or strain can be analyzed at a time in a quantitative approach; multiple and parallel reactions can be performed, but the parallel determination of specific taxa of algae requires difficult, time-consuming and expensive validation.23
From an economic point of view, real-time PCR instruments are becoming affordable even for small research groups and are now quite common in laboratories equipped for molecular biology, thanks also to the low cost of consumables per sample (about $20 for duplicate reactions), which makes real-time PCR a potential routine method for monitoring applications.
Algae possess high absorption in the blue and red bands and high reflectance in the green and near-infrared bands.27 To produce an image that highlights the HAB, reflectances in the red and near-infrared bands have been used for a long time to create Normalized Difference Vegetation Index (NDVI) images. Nowadays, to improve these images other algorithms are used, such as the Floating Algae Index (FAI), which processes spectral information of the red, near-infrared and short-wave infrared bands to correct for the atmospheric effect. Moreover, to evaluate the concentration of the HAB biomass, the Chlorophyll Reflection Peak Intensity Algorithm is used, which is based on the reflectance of the blue, green, and red bands, and utilizes the correlation between the algae concentration and chlorophyll content.24
Remote monitoring of HABs can be complicated by the presence of multiple co-occurring species, optically complex waters and cloud gaps, and in general by variable atmospheric conditions. Hence, original satellite images are usually preprocessed to eliminate the influence of aerosol and water vapor scattering and of cloud cover.
New hyperspectral sensors are currently being studied, designed, and built for satellites; one example is the NASA Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission, scheduled to launch by 2023. These sensors are expected to change the way water quality is monitored from space, with increased spectral, temporal, and spatial resolution.28 The sensitivity of this system will overcome the limits of the previous systems, allowing the identification of the phytoplankton community composition and the separation of phytoplankton pigment absorption from that of colored dissolved organic matter.29
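As an illustration of the two band-ratio indices named above, the following minimal sketch computes NDVI as the normalized difference of near-infrared and red reflectance, and FAI as the near-infrared reflectance above a linear baseline drawn between the red and short-wave infrared bands. The band-centre wavelengths (645, 859 and 1240 nm, MODIS-like) and the reflectance values are assumptions chosen for the example, not values taken from the cited works; operational FAI products are computed on atmospherically corrected reflectances.

```python
def ndvi(red, nir):
    """Normalized difference of near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


def fai(red, nir, swir, wl_red=645.0, wl_nir=859.0, wl_swir=1240.0):
    """Floating Algae Index: NIR reflectance above a red-to-SWIR baseline.

    The default wavelengths (nm) are assumed, MODIS-like band centres.
    """
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


# Hypothetical pixel reflectances: open water versus a floating bloom.
water = {"red": 0.020, "nir": 0.010, "swir": 0.005}
bloom = {"red": 0.030, "nir": 0.120, "swir": 0.020}

for name, px in (("water", water), ("bloom", bloom)):
    print(name, round(ndvi(px["red"], px["nir"]), 3), round(fai(**px), 4))
```

In this toy case the bloom pixel has positive NDVI and FAI, while the water pixel has negative values for both, which is the contrast such index images are meant to highlight.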
Microalgae taxonomists usually base their analysis on algal cell morphological and hue features; therefore digital microscope systems equipped with a spectrophotometric unit, which possess sub-micrometric lateral spatial resolution, nanometric spectral resolution and detection of very low photon fluxes,27,41 seem to be the most adequate system for microalgae classification.
At the present stage of development, digital microscopy is not yet ready for field analysis applications, though it is very promising for automatic environmental monitoring and protection of public water supplies.
Though the above overview is over-simplified, we can draw up a list of pros and cons of the presented methodologies. Flow cytometry is the fastest, but the least accurate one; metagenomics is the most accurate, but it needs sound knowledge and a lot of patience; remote sensing can be used mainly for macro-scale analysis; digital microscopy has very good accuracy, but the lab systems available so far are mainly limited to relatively narrow taxonomic groups. Still, it has been the best compromise until now.
The average price for the set-ups of all these methodologies is over $100 000. Leasing lab equipment is becoming a possible solution for accessing all the machines needed in a lab, especially the most expensive ones.
Fig. 1 The five steps of the automatic microalgae classification process by means of digital microscopy.
The most extensive sampling effort should be made, since statistical differences between sampling devices and locations produce inaccurate and imprecise measurements and may defeat the purpose of monitoring. It should be kept in mind that the sampling step is mandatory for all the monitoring methodologies except for remote sensing.
An alternative method for sampling is the use of in situ imaging devices, consisting of submersible digital cameras. Some examples of these systems are the Shadowed Image Particle Profiling Evaluation Recorder,46 the Zooplankton Visualization System,47 the Video Plankton Recorder,48 the Imaging FlowCytobot,49 the in situ Ichthyoplankton Imaging System,50 the Underwater Video Profiler,51 the ZooScan,34 and the Scripps Plankton Camera.52 These submerged systems are capable of acquiring and storing images of the microalgae in the field of view of the camera for a preset time. These images do not possess the necessary high quality: they are almost never acquired in transparent and calm water, they are blurred because of the passive or active movements of the cells, and they are barely in focus due to the great depth of field. Moreover, images are acquired from a volume of several milliliters of water. Under these conditions, it is very difficult to identify microalgae in detail.
Slides from environmental samples are prepared in the laboratory and acquired on a digital microscope station. To obtain a higher density of cells on the slides, a preliminary sedimentation procedure may be necessary for the environmental samples.53 Using a bright field microscope equipped with a 40× objective, the slide is entirely scanned following a boustrophedonic path by means of a motorized microscope stage. One thousand images can be acquired from a slide, which can contain about 15 μL of environmental sample (400 mm2 coverslip surface, ∼30 μm sample thickness). However, to improve the calculation of algae distribution in the sample, special devices are used, such as the Utermöhl chamber or fixed volume chambers with flow regulated by peristaltic micro-pumps in a microfluidic environment. A recent example of an image acquisition system based on fixed volume flow chambers was described by Kerr et al.54 These authors analyzed sub-samples of net planktonic material collected offshore, containing both phytoplankton and zooplankton, by processing them through a FlowCam VS-IVc automated plankton imaging system, fitted with a 300 μm path length flow cell and a 4× microscope objective, with an image acquisition rate of about 10 frames per second.54
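The boustrophedonic (serpentine) scan of the slide can be sketched as a list of stage positions in which every other row is traversed in the opposite direction, so that the stage never jumps back to the start of a row. The function name, grid size and step lengths below are hypothetical; real step lengths depend on the field of view of the objective and camera.

```python
def boustrophedon_path(n_cols, n_rows, step_x_um, step_y_um):
    """Return stage (x, y) positions in micrometres, row by row,
    reversing the travel direction on every other row."""
    path = []
    for row in range(n_rows):
        cols = range(n_cols)
        if row % 2 == 1:
            cols = reversed(cols)  # odd rows run right to left
        for col in cols:
            path.append((col * step_x_um, row * step_y_um))
    return path


# Hypothetical 4 x 3 grid of fields of view.
path = boustrophedon_path(n_cols=4, n_rows=3, step_x_um=300.0, step_y_um=200.0)
for position in path:
    print(position)
```

Each move between consecutive positions is a single step along one axis, which keeps stage travel, and therefore vibration and settling time, to a minimum.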
The way images are acquired determines which kinds of features can be extracted for classification. Microalgae morphological features (i.e. contours, sizes, etc.), which are universally used for their identification, require digital images with high spatial resolution (Full HD CCD camera with CMOS sensors). The less used algae color features (i.e. the pigment composition) require a color camera or microspectrophotometers. If the comparative standard for the automatic analysis is the judgment of an expert taxonomist, all the images should be acquired at the highest spatial resolution (14 Mpixels with 1920 pixels per line) and color resolution (24 bits, i.e. more than 16 million colors) to gain a reliable understanding of the algal shape and pigment composition. As a consequence, slides from field samples should be prepared without any kind of processing or fixation to avoid unwanted alteration of the algae shape and color.
It is very important to be aware that biological images are often far more difficult to process and recognize than daily-life images. The acquisition process should produce well-focused images with the highest information content to exploit in the successive steps. The microscope, the traditional one or a bench mounted one, should be set at the best performance of Koehler illumination requirements following the indication of Zieler.55 The illumination should be even and uniform to avoid shadows; the flux emitted from the tungsten lamp should be set so that the dark noise of the CCD camera has no influence and camera saturation does not occur. The lamp color temperature should be set at about 3000 K for color balance, while the selection of the optimal aperture diaphragm is made by inserting gray filters in the light path.55
B/W or color digital cameras placed in the optical tube by means of a C-mount adapter ring are used for spatial information acquisition. The cameras must undergo calibration in three steps: white balancing, gamma correction, and matrix correction.56 The camera is interfaced with the computer by means of a FireWire or USB port. The standard time for digitization and quantization is 40 ms. Up-to-date high-capacity disks (terabytes) eliminate storage problems.
Spatial and spectral information can be obtained also by using a transmission hyperspectral imaging (HSI) microscope system that generates a hyperspectral cube with x, y, and λ coordinates for each microalga.57 HSI data acquisition uses spatial-scanning and spectral-scanning techniques simultaneously. Spatial-scanning collects spatial information from a single narrow slit and reconstructs the whole image line by line, through a push-broom or whisk-broom scanning that relies on the movement of a motorized stage or the motion of a galvo-mirror respectively. The latter has the advantages of high imaging speed and efficiency.58,59 Spectral-scanning collects spectral information at different wavelengths, scanning the spectral range wavelength by wavelength by means of filter wheels,60 liquid crystal tunable filters (LCTFs),61 and acousto-optic tunable filters (AOTFs).62
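The hyperspectral cube can be pictured as a three-index data structure. The sketch below assumes a push-broom acquisition in which each line scan delivers one spatial row of per-pixel spectra, and stacks the rows into a cube indexed as cube[y][x][band]; the indexing order and the function names are illustrative choices, not those of any cited system.

```python
def assemble_cube(line_scans):
    """Stack push-broom line scans into a cube indexed as cube[y][x][band].

    Each line scan is one spatial row: a list of per-pixel spectra.
    """
    width = len(line_scans[0])
    bands = len(line_scans[0][0])
    for line in line_scans:
        if len(line) != width or any(len(spec) != bands for spec in line):
            raise ValueError("all line scans must share width and band count")
    return [[list(spec) for spec in line] for line in line_scans]


def spectrum_at(cube, x, y):
    """Full spectrum recorded at one pixel."""
    return cube[y][x]


def band_image(cube, band):
    """Monochromatic image at one wavelength index."""
    return [[pixel[band] for pixel in row] for row in cube]


# Hypothetical 2-row x 3-pixel x 4-band acquisition; values encode (y, x, band).
scans = [[[100 * y + 10 * x + b for b in range(4)] for x in range(3)]
         for y in range(2)]
cube = assemble_cube(scans)
print(spectrum_at(cube, x=2, y=1))
print(band_image(cube, band=3))
```

Slicing along the last index gives the image at a single wavelength, while fixing x and y gives the spectrum of one point of the microalga.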
Spectral information can also be obtained by adding fluorescence imaging, as described by Schulze et al.39 and Degling et al.40 The first research group integrated an inverted fluorescence microscope with fluorescence imaging equipment using filter sets for chlorophyll a and b (excitation: 435 nm; beam splitter: 510 nm; emission: 515 nm), phycoerythrin (ex: 543 nm; bs: 562 nm; em: 593 nm) and phycocyanin (ex: 600/37 nm; bs: 625 nm; em: 655 nm). The second research group acquired fluorescence images using a custom multi-band fluorescence imaging microscope, with multiple excitation wavelengths and monochromatic sensors.40 The use of fluorescence signals allows better discrimination between microalgal taxonomic groups and between microalgae and other objects present in the environmental samples. However, fluorescence imaging (with high light intensity) can cause bleaching of the pigments in the irradiated area, even if the exposure time is short. Moreover, bleaching can also occur in positions adjacent to the irradiated area.
Another acquisition device is a digital microscope equipped with a polychromator-based microspectrophotometer that simultaneously records the in vivo absorption spectrum. In this system, a flat field imaging concave grating polychromator is connected to a high quality inspection probe (19 light-guides) in the back focal plane of one of the two ports of the binocular tube housing a CCD camera in the other port. The inspection probe forms a bundle at the level of the entrance pupil and becomes vertically aligned at the level of the exit pupil. Each light-guide acquires the light transmitted by a zone of the slide, and images it onto the diffraction grating, which disperses the impinging light into separate wavelengths. The dispersed image of the probe is in turn focused onto a digital slow scan cooled CCD camera. Absorption spectra from each light-guide can be measured using the values of the measured light intensity.63
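The step from measured light intensities to an absorption spectrum can be illustrated with the standard absorbance definition A(λ) = log10(I0(λ)/I(λ)). The sketch assumes that one light-guide looks at an empty zone of the slide and provides the reference intensity I0, and that an optional dark reading is subtracted from both signals; these assumptions and all names are illustrative and are not a description of the cited instrument's processing.

```python
import math


def absorbance_spectrum(sample_counts, reference_counts, dark_counts=None):
    """Absorbance per wavelength channel, A = log10(I0 / I).

    sample_counts:    intensities transmitted through the cell (I)
    reference_counts: intensities through an empty zone of the slide (I0)
    dark_counts:      optional detector dark signal, subtracted from both
    """
    if dark_counts is None:
        dark_counts = [0.0] * len(sample_counts)
    spectrum = []
    for i_s, i_0, dark in zip(sample_counts, reference_counts, dark_counts):
        spectrum.append(math.log10((i_0 - dark) / (i_s - dark)))
    return spectrum


# Hypothetical counts for three wavelength channels.
reference = [1000.0, 1000.0, 1000.0]
sample = [100.0, 500.0, 1000.0]
print(absorbance_spectrum(sample, reference))
```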
Spectral information can also be obtained as described by Coltelli et al.37,56 In these systems, the digital microscope has a simple hardware set-up and implements sophisticated algorithms. Spectroscopic data are extracted from the color coordinates of the pixels of the digital image. In order to identify the color that characterizes the microalgae under examination, and achieve a better taxonomic discrimination, the color histogram of all the different pixel colors of the recognized in-focus cell is calculated in the L*C*h* color space. This color histogram, fitted with a mixture of multivariate Gaussian distributions using a maximum likelihood estimate of the component parameters, shows the region of the chloroplast (the photosynthetic pigments) and the region of the cytoplasm (the background). The coordinates of the mean of the Gaussian chloroplast region are the color that represents the pigment signature of each algal species.37 From these characteristic color coordinates, Coltelli et al.56 reconstructed the absorption spectrum of the cell by minimizing a system of transcendental equations based on the absorption spectra of all the pigments under physiological conditions.
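To make the mixture-fitting idea concrete, the following sketch fits two Gaussian components by expectation-maximization, a common way to obtain maximum likelihood estimates for mixtures. It is a deliberate simplification: it works on a single coordinate (a hue-like value treated as a plain number, ignoring the circular nature of hue), whereas the cited work fits multivariate Gaussians in the full L*C*h* space. The data are synthetic and every name is hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, n_iter=50):
    """EM fit of a two-component 1-D Gaussian mixture."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                       # deterministic starting point
    start_var = max(((hi - lo) / 4.0) ** 2, 1e-6)
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    n = len(values)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel value.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            total = max(p0 + p1, 1e-300)
            resp.append((p0 / total, p1 / total))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)
            weights[k] = nk / n
    return weights, means, variances


# Synthetic pixel hues: 700 background-like and 300 chloroplast-like values.
rng = random.Random(0)
hues = [rng.gauss(40.0, 8.0) for _ in range(700)]
hues += [rng.gauss(130.0, 5.0) for _ in range(300)]

weights, means, variances = fit_two_gaussians(hues)
print("component weights:", weights)
print("component means:", means)
```

In this toy setting the mean of the second component plays the role of the characteristic color, i.e. the pigment signature, while the first component absorbs the background pixels.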
An interesting system to overcome the problem of using microalgae preservatives such as Lugol's iodine solution has been developed by Sbrana et al.64 Since Lugol drastically reduces the chlorophyll fluorescence signal, the group developed an opto-electronics system that combines 2D bright field microscopy and quantitative, non-interferometric phase microscopy. The system acquires out-of-focus bright field images and obtains information about the phase shift.
If the microscope setting requirements are satisfied, images within the scanning path show high quality since the illumination is even and has the appropriate intensity. However, out-of-focus images are still an unavoidable problem due to the fact that bright field microscopes can image only the focal plane and not all organisms occupy the same focal level once settled. Schulze et al.39 implemented an auto-focus function, which integrates different focal planes into one image, by scanning along the z axis. However, the process is time consuming and vibrations during the acquisition can produce artifacts.
Coltelli et al.37 used a fast and accurate method for recognizing in-focus cells, and discarding out-of-focus cells together with objects having a contour but an irregular color distribution (empty cells, overlapping cells belonging to different algae taxa, colored particles, etc.). In-focus cells, which possess a unimodal cell color histogram, are distinguished from out-of-focus cells, which possess a bimodal cell color histogram and are therefore discarded. An example of the color histograms of in-focus and out-of-focus cells is shown in Fig. 2.
Fig. 2 Example of automatic image acquisition: in-focus cell image (top) with the corresponding unimodal bi-dimensional and three-dimensional cell color histogram; out-of-focus cell image (bottom) with the corresponding bimodal bi-dimensional and three-dimensional cell color histogram. The magenta dots represent the mean color coordinates of the Gaussian fitting, while the orange dots represent the mean color coordinates of the background. Redrawn from ref. 36 and 37.
Recently Guo et al.65 used a submersible digital holographic imaging system to acquire high resolution images of plankton. This system is based on an in situ imaging method, Digital Inline Holography (DIH), which illuminates the sampling volume with a laser beam and uses a digital camera sensor to acquire the hologram produced by the interference between the light scattered by the particles present in the field and the non-scattered portion of the beam. The in-focus 3D image of all the particles present in the sampling volume is reconstructed by numerically processing the holograms acquired at different planes.
Only a few systems use spectroscopic features, owing to the complexity of the required hardware set-up. Coltelli et al.36,37 presented a system that extracts morphological features such as the contour, centroid distance spectrum, and dissimilarity measurement, together with spectrophotometric features such as the absorption spectrum and the characteristic color of single microalgae. They calculated the occurrence of all the different colors (color histogram) of the in-focus microalga under examination. This histogram was fitted with a mixture of multivariate Gaussian distributions and showed only the photosynthetic pigment region (the chloroplasts) and the transparent region of the background. The color value of the maximum occurrence of the histogram chloroplast region is the color that represents the pigment signature of the taxonomic group the microalga belongs to. Fig. 4 shows the original image represented with millions of colors (a), the fitted color histogram and the fitted and digital chloroplast histogram (b), and the same image of Fig. 4a in which all the hues of the chloroplastic region are substituted with the hue of the calculated characteristic color (c).
Fig. 4 Example of feature extraction operation: characteristic color; (a) original cell image represented with millions of colors; (b) the fitted color histogram (top left), the fitted chloroplast histogram (center) and the original digital histogram of the cell; (c) the result of the substitution of the hue of the chloroplastic regions with the hue of the calculated characteristic color: the chloroplast is represented by a single hue, and the image looks identical to the original cell image. Redrawn from ref. 36 and 37.
Other examples of the use of spectrophotometric features for microalgae description are those proposed by Verikas et al.,81 who exploited light and fluorescence microscopic images to extract geometry, shape and texture feature sets, and those of other groups previously cited.38–40
Fig. 5 Algal cell identification: objects recognized as microalgae are identified and yellow framed; objects recognized as out-of-focus cells, empty dead cells, overlapping cells are orange framed; objects recognized as cell debris, detritus and bacteria are red framed. Redrawn from ref. 36 and 37.
For clustering the microalgae in taxonomic groups, these calculated vectors are used as input for Artificial Neural Networks (ANNs). ANN models are used because they can solve problems of classification of raw data with remarkable success.82 ANNs are non-linear statistical data models that consist of artificial neurons, i.e. equations that simulate the functioning of biological neurons, with forward and backward connections in a hierarchy of layers. The mathematical theory of ANNs is very complicated and outside the scope of this review; therefore, for more detailed explanations refer to the work by Abiodun et al.82
ANNs use two steps: the training step, which groups the microalgae according to the feature vectors, and the testing step (or validation step), which assigns the segmented microalgae images to the corresponding taxonomic group. Therefore, the segmented microalgae vector dataset is divided into two subsets: vectors used to train the clustering algorithm, and vectors set aside for validation and classification. During the training phase, care must be taken in class selection, since this operation can be highly influenced by majority classes, which are observed more frequently, compared to minority classes, which are many and less frequently observed. These “class imbalances” can produce poor results.83 A possible solution is a dual training phase: in the first phase, vectors from a balanced number of images in the majority and minority classes are used, while in the second phase the entire vector dataset is used.84,85
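The data handling described above, a split into training and validation subsets followed by a balanced first training phase, can be sketched as follows. The class labels, the 20% test fraction and the per-class cap are hypothetical choices made for the example; the cited works may organize the two phases differently.

```python
import random
from collections import Counter, defaultdict


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle (label, vector) pairs and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def balanced_subset(samples, per_class, seed=0):
    """Draw at most `per_class` samples from every class (training phase one)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for label, vector in samples:
        by_class[label].append((label, vector))
    subset = []
    for label in sorted(by_class):
        members = by_class[label]
        subset.extend(rng.sample(members, min(per_class, len(members))))
    return subset


# Hypothetical imbalanced dataset: one majority and one minority class.
dataset = [("diatom", [float(i), 1.0]) for i in range(90)]
dataset += [("dinoflagellate", [float(i), 2.0]) for i in range(10)]

train, test = split_train_test(dataset)
phase_one = balanced_subset(train, per_class=5)   # balanced classes
phase_two = train                                 # entire training set
print(Counter(label for label, _ in phase_one))
print(Counter(label for label, _ in phase_two))
```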
In the following section, we describe how one example of a neural network, the Self-Organizing Map (SOM), works. A SOM consists of a two-dimensional layer of connected neurons.86,87
Each neuron corresponds to a taxonomic group, and the distance between neurons indicates how close the relation between them is. At the beginning of the training, the feature vector of each neuron (i.e. each taxonomic group) is randomly initialized, and the neurons are equally spaced (i.e. all the taxonomic groups are closely related). A first microalgae feature vector is fed to all the neurons in the map. The Euclidean distances between the sample vector and all the neurons in the layer are calculated. The neuron with the minimum Euclidean distance from the input vector is the winning neuron, which will be updated to be a little closer to the input vector; in the same way the distances between the winning neuron and its neighbors are also updated. The procedure ends when the map is no longer modified by the input data. The result of the SOM is a partitioned map whose neurons represent the real taxonomic groups of algae, with the corrected feature vector and the appropriate distance from the other neurons, i.e. the appropriate taxonomic distance between groups (Fig. 6). At this step the number of cells belonging to each group is known, and therefore it is also possible to calculate the concentration of the different algae in the sample.37
Fig. 6 Result of the algal taxonomic grouping by SOM using the characteristic color and the dissimilarity measure as features. Redrawn from ref. 36 and 37.
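The training loop described above can be sketched in a few lines. This is a generic textbook-style SOM, not the implementation of the cited works: it assumes a 3 × 3 map, a Gaussian neighbourhood, and a fixed number of epochs with decaying learning rate and radius in place of the "map no longer modified" stopping rule. The two-feature vectors (standing in for characteristic color and dissimilarity measure) are synthetic, and all names are hypothetical.

```python
import math
import random
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def winning_neuron(weights, vector):
    """Grid position of the neuron closest to the input vector."""
    return min(weights, key=lambda node: distance(weights[node], vector))


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2.0
    # Random initial feature vector for every neuron of the 2-D layer.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    data = list(vectors)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood shrinks
        rng.shuffle(data)
        for v in data:
            win = winning_neuron(weights, v)
            # Pull the winner, and its grid neighbours less strongly,
            # towards the input vector.
            for node, w in weights.items():
                grid_d2 = (node[0] - win[0]) ** 2 + (node[1] - win[1]) ** 2
                pull = lr * math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += pull * (v[i] - w[i])
    return weights


# Synthetic feature vectors for two well-separated groups of cells.
rng = random.Random(1)
group_a = [(0.2 + rng.uniform(-0.05, 0.05), 0.2 + rng.uniform(-0.05, 0.05))
           for _ in range(20)]
group_b = [(0.8 + rng.uniform(-0.05, 0.05), 0.8 + rng.uniform(-0.05, 0.05))
           for _ in range(30)]
cells = group_a + group_b

som = train_som(cells)
counts = Counter(winning_neuron(som, v) for v in cells)
print(counts)
```

Counting the cells assigned to each winning neuron gives the group sizes; dividing them by the sampled volume would give the concentration of each group, as noted in the text.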
Microalgae classification systems are commonly based upon traditional computer vision techniques, i.e. extraction and calculation of morphological and spectroscopic features from algae images, followed by some form of image processing to train the system to map a set of input features into a taxonomic group. More recent automatic microalgae classification systems use Convolutional Neural Networks (ConvNets),88 an extension of a basic neural network referred to as the multi-layer perceptron.89 ConvNets combine feature extraction and pattern recognition algorithms into a single model, which performs feature extraction and classification at the same time. According to Kerr et al.54 ConvNets can be considered the addition of a visual cortex of neurons, organized in a hierarchy of layers, to the traditional ANN. In each layer of the ConvNet, numerous convolutional digital filters (small windows of values, typically 3 × 3 pixels) slide over the input image, producing new images in a new feature space. The final goal of these “visual cortex” networks is to learn the values of each convolution filter, and extract the essential features to correctly predict a classification.
The groups involved in developing ConvNets are going to investigate the possibility of developing a classification system that allows multiple unique learning models to collaborate when classifying the microalgae database, instead of creating multiple distinct ConvNet architectures each harbouring unique innovations and properties. ConvNets are computationally very expensive; so far the shortest period of time in trials is about one day.54 When computers as powerful as “HAL 9000” become available, results will be obtained in a more reasonable time.
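The sliding-filter operation at the core of a ConvNet layer can be shown on a toy image. The sketch below applies one 3 × 3 filter with no padding and a stride of one pixel; the filter values are written by hand to respond to a vertical edge, whereas in a ConvNet they would be learned during training. The image and the function name are hypothetical.

```python
def convolve2d_valid(image, kernel):
    """Slide a kernel over an image (no padding, stride 1).

    As in most ConvNet libraries, the kernel is applied without flipping
    (strictly speaking, a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 5 x 5 image: dark left columns, bright right columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge filter; a ConvNet would learn these nine values.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The resulting 3 × 3 feature map is large where the window straddles the dark-to-bright transition and zero where the window covers a uniform region, which is the sense in which a filter maps the image into a new feature space.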
Though a comparison of the different automatic classification systems is difficult mainly because they have been trained, tested and validated on different taxonomic groups, it is still possible to describe their similarities and differences on the basis of their operating characteristics and resulting performance and robustness.
Table 1 gives a non-comprehensive list of automatic classification systems used around the world, highlighting the key feature categories used to discriminate and recognize the algae and assign them to the appropriate taxonomic grouping, the number of taxonomic groupings, and the average accuracy achieved. All these systems rely on artificial neural network classification models because of their ability to extract and represent high-level abstractions in data sets.
Name and/or reference | Key feature categories | Taxonomic groups | Average accuracy (%) |
---|---|---|---|
Zooscan/Grosjean et al.31 | Morphological | 29 | 75–85 |
ADIAC/du Buf & Bayer30 | Morphological | 37 | 75–90 |
Sipper/Remsen et al.32 | Morphological | 5 | 75–90 |
Sbrana et al.64 | Morphological and phase | 1 | 90.0 |
Simonyan & Zisserman90 | Morphological and ConvNets | 27 | 92.3 |
Dai et al.92 | Morphological and ConvNets | 13 | 93.7 |
Park et al.91 | Morphological and ConvNets | 8 | 95 |
Kerr et al.54 | Morphological and ConvNets | 104 | 96.2 |
Guo et al.65 | In situ digital inline holography | 10 | 93.8 |
Schulze et al.39 | Morphological and fluorescence | 10 | 94.7 |
Deglint et al.40 | Morphological and fluorescence | 6 | 96.1 |
Xu et al.36 | Morphological and absorption | 3 | 98.1 |
Coltelli et al.36,37 | Morphological and absorption | 24 | 98.6 |
To give an idea of the time necessary for a complete analysis of an environmental sample (from image acquisition to result validation), the digital microscope system developed by Coltelli et al. can be used as an example.37 The hardware set-up is based on a high-quality transmission microscope equipped with a CCD color camera and a polychromator-based spectrophotometer. The cell features used are morphological and spectroscopic, with the greatest weight given to the dissimilarity measurement of the cell contours and the characteristic colors of the segmented microalgae. As previously described, the ANN used by the system is a SOM. The database contains 53 869 algal images divided into 24 taxonomic groups; the time necessary for scanning a slide (1000 microscope fields) and building the input dataset is about 4.5 minutes. Most of this time is spent removing the out-of-focus cells. The SOM training process takes about 3.5 minutes. The resulting average accuracy is 98.6% (the result of the operation is verified by a phycology expert).
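The timing figures quoted above translate into a per-field and a per-slide cost with simple arithmetic, sketched below. The variable names are illustrative, and the total excludes the expert verification step, for which no duration is given.

```python
# Figures quoted in the text for the Coltelli et al. system.
FIELDS_PER_SLIDE = 1000
SCAN_MINUTES = 4.5    # scanning the slide and building the input dataset
TRAIN_MINUTES = 3.5   # SOM training

# Average time spent on each microscope field, in seconds.
seconds_per_field = SCAN_MINUTES * 60 / FIELDS_PER_SLIDE
# Scanning plus training, excluding expert verification.
total_minutes = SCAN_MINUTES + TRAIN_MINUTES

print(f"{seconds_per_field:.2f} s per field")    # 0.27 s per field
print(f"{total_minutes:.1f} min per slide")      # 8.0 min per slide
```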
In the literature cited so far, digital microscopy has achieved satisfactory results only for a limited number of species. Even if hundreds of morphological features can be calculated from each microalga, they do not always allow reliable assignment to a systematic group. The absorption spectrum of the pigments present inside the photosynthetic compartment of each alga, or its shortcut, i.e. the characteristic color that can be obtained with a simple color camera, should be considered an essential feature for algal recognition. Together with morphological features such as the contour, shape similarity and texture patterns, the color signature will allow an accuracy higher than that of an expert taxonomist, provided that both the sampling and the acquisition steps are performed to perfection.
This journal is © The Royal Society of Chemistry 2021