A Fast and Scalable Computational Topology Framework for the Euler Characteristic

The Euler characteristic (EC) is a powerful topological descriptor that can be used to quantify the shape of data objects that are represented as fields/manifolds. Fast methods for computing the EC are required to enable processing of high-throughput data and real-time implementations. This represents a challenge when processing high-resolution 2D field data (e.g., images) and 3D field data (e.g., video, hyperspectral images, and space-time data obtained from fluid dynamics and molecular simulations). In this work, we present parallel algorithms (and software implementations) to enable fast computations of the EC for 2D and 3D fields using vertex contributions. We test the proposed algorithms using synthetic data objects and data objects arising in real applications such as microscopy, 3D molecular dynamics simulations, and hyperspectral images. Results show that the proposed implementation can compute the EC a couple of orders of magnitude faster than ${\tt GUDHI}$ (an off-the-shelf and state-of-the art tool) and at speeds comparable to ${\tt CHUNKYEuler}$ (a tool tailored to scalable computation of the EC). The vertex contributions approach is flexible in that it compute the EC as well as other topological descriptors such as perimeter, area, and volume (${\tt CHUNKYEuler}$ can only compute the EC). Scalability with respect to memory use is also addressed by providing low-memory versions of the algorithms; this enables processing of data objects beyond the size of dynamic memory. All data and software needed for reproducing the results are shared as open-source code.


Introduction
Data objects that appear in the form of fields (e.g., images, video, space-time data) are common in science and engineering.These data objects can be processed using powerful algorithms such as convolution operations, Fourier transforms, and singular value decomposition to extract information and to enable reduction and visualization.Convolutional neural networks (CNNs), in particular, are a highly flexible tool that can be used to extract diverse types of feature information from field data.However, CNNs have significant scalability limitations (e.g., require repetitive convolutions to learn adequate operators) and might require large amounts of data to be trained.Moreover, attributing meaning to features extracted from CNNs is not straight-forward [9].Topological data analysis (TDA) has recently emerged as a powerful framework to quantify the shape of field data [20,21,9].A simple topological descriptor known as the Euler Characteristic (EC), in particular, has gained significant attention in diverse applications [11].The EC is a descriptor that captures basic topological features of binary fields (e.g., connected components, voids, holes) and can be extended to continuous fields by using filtration/percolation procedures (i.e., the EC is computed at different filtration values).The EC has also been used for analyzing data in neuroscience [10,5], medical imaging [12,7], cosmology [17,13], and plant biology [1].The EC has also been recently used as a descriptor/feature to train simple machine learning models (e.g., linear regression) that have comparable prediction accuracy to those of CNNs but that are significantly less computationally expensive to build [21,9,19].
Enabling fast computations of topological descriptors is necessary to handle field data at high resolutions, high-throughput data, and to enable real-time applications (e.g., control).Methods for fast processing of small-scale images was proposed by Snidaro and Foresti [22]; specifically, they proposed to compute only the change in the EC over the filtration values.This idea was further explored in [8,27,15,16].Heiss and Wagner presented an algorithm that computes the EC for 3D fields that are too large to fit into memory and provide a software implementation called CHUNKYEuler [8].A parallel implementation of this method using GPUs has also been recently developed [27].
In this work, we provide parallel implementations of the vertex contributions methods of Snidaro and Foresti [22].We highlight that a major contribution of this work is the generalization of of this method to 3D fields (including handling of non-binary fields and parallel implementation); these capabilities allow us to process a broad range of data sets arising in applications.The vertex contributions method is scalable and flexible, in that contributions can be used to compute the EC and other relevant topological descriptors such as perimeter, area, volume (the approach implemented in CHUNKYEuler can only compute the EC).We provide background information on the computation of the EC with an in-depth look at 2D/3D field data and level set filtration and outline the vertex contribution method.We highlight aspects that make the proposed method highly parallelizable and describe our software implementation.In addition, we benchmark our implementations against state-of-the-art computational topology tools using synthetic data sets and data sets arising in real applications.Specifically, we analyze synthetic random fields that are systematically generated to obtain field data at different resolutions and we study data sets arising in real applications such as microscopy, molecular simulations, and hyperspectral images.Our results demonstrate that our implementation can compute the EC a couple of orders of magnitude faster than the off-the-shelf computational topology tool GUDHI [14].We also compare times of the proposed methods to those implemented in the CHUNKYEuler software [8] and their GPU implementation [27].

Methodology
The EC of a data object is computed by counting the contribution of each fundamental component to the overall topology of the object.These fundamental components, which we will refer to as simplexes, are the building blocks that contribute to the topology of the object.The simplexes that are important to this work, and more broadly 2D/3D field analysis, include 0-dimensional through 3-dimensional cubical simplexes (Figure 1).In this work we will refer to a 0-dimensional cubical simplex as a vertex, 1-dimensional as an edge, 2-dimensional as a face or pixel, and 3-dimensional as a cell or voxel.A 2D field data object is a collection of face/pixel data (e.g., images) that contains intensity information to describe what color is displayed at a particular location.By representing each pixel as a face, an image object may be represented as a collection of cubical simplexes (Figure 2).The EC can be defined as the alternating sum of cubical simplexes; for fields, this is: Here, V is the number of vertices, E is the number of edges, F is the number of faces/pixels, C is the number of cells or voxels, and χ represents the EC value.
To generate the EC for a field (a continuous object), one must first transform the original field into a binary field by applying a filtration at a desired face/pixel intensity level.A filtration/percolation is a function that is used to define which components (e.g., pixels in the case of a 2D image) should be included and which should not.The filtration function used throughout this work is a sublevel filtration: Here, c is the filtration level, faces are defined at position (x w , x h ) with intensity f , and set g - c contains faces that are less than or equal to c only.Importantly, when a face is included, all edges and vertices relevant to that face are also included in the set.An example of filtration for multiple values of c over a small image is shown in Figure 3.The collection of EC values obtained at different levels c is known as the EC curve.The filtrations in Figure 3 can be verified by counting the number of components included and utilizing (2.1).A generalization of the EC, called the EC curve, encodes information for the entire field by applying g - c for values of c that cover the entire range of face intensities for a given field.A small example of an EC curve is shown in Figure 4 for a 2D field.
A fundamental aspect to consider in EC computations is connectivity.There are a couple of types of adjacency in a 2D field: (i) vertex adjacency and (ii) edge adjacency.Vertex adjacency defines that a face that shares a vertex with another face is adjacent.This is often referred to as 8-connectedness (or 8-C) as all 8 of the faces that surround an arbitrary central face are considered connected to the central face, as seen in Figure 5. Edge adjacency is more strict, as a face is adjacent to another face only if an edge is shared.This is often referred to as 4-connectedness (or 4-C) as there are only 4 edge-connected faces from an arbitrary central face, as seen in Figure 5.In the above example, where the EC was computed for Figure 3, 8-C was assumed.Throughout this work, we will be using vertex adjacency for all dimensions: 8-C for 2D field analysis and 26-C for 3D field analysis.
A naïve approach for computing the EC curve would be to count the vertices, edges, and faces at each filtration level.This would require iterations over every vertex, edge, and face at each level, resulting in poor computational scalability: O (whn c ) where w and h are the width and height of the field, respectively, and n c is the number of filtration levels.Another method would be to consider the vertices, edges, and faces from the previous filtration level and only add the new ones to the current filtration; unfortunately, adding a new face does not necessarily add all vertices and edges Figure 4: EC curve for a 2D field.Note that, in this example, we see the emergence of a hole in the simplicial complex at a filtration level of c = 6.associated with the face as new components to the level set.This is because the EC follows the inclusion-exclusion principle: (2.3) As such, the unique components added to set A by adding set B, or a new face, would be those in B minus the intersection: A ∩ B. An illustration of this concept is included in Figure 6 where edges and vertices will be double-counted if we do not subtract the intersection of the new face and the existing set.
Figure 6: Adding a single face to the bottom right of a 3x3 field complex requires the subtraction of two edges and 3 vertices which are double counted.
One can avoid the aforementioned problem by tracking vertices and edges which are included in the set.This provide a means to check new vertices and edges against the existing set and only add novel components.However, this comes at large memory cost, as these additional edge and vertex arrays would require storage of a binary or integer value for each entry, exceeding the element size of the original face data array from the field data itself.A solution to both of these problems is to use bit neighborhoods to determine impact on EC [6].Vertices can be used to compute EC contribution by considering a face-sized neighborhood centered on a vertex.A quarter of each face that share the vertex contribute to make up this neighborhood.Subsequently, one-half of each edge that shares the vertex are also included in this vertex-centered neighborhood.An illustration of this concept is shown in Figure 7.There are 16 positionally-different types of vertex contributions of which 6 are unique subject to symmetry, as shown in Table 1.From these representations we can compute the EC contribution and we can compute the contribution to the perimeter and area of the filtered field.One advantage to this method over strictly keeping track of the EC contribution is that the perimeter cannot be computed from the edges in the set, it can only be computed from exterior edges, which would require the tracking of additional information.The contribution of these 6 types of vertex contributions is given for EC (both 4-C and 8-C), area, and perimeter in Table 1.Because there is no intersection between the contributions of each vertex, the EC can be computed by the strict addition of contributions from each vertex.In other words, (2.3) loses the χ (A ∩ B) term because it is zero.By providing a collar of values about the edge of the field, or by determining if a given vertex is a corner, edge, or central vertex, the EC of that field may be computed by iterating over all vertices for each value of the level set.Although this is clearly an improvement over checking each vertex, edge, and face, this method still is computationally expensive in large-scale fields as the order of computation remains at O (whn c ).However, by tracking only the change in contribution types (i.e., the 6 unique vertex contribution types) for a given level set, time complexity can be reduced from O (whn c ) to O (wh + n c ), as shown in Snidaro and Foresti [22].
The method implemented in CHUNKYEuler yields the same reduction in time complexity by analyzing change in EC through changes in which top-dimensional cells are active, however, only the changes in EC are tracked.Modifications would need to be made to CHUNKYEuler to consider other topological descriptors proposing new challenges when considering indirect metrics such as perimeter.GUDHI computes the so-called persistence homology of a cubical complex over filtration values using the compressed annotation matrix method [3] and currently cannot benefit from the reduction in time complexity by looking at changes instead of evaluating each filtration level.The limitations of GUDHI (in terms of computational speed) and CHUNKYEuler (in terms of ability to compute diverse descriptors) are overcome by vertex contribution algorithms.
With our implementation of the vertex contribution method, an array that includes two entries for each contribution type at each level set value is initialized at zero.Then, for each vertex neighborhood, the integer values of the pixels are used as the index to be incremented.As an example, in Figure 8 we show the central face neighborhood of Figure 3; the first face becomes active at a value of 0. Subsequently, the vertex contribution with one active face is incremented as born at index 0. Since the second face becomes active at a value of 1, the one-face vertex contribution is incremented as dying at index 1, and the two-face diagonal vertex contribution is incremented as being born.This process continues for each face value until the maximum value is reached.At this point, the neighbor-hood will be filled under the assumption that the maximum value for filtration exceeds the maximum intensity of faces in the field.The only additional step is determining if the first two face values are in a diagonal position or not as the two types of 2-face neighborhoods have different contributions to the topological descriptors.In the example in Figure 8, it can be seen that there will be a diagonal vertex neighborhood as the first two active faces are diagonally adjacent.Importantly, the output of this method is not just the EC curve, but rather an array of the quantity of vertex contributions of each type that are present at each level set.This means that one can compute EC, perimeter, area, and any other descriptors/metrics that can utilize these vertex contributions as input for each filtration value.The process of gathering vertex contributions of a 2D field considering filtration values using this method is summarized in Algorithm 1.In the following sections, we propose algorithms for computing vertex contributions and extend the method to 3D fields.

Parallel Implementation
Parallel implementation follows the same pattern as the serial implementation but requires: either (i) a vertex contribution array for each thread, or (ii) a lock-based structure for a global vertex contribution array.The length of the 2D vertex contribution array would be one more than the number of levels, and the width would be 10.Although there are 6 vertex contribution types which would result in an array width of 12, the empty type contributes nothing to perimeter, area, and EC (see Table 1); in addition, the number of empty contributions at a given level can be back-calculated using the total number of vertices.In addition, when considering lock-based methods, at the entry value, every vertex will increment the empty contribution at the initial filtration level, causing a pile-up that could exceed serial computation time.For instance, a standard 8-bit image with pixel values ranging from 0-255 would require 2570 entries in a vertex contribution matrix, requiring 10280 bytes of storage per array instance using 32-bit integers.In most CPU systems, holding these arrays in memory is not expensive and can be done easily for most modern systems, even with one storage array per core for highly distributed systems.The size of the image in memory will greatly exceed that of vertex contributions in most cases.For perspective, if we consider a standard high-definition (HD) image size of 1280x720, the storage for only one channel using arrays of unsigned 8-bit integers would require 921600 bytes of memory, which exceeds the size of approximately 90 vertex contribution arrays.If a vertex contribution array can be afforded for each thread, Algorithm 2 can and should Algorithm 1 Bitmap contribution (2D, serial) 1: Field ← read(raw data file) 2: w ← field width 3: h ← field height 4: M ← largest filtration value 5: q k,in , q k,out ← Zeros of length M+2 ∀k ∈ {0, 1, ..., 9} 6: for all i < h + 1 do 7: for all j < w + 1 do data [3] ← Field(i, j) adjacency ← distance(pos[0], pos [1]) q 2,in [data [1]]++, q 2,out [data [2]]++ q d,in [data [1]]++, q d,out [data [2]]++ q 3,in [data [2]]++, q 3,out [data [3]]++ 28: q 4,in [data [3]]++ 29: end for 30: end for be used.The lock-based method for this system will not be described in detail in this work, however in highly distributed systems, such as GPUs, a global contribution array with locks should be used where local or register memory does not exceed the size of a thread vertex contribution array plus dynamic memory required to compute vertex contributions.Instead of increasing the global array values directly, the algorithm would use a locking system for each increment operation to prevent a data race.A light-weight locking mechanism would be best for these purposes, such as a spin-lock, which would reduce overhead of waking up a thread as in a mutex-based locking system.

3D Field Processing
For processing 3D fields we need to define new types of contributions.To the best of our knowledge, there are no extensions of the contribution methods of Snidaro and Foresti to 3D fields reported in the literature.Following the definition [25], there are 22 unique types of vertex contribution subject to symmetry.All 22 unique contributors are described in Table 2. Similarly, the empty type contributes nothing to EC, perimeter, area, or volume, so is only back-calculated in post if desired.This means that with 21 unique contribution types without the empty type, there are a total of 42 array entries per level.Therefore with an 8-bit 3D data file, a storage size of 43176 bytes per vertex contribution array would be required.Again, the size is not overwhelmingly large and therefore a 3D extension of Algorithm 2 is sufficient.It should be noted that many 3D fields, especially those from computer tomography (CT) scans, can use much higher resolution (e.g., 14-bit or 16-bit integers).The size of a 512x512x512 scan with 16-bit data requires almost 24 times the memory of the vertex contribution array.
A major difference between the 2D and 3D case is that diagonal adjacency of cells in the vertex neighborhood must be determined to differentiate between contributor types in the 3D case whereas only one adjacency needs to be calculated for the 2D case.Using the sum of Euclidean distance between all pairs of points, one can completely differentiate contribution types and use a running sum to avoid computing the same number twice.We call this function f adj and this defined as: (2.4) Here, V represents the set of vertex coordinates V i .The 2-norm is taken to get the Euclidean distance between all coordinate pairs of the vertices, allowing for distinction between which type of contribution is present.For instance, with 2 active cells, they may share only one face (f adj = 1), share only one edge (f adj = √ 2), or share only one vertex (f adj = √ 3).Adjacency for all configurations of vertex neighborhoods in 3D are included in Table 2. Also, for the case of 5, 6, and 7 cells active, the position of the inactive, or empty, cells is used to reduce the number of adjacencies computed.For the sake of brevity, the serial version for computing 3D vertex contributions is shown in the appendix in Algorithm 3. When considering memory requirements, the implementation of a low-memory algorithm for 3D field processing is all the more important.For this case, the 3D algorithm can be extended exactly the same way as the 2D case, by replacing the calls to the 8-cell values in the field file in memory with calls to read the 8 cell-specific indices from the field file.

Low-Memory Processing
Reading a data file for processing often requires that the entire field is held in memory during execution.To mitigate this, the same method that is utilized for the serial and parallel cases can be implemented by forgoing reading the whole field into memory, and instead reading only the values needed from the file at a given vertex of the field.To compute a vertex contribution in two dimensions, only four values are required to be read into memory at a given time.Exploiting the file structure in the BMP file format (for bitmap images), we utilize a low memory version of Algorithm 1 (lines 9 through 18) and Algorithm 2 (lines 9 through 18) by replacing calls to the Field data (i.e., data[0] ← Field(i − 1, j − 1)) with calls to the raw data file (i.e., data[0] ← read(Field face(i − 1, j − 1))).

Computational Results
The proposed algorithms were implemented in C++ and invoked from Python.For the low-memory cases, C++ was also used to read in the partial field data.A comparison between the GUDHI software package [14] called from Python and the algorithms for 2D and 3D field analysis shown above was performed and reported in the following sections.The EC was first be computed on synthetic random fields of standard sizes for 2D and 3D fields.We then analyze data arising in real applications: microscopy (2D), molecular dynamics simulations (3D), and hyperspectral images (3D).For all case studies, data was translated from raw format (i.e., float or integer) to integer format using values from 0 to 255.For each case, benefit of parallelization was measured by computing the EC with up to 24 CPU cores.All timing results were obtained on a 24-core (Intel Xeon E5-2697-2.7 GHz) computing server with 256 GB of RAM running Red Hat Enterprise release 6.10.All data and software needed for reproducing the results are shared as open-source code and are available at https://github.com/zavalab/ML/tree/master/FastTopology.

Analysis of Synthetic Random Fields
We generated synthetic 2D random fields in order to test the scalability of the proposed algorithms in a systematic manner.Standard 2D image sizes of 1280x720 and 1920x1080 were tested along with a couple of large square field sizes of dimension 2048x2048 and 4096x4096.Each field size was generated using 500 random samples of both uniform and random noise to show run-to-run timing variation.A summary of the results is provided in Table 3.We can see that the processing time for computing the EC using the parallel Algorithm 2 is nearly 3 orders of magnitude faster than GUDHI, with serial operation already being 50 times faster.The speedup obtained using more cores can be seen in Figure 9 for a field of size 2048x2048.On the left we show that the total Megapixels (MP) processed per second exceeds 100 when using 24 cores and the parallel speedup efficiency is 62%.Processing random fields of similar size using 8 cores with CHUNKYEuler results in a processing speed of about 46.5 MP/s as reported in [27], compared to 43.3 -45.2 MP/s using Algorithm 2 with 8 cores.As such, our implementation is as scalable as that of CHUNKYEuler.The 3D algorithm was also tested on both uniform and normal noise for fields of dimensions 128x128x128 and 256x256x256.Given the larger size of the data and processing time required, only    100 samples were run for each case with both uniform and random noise.The processing results are summarized in Table 4; a similar trend is observed, as GUDHI takes nearly 3 orders of magnitude longer than Algorithm 2 using 24 cores.With fields of size 256x256x256, the parallel implementation with 24 cores speeds up execution by 19 times, leading to an efficiency of 75-80% (Figure 10).For reference, CHUNKYEuler processes voxels/cells at a speed of 26.6 MV/s on 3D random fields with similar size [27], compared to 10.4 MV/s with Algorithm 2 for 3D, both using 8 CPU cores.The reduction in speed by our implementation is likely due to the higher number of comparisons and computation required to determine adjacency for vertex neighborhood classification.However, this reduction in speed comes with the benefit of more flexibility of enabling computation of alternative topological descriptors.

Memory analysis
The low-memory version of Algorithms 1 and 2 were also tested on random 2D fields.First, random matrices of uniform and normal distribution were generated for fields of size 1920x1080.Then, the fields were saved as bitmap image files (an easily readable, raw format).Finally, the C++ algorithm directly read and computed the EC for each field.500 fields were generated with statistics on the processing speed reported in Table 5.There is a drastic processing speed loss from reading individual pixel values to populate only the 4 values from each vertex neighborhood during computation of the EC curve.However, even the serial methods are still faster than GUDHI (see Table 3) and now require the minimal memory to compute the EC curve.Also, data beyond the size of working memory can be analyzed given that this is a streaming approach, similar to that used in CHUNKYEuler [8].In addition, only 2D fields were explored here and memory management issues are even more important in the 3D case.Similar slowdown should be expected when, if not slightly more given the increase in values analyzed from 4 faces/pixels to 8 cells/voxels.

Microscopy Case Study
Liquid crystal sensors elicit optical responses in the presence of target compounds or contaminants.The concentration and environment in which the target compound binds to the liquid has significant impact on the topological state of the system.In this case study, data presented in Jiang et al. [9] was used to benchmark real data and explore the potential of real-time analysis.The system we are showcasing here is analysis on a gas-based liquid crystal detection system with varying sulfur dioxide (SO 2 ) concentrations, from 0.5 ppm to 5 ppm, in an environment with 40% relative humidity.An example of how filtration values impact the topology of the system is shown in Figure 11.
As shown in Figure 12, the optical responses at differing SO 2 concentrations have distinct EC curves.On a liquid crystal micrograph plate, anywhere from 1 to 36 fully readable grid squares may be used to predict the state of the system.Therefore, the speed in which the images are read dictate how close to real time the sensor can be monitored.
On a single liquid crystal grid, anywhere from 1 to 36 readable grid-squares may be used to predict the state of the system.Therefore, the speed in which the grid-squares from an image or video frame of the sensor are processed dictate how close to real time the sensor can be monitored.In Table 6, with image sizes that are so small (approximately 134x134 for each grid-square), parallelization does not receive the same benefit as the larger field data as explored above.There is even a performance decrease by using more than 12 cores for parallel computation; however, the data is processed about 20-30 times faster than GUDHI.

Molecular Dynamics Data
Simulations that result in 3D fields are also relevant to the Algorithms provided in this work.One such is example is molecular dynamics simulations, where chemical systems can be simulated to understand molecular-scale interactions between participating species.Here, we analyze a system shown by Chew and co-workers [4], and later with the EC by Smith and co-workers [19].The simulations capture molecular interactions of biomass reactants in cosolvent/water mixtures.As an example, Figure 13 shows the density of water molecules in a 20x20x20 grid for fructose in three different cosolvent/water systems with 10 weight % water.The topology of the solvent environment has been shown to correlate strongly with reactivity.The process of filtering one of these fields is shown in Figure 14.The small size of the data set shows much less scaling advantage for the parallel case, as shown in Table 7.Although the timing advantage is less, the proposed algorithm is still one order of magnitude faster than GUDHI.To process the entire data set, GUDHI took 56.6 seconds, while the serial algorithm took 5.7 seconds, and 12-core parallel implementation took 2.9 seconds.The EC curve of each solvent system is shown in Figure 15.

Hyperspectral Image Data
Another example relevant to the chemical sensing community is hyperspectral image analysis.In this research domain, each pixel of a 2D image has a spectral dimension, meaning each pixel of the image is associated with a unique spectra.This requires each image to be analyzed in 3D to take full advantage of the information contained in the spectral dimension.One can view each spectral wavelength as a 2D image, as shown in Figure 16.
A sample data set on the ripeness of fruits [26] is used to benchmark the performance.Hyperspectral images were taken of both the front and back of the fruit each day.Fruits were removed from the set when considered overripe.A total of 360 hyperspectral images make up the kiwi fruit data set.Some sample images translated from visible spectra to RGB are shown in Figure 17, where it is clear that using only RGB, it is difficult to tell which kiwi is ripe, overripe, or under ripe.
Statistics on the processing speed are shown in Table 8.The processing time for these images was removed as a column of the table due to the varying size of the images and instead strictly analyzed processing speed in MV/s, similar to the liquid crystal case study.For comparison, the EC curve   from filtering the hyperspectral images of ripe, overripe, and under ripe kiwis are shown in Figure 18.
Figure 18: EC Curves for each cosolvent system.

Discussion
Using Algorithms 1 and 2 in both 2D and 3D field data we find that there is a clear scaling advantage in both time and memory usage while computing EC when compared with off-the-shelf computational tools such as GUDHI.Also, when compared to state-of-the-art parallel implementations, such as CHUNKYEuler, there is comparable speed.The time complexity of CHUNKYEuler and the algorithms presented in this work are nearly identical, as both methods stem from the same principle that using change in features is more computationally scalable than counting features at each filtration value.
For the microscopy case study for liquid crystals, the average size of images in that data set was 134x134, or 17956 pixels, meaning that it would require a processing speed of 19 MP/s to fully process a liquid crystal sensor with 36 readable grid-squares at a video frame rate of 30 frames per second (FPS).Considering control applications, the serial version of the tool is capable of processing 11 grid-squares in real-time considering a 30 FPS video.However, some control applications will not require the reading of all 36 grid-squares, or reading data as quick as 30 measurements per second.Also, when implementing a complete control scheme, the time required to read and crop the raw video data to these processable grid-squares must be considered, introducing a time delay between measurement and data reconciliation.Also, the potential limitations of parallelization are noticed when considering the microscopy case.For larger random fields for benchmarking, more cores yielded significantly faster performance, whereas in the liquid crystal case study, more cores did not yield favorable speedup.This is seen with a peak efficiency at about 12 cores before dropping as more cores were used (Figure 19).This analysis does not consider simply using Algorithm 1 for each individual image, but running the dispatch of the image analysis in parallel.This highlights the importance to design a computational framework that accurately addresses data analysis and data structure from case to case.This is especially relevant in the case of designing lightweight, standalone sensors with limited computational capabilities.The molecular dynamics study shows similar trends as the microscopy study, indicating the data was small enough that parallelization did not realize the same benefit as random 3D fields of larger size.However, this framework is capable a handling molecular dynamics simulations which may include thousands or millions of molecules.The granularity of the grid (20x20x20) could be expanded to have more fine-grained analysis on larger systems and see significant speed-up similar to that found in the random 3D fields analyzed earlier.Also, any 3D field data that can be translated to a cubical lattice may be analyzed in the same way.For instance, computational fluid dynamics data with millions of finite elements could be analyzed for trends in EC which could transform large-scale, fine-grained data into a feasible size for input into a prediction algorithm without losing important topological information.For reference, a couple of computational fluid dynamics simulations were analyzed to demonstrate scalability: first, a 302x302x302 element simulation of heptane gas undergoing combustion (The University of Utah Center for the Simulation of Accidental Fires and Explosions), took 405 seconds for GUDHI, and 3.5 seconds for Algorithm 3 in parallel with 4 cores; and second, simulation of duct flow [2] of size 193x194x1000 took GUDHI 673 seconds, whereas Algorithm 3 in parallel with 4 cores took 5.5 seconds.Clearly, scaling advantages exist in these data-rich applications.
For the hyperspectral image study, the size of the images still indicated that parallelizing the identification of vertex contributions was economical.The average size of images in this data set was 6.88 MV, meaning it takes about 100 seconds to process a single hyperspectral image with GUDHI, 3 seconds in serial with Algorithm 1 in 3D mode, and only about 0.2 seconds when using 2 in 3D mode with 24 cores.Applications where sensors are developed with a hyperspectral camera may require processing speeds the allow data analysis at speeds faster than those presented in this work.However, in certain applications, such as monitoring pharmaceutical powder composition, standard sampling times for single-sample analysis (only spectral dimension; no 2D image component) near infrared sensors can be on the order of 5 -15 seconds [24].On the other hand, commercially available hyperspectral cameras can achieve frame rates that could facilitate real-time data [23].Once again, understanding the needs of the system and the dynamic response to control actions should be considered case-to-case when implementing topology-informed control schemes to hyperspectral data.
An important distinction with respect to hyperspectral imaging that we used the raw data for the case study.Performing dimensionality reduction in hyperspectral image is not uncommon, for instance, using principal component analysis to reduce the number of wavelengths used in the spectral dimension [18].Reducing the number of wavelengths through dimensionality reduction or selecting regions that are known to contain the most important topological information could reduce computation time enough to process these 3D hyperspectral images in real time as well.
Also, the output of this tool is not just the EC curve, it is the vertex contribution map over the filtration values.Thus, by simply changing the weights each vertex contributes to an overall topological descriptor, the same vertex contribution map can be used to compute any number of topological descriptors as a function of vertex contributions.Also, connectivity need not be assumed in the beginning, as the connectivity only impacts the weights, not the vertex types.This allows for analyses to identify that alternate definitions of connectivity may be more suitable for a given physical system.
With this being said, sensitivity of the system to connectivity can be explored alongside sensitivity of topological descriptors to image resolution, number of filtration values, or image preprocessing techniques.It is commonplace to blur images or treat images with another convolutional operator while utilizing neural networks in machine learning.However, in this work, we use the raw data for both applications.Since the tool is scalable and allows for large-scale analysis, performing studies to understand which combination of resolution, number of filtrations, preprocessing technique, and connectivity type may lead to more topologically-based physical intuition of chemical systems discerned through field/image analysis.

Conclusions
In this work, we presented algorithms addressing the scalable computation of vertex contributions in field data.Ultimately, the tool was tested using EC, but the analysis that can be done with vertex contribution maps is not limited just to EC.The speedup was significant, 2 to 3 orders of magnitude, when compared with a common package used for topological persistence in Python (GUDHI).Also, recent advancement of the CHUNKYEuler software show similar timing capabilities, indicating that methods that address changes in topological contributions are much more scalable than counterparts that compute topological characteristics at each level of a filtration set.
Future research directions could address the usage of vertex contribution maps for more complex topological descriptors, such as the fractal dimension.Also, generalizing these methods to larger dimensions (4D data, such as hyperspectral imaging with a temporal dimension) requires either algorithms to generate or research into vertex contribution maps in higher dimensions.In terms of low memory methods, CHUNKYEuler uses chunks of each image, whereas the low memory version of algorithms 1 and 1 reads only the minimum data required to compute the EC curve.Exploration into the optimal decomposition scheme for a cubical field/image could be an interesting direction for research, as having minimum data representations is good for memory scaling, but results in a slow down of just over an order of magnitude.Also, generalizing vertex contributions to regular, non-cubical lattices, or even more generally to Voronoi cells, with sets of vertex contributions corresponding to the degree of that vertex could be impactful for data that is not the classical cubical layout that physical fields/images possess.Ultimately, if an efficient method exists to compute vertex contribution maps of these more abstract structures, one could calculate a plethora of topological and general system descriptors to control or characterize complex and more abstract physical systems in real time.

Figure 2 :
Figure 2: Liquid crystal micrograph (image) represented as a cubical simplicial complex; this is done by assigning pixel values to faces which are connected by lower dimensional cubical simplexes, edges, and vertices.

Figure 3 :
Figure 3: Example 2D field undergoing the process of filtration over the full range of pixel intensity values.

Figure 5 :
Figure 5: Types of connectivity; on the left is vertex-adjacency (or 8-C) and, on the right, is edgeadjacency or 4-C.

Figure 7 :
Figure 7: Examining the a face-sized neighborhood about each vertex in a 2x2 binary image.

Figure 8 :
Figure 8: Visualization of the contribution of a single vertex over its face values.

Figure 9 :
Figure 9: Timing results for fields of size 2048x2048.On the left, the processing speed of faces for Algorithm 2 with 2-24 cores.On the right, the speedup due to parallelization of Algorithm 2 with 2-24 cores compared to a perfect parallel efficiency line (dashed).

Figure 10 :
Figure10: Timing results for fields of size 256x256x256.On the left, the processing speed of voxels for Algorithm 2 using the 3D modifications with 2-24 cores.On the right, the speedup due to parallelization of Algorithm 2 with 3D modifications with 2-24 cores compared to a perfect parallel efficiency line (dashed).

Figure 11 :
Figure 11: Filtration process for various liquid crystal micrographs.For some micrographs, no activity occurs until over a value of 100, but others activity begins at low filtration values.

Figure 12 :
Figure 12: Varying concentration of SO 2 present in the system changes both the optical response and the topological description of the image through the EC.The optical response and subsequent EC curve are shown for 0.5 ppm, 1 ppm, 2 ppm, and 5 ppm SO 2 systems in 40% relative humidity.

Figure 13 :
Figure 13: Water densities from simulations of fructose with various cosolvent species listed.

Figure 14 :
Figure 14: EC curve evolution with 3D binary fields at various filtration values.

Figure 16 :
Figure 16: Visualization of hyperspectral image data.Here, a near infrared image with 6 selected wavelengths (λ) is illustrated.In totality, the hyperspectral image has 252 unique wavelengths.

Figure 17 :
Figure 17: RGB equivalent images of kiwis presenting with varying ripeness levels.RGB alone is difficult to discern whether a kiwi is ripe or not without feeling the firmness of the kiwi.

Figure 19 :
Figure 19: Speedup curve for parallelization for processing liquid crystal images.Perfect speedup line shown for reference.

Table 1 :
The 2D vertex contributions to the overall EC for both 4-C and 8-C cases.Area and perimeter contributions (equivalent in both 4-C and 8-C cases) are given as well.

Table 2 :
3D vertex contributions to overall EC for the 26-C case along with perimeter, area, and volume contributions.

Table 3 :
Timing results in seconds and million pixels (faces) processed per second for random 2D fields.

Table 4 :
Timing results in seconds and million voxels/cells processed per second for random 3D fields.

Table 5 :
Timing results in million pixels processed per second for the low memory processing algorithm.

Table 6 :
Timing results in million pixels processed per second for the liquid crystal sensor case study.

Table 7 :
Timing results in million cells processed per second for the molecular dynamics case study.

Table 8 :
Timing results in million voxels processed per second for the hyperspectral imaging case study.