Total Generalized Variation regularization for multi-modal electron tomography

Richard Huber,∗,† Georg Haberfehlner,∗,‡,¶ Martin Holler,∗,†,§ Gerald Kothleitner,∗,‡,¶ and Kristian Bredies∗,†
†Institute for Mathematics and Scientific Computing, University of Graz, Heinrichstraße 36, A-8010 Graz, Austria
‡Institute of Electron Microscopy and Nanoanalysis, Graz University of Technology, Steyrergasse 17, 8010 Graz, Austria
¶Graz Centre for Electron Microscopy, Steyrergasse 17, 8010 Graz, Austria
§Centre de Mathématiques Appliquées, École Polytechnique, 91128 Palaiseau Cedex, France


Introduction
Electron tomography in the scanning transmission electron microscope (STEM) is a versatile tool for the investigation of nanomaterials in three dimensions down to the single-atomic level 1,2 .
Insight into the 3D elemental and chemical make-up of a sample is provided by the spectroscopic signals originating from inelastic scattering in analytical tomography experiments 3 .
Also, EELS and EDXS signals can be acquired simultaneously for combined analytical tomography experiments 14 . Over the last years, analytical tomography has thus become an important characterization method in nanoscale materials science 13,15-18 .
The 3D resolution of a tomographic reconstruction is generally limited by the number of projections that can be acquired in an experiment and by the achievable tilt range. Reasons for these limitations are the overall dose the sample can sustain, the accuracy of the sample-positioning mechanism for very small tilt steps, the geometry of the sample or sample holder, as well as the limited time for an experiment. As a consequence, image reconstruction from an electron tomography experiment usually amounts to solving an ill-posed inverse problem with more unknown parameters than measurements. For analytical tomography data in particular, the signal-to-noise ratio is also very low. Therefore, a priori knowledge and regularization are essential to obtain better tomographic reconstructions.
In sparse recovery, the signal of interest is assumed to have a sparse representation when transformed to a suitable basis. If such a transformation exists, minimization of the ℓ1-norm in this domain enforces sparsity and can lead to an accurate solution.
The most important type of sparsity in electron tomography is sparsity of the image (or volume) gradient 19,23,24 . Known as Total Variation (TV) minimization 25 , ℓ1-minimization of the image gradient leads to an image (or volume) with locally constant regions separated by sharp interfaces. For materials science samples it can often be assumed that well-separated phases exist within a sample, which is why TV minimization has found a range of applications in electron tomography 26-29 . It has been shown that TV minimization can be especially favorable for analytical electron tomography experiments, where usually only limited and noisy data is available 14,30 .
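As a small illustration of this principle (ours, not code from the paper's implementation), the ℓ1-norm of the discrete gradient is small for piecewise constant signals and inflated by noise, which is why its minimization favors locally constant regions with sharp jumps:

```python
import numpy as np

def tv1d(u):
    """l1-norm of the forward-difference gradient of a 1D signal."""
    return np.abs(np.diff(u)).sum()

piecewise = np.array([0.0] * 10 + [1.0] * 10)      # one sharp unit jump
rng = np.random.default_rng(0)
noisy = piecewise + 0.1 * rng.standard_normal(20)  # same signal plus noise

print(tv1d(piecewise))                 # 1.0 -- just the single jump
print(tv1d(noisy) > tv1d(piecewise))   # True -- noise inflates the TV value
```

Minimizing this quantity subject to data fidelity therefore pushes reconstructions towards the piecewise constant signal.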
Two major limitations can be identified with current TV minimization approaches and their application in analytical tomography. First, the underlying model of TV minimization assumes that the sample contains only sharp interfaces between different phases, severely restricting the materials that can be analyzed with it. All types of gradual changes between sample regions, such as diffusion at interfaces, are typically not recovered correctly. Second, in analytical tomography, or other multi-modal tomography experiments, different signals containing complementary information are acquired at the same time. In most approaches this is not exploited; instead, each signal is reconstructed independently. Until recently, there was only one attempt to link the information of different signals 31 , where the intensity of the HAADF signal was expressed as a linear combination of EDXS signals and reconstructed with a conventional reconstruction algorithm.
Here we present an approach for resolving these issues for multi-modal tomography. We apply coupled Total Generalized Variation (TGV) 32-34 regularization, which allows for both sharp interfaces and gradual intensity changes in the reconstruction and exploits inter-channel correlations. This yields a joint multi-channel reconstruction of HAADF and EDXS data. We demonstrate our approach on phantom data and on an Al-5 wt.% Si alloy with 50 ppm Na and 6100 ppm Yb 14 , where precipitation of Yb and Si takes place at the nanoscale, but it is generally applicable to analytical electron tomography of all types of nanostructured materials.
We note that in two recent, independent works 35,36 , a coupled Total Variation approach was proposed to link data and create smooth reconstructions. While these works overcome the second limitation mentioned above, the use of TV regularization still limits their applicability to piecewise constant densities with sharp interfaces. However, a major benefit of three-dimensional local elemental distributions lies in analyzing local concentration variations. While the composition of well-separated phases is often accessible already by 2D investigations after appropriate sample preparation 4 , local variations are always masked by the projection. A recent example is the unveiling of elemental concentration variations within spherical precipitates in an AlMgScZr alloy 17 .
In such situations it would not be possible to use TV regularization. Beyond overcoming this drawback, a second novelty of our approach is that we developed, and make publicly available 37 , a Python 38 /OpenCL 39 based implementation of all components of the reconstruction algorithm, which is crucial in particular for 3D reconstructions as it significantly reduces computation time. In particular, we developed a custom, GPU-based implementation of the forward operator and its adjoint that, in contrast to other existing implementations, ensures numerical adjointness of the two, which is crucial for the convergence of iterative minimization algorithms.
We also note that TGV regularization for both single- and multi-channel data has been demonstrated to work well and to be superior to TV regularization in various applications 40-45 . In particular, in the context of joint magnetic resonance (MR) imaging and positron emission tomography (PET), multi-channel TGV regularization was shown to allow for improved recovery of quantitative measurements compared to single-channel approaches 46 . Theoretical results on well-posedness and stability for multi-channel regularization are also available 47 . Building on existing code for MR reconstruction 40 , a first application of TGV in the context of two-dimensional single-channel electron tomography was presented in a conference proceedings paper 48 and was shown to be superior to TV regularization.

Results
Main principles of multi-modal TGV reconstruction

The proposed tomographic reconstruction framework is based on a convex variational method, employing volumetric coupled second order Total Generalized Variation (TGV) regularization and the Kullback-Leibler (KL) divergence as discrepancy measure. Given some measured multi-channel data f = (f_1, ..., f_C), a reconstruction u = (u_1, ..., u_C) is obtained as a solution of the convex minimization problem

    min_{u ≥ 0}  Σ_{c=1}^{C} µ_c D_KL(T_c u_c, f_c) + TGV²_α(u).    (1)

Here, D_KL(ũ, f) := Σ ũ − f log(ũ) corresponds (up to constants and non-negativity constraints) to the Kullback-Leibler divergence, which is an appropriate data fidelity measure in the presence of Poisson noise. The operators T_c map each image channel to the corresponding data space and comprise a slice-wise Radon transform and a possible change of resolution. The parameters µ_c are positive and balance, for each channel, between data fidelity and regularization. The functional TGV²_α is given as

    TGV²_α(u) = min_w  α_1 ‖ |∇u − w|_frob ‖_1 + α_0 ‖ |Ew|_frob ‖_1,

where ∇u denotes the Jacobian of the multi-channel image u, w is a matrix field with the same dimension as ∇u and Ew denotes a symmetrized Jacobian of w. The norms |·|_frob are pointwise Frobenius matrix and tensor norms and ‖·‖_1 denotes the one-norm, i.e., the sum of moduli over all image voxels. For solving the minimization problem (1), we employ a duality-based algorithm for which we developed a Python 38 /OpenCL 39 implementation.
Further details on the algorithm and the reconstruction approach are provided in the section Materials and Methods.
We can identify two main features of the proposed coupled TGV regularization functional: First, the functional itself optimally balances between a penalization of first and second order derivatives. Choosing the auxiliary variable w to be locally zero or equal to ∇u, it is possible to penalize either the one-norm of ∇u or the one-norm of E∇u, where E∇u corresponds to the Hessian of u. For this reason the proposed method can be expected to recover both sharp interfaces as well as gradual intensity changes. Second, the pointwise matrices and tensors resulting from differentiation of u and w are coupled with a Frobenius norm, i.e., the two-norm over all entries. This enforces joint sparsity of the corresponding quantities; in particular, the edge set of each channel can be expected to have small support and to largely coincide with those of the other channels.
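The joint-sparsity effect of the Frobenius coupling can be sketched with a toy computation (our illustration, not code from the paper): a two-norm over channels followed by a one-norm over positions, an ℓ2/ℓ1 penalty, is cheaper when edges of different channels coincide.

```python
import numpy as np

def coupled_penalty(grads):
    """l1-norm of the pointwise Frobenius norm over channels.
    grads: array of shape (channels, n) holding per-channel gradients."""
    return np.sqrt((grads ** 2).sum(axis=0)).sum()

# Two channels, each with a single unit jump in the gradient.
aligned = np.zeros((2, 10)); aligned[:, 4] = 1.0              # edges coincide
shifted = np.zeros((2, 10)); shifted[0, 4] = 1.0; shifted[1, 7] = 1.0

print(coupled_penalty(aligned))   # sqrt(2) ~ 1.414
print(coupled_penalty(shifted))   # 2.0 -- misaligned edges cost more
```

An uncoupled sum of per-channel one-norms would assign the value 2.0 to both configurations; only the coupled penalty rewards placing edges at common positions.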

Reconstruction of phantom data
To illustrate the capability of the joint TGV algorithm, we first test it on an artificial example. For this purpose we created a multi-channel phantom object. To investigate how a full 3D implementation of the reconstruction algorithm compares to a 2D implementation, we replicate the same slice 50 times, rotating each slice slightly compared to the previous one, to obtain a 3D volume with changes along all three dimensions. Fig. 1 shows the phantom and the sinograms with and without noise. For each slice, sinograms are calculated with a size of 305 pixels over a range of 180° in steps of 5° (see Materials and Methods for more details regarding the phantom).
For reconstruction, we compare the standard method SIRT 49 with TV and TGV regularization, where for the latter two we provide results for 2D slice-wise and volumetric 3D regularization as well as uncoupled and coupled regularization of the different channels.
The SIRT reconstructions were obtained using the Astra toolbox 50,51 , and the parameters used for the TV and TGV reconstructions are provided in Supporting Table 1.
In Fig. 2 we provide the reconstruction results, where three orthogonal slices for each method for both the HAADF and the EDXS signals are compared to the original phantom (Fig. 2a). The SIRT reconstruction (Fig. 2b) of the low-noise HAADF data is quite acceptable, showing only some typical artifacts stemming from the limited number of projections. Due to the low signal of the EDXS data, however, these reconstructions are dominated by the Poisson noise of the sinograms, and hardly anything can be discerned.
Introducing variational regularization reduces the impact of noise on the reconstruction, but it can be observed that the results strongly depend on the chosen regularizer. In an uncoupled 2D TV reconstruction (Fig. 2c) severe staircasing artifacts are present. These artifacts are removed by use of the TGV² functional (Fig. 2d), but in both reconstructions small features of the sample can disappear completely due to the low signal-to-noise ratio of the sinograms.
This situation can be significantly improved by coupling the reconstructions, for both TV (Fig. 2e) and TGV² (Fig. 2f). With the transfer of information between channels, the presence and shape of smaller features also become clear in the reconstructions of the EDXS data. In the TGV² reconstruction both sharp and gradual concentration changes can be recovered, and in the TV reconstruction staircasing artifacts are reduced as well.
By linking all slices in the third spatial dimension, the 3D implementation of the algorithm is even more robust to noise, for both the uncoupled (Fig. 2g&h) and the joint reconstruction (Fig. 2i&j). The 3D joint reconstruction methods make full use of data correlation, along all three spatial dimensions as well as between the different channels, and therefore give the best reconstructions for both TV and TGV². With TGV², constant areas and hard transitions are reconstructed equally well as with TV, but, by promoting piecewise linear functions in addition to sharp interfaces, TGV² is superior to TV when it comes to gradual intensity changes. In the TGV² reconstruction, all but some of the very smallest features can be recovered.
These trends in improvement can also be quantified by considering the corresponding peak signal-to-noise ratios (PSNR) between the original phantom images and the reconstructions, as shown in Table 1. We see, for example, that the PSNR of the Ytterbium reconstructions strongly benefits from the coupling of channels both in 2D and 3D. Also, all channels greatly benefit from the change from 2D to 3D. Note that, even though one can visually observe qualitative improvements with TGV compared to TV, both methods behave similarly in terms of PSNR. This may be explained by the fact that PSNR is not well suited to quantify such qualitative differences, but also by the fact that the parameters for the reconstructions of Fig. 2 were selected for visually optimal results. Indeed, reducing the amount of regularization slightly can be expected to yield an improvement in terms of PSNR for all methods, but would decrease visual image quality. Nevertheless, we see that also in terms of PSNR, the improvement obtained with the proposed coupled 3D TGV reconstruction is quite significant compared to 2D uncoupled reconstructions or the SIRT method.
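For reference, a standard PSNR definition as presumably used for Table 1 (the exact peak normalization in the paper may differ) can be written as:

```python
import numpy as np

def psnr(reference, reconstruction):
    """Peak signal-to-noise ratio in dB, with the reference maximum as peak."""
    mse = np.mean((reference - reconstruction) ** 2)
    peak = reference.max()
    return 10.0 * np.log10(peak ** 2 / mse)

phantom = np.linspace(0.0, 1.0, 100)
# A smaller pointwise error yields a higher PSNR.
assert psnr(phantom, phantom + 0.01) > psnr(phantom, phantom + 0.1)
```

With peak 1.0, a uniform error of 0.01 gives 40 dB and an error of 0.1 gives 20 dB, matching the rule of thumb that a tenfold error reduction adds 20 dB.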

Experimental reconstruction
In an analytical tomography experiment we acquired tilt series from a needle-shaped sample prepared from an Al-5 wt.% Si alloy with 50 ppm Na and 6100 ppm Yb. HAADF STEM data and EDXS spectrum images were acquired every 4° with a pixel size of 0.76 nm.
HAADF data and elemental maps of Yb, Si and Al were recorded at different tilt angles. It should be noted that for both the TV and the TGV² reconstruction, regularization parameters have to be chosen to weight the impact of the norms. A comparison of the effect of different regularization parameters for experimental data is given in Fig. 5. In Fig. 5a the regularization parameter for the HAADF data is varied, while in Fig. 5b the regularization parameters for the EDXS data are varied.

Discussion
In both simulations and experiments, we could observe that TGV 2 reconstruction is well suited for reconstructing gradual transitions inside a sample, in addition to sharp interfaces.
Thereby it extends TV reconstruction, making sparse reconstruction applicable to a wider range of samples. In the investigated sample such gradual transitions are present between Yb-rich and Si-rich precipitates.
Linking different signals in the reconstruction significantly improves the quality of analytical STEM reconstructions, by transferring morphological information present in the HAADF STEM data to the analytical data and in between the analytical datasets.
Since this transfer is a key feature of our approach, we briefly illustrate its inner workings. For simplicity, assume we have only two channels, so that the reconstruction approach reduces to

    min_{u ≥ 0}  µ_1 D_KL(T_1 u_1, f_1) + µ_2 D_KL(T_2 u_2, f_2) + TGV²_α(u),

i.e., we look for u with a low corresponding value, loosely referred to as "cheap". Note that, due to the nature of TGV²_α, it is cheaper for TGV²_α to align edges or slopes of the two channels at common positions. However, this might come at the expense of the reconstruction fitting the data less well, making the data terms more expensive. Thus, a compromise between data fitting and regularity is sought, and the concrete behavior depends on various factors.
Let us assume f_1 contains far less noise than f_2. Then one would naturally choose µ_1 greater than µ_2, as f_1 is more reliable and deviations from f_1 are thus made more expensive. In particular, this incentivizes the approach to keep u_1 fitting f_1 well and rather adjust u_2 to obtain common features. This is exactly the effect we observed between the HAADF and Yb reconstructions of the experimental data, where Yb was strongly adjusted to HAADF, while HAADF remained unchanged and, in particular, was not negatively impacted.
On the other hand, if both channels have the same noise level, and consequently the weights are chosen of similar magnitude, deviation from the data is equally expensive for both channels, leading to a more balanced compromise when adjusting the reconstructions to obtain common features. This happens, for example, in the reconstruction of the experimental data at the edges between Al and Si, leading to better defined edges in both channels.
Of course, if attaining common features comes at too high a cost, in particular if there are actually no common features in the ground truth, the algorithm will not force common features. Thus, no adjustment occurs where adjustment makes no sense.
For the material, the observed gradual transition at the interface between Yb-rich and Si-rich precipitates indicates that a side-by-side nucleation of the Yb and eutectic Si particles might take place. The diffusion interface between Yb and Si leads to stoichiometry changes (i.e., gradual transitions) of both phases, depending on solute and/or solid diffusion of Yb atoms.

Conclusion
Without the TV-induced bias towards piecewise constant solutions, TGV² reconstruction achieves high-quality results, and is therefore applicable, for a wide range of materials and will help in understanding their morphology, composition and functionality. The presented linkage of different signals is an ideal approach for extracting the most information from the available data. It can be used for any type of signal employed for tomographic reconstruction in a TEM, and the approach is not even limited to electron tomography but applies to any type of projection-based tomography. In analytical electron tomography, optimized use of the available data is mandatory, as the quality and quantity of data that can be acquired in an experiment are severely limited by potential beam damage and acquisition time.

Numerical simulations
In this section we give a more detailed description of the phantom data shown in the section Reconstruction of phantom data. To demonstrate the functionality of the joint TGV reconstruction algorithm, we created a multi-channel phantom object. This phantom consists of three channels, which we attribute to the atomic fraction c_A of Yb, Al and Si, and has an original size of 2048 × 2048 pixels with the pixel size set to 0.1 nm. To get a realistic estimate of the number of detector counts for both the EDXS and the HAADF signal, we calculated local signal-creation probabilities.
For the EDXS signal we used

    I_A = ρ t m_A D_e / ζ_A,

with the local density ρ, the sample thickness t, the local concentration (mass fraction) m_A of element A, the electron dose D_e and the zeta-factor ζ_A for element A 52 . To calculate the local density we weighted the elemental densities of Yb (ρ_Yb = 6.9 g/cm³), Al (ρ_Al = 2.7 g/cm³) and Si (ρ_Si = 2.33 g/cm³). To get an estimate for the HAADF signal we used tabulated values of the differential cross section ∂σ/∂Ω for elastic scattering from the NIST database 53 , integrating them over the angular detector range [θ_min, θ_max] with 54,55

    σ_A = 2π ∫_{θ_min}^{θ_max} (∂σ_A/∂Ω) sin(θ) dθ.

For the inner and outer detector angles we used θ_min = 94.9 mrad and θ_max = 188 mrad, as measured on our system for the Fischione HAADF detector at a camera length of 58 mm.
For each element A, the scattering factor f_A was calculated as

    f_A = (N_A c_A ρ t / m_A) σ_A,

with the Avogadro constant N_A = 6.022 × 10²³ mol⁻¹, the atomic mass m_A, the local atomic fraction c_A and the scattering cross section σ_A of element A, as well as the local density ρ and the pixel size t.
The total HAADF signal is calculated from the scattering factors of all elements and the electron dose D_e according to

    N_HAADF = D_e Σ_A f_A.

The electron dose is calculated from Equ. (4) with I_e = 2 nA and a dwell time of t_px = 2 ms.
To account for detector efficiency we reduce the HAADF signal by a factor of 0.7.
Finally the phantom data is downscaled from 2048 × 2048 pixels to 305 × 305 pixels using bilinear interpolation, corresponding to a pixel size of 0.67 nm, close to the experimental pixel size. After downscaling the pixel values are multiplied by a factor of 0.67 to account for the larger thickness of each pixel.
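The per-pixel count-generation chain described above can be sketched as follows (our illustration, not the published phantom code 37; the dose is computed with the standard counts = I·t/e estimate, and the zeta-factor value and the lumped factor ρ·t·m_A below are hypothetical and only fix the scale of the counts):

```python
import numpy as np

rng = np.random.default_rng(1)

e_charge = 1.602e-19                 # elementary charge, C
I_e, t_px = 2e-9, 2e-3               # beam current (A) and dwell time (s)
D_e = I_e * t_px / e_charge          # electrons per pixel, on the order of 2.5e7

rho_t_m = 2.7e-4                     # rho * t * m_A, g/cm^2 (illustrative value)
zeta_A = 1.0e2                       # hypothetical zeta-factor, same unit system

expected = rho_t_m * D_e / zeta_A    # expected EDXS counts per pixel (~67)
measured = rng.poisson(expected)     # shot noise: Poisson statistics

print(round(D_e), expected, measured)
```

The low expected count per pixel obtained from realistic beam parameters is exactly why the simulated EDXS sinograms are strongly Poisson-noise dominated.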

Electron tomography experiments
In this section, we give a more detailed description of the experimental data used in the section Experimental reconstruction.

Data preprocessing
For the variational reconstruction to work efficiently, the raw sinogram data f needs to be consistent with the model. Therefore, the data requires suitable preprocessing, and failing to do so could result in artifacts in the reconstructed images. In the Supporting Information we also discuss the effect of model inconsistencies, which may arise for HAADF STEM data.
To correct for sample drift during spectrum image acquisition, the spectrum image

Variational modeling
In this section we provide details on the proposed variational model for electron tomography reconstruction, which comprises the solution of (1). For easier understanding, we first describe the reconstruction of a single signal. In a tomographic experiment, each measurement corresponds to an integral of the density along a line orthogonal to the imaging plane; the imaging plane is then rotated and the experiment is repeated for several tilt angles.
Mathematically, this setup can be described as follows: Given an angle θ ∈ S¹ (the unit sphere in R²), the imaging plane is defined as P_θ = {(x, y, z) ∈ R³ | (x, y) · θ = 0} and, for given parameters s and z describing the horizontal and vertical offset, respectively, the measurement line L_{s,θ,z} orthogonal to this imaging plane is given as L_{s,θ,z} := {(x', y', z') ∈ R³ | (x', y') = s θ⊥ + t θ for t ∈ R, z' = z}, with θ⊥ the unit vector orthogonal to θ. In the discrete setting, the line integral along L_{s,θ,z} is approximated by a weighted sum

    (T u)(s, θ, z) = Σ_{(x,y)} w_{s,θ}(x, y) u(x, y, z),    (8)

with suitable interpolation weights w_{s,θ}, where the sum is taken over all positions (x, y) in the discrete pixel grid. We note that, naturally, this is not the only possible discretization 61 , but we chose it since it provides a suitable balance between approximation accuracy and computational effort. In particular, it allows a very efficient computation of the numerically exact adjoint operation (backward projection), which is required in the optimization algorithm. This is of particular importance, as a non-exact adjoint can significantly hinder optimization, both in convergence speed and accuracy.
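The exact-adjoint property can be demonstrated with a deliberately simple projector (our sketch, not the paper's OpenCL kernels, which use a more accurate discretization): each pixel is binned to its nearest detector cell, so forward and backward projection are transposes of one sparse 0/1 matrix and ⟨Tu, v⟩ = ⟨u, T*v⟩ holds to machine precision.

```python
import numpy as np

def geometry(n, angles):
    """Precompute, per angle, the detector bin of every pixel of an n x n image."""
    xs = np.arange(n) - (n - 1) / 2.0
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    m = 2 * int(np.ceil(np.hypot(xs[-1], xs[-1]))) + 3   # detector cells
    idx = [np.rint(X * np.cos(t) + Y * np.sin(t)).astype(int) + m // 2
           for t in angles]
    return idx, m

def forward(u, idx, m):
    sino = np.zeros((len(idx), m))
    for a, b in enumerate(idx):
        np.add.at(sino[a], b.ravel(), u.ravel())          # accumulate along rays
    return sino

def adjoint(sino, idx, n):
    u = np.zeros((n, n))
    for a, b in enumerate(idx):
        u += sino[a][b]                                   # spread values back
    return u

rng = np.random.default_rng(0)
n = 32
idx, m = geometry(n, np.linspace(0, np.pi, 36, endpoint=False))
u, v = rng.standard_normal((n, n)), rng.standard_normal((36, m))
lhs = np.vdot(forward(u, idx, m), v)
rhs = np.vdot(u, adjoint(v, idx, n))
assert abs(lhs - rhs) <= 1e-8 * (1.0 + abs(lhs))          # numerically exact adjoint
```

A backprojector implemented independently of the forward projector (e.g. with different interpolation) would generally fail this test, which is the pitfall the custom implementation avoids.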
Noise model. For both the HAADF and the EDXS signal the main contribution to noise, especially at low count rates, comes from the counting of electrons or X-rays at the detector, which follows Poisson statistics 62,63 . Additionally, the signal can also be affected by Gaussian noise stemming from the electronics. Omitting Gaussian noise (see also the section Data preprocessing for related preprocessing steps), the noise on the data is typically assumed to be Poisson distributed.
Hence, from a Bayesian perspective, the appropriate data discrepancy term in a variational model is the Kullback-Leibler divergence 64 , which for non-negative densities ũ, f is (up to constants) given as

    D_KL(ũ, f) = ∫ ũ − f log(ũ),    (9)

where we use the conventions 0 log(0) = 0 and −a log(0) = ∞ for a > 0. For discrete measurements, the integral is replaced by a sum.
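A discrete version of this data term, including the stated conventions, can be sketched as follows (our illustration; note that, since the constants depending only on f are dropped, the value at ũ = f is not zero):

```python
import numpy as np

def kl_divergence(u_tilde, f):
    """Discrete KL data term sum(u - f*log(u)), with 0*log(0) := 0
    and -a*log(0) := inf for a > 0."""
    u_tilde = np.asarray(u_tilde, dtype=float)
    f = np.asarray(f, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = u_tilde - f * np.log(u_tilde)
    terms = np.where((f == 0) & (u_tilde == 0), 0.0, terms)  # 0*log(0) = 0
    return terms.sum()

print(kl_divergence([1.0, 2.0], [1.0, 2.0]))   # 3 - 2*log(2), finite
print(np.isinf(kl_divergence([0.0], [1.0])))   # True: -1*log(0) = inf
```

The infinite value for ũ = 0 where f > 0 is what enforces strictly positive reconstructions wherever counts were observed.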
Regularization. In order to achieve a regularized inversion of the Radon transform corresponding to the forward operator of an electron tomography experiment, we incorporate the Total Generalized Variation (TGV) 32,33 as regularization term. The TGV functional realizes an ℓ1-type penalization of higher order derivatives via a cascadic decomposition of the unknown. With that, it maintains the strength of the popular Total Variation (TV) functional 25 of allowing to reconstruct sharp interfaces, while overcoming its defects, particularly its tendency to foster piecewise constant density distributions (the staircasing effect). TGV can be defined for an arbitrary order of differentiation k ∈ N and for k = 1 coincides with the TV functional. In this work, we employ the second order TGV functional (k = 2) for regularization, which for a single-channel image (or volume) u and parameters α_0, α_1 ∈ (0, ∞) is given as

    TGV²_α(u) = min_w  α_1 ‖ |∇u − w| ‖_1 + α_0 ‖ |Ew|_frob ‖_1,    (10)

where w is a vector field and Ew = ½(∇w + (∇w)^T) denotes the symmetrized Jacobian matrix. The pointwise norms |·| and |·|_frob are the standard Euclidean vector norm and the Frobenius matrix norm, i.e., the root of the sum of squares of all entries of the vector or matrix, respectively. Thus, TGV²_α minimizes the norm of the gradient while allowing the subtraction of a vector field w, penalized via the norm of the symmetrized Jacobian of w. In particular, in areas where u is constant, w can be chosen to be 0, and in areas where ∇u is constant, i.e., u is linear, w can be chosen equal to ∇u, so that such areas do not contribute to the TGV²_α functional. Therefore, TGV²_α promotes piecewise linear solutions in reconstructions, which allows sharp edges to remain intact while admitting linear density gradients, hence maintaining the advantages of TV while overcoming its defects.
A discretization of TGV²_α is defined according to (10) with the discrete ℓ1-norm, the pointwise Frobenius norms of a vector and a matrix given in the standard way, and discretized differential operators. Denoting by U = R^{N1×N2×N3} the space of discrete volumetric densities, the discretized gradient ∇ : U → U³ and symmetrized Jacobian E : U³ → U⁶ are defined for u ∈ U and w ∈ U³, respectively, as

    ∇u = (δ_{1+}u, δ_{2+}u, δ_{3+}u)  and  (Ew)_{ij} = ½ (δ_{j−}w_i + δ_{i−}w_j),  i, j ∈ {1, 2, 3},

with δ_{1+}, δ_{2+}, δ_{3+} : U → U and δ_{1−}, δ_{2−}, δ_{3−} : U → U being the standard forward and backward finite difference operators with respect to the three space dimensions, using pixel replication at the boundary. Note that for Ew we store each off-diagonal entry only once and, to account for that, count off-diagonal entries twice in the evaluation of the Frobenius norm of Ew.
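These discrete operators can be sketched in a few lines (a 2D version of the 3D operators, for brevity; our illustration, not the OpenCL kernels):

```python
import numpy as np

def dplus(u, ax):
    """Forward difference with pixel replication at the upper boundary."""
    pad = [(0, 0)] * u.ndim
    pad[ax] = (0, 1)
    return np.diff(np.pad(u, pad, mode="edge"), axis=ax)

def dminus(u, ax):
    """Backward difference with pixel replication at the lower boundary."""
    pad = [(0, 0)] * u.ndim
    pad[ax] = (1, 0)
    return np.diff(np.pad(u, pad, mode="edge"), axis=ax)

def grad(u):                           # U -> U^2
    return np.stack([dplus(u, 0), dplus(u, 1)])

def sym_jac(w):                        # U^2 -> (E11, E22, E12), E12 stored once
    return np.stack([dminus(w[0], 0),
                     dminus(w[1], 1),
                     0.5 * (dminus(w[0], 1) + dminus(w[1], 0))])

# For a linear ramp u(x, y) = x, choosing w = grad(u) makes E(w) vanish
# away from the boundary -- the mechanism by which TGV^2 leaves ramps unpenalized.
i = np.arange(16.0)
u = i[:, None] + 0.0 * i[None, :]
E = sym_jac(grad(u))
assert np.allclose(E[:, :-1, :], 0.0)
```

The only nonzero entries of E here sit in the last row, an artifact of the replication boundary condition of the forward difference.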
Discrete single-channel model. With the TGV functional in (10), the forward operator in (8) and the discrepancy term as in (9), we formulate the reconstruction of a single-channel volumetric image as the following convex minimization problem:

    min_{u ≥ 0}  µ D_KL(T u, f) + TGV²_α(u).    (11)

Problem (11) balances the discrepancy between the Radon transform T u and the data f against the regularization TGV²_α(u), weighted by the regularization parameter µ > 0.
Discrete multi-channel model. In order to jointly reconstruct density information and elemental maps and to take advantage of the complementary information in these datasets, we propose a joint TGV regularization method for multi-channel data. To this aim, we define an extension of TGV to multi-channel data, as introduced in previous works 34,65 , by regarding multiple images with common resolution as one multi-channel image u = (u_1, u_2, ..., u_C) and setting

    TGV²_α(u) = min_w  α_1 ‖ |∇u − w|_frob ‖_1 + α_0 ‖ |Ew|_frob ‖_1.    (12)

Note that here the operators ∇, E are applied channel-wise; hence ∇u − w is a matrix field, Ew is a tensor field, and we use pointwise Frobenius norms for both of them. This corresponds to an ℓ1/ℓ2-type penalization and promotes joint sparsity of both ∇u − w and Ew, and in particular joint sparsity of the edge sets of all channels. We note that in other works 46 , a similar regularization has been proposed for the joint reconstruction of MR and PET data, where a nuclear-norm based coupling of first order derivatives was carried out. While the authors reported a slight improvement with the nuclear norm, its performance was comparable to a Frobenius-norm based penalization and, for the sake of simplicity, we employ the latter.
With this multi-channel extension of TGV, we propose the following joint reconstruction model for multi-channel data as an extension of (11):

    min_{u ≥ 0}  Σ_{c=1}^{C} µ_c D_KL(T_c u_c, f_c) + TGV²_α(u).    (13)

Here, different data discrepancies are used for the different measured data sets, denoted by f_c, and each µ_c weights data fidelity against regularization for the c-th channel. Solving (13) balances the joint regularity of all channels imposed by TGV²_α with the individual channel discrepancies, thus promoting joint edge sets while reconstructing the individual channels jointly. The operators T_c model the measurement operation for each channel and comprise a Radon transform as described above. In this context, we also note that in our experiments all data sets were acquired and reconstructed with the same resolution. Since acquisition of HAADF data is typically much faster than that of EDXS data, an alternative would be to acquire the HAADF signal with a higher resolution and reconstruct all channels at this high resolution.
This can be realized in our model by including an additional subsampling operator in the EDXS channels after the Radon transform and would allow a joint reconstruction and zooming of EDXS data.

Numerical solution
In this section we present the derivation and implementation of a solution algorithm for (13), of which the single-channel setting (11) is a special case. For further reference, we note that a Python/OpenCL 38,39 implementation of the algorithm, as well as a script reproducing all experiments of the paper, is available online 37 . Due to the non-differentiability of the objective functional, we employ a first order primal-dual algorithm 66 to solve (13). In order to do so, the problem needs to be reformulated as a saddle-point problem to which the primal-dual algorithm is applicable. To this aim, we first note that TGV²_α can be re-written as

    TGV²_α(u) = min_w max_{|p|_frob ≤ α_1, |q|_frob ≤ α_0} (∇u − w, p) + (Ew, q),

with (v, ξ) being the scalar product, i.e., the sum of the pointwise products of all entries of v and ξ. Dualizing also the data terms, we can then rewrite (13) as the saddle-point problem

    min_{(u,w)} max_{(p,q,r)}  (∇u − w, p) + (Ew, q) + Σ_c (T_c u_c, r_c) − F*(p, q, r) + H(u, w).    (14)

Here, the variables p ∈ U^{C×3}, q ∈ U^{C×6} are the dual variables corresponding to TGV²_α, r = (r_1, ..., r_C) ∈ V^C is a dual variable corresponding to the data terms, F* summarizes the conjugate functionals in (14), and H(x) = H(u, w) = I_D(u) encodes the non-negativity constraint u ∈ D = {u | u ≥ 0}. Hence, Problem (13) takes the generic form min_x max_y (Kx, y) − F*(y) + H(x). With the notion of proximal mappings, i.e.,

    prox_{g,γ}(z) = argmin_{z̃}  ‖z − z̃‖²/(2γ) + g(z̃),

the iteration steps of the primal-dual algorithm 66 applied to the saddle-point problem consist of dual ascent steps in (p, q, r), using K, and primal descent steps in (u, w), using the adjoint operator K*. As a direct computation shows, the proximal mapping of F* decouples component-wise: for p and q it is given by the projection

    (proj_ν(r))(x) = r(x) / max(1, |r(x)|_frob / ν),

which corresponds to a projection onto the set {r | |r(x)|_frob ≤ ν for all x}, applied with ν = α_1 and ν = α_0, respectively, and the proximal mapping of the data terms, for dual step size σ, is given as

    (prox_c(r_c))(x) = ½ ( r_c(x) + µ_c − sqrt( (r_c(x) − µ_c)² + 4σ µ_c f_c(x) ) ).

The proximal mapping of H again corresponds to a projection and is given as

    prox_H(u, w) = (max(u, 0), w),

where the maximum is taken point- and component-wise. We also refer to related work 46 for further details on these derivations.
In summary, we see that, due to our particular reformulation, all iteration steps of the primal-dual algorithm can be computed explicitly and fast. Furthermore, convergence of the iterates to a global optimum of (13) can be guaranteed 66 . A more explicit formulation of the algorithm can be found in Algorithm 1 below.
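The two projection-type proximal mappings appearing in these steps can be sketched as follows (a minimal NumPy illustration, not the paper's OpenCL code; the channel/component axis is assumed to be axis 0):

```python
import numpy as np

def proj(r, nu):
    """Project onto {r : |r(x)|_frob <= nu for all x}: rescale each
    pointwise vector whose Frobenius norm exceeds nu back to norm nu."""
    norm = np.sqrt((r ** 2).sum(axis=0, keepdims=True))
    return r / np.maximum(1.0, norm / nu)

def prox_nonneg(u):
    """Proximal mapping of the non-negativity constraint: pointwise max with 0."""
    return np.maximum(u, 0.0)

r = np.array([[3.0, 0.1],
              [4.0, 0.2]])            # two pointwise 2-vectors: (3,4) and (0.1,0.2)
p = proj(r, 1.0)
# (3,4) has norm 5 and is rescaled to norm 1; (0.1,0.2) is left untouched.
assert np.allclose(np.sqrt((p ** 2).sum(axis=0)), [1.0, np.hypot(0.1, 0.2)])
assert (prox_nonneg(np.array([-1.0, 2.0])) == [0.0, 2.0]).all()
```

Both mappings are simple pointwise operations, which is why each iteration of the algorithm parallelizes so well on a GPU.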

Algorithm 1 TGV regularized ET reconstruction
1: function TGV-ET-reconstruction(f, α, µ)
2:   (T, f) ← normalize(T, f)
3:   Choose σ > 0 and τ > 0 such that στ‖K‖² < 1
4:   Initialize u, w, p, q, r ← 0, (ū, w̄) ← (u, w)
5:   repeat
6:     p ← proj_{α1}(p + σ(∇ū − w̄))
7:     q ← proj_{α0}(q + σEw̄)
8:     r_c ← prox_c(r_c + σT_c ū_c) for c = 1, ..., C
9:     u⁺ ← max(u − τ(∇*p + T*r), 0)
10:    w⁺ ← w − τ(E*q − p)
11:    (ū, w̄) ← 2(u⁺, w⁺) − (u, w)
12:    (u, w) ← (u⁺, w⁺)
13:  until stopping criterion fulfilled
14:  return renormalize(u)

There, the normalization of the operator T and the corresponding data f in line two aims to obtain an operator with norm approximately 1 and data on similar scales, as we experienced this to improve convergence speed. In order to do so, we estimate the norm of T and rescale accordingly.

In order to further accelerate the computations, we developed an implementation in Python 38 via OpenCL 39 for multi-core CPUs and Graphics Processing Units (GPUs), which allows for highly parallel execution of all major computations. This is of particular relevance for the forward and backward projections, which are the major contributors to the computational effort of Algorithm 1 and can be executed completely in parallel, leading to significant speedups.

Parameter choice. Finally, we remark on our parameter choices for the proposed approach.
For the TGV parameters α = (α 0 , α 1 ), we note that only the ratio α 0 /α 1 is relevant and balances between first and second order smoothness. Here, the choice α = (4, 1) is known to be rather robust and yield reasonable results, hence we used this choice for all experiments of the paper.
The choice of the regularization weights, however, is more challenging and can significantly impact reconstructions. We used visual comparison of results to identify suitable weights µ = (µ_1, ..., µ_C). Fig. 5 illustrates the principle: low regularization leaves noise in the reconstruction, which progressively decreases with more regularization (i.e., smaller µ), while features and edges conversely get more blurred. Hence, as a rule of thumb, one tries to find the lowest amount of regularization that yields noise-free reconstructions. The proposed method is stable with respect to the parameter choice; small changes of the parameters have barely any effect on the reconstruction, and consequently a reasonably fine search for parameters suffices. For this visual comparison, reconstructions with a rather low number of iterations are sufficient to identify noise or blurring, and the parameters can be adjusted accordingly.
For the presented experimental data, the HAADF data contains far less noise than the EDXS data; for this reason we could observe that the choice of regularization parameters for the EDXS data does not have a strong impact on the HAADF reconstruction. Therefore, the regularization parameter for the HAADF data was fixed first, to µ_HAADF = 0.1 (see Fig. 5a for a comparison of different parameters).
For the choice of µ_c for the EDXS channels, we recall that these values should roughly correspond to the (unknown) noise levels present in each channel, and that channels with a similar amount of noise, e.g., with similar electron counts, should be handled with similar parameters. Motivated by this observation, a heuristic we employ to reduce the number of parameters is to choose only one parameter ν and set µ_i = νξ_i, where ξ_i = max(f_i).
Note that an alternative choice is ξ_i = Σ_x f_i(x), and for further improvement one could also use these heuristics merely to quickly identify a range in which reasonable parameters lie. In our case, setting ξ_i = max(f_i) gave a ratio of µ_Al : µ_Si : µ_Yb = 2.4 : 1.4 : 1 and worked reasonably well. A good choice of ν by visual inspection finally led to the parameters for the EDXS maps (µ_Al, µ_Si, µ_Yb) = (0.0024, 0.0014, 0.001). A comparison of different settings for these parameters is given in Fig. 5b. For the parameters used in the reconstruction of the phantom data shown in Fig. 2, which were likewise determined by visual comparison, we refer to Supporting Table 1.
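The single-parameter heuristic µ_i = ν · max(f_i) can be sketched as follows (the sinogram values below are random placeholders, not the experimental data; only the relative scale factors 2.4 : 1.4 : 1 mirror the ratio reported above):

```python
import numpy as np

def channel_weights(nu, sinograms):
    """One global parameter nu, per-channel weights mu_i = nu * max(f_i)."""
    return {name: nu * f.max() for name, f in sinograms.items()}

rng = np.random.default_rng(2)
sinos = {"Al": 2.4 * rng.random((8, 8)),   # placeholder count data
         "Si": 1.4 * rng.random((8, 8)),
         "Yb": 1.0 * rng.random((8, 8))}
mu = channel_weights(0.001, sinos)

# Channels with higher peak counts receive larger mu, i.e. their data
# term is trusted more relative to the joint regularization.
assert mu["Al"] > mu["Si"] > mu["Yb"]
```

This reduces the search over C regularization weights to a one-dimensional search over ν, after which individual weights can still be fine-tuned visually.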