The causality principle in the reconstruction of sparse NMR spectra†

The invention of multidimensional magnetic resonance (MR) experiments 40 years ago led to the success of modern MRI and NMR spectroscopy in medicine, chemistry, molecular structural biology, and other fields. This approach, however, has an important weakness: the detailed site-specific information and ultimate resolution obtained in two- and higher-dimensional experiments are contingent on the lengthy data collection required for systematic uniform sampling of the large multidimensional space spanned by the indirectly detected spectral dimensions [1]. A fundamental solution to this problem stems from the observation that upon an appropriate transform, e.g. from the NMR time domain to the frequency domain, the MR signal becomes nearly black, or sparse, i.e. essentially zero at the vast majority of points and thus largely redundant. Darkness of MR images and NMR spectra is key to the remarkable success and rapid development of non-uniform sampling (NUS) methods [1-7]. The darker an object is, the fewer experimental measurements are needed for its recovery [8]. The transform that brings data into a dark representation is called a sparsifying transform. In NMR, the Fourier transform connects the complex free induction decay (FID) signal in the time domain with the frequency spectrum. A properly phased spectrum consists of the real absorption part used for the analysis and the redundant imaginary dispersion part. Since an absorption signal is much narrower than the dispersion signal, the latter contributes the most to the total spectrum brightness. The main result of this paper is that for most of the currently used algorithms, e.g. compressed sensing [4,5], SIFT [9], maximum entropy [2,7], MINT [6], etc., it is the dispersion part that sets the lower limit on the amount of measured data required for high-quality spectrum reconstruction from the NUS signal.
We show that the causality property of the NMR signal can be used to construct a sparsifying transform, which eliminates the spectral dispersion part and, thus, allows spectrum reconstruction with better fidelity and from fewer measurements. In NMR, the causality reflects the fact that the FID signal is only observed after excitation of the spin system, e.g. by a radiofrequency pulse, and is zero before the excitation.
It is well known that the Fourier transform of a causal time signal S(t) leads to a spectrum whose real and imaginary parts can be produced from each other using the Kramers-Kronig relations, also known as the Hilbert transform [10]. The Kramers-Kronig relations are illustrated in Fig. 1. The signal S_FID(t) (Fig. 1a) and the corresponding spectrum in Fig. 1b are related via the Fourier transform. The spectrum in Fig. 1d is produced from the one in Fig. 1b by zeroing its imaginary part. The inverse Fourier transform of the real spectrum in panel d gives a complex time-domain signal (Fig. 1c), whose real and imaginary parts are essentially the even and odd parts of the real and imaginary components of the FID (Fig. 1a), respectively. Thus, the signal in Fig. 1c can also be produced by time reversal and complex conjugation of the FID:

$$S_{\mathrm{VE}}(t) = \begin{cases} S_{\mathrm{FID}}(t), & t \ge 0 \\ S_{\mathrm{FID}}^{*}(-t), & t < 0 \end{cases} \qquad (1)$$
In the following, we call the S_VE(t) signal in eqn (1) the virtual echo (VE). The original signal S_FID(t) can be obtained from S_VE(t) by zeroing the signal at negative times. The direct transition from panel d to panel b in Fig. 1 is done by the Hilbert transform. In practice, the Hilbert transform algorithm takes the detour dcab (Fig. 1) in order to use the computationally efficient fast Fourier transform.
The spectrum (Fig. 1d) obtained from the VE representation (Fig. 1c) consists of a conventional-looking real part and a zero imaginary part. Depending on the signal phase, the real part can contain absorption, dispersion, or a mixture of both modes. Given the phase a priori, eqn (1) allows us to obtain the time-domain signal corresponding to the pure absorption spectrum and, thus, to construct a sparsifying transform that produces a significantly darker spectrum than the traditional Fourier transform of the original FID.
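The relations between the panels of Fig. 1 can be checked numerically. The following is a minimal sketch, assuming a discretely sampled, zero-phase, single-line FID; the helper names `virtual_echo` and `fid_from_virtual_echo` and all signal parameters are illustrative and do not come from the original work.

```python
import numpy as np

def virtual_echo(fid):
    """Build the virtual echo in FFT ordering: t >= 0 first, then t < 0."""
    fid = np.asarray(fid, dtype=complex)
    # Negative-time half: conjugated FID, time-reversed, without the t = 0 point.
    return np.concatenate([fid, np.conj(fid[:0:-1])])

def fid_from_virtual_echo(ve):
    """Recover the (zero-padded) FID by zeroing the negative-time half."""
    n = (ve.size + 1) // 2
    out = ve.copy()
    out[n:] = 0.0
    return out

# Hypothetical test signal: one decaying line with zero phase (fid[0] is real).
n, dt = 256, 1e-3
t = np.arange(n) * dt
fid = np.exp(2j * np.pi * 100.0 * t - t / 0.05)

ve = virtual_echo(fid)                  # panel c
spec_ve = np.fft.fft(ve)                # panel d: real spectrum
spec_fid = np.fft.fft(fid, n=ve.size)   # panel b: complex spectrum of the FID

# Detour d -> c -> a -> b: inverse FFT, zero negative times, forward FFT.
spec_back = np.fft.fft(fid_from_virtual_echo(np.fft.ifft(spec_ve)))

print("max |Im| of VE spectrum:", np.max(np.abs(spec_ve.imag)))
print("detour reproduces FID spectrum:", np.allclose(spec_back, spec_fid))
```

In this discrete setting the VE spectrum equals twice the real part of the zero-padded FID spectrum minus the first FID point, which is why its imaginary part vanishes to within rounding error.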
Obtaining an NMR spectrum from a time-domain signal is a typical example of a mathematical inverse problem. When all data points in the signal are present, the solution of the problem is trivial and is given by the Discrete Fourier Transform (DFT). In the case of NUS, most of the data in the time-domain signal are missing and the unconstrained inverse problem has an infinite number of solutions (i.e. spectra). A unique and "correct" spectrum is obtained by introducing additional assumptions such as minimal power, maximum entropy, maximal sparseness, etc. The VE representation is equally applicable to traditional fully sampled and NUS signals. When the former is processed using the DFT, the FID and VE representations lead to equivalent spectra, as illustrated in Fig. 1. However, when reconstructing spectra from the NUS signal, and in some other cases [11], use of the Kramers-Kronig relations, namely path acd in Fig. 1, represents a significant advantage over the traditional processing, which is abd. Fig. 2 demonstrates the benefits of the VE signal for two modern spectrum recovery algorithms used for the NUS signal: Spectroscopy by Integration of Frequency and Time Domain (SIFT) [9] and Compressed Sensing by Iterative Reweighted Least Squares (CS-IRLS) [4,12]. Similar results for the alternative CS algorithm, Iterative Soft Thresholding (CS-IST) [4,13,14], are presented in Fig. S3 (ESI†). Both CS algorithms and SIFT can be applied without modifications to either the traditional FID or the VE signal. With SIFT making use of prior knowledge about the positions of dark regions in a spectrum, and CS searching for the darkest among all possible spectra consistent with the measured data, both methods are expected to benefit from the darker representation of the spectrum provided by the VE.
For a given number of NUS measurements, the quality of the SIFT reconstruction improves when a larger fraction of the spectrum area is free from signals and contains only baseline noise. In our calculations, the signal-free area is defined by a mask, which excludes rectangles of defined size around all peaks in the spectrum. This corresponds, for example, to a setup in relaxation and kinetics studies [15], where the peak positions are known and only their intensities or integrals need to be determined. Fig. 2a and b show reconstructions of a 2D ¹H-¹⁵N HSQC spectrum of human alpha-synuclein obtained using only 15% of the data from the full experiment.

[Fig. 2 caption, partially recovered: The residuals are defined as the RMSD of the difference between the reference spectrum and the corresponding CS-IRLS reconstruction, measured over the signal regions (±50 Hz in all spectral dimensions around every peak in a complete, manually verified peak list). As the reference we use the 6% NUS HNCO spectrum averaged over the reconstructions obtained with and without VE.]
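The role of the mask can be illustrated with a simplified one-dimensional iteration in the spirit of SIFT. This is a sketch under stated assumptions, not the published implementation: it alternates between zeroing the known dark regions in the frequency domain and restoring the measured points in the time domain. The function names, the exactly sparse test signal, and the sampling schedule are all hypothetical.

```python
import numpy as np

def peak_mask(n_points, peak_indices, half_width):
    """Boolean mask: True around known peaks, False in the 'dark' regions."""
    mask = np.zeros(n_points, dtype=bool)
    for p in peak_indices:
        idx = np.arange(p - half_width, p + half_width + 1) % n_points
        mask[idx] = True
    return mask

def sift_like(measured, sampled, signal_mask, n_iter=200):
    """Alternate between dark-region zeroing and restoring measured points."""
    x = np.where(sampled, measured, 0.0)       # zero-filled starting point
    for _ in range(n_iter):
        spec = np.fft.fft(x)
        spec[~signal_mask] = 0.0               # enforce known dark regions
        x = np.fft.ifft(spec)
        x[sampled] = measured[sampled]         # keep the measured data fixed
    return x

# Hypothetical example: three on-grid lines, 25% random sampling.
rng = np.random.default_rng(0)
n = 128
k = np.arange(n)
peaks = [20, 57, 90]
truth = sum(np.exp(2j * np.pi * p * k / n) for p in peaks)

sampled = np.zeros(n, dtype=bool)
sampled[rng.choice(n, size=32, replace=False)] = True

mask = peak_mask(n, peaks, half_width=2)
recon = sift_like(truth, sampled, mask)
initial = np.where(sampled, truth, 0.0)

print("error, zero-filled :", np.linalg.norm(initial - truth))
print("error, SIFT-like   :", np.linalg.norm(recon - truth))
```

The smaller the region marked True in the mask, the stronger the constraint imposed in each iteration, which is the sense in which a darker spectrum helps this kind of reconstruction.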
By avoiding broad dispersion peaks, the VE signal ensures that a larger fraction of the spectrum is "dark", and thus SIFT produces a much better spectrum (Fig. 2b and Fig. S4, ESI†) and more accurate peak intensities in comparison to the reconstruction from the original FID (Fig. 2e and Fig. S5, ESI†). Fig. 2e (inset) illustrates that prior information about the signal phase does not have to be exact. For the SIFT example, the peak intensities in the VE reconstruction obtained with a phase left uncorrected by up to 15° are still better reproduced than those measured in the spectrum calculated for the traditional FID representation. Similar behaviour is also observed for the CS algorithms. For most multidimensional experiments, the zero-order phases for the indirect spectral dimensions are known and thus can be corrected in the time domain to values close to zero prior to the spectrum reconstruction.
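A zero-order phase correction in the time domain amounts to multiplying the signal by a constant phase factor before the VE is constructed. The snippet below is a minimal sketch with a hypothetical helper name and an arbitrary 30° phase chosen only for illustration.

```python
import numpy as np

def correct_zero_order_phase(fid, phase_deg):
    """Remove a known zero-order phase (in degrees) from a time-domain signal."""
    return np.asarray(fid, dtype=complex) * np.exp(-1j * np.deg2rad(phase_deg))

# Hypothetical signal: a zero-phase line and the same line with a 30 degree phase.
t = np.arange(128) * 1e-3
ideal = np.exp(2j * np.pi * 80.0 * t - t / 0.04)
measured = ideal * np.exp(1j * np.deg2rad(30.0))

corrected = correct_zero_order_phase(measured, 30.0)
print("first point after correction:", corrected[0])
```

After the correction the first point of the signal is real, which is the condition under which the VE spectrum is purely absorptive.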
Similarly to SIFT, CS also assumes that the major part of a spectrum is dark. However, no assumption is made about the exact location of the dark regions, which creates an apparently unsolvable combinatorial problem. Yet it has recently been reformulated as a relatively simple task of minimizing the spectral ℓ_p-norm (0 < p ≤ 1) [16]:

$$\min_{F} \|F\|_{\ell_p} \quad \text{subject to} \quad AF = S,$$

where F and S are the frequency spectrum and the time-domain signal, respectively; A is the matrix derived from the inverse Fourier transform matrix; and the ℓ_p-norm is defined as

$$\|F\|_{\ell_p} = \Big(\sum_{k} |F_k|^{p}\Big)^{1/p}.$$

In the present paper, p = 1 is used for the IST algorithm [13], and an ℓ_p-norm with p iteratively approaching 0 for the IRLS algorithm [4,17]. The use of the CS method in NMR spectroscopy has recently been discussed by many authors [4,5,18,19], with important conclusions on its limited applicability to non-random sampling [20] and the superior performance of non-convex ℓ_p-norms (p < 1) [19,21]. Here we apply the CS-IRLS algorithm [4] to reconstruct a 3D HNCO spectrum sampled at the level of 0.7%, without VE (Fig. 2c) and with VE in both indirect dimensions (Fig. 2d). It can be seen that VE improves the reconstruction significantly by providing better line shapes, more accurate peak intensities (Fig. 2f), and revealing low-intensity signals. Fig. S3 (ESI†) shows a notable improvement for the 2D ¹H-¹⁵N HSQC spectrum of the intrinsically disordered protein alpha-synuclein processed with CS-IST.
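For p = 1, the minimization can be approached by iterative soft thresholding. The sketch below is a textbook fixed-threshold variant that minimizes a penalized form, 0.5·||AF − S||² + τ·||F||₁, rather than the constrained problem; it is not the CS-IST implementation used for the figures. The function names, the threshold `tau`, and the test signal are assumptions made for illustration.

```python
import numpy as np

def lp_norm(x, p):
    """l_p-norm (sum |x_k|^p)^(1/p); a quasi-norm for 0 < p < 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def soft_threshold(x, tau):
    """Shrink complex values towards zero by tau, keeping their phase."""
    mag = np.abs(x)
    scale = np.maximum(mag - tau, 0.0) / np.maximum(mag, np.finfo(float).tiny)
    return x * scale

def ista(measured, sampled, tau, n_iter=300):
    """Iterative soft thresholding with A = (sampling) x (unitary inverse FFT)."""
    spectrum = np.zeros(sampled.size, dtype=complex)
    for _ in range(n_iter):
        model = np.fft.ifft(spectrum, norm="ortho")
        residual = np.where(sampled, measured - model, 0.0)
        update = spectrum + np.fft.fft(residual, norm="ortho")
        spectrum = soft_threshold(update, tau)
    return spectrum

def objective(spectrum, measured, sampled, tau):
    """Penalized least-squares objective evaluated on the sampled points."""
    r = (np.fft.ifft(spectrum, norm="ortho") - measured)[sampled]
    return 0.5 * np.sum(np.abs(r) ** 2) + tau * lp_norm(spectrum, 1)

# Hypothetical example: three on-grid lines, 25% random sampling.
rng = np.random.default_rng(1)
n, tau = 128, 0.1
k = np.arange(n)
truth = sum(np.exp(2j * np.pi * p * k / n) for p in (20, 57, 90))
sampled = np.zeros(n, dtype=bool)
sampled[rng.choice(n, size=32, replace=False)] = True

recon = ista(truth, sampled, tau)
print("objective at zero spectrum:", objective(np.zeros(n, complex), truth, sampled, tau))
print("objective after ISTA      :", objective(recon, truth, sampled, tau))
```

Because the unitary FFT scaling keeps the operator norm of A at or below one, a unit step size is admissible here and the penalized objective does not increase between iterations.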
The effect can be explained using the basic CS theorem, which links the number of properly reconstructed spectral points, essentially a measure of spectrum darkness, to the sampling level [16]. With the VE, fewer points contribute to each peak in the spectrum, and thus a relatively low sampling level is sufficient to fulfil the condition for successful CS reconstruction. It should be emphasized that the striking advantage of the VE demonstrated in Fig. 2 and Fig. S3-S5 (ESI†) is mostly due to the very low sampling level. Without the VE, high-quality reconstructions by CS and SIFT are also possible, but require at least twice as many sampling points for the presented spectra (inset in Fig. 2f and Fig. S4, ESI†).
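The difference in darkness between the two representations can be estimated for a single line. The sketch below uses the ℓ1-norm of the spectrum divided by its peak height as a rough proxy for brightness; this proxy, the helper names, and the signal parameters are assumptions made for illustration and are not the measure used in the CS theorem.

```python
import numpy as np

def virtual_echo(fid):
    """Virtual echo in FFT ordering: t >= 0 first, then conjugated t < 0 half."""
    fid = np.asarray(fid, dtype=complex)
    return np.concatenate([fid, np.conj(fid[:0:-1])])

def relative_l1(spectrum):
    """l1-norm of the spectrum in units of its highest point."""
    mag = np.abs(spectrum)
    return mag.sum() / mag.max()

# Hypothetical single decaying line with zero phase.
n, dt, t2, freq = 512, 1e-3, 0.05, 100.0
t = np.arange(n) * dt
fid = np.exp(2j * np.pi * freq * t - t / t2)
ve = virtual_echo(fid)

bright_fid = relative_l1(np.fft.fft(fid, n=ve.size))  # absorption + dispersion
bright_ve = relative_l1(np.fft.fft(ve))               # absorption only

print("relative l1, FID spectrum:", bright_fid)
print("relative l1, VE spectrum :", bright_ve)
```

For this line the complex FID spectrum is several times brighter than the VE spectrum, because the dispersion tails decay much more slowly with frequency offset than the absorption line.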
As pointed out by Donoho et al. [8], there is an unambiguous relationship between the darkness of the NMR spectrum and the quality of the spectral reconstruction by maximum entropy or ℓ_1-norm minimisation. It is therefore likely that most of the related methods, including FM reconstruction [22], MINT [6], hmsIST [14], QME [7], etc., will also benefit from the VE signal.
We show that the causality property of the NMR signal can be exploited to dramatically enhance the performance of CS, SIFT, and probably many other algorithms commonly used for the reconstruction of NUS spectra. Our findings open a way to a significant reduction in measurement time and an improvement in the quality of NUS spectra, and thus should increase the power and appeal of multidimensional NMR spectroscopy in a multitude of its existing and future applications. The method is particularly useful for short-lived systems, time-resolved measurements, and high-dimensional experiments on intrinsically disordered proteins.