Anomalous diﬀusion models and their properties: non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking

Modern microscopic techniques following the stochastic motion of labelled tracer particles have uncovered significant deviations from the laws of Brownian motion in a variety of animate and inanimate systems. Such anomalous diﬀusion can have diﬀerent physical origins, which can be identified from careful data analysis. In particular, single particle tracking provides the entire trajectory of the traced particle, which allows one to evaluate diﬀerent observables to quantify the dynamics of the system under observation. We here provide an extensive overview over diﬀerent popular anomalous diﬀusion models and their properties. We pay special attention to their ergodic properties, highlighting the fact that in several of these models the long time averaged mean squared displacement shows a distinct disparity to the regular, ensemble averaged mean squared displacement. In these cases, data obtained from time averages cannot be interpreted by the standard theoretical results for the ensemble averages. Here we therefore provide a comparison of the main properties of the time averaged mean squared displacement and its statistical behaviour in terms of the scatter of the amplitudes between the time averages obtained from diﬀerent trajectories. We especially demonstrate how anomalous dynamics may be identified for systems, which, on first sight, appear to be Brownian. Moreover, we discuss the ergodicity breaking parameters for the diﬀerent anomalous stochastic processes and showcase the physical origins for the various behaviours. This Perspective is intended as a guidebook for both experimentalists and theorists working on systems, which exhibit anomalous diﬀusion.


Introduction and historical perspective
It is possible that the movements to be discussed here are identical with the so-called ''Brownian molecular motion''; however, the information available to me regarding the latter is so lacking in precision, that I can form no judgment in the matter. This statement is part of the introduction of Albert Einstein's first and seminal 1905 paper on the theory of diffusion. 1 It refers to the observations reported by Robert Brown in 1828 of small granules (or Molecules, as I shall term them) of 1 4000 th to 1 5000 th of an inch extracted from larger pollen grains. Brown found these particles evidently in motion. 2 He made meticulously sure that the motion he observed was not the effect of living matter, and he even studied the motion of such molecules as of a bruised fragment of the Sphinx. 2 In his second, 1906 paper Einstein then quotes the experimental proof by Gouy 3 that indeed the motion perpetuated by Robert Brown is caused by the irregular thermal movements of the molecules of the liquid, and thus described by Einstein's theory. 1 As remarked by Marian Smoluchowski in his equally seminal 1906 article, 4 Einstein reinvigorated the interest in Brownian motion. Since then, the interest in the molecular phenomenon of diffusion is unbroken.
Considering the local concentration difference and the counteracting flux of microscopic particles with a typical mean free path, Einstein derived the diffusion equation † 1 @ @t Pðx; tÞ ¼ K 1 @ 2 @x 2 Pðx; tÞ (1) with the coefficient of diffusion K 1 for the probability density function (PDF) P(x,t) to find the particle under observation at position x at time t. This equation is indeed equivalent to Fick's second law for the concentration of a chemical substance originally presented by Adolf Fick from a combination of the continuity equation and the constitutive equation (Fick's first law). 5 If the particle is released at the origin at time t = 0 in an unbounded space, the solution of the diffusion equation (1) is the normalised Gaussian PDF Einstein remarks that this solution is that of the fortuitous error, which was to be expected. 1 From the PDF (2) we immediately obtain the variance the so-called mean squared displacement (MSD). From the dynamic equilibrium of suspended particles Einstein (and later independently Smoluchowski) derived the celebrated relation between the diffusion coefficient, thermal energy k B T, the mass m of the observed particle, and the friction coefficient Z of unit 1 s À1 . In the second equality of eqn (4) we replaced the Boltzmann constant k B by the ratio of the gas constant R, quite precisely known at the time, and Avogadro's number N A .

Ralf Metzler
Ralf Metzler studied physics at Ulm University, where he also received his doctorate. As a postdoc he worked at Tel Aviv University and MIT. After his first faculty appointment at NORDITA in Copenhagen he moved to the University of Ottawa as Canada Research Chair in Biological Physics. In 2007 he became a professor at Technical University of Munich, since 2011 he has been the chair of Theoretical Physics at Potsdam University. Ralf works on stochastic processes, non-equilibrium statistical physics, crowded soft matter systems, and biological physics of gene regulation. Ralf received the Amos de Shalit Minerva, Feodor Lynen Humboldt, and Emmy Noether DFG fellowships. He is currently also Finland Distinguished Professor at Tampere University of Technology.

Jae-Hyung Jeon
Jae-Hyung Jeon received his PhD from Pohang University of Technology and Science, Korea. He then moved for postdoctoral studies to the Technical University of Munich, Germany and later to Tampere University of Technology in Finland. At present he is an assistant professor of statistical physics at the Korea Institute for Advanced Study in Seoul. His research interests include diffusion and transport processes in living cells and complex fluids, stochastic theory of anomalous diffusion models, DNA and membrane biophysics, as well as polymer physics.

Andrey G. Cherstvy
Andrey G. Cherstvy received his PhD in theoretical biophysics from Düsseldorf University in 2002. After several PostDoc terms, he currently conducts an independent DFG-funded research at Potsdam University. Andrey Cherstvy published about 50 papers, with the main focus on theoretical and computational biophysics, statistical mechanics, DNA, proteins, polyelectrolytes, membranes, and viruses. Recently, he has also become active in the fields of anomalous stochastic processes, tracer diffusion in biological cells, and ergodicity breaking.

Eli Barkai
Eli Barkai graduated from Tel Aviv University and currently is a professor of physics at Bar-Ilan university in Israel. In his postdoc studies with Bob Silbey he developed the theory of single molecule spectroscopy, which is also the subject of the current manuscript. His main research is focused on statistical physics, in particular weak ergodicity breaking and infinite ergodic theory. He together with Ralf Metzler and Yossi Klafter contributed to the field of anomalous diffusion in the development of fractional kinetic equations. He received the Yeshiahu Horowitz Science of complexity fellowship, the Krill prize for excellence in scientific research selected by the Wolf foundation, the Michael Bruno memorial award funded by Yad Hanadiv, and the Friedrich Wilhem Bessel award selected by the Humboldt foundation.
Yet another derivation of Brownian motion with the MSD (3) was published in 1908 by Paul Langevin using the concept of a stochastic force. The Langevin equation combines Newton's second law with the white Gaussian noise x(t) of zero mean and autocorrelation function hx(t)x(t 0 )i = 2K 1 d(t À t 0 ). 6,7 In its overdamped form relevant for the single particle tracking experiments we will refer to below, it reads 6,7 dxðtÞ dt ¼ xðtÞ: The Langevin formalism represents a very intuitive physical picture for Brownian motion. From the Langevin eqn (5) it is easy to get back to the MSD (3). Likely prompted by discussions with his friend Paul Langevin, 8 the fundamental  led Jean Perrin at the Sorbonne in Paris to conduct the first extensive and systematic measurement of the diffusion of single microscopic particles to determine Avogadro's number N A . ‡ 9 While Perrin was confined to short measured trajectories and the need to use ensemble averages over not perfectly identical particles, to the best of our knowledge it was Ivar Nordlund at the University of Uppsala in Northern Sweden, who in 1914 came up with the innovative idea to record the motion of individual sedimenting mercury droplets on a moving film plate, 13 see Fig. 1. Nordlund managed to produce impressively long individual time series of the droplet position. From separate analysis of each single trajectory he determined the diffusion coefficients of the traced droplets. The mass of the droplets was deduced from the sedimentation speed using Stokes' formula. 13 In the sense of the combination of single particle tracking with the time series analysis of single recorded trajectories first performed by Nordlund, we celebrate this year the centenary of modern single particle tracking.
Nordlund's experimental setup, the MSD from a single trajectory, as well as a sample trajectory are shown in Fig. 1 and 2. Nordlund advocated in his paper that the principle of the method of measurement consists in the automated recording of the Brownian displacements of the particles in exactly identical time intervals, free of personal errors. 13 Perrin's and Nordlund's studies prompted a string of diffusion experiments to determine the value of N A ever more precisely in the years to come, culminating in the high precision torsional diffusion experiments by Eugen Kappler, who in his PhD thesis at the University of Munich found the remarkable result N A = 60.59 Â 10 22 AE 1%. § 14 We show the experimental shape of the torsional Brownian motion measured by Kappler in Fig. 3.
Hundred years after Nordlund's conception of single particle tracking by the analysis of individual particle traces, modern microscopic technology is routinely used by experimentalists to record the motion of fluorescently labelled single molecules or submicron tracer particles. ¶ [15][16][17][18] In these experiments the recorded time series x(t 0 ) is evaluated in terms of the time averaged MSD8 The time series x(t 0 ) of length t (the measurement time) is thus evaluated in terms of squared differences of the particle position Top: Nordlund's experimental setup with the light source, an infrared absorbing water-filled cylinder, and the clock-controlled, electromagnetic shutter (on the table to the left) constituting a stroboscope. Mounted on the separate optical table on the rack to the right, the object chamber with the mercury droplet, as well as the objective and the camera are the heart of the experiment: the camera is connected to an electric motor moving the photographic plate with constant velocity. Bottom: example for the time averaged mean squared displacement versus time (in seconds) from a single recorded mercury droplet. Images taken from ref. 13. ‡ For completeness we mention that theories of Brownian motion appeared earlier than Einstein's studies. In particular, the Dane Thorvald Thiele set up a theory for independent and normally distributed increments in his 1880 work on the least squares method. 10 Louis Bachelier in Paris applied a stochastic process to model the dynamics of stock markets in 1900. 11 Concurrently with Einstein, the Melbourne physicist William Sutherland followed similar lines to Einstein in his 1905 paper on diffusion. 12 § Immerhin dürfte die Bestimmung der Loschmidtschen Zahl mit dieser Methode auf AE1 Proz. erreicht sein.-After all, with this method the determination of the Loschmidt [Avogadro] number should be achieved within AE1 per cent. 14 ¶ We note that the concept of motion analysis of synthetic active matter is also a topic of high current interest. 19 8 Usually, for data analysis a discrete version is used in which the integral is replaced by a sum. We here use the equivalent continuous notation. In what follows, we denote ensemble averages of an observable O with angular brackets, hOi, and time averages with an overline, O. Note that the definition (6) is not unique, however, it represents the most standard choice used in the literature. separated by the so-called lag time D which defines the width of the window slid along the time series x(t 0 ). Typically, d 2 ðDÞ is considered in the limit D { t to obtain good statistics. It is easy to show that for Brownian motion d 2 ðDÞ ¼ 2K 1 D as long as the measurement is sufficiently long. Compared with eqn (3) we observe the equivalence and therefore call the process ergodic: ensemble averages and long-time averages are equivalent in the limit of long measurement times. Apart from direct imaging of the motion of a tracer particle using a microscope, optical tweezer setups can be used to obtain an improved temporal and spatial resolution of a suitable tracer. 20 Such single particle tracking can be used to measure the motion of tracers in quite complex media such as living biological cells. 17,18 Alternatively to single particle tracking, which provides the time series x(t 0 ) of the particle position, the diffusion of labelled molecules can be measured by methods such as fluorescence correlation spectroscopy (FCS), 21 fluorescence recovery after photobleaching (FRAP), 22 or fluorescence (Förster) resonant energy transfer (FRET). 17,23 While these latter methods have many advantages-for instance, that they can measure the motion of smaller tracers-they have the intrinsic disadvantage that the quantity they measure is not the particle position but averages over the position such as the blinking correlation function of fluorescent particles entering and leaving the illuminated focal spot in FCS. Due to the underlying averaging, these latter methods thus do not provide the same full information as direct single particle tracking. We note that the MSD of stochastic systems may also be determined with techniques such as dynamic light scattering 24 or laser Doppler velocimetry. 25 Already in 1926 an exception to the linear time dependence (3) of the MSD of Brownian motion was analysed by Lewis Fry Richardson. For the relative diffusion of two tracer particles in a turbulent flow he observed strongly non-Brownian behaviour. 26 He introduced the notion of non-Fickian diffusion and used a diffusion equation with separation dependent diffusivity @q @t ¼ const Â @ @l l 4=3 @q @l ! for the PDF q(l,t) of the relative displacement l to find the power-law MSD hl 2 (t)i C t 3 with the characteristic Fig. 2 Example for the recorded motion of a 'submicroscopic' mercury droplet using the clock-driven stroboscope and a moving film plate in the setup shown in Fig. 1. The mass of the droplet could be determined from the droplet radius deduced from the sedimentation speed by use of Stokes' formula. 13 Time is increasing from left to right. The stochastic, Brownian motion around the deterministic sedimentation with constant velocity can be clearly distinguished. Image taken from ref. 13. cubic scaling. Today anomalous diffusion typically refers to the power-law form** of the MSD with the anomalous diffusion exponent a and the generalised diffusion coefficient K a of physical dimension cm 2 (s a ) À1 . This is what we refer to in the following, distinguishing subdiffusion (0 o a o 1) and superdiffusion (a 4 1). The conditions assumed by Einstein in his derivation of the diffusion equation are (i) the independence of individual particles, (ii) the existence of a sufficiently small time scale beyond which individual displacements are statistically independent, and (iii) the property that the particle displacements during this time scale correspond to a typical mean free path distributed symmetrically in positive or negative directions. These assumptions, by the help of the central limit theorem, a forteriori lead to the Gaussian PDF (2) and thus to the diffusion equation (1). The model described by Einstein may therefore be viewed as a random walk or drunkard's walk, a concept introduced in the same year 1905 by Karl Pearson in his famed letter to Nature. 30 The connection of the diffusion law to the random walk process was rendered more precisely by Smoluchowski. 4 In anomalous diffusion processes, at least one of these fundamental assumptions is violated, and the strong convergence to the Gaussian according to the central limit theorem broken. In particular, by departing from one or more of the assumptions (i)-(iii), we find that there exist many different generalisations of the Einstein-Smoluchowski diffusion picture. Here we examine the properties of several popular and widely used anomalous diffusion models which are important for the evaluation and physical interpretation of single particle tracking data. These models are conceptually very different from each other, their only common ground being the non-Brownian form (8) of the MSD. We pay particular emphasis on their ergodic behaviour, that is, whether within the model the ensemble averaged MSD (8) has the same form as the time averaged MSD (6) or not, an important information for the evaluation of single particle tracking time series in terms of physical theories, which are usually formulated in terms of ensemble averages. We also check the ageing properties of the processes, that is, the potential dependence of physical observables such as the MSD on the time span between initialisation of the system and the start of the measurement. Both ergodicity breaking and ageing of a process are two sides of the same medal and are intimately connected to the (non-)stationarity properties. 31,32 As we will see, numerous experiments using the above techniques demonstrate the non-Brownian diffusion of tracked biological cells as well as of tracer particles inside those very cells. Similarly, anomalous diffusion is often revealed for the motion of passive particles in complex liquids. One of the breakthroughs came with the study of Golding and Cox, who used single particle tracking of labelled messenger RNA (mRNA) molecules of some 100 nm in size in living bacteria cells to demonstrate that the motion of the molecule is subdiffusive, 33 shown in Fig. 4. Even more interesting and thoughtprovoking was the fact that the time traces d 2 ðDÞ of individual trajectories showed a massive scatter of amplitudes, similar to those shown in Fig. 8. The question whether this behaviour could be due to the intrinsic non-ergodicity of the anomalous diffusion performed by the mRNA molecules was in fact one of the ignition points for the research presented herein. We note that non-ergodicity in the sense discussed in the following is not restricted to the spatial diffusion of particles, but similar principles hold for certain processes revealing non-exponential dynamics, such as the blinking behaviour of individual quantum dots 34 or laser cooling. 35 To physically interpret such measurements we need to understand the time averages of individual time series. As we will see, this requires information beyond the conventional ensemble averages for a variety of anomalous diffusion processes.

A short navigation chart through this Perspective article
The main focus of this Perspective is two-fold. It is meant as an introduction to the theory of anomalous diffusion processes but also as a toolbox for the data analysis of anomalous stochastic dynamics. Consider the experimental results of single particle tracking experiments on fluorescently labelled messenger RNA molecules in a living E. coli bacteria cell shown in Fig. 4. Despite the fact that individual trajectories explore a large portion of the entire cell volume, the amplitude and the local slope of individual molecule traces, apart from the common subdiffusive trend, vary massively. Is this apparent irreproducibility an artefact or the result of the physical mechanism governing the molecule's motion? What is the exact physical origin of the variations in the slope? Will longer measurement times improve the statistics? Can we interpret the time averaged MSD shown here in terms of the ensemble results known for anomalous diffusion? Such questions will be pursued in what follows.
More concretely, this Perspective summarises a large variety of stochastic processes yielding anomalous diffusion of the power-law form hx 2 (t)i C K a t a . We mostly focus on subdiffusion with 0 o a o 1 but also consider superdiffusion characterised by a 4 1 as well as ultraslow diffusion with a logarithmic form of hx 2 (t)i. The fact that we consider this large range of anomalous diffusion processes is the non-universal nature of anomalous diffusion itself. Once we leave the realm of Brownian motion, we lose the confines of the central limit theorem forcing the processes to converge to the Gaussian behaviour predicted by Einstein. For this reason we address the most common processes effecting anomalous diffusion and compare their basic properties. The latter are important for the second purpose of this Perspective, namely, to provide a toolbox for the analysis of anomalous stochastic time series x(t). Quite commonly such analyses of time series from experiment or simulations are performed in terms of time averaged observables, in particular, the time averaged MSD d 2 ðDÞ. We point out that the physical interpretation of the obtained behaviour of such time averages in terms of the typically available ensemble approaches may be treacherous: many of the anomalous diffusion processes discussed herein lead to a disparity between the ensemble and the time averaged observable, for instance, between the ensemble and time averaged MSDs even in the limit of long measurement times. Moreover, it turns out that individual results for time averages such as d 2 ðDÞ appear to be irreproducible, despite long measurement times.
Such strange kinetics 32 was in fact observed in a number of experiments mentioned below. Instead of insufficient statistics, we show that such weakly non-ergodic behaviour reflects the physical nature of the exact mechanisms effecting the observed stochastic dynamics. The degree of the disparity between time and ensemble averaged observables and their apparent irreproducibility differ between the anomalous diffusion processes discussed hereafter. For each time series it is important to identify the exact underlying stochastic process-or combinations thereof-in order to deduce the correct physical behaviour of the system, to obtain meaningful values of fitted parameters, and to predict secondary processes such as rates for diffusion limited reactions. We therefore discuss the behaviour of time and ensemble averaged MSDs and other observables for the different processes. In addition we also address the ageing behaviour of such processes, that is, the dependence of physical observables on the time span, which may elapse between the initial preparation of the system and the start of the measurement.
Conceptually, weak ergodicity breaking was originally introduced by Jean-Philippe Bouchaud for systems, whose phase space is not separated into mutually inaccessible domains as for strong ergodicity breaking. 36 Instead Bouchaud was concerned with systems such as physical glasses in which the exploration time of the phase space is infinite and thus the particle occupancy in subdomains becomes non-ergodic in a single trajectory sense. This is the situation that we will encounter in Section 3. We will see, however, that the situation is somewhat more subtle in that also a number of seemingly simple stochastic processes feature similar weak ergodicity breaking.
The reader may approach this Perspective in two ways. One is to simply read the article sequentially. The second is to select specific sections after consulting the basic properties of the various processes listed in Table 1. In what follows we first concentrate on continuous time random walks in Section 3, starting from the classical Scher-Montroll-Weiss picture and then turning to more recent variants of this model. We then present the properties of the Gaussian models of fractional Brownian motion and the closely connected fractional Langevin equation in Section 4. Section 5 focuses on scaled Brownian motion, while Section 6 is devoted to heterogeneous diffusion processes. In Section 7 we discuss the stochastic motion on a fractal support, and Section 8 covers strong anomalous diffusion processes. Complementary statistical measurables to analyse recorded data are presented in Section 9, before a general discussion in Section 10. The weakly non-ergodic behaviour of the various processes discussed in the text are summarised in Table 1. All important symbols are collected in Table 2.

Continuous time random walks
We begin with the continuous time random walk (CTRW) model introduced by Montroll, Weiss, and Scher. 37,38 It became originally recognised for its successful quantitative description of charge carrier motion in amorphous semiconductors. 37 The CTRW model can be viewed as a direct generalisation of the Pearson drunkard's walk: consider a particle, which starts at the origin. It has to wait for a random waiting (trapping) time t drawn from the waiting time PDF c(t), before it makes a jump to left or right. The length of the jump can also be chosen to be a random variable, dx, distributed in terms of the PDF l(dx). After the jump, a new pair of waiting time and jump length are drawn from the PDFs c(t) and l(dx). An important ingredient of the CTRW process is its renewal character: after each jump values of the new pair of random variables t and dx are fully independent of their previous values. For unbiased CTRW processes, the jump length distribution is symmetric, l(Àdx) = l(dx) such that hdxi = 0.
We distinguish the sub-and superdiffusive versions of CTRWs described in the following subsections. These cases arise depending on whether the characteristic waiting time and the variance of the jump length are finite or infinite, respectively. In the case of diverging moments hti or hdx 2 i the anomalous character of the resulting stochastic process is effected due to the Lévy-Khintchine generalised central limit theorem (Lévy statistics), [39][40][41] according to which sums of independent and identically distributed random variables with diverging moments are stable distributions, compare also Section 8. Simply put, this means the occurrence of power-law tails of the waiting time or jump length PDFs. In particular, we may find non-exponential relaxation patterns and non-Gaussian spatial distributions. 42 Apart from renewal CTRW processes, in this section we also mention two non-renewal versions of CTRWs. Let us first briefly consider the case when both hti and hdx 2 i are finite. In the diffusion limit this process corresponds to that of regular Brownian motion with MSD (3), i.e., a = 1 in eqn (8), where the diffusion constant is defined as K 1 = hdx 2 i/(2hti) in the limiting sense of a random walk. 42,43 Note that apart from the finiteness of the moments hti and hdx 2 i the details of the PDFs c(t) and l(dx) are irrelevant for the diffusive properties of the CTRW process. In this case of normal diffusion we also immediately evaluate the integrand in the time averaged MSD (6). Namely, we know that as long as the lag time D is much larger than the characteristic time hti for a single jump, the average number of jumps during this time span equals D/hti. Thus, the kernel [x(t 0 + D) Àx(t 0 )] 2 in eqn (6) on average is given by Identifying the lag time D with the regular time t in the MSD (8), we indeed find the equivalence  between the MSD hx 2 i and the time averaged MSD d 2 which is expected in an ergodic system. 32,44-47

Subdiffusive continuous time random walks
We first study the case when the jump lengths are sufficiently narrowly distributed such that hdx 2 i is finite. For instance, this could correspond to a Gaussian form for the PDF l(dx) or the motion on a lattice of spacing a, with l(dx) = 1 2 [d(dx À a) + d(dx + a)], where d(Á) denotes the Dirac d-function. Concurrently, successive jumps do not occur at equal time steps but are assumed to face waiting times t, which are distributed in terms of the asymptotic power-law waiting time PDF in the limit t -N with 0 o a o 1, effecting the divergence of the typical waiting time hti. Here the constant t 0 is a scaling factor corresponding to some fundamental time scale of the process. Power-law distributed waiting times following the asymptotic law (14) with 0 o a o 1 are observed directly in various systems. These include the diffusion of tracer microbeads in reconstituted, cross-linked actin networks, 48 the motion of functionalised colloidal particles along a complementarily functionalised surface, 49 or the stochastic pathway of potassium channels diffusing in the plasma membrane of living human cells. 50 Fig. 5 shows the asymptotic power-law behaviour of the waiting times in the potassium channel data from ref. 50. The relatively large value of a E 0.9 is significant, as can be seen from the measurement time dependence in the same system shown Time dependent diffusivity for SBM, K(t) = aK a *t aÀ1 cm 2 s À1 K a * Generalized diffusivity for FBM cm 2 s Àa m a Fraction of mobile traces for CTRW 1 T, T g Absolute and glass temperature K V(x), k B T External potential and thermal energy erg o Strength of external harmonic potential V(x) = mo 2 x 2 /2 s À1 d, d f , d w Spatial, fractal, and walk dimensions 1 R, N A Gas constant and Avogadro's number erg K À1 , 1 Z Friction coefficient s À1 Z*, g* Noise amplitude and generalised friction coefficient in FLE g s À1 , g s Àa 1/k Relaxation time to stationary solutions s n(t) and N(t) Counting process and number of jumps 1 } a ðtÞ Probability density of first passage times s À1 Laplace transformation of the function f (t) G(y) Gamma function l a (x) L a,0 (x) One-sided and symmetric Lévy-stable law Infinite density of time averaged velocity Mittag-Leffler and Kummer function g C 0.5772 Euler constant in Fig. 7. In a fashion similar to the above experiments, powerlaw distributed waiting times characterise the interruption in the motion of tracer particles in weakly chaotic flows representing persistent sticking to invariant surfaces (stable islands, Cantori), 51 and power-law transition times are often measured for the on-off times of blinking quantum dots. 34 Such information can be retrieved from single particle observations, 32 while for the more common measurements of particle ensembles, only indirect evidence for scale-free CTRW dynamics is possible, notably in the seminal study reported by Scher and Montroll. 37 To identify CTRW dynamics or any other stochastic mechanism in a given set of data without having the possibility to trace individual test particles, complementary measures need to be applied, see Section 9.
In the theory of CTRWs one can readily show that the MSD with waiting time PDF (14) is subdiffusive and governed by eqn (8), where the generalised diffusion constant is defined via 42,43 What is the dynamic equation connected to this CTRW process? On the stochastic level, the regular Langevin equation dx(s)/ds = x(s) driven by the white Gaussian noise x(s) is augmented with a second equation subordinating the number of steps s to the real process time t. [52][53][54][55] After averaging over the noise, in the diffusion limit we obtain the fractional diffusion equation, 42,56 where we introduced the Riemann-Liouville fractional operator defined by 57,58 In the limit a = 1 we recover the normal diffusion eqn (1). The fractional diffusion eqn (16) can equivalently be formulated in terms of the Caputo operator. 58 In eqn (17) we see that the process is dominated by the slowly decaying memory given by the integral over the power-law kernel. In the presence of an external potential, the dynamics is described in terms of the fractional Fokker-Planck equation. 42,59 This fractional Fokker-Planck equation fulfils a generalised form of the Einstein-Stokes relation as well as linear response. 42,59 We note that reactions in such a subdiffusive setting are discussed in ref. 60. An interesting generalisation to evanescent CTRW subdiffusion was discussed recently. 61 As illustrated in Fig. 6, during the evolution of the process longer and longer waiting times emerge. Due to the lack of a characteristic time scale of c(t), extreme individual waiting times t arise which are of the same order as the measurement time. In particular, there is no longer a scale hti separating a single or a few jumps from the limit of many jumps. This effects a disparity between the MSD and the time averaged MSD, the so-called weak ergodicity breaking 31,32,36,62-67 More specifically, as this subdiffusive process is no longer selfaveraging such as the normal Brownian motion, to be able to obtain analytical results we introduce the additional averaging over sufficiently many (more correctly, N -N) individual trajectories. From a data analysis point of view, this procedure ensures smooth curves for d 2 D E as function of the lag time D.
Using the known fact that for subdiffusive CTRWs the average number of jumps from t = 0 up to time t scales like hn(t)i C t a , it is quite straightforward to show that in the limit D { t the result for the time averaged MSD is 32,44,63,64  This is the first example of weak ergodicity breaking that we analyse in the following. Remarkably, the linear lag time dependence is different from the t a -scaling of the MSD (8).
Simultaneously, the length of the time series (measurement time) t occurs explicitly in expression (20). The latter echoes the ageing dependence of the subdiffusive CTRW process, to be addressed shortly in more detail: the longer the process lasts, the smaller the time averaged MSD becomes. Physically, this corresponds to the above observations that for the scale-free waiting time distribution (14) longer and longer trapping times occur, stalling the progress of x(t). The linear form (20) of the time averaged MSD was shown for the subdiffusive motion of lipid granules in living yeast cells, 68 and the ageing dependence d 2 ' 1=t 1Àa was observed for the motion of insulin granules in the cytoplasm and of potassium channels in the plasma membrane of living human cells. 50,69 For the channel motion in the plasma membrane the data are shown in Fig. 7. † † Note that the linearity of the time averaged MSD (20) for subdiffusive CTRWs may be deceiving: when such a linear scaling is observed for the time averaged MSD in experiments, it may easily be concluded that the observed process is normal.
Without testing other quantities, such as the dependence of d 2 on the measurement time t, such a conclusion may in fact be wrong. Note that in the limit a = 1 eqn (20) reduces to the ergodic Brownian form, independent of the trace length t.
Another a priori surprising result is that for confined subdiffusive CTRW motion the time averaged MSD does not converge to the thermal plateau. Instead, after engaging with the confinement it continues to grow in the power-law form 45 where the prefactor involves the first two moments of the Boltzmann distribution, of the confining potential V(x). The normalisation factor is given in terms of the partition function Z ¼ Only in the Brownian limit a = 1, we observe a turnover from the free diffusion behaviour to a plateau with ). ‡ ‡ The behaviour (21) was found from simulations in ref. 70 and experimentally corroborated from optical tweezers tracking data in ref. 68.
Each individual, sufficiently long time series of this subdiffusive CTRW process is characterised by a number of extremely long waiting times. However, in each realisation different numbers and lengths of such waiting periods occur. This gives rise to the fact that time averages remain random even for very long averaging times, and time averaged physical observables are thus irreproducible. 45,64,73 This situation is shown in Fig. 8, where the simulation data show variations in the slope in individual d 2 traces as well as a distinct amplitude scatter between different d 2 . Qualitatively these are reminiscent of the observations made in the experiments of Golding and Cox. § § 33 Similar amplitude variations are observed in the aforementioned single particle tracking studies in living cells 50,68,69,71 and in the simulation of associated water in the vicinity of lipid bilayers. 72 We quantify this randomness of the time averaged MSD in terms of the dimensionless variable 64 The associated distribution is 64,73 for sufficiently long measurement times t. Here l a is a onesided, completely asymmetric Lévy stable law with the Laplace image L{l a (t)} = exp(Àu a ). ¶ ¶ 74 Note that the variable x is in the denominator of the argument of l a in eqn (24), and thus moments hx n i exist. In particular, for a = 1/2, the distribution of f(x) is the half Gaussian f(x) = (2/p)exp(Àx 2 /p), whose maximum occurs at x = 0, i.e., in very long trajectories the most likely case is that the amplitude d 2 vanishes. For a fully ergodic process  (20), as the observed stochastic motion has an additional noise source. 50,69 ‡ ‡ Note that the universal factor 2sin(pa)/[(1 À a)ap] varies between 2 (for the limits a -0 and a -1) and 8/p E 2.55 (for a = 1/2). § § As argued in ref. 64 the fact that the average slope of d 2 ðDÞ is smaller than unity in the data of ref. 33 may be due to the fact that the motion of the RNA in the cell is confined.
with a = 1 the distribution has the sharp form f(x) = d(x À 1), which implies that individual trajectories are completely reproducible and there is no scatter in the relative amplitude x around the ergodic value x = 1. The variance of the amplitude fluctuations of d 2 is measured in terms of the ergodicity breaking parameter 64,75 which monotonically varies from EB = 1 for a -0 to EB = 0 for a -1. The latter mirrors the ergodic behaviour found for Brownian motion by Nordlund. How can scale-free forms of the waiting time distribution (14) come about? Scher and Montroll explained this in terms of energetic traps in a quenched energy landscape as that shown in Fig. 9. When the depths of individual energy wells are exponentially distributed, the motion of a particle on this landscape is dominated by individual thermal escapes from these traps characterised by the Kramers/Arrhenius activation, only to be trapped again in the next well. As the motion of the particle progresses, it typically encounters ever deeper wells, effecting the subdiffusive behaviour. 39 This scenario indeed gives rise to the waiting time PDF (14). 88 37,76 The correspondence of the Arrhenius-type escape in the energy landscape to power-law waiting time distributions renders the CTRW an extremely successful mathematical model for the description of the mechanisms of anomalous diffusion in a number of important physical systems.
Alternative sources for the power-law form (14) are spatial traps. One example is the Havlin-Weiss comb model originally designed to mimic the spatial trapping of particles in the dangling ends of a percolation cluster: 77 a particle moves along the x axis but can get transiently trapped in perpendicular onedimensional channels, a structure similar to the teeth of the comb. As the returning probability scales like t À1/2 , the effective motion along the x axis is governed by the waiting time PDF (14) with a = 1/2. This exponent is indeed observed in typical experiments of tracer dynamics in subsurface aquifers and may correspond to the trapping of tracer molecules in thin cracks off the main water artery. 78 Further examples of systems leading to a power-law form of c(t) are dynamic maps [79][80][81][82][83][84] and the sticking of tracer particles around stable islands of weakly chaotic systems.*** 51,85 Ageing effects of the subdiffusive CTRW. Power-law distributed waiting times lead to ageing in a wide variety of systems. Suppose you observe the on-off blinking of a single, illuminated quantum dot between a light emitting and a dark state. 34 While such an experiment will show many rapid transitions between the on and off states, occasionally very long on or off periods will appear. Over a sufficiently long observation period t, the duration of these long events typically increases with t. 34 A similar effect is observed for the motion of potassium channels  (20), local deviations of the slope are visible. Moreover, the variation (scatter) of the amplitudes between different time traces is distinct. Both effects are due to the occurrence of long waiting times between jumps, due to the scale-free nature of c(t), eqn (14).  39 We show a realisation of the trapping landscape with exponential distribution p(E) = T g À1 exp(ÀE/T g ) of trap depths E, where we set the Boltzmann constant to unity. The system specific ''glass'' temperature T g (here, T g = 1) sets the width of p(E). We also include a unit tilting force. On its way through this landscape, a random walker successively falls into traps and faces escape times given by Arrhenius' law t C exp(E/T), where T is the temperature of the thermostat. Due to the tilt, revisits to a given trap are unlikely, and the model thus corresponds to an annealed (biased) continuous time random walk with power-law waiting time PDF (14) with a = T/T g . When the system temperature is below T g , subdiffusion is effected. 88 More precisely, the quenched trap model in one and two dimensions leads to correlations when the particle revisits traps. The resulting process is thus different from the annealed subdiffusive CTRW. Such correlations can be avoided when the process is embedded in three or more dimensions, as now the random walk is transient. Alternatively, an additional bias as shown in Fig. 9 can be used to eliminate correlations. The concept of trapping landscapes was extensively broadened by Bouchaud to introduce ageing and weak ergodicity breaking in glasses. 39,76 *** In the latter example of weakly chaotic systems the waiting times interrupt ballistic phases, so that the overall motion is anomalous with exponent 1 o a o 2, see Section 3.5. in the plasma membrane of living cells 50 or for the diffusion of submicron tracers in a cross-linked actin mesh: 48 longer and longer immobilisation periods occur in the course of the measurement. We already alluded to this phenomenon in the discussion of the CTRW trajectory in Fig. 6 and from Fig. 7. Such strongly non-stationary, out-of-equilibrium behaviour is indeed well known from glassy systems, 76,86 a field in which the term ageing was originally coined. Subdiffusive CTRWs are nonstationary, as we could see from the explicit dependence of d 2 on the measurement time t in eqn (20), and they have been proposed to capture the observed dynamics in glassy systems. 87 What if we start to probe such an ageing system only at some (ageing) time t a after the system was initially prepared at time t = 0? As the subdiffusive CTRW evolves, on average longer and longer waiting time events appear in the trajectory (compare the time trace in Fig. 6). When we start to observe the particle we will typically find it within one of the extremely long waiting periods. The occurrence of the first step in this scenario is no longer determined by the statistics of the waiting time distribution c(t) from eqn (14) but occurs at the forward waiting (recurrence) time t 1 (see Fig. 10), which is governed by the PDF 73,88-90 At longer ageing times (t a c t 1 ), the scaling h(t 1 ,t a ) C t Àa 1 in terms of t 1 of the forward waiting time PDF is thus broader than that of the original waiting time PDF (14), c(t) C t À1Àa . Due to the memory of the CTRW process-directly visible in the fractional diffusion eqn (16)-strong ageing persistently affects the time evolution of the process.
The behaviour of the MSD for free, unconfined CTRW motion in this ageing scenario then exhibits the crossover 73,89 which for strong ageing (t a c t) shows an apparent linear scaling with time t. Only when the process evolves for much longer than the original ageing time, t c t a , the scaling with t reflects the subdiffusive nature of the process.
Remarkably, the behaviour of the associated time averaged MSD is much more transparent in the presence of ageing. Namely, in the limit D { t the result is 73 where the right hand side involves the non-aged quantity d 2 .
The physical scaling with the lag time D is not affected, and all information on the ageing enters as the ratio t a /t into the universal algebraic prefactor As shown in ref. 73, this ageing depression also occurs for other time averaged physical observables. In contrast to the ensemble average, the scaling with D is the same for any ageing time t a , and thus time averages are in that sense more suitable measurables for aged systems. Finally, we remark that in the limit of strong ageing t a c t c D, we even obtain an equivalence of the limiting behaviour of the regular MSD and the time averaged MSD, 73 Another important lesson to be drawn from ageing renewal theory 73,89 is the fact that in a growing fraction of trajectories no jump occurs within the observation window from the ageing time t a to t a + t. It was shown in ref. 73 that the probability to have a nonzero number of steps during this observation window decays as m a C (t/t a ) 1Àa for strong ageing t a c t. Concurrently the discrete probability for not moving at all during the observation grows. Observing a series of individual particles, one therefore finds a population splitting into mobile and immobile particles. 73 This fact is important when one wants to extract the amplitude-for instance, the anomalous diffusion coefficient K a -from a given set of time averaged data. 73 Notably, also the scatter distribution f(x) is significantly altered. 73 The splitting into mobile and immobile fractions may be underlying the experimentally observed population splitting in molecular biological systems. 91 Fig. 11 demonstrates for the same number of simulated trajectories how ageing suppresses the mobile fraction of particles within the observation window t a . . .t a + t. 73 Ageing also affects other quantities of the subdiffusive CTRW process, for instance, the first passage behaviour. Thus, for unbiased subdiffusion on a semi-infinite domain the density of first passage times acquires the three distinct scaling regimes 92 with the time scale t* C t 1+a/2 a K 1/2 a /x 0 containing the initial distance x 0 between the particle and the absorbing boundary. As expected, as long as ageing is weak, the scaling of the first passage time PDF with t features the exponent (À1 À a/2) of the non-aged system. 42,93,94 However, as ageing becomes  (14) is started at t = 0. Events (jumps of a random walker or on-off transitions of a blinking quantum dot) are symbolised by the blue impulses. When we start to follow the system's dynamics after the ageing period at t a , the forward waiting time until the first jump occurs is t 1 . more severe, there exists a competition between the magnitudes of the ageing time t a and the measurement time t, with the intermediate scaling exponent (À1 À a) and the fully aged exponent (Àa). In particular, the intermediate scaling is steeper than for both the non-aged and the fully aged system. This observation, in principle, offers the possibility to determine the age t a of the system from observation of the first passage behaviour, albeit sufficiently many and long trajectories are needed to evaluate } a ðtÞ. 92 We so far discussed the case with 0 o a o 1, when the characteristic waiting time hti diverges. What happens when hti is finite but the fluctuations around it diverge, i.e., for the case 1 o a o 2? It can be shown that indeed the process in many facets is different from the naively expected Brownian behaviour. Instead, different ageing features characterise the process. 95,96 To fully understand the consequences on quantities such as the time averaged MSD d 2 ðDÞ or its amplitude scatter, more work is needed.

Noisy continuous time random walks
In the subdiffusive CTRW the particle becomes fully immobilised with respect to the co-ordinate x(t) in between successive jump events (see Fig. 6). For charge carriers in an amorphous semiconductor or for tracer particles firmly stuck to much larger objects or solid surfaces this assumption appears to be reasonable. However, imagine a submicron tracer particle in a cross-linked network consisting of semi-flexible actin filaments. 48 In this case the particle is stuck in cages for waiting times distributed like the power-law (14). The actual trajectory shows distinct fluctuations of approximately constant amplitude around a mean location. 48 This behaviour stems from the thermal nature of the system, that is, the cages in the mesh are typically somewhat larger than the tracer particle and/or the actin filaments making up the mesh are themselves subject to thermal agitation, compare the results of recent simulations of tracer motion in a flexible gel. 97 In the noisy CTRW the superimposed noise is combined with the fully immobilised periods of the native CTRW. 98 This model is therefore relevant for the quantitative description of the stochastic particle motion in a large range of systems. In particular, a detailed analysis of recorded data in terms of the noisy CTRW may unveil an underlying power-law waiting time despite the fact that no clear stalling events feature in the measured trajectory.
In the above scenario of approximately constant amplitude noise around the horizontal immobilisation periods in the x(t) diagram, it is a natural choice to add Ornstein-Uhlenbeck noise in the position space to the native subdiffusive CTRW process (see Fig. 12). For the MSD this leads to the additive terms 98 where the first term represents the contribution of the native CTRW. The second term contains the noise strength Z 2 D made up of the diffusivity D of physical dimension cm 2 s À1 and the empirical noise amplitude Z. This dimensionless noise strength should not be confused with the friction coefficient used earlier.
Moreover, k is an inverse time scale governing the relaxation of the Ornstein-Uhlenbeck process to stationarity. The Ornstein-Uhlenbeck component in the MSD (33) after the time scale 1/k becomes merely an additive constant, whose relative amplitude becomes progressively smaller compared to the first term. The effect on the trajectory itself is displayed in Fig. 12: for increasing noise amplitude the stalling periods of the native CTRW become more and more masked and resemble the experimental trajectories of the submicron tracers in the semi-flexible polymer network. 48 The results for the MSD are shown in Fig. 13. The associated time averaged MSD becomes 98 in absence of ageing (t a = 0). In contrast to the MSD (33), the time averaged MSD (34) contains the factor t aÀ1 in the first term representing the native CTRW contribution, while the amplitude of the noise in the second term on the right hand side is independent of t. While for small noise amplitude Z the observable d 2 will essentially be indistinguishable from the native CTRW, for larger Z we observe a distinct crossover behaviour for d 2 . Namely, for shorter lag times the time averaged

MSD shows contributions from both the native CTRW and the
Ornstein-Uhlenbeck noise. Writing At longer lag times, solely the native CTRW contribution is visible and D app C K a t aÀ1 /G(1 + a).
In between these two regimes, a crossover behaviour is observed, as shown in Fig. 13. However, when the measurement time t is much longer than the lag time D, the Ornstein-Uhlenbeck term is dominant. Again the time average has a clear advantage over the ensemble average, as it reveals additional detail of the behaviour.
A different scenario can also be envisaged. 98 For instance, when the observer is interested in the motion of a tracer inside a living cell and the attachment of the cell to the cover slide in the microscope turns out to be broken, the data will show the additional Brownian noise stemming from the random cell motion superimposed to the anomalous motion with respect to the reference frame of the cell. In that case the MSD reads 98 which exhibits a turnover from the subdiffusive scaling with t a to the linear Brownian growth Ct in the long time limit. The associated time average is always linear, 98 We note that the superposition of Poissonian and non-Poissonian noise was also discussed in a biologically inspired reaction rate model. 99

Ultraslow diffusion of continuous time random walks in an ageing environment
The subdiffusive CTRW process discussed so far is a renewal process. That is, after each step the waiting time t is randomly chosen from the same PDF c(t). Physically, this corresponds to an annealed environment. 39 More formally, one can view this process as if the random walker carried his own clock around whose random ticks trigger the occurrence of the jumps. As we discussed above, if we want to describe an aged subdiffusive CTRW initiated at t = 0 and evolving for the ageing time t a , the statistics for the occurrence of the first jump in an observation beginning at t a is modified. The first jump occurs after the forward waiting time t 1 , which is distributed with the probability density function (26). All subsequent waiting times are then again drawn from the standard law (14).
Here we consider a different scenario, in which each jump event depends on the present age of the system. Imagine a toy scenario for a rupture model, in the spirit of Zener's famous work on stress relaxation in solids: 100 a crack in a twodimensional material is propagating along the discrete n axis, as sketched in Fig. 14. The crack is represented by the bold black zig-zag line. As indicated by the arrow, this crack has just propagated from site n to n + 1. Crack propagation is triggered by the red circles (the 'vacancies'), which diffuse along the perpendicular x axis. When the vacancy at site n hit the x = 0 line the crack tip was allowed to extend to site n + 1. To propagate to site n + 2, the vacancy at n + 1 has to diffuse to x = 0, etc. If the vacancies can only diffuse along a finite interval of length l on the x-axis, they return to x = 0 on time scales t x C l 2 . This t x then is the average time for the crack propagation from one site to the next, and we will find the crack propagation law hn(t)i B t/t x .
What happens if the length l becomes very large and the vacancies can venture far away? As known from the theory of comb models, 77 the probability density of return to x = 0 is of power-law form, proportional to t À1 À 1/2 . Typically, when the tip of the crack reaches a new site, the vacancy will be away from x = 0, and the triggering event for the crack propagation to the next site then corresponds to the forward waiting time t 1 distributed according to eqn (26) with a = 1/2. In contrast to the previously discussed renewal ageing CTRWs, however, the next propagation step of the crack tip again occurs with the forward waiting time, characterising the arrival of the next vacancy at x = 0, and so forth. In other words, every step occurs Fig. 13 Ensemble averaged (top) and time averaged (bottom) MSDs for the noisy CTRW process. 98 In each case we present results for a = 0.5 and a = 0.8 for three different noise strengths. The trajectory lengths are t = 10 5 , and the symbols represent an average over 10 3 trajectories. with the forward waiting time t 1 . The probability that the tip arrives at an extremely long forward waiting time t 1 increases considerably. † † † This fact significantly alters the dynamics of the process. Formally, this scenario corresponds to a random walker, which is updated by stationary, site-specific clocks.
If we generalise the CTRW model and consider a process in which every jump occurs with the waiting time PDF (14) with general 0 o a o 1, it can be shown that the crack propagation dynamics is reduced to the much slower logarithmic law 101 where m = ÀG 0 (a)/G 0 (a) À % g in terms of the complete G function and its derivative G 0 , % g = 0.5772. . . is Euler's constant, and t 0 is a cutoff time to avoid divergencies at t = 0. The counting process n(t) is deterministic in the sense that the relative fluctuations albeit slowly, decrease during the progress of time. 101 For a 4 2 the process is normal and statistically equivalent to a Poisson update, which is equivalent to the above scenario with finite length l for the vacancy diffusion leading to the linear time dependence hn(t)i C t. However, similar to our observations above, the case with a finite characteristic update time hti but diverging variance of waiting times with 1 o a o 2 displays the power-law anomaly hn(t)i C t aÀ1 . 101 Interestingly, the time average over the time series n(t), is linear in the lag time D, in analogy to the results (20) for the regular renewal subdiffusive CTRW process. The inverse dependence on the measurement time t with the logarithmic correction observed here, in a rough way can be viewed as the a -0 behaviour of the power-law relation in eqn (20). Above we constructed the crack propagation model such that the motion of the tip is fully biased and each step is directed to higher n values. What if we interpret the update rule for the counting dynamics n(t) as jumps of a random walk process in real space? To avoid correlations when the random walker revisits the same spatial point and its next update is governed by the same clock as during the previous visit, in analogy to the discussion of the quenched trap model we could include a spatial bias of the random walk. Alternatively, we could embed the random walk in three dimensions. Due to the transient nature of this process, revisits are significantly reduced, and the MSD of the walker is then proportional to hn(t)i, while the corresponding time averaged MSD scales like nðDÞ D E . Such a random walk process thus exhibits weakly non-ergodic behaviour.
The random walk process in an ageing environment corresponds to a non-renewal process in dimension one and two. In dimension three it is a renewal process, however, here the waiting time distribution (14) is replaced by the PDF of the forward (recurrent) waiting time. In other words, due to the logarithmic nature the process, eqn (37), can be shown to be governed by the limiting distribution for the product of independent random variables, the log-normal distribution. 101 This approach may thus be of relevance to a large range of applications in which this distribution is identified. 102 In the regular, renewal subdiffusive CTRW ageing affects the statistics of the first jump, given in terms of the forward waiting time t 1 . All subsequent jumps occur with the regular waiting time PDF (14). The system remembers the first step, due to the slowly decaying memory inherent to the process, seen in the non-local time operator of the associated fractional diffusion eqn (16). Once the process time exceeds the ageing time significantly, i.e., t c t a , the ageing effects are no longer visible. ‡ ‡ ‡ In the non-renewal scenario discussed here the system has a high likelihood to encounter atypically long waiting times at every step and thus every single step includes ageing. This causes the massive retardation of the motion, giving rise to the emerging logarithmic law. Such time dependencies occur in a large variety of systems, inter alia, the crumpling of paper, 103 compactification of grains, 104 or record statistics. 105 Recently, it was shown that the long time behaviour of a tracer particle in a single file system, in which individual particles repel each other and may stick to a functionalised channel with powerlaw waiting times, can indeed be described in terms of the logarithmic time dependence derived here within the nonrenewal ageing process. 106 Other ultraslow diffusion processes. In the theory of stochastic processes, the logarithmic time evolution has a prominent representative, namely, Sinai diffusion. 107 In this special case of Temkin's model, 108 the random walker moves in the quenched energy landscape created by a seed random walk. Thus, locally the walker experiences a force of the same amplitude, randomly to the left or the right. The walker can become trapped significantly when the bias in a number of adjacent sites points in the direction of the walker's current location. To get to a distance x from its starting point the particle needs to cross an energy barrier of the typical order ffiffiffi x p , corresponding to an activation time scale t ' t 1 exp c ffiffiffi x p ð Þ, where t 1 is a fundamental time scale and c a dimensional constant. The typical distance covered by the walker during time t then scales according to † † † Remember the fact that for long ageing times the probability density function (26) of the forward waiting time decays with the power Àa and is thus significantly broader as the regular waiting time density (14). ‡ ‡ ‡ This is true for the ensemble averaged MSD (27) as well as for the corresponding time averaged MSD. In the latter, the ageing depression L(t a /t) converges to unity, compare eqn (28) and (29). the ultraslow, logarithmic law x 2 C log 4 (t/t 1 ), 39 compare also the discussion in ref. 109 where the tilde denotes the disorder average. Interestingly, also here the time averaged MSD increases linearly with the lag time and exhibits a strong sensitivity to the measurement time.
A generalisation of the Sinai model with strongly correlated disorder 113 and a periodic Sinai model 111 were reported recently. The splitting probability of the Sinai model is determined in ref. 112.
In terms of a renewal CTRW ultraslow processes can be established by using a waiting time PDF of the form c(t) C 1/(t log 1+g t), 110,[114][115][116] which is normalised but does not possess finite moments of any power ht q i with q 4 0. It produces an MSD of the form i.e., for g = 4 the MSD scales identically to that of the Sinai diffusion. The weakly non-ergodic behaviour of ultraslow CTRWs is analogous to eqn (42) for Sinai diffusion, apart from the general exponent g and the prefactor, The time averaged MSD, the localisation of the diffusion particle, as well as the ergodic properties of both Sinai and ultraslow CTRW diffusion are analysed in ref. 110, discussing some of the fundamental differences between time averages recorded in annealed versus quenched environments. Finally, ultraslow diffusion can also be effected by iterative dynamics maps, as shown by Dräger and Klafter. 116 Instead of the power-law maps with a single exponent z discussed in ref. 79, however, ultraslow diffusion emerges when an entire hierarchy of exponents is considered. In very dense two-dimensional lattice gas systems, ultraslow diffusion emerges, as well. 117

Correlated continuous time random walks
Another way to break the renewal character of the standard CTRW process is to introduce correlations between successive waiting times. Correlations appear naturally in the motion behaviour of higher animals or humans, or in the dynamics of financial markets. They are also present for particles diffusing in quenched disorder, compare the above discussion of the quenched trap model or Sinai diffusion. It is therefore consequent to consider non-renewal CTRW processes with built-in correlations. This can be achieved by extension of the subordination of the physical time to the number of steps of the process. 118,119 An alternative approach is the following.
Assume that successive waiting times are correlated in a way that waiting time t i is given by waiting time t iÀ1 modified by a small increment, dt i , that is, t i = t iÀ1 + dt i . The increments dt i may be positive or negative. Successive waiting times are thus correlated: a short waiting time is followed by a similarly short one, and vice versa for a long waiting time. This approach corresponds in fact to a random walk in the space of waiting times, and we can write the current waiting t i time as the sum of increments [120][121][122][123] The absolute value occurs here as waiting times always have to be positive. These increments dt i are then chosen to follow a given probability distribution. We may, for instance, consider the symmetric Lévy stable law defined in terms of its Fourier transform as exp(Àc g |k| g ). The process can then be shown to produce a power-law MSD of the form (8) with anomalous diffusion exponent whose range spans from zero (for g = 0) to 2/3 (for g = 2) and thus leaves a gap to normal diffusion. 120,121 Brownian motion with a = 1 in this model can only be restored by completely breaking the correlations. 120 In the limit g = 2 the mode relaxation is of stretched exponential form, P(k,t) C exp(Àct 1/2 ), and for the range 0 o a o 2 it is of power-law shape, Ct Àa . 122 Fig. 15 compares the CTRW model with correlated waiting times (45) with the regular subdiffusive CTRW. In the correlated case the gradual increase of the waiting times is distinct from the occasional very long waiting times of the uncorrelated model. Also the trajectories of the two models are very different: without correlations, the long waiting times effect distinct immobilisation events, while for the correlated waiting times, the motion appears almost Brownian, albeit with a gradual increase of the waiting times.
In this process the waiting time on average is an increasing function and diverges in the limit of many steps. The correlated CTRW process indeed exhibits weakly non-ergodic behaviour, 122 such that the range of the ageing exponent 1 À g/(1 + g) is in between 1/3 and one. Moreover the process ages, as shown via the decaying response of the process to a sinusoidal driving force. 122 Similarly, when the waiting times are exponentially distributed but the jump lengths correlated the process leads to the exact form 120 of superdiffusive behaviour with the asymptotic cubic Richardson form 26 when the distribution of jump length increments is Gaussian. When the jump length increments are drawn from a Lévy stable law with index m, the MSD diverges. From fractional oder moments one can derive the scaling relation x 2 B t (1+m)/m . For the case m = 2 the weakly nonergodic form of the MSD was obtained exactly, and the expansion is valid for D { t. 120

Superdiffusive continuous time random walks and ultraweak ergodicity breaking
For completeness we also consider superdiffusive renewal CTRW processes. To that end we note that the introduction of a waiting time distribution into a standard random walk process at most leads to a subdiffusive behaviour when the first moment of the waiting time PDF c(t) diverges. Superdiffusion cannot be achieved within the approach of a generalised waiting time concept. There exist, however, two pathways to extend the CTRW model to superdiffusion. The first way is to modify the distribution of jump lengths. All CTRW processes considered so far (apart from the case of correlated jump lengths in the preceding section) correspond to the motion on a lattice, or in continuous space with a jump length PDF that possesses a finite variance hdx 2 i and zero mean hdxi. What if we choose a jump length distribution l(dx), for which the variance hdx 2 i diverges? Consider a Lévy stable form with the asymptotic power-law behaviour l(dx) C 1/|dx| 1+m of the jump lengths with the stable index 0 o m o 2. When the waiting time PDF has finite moments, this process was called a Lévy flight by Mandelbrot. 124 The divergence of the jump length variance translates into the divergence of the second moment of the PDF P(x,t), 125 and only fractional order moments h|x| k i with 0 o k o m exist. 42 The trajectory of a Lévy flight is fractal (see below) of Hausdorff dimension m. A single trajectory therefore never fully covers an embedding space whose dimension is larger than m. This is particularly relevant in the twodimensional world, in which effectively most human and animal motion occurs. There exist also several studies considering the combination of a diverging characteristic waiting time hti with a Lévy stable distribution of jump lengths, either in terms of fractional diffusion equations 126 or via using subordination arguments. 127 Due to its fractality a single Lévy flight trajectory cannot visit all points in space when the stable index m is smaller than the embedding dimension d.
Under confinement to a finite area, Lévy flights are ergodic, 128 and the convergence to the ergodic state can be analysed in terms of the apparent fractal dimension or in terms of the first passage dynamics. 129 The divergence of hdx 2 i as well as the ensuing non-ergodicity of Lévy flights can be rectified by a cutoff in the jump length PDF 130 or by dissipative non-linearities. 131 Such stochastic processes behave like a Lévy flight until the regularisation of the jump length PDF comes into effect.
The alternative approach is to introduce a coupling between jump lengths and waiting times. In subdiffusive CTRWs described previously the waiting time and jump length PDFs enter in the multiplicative form c(dx,t) = c(t)l(dx). 43 Introducing a functional dependence between waiting times t and jump lengths dx, this spatiotemporal coupling preserves the renewal properties of CTRW processes but due to penalising long jumps-associating them with long waiting times-yields a finite MSD. 43,132 The simplest choice is the coupling c(dx,t) = 1 2 c(t)d(|dx| À nt), in which the velocity v is introduced. It bestows a propagating horizon to the process in the form of two travelling d peaks with decaying amplitude. For waiting time PDFs c(t) C t À1Àa with 1 o a o 2, in between these peaks, a Lévy stable distribution is building up. 133 Also Lévy walks are non-ergodic, albeit in a way, that is different from the above discussed non-ergodic behaviour.
To see this, we first recall that for a waiting time PDF of the power-law form c(t) C t À1Àa their MSD scales 134,135 The associated time averaged MSD in the ballistic phase with 0 o a o 1 scales like with a higher order correction scaling with D 2 (D/t) 2Àa . 136,137 In the enhanced diffusion phase 1 o a o 2 the result is 136-138 In both the ballistic and enhanced diffusive phases the MSD differs from the time averaged MSD merely by a factor of 1/|a À 1|. This phenomenon may be referred to as ultraweak ergodicity breaking. 138 Note that an analogous result was obtained by Zumofen and Klafter for Lévy walks with stationary and non-stationary initial conditions, 139 compare the discussion in ref. 138. To leading order, the time averaged MSDs (51) and (52) do not exhibit ageing in the sense that the measurement time t does not appear explicitly, in contrast to the corresponding forms for the subdiffusive CTRW processes discussed above. Further physical properties of Lévy walks, in particular, the amplitude scatter of the time averaged MSD, are studied in ref. 138 and 140 Additional recent studies of Lévy walks analyse their response to an external bias and the power spectral properties. [136][137][138]140 Lévy flights and walks are used as statistical models in many fields, for example, to quantify blind search processes of animals for sparse food. [141][142][143] In the science of movement ecology, the so-called Lévy foraging hypothesis has become widely accepted. 141 Recently this model was qualified for human motion behaviour and when different search criteria and external forcing are considered. 144 These stochastic processes also describe the propagation of visible light in disordered optical media 145 and the dynamics of quantum dots. 34 In optical lattices the divergence of the position of single ions was shown to follow Lévy statistics. 146 For more details compare also Section 7.

Fractional Brownian and Langevin motion
Next to the CTRW model, fractional Brownian motion (FBM) and the motion governed by the fractional Langevin equation (FLE) represent the second major stochastic models for the description of anomalous diffusion processes both in the presence and in the absence of external potentials. Physically, these types of motion arise when we observe the effective motion of a single tracer particle in a coupled many-body system such as a single file of excluded volume particles, see below.

Fractional Brownian motion
We believe that FBMs do provide useful models for a host of natural time series and wish therefore to present their curious properties to scientists, engineers and statisticians, argue Mandelbrot and van Ness in their defining work on FBM. 147 The motivation for this study and Mandelbrot's earlier work 148 were Hurst's laws for the discharge of the Nile and other rivers. 149 In the Russian literature Kolmogorov introduced an analogous process already in 1940 150 which was then analysed further by Yaglom. 151 Mathematically, FBM indeed has curious properties, 147 as it is not a semimartingale and cannot be interpreted in terms of a random walk process. 152,153 Despite the many studies on FBM and its wide application in various fields of science, engineering, and beyond, many fundamental properties of FBM remain elusive, such as the first passage properties. 154,155 At the same time FBM is distinguished by the fact that it is the only self-similar Gaussian process with stationary increments. 147 Note that for notational consistency in the following we use the anomalous diffusion exponent a in the formulation of the fractional Gaussian noise which is related to the Hurst exponent H commonly used in the FBM literature via a = 2H.
Mandelbrot and van Ness define FBM in terms of the stochastic integral 147 where B(t) is ordinary Brownian motion. A more intuitive representation uses the Langevin equation which is fuelled by the fractional Gaussian noise x fGn (t). The latter has a standard normal distribution for any t 4 0 but is power-law correlated, § § § hx fGn (t 1 )x fGn (t 2 )i = a(a À 1)K a Ã |t 1 À t 2 | aÀ2 (55) for t 1 , t 2 4 0 and t 1 a t 2 . The physical dimension of x fGn (t) is thus [x fGn ] = cm s À1 . Due to the factor (a À 1) the fractional Gaussian noise is persistent or positively correlated for the case 1 o a o 2, and it is antipersistent or negatively correlated for 0 o a o 1. The PDF for free FBM is given by the Gaussian The position autocorrelation of FBM is and reduces to the MSD (8) for t = t 1 = t 2 . From this quantity we can obtain the time averaged MSD 156 showing that FBM is ergodic in the sense that is equivalent to the MSD (8). As the process is self-averaging, for sufficiently long measurement times we even obtain the single trajectory equality, formally, where t Ã is a binning time scale. Fig. 16 shows the reproducibility of unconfined FBM. Note that the scatter for longer lag times Dt is due to insufficient statistics of the time average.

Fractional Langevin equation motion
The FLE for the position co-ordinate x(t) of a particle with mass m is written as 157,159-161 where g* is a friction coefficient of physical dimension [g*] = g s Àa and x fGn (t) represents the fractional Gaussian noise (55) with 1 o a o 2.
The FLE (61) is a special form of the Hänggi-Kubo generalised Langevin equation 160,162,163 driven by noise, which is not white but correlated. In contrast to the d-correlated white noise encountered in the Langevin eqn (5), that is, the correlation function of the noise at time t explicitly depends on past times t 0 o t with a given weight function. Concurrently, the friction term becomes a convolution integral such that the noise kernel balances the non-local noise to fulfil the generalised fluctuation dissipation relation. 160,162,163 Such equations with memory arise in the Mori-Zwanzig projection operator framework. 164 The FLE corresponds to the special case for which the noise autocorrelation is given by the power-law decay of the fractional Gaussian noise x fGn (t). The friction kernel is then equally of power-law form.
The Kubo generalised fluctuation dissipation relation fixes the noise amplitude in the form Typically, in single particle tracking experiments one observes the overdamped motion of the tracer. Such overdamped motion corresponds to neglecting the inertia term in eqn (61), producing the overdamped FLE The convolution integral can be replaced with the Caputo time fractional derivative ¶ ¶ ¶ d 2Àa xðtÞ or the corresponding overdamped FLE. Note that due to the coupling of the friction kernel and the fractional Gaussian noise via the fluctuation dissipation relation a large instantaneous value of the noise couples to a high effective friction. For this reason fractional Gaussian noise with 1 o a o 2 effects subdiffusion behaviour, as shown in eqn (66) and (68). The fact that the friction increases with how strongly we push a substance is indeed an everyday experience when we deal with viscoelastic substances such as toothpaste, honey, or liquid concrete. 165 Poking gently with our finger into toothpaste, it yields like a liquid. If we hit it hard or jump onto the toothpaste tube, the response is that of a very highly elastic substance, causing the tube to explode. The MSD described by the underdamped FLE becomes 166 and is thus ergodic. From the series expansions around z = 0 and N of the generalised Mittag-Leffler function we thus obtain the limiting behaviours of short time ballistic motion, 167 which eventually crosses over to the overdamped subdiffusion with exponent 2 À a for 1 o a o 2.
In this regime, that is, the motion is subdiffusive for persistent noise.
The FLE can be shown to govern the effective dynamics of a tagged particle in a single file 168 or the motion of a monomer in a long polymer chain. 169 The FLE was used to model the internal dynamics of proteins. 170 It occurs naturally for the description of particle motion in a viscoelastic environment, 157 and is related to generalised elastic models 171 as well as hydrodynamic interactions. 172,173 FLE-governed motion was also identified from the motion of individual lipid molecules from large scale simulations of lipid membranes. 174,175 Viscoelasticity controlled subdiffusion was reported for the motion of messenger RNA molecules and chromosomal loci in living E. coli cells. 71,176,177 In ref. 68, the long time motion of lipid granules in living yeast cells was shown to cross over from nonergodic CTRW motion to viscoelastic-type subdiffusion, consistent with observations in a different strain of yeast cells. 20 In complex fluids, viscoelastic subdiffusion was, inter alia, revealed in ref. 179 and 178 Based on microrheology data of endosomes in living cells, 180 stochastic models for active transport in the molecularly crowded cytosol of living cells were recently discussed. These include the viscoelastic nature of crowded fluids 178 in terms of the FLE, and it can be shown that depending on the size of the cargo or the biochemical turnover rate of the motor molecule, normal (hx(t)i C t) or anomalous (hx(t)i C t a with 0 o a o 1) transport can be effected. 181 We finally note that the FLE exhibits dynamic transitions with different critical exponents of the driving fractional Gaussian noise for free and forced motion. 182

Transient non-ergodicity and transient ageing
While we saw that asymptotically the unconfined motion of both FBM and FLE motion is fully ergodic, we mention the following caveat lector for these processes in the confines of an harmonic external potential V(x) = 1 2 mo 2 x 2 of strength o 2 4 0, such that the dimension of o is s À1 .
Namely, consider the subtle difference in the equilibration behaviour between the MSD for FBM, 166,183 which features an exponential relaxation to the stationary value hx 2 i st , and the time averaged analogue 166 The relaxation dynamics of the time averaged MSD d 2 ðDÞ is algebraically slow. Note that the stationary value 166,184 for FBM explicitly depends on the exponent a as the noise is external and thus not coupled to the friction constant, i.e., no fluctuation dissipation relation is fulfilled here. Moreover the factor of two in front of the stationary value hx 2 i st appears in the time averaged MSD (70) due to the very definition (6) involving two times the MSD at times t 0 þ D and t 0 and a decaying cross-term. 166 A behaviour similar to that of eqn (71) is observed for FLE motion. Here, however, the noise is internal, i.e., the fluctuation dissipation theorem is fulfilled. Thus, the thermal value is reached for any a. The power-law relaxation for the time averaged MSD in a viscoelastic environment was indeed observed experimentally by optical tweezers single particle tracking in wormlike micellar solution, 179 as shown in Fig. 17. What about ageing, the dependence of some physical observable on the time span t a between initial preparation of the system at t = 0 and start of the measurement (see Fig. 10)? For regular Brownian motion physical observables are independent of the ageing time t a . In the following cases for the FBM/FLE models, however, we find transient ageing. For both FBM and FLE motion the time averaged MSD splits into two additive terms, 185 The stationary term f st depends solely on the lag time D, while the ageing term f age is an explicit function of t and t a . As long as the initial velocity distribution is not thermal, f age C 1/t for long t. For free FLE motion at sufficiently long D, the stationary term is subdiffusive, f st C D 2Àa . When additionally we are in the strong ageing regime t a c t, the scaling is derived. 185 For confined FLE motion, the ageing term now features the power-law dependence, 185 For confined FBM, however, the ageing term decays exponentially, 185 Thus, the ageing behaviour for FBM is negligible, while for strongly aged FLE motion the transient ageing may be observable under specific conditions.

Scaled Brownian motion
A popular model for the description of anomalous diffusion is that of scaled Brownian motion (SBM), which is based on the time-dependent diffusivity K(t). [186][187][188] The associated Langevin equation with the white Gaussian noise x(t) of unit intensity, hx(t)x(t 0 )i = d(tÀt 0 ), and zero mean then becomes For the power-law form of the diffusion coefficient the MSD of the process is given by eqn (8) with K a = G(1 + a)K a *. Note that while the physical dimension of K(t) is cm 2 s À1 , that of the constant K a * is cm 2 s Àa . In SBM the scaling exponent is allowed to vary in the range 0 o a o 2, so that the process describes subdiffusion as well as sub-ballistic superdiffusion. The associated time averaged MSD can be calculated exactly, yielding 188,189 In the limit D { t, we obtain the linear lag time dependence 189,190 familiar from subdiffusive CTRW processes, eqn (20). However, expression (80) for SBM is valid in the whole range 0 o a o 2. When the lag time D approaches the measurement time t, the limiting form 189 describes a cusp at around D = t: as shown in Fig. 18 the time averaged MSD (81) converges to the value of the MSD hx 2 (t)i. However, while the disparity x 2 ðDÞ ad 2 ðDÞ between the MSD and its time averaged analogue renders SBM weakly non-ergodic in the sense defined above, the amplitude scatter of d 2 ðDÞ around the trajectory-to-trajectory average d 2 ðDÞ measured in terms of the dimensionless variable x is approximately of Gaussian form with a relatively narrow width. For sufficiently long trajectories, that is, the randomness in single trajectories of SBM is deterministically decreased with time and the process becomes practically reproducible. 189 SBM therefore belongs to a non-ergodicity class, that is fundamentally different from subdiffusive CTRW, for which the randomness of time averages is present no matter how long the process is followed. Equipping the Langevin eqn (77) for SBM with an additional external potential force F(x), one can derive the associated Fokker-Planck equation. For the special case of a confining harmonic potential V(x) = 1 2 mo 2 x 2 we find 189 @ @t Pðx; tÞ ¼ @ @x ox þ KðtÞ @ @x Pðx; tÞ; where o = o 2 /Z includes the friction coefficient Z of dimension s À1 . In the limit of unconfined motion, o = 0, the resulting dynamic equation was used by Batchelor 191 to describe the relative diffusion in turbulence as a complementary approach to Richardson's mentioned above (see also Section 6). The forcefree propagator encoded by eqn (82) is exactly that of free FBM, eqn (56), despite the fundamental difference between the two processes. However, in the presence of the confinement, o 4 0, we obtain the MSD hx 2 (t)i = 2K a *t a e À2 ot M(a,1 + a,2 ot) (83) in terms of the Kummer function M(a,b,z). 192 The limiting behaviour at short times t { 1/ o is that of free anomalous diffusion, hx 2 (t)i = 2K a *t a , i.e., when the particle starts in the vertex of the potential it initially moves force-free. At long times, we observe that the motion does not become stationary but the MSD exhibits the scaling law 189 The motion is thus influenced by the effective strength o of the potential. However, as the diffusion coefficient is explicitly time dependent, this implies that the system is characterised by a time dependent temperature, see also the discussion of Fuliński. 188 Alternatively, one could view this as an effect of a time dependent mobility. Clearly this corresponds to a far from thermal equilibrium state. For its interesting behaviour, we mention the associated time averaged MSD in the harmonic confinement. In the limit D { t we find the result 189 where we observe an apparent plateau at D c 1/ o, as demonstrated in Fig. 19, in which we compare the full analytic solution (83) for the MSD and the exact form for the time averaged MSD with results from simulations. In a way, the behaviour is opposite to that of subdiffusive CTRWs, for which we observe the thermal plateau for the MSD and a continuing power-law growth for the time averaged MSD, 45 confirmed by experiment. 68 Experiments observing the apparent plateau (86) for the time averaged MSD of SBM may misinterpret this for a sign of confinement, contradicting the result (83). In view of these results SBM represents a very simple model for sub-and superdiffusive anomalous diffusion. However, its physicality is somewhat questionable for most experimental settings, in which the system is connected to a heat bath, or when the system is stationary. Its non-ergodic properties are certainly interesting, and may be used to model active processes in the superdiffusive range 1 o a o 2. SBM-type dynamics in fact occurs in free granular gases. 289

Heterogeneous diffusion processes
Single particle tracking experiments usually employ relatively large tracers. The above mentioned granules and artificial tracers are all in the range of several hundreds of nanometres in size. [48][49][50]68,69,179,193,194 Considerably smaller tracer proteins were recently employed to sample much larger subvolumes of living cells, producing cell-wide mobility maps. The results show significant and systematic variations in the position-dependent cytoplasmic diffusivity K(x) with a growing distance from the nucleus. 195 Similar approaches using space-dependent diffusivities are routinely used in various fields, for instance, in the mathematical modelling of tracer dispersion in subsurface hydrology. 196 We also recall Richardson's approach to the description of his measurements of the relative dispersion of two tracer particles in a turbulent flow, in terms of a diffusivity depending on the relative tracer position, 26 mentioned in the Introduction.
Diffusion processes with position-dependent diffusivity, K(x), are described by a simple Markovian Langevin equation with multiplicative noise 197 where x(t) represents white Gaussian noise. In the Stratonovich sense, the diffusion equation for this heterogeneous diffusion process (HDP) has the symmetric form @ @t Pðx; tÞ ¼ @ @x A diffusing particle tends to accumulate in regions of low diffusivity. The HDP is fundamentally different from the CTRW approach: standard CTRWs are running off in an annealed environment and thus constitute renewal processes. HDPs represent a deterministic (in contrast to random) quenched environment. The particle, that is, has the same diffusivity K(x) each time it returns to the point x.
Interestingly, despite the different nature of HDPs they share some common features such as weak ergodicity breaking with renewal CTRW processes. Even when the space dependence of the diffusivity becomes annealed, many of these effects are preserved. 198 Let us first consider the power-law forms for the diffusivity. The amplitude K 0 has physical dimension cm 2Àb s À1 . The offset x off in eqn (89) is introduced to avoid either divergencies of K(x) (b o 0) or stalling (b 4 0) of the particle around x = 0 in the simulations. In the analytical calculations we use the bare scaling form K(x) B K 0 |x| b . The HDP based on the diffusivity (89) and d-initial condition for the PDF at the origin has the MSD 197 shown in Fig. 20. It is thus of the generic power-law form (8) with the anomalous diffusion exponent a given in terms of the scaling exponent b from eqn (89) as 197  such that we observe superdiffusion in the range 2 4 b 4 0 and subdiffusion for b o 0. We note that when b approaches the critical value b = 2, the anomalous diffusion exponent a diverges. In that case the MSD assumes an exponential time dependence. The PDF of the HDP with power-law diffusivity (89) is given by the exponential 197 In the subdiffusive range, b o 0, this PDF is of compressed Gaussian shape and exhibits a dip to zero at the origin, that is, it is bimodal. For superdiffusion with 2 4 b 4 0, the PDF is a stretched Gaussian and has a non-differentiable cusp at x = 0. This is opposite to the behaviour observed for subdiffusive and superdiffusive fractional diffusion equations. 42,199,200 For the HDP with the diffusion coefficient (89) we obtain the weakly non-ergodic behaviour 197 show pronounced amplitude scatter around this mean. The initial deviation of hx 2 (D)i from the expected behaviour is due to the offset x off used in the simulations. 197 The fluctuations of can be fitted by a Gamma distribution for both sub-and superdiffusive HDPs. 197 In contrast to subdiffusive CTRW processes the amplitude scatter distribution f(x) decays to zero at x = 0, that is, the process never leads to complete stalling of the diffusing particles during the observation time window. Concurrently, the ageing behaviour of HDPs with power-law form of K(x) was numerically shown to be in good agreement with the form (30) of the ageing depression of the CTRW approach. 201 From a data analysis point of view the similar behaviour of various quantities requires some care not to confuse HDPs from CTRW processes. We note that different forms for the position dependent diffusivity K(x) such as an exponential and logarithmic dependence were studied in ref. 202. The exponential case with K(x) = (K 0 /2)exp(À2x/x*), where x* sets the length scale, leads to the MSD 202 which belongs to the range of ultraslow processes discussed earlier. The associated time averaged MSD becomes 202 and exhibits an interesting square root scaling in the lag time D for initially highly mobile particles, with initial position x 0 on the negative semi-axis. For intermediate x 0 , however, a profound splitting of the tracer populations is observed, exhibiting D 1/2 and D 1 scaling forms of the time averaged MSD, respectively, for highly mobile and rather trapped fractions of particles in the ensemble. This fact is reminiscent of the population splitting effect observed for CTRW processes, Section 2.1. For more details  203 In Fig. 21 we demonstrate how under confinement the ensemble and time averaged MSDs converge to the plateau value where n C 0.6 and 2L is the length of the interval with reflecting boundaries. 201 The convergence to the plateau for both ensemble and time averaged MSDs (with the aforementioned factor of two for the time average) sets the HDP process apart from the subdiffusive CTRW, in which no long time plateau occurs for Finally, Fig. 22 shows the result for a typical trajectory in the recent study of ref. 198, in which a random walker travels on a landscape with randomly switching local diffusivity. As can be seen from the graph, on finer scales the motion appears to be more like normal diffusion, due to the broad distribution of K(x) magnitudes, on a coarser resolution the trajectory appears to be similar to that of a subdiffusive CTRW with a diverging characteristic time scale shown in Fig. 6. It will be interesting to compare these results to the behaviour of stochastic HDPs in annealed and quenched environments. 204

Fractals
Finally we come to the third major model for anomalous diffusion, namely, the transport on a fractal support. Like fractional Brownian motion and Lévy flights, fractals were popularised by Mandelbrot, in whose book The fractal geometry of nature he came up with the epitomised phrase Clouds are not spheres, mountains are not cones, coastlines are not circles, and bark is not smooth, nor does lightning travel in a straight line. 124 Instead, Mandelbrot argues that many natural phenomena are statistical fractals, i.e., objects which, in a statistical sense, do not have a scale and thus one cannot judge at what resolution a picture of the object was taken. Or, in other words, their length (or area, etc.) depends on the applied scale in a measurement, as in the celebrated coastline of the Britain paradox: with a smaller yardstick finer details can be measured and the length of the coastline is larger than when we apply a larger yardstick. Mandelbrot accredits the discovery of this effect to Lewis Fry Richardson. 205,206 We distinguish mathematical fractals with their strict building rules and exact self-similarity from statistical fractals, for which self-similarity is present only in a certain average sense. Let us briefly address these two concepts. First, mathematical fractals are constructed by iteration. For instance, see the Sierpińsi gasket in Fig. 23. An equilateral triangle is divided into four equilateral triangles and the central one removed. This subdivision rule is repeated, ideally an infinite amount of times. From the iteration scheme: scale the original triangle down by a factor of 2 and keep three of the resulting four objects, we obtain the similarity dimension log 3/log 2 C 1.585, which gives the same result as the formal Hausdorff approach. 124,207 We would obtain the same result if we had started by arranging three copies of the original triangle into an object with twice the original edge length. The Sierpińsi gasket is more space filling than a line but does not fully fill an area. In particular, we see that the generated object contains empty triangles on all scales. To come from one given sector to another, a random walker on such a geometry needs to first locate and then traverse narrow causeways, as sketched in Fig. 24. This considerably slows down the particle propagation in the embedding space.
As said, natural objects are not exact mathematical fractals. However, for example, the coastlines of Britain or Norway are statistical fractals: while their shape does not repeat exactly on a smaller scale, their overall length fulfils a scaling law L(e) C e 1Àdf , where e is the length of the yard stick applied to measure the coastline. 124,149,208,209 If the coastline were a perfect line with d f = 1, its length L would be independent of  the length e of the yard stick (resolution) applied for the measurement. However, within a certain upper length scale-of the order of the extension of the country-and a lower length scale-e.g., the finest features appearing on a map-the fractal dimension of coastlines typically differs from unity. Thus, while the South African coastline with d f = 1.02 is almost a perfect line, the West coast of Britain has d f = 1.25, and is thus significantly more ramified. For finer measurement resolutions (shorter yard stick length e) the measured length increases, while coarser measurements (longer yard stick length e) lead to shorter apparent L. For general fractals with a fractal dimension d f embedded in a d-dimensional space, it is often useful to think in terms of the mass of the fractal object, which, on average, grows like M(R) C R df as a function of the radius R.
As by necessity d f o d, the mass density therefore shrinks with R as R dfÀd , as fractals are characterised by 'holes' on all scales (compare the Sierpińsi gasket in Fig. 23).
An important approach to the description of porous or crowded media is the percolation model. In site percolation each point on a lattice is occupied with probability p and remains empty with probability 1 À p. At the critical occupation probability p = p c ( p c C 0.59. . . in two 210 and p c C 0.31. . . in three 211 dimensions for a square and cubic lattice, respectively) the correlation length of the system diverges and an infinite cluster is formed. The percolation cluster then has a fractal dimension d f = 91/48 C 1.896. . . in two 212 and d f C 2.52. . . in three dimensions. 213 A random walker placed on the fractal, incipient infinite cluster allowed to move between nearest neighbour occupied sites performs anomalous diffusion with an anomalous diffusion exponent a = 2/d w related to the walk exponent d w , which is larger than d f . 214,215 According to the Alexander Orbach conjecture d w = 3 2 d f , 215 which is close to experimentally observed values, compare ref. 216. Note that when the averaged motion of random walkers placed on all clusters, a different scaling exponent characterises the MSD, for more details see ref. 217. Fig. 25 shows the critical percolation cluster on a square lattice used for random walk simulations in ref. 218. The results for both the two-dimensional MSD hr 2 (t)i and time averaged MSD d 2 for the motion on the infinite cluster are shown in Fig. 26. Both overlap perfectly, corroborating the ergodicity of this anomalous diffusion process. The straight line in Fig. 26 indicates the expected slope to guide the eye. In ref. 219, the non-Gaussian nature of the diffusion on the critical percolation cluster is analysed. Fractal percolation clusters are often used for simulations of free diffusive processes 220 as well as facilitated diffusion processes 221 in the crowded cytoplasm of living biological cells. A fractal support was also diagnosed to be superimposed onto the subdiffusive CTRW motion for the diffusion of potassium channels in the plasma membrane of living human cells in ref. 50. It is important to note that when we consider the motion on all clusters the motion is a forteriori no longer ergodic: a walker moving on a finite, disconnected cluster cannot explore the entire phase space. 217   While the increments of random walk processes on fractal structures are stationary 31 and the infinite percolation cluster simulations of ref. 218 indicate that diffusion on fractals is ergodic, this point needs further investigation, in particular, for different types of fractals; compare also the discussion in ref. 222. A second open question is what happens in the presence of a topological bias, for instance, a bias away from the backbones 208 of a diffusion cluster. In that case at least transient non-ergodicity would be expected.

Strong anomalous diffusion and infinite densities
So far we have focused our attention on the ensemble averaged MSD hx 2 (t)i and the corresponding time average d 2 . More generally, one may characterise stochastic processes by their fractional moments h|x(t)| q i with q Z 0. Strong anomalous diffusion deals with processes, which in the long time limit satisfy 223 where n(q) is not a constant. For Brownian motion n(q) = 1/2 and for FBM n(q) = a/2 so that these processes do not exhibit strong anomalous diffusion. For unbiased Gaussian processes like FBM the MSD characterises the width of the PDF P(x,t) when the particles start at the origin x = 0. For this reason, from the starting days of the Gaussian central limit theorem the variance of the stochastic process has attracted special attention. Experimentally, information on h|x(t)| q i is used to support or dispute the Gaussian nature of an underlying diffusion process or, more generally, the mono-scaling assumption of a set of data (see below). When dealing with complex transport of particles, for example tracer particles in living cells when periods of active motion contribute to the motion, the spectrum of exponents qn(q) may in fact exhibit non-Gaussian and strong-anomalous statistics. 224 Recent experimental studies on the active transport of polystyrene beads in living cells exhibit a particular type of strong anomalous diffusion. The data analysis exhibits piecewise linear behaviour with with the positive constants q c , m, a, and b. Such a behaviour is sometimes called bi-fractal scaling, as the simplest case of multi-fractal behaviour. More importantly this bi-linear behaviour is widespread and found in many models of non-linear dynamics, 223,[225][226][227][228][229] transport in optical lattices, 230,231 models of transport in disordered Lévy glasses, [232][233][234] and other stochastic models. 88,235,236 The parameters q c , m, a, and b are non-universal and hence give specific information on the underlying model or process. As emphasised by Vulpiani and coworkers 223 strong anomalous diffusion implies the breakdown of mono-scaling theories which predict P(x,t) B t Àn f (x/t n ) in terms of a scaling function f (Á). For example, FBM, FLE, sub-diffusive decoupled CTRW, and fractional diffusion equations, 42,56 do not predict the piecewise bi-linear scaling (98) of the moments, in other words these popular stochastic models considered above cannot describe the active transport found in ref. 224. In experiments a C 0.8 (see discussion below). Roughly speaking, the piecewise linear scaling of the spectrum qn(q) implies that lower order moments q o q c still follow a diffusive, possibly anomalous process if m a 1, while the linear increase of the spectrum qn(q) B q for q c 1 implies a ballistic scaling, since if x scales like t, h|x| q i scales with t q . This implies that the process is actually a mixture of both diffusive process x p t m and ballistic element x p t, and hence characterising the process as an anomalous diffusion process is in some sense misleading. In particular, we should not universally accept the special role of the second moment, beyond the fact that it indicates certain deviations from normal behaviour. A typical situation occurs when P(x,t) for small x scales diffusively (n = 1/2), however, for certain large x the scaling becomes ballistic (n = 1). Here we focus on one stochastic model of strong anomalous diffusion, the Lévy walk model mentioned in Section 3.5. For more details on the mathematical treatment of the following, see ref. 237.
Lévy walks represent a widely applicable model describing superdiffusion. 51,81,[238][239][240][241][242][243][244] In its simplest one dimensional version, a particle starts at the origin at time t = 0, and travels with velocity v 1 , drawn from the PDF F(v). The duration of the travelling event is t 1 which is drawn from the PDF c(t). The position of the particle at time t 1 is x = v 1 t 1 . The process is then renewed, until time t is reached: a new velocity v 2 and waiting time t 2 are independently drawn from F(v) and c(t), and this process is repeated. The position of the particle is then simply xðtÞ ¼ Ð t 0 vðtÞdt. If the PDF c(t) of the flight duration is exponential and the velocity distribution Gaussian with zero mean, we recover the famous Drude model. 245 Ergodic properties of Lévy walks were analysed in ref. [136][137][138], see also Section 2.5.
We assume that the velocity PDF is symmetric-F(v) = F(Àv), such that hvi = 0-and that all moments of F(v) are finite, for instance, a Gaussian PDF. The main ingredients of the stochastic model are the power-law distributed waiting times (14). These can be justified from first principles models or from observations, at least in some systems, compare the discussions in ref. 39, 51 and 241. We here limit our discussion to the case 1 o a o 2 which in turn implies that the average sojourn time hti is finite, however, its variance diverges. Rather generally, it is easy to understand that the MSD is bounded by hx 2 (t)i o hv 2 it 2 . Roughly speaking, the ith jump in the process is given by dx i = v i t i , and due to the assumed power-law PDF of waiting times the PDF of this increment is a symmetric distribution (due to the symmetry of the distribution of v) with long tails The typical number of jumps in the process is N C t/hti. Hence a hand-waving argument yields the PDF of the position of the particle using the generalised central limit theorem. Namely, x ' P N¼t=hti i¼1 dx i with the variance of dx i being infinite, such that we expect that the PDF of particles is given by (in dimensionless units) here L a,0 (x) is the symmetric Lévy distribution, whose Fourier pair is exp(À|k| a ), such that the case a = 2 corresponds to a Gaussian process. Taken seriously, this central limit theorem implies These hand-waving arguments provide the correct scaling of the lower order moments, however, the results for higher order moments, including the second moment, are obviously wrong: the particle cannot travel faster than ballistic motion.888 Our argument is flawed since we have neglected the correlation between the jump sized and time in the problem. A more precise mathematical analysis leads to the result 237 In terms of the parameters introduced in eqn (98), we thus identify m = 1/a, q c = a, a = 1, and b = a À 1. We see that the process exhibits strong anomalous diffusion, 223,237,246 the MSD exhibits enhanced diffusion hx 2 i B t 3Àa which is faster than normal but slower than ballistic, 1 o 3 Àa o 2. The lower order moments q o a exhibit Lévy scaling, i.e., what we call anomalous diffusive scaling. In contrast, the higher order moments are ballistic in the sense that if q c 1 we have qn(q) = q + 1 À aq, the ballistic scaling. The power law tail of the Lévy PDF is cut off at x C t since particles cannot travel faster than the typical velocity permits, compare ref. 133. This Lévy walk picture thus cures the divergence of the moments beyond q = a of the original Lévy flight. The fact that experimentally 224 one finds a = 0.8 and therefore qn(q) B 0.8q and not qn(q) Bq shows that purely ballistic motion between turning points is not a sufficient model to describe the measured behaviour. At least on the stochastic level this may imply that non-linear relations between jump size and waiting times are important. The experimental finding of non-ballistic scaling of large-q moments (in experiment the largest q was 8) is an indication for the insights one can achieve by analysis of time dependence of moments.

Infinite densities
At the heart of the mathematical theory of diffusive phenomena stand the Gauss and Lévy central limit theorems. A piecewise linear scaling of the moments in our example implies that the Lévy central limit theorem is a valid approximation at the central part of the PDF of particles but not in the tails. The fact that we have only two scaling behaviours of the moments may suggest that in addition to the central limit theorem there exists another general method to describe the fluctuations. Such a theory was recently obtained 237 and the problem related to infinite densities. These are a class of non-normalised densities that have only recently attracted attention from physicists.
According to eqn (102) the moments h|x| q i with q o a are given by the Lévy density. The higher order moments with q 4 a are also calculated from a density denoted I D (Á), 237 is called an infinite density, in the sense that it yields the moments h|x| q i with q 4 a but at the same time is not normalised, (see below for the physical meaning of À v). The density I D (Á) is complimentary to the Lévy density. The infinite respective Lévy densities fail to provide statistical information on the moments q o a or q 4 a, but are useful for q 4 a respective q o a. The identity of the observable of interest is thus crucial, e.g., hx 2 (t)i versus h|x(t)|i, in the sense that not only they yield different scaling behaviours with time (strong anomalous diffusion) but they are calculated from two different scaling functions. More specifically, the infinite density has the small À v behaviour 237 which is non-integrable and hence non-normalisable. Note that this non-integrability is cured when we calculate, for example, the second moment, since À v 2 À v À1Àa is integrable close to À v -0. This is the reason why this function can give information on the higher order moments q 4 a.
However, is the infinite density merely a mathematical construction with which we obtain statistical information on the moments of the process, or does it actually contain information on the particle PDF? Since Ð 1 À1 Pðx; tÞdx ¼ 1 at all times, one may wonder why a non-normalised solution emerges? The infinite density and the density P(x,t) are related according to 237 where v ¼ x=t ¼ Ð t 0 vðt 0 Þdt 0 =t is the time averaged velocity. Since Ð 1 À1 t a Pðx; tÞdx ¼ t a ! 1, the integral over the infinite density diverges when t -N. Importantly, eqn (106) implies that if we plot the density of particles (normalised to unity, with an initial condition at the origin) according to t a P(x,t) versus x/t and t 1/a P(x,t) versus x/t 1/a , the data in both cases will collapse onto a master curve. In the first way of plotting this curve will be the infinite density, and thus we can estimate this density from numerical or experimental data. In the second plot we get the well known Lévy density. 237 This dual scaling is obviously related to the bi-linear behaviour of the spectrum qn(q). Since the latter is very common we believe that infinite densities also have some general validity. In mathematics, infinite densities, briefly discussed here, are a subject of research for many years, in the context of infinite ergodic theory. [247][248][249] This branch of pure mathematics is in fact related to the phenomenon of weak ergodicity breaking discussed here. 250 For the Lévy walk model one can obtain explicit formulae for the infinite density in terms of the parameters of the model. 237 It was shown that this density depends on three measurables: F(v), a, and the anomalous diffusion constant which characterises the width of the Lévy density. 237 In that sense the infinite density yields statistical information not contained in the central limit theorem, namely it contains more fingerprints of the underlying process: the velocity distribution (which can, in principle, be measured independently). The small is not sensitive to the shape of F(v) (provided it is symmetric) and in that sense it is universal. For example, for a Gaussian F(v), the infinite density is plotted in Fig. 27. We believe that further work on infinite densities is required since only recently these have attracted some attention in statistical physics 83,230,[251][252][253][254]

Measurables
Normal and anomalous diffusion are key to many biological signalling processes and a vast number of biochemical reactions in cells, or to the spreading dynamics in inanimate complex systems. To be able to analyse diffusion measurements in a physical meaningful way and to make predictions for secondary processes such as reaction rates or signalling cascades, it is absolutely necessary to know the exact nature of the stochastic process driving the particle motion. For instance, the first passage behaviour is vastly different between the processes reviewed here. In this section we describe some experimentally relevant observables whose complementary character enables one to attain a fair degree of certainty that a given set of data is based on a concrete stochastic process. Analyses based on the application of different measures are, for instance, presented in ref. 44, 50, 68, 69, 175, 176 and 255-260. As demonstrated in the literature, 50,68,69,72 we note that there is a priori no good reason to assume that in a complex system one of the above processes is sufficient to adequately describe all the observed dynamic features: sometimes it is necessary to combine at least two of the processes, which may influence the particle motion simultaneously or at different time scales. When passive diffusion is combined with intermittent active motion additional challenges to the data analysis arise. 261

MSD
The MSD is unarguably the most common way to analyse stochastic data. Depending on the kind of measurement the quantity to evaluate is the MSD hx 2 (t)i or the time averaged MSD . The latter is the typical approach to the analysis of the time series obtained from single particle tracking. If the time averaged MSD exhibits anomalous scaling of the form d 2 ðDÞ ' D a and d 2 ðDÞ ' x 2 ðDÞ we are dealing with an (asymptotically) ergodic process, and when we want to use the data to identify the underlying stochastic process we can even eliminate CTRW motion and diffusion processes with time or space dependent diffusivity as sole contributions. To be more specific, additional complementary measures need to be evaluated.

Scatter of time averages
The statistics of the scatter of the amplitude d 2 ðDÞ of the time averaged MSD for a set of individual trajectories at a given lag time D is a useful indicator for the classification of the anomalous diffusion process. We quantify this scatter by the distribution f(x) For subdiffusive CTRW motion the trajectory-to-trajectory fluctuations are asymptotic, that is, the ergodicity breaking parameter has the finite limiting value (25) varying with a between unity and zero. As in this process the fluctuations are statistically given by the number of jumps performed during its time evolution, 73 the PDF f(x) remains unchanged when the process becomes confined. The distribution f(x) has a finite value at x = 0 for any given D, a strong characteristic of the weakly non-ergodic CTRW motion. For aged CTRW processes f(x) has a discrete immobile contribution proportional to d(x) and a continuum part whose distribution is qualitatively similar to the non-aged process, albeit there occurs a redistribution of this continuous part to larger x values. 73 The PDF f(x) for noisy CTRW processes with superimposed Ornstein-Uhlenbeck or Brownian Fig. 27 Infinite density for a Gaussian velocity distribution F(v). Notice the divergence of the density at the origin. This divergence is non-integrable, hence these functions are non-normalisable. Still, they describe the statistical properties of physical particles. 237 noise narrows down for increasing noise strength. 98 Finally, we note that subdiffusive CTRW processes with a cutoff of the power-law waiting time distribution exhibit an MSD with all features of the process with a diverging waiting time scale roughly up to the cutoff time, however, due to the lack of extreme waiting events the scatter distribution f(x) appears to be significantly more ergodic, 68 compare also the discussion in ref. 262 and 263. For FBM and FLE motion the scatter distribution at sufficiently short lag times is Gaussian and becomes somewhat asymmetric but preserves the property f(0) = 0 for larger D. 158 The ergodicity breaking parameter tends to zero with the ratio D/t. 156 For the transiently non-ergodic behaviour under confinement see ref. 166 and 179. In general, the scatter distribution and the ergodicity breaking parameter of both processes are relatively similar to those of regular Brownian motion.
SBM was shown to be a weakly non-ergodic process but its ergodicity breaking parameter tends to zero with the ratio D/t. 189,190 The PDF f(x) is approximately bell-shaped albeit wider than for the corresponding FBM with identical anomalous diffusion exponent a.
For HDPs the form of the PDF f(x) follows an asymmetric Rayleigh-like or generalised Gamma distribution. 197 The width of f(x) for HDPs with power-law form (89) grows from the minimal Brownian value attained for the scaling exponent b = 0 of K(x) and diverges at the critical point b -2. For confined HDPs the width of f(x) decreases drastically for all b values, often reaching only a minute scatter. The ergodicity breaking parameter tends to zero as 1/t for fixed D. 201 We note that in some cases, in lieu of the ergodicity breaking parameter defined above-which represents a sufficient condition for (non-)ergodicity-one uses the parameter EB ¼ d 2 ðDÞ D E.
x 2 ðDÞ , 138 representing a necessary condition for ergodicity.

First passage time statistics
When sufficient statistics are available one may use the first passage time statistics to distinguish different kinds of anomalous diffusion processes. In single particle tracking experiments the first passage can simply be measured as the moments in time when the tracer passes a certain distance from its original point of release. As shown in ref. 264 the scaling of the mean first passage time obtained from a statistical number of repeats of such an experiment with the distance from the origin may be a good indicator for the underlying diffusion process. Moreover, the PDF of first passage times can be a good indicator, especially for confined systems. While subdiffusive CTRW processes with their scale-free waiting times still exhibit a power-law decay under confinement, 42,94 other processes have an exponential form of the first passage PDF. For semi-open intervals, we note that the dependence of the scaling exponent for the first passage PDF on the stochastic process may either be increasing or decreasing**** with the anomalous diffusion exponent a, and could thus also serve as an indicator when a is varied.

Mean maximal excursions and higher order moments
Apart from the regular PDF P(x,t) a stochastic process may be characterised by another, related quantity, the PDF of the maximal excursion. This PDF measures the likelihood that at some time t after its initial release at the origin, the particle has not travelled farther than the distance x. 256,265 This distribution may be reconstructed from the measured single particle traces, and then higher order moments calculated from the data. For CTRW and FBM the scaling behaviour of the second moment of the mean maximal excursion as well as the fourth moments are known. It can be shown that the ratios of the regular moments hx 4 i/hx 2 i 2 and the corresponding quantities of the mean maximal excursions obey certain inequalities. 256 The behaviour of the other processes with respect to this method remain to be analysed. However, we mention that the method of the mean maximal excursion has a clear advantage over the regular PDF as the mean maximal excursion dynamics is less dispersed and the associated moments therefore more reliable for finite data sets. 256

Distribution of local diffusivity
An interesting tool to analyse single particle tracking data is to measure the distribution of the local anomalous diffusion coefficient as a function of the (lag) time from the ratio of the MSD (time averaged MSD) versus the (lag) time to some positive power. These distributions for a weakly non-ergodic process are different according to whether the MSD or the time averaged MSD is evaluated. A detailed discussion for Brownian processes with spatially varying diffusivity and for CTRW processes can be found in ref. 266 and 267. This method still needs to be analysed for the other anomalous stochastic processes considered herein.

Non-Gaussianity measure
Similar to the ergodicity breaking parameter EB, the non-Gaussianity measure G involves higher order moments. In terms of the experimentally relevant time averaged MSD we define the non-Gaussianity as 217 in dimension d, where the fourth time averaged moment is defined via For Brownian motion G = 0, while this parameter deviates from zero for progressively non-Gaussian diffusion. The value of G provides a sensitive measure for the type of diffusion process under consideration. For instance, based on G measurements for diffusion of fluorescent nanobeads in complex crowded fluids, the Gaussian FBM-like process was recently proposed as a suitable mathematical model rather than a CTRW process. 268 **** Notably, this occurs for FBM. 154,155 The p-variation method The p-variation method is based on the evaluation of the pth power of a partial sum of increments of a given stochastic process. 177,269 It was applied as a measure to distinguish the non-Gaussian subdiffusive CTRW from the Gaussian FBM process. 177,269 An attractive feature of the p-variation test is its applicability to both unbounded and confined systems, even when based on a single sufficiently long experimental trace. When the recorded time traces are affected by a highly noisy environment, the p-variation test may become inconclusive. 98

Velocity auto correlation
The experimentally accessible correlation of increments along some time trace x(t) can be probed in terms of the covariance 44 C (e) n (t) = e À2 h(x(t + e) À x(t))(x(e) À x(0))i.
Although it is based on increments of the position rather than on the real velocity of the particle, the quantity (109) is often referred to as the velocity auto-correlation function of the process x(t). For free, unconfined CTRW processes this function drops to zero algebraically as C (e) n (t) B 1 À (t/e) a and vanishes at t 4 e due to the independence of successive steps. 44 With this property CTRW motion is easily distinguishable from unconfined FBM, the latter being characterised by a crossover to negative values (a signature of the antipersistence) and a powerlaw recovery back to zero. The autocorrelation can be successfully used to analyse the nature of an anomalous diffusion process. 166,197 However, for confined motion also the CTRW process exhibits some form of antipersistence due to the reflections at the boundaries or the rising flanks of the confining potential. In that case the behaviour of the function C (e) n (t) becomes empirically indistinguishable between confined CTRW and FBM. 44

Discussion and conclusions
Brownian diffusion with its Gaussian propagator has an appealing beauty in its universality. No matter what the exact details of the underlying process are, the limiting behaviour is completely determined by the MSD and its linear growth with time. Concurrently, it is ergodic, so all quantities measured as time averages of sufficiently long single trajectories can be safely interpreted in terms of the readily available theoretical results in terms of ensemble averages. At the same time one could also perceive Brownian diffusion as somewhat too restrictive: many experimental observations are much richer and cannot be explained by the Gaussian propagator emanating from the central limit theorem. We note that while it may be true that apparent anomalous diffusion may in fact be due to transient crossovers of Brownian motion in confined geometries [270][271][272][273][274] the opposite may also be true: some diffusion processes failed under Brownian motion may in reality be anomalous. 64 One of the reasons may be the weakly non-ergodic behaviour discussed in this review.
Especially since more refined measurement techniques such as space resolved fluorescence recovery after photobleaching measurements and, in particular, high resolution single particle tracking have become available, anomalous diffusion has been widely observed. Most importantly, information beyond the time scaling of the MSD can be extracted from the data. These observations demonstrate that in different systems anomalous diffusion has different diffusive, first passage, and ergodic characteristics. In particular, disparities between ensembles and time averaged observables have been reported. Accommodating the features of anomalous diffusion and non-ergodicity poses the challenge to come up with a pluralistic range of stochastic models for the description of non-Brownian diffusion processes.
We here summarised the state of the art in the study of the properties of the most popular anomalous diffusion processes. In view of their importance in the analysis of experimental or simulations data we paid specific attention to the time and ensemble averaged MSDs. For ergodic processes both quantities are identical for sufficiently long measurements and ensembles. In the opposite case, when time and ensemble averaged MSDs are asymptotically disparate, we speak of a weakly non-ergodic process. The rich range of behaviours is listed in Table 1. In the sense of the ensemble averaged MSD, the considered models span from cubic time scaling down to ultraslow, logarithmic time evolution. Considering the lag time dependence of the time averaged MSD, the variation is much narrower, from quadratic scaling to a square root dependence. In most weakly non-ergodic cases a linear lag time dependence is observed and may be falsely interpreted as normal diffusion.
From a statistical physics point of view the variety of behaviours listed in Table 1 poses a number of questions, in particular, for a classification scheme of anomalous diffusion processes with respect to their (non-)ergodic behaviour and how fundamental mathematical concepts have to be generalised, for instance, the Khinchin theorem. 45 Moreover, it is of principle interest whether we can construct new processes, which break the linear lag time scaling of d 2 . At the same time there are still a number of open questions concerning the processes reviewed here, for instance, the exact form of the ergodicity breaking parameter EB beyond the CTRW case. Another question is to come up with additional methods to diagnose the underlying stochastic process from a given, limited set of data from experiments or computer simulations. In particular, Bayesian inference methods are expected to be developed further. The latter should also work well when the observed process in fact represents a blend of different stochastic processes.
It will also be of interest to extend the study of the ergodic behaviour and the features of ageing from the stochastic processes considered here to more specific systems. The latter include, for instance, the Lorentz gas model with its rich behaviour of crossovers and density effects, 280 the motion in periodically structured environments such as elastic gels, 97 or the folding dynamics of proteins. 170,281 Other interesting current questions concern the understanding from a stochastic point of view of Fickian yet non-Gaussian diffusion processes, [282][283][284] compare also the discussion in ref. 285. Finally we mention that similar concepts to those summarised here could be relevant for active transport processes.
From a more practical point of view, the discussion of the question as to what extent anomalous diffusion may impact biological function has just begun. 32,33,217,221,[286][287][288] Abbreviations and symbols Symbols used in the text are summarised in Table 2