Real-space grids and the Octopus code as tools for the development of new simulation approaches for electronic systems

Real-space grids are a powerful alternative for the simulation of electronic systems. One of the main advantages of the approach is the flexibility and simplicity of working directly in real space, where the different fields are discretized on a grid, combined with competitive numerical performance and great potential for parallelization. These properties are a great advantage when implementing and testing new physical models. Based on our experience with the Octopus code, in this article we discuss how the real-space approach has allowed for the recent development of new ideas for the simulation of electronic systems. Among these applications are approaches to calculate response properties, modeling of photoemission, optimal control of quantum systems, simulation of plasmonic systems, and the exact solution of the Schrödinger equation for low-dimensionality systems.


I. INTRODUCTION
The development of theoretical methods for the simulation of electronic systems is an active area of study. This interest has been fueled by the success of theoretical tools like density functional theory (DFT) [1,2] that can predict many properties with good accuracy at a relatively modest computational cost. On the other hand, these same tools are not good enough for many applications [3], and more accurate and more efficient methods are required.
Current research in the area covers a broad range of aspects of electronic structure simulations: the development of novel theoretical frameworks, new or improved methods to calculate properties within existing theories, or even more efficient and scalable algorithms. In most cases, this theoretical work requires the development of test implementations to assess the properties and predictive power of the new methods.
The development of methods for the simulation of electrons requires continual feedback and iteration between theory and results from implementation, so the translation of new theory into code needs to be easy to implement and to modify. This is a factor that is usually not considered when comparing the broad range of methods and codes used by chemists, physicists and materials scientists.
The most popular representations for electronic structure rely on basis sets that usually have a certain physical connection to the system being simulated. In chemistry, the method of choice is to use atomic orbitals as a basis to describe the orbitals of a molecule. When these atomic orbitals are expanded in Gaussian functions, it leads to an efficient method as many integrals can be calculated from analytic formulae [4]. In condensed-matter physics, the traditional basis is a set of plane waves, which correspond to the eigenstates of a homogeneous electron gas. These physics-inspired basis sets have, however, some limitations. For example, it is not trivial to simulate crystalline systems using atomic orbitals [5], and, on the other hand, in plane-wave approaches finite systems must be approximated as periodic systems using a super-cell approach.
Several alternatives to atomic-orbital and plane-wave basis sets exist [6][7][8][9][10]. One particular approach that does not depend on a basis set uses a grid to directly represent fields in real space. The method was pioneered by Becke [11], who used a combination of radial grids centered around each atom. In 1994 Chelikowsky, Troullier and Saad [12] presented a practical approach for the solution of the Kohn-Sham (KS) equations using uniform grids combined with pseudo-potentials. What made the approach competitive was the use of high-order finite differences to control the error of the Laplacian without requiring very dense meshes. Since then, several real-space implementations have been presented.
Discretizing in real-space grids does not benefit from a direct physical connection to the system being simulated. However, the method has other advantages. In the first place, a real-space discretization is, in most cases, straightforward to perform starting from the continuum description of the electronic problem. Operations like integration are directly translated into sums over the grid, and differential operators can be discretized using finite differences. In fact, most electronic structure codes must rely on an auxiliary real-space discretization used, for example, for the calculation of the exchange and correlation (xc) term of DFT.
Grids are flexible enough to directly simulate different kinds of systems: finite, and fully or partially periodic. It is also possible to perform simulations with reduced (or increased) dimensionality. Additionally, the discretization error can be systematically and continuously controlled by adjusting the spacing between mesh points, and the physical extension of the grid.
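As a minimal sketch of how integration becomes a sum over grid points and the Laplacian becomes a finite-difference stencil, the following Python fragment evaluates the kinetic energy of a one-dimensional Gaussian orbital, whose exact value is 0.25 in atomic units. The function names, the fourth-order stencil, and the test function are illustrative assumptions and are not taken from Octopus.

```python
import numpy as np

def laplacian_1d(f, h):
    """Fourth-order central finite-difference second derivative.

    The two points closest to each boundary are left at zero, which is
    adequate when the function has decayed there.
    """
    coeffs = np.array([-1.0, 16.0, -30.0, 16.0, -1.0]) / (12.0 * h**2)
    lap = np.zeros_like(f)
    for k, w in zip(range(-2, 3), coeffs):
        lap[2:-2] += w * f[2 + k : len(f) - 2 + k]
    return lap

def integrate(f, h):
    """Integral on a uniform grid: a plain sum times the spacing."""
    return h * np.sum(f)

def kinetic_energy(h, box=16.0):
    """Kinetic energy of phi(x) = pi^(-1/4) exp(-x^2/2) on a grid of spacing h."""
    n = int(round(box / h)) + 1
    x = np.linspace(-box / 2, box / 2, n)
    spacing = x[1] - x[0]
    phi = np.pi**-0.25 * np.exp(-0.5 * x**2)
    return integrate(-0.5 * phi * laplacian_1d(phi, spacing), spacing)

if __name__ == "__main__":
    for h in (0.4, 0.2, 0.1):
        print(h, kinetic_energy(h))
```

Reducing the spacing moves the result towards the continuum value, which is the systematic control of the discretization error mentioned above.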
The simple discretization and flexibility of real-space grids make them an ideal framework to implement, develop and test new ideas. Modern electronic structure codes are quite complex, which means that researchers can seldom write code from scratch, but instead need to resort to existing codes to implement their developments.
From the many codes available, in our experience the real-space code Octopus [22,34] provides an ideal framework for theory-development work. To illustrate this point, in this article we will explore some recent advances that have been made in computational electronic structure and that have been developed using the Octopus code as a base. We will pay special attention to the most unusual capabilities of the code, and in particular to the ones that have not been described in previous articles [22,34,35].

II. THE OCTOPUS CODE
Octopus was started around 2000 in the group of Professor Angel Rubio who, at that moment, was at the University of Valladolid, Spain. The first article using Octopus was published in 2001 [36]. Today, the code has grown to 200,000 lines of code. Octopus receives contributions from many developers from several countries and its results have been used for hundreds of scientific publications.
The original purpose of Octopus was to perform real-time time-dependent density functional theory (TDDFT) calculations, a method that had been recently proposed at the time for the calculation of excited-state properties in molecules [37]. Beyond this original feature, over time the code has become able to perform many types of calculations of ground-state and excited-state properties. These include most of the standard features of a modern electronic-structure package and some not-so-common capabilities.
Among the current capabilities of Octopus are an efficient real-time TDDFT implementation for both finite and periodic systems [38,39]. Some of the research presented in this article is based on that feature, such as the simulation of photoemission, quantum optimal control, and plasmonic systems. The code can also perform molecular-dynamics simulations in the Born-Oppenheimer and Ehrenfest approximations. It also implements a modified Ehrenfest approach for adiabatic molecular dynamics [40,41] that has favorable scaling for large systems. Octopus can perform linear-response TDDFT calculations in different frameworks; these implementations are discussed in sections III and V. For visualization, analysis and post-processing, Octopus can export quantities such as the density, orbitals, the current density, or the time-dependent electron localization function [42] to different formats, including the required DFT data to perform GW/Bethe-Salpeter calculations with the BerkeleyGW code [43].
Octopus is publicly and freely available under the GPL free/open-source license; this includes all the releases as well as the development version. The code is written using the principles of object-oriented programming, which makes it flexible and modular. It provides a full toolkit for code developers to perform the operations required for the implementation of new approaches for electronic-structure calculations.
In order to control the quality of the package, Octopus uses continuous integration tools. The code includes a set of tests that checks most of the functionality by verifying the calculation results. After a new change is committed to the main repository, a set of servers with different configurations compiles the code and runs a series of short tests. This setup quickly detects most of the problems in a commit, from syntax that a compiler will not accept, to unexpected changes in the results. Every night a more comprehensive set of tests is executed by these same servers. The test-suite framework is quite general and is also used successfully by the BerkeleyGW [43] and APE [44] codes.

III. THE STERNHEIMER FORMULATION OF LINEAR-RESPONSE
In textbooks, perturbation theory is formulated in terms of sums over states and response functions. These are useful theoretical constructions that permit a good description and understanding of the underlying physics. However, this is not always a good starting point for numerical applications, since it involves the calculation of a large number of eigenvectors, infinite sums over these eigenvectors, and functions that depend on two or more spatial variables.
An interesting approach that avoids the problems mentioned above is the formulation of perturbation theory in terms of differential equations for the variation of the wave-functions. In the literature, this is usually called the Sternheimer equation [45] or density functional perturbation theory (DFPT) [46]. Although a perturbative technique, it avoids the use of empty states, and has a favorable scaling with the number of atoms.
Octopus implements a generalized version of the Sternheimer equation that is able to cope with both static and dynamic response in and out of resonance [47]. The method is suited for linear and non-linear response; higher-order Sternheimer equations can be obtained for higher-order variations. For second-order response, however, we apply the 2n + 1 theorem (also known as Wigner's 2n + 1 rule) [48,49] to get the coefficients directly from first-order response variations.
In the Sternheimer formalism, we consider the response to a monochromatic perturbative field $\lambda\,\delta v(\mathbf{r})\cos(\omega t)$. This perturbation induces a variation in the time-dependent KS orbitals, which we denote $\delta\varphi_n(\mathbf{r},\omega)$. These variations allow us to calculate response observables, for example, the frequency-dependent polarizability.
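In order to calculate the variations of the orbitals we need to solve a linear equation that only depends on the occupied orbitals (atomic units are used throughout)
$$\left\{\hat{H} - \epsilon_n \pm \omega + i\eta\right\}\delta\varphi_n(\mathbf{r},\pm\omega) = -\hat{P}_c\,\delta\hat{H}(\pm\omega)\,\varphi_n(\mathbf{r}), \qquad (1)$$
where the variation of the time-dependent density, given by
$$\delta n(\mathbf{r},\omega) = \sum_n^{\rm occ}\left\{\left[\varphi_n(\mathbf{r})\right]^*\delta\varphi_n(\mathbf{r},\omega) + \left[\delta\varphi_n(\mathbf{r},-\omega)\right]^*\varphi_n(\mathbf{r})\right\}, \qquad (2)$$
needs to be calculated self-consistently. The first-order variation of the KS Hamiltonian is
$$\delta\hat{H}(\omega) = \delta\hat{v}(\mathbf{r}) + \int d\mathbf{r}'\,\frac{\delta n(\mathbf{r}',\omega)}{|\mathbf{r}-\mathbf{r}'|} + \int d\mathbf{r}'\,f_{xc}(\mathbf{r},\mathbf{r}',\omega)\,\delta n(\mathbf{r}',\omega). \qquad (3)$$
$\hat{P}_c$ is a projection operator, and $\eta$ a positive infinitesimal, essential to obtain the correct position of the poles of the causal response function, and consequently to obtain the imaginary part of the polarizability and remove the divergences of the equation for resonant frequencies. In the usual implementation of DFPT, $\hat{P}_c = 1 - \sum_n^{\rm occ}|\varphi_n\rangle\langle\varphi_n|$, which effectively removes the components of $\delta\varphi_n(\mathbf{r},\pm\omega)$ in the subspace of the occupied ground-state wave-functions. In linear response, these components do not contribute to the variation of the density.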
We have found that it is not strictly necessary to project out the occupied subspace; the crucial part is simply to remove the projection of $\delta\varphi_n$ on $\varphi_n$ (and any other states degenerate with it), which is not physically meaningful and arises from a phase convention. To fix the phase, it is sufficient to apply a minimal projector $\hat{P}_n = 1 - \sum_m^{\epsilon_m=\epsilon_n}|\varphi_m\rangle\langle\varphi_m|$. We optionally use this approach to obtain the entire response wavefunction, not just the projection in the unoccupied subspace, which is needed for obtaining effective masses in k · p theory. While the full projection can become time-consuming for large systems, it saves time overall since it improves the conditioning of the matrix for the linear solver, and thus reduces the number of solver iterations required to attain a given precision.
We have also implemented the Sternheimer formalism for non-integer occupations, as appropriate for metallic systems. In this case weighted projectors are added to both sides of eq. (1) [50]. We have generalized the equations to the dynamic case [51]. The modified Sternheimer equation is written in terms of the coefficients
$$\beta_{n,m} = \tilde{\theta}_{F,n}\tilde{\theta}_{n,m} + \tilde{\theta}_{F,m}\tilde{\theta}_{m,n} + \alpha_m\,\frac{\tilde{\theta}_{F,n} - \tilde{\theta}_{n,m}}{\epsilon_n - \epsilon_m \mp \omega}\,\tilde{\theta}_{m,n},$$
where $\sigma$ is the broadening energy and $\tilde{\theta}_{ij}$ is the smearing scheme's approximation to the Heaviside function $\theta\left((\epsilon_i - \epsilon_j)/\sigma\right)$. Apart from semiconducting smearing (i.e. the original equation above, which corresponds to the zero-temperature limit), the code offers the standard Fermi-Dirac [52], Methfessel-Paxton [53], spline [54], and cold [55] smearing schemes. Additionally, we have developed a scheme for handling arbitrary fractional occupations, which do not have to be defined by a function of the energy eigenvalues [51].
In order to solve eq. (1) we use a self-consistent iteration scheme similar to the one used for ground-state DFT. In each iteration we need to solve a sparse linear problem where the operator to invert is the shifted KS Hamiltonian. For real wavefunctions and a real shift (as for the static case), we can use conjugate gradients. When the shift is complex, a non-Hermitian iterative solver is required. We have found that a robust and efficient solver is the quasi-minimal residual (QMR) method [56].
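The structure of the linear problem can be illustrated with a minimal, non-self-consistent sketch: a single particle in a one-dimensional harmonic well with a static field perturbation $\delta v = x$, for which the exact static polarizability is 1 in atomic units. Everything in this example is an assumption made for illustration: the dense matrices, the 3-point Laplacian, the direct solver standing in for an iterative one, the term $|\varphi_0\rangle\langle\varphi_0|$ added to the operator to lift its singularity at zero frequency, and the expression $\alpha = -2\langle\varphi_0|x|\delta\varphi_0\rangle$ for a real static response.

```python
import numpy as np

def static_polarizability(n=401, box=20.0):
    """Static dipole polarizability of a 1D harmonic oscillator via a Sternheimer solve."""
    x = np.linspace(-box / 2, box / 2, n)
    h = x[1] - x[0]

    # Hamiltonian H = -1/2 d^2/dx^2 + x^2/2 with a 3-point Laplacian.
    off = -0.5 / h**2 * np.ones(n - 1)
    H = np.diag(1.0 / h**2 + 0.5 * x**2) + np.diag(off, 1) + np.diag(off, -1)

    eps, vecs = np.linalg.eigh(H)
    eps0, phi0 = eps[0], vecs[:, 0]      # ground state, normalized as a vector

    # Right-hand side: -P_c (delta v) phi0, with P_c = 1 - |phi0><phi0|.
    rhs = x * phi0
    rhs = -(rhs - phi0 * (phi0 @ rhs))

    # (H - eps0) is singular along phi0; adding |phi0><phi0| removes the
    # singularity without changing the solution orthogonal to phi0.
    A = H - eps0 * np.eye(n) + np.outer(phi0, phi0)
    dphi = np.linalg.solve(A, rhs)

    # Real static response: both +omega and -omega terms coincide.
    return -2.0 * (phi0 @ (x * dphi))

if __name__ == "__main__":
    print(static_polarizability())
```

In a production code the dense solve would be replaced by an iterative method acting on the sparse shifted Hamiltonian, and the right-hand side would be updated self-consistently through the density variation.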
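We can solve for the linear response to various perturbations. The most straightforward case is the response of a finite system to an electric field $\mathcal{E}_{i,\omega}$ with frequency $\omega$ in the direction $i$, where the perturbation operator is $\delta\hat{v} = \hat{r}_i$ [47]. In this case the polarizability can be calculated as
$$\alpha_{ij}(\omega) = -\sum_n^{\rm occ}\left[\langle\varphi_n|\hat{r}_i|\delta\varphi_{n,j}(\omega)\rangle + \langle\delta\varphi_{n,j}(-\omega)|\hat{r}_i|\varphi_n\rangle\right]. \qquad (7)$$
The calculation of the polarizability yields optical response properties (that can be extended to nonlinear response) [47,57] and, for imaginary frequencies, van der Waals coefficients [58].
It is also possible to use the formalism to compute vibrational properties for finite and periodic systems [46,59]. Currently Octopus implements the calculation of vibrations for finite systems. In this case the perturbation operator is an infinitesimal ionic displacement $\partial\hat{H}/\partial R_{i\alpha} = \partial\hat{v}_\alpha/\partial R_{i\alpha}$, for each direction $i$ and atom $\alpha$. The quantity to calculate is the dynamical matrix, or Hessian, given by
$$D_{i\alpha,j\beta} = \frac{\partial^2 E}{\partial R_{i\alpha}\,\partial R_{j\beta}} = D^{\rm ion\text{-}ion}_{i\alpha,j\beta} + \sum_n^{\rm occ}\left[\langle\varphi_n|\frac{\partial\hat{v}_\alpha}{\partial R_{i\alpha}}|\delta\varphi_{n,j\beta}\rangle + {\rm c.c.} + \delta_{\alpha\beta}\langle\varphi_n|\frac{\partial^2\hat{v}_\alpha}{\partial R_{i\alpha}\,\partial R_{j\alpha}}|\varphi_n\rangle\right]. \qquad (8)$$
The contribution from the ion-ion interaction energy is
$$D^{\rm ion\text{-}ion}_{i\alpha,j\beta} = Z_\alpha Z_\beta\left[\frac{\delta_{ij}}{|\mathbf{R}_\alpha-\mathbf{R}_\beta|^3} - 3\,\frac{(R_{i\alpha}-R_{i\beta})(R_{j\alpha}-R_{j\beta})}{|\mathbf{R}_\alpha-\mathbf{R}_\beta|^5}\right] \quad (\alpha\neq\beta), \qquad D^{\rm ion\text{-}ion}_{i\alpha,j\alpha} = -\sum_{\gamma\neq\alpha}D^{\rm ion\text{-}ion}_{i\alpha,j\gamma},$$
where $Z_\alpha$ is the ionic charge of atom $\alpha$. We have found that an alternative expression for the perturbation operator yields more accurate results when discretized. This is discussed in section VI. Vibrational frequencies $\omega$ are obtained by solving the eigenvalue equation
$$\sum_{j\beta}\frac{D_{i\alpha,j\beta}}{\sqrt{m_\alpha m_\beta}}\,x_{j\beta} = \omega^2 x_{i\alpha},$$
where $m_\alpha$ is the mass for ion $\alpha$. For a finite system of $N$ atoms, there should be 3 zero-frequency translational modes and 3 zero-frequency rotational modes. However, they may appear at positive or imaginary frequencies, due to the finite size of the simulation domain, the discretization of the grid, and finite precision in the solution of the ground state and Sternheimer equation. Improving convergence brings them closer to zero.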
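The last step, going from a dynamical matrix to vibrational frequencies, can be sketched as follows. The mass-weighted symmetric form of the eigenvalue problem and the convention of reporting imaginary frequencies as negative numbers are assumptions of this example, and the two-atom spring model merely stands in for a computed Hessian.

```python
import numpy as np

def vibrational_frequencies(hessian, masses):
    """Frequencies from a Hessian and one mass per degree of freedom.

    Negative eigenvalues of the mass-weighted matrix (imaginary frequencies)
    are returned as negative numbers.
    """
    m = np.asarray(masses, dtype=float)
    w = 1.0 / np.sqrt(m)
    dyn = np.asarray(hessian, dtype=float) * np.outer(w, w)
    vals = np.linalg.eigvalsh(dyn)
    return np.sign(vals) * np.sqrt(np.abs(vals))

if __name__ == "__main__":
    # Two atoms on a line joined by a spring of constant k: one translational
    # zero mode and one stretch with omega^2 = k (1/m1 + 1/m2).
    k = 1.5
    hessian = np.array([[k, -k], [-k, k]])
    print(vibrational_frequencies(hessian, [1.0, 3.0]))
```

In a grid calculation the zero mode would come out slightly away from zero for the reasons listed above, which makes its magnitude a convenient convergence indicator.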
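The Born effective charges can be computed from the response of the dipole moment to ionic displacement:
$$Z^*_{ij\alpha} = \frac{\partial\mu_i}{\partial R_{j\alpha}} = Z_\alpha\delta_{ij} - \sum_n^{\rm occ}\left[\langle\varphi_n|\hat{r}_i|\delta\varphi_{n,j\alpha}\rangle + {\rm c.c.}\right]. \qquad (11)$$
The intensities for each mode for absorption of radiation polarized in direction $i$, which can be used to predict infrared spectra, are calculated by multiplying by the normal-mode eigenvector $x$:
$$I_i \propto \Big(\sum_{j\alpha}Z^*_{ij\alpha}\,x_{j\alpha}\Big)^2.$$
The Born charges must obey the acoustic sum rule, from translational invariance,
$$\sum_\alpha Z^*_{ij\alpha} = Q_{\rm tot}\,\delta_{ij},$$
where $Q_{\rm tot}$ is the total charge of the system. For each $ij$, we enforce this sum rule by distributing the discrepancy equally among the $N$ atoms, thus obtaining corrected Born charges:
$$\tilde{Z}^*_{ij\alpha} = Z^*_{ij\alpha} + \frac{1}{N}\Big(Q_{\rm tot}\,\delta_{ij} - \sum_\beta Z^*_{ij\beta}\Big).$$
The discrepancy arises from the same causes as the nonzero translational and rotational modes.
The Sternheimer equation can be used in conjunction with k · p perturbation theory [60] to obtain band velocities and effective masses, as well as to apply electric fields via the quantum theory of polarization. In this case the perturbation is a displacement in the k-point. Using the effective Hamiltonian for the k-point,
$$\hat{H}_{\mathbf{k}} = e^{-i\mathbf{k}\cdot\mathbf{r}}\,\hat{H}\,e^{i\mathbf{k}\cdot\mathbf{r}},$$
the perturbation is represented by the operator $\partial\hat{H}_{\mathbf{k}}/\partial\mathbf{k}$, including the effect on the non-local pseudopotentials. The first-order term gives the band group velocities in a periodic system,
$$\mathbf{v}_{n\mathbf{k}} = \frac{\partial\epsilon_{n\mathbf{k}}}{\partial\mathbf{k}} = \langle\varphi_{n\mathbf{k}}|\frac{\partial\hat{H}_{\mathbf{k}}}{\partial\mathbf{k}}|\varphi_{n\mathbf{k}}\rangle.$$
Inverse effective mass tensors can be calculated by solving the Sternheimer equation for the k · p perturbation. The equation is not solved self-consistently, since the variation of k-point is not a physical perturbation to the system; a converged k-grid should give the same density even if displaced slightly. The perturbation $\partial\hat{H}_{\mathbf{k}}/\partial\mathbf{k}$ is purely anti-Hermitian. We use instead $-i\,\partial\hat{H}_{\mathbf{k}}/\partial\mathbf{k}$ to obtain a Hermitian perturbation, which allows the response to real wavefunctions to remain real. The effective mass tensors are then obtained from the second derivatives of the band energies, $\partial^2\epsilon_{n\mathbf{k}}/\partial k_i\,\partial k_j$, evaluated with the k · p response wavefunctions.
The k · p wavefunctions can be used to compute the response to electric fields in periodic systems. In finite systems, a homogeneous electric field can be represented simply via the position operator $\mathbf{r}$. However, this operator is not well defined in a periodic system and cannot be used.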
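The acoustic-sum-rule correction described above amounts to a few lines of array arithmetic. In this sketch the array layout `born[atom, i, j]` and the function name are hypothetical choices, and the sum-rule target is taken to be the total charge times the identity.

```python
import numpy as np

def enforce_acoustic_sum_rule(born, total_charge=0.0):
    """Spread the sum-rule discrepancy equally over all atoms.

    born[a, i, j] holds the Born charge tensor of atom a. After the
    correction, the sum over atoms equals total_charge times the identity.
    """
    born = np.asarray(born, dtype=float)
    n_atoms = born.shape[0]
    discrepancy = born.sum(axis=0) - total_charge * np.eye(3)
    return born - discrepancy / n_atoms

if __name__ == "__main__":
    raw = np.array([np.diag([1.02, 0.98, 1.01]), np.diag([-0.97, -1.01, -0.99])])
    fixed = enforce_acoustic_sum_rule(raw)
    print(fixed.sum(axis=0))
```

Because every atom receives the same shift, differences between the Born charges of different atoms are left unchanged.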
According to the quantum theory of polarization, the solution is to replace $\mathbf{r}\varphi$ with $-i\,\partial\varphi/\partial\mathbf{k}$ [61], and then use this as the perturbation on the right-hand side in the Sternheimer equation [62]. While this is typically done with a finite difference with respect to $\mathbf{k}$ [49,63], we use an analytic derivative from a previous k · p Sternheimer calculation. Applying the same replacement in eq. (7) gives the corresponding formula for the polarizability of the crystal. The polarizability is most usefully represented in a periodic system via the dielectric constant
$$\varepsilon_{ij} = \delta_{ij} + \frac{4\pi}{V}\,\alpha_{ij},$$
where $V$ is the volume of the unit cell. This scheme can also be extended to non-linear response.
We can compute the Born charges from the electric-field response in either finite or periodic systems (as a complementary approach to using the vibrational response):
$$Z^*_{ij\alpha} = Z_\alpha\delta_{ij} - \sum_n^{\rm occ}\left[\langle\varphi_n|\frac{\partial\hat{v}_\alpha}{\partial R_{j\alpha}}|\delta\varphi_{n,\mathcal{E}_i}\rangle + {\rm c.c.}\right],$$
where $\delta\varphi_{n,\mathcal{E}_i}$ is the response to an electric field in direction $i$. This expression can be evaluated with the same approach as for the dynamical matrix elements, and is easily generalized to non-zero frequency too. We can also make the previous expression, eq. (11), for Born charges from the vibrational perturbation usable in a periodic system with the replacement $\mathbf{r}\varphi \to -i\,\partial\varphi/\partial\mathbf{k}$.
Unfortunately, the k · p perturbation is not usable to calculate the polarization [61], and a sum over strings of k-points on a finer grid is required. We have implemented the special case of a Γ-point calculation for a large supercell, where the single-point Berry phase can be used [64]. For cell sizes $L_i$ in each direction, the dipole moment is derived from the determinant of a matrix whose basis is the occupied KS orbitals.
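A minimal sketch of the single-point Berry-phase idea in one dimension: the summed position of the occupied orbitals in a periodic cell of length $L$ is recovered, modulo $L$, from the phase of the determinant of the matrix $\langle\varphi_n|e^{2\pi i x/L}|\varphi_m\rangle$. The sign and phase conventions, the function name, and the Gaussian test orbitals are assumptions of this example and are not necessarily those used in Octopus; the electronic dipole follows by multiplying by the electron charge.

```python
import numpy as np

def berry_phase_position(orbitals, x, length):
    """Summed position of the orbitals (modulo length) from a single-point Berry phase.

    orbitals has shape (n_orbitals, n_points) and lives on the uniform
    periodic grid x covering [0, length).
    """
    h = x[1] - x[0]
    phase = np.exp(2j * np.pi * x / length)
    # u[n, m] = <phi_n| exp(2 pi i x / L) |phi_m>
    u = (orbitals.conj() * phase) @ orbitals.T * h
    return length / (2.0 * np.pi) * np.angle(np.linalg.det(u))

if __name__ == "__main__":
    L = 10.0
    x = np.linspace(0.0, L, 400, endpoint=False)
    centers = (2.0, 2.5)
    orbs = np.array([np.exp(-(x - c) ** 2 / (4 * 0.3**2)) for c in centers])
    print(berry_phase_position(orbs, x, L))  # close to 2.0 + 2.5
```

The result is only defined modulo $L$, which is why the test orbitals are placed so that their summed position stays below $L/2$.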

IV. MAGNETIC RESPONSE AND GAUGE INVARIANCE IN REAL-SPACE GRIDS
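In the presence of a magnetic field $\mathbf{B}(\mathbf{r},t)$, generated by a vector potential $\mathbf{A}(\mathbf{r},t)$, additional terms describing the coupling of the electrons to the magnetic field must be included in the Hamiltonian,
$$\delta\hat{H} = \frac{1}{2c}\left(\hat{\mathbf{p}}\cdot\mathbf{A} + \mathbf{A}\cdot\hat{\mathbf{p}}\right) + \frac{1}{2c^2}\mathbf{A}^2 + \frac{1}{c}\,\mathbf{B}\cdot\hat{\mathbf{S}}.$$
The first part describes the orbital interaction with the field, and the second one is the Zeeman term that represents the coupling of the electronic spin with the magnetic field. As our main interest is the evaluation of the magnetic susceptibility, in the following we consider a perturbative uniform static magnetic field $\mathbf{B}$ applied to a finite system with zero total spin. In the Coulomb gauge the corresponding vector potential, $\mathbf{A}$, is given as
$$\mathbf{A}(\mathbf{r}) = \frac{1}{2}\,\mathbf{B}\times\mathbf{r}.$$
In orders of $\mathbf{B}$ the perturbing potentials are
$$\delta\hat{v}^{\rm mag}_i = \frac{1}{2c}\,\hat{L}_i, \qquad (25)$$
with $\hat{\mathbf{L}}$ the angular momentum operator, and a second-order operator $\delta^2\hat{v}^{\rm mag}_{ij}$ that arises from the term quadratic in $\mathbf{A}$. The induced magnetic moment can be expanded in terms of the external magnetic field which, to first order, reads
$$m_i = m^0_i + \sum_j\chi_{ij}B_j,$$
where $\chi$ is the magnetic susceptibility tensor. For finite systems the permanent magnetic moment can be calculated directly from the ground-state wave-functions as $\mathbf{m}^0 = \sum_n\langle\varphi_n|\delta\hat{\mathbf{v}}^{\rm mag}|\varphi_n\rangle$.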
For the susceptibility, we need to calculate the first-order response functions in the presence of a magnetic field. This can be done in practice by using the magnetic perturbation, eq. (25), in the Sternheimer formalism described in section III. If the system is time-reversal symmetric, since the perturbation is anti-symmetric under time-reversal (anti-Hermitian), it does not induce a change in the density and the Sternheimer equation does not need to be solved self-consistently. From there we find
$$\chi_{ij} = \sum_n\left[\langle\varphi_n|\delta\hat{v}^{\rm mag}_j|\delta\varphi_{n,i}\rangle + {\rm c.c.} + \langle\varphi_n|\delta^2\hat{v}^{\rm mag}_{ij}|\varphi_n\rangle\right].$$
Before applying this formalism in a calculation, however, we must make sure that our calculation is gauge invariant.
In numerical implementations, the gauge freedom in choosing the vector potential might lead to poor convergence with the quality of the discretization, and to a dependence of the magnetic response on the origin of the simulation cell. In other words, an arbitrary translation of the molecule could introduce a nonphysical change in the calculated observables. This broken gauge-invariance is well known in molecular calculations with all-electron methods that make use of localized basis sets. In this case, the error can be traced to the finite-basis-set representation of the wave-functions [65,66]. A simple measure of the error is to check for the fulfillment of the hyper-virial relation [67]
$$i\langle\varphi_j|\hat{\mathbf{p}}|\varphi_n\rangle = (\epsilon_n - \epsilon_j)\langle\varphi_j|\hat{\mathbf{r}}|\varphi_n\rangle, \qquad (30)$$
where $\epsilon_n$ is the eigenvalue of the state $\varphi_n$. When working with a real-space mesh, this problem also appears, though it is milder, because the standard operator representation in the grid is not gauge-invariant. In this case the error can be controlled by reducing the spacing of the mesh. On the other hand, real-space grids usually require the use of the pseudo-potential approximation, where the electron-ion interaction is described by a non-local potential $\hat{v}_{\rm nl}$. This, or any other non-local potential, introduces a fundamental problem when describing the interaction with magnetic fields or vector potentials in general. To preserve gauge invariance, this term must be adequately coupled to the external electromagnetic field, otherwise the results will strongly depend on the origin of the gauge. For example, an extra term has to be included in the hyper-virial expression, eq. (30), resulting in
$$i\langle\varphi_j|\hat{\mathbf{p}}|\varphi_n\rangle = (\epsilon_n - \epsilon_j)\langle\varphi_j|\hat{\mathbf{r}}|\varphi_n\rangle + \langle\varphi_j|[\hat{\mathbf{r}},\hat{v}_{\rm nl}]|\varphi_n\rangle. \qquad (31)$$
In general, the gauge-invariant non-local potential is obtained by multiplying the kernel $v_{\rm nl}(\mathbf{r},\mathbf{r}')$ by a phase factor that contains the line integral of the vector potential between the two points. The integration path can be any one that connects the two points $\mathbf{r}$ and $\mathbf{r}'$, so an infinite number of choices is possible.
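For a purely local potential, the hyper-virial check can be sketched on a one-dimensional grid by comparing $i\langle\varphi_j|\hat{p}|\varphi_n\rangle = \langle\varphi_j|\partial_x|\varphi_n\rangle$ with $(\epsilon_n-\epsilon_j)\langle\varphi_j|x|\varphi_n\rangle$. The harmonic potential, the 3-point Laplacian, the central-difference gradient, and the function name are assumptions of this example.

```python
import numpy as np

def hypervirial_sides(j=0, n=1, npts=401, box=20.0):
    """Both sides of the hyper-virial relation for a 1D harmonic oscillator on a grid."""
    x = np.linspace(-box / 2, box / 2, npts)
    h = x[1] - x[0]
    off = np.ones(npts - 1)
    lap = (np.diag(-2.0 * np.ones(npts)) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    grad = (np.diag(off, 1) - np.diag(off, -1)) / (2.0 * h)

    H = -0.5 * lap + np.diag(0.5 * x**2)
    eps, vecs = np.linalg.eigh(H)
    phi_j, phi_n = vecs[:, j], vecs[:, n]

    lhs = phi_j @ (grad @ phi_n)                      # i <j|p|n> with p = -i d/dx
    rhs = (eps[n] - eps[j]) * (phi_j @ (x * phi_n))   # (eps_n - eps_j) <j|x|n>
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = hypervirial_sides()
    print(lhs, rhs, abs(lhs - rhs))
```

With this particular pairing of stencils the commutator of $x$ with the discrete kinetic operator reproduces the central-difference gradient exactly, so the residual is at rounding level. That is a property of the matched stencils rather than a general statement, and the check becomes informative once mismatched stencils or non-local terms are involved.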
In order to calculate the corrections required to the magnetic perturbation operators, we use two different integration paths that have been suggested in the literature. The first was proposed by Ismail-Beigi, Chang, and Louie (ICL) [68], who give a correction to the first-order magnetic perturbation term and a similar term for the second-order perturbation. Using a different integration path, Pickard and Mauri [69] proposed the GIPAW method, whose correction is written in terms of $\mathbf{R}_\alpha$ and $\hat{v}^{\rm nl}_\alpha$, respectively the position and non-local potential of atom $\alpha$. With the inclusion of either one of these methods, both implemented in Octopus, we recover gauge invariance in our formalism when pseudopotentials are used. This allows us to predict the magnetic susceptibility and other properties that depend on magnetic observables, like optical activity [70].
TABLE I. Calculated magnetic susceptibilities (χ in cgs ppm/mol) per number of boron atoms for the selected boron clusters shown in Fig. 1. Results from Ref. [73].
A class of systems with interesting magnetic susceptibilities are fullerenes. For example, it is known that the C$_{60}$ fullerene has a very small magnetic susceptibility due to the cancellation of the paramagnetic and diamagnetic responses [71,72]. Botti et al. [73] used the real-space implementation of Octopus to study the magnetic response of the boron fullerenes depicted in Fig. 1. As shown in Table I, they found that, while most clusters are diamagnetic, B$_{80}$ is paramagnetic, with a strong cancellation of the paramagnetic and diamagnetic terms.

V. LINEAR RESPONSE IN THE ELECTRON-HOLE BASIS
An alternate approach to linear response is not to solve for the response function but rather for its poles (the excitation energies $\omega_k$) and residues (e.g. electric dipole matrix elements $\mathbf{d}_k$) [74]. The polarizability is given by
$$\alpha_{ij}(\omega) = \sum_k\left[\frac{(d_k)_i^*(d_k)_j}{\omega_k - \omega - i\delta} + \frac{(d_k)_j^*(d_k)_i}{\omega_k + \omega + i\delta}\right],$$
with $\delta$ a positive infinitesimal, and the absorption cross-section is
$$\sigma_{ij}(\omega) = 4\pi\tilde{\alpha}\,\omega\,{\rm Im}\,\alpha_{ij}(\omega),$$
where $\tilde{\alpha}$ is the fine-structure constant. The simplest approximation to use is the random-phase approximation (RPA), in which the excitation energies are given by the differences of unoccupied and occupied KS eigenvalues, $\omega_{cv} = \epsilon_c - \epsilon_v$. The corresponding dipole matrix elements are $\mathbf{d}_{cv} = \langle\varphi_c|\mathbf{r}|\varphi_v\rangle$ [75]. (As implemented in the code, this section will refer only to the case of a system without partially occupied levels.)
The RPA is not a very satisfactory approximation, however. The full solution within TDDFT is given by a non-Hermitian matrix eigenvalue equation, with a basis consisting of both occupied-unoccupied ($v \to c$) and unoccupied-occupied ($c \to v$) KS transitions. In this equation the A matrices couple $v \to c$ transitions among themselves and $c \to v$ among themselves, while the B matrices couple the two types of transitions. They are built from matrix elements of the Coulomb kernel $\hat{v}_c$ and the exchange-correlation kernel $\hat{f}_{xc}$ [75] (the latter currently only supported for LDA-type functionals in Octopus). We do not solve the full equation in Octopus, but provide a hierarchy of approximations. An example calculation for the N$_2$ molecule with each theory level is shown in Table II.
The lowest approximation we use is RPA. The next is the single-pole approximation of Petersilka et al. [76], in which only the diagonal elements of the matrix are considered. As in the RPA case, the eigenvectors and dipole matrix elements are simply the KS transitions. The positive eigenvalues are $\omega_{cv} = \epsilon_c - \epsilon_v + A_{cvcv}$. This can be a reasonable approximation when there is little mixing between KS transitions, but generally fails when there are degenerate or nearly degenerate transitions.
A next level of approximation is the Tamm-Dancoff approximation to TDDFT [77], in which the B blocks are neglected and thus we need only consider the occupied-unoccupied transitions. The matrix equation is reduced to a Hermitian problem of half the size of the full problem. Interestingly, the Tamm-Dancoff approximation is often found to give superior results to the full solution, for example for molecular potential-energy surfaces or when hybrid functionals are used, which can suffer from a "triplet instability" in which the lowest triplet state is lower in energy than the ground state [78]. The dipole matrix elements are now a superposition of the KS ones, weighted by the components of the eigenvectors. When the wavefunctions are real, the full problem can be collapsed into a Hermitian one of the same size as the Tamm-Dancoff matrix, known as Casida's equation [79,80].
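The hierarchy of approximations can be sketched for a small model in which the KS eigenvalue differences and a real symmetric coupling matrix $K$ (Coulomb plus xc kernel matrix elements) are given. The example assumes the textbook forms for real orbitals: diagonal shifts for Petersilka, the matrix ${\rm diag}(\Delta) + K$ for Tamm-Dancoff, and ${\rm diag}(\Delta^2) + 2\Delta^{1/2}K\Delta^{1/2}$, whose eigenvalues are $\omega^2$, for Casida. Spin factors and the function name are assumptions of this illustration and are not taken from the Octopus source.

```python
import numpy as np

def excitation_energies(delta, coupling):
    """Excitation energies at several levels of theory for a model kernel.

    delta    : KS eigenvalue differences eps_c - eps_v, shape (n,)
    coupling : real symmetric matrix of kernel matrix elements, shape (n, n)
    """
    delta = np.asarray(delta, dtype=float)
    K = np.asarray(coupling, dtype=float)
    sq = np.sqrt(delta)
    casida = np.diag(delta**2) + 2.0 * np.outer(sq, sq) * K
    return {
        "eigenvalue_differences": np.sort(delta),
        "petersilka": np.sort(delta + np.diag(K)),
        "tamm_dancoff": np.linalg.eigvalsh(np.diag(delta) + K),
        "casida": np.sqrt(np.linalg.eigvalsh(casida)),
    }

if __name__ == "__main__":
    # Two nearly degenerate transitions with a sizeable off-diagonal coupling.
    result = excitation_energies([1.00, 1.02], [[0.05, 0.04], [0.04, 0.05]])
    for level, energies in result.items():
        print(level, energies)
```

All levels reuse the same `coupling` matrix, so once the kernel matrix elements are available the additional cost of each level is a small diagonalization.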
The dipole matrix elements are again obtained from the eigenvectors. An alternate approach for finding excitation energies is to look for many-body eigenstates of the DFT Hamiltonian which are orthogonal to the ground state. In the "second-order constrained variational" or CV(2) theory [81], second-order perturbation theory from the ground-state density yields equations quite similar to the linear-response approach, despite their different origin. We implement the case of real wavefunctions and eigenvectors, in which case (as for Casida's equation) a Hermitian matrix equation for only the occupied-unoccupied transitions can be written. The Tamm-Dancoff approximation to these equations is identical to the ordinary TDDFT Tamm-Dancoff approximation.
Note that all the levels of theory we have discussed use the same Coulomb and $\hat{f}_{xc}$ matrix elements, so the code can calculate the results for multiple levels of theory with a small extra effort. We can also consider alternative perturbations in this framework beyond the dipole approximation for properties such as inelastic X-ray scattering [82].
TABLE II. The first 6 excitation energies (in eV) for the N$_2$ molecule with different approximations to TDDFT in the electron-hole basis: the random phase approximation (RPA), Petersilka, Tamm-Dancoff approximation (TDA), Casida and CV(2). The VWN LDA parametrization [83] was used for the exchange-correlation functional, the bond length is 1.098 Å, the real-space grid was a sphere of radius 7.4 Å with spacing 0.16 Å, and 16 unoccupied states were used. The experimental data is from Ref. [84].
For a non-spin-polarized system, the excitations separate into a singlet and a triplet subspace, which are superpositions of singlet and triplet KS transitions:
$$|cv\rangle_{\rm singlet} = \frac{1}{\sqrt{2}}\left(|c{\uparrow}\,v{\uparrow}\rangle + |c{\downarrow}\,v{\downarrow}\rangle\right), \qquad |cv\rangle_{\rm triplet} = \frac{1}{\sqrt{2}}\left(|c{\uparrow}\,v{\uparrow}\rangle - |c{\downarrow}\,v{\downarrow}\rangle\right).$$
The signs are reversed from the situation for a simple pair of electrons, since we are instead dealing with an electron and a hole. There are of course two other triplet excitations ($m = \pm 1$) which are degenerate with the $m = 0$ one above. Rather than performing spin-polarized ground-state and linear-response calculations, we can use the symmetry between the spins in a non-spin-polarized system to derive a form of the kernel to use in obtaining singlet and triplet excitations [75]. These kernels can be used in any of the levels of theory above: RPA, Petersilka, Tamm-Dancoff, Casida, and CV(2). The corresponding electric dipole matrix elements are as in the spin-polarized case for singlet excitations. For triplet excitations, they are identically zero, and only higher-order electromagnetic processes can excite them.
There are three main steps in the calculation: calculation of the matrix, diagonalization of the matrix, and calculation of the dipole matrix elements. The first step generally takes almost all the computation time, and is the most important to optimize. Within that step, the Coulomb part (since it is non-local) is much more time-consuming than the $\hat{f}_{xc}$ part. We calculate it by solving the Poisson equation (as for the Hartree potential) for each column of the matrix, to obtain a potential $P$ for the density $\varphi_c(\mathbf{r})^*\varphi_v(\mathbf{r})$, and then for each row computing the matrix element as the integral of $P$ with the product of orbitals of that row's transition.
Our basic parallelization strategy for computation of the matrix elements is by domains, as discussed in section XV, but we add an additional level of parallelization here over occupied-unoccupied pairs. We distribute the columns of the matrix, and do not distribute the rows, to avoid duplication of Poisson solves. We can reduce the number of matrix elements to be computed by almost half using the Hermitian nature of the matrix, i.e. $M_{cv,c'v'} = M^*_{c'v',cv}$. The columns are then assigned to the available processors in a round-robin fashion. The diagonalization step is performed by direct diagonalization with LAPACK [85] in serial; since it generally accounts for only a small part of the computation time, parallelization of this step is not very important. The final step is calculation of the dipole matrix elements, which amounts to only a small part of the computation time, and uses only domain parallelization. Note that the triplet kernel lacks the Coulomb term, and so is considerably faster to compute.
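One way to realize such a distribution can be sketched as follows. Each column computes its diagonal element plus the rows that follow it cyclically, so that every pair of transitions is computed exactly once and the other half of the matrix follows by Hermiticity. The specific cyclic rule and the function names are hypothetical and meant only to illustrate the idea, not to reproduce the scheme used in the code.

```python
def column_assignments(n):
    """Map each column j to the rows i whose element (i, j) it computes.

    Every unordered pair {i, j} (and every diagonal element) appears once.
    """
    cols = {j: [j] for j in range(n)}              # diagonal elements
    for j in range(n):
        for d in range(1, (n - 1) // 2 + 1):       # rows following j cyclically
            cols[j].append((j + d) % n)
    if n % 2 == 0:                                 # pairs at distance n/2: give
        for j in range(n // 2):                    # them to the first half only
            cols[j].append(j + n // 2)
    return cols

def round_robin(n_columns, n_groups):
    """Group index that owns each column."""
    return [j % n_groups for j in range(n_columns)]

if __name__ == "__main__":
    for n in (5, 6):
        work = column_assignments(n)
        print(n, [len(rows) for rows in work.values()], round_robin(n, 3))
```

For an odd matrix size every column ends up with the same number of elements, while for an even size half of the columns carry one extra element.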
Using the result of a calculation of excited states by one of these methods, and a previous calculation of vibrational modes with the Sternheimer equation, we can compute forces in each excited state, which can be used for excited-state structural relaxation or molecular dynamics [86]. Our formulation allows us to do this without introducing any extra summations over empty states, unlike previous force implementations [87][88][89]. The energy of a given excited state $k$ is a sum of the ground-state energy and the excitation energy:
$$E_k = E_0 + \omega_k.$$
The force is then given by the ground-state force, minus the derivative of the excitation energy:
$$F^k_{i\alpha} = -\frac{\partial E_k}{\partial R_{i\alpha}} = F^0_{i\alpha} - \frac{\partial\omega_k}{\partial R_{i\alpha}}.$$
Using the Hellmann-Feynman theorem we find the last term without introducing any additional sums over unoccupied states. In the particular case of the Tamm-Dancoff approximation, the derivative of the excitation energy is the expectation value, in the corresponding eigenvector, of the derivative of the matrix. Analogous equations apply for the difference of eigenvalues, Petersilka, and CV(2) theory levels. (The slightly more complicated Casida case has not yet been implemented.) The Coulomb term, with no explicit dependence on the atomic positions, does not appear, leading to a significant savings in computational time compared to the calculation of the excited states.

FIG. 2.
Distribution of matrix elements to be calculated among the columns, using Hermiticity of the response matrix. The columns are then distributed among the available MPI groups for electron-hole parallelization. The number of matrix elements to be calculated per column is equal for an odd size, and uneven for an even size.

VI. FORCES AND GEOMETRY OPTIMIZATION ON REAL-SPACE GRIDS
A function represented on a real-space grid is not invariant under translations as one would expect from a physical system. The potential of an atom sitting on top of a grid point might be slightly different from the potential of the same atom located between points. This implies that a rigid displacement of the system produces an artificial variation of the energy and other properties. If we plot the energy of the atom as a function of this rigid displacement, the energy shows an oscillation that gives this phenomenon the name of the "egg-box effect".
The egg-box effect is particularly problematic for calculations where the atoms are allowed to move, for example to study the dynamics of the atoms (molecular dynamics) or to find the minimum energy configuration (geometry optimization).
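The origin of the egg-box oscillation can be reproduced with a toy model. The following Python sketch is an illustration only: it integrates a Gaussian (a stand-in for a localized potential or density; the width `sigma`, the grid extent and the function names are assumptions, not Octopus quantities) on a uniform grid and records how the result changes as the Gaussian is rigidly displaced.

```python
import numpy as np


def grid_integral(shift, spacing, sigma=0.3, half_width=10.0):
    """Riemann-sum integral of a Gaussian centred at `shift` on a uniform grid."""
    n = int(round(half_width / spacing))
    x = spacing * np.arange(-n, n + 1)          # x = 0 is a grid point
    return spacing * np.sum(np.exp(-(x - shift) ** 2 / (2.0 * sigma ** 2)))


def eggbox_amplitude(spacing, samples=21, **kwargs):
    """Peak-to-peak variation of the integral over one grid period."""
    shifts = np.linspace(0.0, spacing, samples)
    values = np.array([grid_integral(s, spacing, **kwargs) for s in shifts])
    return values.max() - values.min()


if __name__ == "__main__":
    for h in (0.5, 0.25):
        print(f"spacing {h}: egg-box amplitude {eggbox_amplitude(h):.3e}")
```

The discretized integral is periodic in the displacement with the period of the grid spacing, and the oscillation shrinks rapidly once the spacing resolves the Gaussian. This is the same mechanism that filtering the pseudopotential exploits.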
In Octopus we have studied several schemes to control the egg-box effect [90]. The first step is to use pseudopotential filtering to eliminate Fourier components of the potential that cannot be represented on the grid [91].
Additionally, we have found a formulation for the forces that reduces the spurious effect of the grid on the calculations. One term in the forces is the expectation value of the derivative of the ionic potential with respect to the ionic position $R_\alpha$, which can be evaluated as
$$\sum_n \left\langle \varphi_n \left| \frac{\partial v_\alpha(r - R_\alpha)}{\partial R_\alpha} \right| \varphi_n \right\rangle . \qquad (54)$$
(For simplicity, we consider only local potentials here, but the results are valid for non-local potentials as well.) Since $v_\alpha$ depends only on $r - R_\alpha$, an integration by parts allows this term to be rewritten such that it does not include the derivative of the ionic potential $v_\alpha$, but the gradient of the orbitals with respect to the electronic coordinates [92]:
$$\sum_n \left[ \left\langle \frac{\partial \varphi_n}{\partial r} \right| v_\alpha(r - R_\alpha) \left| \varphi_n \right\rangle + {\rm c.c.} \right] . \qquad (55)$$
The first advantage of this formulation is that it is easier to implement than eq. (54), as it does not require the derivatives of the potential, which can be quite complex and difficult to code, especially when relativistic corrections are included. However, the main benefit of using eq. (55) is that it is more precise when discretized on a grid, as the orbitals are smoother than the ionic potential. We illustrate this point in Fig. 3, where the forces obtained with the two methods are compared. While taking the derivative of the atomic potential gives forces with a considerable oscillation due to the grid, using the derivative of the orbitals gives a force that is considerably smoother.
This alternative formulation of the forces can be extended to obtain the second-order derivatives of the energy with respect to the atomic displacements [90], which are required to calculate vibrational properties as discussed in section III. In general, the perturbation operator associated with an ionic displacement can be written as a commutator with the gradient,
$$\frac{\partial v_\alpha(r - R_\alpha)}{\partial R_{i\alpha}} = -\left[ \frac{\partial}{\partial r_i},\, v_\alpha(r - R_\alpha) \right] .$$
Using this expression, the terms of the dynamical matrix, eq. (8), are evaluated in terms of orbital gradients rather than derivatives of the potential. With our approach, the forces tend to converge faster with the grid spacing than the energy.
This means that to perform geometry optimizations it would be ideal to have a local minimization method that relies only on the forces, without needing to evaluate the energy, as the two quantities will not be entirely consistent. Such a method is the fast inertial relaxation engine (FIRE) algorithm, put forward by Bitzek et al. [93]. FIRE has shown competitive performance compared with both the standard conjugate-gradient method and more sophisticated variations typically used in ab initio calculations. A recent article also identifies FIRE as one of the most convenient algorithms, owing to its speed and precision in reaching the nearest local minimum from a given initial configuration [94].
The FIRE algorithm is based on molecular dynamics with additional velocity modifications and adaptive time steps, and requires only first derivatives of the target function. In the FIRE algorithm, the system slides down the potential-energy surface, gathering "momentum" until the direction of the gradient changes, at which point it stops, resets the adaptive parameters, and resumes sliding. This gain of momentum is achieved by treating the time step $\Delta t$ as an adaptive parameter, and by introducing the following velocity modification
$$V(t) = (1 - \alpha)\, v(t) + \alpha\, |v(t)|\, \hat{F}(t) ,$$
where $v$ is the velocity of the atoms, $\alpha$ is an adaptive parameter, and $\hat{F}$ is a unit vector in the direction of the force $F$. With this velocity modification, the acceleration of the atoms is given by
$$\dot{v}(t) = \frac{F(t)}{m} - \frac{\alpha}{\Delta t}\, |v(t)| \left[ \hat{v}(t) - \hat{F}(t) \right] ,$$
where the second term is an introduced acceleration in a direction "steeper" than the usual direction of motion. If $\alpha = 0$ then $V(t) = v(t)$, meaning the velocity modification vanishes, and the acceleration reduces to $\dot{v}(t) = F(t)/m$, as usual.
We illustrate how the algorithm works with a simple case: the geometry optimization of a methane molecule. The input geometry consists of one carbon atom at the center of a tetrahedron, and four hydrogen atoms at the vertices, where the initial C-H distance is 1.2 Å. In Fig. 4 we plot the energy difference $\Delta E_{\rm tot}$ with respect to the equilibrium conformation, the maximum component of the force acting on the ions $F_{\rm max}$, and the C-H bond length. In the first iterations, the geometry approaches the equilibrium position, but moves away on the third. This signals a change in the direction of the gradient, so there is no movement in the fourth iteration, the adaptive parameters are reset, and sliding resumes in the fifth iteration.
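The FIRE procedure described in this section (velocity mixing towards the force direction, adaptive time step, and a reset whenever the motion turns uphill) can be sketched in a few lines of Python. This is a minimal illustration, not the Octopus routine: the function name, the simple Euler-type integrator, the default parameters (the values commonly quoted for FIRE) and the quadratic test surface are all assumptions made for the example. Only forces are evaluated, never the energy.

```python
import numpy as np


def fire_minimize(force, x0, dt=0.1, dt_max=0.5, n_min=5, f_inc=1.1,
                  f_dec=0.5, alpha_start=0.1, f_alpha=0.99,
                  f_tol=1e-6, max_steps=10000, mass=1.0):
    """Relax x using only force(x); returns (x, number_of_steps)."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    alpha = alpha_start
    steps_downhill = 0
    for step in range(max_steps):
        f = force(x)
        if np.max(np.abs(f)) < f_tol:
            return x, step
        if np.vdot(f, v) < 0.0:
            # Moving uphill: stop, shrink the time step, reset the mixing.
            v[:] = 0.0
            dt *= f_dec
            alpha = alpha_start
            steps_downhill = 0
        else:
            # V = (1 - alpha) v + alpha |v| F/|F|
            v = (1.0 - alpha) * v + alpha * np.linalg.norm(v) * f / np.linalg.norm(f)
            steps_downhill += 1
            if steps_downhill > n_min:
                dt = min(dt * f_inc, dt_max)
                alpha *= f_alpha
        # Plain molecular-dynamics step (semi-implicit Euler).
        v = v + dt * f / mass
        x = x + dt * v
    return x, max_steps


if __name__ == "__main__":
    k = np.array([1.0, 4.0])             # hypothetical anisotropic force constants
    x_min = np.array([0.5, -0.3])        # hypothetical minimum
    x, steps = fire_minimize(lambda x: -k * (x - x_min), [1.0, 1.0])
    print(x, steps)
```

Because the stopping criterion is the largest force component, the sketch stays consistent even when energies and forces are not exactly compatible on the grid, which is the situation described above.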

VII. PHOTOEMISSION
Electron photoemission embraces all the processes where an atom, a molecule or a surface is ionized under the effect of an external electromagnetic field. In experiments, the ejected electrons are measured with detectors that are capable of characterizing their kinetic properties. Energy-resolved, P (E), and momentum-resolved, P (k), photoemission probabilities are quite interesting observables since they carry important information, for instance, on the parent ion [95,96] or on the ionization process itself [97]. The calculation of these quantities is a difficult task because the process requires the evaluation of the total wavefunction in an extremely large portion of space (in principle a macroscopic one) that would be impractical to represent in real space.
We have developed a scheme to calculate photoemission based on real-time TDDFT that is currently implemented in Octopus. We use a mixed real- and momentum-space approach. Each KS orbital is propagated in real space on a restricted simulation box, and then matched at the boundary with a momentum-space representation.
The matching is made with the help of a mask function $M(r)$, like the one shown in Fig. 5, that separates each orbital into a bounded component $\varphi^A_i(r)$ and an unbounded component $\varphi^B_i(r)$ as follows:
$$\varphi_i(r) = M(r)\,\varphi_i(r) + [1 - M(r)]\,\varphi_i(r) \equiv \varphi^A_i(r) + \varphi^B_i(r) . \qquad (61)$$
Starting from a set of orbitals localized in $A$ at $t = 0$, it is possible to derive a time-propagation scheme with time step $\Delta t$ by recursively applying the discrete time-evolution operator $\hat{U}(\Delta t) \equiv \hat{U}(t + \Delta t, t)$ and splitting the components with eq. (61). The result can be written in closed form for $\varphi^A_i(r, t)$, represented in real space, and $\varphi^B_i(k, t)$, in momentum space, as the coupled update equations (62) and (63). The momentum-resolved photoelectron probability is then obtained directly from the momentum components as [98]
$$P(k) = \lim_{t \to \infty} \sum_i \left| \varphi^B_i(k, t) \right|^2 ,$$
while the energy-resolved probability follows by direct integration,
$$P(E) = \int_{E = |k|^2/2} dk\, P(k) .$$
In eq. (63) we introduced the Volkov propagator $U_v(\Delta t)$ for the wavefunctions in $B$. It is the time-evolution operator associated with the Hamiltonian $\hat{H}_v$ describing free electrons in an oscillating field. Given a time-dependent vector field $A(t)$, the Hamiltonian expressed in the velocity gauge is diagonal in momentum and can be naturally applied to $\varphi^B_i(k, t)$. For all systems that can be described by a Hamiltonian such that $\hat{H}(r, t) = \hat{H}_v(r, t)$ for $r \in B$ and all times $t$, eqs. (62) and (63) are equivalent to a time propagation in the entire space $A \cup B$. In particular, they exactly describe situations where the electrons follow trajectories crossing the boundary separating $A$ and $B$, as illustrated in Fig. 5(b).
In Octopus we discretize eq. (63) in real and momentum space and co-propagate the complete set of orbitals $\varphi^A_i(r, t)$ and $\varphi^B_i(k, t)$. The propagation has to take care of additional details since the discretization can introduce numerical instability. In fact, substituting the Fourier integrals in (63) with Fourier sums (usually evaluated with FFTs) imposes periodic boundary conditions that spuriously reintroduce charge that was supposed to disappear. This is illustrated with a one-dimensional example in Fig. 6(a), where a wavepacket launched towards the left edge of the simulation box reappears from the other edge.
An alternative discretization strategy is zero padding. This is done by embedding the system into a simulation box enlarged by a factor α > 1, extending the orbitals with zeros in the outer region as shown in Fig. 6(b). In this way, the periodic boundaries are pushed away from the simulation box and the wavepackets have to travel an additional distance 2(α − 1)L before reappearing from the other side. In doing so, the computational cost is increased by adding (α − 1)n points for each orbital.
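The effect of the periodic images, and of uniform zero padding, can be reproduced with a free one-dimensional wavepacket. The sketch below is illustrative only: it uses plain NumPy FFTs, atomic units, no potential and no mask function, and every name and parameter value (`norm_inside_box`, `pad`, the box length, the packet momentum) is an assumption made for the example. It does not cover the NFFT variant.

```python
import numpy as np


def free_propagate(psi, dx, t):
    """Exact free-particle propagation on a periodic grid via FFT."""
    k = 2.0 * np.pi * np.fft.fftfreq(psi.size, d=dx)
    return np.fft.ifft(np.exp(-0.5j * k ** 2 * t) * np.fft.fft(psi))


def norm_inside_box(length=40.0, n=512, pad=1, t=15.0, k0=-3.0, sigma=2.0):
    """Norm left inside the original box [-length/2, length/2] after time t.

    pad is the enlargement factor of the FFT box (pad=1: bare FFT).
    """
    dx = length / n
    n_total = pad * n
    x = (np.arange(n_total) - n_total // 2) * dx
    psi = np.exp(-x ** 2 / (2.0 * sigma ** 2) + 1j * k0 * x)   # left-moving packet
    psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)
    psi_t = free_propagate(psi, dx, t)
    inside = np.abs(x) <= length / 2.0
    return np.sum(np.abs(psi_t[inside]) ** 2) * dx


if __name__ == "__main__":
    print("bare FFT   :", norm_inside_box(pad=1))
    print("zero padded:", norm_inside_box(pad=4))
```

With the bare FFT the packet wraps around and all of its norm stays in the box, whereas with the enlarged box it has left the original region by the chosen time. The price is a grid that is `pad` times larger, which is the cost the special two-point grid is designed to avoid.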
This cost can be greatly reduced using a special grid with only two additional points placed at $\pm\alpha L$ as shown in Fig. 6(c). Since the new grid has nonuniform spacing, a non-equispaced FFT (NFFT) is used [99,100]. With this strategy, a price is paid in momentum space, where the maximum momentum $k_{\rm max}$ is reduced by a factor $\alpha$ compared to the ordinary FFT. In Octopus we implemented all three strategies: bare FFT, zero padding with FFT, and zero padding with NFFT.
All these discretization strategies are numerically stable for a propagation time approximately equivalent to the time that it takes for a wavepacket with the highest momentum considered to be reintroduced in the simulation box. For longer times we can employ a modified set of equations. It can be derived from (68) under the assumption that the electron flow is only outgoing. In this case we can drop the equation for $\varphi^B_i$ responsible for the ingoing flow and obtain a reduced set of equations. This new set, together with (62), lifts the periodic conditions at the boundaries and secures numerical stability for arbitrarily long time propagations. A consequence of this approximation is that the removal of charge is performed only in the equation for $\varphi^A_i$, by means of a multiplication by $M(r)$. This is equivalent to the use of a mask-function boundary absorber, which is known to present reflections in an energy range that depends on $M(r)$ [101]. Carefully choosing the most appropriate mask function thus becomes of key importance in order to obtain accurate results.
We conclude by briefly summarizing some of the most important features and applications of our approach. The method allows us to retrieve $P(k)$, the most resolved quantity available in experiments nowadays. In addition, it is very flexible with respect to the definition of the external field and can operate in a wide range of situations. In the strong-field regime, it can handle interesting situations, for instance, when the electrons follow trajectories extending beyond the simulation box, or when the target system is a large molecule. This constitutes a step forward compared to the standard theoretical tools employed in the field which, in the large majority of cases, invoke the single-active-electron approximation. In Ref. [98] the code was successfully employed to study the photoelectron angular distributions of nitrogen dimers under a strong infrared laser field. The method can efficiently describe situations where more than one laser pulse is involved. This includes, for instance, time-resolved measurements where pump and probe setups are employed. In Ref. [102] Octopus was used to monitor the time evolution of the $\pi \to \pi^*$ transition in ethylene molecules with photoelectrons. The study was later extended to include the effect of moving ions at the classical level [103]. Finally, we point out that our method is by no means restricted to the study of light-induced ionization but can be applied to characterize ionization induced by other processes, for example, ionization taking place after a proton collision.

VIII. COMPLEX SCALING AND RESONANCES
In this section we discuss the calculation of resonant electronic states by means of the complex-scaling method, as implemented in Octopus. By "resonant states," we mean metastable electronic states of finite systems, such as atoms or molecules, with a characteristic energy and lifetime.
Mathematically, resonances can be defined as poles of the scattering matrix or cross-section at complex energies [104,105]. If a pole is close to the real energy axis, it will produce a large, narrow peak in the cross-section of scattered continuum states. One way resonances can arise is from application of an electric field strong enough to ionize the system through tunnelling. Resonant states may temporarily capture incoming electrons or electrons excited from bound states, making them important intermediate states in many processes.
The defining characteristic of a resonant state, often called a Siegert state [104], is that it has an outgoing component but not an incoming one. Such states can be determined by solving the time-independent Schrödinger equation with the boundary condition that the wavefunction must asymptotically have the form
$$\psi(r) \sim \frac{e^{ikr}}{r} \quad {\rm as}\ r \to \infty ,$$
where the momentum $k$ is complex and has a negative imaginary part. This causes the state to diverge exponentially in space as $r \to \infty$. The state can further be ascribed a complex energy, likewise with a negative imaginary part, causing it to decay over time at every point in space uniformly. Resonant states are not eigenstates of any Hermitian operator and in particular do not reside within the Hilbert space. This precludes their direct calculation with the standard computational methods from DFT. However, it turns out that a suitably chosen analytic continuation of a Siegert state is localized, and this form can be used to derive information from the state. This is the idea behind the complex-scaling method [106,107], where states and operators are represented by means of the transformation
$$\hat{R}_\theta\, \psi(r) = e^{iN\theta/2}\, \psi(r e^{i\theta}) ,$$
where $N$ is the number of spatial dimensions to which the scaling operation is applied, and $\theta$ is a fixed scaling angle which determines the path in the complex plane along which the analytic continuation is taken. The transformation maps the Hamiltonian to a non-Hermitian operator $\hat{H}_\theta = \hat{R}_\theta \hat{H} \hat{R}_{-\theta}$. The Siegert states $\psi(r)$ of the original Hamiltonian are square-integrable eigenstates $\psi_\theta(r)$ of $\hat{H}_\theta$, and their eigenvalues $\epsilon_0 - i\Gamma/2$ define the energy $\epsilon_0$ and width $\Gamma$ of the resonance [108][109][110].
A typical example of a spectrum of the transformed Hamiltonian $\hat{H}_\theta$ is shown in Fig. 7, and the corresponding potential and lowest bound and resonant states in Fig. 8. The bound-state energies are unchanged while the continuum rotates by $-2\theta$ around the origin. Finally, resonances appear as isolated eigenvalues in the fourth quadrant once $\theta$ is sufficiently large to "uncover" them from the continuum. Importantly, matrix elements (and in particular energies) of states are independent of $\theta$ as long as the states are localized and well represented numerically; this ensures that all physical bound-state characteristics of the untransformed Hamiltonian are retained.
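Two of these statements, the rotation of the continuum by $-2\theta$ and the invariance of bound-state energies, can be checked on a one-dimensional toy model. The sketch below is an illustration under assumptions, not the Octopus implementation: it uses a second-order finite-difference Laplacian with zero boundary conditions, atomic units, and analytically continuable model potentials (a free particle and a harmonic oscillator). The function names and grid parameters are hypothetical.

```python
import numpy as np


def scaled_hamiltonian(potential, theta, half_width=8.0, n=321):
    """Finite-difference matrix of H_theta = -exp(-2i theta)/2 d2/dx2 + v(x exp(i theta))."""
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    off = np.ones(n - 1)
    laplacian = (np.diag(off, -1) - 2.0 * np.eye(n) + np.diag(off, 1)) / h ** 2
    kinetic = -0.5 * np.exp(-2j * theta) * laplacian
    return kinetic + np.diag(potential(x * np.exp(1j * theta)))


def spectrum(potential, theta, **kwargs):
    """Eigenvalues of the complex-scaled Hamiltonian, sorted by real part."""
    eigenvalues = np.linalg.eigvals(scaled_hamiltonian(potential, theta, **kwargs))
    return eigenvalues[np.argsort(eigenvalues.real)]


if __name__ == "__main__":
    theta = 0.2
    free = spectrum(lambda z: 0.0 * z, theta)
    print("continuum angle:", np.angle(free[0]), "expected:", -2.0 * theta)
    bound = spectrum(lambda z: 0.5 * z ** 2, theta)
    print("lowest oscillator levels:", bound[:2])
```

For the free particle every eigenvalue lies on the ray rotated by $-2\theta$, while the lowest oscillator levels stay at their unscaled values up to the discretization error. A potential supporting resonances would add isolated eigenvalues below the real axis, but that case is not covered by this sketch.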
Our implementation supports calculations with complex scaling for independent particles or in combination with DFT and selected xc functionals [111]. The energy functional in KS-DFT consists of several terms that are all expressible as integrals of the density or the wavefunctions with the kinetic operator and various potentials. The functional is complex-scaled as per the prescribed method by rotating the real-space integration contour of every term by $\theta$ in the complex plane. The DFT energy functional becomes
$$E_\theta = e^{-2i\theta} \sum_n f_n \int dr\, \varphi_{\theta n}(r) \left( -\tfrac{1}{2}\nabla^2 \right) \varphi_{\theta n}(r) + \frac{e^{-i\theta}}{2} \iint dr\, dr'\, \frac{n_\theta(r)\, n_\theta(r')}{|r - r'|} + E^\theta_{\rm xc}[n_\theta] + \int dr\, v_{\rm ext}(r e^{i\theta})\, n_\theta(r) ,$$
with the now-complex electron density
$$n_\theta(r) = \sum_n f_n\, \varphi^2_{\theta n}(r) ,$$
with occupation numbers $f_n$, and complex-scaled KS states $\varphi_{\theta n}(r)$. Note that no complex conjugation is performed on the left component in matrix elements such as the density or kinetic energy. In order to define the complex-scaled xc potential, it is necessary to perform an analytic continuation procedure [111]. In standard DFT, the KS equations are obtained by taking the functional derivative of the energy functional with respect to the density. Solving the equations corresponds to searching for a stationary point, with the idea that this minimizes the energy. In our case, since the energy functional is complex-valued [112], we cannot minimize the energy functional, but we can still search for stationary points to find the resonances [113,114]. The complex-scaled version of the KS equations thereby becomes similar to the usual ones:
$$\left[ -\tfrac{1}{2} e^{-2i\theta} \nabla^2 + v_\theta(r) \right] \varphi_{\theta n}(r) = \epsilon_{\theta n}\, \varphi_{\theta n}(r) .$$
The effective potential $v_\theta(r)$ is the functional derivative of the energy functional with respect to the density $n_\theta(r)$, and, therefore, consists of the terms
$$v_\theta(r) = v_{\rm ext}(r e^{i\theta}) + v^\theta_{\rm H}(r) + v^\theta_{\rm xc}(r) ,$$
where $v_{\rm ext}(r e^{i\theta})$ may represent atomic potentials as analytically continued pseudopotentials, and where the Hartree potential
$$v^\theta_{\rm H}(r) = e^{-i\theta} \int dr'\, \frac{n_\theta(r')}{|r' - r|} \qquad (75)$$
is determined by solving the Poisson equation defined by the complex density.
Together with the xc potential, this defines a self-consistency cycle very similar to ordinary KS DFT, although more care must be taken to occupy the correct states, as they are no longer simply ordered by energy.
Fig. 9 shows calculated ionization rates for the He 1s state in a uniform Stark-type electric field as a function of field strength. In the limit of weak electric fields, the simple approximation by Ammosov, Delone and Krainov (ADK) [115], which depends only on the ionization potential, approaches the accurate reference calculation by Scrinzi and co-workers [116]. This demonstrates that the ionization rate is determined largely by the ionization potential for weak fields. As the local density approximation is known to produce inaccurate ionization potentials due to its wrong asymptotic form at large distances, it necessarily yields inaccurate rates at low fields. Meanwhile exact exchange, which is known to produce accurate ionization energies, predicts ionization rates much closer to the reference calculation. The key property of the xc functional that allows accurate determination of decay rates from complex-scaled DFT therefore appears to be that it must yield accurate ionization potentials, which is linked to its ability to reproduce the correct asymptotic form of the potential at large distances from the system [117].

FIG. 8. Potential (blue) and the real (solid) and imaginary (dotted) parts of the two bound (green) and three lowest resonant (red) wavefunctions. For improved visualization, the wavefunctions are vertically displaced by the real parts of their energies.

FIG. 9. Ionization rates of the He atom in strong electric fields using the local density approximation (LDA) and exact exchange (EXX), compared to an accurate numerical reference [116] as well as the analytic ADK approximation [115]. Results from Ref. [111].

IX. QUANTUM OPTIMAL CONTROL
In recent years, we have added to Octopus some of the key advancements of quantum optimal-control theory (QOCT) [118,119]. In this section, we will briefly summarize what this theory is about, overview the current status of its implementation, and describe some of the results that have been obtained with it until now.
Quantum control can be loosely defined as the manipulation of physical processes at the quantum level. We are concerned here with the theoretical branch of this discipline, whose most general formulation is precisely QOCT. This is, in fact, a particular case of the general mathematical field of "optimal control", which studies the optimization of dynamical processes in general. The first applications of optimal control in the quantum realm appeared in the 1980s [120][121][122], and the field has rapidly evolved since then. Broadly speaking, QOCT attempts to answer the following question: given a quantum process governed by a Hamiltonian that depends on a set of parameters, what are the values of those parameters that maximize a given observable that depends on the behavior of the system? In mathematical terms: let a set of parameters $u_1, \dots, u_M \equiv u$ determine the Hamiltonian of a system $\hat{H}[u, t]$, so that the evolution of the system also depends on the value taken by those parameters, i.e., the solution of the Schrödinger equation determines a map $u \to \psi[u]$. Suppose we wish to optimize a functional of the system $F = F[\psi]$. QOCT is about finding the extrema of $G(u) = F[\psi[u]]$. Beyond this search, QOCT also studies topics such as the robustness of the optimal solutions for those parameters, the number of solutions, or the construction of suitable algorithms to compute them. Perhaps the most relevant result of QOCT is the equation for the gradient of $G$, which allows use of the various maximization algorithms available. For the simple formulation given above, this gradient is given by
$$\frac{\partial G}{\partial u_m}[u] = 2\, {\rm Im} \int_0^T dt\, \left\langle \chi(t) \left| \frac{\partial \hat{H}}{\partial u_m}[u, t] \right| \psi(t) \right\rangle , \qquad (79)$$
where $\chi$ is the costate, an auxiliary wave function that is defined through the following equation of motion:
$$i \frac{\partial}{\partial t} |\chi(t)\rangle = \hat{H}^\dagger[u, t]\, |\chi(t)\rangle , \qquad |\chi(T)\rangle = \frac{\delta F}{\delta \psi^*(T)} .$$
This equation assumes, in order to keep this description short, that the target functional $F$ depends on the state of the system only at the final time of the propagation $T$, i.e. it is a functional of $\psi(T)$.
Note the presence of a boundary condition at the final time of the propagation, as opposed to the equation of motion for the "real" system $\psi$, which naturally depends on an initial-value condition at time zero. With these simple equations, we may already summarize what is needed from an implementation point of view in order to perform basic QOCT calculations. The first step is the selection of the parameters $u$, which constitute the search space. Frequently, these parameters are simply the values that the control function (typically, the electric-field amplitude) takes at the time intervals that are used to discretize the propagation interval, i.e. it is a "real-time parametrization". However, more sophisticated parametrizations allow fine-tuning of the search space, introducing constraints and penalties into the formulation.
Then, one must choose an algorithm for maximizing multi-dimensional functions such as G. One possibility is the family of gradient-less algorithms, which only require a procedure to compute the value of the function, and do not need the gradient. In this case, the previous equations are obviously not needed. One only has to propagate the system forwards in time, which is what Octopus can do best. The value of the function G can then be computed from the evolution of ψ obtained with this propagation, and fed into the optimization procedure. A few gradient-less algorithms are implemented in Octopus.
The most efficient optimizations can be obtained if information about the gradient is employed. In that case, we can use standard schemes, such as the family of conjugate-gradient algorithms, or the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton scheme; we use the implementation of these algorithms included in the GSL mathematical library [123]. Some ad hoc algorithms, developed explicitly for QOCT, also exist. These may in some circumstances be faster than the general-purpose ones. Some of those are implemented in Octopus as well [124][125][126].
In order to compute the gradient, one must implement a backwards-propagation scheme for the costate, which does not differ from the ones used for the normal forwards propagation [127]. Note, however, that in some cases the backwards propagation does not have the exact same simple linear form as the forwards propagation, and may include inhomogeneous or non-linear terms. The final step is the computation of the gradient from the integral given in eq. (79).
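The three ingredients listed above (forward propagation of the state, backward propagation of the costate, and the time integral for the gradient) can be illustrated on a two-level model with a piecewise-constant "real-time" parametrization of the control. This is a sketch under assumptions and unrelated to the Octopus code base: the model Hamiltonian `H0`, the coupling `MU`, the overlap target $F = |\langle\phi|\psi(T)\rangle|^2$, the trapezoidal evaluation of the integral, and all function names are hypothetical choices.

```python
import numpy as np

H0 = np.diag([-0.5, 0.5]).astype(complex)                 # hypothetical two-level system
MU = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)    # coupling to the control field


def step_operator(u_k, dt):
    """Exact propagator over one interval where the control equals u_k."""
    w, v = np.linalg.eigh(H0 + u_k * MU)
    return (v * np.exp(-1j * w * dt)) @ v.conj().T


def propagate(u, psi0, dt):
    """Forward propagation; returns the state at every time-grid point."""
    states = [np.asarray(psi0, dtype=complex)]
    for u_k in u:
        states.append(step_operator(u_k, dt) @ states[-1])
    return states


def target(u, psi0, phi, dt):
    """G(u) = |<phi|psi(T)>|^2."""
    return abs(np.vdot(phi, propagate(u, psi0, dt)[-1])) ** 2


def gradient(u, psi0, phi, dt):
    """dG/du_k = 2 Im int <chi|MU|psi> dt over interval k (trapezoid rule)."""
    states = propagate(u, psi0, dt)
    chi = phi * np.vdot(phi, states[-1])          # costate at the final time
    grad = np.zeros(len(u))
    for k in reversed(range(len(u))):
        end = np.vdot(chi, MU @ states[k + 1])
        chi = step_operator(u[k], dt).conj().T @ chi   # one step backwards
        start = np.vdot(chi, MU @ states[k])
        grad[k] = dt * np.imag(start + end)
    return grad


if __name__ == "__main__":
    psi0 = np.array([1.0, 0.0], dtype=complex)
    phi = np.array([0.0, 1.0], dtype=complex)
    u = np.full(100, 0.3)
    print(target(u, psi0, phi, 0.02), gradient(u, psi0, phi, 0.02)[:3])
```

The resulting vector can be handed to any gradient-based optimizer. A finite-difference comparison on a few components is a cheap way to validate the backward propagation before moving to a larger system.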
The formulation of QOCT we have just sketched out is quite generic; in our case the quantum systems are those that can be modeled with Octopus (periodic systems are not supported at the moment), and the handle that is used to control the system is a time-dependent electric field, such as the ones that can be used to model a laser pulse. The set of parameters $\{u_i\}$ defines the shape of this electric field; for example, they can be the Fourier coefficients of the field amplitude.
The usual formulation of QOCT assumes the linearity of quantum mechanics. However, the time-dependent KS equations are not linear, making both the theory and the numerics more complicated. We have extended the basic theory previously described to handle the TDDFT equations, and implemented the resulting equations in Octopus [128].
We conclude this section by briefly describing some of the applications of the QOCT machinery included in Octopus, which can give an idea of the range of possibilities that can be attempted. The study presented in Ref. [129] demonstrates the control of single-electron states in a two-dimensional semiconductor quantum-ring model. The states whose transitions are manipulated are the current-carrying states, which can be populated or de-populated with the help of circularly polarized light.
Reference [130] studies double quantum dots, and shows how the electron state of these systems can be manipulated with the help of electric fields tailored by QOCT.
Another interesting application is how to tailor the shape of femtosecond laser pulses in order to obtain maximal ionization of atoms and molecules [131]. The system chosen to demonstrate this possibility is the H$_2^+$ molecule, irradiated with short (≈ 5 fs) high-intensity laser pulses.
The feasibility of using the electronic current to define the target functional of the QOCT formalism is considered in Ref. [132].
Finally, a series of works has studied the use of optimal control for photo-chemical control: the tailoring of laser pulses to create or break selected bonds in molecules. The underlying physical model should be based on TDDFT, and on a mixed quantum/classical scheme (within Octopus, Ehrenfest molecular dynamics). Some first attempts in this area were reported in Refs. [133,134]. However, these works did not consider a fully consistent optimal control theory encompassing TDDFT and Ehrenfest dynamics. This theory has been recently presented [135], and the first computations demonstrating its feasibility will be reported soon.

X. PLASMONICS
The scope of real-space real-time approaches is not confined to the atomistic description of matter. For instance, finite-difference time-domain [136] (FDTD) is a standard numerical tool of computational electromagnetism, while lattice Boltzmann methods [137] (LBM) are widely used in computational fluid dynamics. Indeed, real-space real-time approaches can be used to model physical processes on rather different space and time scales. This observation also bears an important suggestion: numerical methods based on real-space grids can be used to bridge between these different space and time scales.
Numerical nanoplasmonics is a paradigmatic case for multiscale electronic-structure calculations. A nanoplasmonic system (e.g., one made up of metal nanoparticles, MNPs) can be a few tens of nanometers across, while the region of strong field enhancement (e.g., in the gap between two MNPs) can be less than 1 nm across [138]. The field enhancement, $h(r)$, is essentially a classical observable, defined as the ratio between the time-averaged magnitudes of the total and external fields, where $E_{\rm tot}$ is the total electric field, $E_{\rm ext}$ is the external (or driving) electric field, and $\langle \cdots \rangle$ indicates a time average. Large field enhancements are the key to single-molecule surface-enhanced Raman spectroscopy (SERS), and values as large as $h > 100$ (the intensity of the SERS signal scales as $h^4$) are predicted by classical electromagnetic calculations [139]. In classical calculations, the electronic response is modeled by the macroscopic permittivity of the material. The classical Drude model gives the following simple and robust approximation of the metal (complex) permittivity:
$$\epsilon(\omega) = \epsilon_\infty - \frac{\omega_p^2}{\omega(\omega + i\gamma)} .$$
For gold, typical values of the high-frequency permittivity $\epsilon_\infty$, the plasma frequency $\omega_p$, and the relaxation rate $\gamma$ are $\epsilon_\infty = 9.5$, $\omega_p = 8.95$ eV and $\gamma = 69.1$ meV [140]. A non-local correction to the Drude model can also be included by considering the plasmon dispersion [141,142]. The metal (complex) permittivity then reads
$$\epsilon(\omega, k) = \epsilon_\infty - \frac{\omega_p^2}{\omega(\omega + i\gamma) - \beta^2 k^2} .$$
The parameter $\beta$ can be fitted to model the experimental data, although the value $\beta = \sqrt{3/5}\, v_F$, where $v_F$ is the Fermi velocity, is suggested by the Thomas-Fermi approximation [143].
Regardless of the level of sophistication of the permittivity model, all classical calculations assume that electrons are strictly confined inside the metal surfaces. This is a safe approximation for microscopic plasmonic structures. However, at the nanoscale the electronic delocalization (or spillout) outside the metal surfaces becomes comparable to the smallest features of the plasmonic nanostructure, e.g., to the gap between two MNPs. At this scale, the very definition of a macroscopic permittivity is inappropriate and the electronic response must be obtained directly from the quantum dynamics of the electrons.
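For reference, the classical permittivity models used as a baseline in this section are simple enough to evaluate directly. The sketch below assumes the standard Drude form $\epsilon(\omega) = \epsilon_\infty - \omega_p^2/[\omega(\omega + i\gamma)]$ and the hydrodynamic non-local correction with a $\beta^2 k^2$ term in the denominator. The function names are hypothetical, the default parameters are the gold values quoted in the text (in eV), and the product $\beta k$ is assumed to be supplied in the same energy units.

```python
def drude_permittivity(omega, eps_inf=9.5, omega_p=8.95, gamma=0.0691):
    """Local Drude permittivity; omega, omega_p and gamma in eV."""
    return eps_inf - omega_p ** 2 / (omega * (omega + 1j * gamma))


def nonlocal_permittivity(omega, k, beta, eps_inf=9.5, omega_p=8.95, gamma=0.0691):
    """Drude permittivity with a plasmon-dispersion correction.

    beta * k must be expressed in the same energy units as omega.
    """
    return eps_inf - omega_p ** 2 / (omega * (omega + 1j * gamma) - (beta * k) ** 2)


if __name__ == "__main__":
    for omega in (1.0, 2.0, 3.0):
        print(omega, drude_permittivity(omega))
```

At 2 eV the model gives a negative real part and a small positive imaginary part, the usual signature of a metal below its screened plasma frequency. Setting `k = 0` in the non-local expression recovers the local Drude result.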
TDDFT is currently the method of choice to model the plasmonic response of MNPs [144][145][146][147][148][149][150], via the simplified jellium model, in which the nuclei and core electrons are described as a uniform positive charge density, and only the valence electrons are described explicitly. Early calculations, especially on nanospheres [145,151], have suggested the existence of new charge-transfer plasmonic modes, which have also been demonstrated by pioneering experiments [138]. In the future, as the field of quantum plasmonics [152] (i.e., the investigation and control of the quantum properties of plasmons) develops further, the demand for accurate, yet scalable, numerical simulations to complement the experimental findings is expected to grow. This demand represents both a challenge and an opportunity for computational physics.
Scaling up the TDDFT@jellium method to model larger and more complex plasmonic nanostructures is a challenge which can be addressed by high-performance real-space real-time codes, like Octopus. The code was initially applied to investigate the plasmonic response of single gold nanospheres (Wigner-Seitz radius $r_s = 3.0$ bohr) [146]. A clear plasmonic resonance appears in the absorption cross-section, computed by real-time propagation, for spheres containing a large enough number of electrons ($N_e > 100$). A new plasmonic mode, deemed the "quantum core plasmon", has also been suggested from the analysis of the absorption cross-section. This new mode has been further characterized by probing the sphere at its resonance frequency. Within a real-time propagation scheme, this is simply done by including an external electric field, the "laser pulse", oscillating at a given frequency.
As versatility is a major strength of real-space real-time approaches, other jellium geometries can be easily modeled by Octopus, including periodic structures. For instance, a pair of interacting sodium nanowires (with periodicity along their longitudinal direction) has been investigated to assess the accuracy of classical methods based on the model permittivity in eq. (83) and eq. (84). Compared to pairs of nanospheres, nanowires display a stronger inductive interaction due to their extended geometry [147,148]. This is manifest in the absorption cross-section, which already shows a large split of the plasmonic peak for a small gap between the wires (see Fig. 10(a)). Due to the electronic spillout and the symmetry of the system, it also turns out that the largest field enhancement is reached at the center of the gap, not on the opposing surfaces of the nanowires as predicted by the classical methods (see Fig. 10(b)). The maximum field enhancement estimated by the TDDFT@jellium method is also smaller than the classical estimates. Once again, the quantum delocalization ignored by the classical methods plays a crucial role in "smearing" the singularities of the induced field, effectively curbing the local field enhancement.
Simple jellium geometries have been implemented in Octopus and they can be used as effective "superatomic pseudopotentials". The similarity between the jellium potential and atomic pseudopotentials can be further exploited to develop an external "jellium pseudopotential" generator to be used with Octopus. In this way, a larger selection of jellium geometries will be made available along with refined, yet scalable, jellium approaches to include d-electron screening in noble metals [153]. Efforts in this direction are currently being made.
Finally, a word of caution about the domain of applicability of the TDDFT@jellium method is in order. The non-uniformity of the atomic lattice is expected to affect the absorption cross-section of small MNPs. A careful assessment of the lattice contributions (including the lattice symmetry) to the main plasmon modes of a pair of nanospheres is available [150]. This last investigation further demonstrates the possibility to bridge between atomistic and coarse-grained electronic calculations by means of a real-space real-time approach.

XI. DEVELOPMENT OF EXCHANGE AND CORRELATION FUNCTIONALS
The central quantity of the KS scheme of DFT is the xc energy $E_{xc}[n]$, which describes all non-trivial many-body effects. Clearly, the exact form of this quantity is unknown and it must be approximated in any practical application of DFT. We emphasize that the accuracy of any DFT calculation depends solely on the form of this quantity, as this is the only real approximation in DFT (neglecting numerical approximations that are normally controllable).
During the past 50 years, hundreds of different forms have appeared [154]. They are usually arranged in families, which have names such as generalized-gradient approximations (GGAs), meta-GGAs, hybrid functionals, etc. In 2001, John Perdew came up with a beautiful idea on how to illustrate these families and their relationship [155]. He ordered the families as rungs in a ladder that leads to the heaven of "chemical accuracy", which he christened the "Jacob's ladder" of density-functional approximations for the xc energy. Every rung adds a dependency on another quantity, thereby increasing the precision of the functional but also increasing the numerical complexity and the computational cost.
The first three rungs of this ladder are: (i) the local-density approximation (LDA), where the functional has a local dependence on the density only; (ii) the generalized-gradient approximation (GGA), which also includes a local dependence on the gradient of the density $\nabla n(\mathbf{r})$; and (iii) the meta-GGA, which adds a local dependence on the Laplacian of the density and on the kinetic-energy density. In the fourth rung we have functionals that depend on the occupied KS orbitals, such as exact exchange or hybrid functionals. Finally, the fifth rung adds a dependence on the virtual KS orbitals. Support for the first three rungs and for the local part of the hybrid functionals in Octopus is provided through the Libxc library [156]. Libxc started as a spin-off project during the initial development of Octopus. At that point, it became clear that the task of evaluating the xc functional was completely independent of the main structure of the code, and could therefore be transformed into a stand-alone library. Over the years, Libxc became more and more independent of Octopus, and is now used in a variety of DFT codes. There are currently more than 150 xc functionals implemented in Libxc that are available in Octopus, a number that has been increasing steadily over the years. All of the standard functionals are included and many of the less common ones. There is also support for LDAs and GGAs of systems of reduced dimensionality (1D and 2D), which allow for direct comparisons with the direct solution of the many-body Schrödinger equation for model systems described in section XIII.
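To make the first rung concrete, the following minimal sketch evaluates the spin-unpolarized LDA exchange energy, $E_x^{LDA} = -\frac{3}{4}(3/\pi)^{1/3}\int n^{4/3}(\mathbf{r})\,d\mathbf{r}$, for a hydrogen-like 1s density on a radial grid. It does not call Libxc or Octopus; the function name `lda_exchange_energy`, the radial grid, and the test density are assumptions chosen only for illustration.

```python
import numpy as np

def lda_exchange_energy(density, weights):
    """Spin-unpolarized LDA exchange energy in Hartree atomic units.

    density: values of n(r) on the grid points
    weights: integration weight (volume element) of each grid point
    """
    c_x = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)
    return -c_x * np.sum(weights * density ** (4.0 / 3.0))

# Illustrative radial grid and hydrogen 1s density n(r) = exp(-2r)/pi
r = np.linspace(0.0, 30.0, 6001)
dr = r[1] - r[0]
density = np.exp(-2.0 * r) / np.pi
weights = 4.0 * np.pi * r**2 * dr      # spherical shell volume per point

n_electrons = np.sum(weights * density)
e_x = lda_exchange_energy(density, weights)
print(f"electrons = {n_electrons:.4f}, E_x(LDA) = {e_x:.4f} Ha")
```

The functional only needs the density value at each point, which is why its evaluation can be separated from the rest of an electronic-structure code.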
Octopus also includes support for other functionals of the fourth rung, such as exact exchange or the self-interaction correction of Perdew and Zunger [157], through the solution of the optimized effective potential equation. This can be done exactly [158], or within the Slater [159] or Krieger-Lee-Iafrate approximations [160].
Besides the functionals that are supported by Octopus, the code has served as a platform for the testing and development of new functionals. For example, the method described in section XIII can be used in a straightforward way to obtain reference data against which to benchmark the performance of a given xc functional, such as a one-dimensional LDA [161]. In that case, both calculations, exact and approximate, make use of the same real-space grid approach, which makes the comparison of the results straightforward. Despite the obvious advantage of using exact solutions of the many-body problem as reference data, this is often not possible and one usually needs to resort to the more commonly used experimental or highly accurate quantum-chemistry data. In this case, the flexibility of the real-space method, allowing for the calculation of many different properties of a wide variety of systems, is again an advantage. Octopus has therefore been used to benchmark the performance of xc functionals whose potential has a correct asymptotic behavior [162] when calculating ionization potentials and static polarizabilities of atoms, molecules, and hydrogen chains.
In this vein, Andrade and Aspuru-Guzik [163] proposed a method to obtain an asymptotically correct xc potential starting from any approximation. Their method is based on considering the xc potential as an electrostatic potential generated by a fictitious xc charge. In terms of this charge, the asymptotic condition is given as a simple formula that is local in real space and can be enforced by a simple procedure. The method, implemented in Octopus, was used to perform test calculations in molecules. Additionally, with this correction procedure it is possible to find accurate predictions for the derivative discontinuity and, hence, predict the fundamental gap [164].
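The idea of a fictitious xc charge can be written compactly. The block below is a sketch in our own notation ($n_{xc}$ and $q_{xc}$ are symbols introduced here for illustration): the xc potential is treated as the solution of a Poisson equation, and Gauss's theorem turns the asymptotic condition $v_{xc}(\mathbf{r}) \to -1/r$ into a condition on the total xc charge.

```latex
% xc charge density defined through the Poisson equation (atomic units)
n_{xc}(\mathbf{r}) = -\frac{1}{4\pi}\,\nabla^2 v_{xc}(\mathbf{r}),
\qquad
v_{xc}(\mathbf{r}) = \int d\mathbf{r}'\,
  \frac{n_{xc}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}.

% Asymptotic condition expressed as a normalization of the xc charge
v_{xc}(\mathbf{r}) \xrightarrow[r\to\infty]{} -\frac{1}{r}
\quad\Longleftrightarrow\quad
q_{xc} = \int d\mathbf{r}\, n_{xc}(\mathbf{r}) = -1 .
```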

XII. REAL-SPACE REDUCED DENSITY-MATRIX FUNCTIONAL THEORY
An alternative approach to DFT that can model electrons using a single-particle framework is reduced density matrix functional theory (RDMFT) [165]. Here, we present the current results of an ongoing effort to develop a real-space version of RDMFT and to implement it in the Octopus code.
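Within RDMFT, the total energy of a system is given as a functional of the one-body reduced density-matrix (1-RDM), which can be written in its spectral representation as

$$\gamma(\mathbf{r},\mathbf{r}') = \sum_{i=1}^{\infty} n_i\,\varphi_i^*(\mathbf{r}')\,\varphi_i(\mathbf{r}), \qquad (86)$$

where the natural orbitals $\varphi_i(\mathbf{r})$ and their occupation numbers $n_i$ are the eigenfunctions and eigenvalues of the 1-RDM, respectively. In RDMFT the total energy is given by

$$E = -\sum_{i=1}^{\infty} n_i \int d\mathbf{r}\,\varphi_i^*(\mathbf{r})\,\frac{\nabla^2}{2}\,\varphi_i(\mathbf{r}) + \sum_{i=1}^{\infty} n_i \int d\mathbf{r}\, v_{ext}(\mathbf{r})\,|\varphi_i(\mathbf{r})|^2 + \frac{1}{2}\sum_{i,j=1}^{\infty} n_i n_j \iint d\mathbf{r}\,d\mathbf{r}'\,\frac{|\varphi_i(\mathbf{r})|^2\,|\varphi_j(\mathbf{r}')|^2}{|\mathbf{r}-\mathbf{r}'|} + E_{xc}[\{n_j\},\{\varphi_j\}]. \qquad (87)$$

The third term is the Hartree energy, $E_H$, and the fourth the xc energy, $E_{xc}$. As in DFT, the exact functional of RDMFT is unknown. However, the part that needs to be approximated, $E_{xc}[\gamma]$, comes, contrary to DFT, only from the electron-electron interaction, as the interacting kinetic energy can be explicitly expressed in terms of $\gamma$. Different approximate functionals are employed and minimized with respect to the 1-RDM in order to find the ground-state energy [166][167][168]. A common approximation for $E_{xc}$ is the Müller functional [169], which has the form

$$E_{xc}[\{n_j\},\{\varphi_j\}] = -\frac{1}{2}\sum_{j,k=1}^{\infty} \sqrt{n_j n_k} \iint d\mathbf{r}\,d\mathbf{r}'\,\frac{\varphi_j^*(\mathbf{r})\,\varphi_j(\mathbf{r}')\,\varphi_k^*(\mathbf{r}')\,\varphi_k(\mathbf{r})}{|\mathbf{r}-\mathbf{r}'|} \qquad (88)$$

and is the only $E_{xc}$ implemented in Octopus for the moment.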
For closed-shell systems, the necessary and sufficient conditions for the 1-RDM to be N-representable [170], i.e. to correspond to an N-electron wavefunction, are that $0 \le n_i \le 2$ and

$$\sum_{i=1}^{\infty} n_i = N. \qquad (89)$$

Minimization of the energy functional of eq. (87) is performed under the N-representability constraints and the orthonormality requirements of the natural orbitals,

$$\langle \varphi_i | \varphi_j \rangle = \delta_{ij}. \qquad (90)$$

The bounds on the occupation numbers are automatically satisfied by setting $n_i = 2\sin^2(2\pi\vartheta_i)$ and varying $\vartheta_i$ without constraints. The conditions (89) and (90) are taken into account via Lagrange multipliers $\mu$ and $\lambda_{ij}$, respectively. Then, one can define the following functional

$$\Omega(N,\{\vartheta_i\},\{\varphi_i(\mathbf{r})\}) = E - \mu\left(\sum_{j=1}^{\infty} 2\sin^2(2\pi\vartheta_j) - N\right) - \sum_{j,k=1}^{\infty} \lambda_{jk}\left(\langle \varphi_j | \varphi_k \rangle - \delta_{jk}\right), \qquad (91)$$

which has to be stationary with respect to variations in $\{\vartheta_i\}$, $\{\varphi_i(\mathbf{r})\}$ and $\{\varphi_i^*(\mathbf{r})\}$. In any practical calculation the infinite sums have to be truncated, including only a finite number of occupation numbers and natural orbitals. However, since the occupation numbers $n_j$ decay very quickly for $j > N$, this is not problematic.
The variation of Ω is done in two steps: for a fixed set of orbitals, the energy functional is minimized with respect to occupation numbers and, accordingly, for a fixed set of occupations the energy functional is minimized with respect to variations of the orbitals until overall convergence is achieved. As a starting point we use results from a Hartree-Fock calculation and first optimize the occupation numbers. Since the correct µ is not known, it is determined via bisection: for every µ the objective functional is minimized with respect to ϑ i until the condition (89) is satisfied.
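The occupation-number step can be illustrated with a toy model. The sketch below is not the Octopus implementation: it replaces the RDMFT energy at fixed orbitals by an assumed separable model, $E = \sum_i (\epsilon_i n_i - c_i\sqrt{n_i})$, whose minimum for a given $\mu$ is known in closed form, so that only the bisection on $\mu$ and the mapping to the angles $\vartheta_i$ are shown. The names `occupations_for_mu` and `bisect_mu` and all numerical values are hypothetical.

```python
import numpy as np

def occupations_for_mu(mu, eps, c):
    """Minimize sum_i [(eps_i - mu) n_i - c_i sqrt(n_i)] with 0 <= n_i <= 2.

    For eps_i > mu the stationary point is sqrt(n_i) = c_i / (2 (eps_i - mu)),
    clipped to the upper bound; for eps_i <= mu the orbital is fully occupied.
    """
    n = np.full_like(eps, 2.0)
    mask = eps > mu
    n[mask] = np.minimum(2.0, (c[mask] / (2.0 * (eps[mask] - mu))) ** 2)
    return n

def bisect_mu(eps, c, n_electrons, tol=1e-10):
    """Bisection on mu until the occupations sum to the electron number."""
    lo, hi = eps.min() - 100.0, eps.max()   # bracket: nearly empty .. all full
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if occupations_for_mu(mu, eps, c).sum() < n_electrons:
            lo = mu
        else:
            hi = mu
    mu = 0.5 * (lo + hi)
    return mu, occupations_for_mu(mu, eps, c)

# Toy data: five natural orbitals, two electrons (values are illustrative)
eps = np.array([-1.0, -0.5, 0.2, 0.6, 1.0])
c = np.array([0.5, 0.4, 0.3, 0.2, 0.1])
mu, occ = bisect_mu(eps, c, n_electrons=2.0)

# Unconstrained angles that reproduce the occupations, n_i = 2 sin^2(2 pi theta_i)
theta = np.arcsin(np.sqrt(occ / 2.0)) / (2.0 * np.pi)
print("mu =", mu, "occupations =", occ)
```

The bisection works because the total occupation grows monotonically with $\mu$; in a real calculation the inner minimization over $\vartheta_i$ would be numerical rather than analytic.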
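Due to the dependence on the occupation numbers, the natural-orbital minimization does not lead to an eigenvalue equation like in DFT or Hartree-Fock. The implementation of the natural-orbital minimization follows the method by Piris and Ugalde [171]. Varying $\Omega$ with respect to the orbitals for fixed occupation numbers one obtains

$$\sum_{k=1}^{\infty} \lambda_{ik}\,\varphi_k(\mathbf{r}) = n_i\left[-\frac{\nabla^2}{2} + v_{ext}(\mathbf{r}) + \int d\mathbf{r}'\,\frac{\gamma(\mathbf{r}',\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\right]\varphi_i(\mathbf{r}) + \frac{\delta E_{xc}}{\delta \varphi_i^*(\mathbf{r})}. \qquad (92)$$

At the extremum, the matrix of the Lagrange multipliers must be Hermitian, i.e.

$$\lambda_{ji} - \lambda_{ij}^* = 0. \qquad (93)$$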
Then one can define the off-diagonal elements of a Hermitian matrix $\mathbf{F}$ as

$$F_{ji} = \theta(i-j)\,(\lambda_{ji} - \lambda_{ij}^*) + \theta(j-i)\,(\lambda_{ij}^* - \lambda_{ji}), \qquad (94)$$

where $\theta$ is the unit-step Heaviside function. We initialize the whole matrix as $F_{ji} = (\lambda_{ji} + \lambda_{ij}^*)/2$. In every iteration we diagonalize $\mathbf{F}$, keeping the diagonal elements for the next iteration, while changing the off-diagonal ones to (94). At the solution all off-diagonal elements of this matrix vanish; hence, the matrices $\mathbf{F}$ and $\gamma$ can be brought simultaneously to a diagonal form. Thus, the $\{\varphi_i\}$ which are the solutions of eq. (93) can be found by diagonalization of $\mathbf{F}$ in an iterative manner [171]. The criterion to exit the natural-orbital optimization is that the difference in the total energies calculated in two successive $\mathbf{F}$ diagonalizations is smaller than a threshold. Overall convergence is achieved when the difference in the total energies in two successive occupation-number optimizations and the non-diagonal matrix elements of $\mathbf{F}$ are close to zero.
As mentioned above, one needs an initial guess for the natural orbitals, both for the first step of occupation-number optimization and for the optimization of the natural orbitals. A rather obvious choice would be the occupied and a few unoccupied orbitals resulting from a DFT or HF calculation. Unfortunately, there are unbound states among the HF/DFT unoccupied states, which are a bad starting point for the weakly occupied natural orbitals. When calculated in a finite grid these orbitals are essentially the eigenstates of a particle in a box. Using the exact-exchange approximation (EXX) in an optimized-effective-potential framework results in a larger number of bound states than HF or the local density approximation (LDA), due to the EXX functional being self-interaction-free for both occupied and unoccupied orbitals. Using HF or LDA orbitals to start a RDMFT calculation, the natural orbitals do not converge to any reasonable shape, but even when starting from EXX one needs to further localize the unoccupied states. Thus, we have found that in order to improve the starting point for our calculation we can multiply each unoccupied orbital by a set of Gaussian functions centered at the positions of the atoms. As the unbound states are initially more delocalized than the bound ones, we choose a larger exponent for them.
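The localization of an unbound starting orbital can be sketched in one dimension. The example below is only an illustration of the idea: it assumes that the "set of Gaussian functions" is a sum of Gaussians with a common exponent, and the grid, atomic positions, exponent, and helper names (`normalize`, `localize`, `rms_width`) are all hypothetical.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)      # 1D grid (bohr), illustrative
dx = x[1] - x[0]
atoms = np.array([-0.7, 0.7])          # illustrative nuclear positions

def normalize(psi):
    """Scale psi so that its grid norm is one."""
    return psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

def localize(psi, centers, exponent):
    """Multiply psi by a sum of atom-centered Gaussians and renormalize."""
    envelope = sum(np.exp(-exponent * (x - c) ** 2) for c in centers)
    return normalize(psi * envelope)

def rms_width(psi):
    """Root-mean-square spatial extent of |psi|^2."""
    prob = np.abs(psi) ** 2 * dx
    mean = np.sum(x * prob)
    return np.sqrt(np.sum((x - mean) ** 2 * prob))

# An unbound state on a finite grid looks like a particle-in-a-box state
box_state = normalize(np.sin(3.0 * np.pi * (x + 10.0) / 20.0))
localized = localize(box_state, atoms, exponent=0.5)

print("width before:", rms_width(box_state), "after:", rms_width(localized))
```

A larger exponent gives a tighter envelope, which is the knob referred to above for the more delocalized unbound states.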
In Fig. 11 we show the dissociation curve of H$_2$ obtained with RDMFT in Octopus and compare it with results obtained by the Gaussian-basis-set RDMFT code HIPPO [172]. For the Octopus calculation, we kept 13 natural orbitals, with the smallest occupation number being of the order of $10^{-5}$ after the RDMFT calculation had converged. The HIPPO calculation was performed using 30 natural orbitals. The RDMFT curve obtained with Octopus looks similar to the one from HIPPO and other Gaussian implementations of RDMFT [166], keeping the nice feature of not diverging strongly in the dissociation limit. However, for internuclear distances R bigger than 1 a.u., the real-space energy lies above the HIPPO one.
We believe that the remaining difference can be removed by further improving the initial guess for the orbitals that we use in Octopus, because a trial calculation using HF orbitals from a Gaussian implementation showed a curve almost identical to the one from the HIPPO code (not shown in the figure). In the future, we plan to include support for open-shell systems and additional xc functionals.

XIII. EXACT SOLUTION OF THE MANY-BODY SCHRÖDINGER EQUATION FOR FEW ELECTRONS
In one-dimensional systems, the fully interacting Hamiltonian for N electrons has the form

$$\hat{H} = \sum_{j=1}^{N}\left[-\frac{1}{2}\frac{d^2}{dx_j^2} + v_{ext}(x_j)\right] + \sum_{j<k}^{N} v_{int}(x_j, x_k), \qquad (95)$$

where the interaction potential $v_{int}(x_j, x_k)$ is usually Coulombic, though the following discussion also applies for other types of interaction, including more than two-body ones. In 1D one often uses the soft Coulomb interaction $1/\sqrt{(x_j - x_k)^2 + 1}$, where a softening parameter (usually set to one) is introduced in order to avoid the divergence at $x_j = x_k$, which is non-integrable in 1D. Mathematically, the Hamiltonian (eq. (95)) is equivalent to that of a single (and hence truly independent) electron in N dimensions, with the external potential

$$v_{ext}^{Nd}(x_1, \ldots, x_N) = \sum_{j=1}^{N} v_{ext}(x_j) + \sum_{j<k}^{N} v_{int}(x_j, x_k). \qquad (96)$$

For small N it is numerically feasible to solve the N-dimensional Schrödinger equation

$$\hat{H}\,\Psi_j(x_1, \ldots, x_N) = E_j\,\Psi_j(x_1, \ldots, x_N), \qquad (97)$$

which provides a spatial wave function for a single particle in N dimensions. This equivalence is not restricted to one-dimensional problems. One can generally map a problem of N electrons in d dimensions onto the problem of a single particle in Nd dimensions, or indeed a problem with multiple types of particles (e.g. electrons and protons) in d dimensions, in the same way. What we exploit in Octopus is the basic machinery for solving the Schrödinger equation in an arbitrary dimension, the spatial/grid bookkeeping, the ability to represent an arbitrary external potential, and the intrinsic parallelization. In order to keep our notation relatively simple, we will continue to discuss the case of an originally one-dimensional problem with N electrons. Grid-based solutions of the full Schrödinger equation are not new, and have been performed for many problems with either few electrons (in particular H$_2$, D$_2$ and H$_2^+$) [173,174] or model interactions [175], including time-dependent cases [176].
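As a minimal sketch of this mapping, the following example solves two soft-Coulomb electrons in 1D as a single particle on a 2D grid, using a second-order finite-difference Laplacian and dense diagonalization. It is not Octopus code: the helium-like external potential $-2/\sqrt{x^2+1}$, the coarse grid, and the variable names are assumptions made for illustration.

```python
import numpy as np

n = 41
x = np.linspace(-8.0, 8.0, n)          # grid for each electron coordinate
h = x[1] - x[0]

# One-dimensional kinetic operator, -1/2 d^2/dx^2, with zero boundary conditions
lap = (np.diag(-2.0 * np.ones(n))
       + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2
kin = -0.5 * lap
eye = np.eye(n)

# Two-dimensional "external" potential: nuclear attraction plus interaction
x1, x2 = np.meshgrid(x, x, indexing="ij")
v_nuc = lambda s: -2.0 / np.sqrt(s**2 + 1.0)       # assumed 1D helium model
v_2d = v_nuc(x1) + v_nuc(x2) + 1.0 / np.sqrt((x1 - x2) ** 2 + 1.0)

# Single-particle Hamiltonian in two dimensions (index = i * n + j)
hamiltonian = np.kron(kin, eye) + np.kron(eye, kin) + np.diag(v_2d.ravel())
energies, states = np.linalg.eigh(hamiltonian)

psi0 = states[:, 0].reshape(n, n)
# Exchange parity: +1 for a symmetric spatial part, -1 for an antisymmetric one
parity0 = np.sum(psi0 * psi0.T)
print("ground-state energy (Ha):", energies[0], "exchange parity:", parity0)
```

The ground state comes out symmetric under exchange of the two coordinates, as expected for the spatial part of a singlet. The memory cost of this dense approach grows as $n^{2N}$, which is why production calculations rely on sparse operators and grid parallelization.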
The time-dependent propagation of the Schrödinger equation can be carried out in the same spirit, since the Hamiltonian is given explicitly and each "single-particle orbital" represents a full state of the system. A laser or electric-field perturbation can also be applied, depending on the charge of each particle (given in the input), and taking care to apply the same effective field to each particle along the polarization direction of the field (in 1D, the diagonal of the hyper-cube).
Solving eq. (97) leaves the problem of constructing a wave function which satisfies the anti-symmetry properties of N electrons in one dimension. For fermions one needs to ensure that those spatial wave functions Ψ j which are not the spatial part of a properly antisymmetric wave function are removed as allowed solutions for the N -electron problem. A graphical representation of which wave functions are allowed is given by the Young diagrams (or tableaux) for permutation symmetries, where each electron is assigned a box, and the boxes are then stacked in columns and rows (for details see, for example, Ref. [177]). Each box is labeled with a number from 1 to N such that the numbers increase from top to bottom and left to right.
All possible decorated Young diagrams for three and four electrons are shown in Fig. 12. Since there are two different spin states for electrons, our Young diagrams for the allowed spatial wave functions contain at most two columns. The diagram d) is not allowed for the wave function of three particles with spin 1/2, and diagrams k) to n) are not allowed for four particles. To connect a given wave function $\Psi_j$ with a diagram one has to symmetrize the wave function according to the diagram. For example, for diagram b), acting on a function $\Psi(x_1, x_2, x_3)$, one symmetrizes with respect to an interchange of the first two variables, because they appear in the same row of the Young diagram, and anti-symmetrizes with respect to the first and third variables, as they appear in the same column. We note that we are referring to the position of the variable in the list, not the index, and that symmetrization always comes before antisymmetrization.

At the end of these operations one calculates the norm of the resulting wave function. If it exceeds a certain threshold, by default set to $10^{-5}$, one keeps the obtained function as a proper fermionic spatial part. If the norm is below the threshold, one continues with the next allowed diagram until either a norm larger than the threshold is found or all diagrams are used up. If a solution $\Psi_j$ does not yield a norm above the threshold for any diagram, it is removed, since it corresponds to a wave function with only bosonic or other non-fermionic character. Generally, as the number of forbidden diagrams increases with N, the number of wave functions that need to be removed also increases quickly with N, in particular in the lowest part of the spectrum. The case of two electrons is specific, as all solutions of eq. (97) correspond to allowed fermionic wave functions: the symmetric ones to the singlet states and the antisymmetric ones to the triplet states.
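The symmetrization test for diagram b) can be sketched for a three-particle wave function stored as a three-index array, where array axis k holds the variable in position k+1 of the argument list. This is an illustration under that stated convention rather than the Octopus routine; the helper names `young_project` and `passes_diagram` and the random test functions are hypothetical.

```python
import itertools
import numpy as np

def young_project(psi, row_pairs, column_pairs):
    """Symmetrize over row pairs first, then antisymmetrize over column pairs."""
    out = psi
    for a, b in row_pairs:
        out = out + np.swapaxes(out, a, b)
    for a, b in column_pairs:
        out = out - np.swapaxes(out, a, b)
    return out

def passes_diagram(psi, row_pairs, column_pairs, threshold=1e-5):
    """Norm test on the projected function, for a unit-norm input."""
    psi = psi / np.linalg.norm(psi)
    projected = young_project(psi, row_pairs, column_pairs)
    return np.linalg.norm(projected) > threshold

# Diagram b): positions 1 and 2 share a row, positions 1 and 3 share a column
rows_b, columns_b = [(0, 1)], [(0, 2)]

rng = np.random.default_rng(0)
generic = rng.standard_normal((10, 10, 10))

# A fully symmetric (bosonic) function: sum over all permutations of the axes
bosonic = sum(np.transpose(generic, p) for p in itertools.permutations(range(3)))

print("generic passes:", passes_diagram(generic, rows_b, columns_b))
print("bosonic passes:", passes_diagram(bosonic, rows_b, columns_b))
```

The fully symmetric function is annihilated by the antisymmetrization step, so its projected norm falls below the threshold and it would be discarded for this diagram.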
For example, for a one-dimensional Li atom with an external potential

$$v_{ext}(x) = -\frac{3}{\sqrt{x^2 + 1}} \qquad (99)$$

and the soft Coulomb interaction, we obtain the states and energy eigenvalues given in table III. If certain state energies are degenerate, the Young diagram "projection" contains an additional loop, ensuring that the same diagram is not used to symmetrize successive states: this would yield the same spatial part for each wave function in the degenerate sub-space. A given diagram is only used once in the sub-space, on the first state whose projection has significant weight.

TABLE III. Eigenstates for a one-dimensional lithium atom. The first and the fourth eigenstates show norms that are smaller than $10^{-13}$ and $10^{-11}$, respectively, for all diagrams. Hence, these states are bosonic and are removed from any further calculations. The second and third states are energetically degenerate and correspond to diagrams b) and c) in Fig. 12. The same is true for the fifth and sixth states.
The implementation also allows for the treatment of bosons, in which case the total wave function has to be symmetric under exchange of two particles. Here one will use a spin part symmetrized with the same Young diagram (instead of the mirror one for fermions), such that the total wave function becomes symmetric.
In order for the (anti-)symmetrization to work properly one needs to declare each particle in the calculation to be a fermion, a boson, or an anyon. In the latter case, the corresponding spatial variables are not considered at all in the (anti-)symmetrization procedure. One can also have more than one type of fermion or boson, in which case the symmetry requirements are only enforced for particles belonging to the same type.
There are also numerical constraints on the wave functions: space must be represented in a homogeneous hyper-cube, possibly allowing for different particle masses by modifying the kinetic-energy operator for the corresponding directions. All of the grid-partitioning algorithms intrinsic to Octopus carry over to arbitrary dimensions, which allows for immediate parallelization of the calculations of the ground and excited states. The code can run with an arbitrary number of dimensions; however, the complexity and memory size grow exponentially with the number of particles simulated, as expected. Production runs have been executed up to 6 or 7 dimensions.
Most of the additional treatment for many-body quantities is actually post-processing of the wave-functions. For each state, the determination of the fermionic or bosonic nature by Young-tableau symmetrization is followed by the calculation and output of the density for each given particle type, if several are present. Other properties of the many-body wave-function can also be calculated. For example, Octopus can also output the one-body density matrix, provided in terms of its occupation numbers and natural orbitals.
Studies of this type, even when they are limited to model systems of a few electrons, allow us to produce results that can be compared to lower levels of theory like approximate DFT or RDMFT, and to develop better approximations for the exchange and correlation term. Exact results obtained from such calculations have been used to assess the quality of a 1D LDA functional [161], and of adiabatic 1D LDA and exact exchange in a TDDFT calculation of photoemission spectra [161,178].

XIV. COMPRESSED SENSING AND ATOMISTIC SIMULATIONS
In order to obtain frequency-resolved quantities from real-time methods like molecular dynamics or electron dynamics, it is necessary to perform a spectral representation of the time-resolved signal. This is a standard operation that is usually performed using a discrete Fourier transform. Since the resolution of the spectrum is given by the length of the time signal, it is interesting to look for methods that can provide a spectrum of similar quality from shorter time series, as this is directly reflected in shorter computation times. Several such methods exist, but a particular one that has been explored in Octopus, due to its general applicability and efficiency, is compressed sensing.
Compressed sensing [179] is a general theory aimed at optimizing the amount of sampling required to reconstruct a signal. It is based on the idea of sparsity, a measure of how many zero coefficients a signal has when represented in a certain basis. Compressed sensing has been applied to many problems in experimental sciences [180][181][182] and technology [183,184] in order to perform more accurate measurements. Its ideas, however, can also be applied to computational work.
In order to calculate a spectrum in compressed sensing, we need to solve the so-called basis-pursuit optimization problem

$$\min_{\sigma} |\sigma|_1 \quad \text{subject to} \quad \mathbf{F}\sigma = \tau, \qquad (100)$$

where $|\sigma|_1 = \sum_k |\sigma_k|$ is the standard 1-norm, $\tau$ is the discretized time series, $\sigma$ is the frequency-resolved function (the spectrum that we want to calculate) and $\mathbf{F}$ is the Fourier-transform matrix.
Since $\tau$ is a short signal, its dimension is smaller than that of $\sigma$. This implies that the linear equation $\mathbf{F}\sigma = \tau$ is under-determined and has many solutions: in this particular case, all the spectra that are compatible with our short time propagation. From all of these possible solutions, eq. (100) takes the one that has the smallest 1-norm, which corresponds to the solution that has the most zero coefficients. For spectra, this means we are choosing the one with the fewest frequencies, which will tend to be the physical one, as in many cases we know that the spectrum is composed of a discrete number of frequencies.
To solve eq. (100) numerically, we have implemented in Octopus the SPGL1 algorithm [185]. The solution typically takes a few minutes, which is two orders of magnitude more expensive than the standard Fourier transform, but this is negligible in comparison with the cost of the time propagation.
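The flavor of such a sparse reconstruction can be conveyed with a much simpler solver. The sketch below does not use SPGL1 and does not solve the equality-constrained problem of eq. (100); as a stand-in, it applies iterative soft thresholding (ISTA) to the penalized form $\frac{1}{2}|\mathbf{F}\sigma - \tau|_2^2 + \lambda|\sigma|_1$, with a real cosine matrix playing the role of $\mathbf{F}$. The time step, frequency grid, test signal, value of $\lambda$, and function names are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |z|_1 (shrinks every entry towards zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(F, tau, sigma, lam):
    """Penalized least-squares objective minimized by ISTA."""
    return 0.5 * np.sum((F @ sigma - tau) ** 2) + lam * np.sum(np.abs(sigma))

def ista(F, tau, lam, n_iter=5000):
    """Iterative soft thresholding starting from the zero spectrum."""
    step = 1.0 / np.linalg.norm(F, 2) ** 2   # inverse Lipschitz constant
    sigma = np.zeros(F.shape[1])
    for _ in range(n_iter):
        grad = F.T @ (F @ sigma - tau)
        sigma = soft_threshold(sigma - step * grad, step * lam)
    return sigma

# Short time series (40 samples) and a denser frequency grid (241 points)
t = np.arange(40) * 0.25
omega = np.linspace(0.0, 6.0, 241)
F = np.cos(np.outer(t, omega))

# Synthetic signal made of two discrete frequencies
true_sigma = np.zeros(omega.size)
true_sigma[60], true_sigma[160] = 1.0, 0.5
tau = F @ true_sigma

lam = 1e-3 * np.max(np.abs(F.T @ tau))
sigma = ista(F, tau, lam)
print("nonzero coefficients:", np.count_nonzero(sigma), "of", sigma.size)
```

The linear system here has 40 equations for 241 unknowns, so it is under-determined in the sense discussed above, and the 1-norm penalty is what selects a spectrum with few active frequencies.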
By applying compressed sensing to the determination of absorption or vibrational spectra, it was found that a time signal a fifth of the length can be used in comparison with the standard Fourier transform [35]. This translates into an impressive factor-of-five reduction in the computational time. This is illustrated in Fig. 13, where we show a spectrum calculated with compressed sensing from a 10 fs propagation, which has a resolution similar to a Fourier-transform spectrum obtained with 50 fs of propagation time.
More generally, this work shows that the reduction in the number of samples that compressed sensing achieves in an experimental setup translates, when the method is applied to simulations, into a reduction of the computational time. This concept inspired studies on how to carry the ideas of compressed sensing into the core of electronic-structure simulations. The first result of this effort is a method to use compressed sensing to reconstruct sparse matrices, which has direct application in the calculation of the Hessian matrix and vibrational frequencies from linear response (as discussed in section III). For this case, our method results in the reduction of the computational time by a factor of three [186].

FIG. 13. Optical absorption spectrum of a methane molecule from real-time TDDFT. Comparison of the calculation using a Fourier transform and a propagation time of 50 fs (top, black curve) with compressed sensing and a propagation time of 10 fs (bottom, blue curve). Compressed sensing produces a similar resolution, with a propagation 5 times shorter.

XV. PARALLELIZATION, OPTIMIZATIONS AND GRAPHICS PROCESSING UNITS
Computational cost has been and still is a fundamental factor in the development of electronic-structure methods, as the small spatial dimensions and the fast movement of electrons severely limit the size of systems that can be simulated. In order to study systems of interest as realistically and accurately as possible, electronic-structure codes must execute efficiently on modern computational platforms. This implies support for massively parallel platforms and modern parallel processors, including graphics processing units (GPUs).
Octopus has been shown to perform efficiently on parallel supercomputers, scaling to hundreds of thousands of cores [35,187]. The code also has an implementation of GPU acceleration [35,188] that has been shown to be competitive in performance with Gaussian DFT running on GPUs [189].
Performance is not only important for established methods, but also for the implementation of new ideas. The simplicity of real-space grids allows us to provide Octopus developers with building blocks that they can use to produce highly efficient code without needing to know the details of the implementation, isolating them as much as possible from the optimization and parallelization requirements. In most cases, these building blocks allow developers to write code that is automatically parallel, efficient, and that can transparently run on GPUs. The types of operations available range from simple ones, like integration, linear algebra, and differential operators, to more sophisticated ones, like the application of a Hamiltonian or solvers for differential equations.
However, it is critical to expose an interface with the adequate level that hides the performance details, while still giving enough flexibility to the developers. For example, we have found that the traditional picture of a state as the basic object is not adequate for optimal performance, as it does not expose enough data parallelism [188]. In Octopus we use a higher-level interface where the basic object is a group of several states.
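The benefit of working with groups of states can be hinted at with a small array example. The sketch below is not the Octopus interface: it assumes a batch is stored as a two-dimensional array with one column per state, and applies a second-order finite-difference Laplacian to all columns in a single pass over the grid. The function name `laplacian_batch` and the test states are hypothetical.

```python
import numpy as np

def laplacian_batch(states, spacing):
    """Second-order finite-difference Laplacian with zero boundary values.

    states has shape (npoints, nstates); the stencil is applied to every
    column at once, so the inner loop over states is contiguous in memory.
    """
    out = -2.0 * states
    out[1:] += states[:-1]      # left neighbour
    out[:-1] += states[1:]      # right neighbour
    return out / spacing**2

# Interior points of a box of length L; the function vanishes at both walls
npoints, nstates, length = 200, 8, 10.0
x = np.linspace(0.0, length, npoints + 2)[1:-1]
h = x[1] - x[0]

# Batch of particle-in-a-box states, one per column
k = np.pi * np.arange(1, nstates + 1) / length
states = np.sin(np.outer(x, k))

lap = laplacian_batch(states, h)

# Sine functions are exact eigenvectors of the discrete operator
discrete_eigenvalues = (2.0 - 2.0 * np.cos(k * h)) / h**2
print("max deviation:", np.max(np.abs(lap + states * discrete_eigenvalues)))
```

Each stencil coefficient and neighbour index is loaded once and reused for every state in the batch, which is the kind of data parallelism a single-state interface cannot expose.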
In the case of functions represented on the grid, the developers work with a linear array that contains the values of the field for each grid point. Additional data structures provide information about the grid structure. This level of abstraction makes it simple for developers to write code that works for different problem dimensionality, and different kinds and shapes of grids.
In terms of performance, by hiding the structure of the grid, we can use sophisticated methods to control how the grid points are stored in memory, with the objective of using processor caches more efficiently in finite-difference operators. We have found that by using space-filling curves [190], as shown in Fig. 14, and in particular the Hilbert curve [191,192], we can produce a significant improvement in the performance of semi-local operations. For example, Fig. 15 shows that a performance gain of around 50% can be obtained for the finite-difference Laplacian operator running on a GPU by using a Hilbert curve to map the grid into memory.
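A Hilbert-curve ordering of a small 2D grid can be generated with the standard iterative index-to-coordinate construction. The sketch below is a generic illustration, not the mapping code used in Octopus; the function name `hilbert_d2xy` and the dictionary `storage_index` are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order square grid."""
    side = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate or flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

order = 3
side = 1 << order
points = [hilbert_d2xy(order, d) for d in range(side * side)]

# Position in the linear array at which grid point (x, y) would be stored
storage_index = {point: d for d, point in enumerate(points)}

# Distance between consecutive points along the curve
steps = [abs(ax - bx) + abs(ay - by)
         for (ax, ay), (bx, by) in zip(points, points[1:])]
print("first points:", points[:4], "max step:", max(steps))
```

Consecutive entries of the linear array are always nearest neighbours on the grid, which is the locality property that helps cache reuse in stencil operations.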
FIG. 14. Examples of different mappings from a 2D grid to a linear array: (a) standard map, (b) grid mapped by small parallelepipedic subgrids, and (c) mapping given by a Hilbert space-filling curve. These last two mappings provide a much better memory locality for semi-local operations than the standard approach.

FIG. 15. Numerical performance of the Octopus finite-difference Laplacian implementation using different grid mappings. Spherical grid with 500,000 points. Computations with an AMD Radeon 7970 GPU. A speed-up of around 50% is observed for the subgrid and Hilbert curve mappings.
Parallelization in Octopus is performed on different levels. The most basic one is domain decomposition, where the grid is divided into different regions that are assigned to each processor. For most operations, only the boundaries of the regions need to be communicated among processors. Since the grid can have a complicated shape dictated by the shape of the molecule, it is far from trivial to distribute the grid points among processors. For this task we use a third-party library called ParMETIS [193]. This library provides routines to partition the grid, ensuring a balance of points and minimizing the size of the boundary regions, and hence the communication costs. An example of grid partitioning is shown in Fig. 16.
Additional parallelization is provided by other data decomposition approaches that are combined with domain decomposition. This includes parallelization over KS states, and over k-points and spin. The latter parallelization strategy is quite efficient, since for each k-point or spin component the operations are independent. However, it is limited by the size of the system, and often is not available (as in the case of closed-shell molecules, for example).
The efficiency of the parallelization over KS states depends on the type of calculation being performed. For ground-state calculations, the orthogonalization and subspace diagonalization routines [194] require the communication of states. In Octopus this is handled by parallel dense linear-algebra operations provided by the ScaLAPACK library [195]. For real-time propagation, on the other hand, the orthogonalization is preserved by the propagation [34] and there is no need to communicate KS states between processors. This makes real-time TDDFT extremely efficient in massively parallel computers [35,196].
An operation that needs special care in parallel is the solution of the Poisson equation. Otherwise, it constitutes a bottleneck in parallelization, as a single Poisson solution is required independently of the number of states in the system. A considerable effort has been devoted to the problem of finding efficient parallel Poisson solvers that can keep up with the rest of the code [197]. We have found that the most efficient methods are based on FFTs, which require a different domain decomposition to perform efficiently. This introduces the additional problem of transferring the data between the two different data partitions. In Octopus this was overcome by creating a mapping at the initialization stage and using it during execution to efficiently communicate only the data that is strictly necessary between processors [187].

FIG. 16. Example of adaptive mesh partitioning for a molecule of chlorophyll a. Simplified mesh with a spacing of 0.5 Å and a radius of 2.5 Å. Each color represents a domain, which will be distributed to a set of processors for parallel computation.
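The core of an FFT-based Poisson solver fits in a few lines when the problem is fully periodic. The sketch below is a serial, single-domain illustration and not the parallel solver described above: it assumes a cubic periodic box, drops the zero-wavevector component (a neutralizing background), and ignores the special treatment that isolated systems require. The function name `hartree_potential_periodic` and the test density are hypothetical.

```python
import numpy as np

def hartree_potential_periodic(density, box_length):
    """Solve nabla^2 v = -4 pi n on a periodic cubic grid using FFTs."""
    n = density.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2

    rho_k = np.fft.fftn(density)
    v_k = np.zeros_like(rho_k)
    nonzero = k2 > 0.0                   # skip the k = 0 component
    v_k[nonzero] = 4.0 * np.pi * rho_k[nonzero] / k2[nonzero]
    return np.fft.ifftn(v_k).real

# Test density: a single plane wave along x, for which v is known analytically
npts, length = 16, 10.0
coords = np.arange(npts) * length / npts
xg, yg, zg = np.meshgrid(coords, coords, coords, indexing="ij")
k1 = 2.0 * np.pi / length
density = np.cos(k1 * xg)

potential = hartree_potential_periodic(density, length)
exact = 4.0 * np.pi / k1**2 * np.cos(k1 * xg)

# Coulomb integral of the density with itself, evaluated as a grid sum
coulomb_integral = np.sum(density * potential) * (length / npts) ** 3
print("max error:", np.max(np.abs(potential - exact)))
```

In a domain-decomposed code the forward and backward transforms are the steps that need a different data layout, which is the origin of the data-transfer problem mentioned above.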

XVI. CONCLUSIONS
In this article, we have shown several recent developments in the realm of electronic-structure theory that have been based on the Octopus real-space code and made possible in part by the flexibility and simplicity of working with real-space grids. Most of them go beyond a mere implementation of existing theory and represent new ideas in their respective areas. We expect that many of these approaches will become part of the standard tools of physicists, chemists and material scientists, and in the future will be integrated into other electronic-structure codes.
These advances also illustrate the variety of applications of real-space electronic structure, many of which go beyond the traditional calculation schemes used in electronic structure and may provide a way forward to tackle current and future challenges in the field.
What we have presented also shows some of the current challenges in real-space electronic structure. One example is the use of pseudo-potentials or other forms of projectors to represent the electron-ion interaction. Nonlocal potentials introduce additional complications in both the formulation, as shown by the case of magnetic response, and the implementation. Pseudo-potentials also introduce an additional and, in some cases, not well-controlled approximation. It would be interesting to study the possibility of developing an efficient method to perform full-potential calculations without additional computational cost, for example by using adaptive or radial grids.
Another challenge for real-space approaches is the cost of the calculation of two-body Coulomb integrals that appear in electron-hole linear response, RDMFT or hybrid xc functionals. In real space, these integrals are calculated in linear or quasi-linear time by considering them as a Poisson problem. However, the actual numerical cost can be quite large when compared with other operations. A fast approach to compute these integrals, perhaps by using an auxiliary basis, would certainly make the real-space approach more competitive for some applications.
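As a reminder of what treating these integrals as a Poisson problem means, the relation can be written as follows, in atomic units and with orbital labels chosen here only for illustration: the potential generated by the pair density of two orbitals is obtained from a Poisson solve, and the integral is then a single sum over grid points.

```latex
\[
(ij|kl) = \int \mathrm{d}\mathbf{r}\, \varphi_i^*(\mathbf{r})\, \varphi_j(\mathbf{r})\, v_{kl}(\mathbf{r}),
\qquad
\nabla^2 v_{kl}(\mathbf{r}) = -4\pi\, \varphi_k^*(\mathbf{r})\, \varphi_l(\mathbf{r}).
\]
```

One Poisson solution is therefore needed for each orbital pair (k, l), which is why the total cost can be large even though each solve scales linearly or quasi-linearly with the number of grid points.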
The scalability of real-space grid methods makes them a good candidate for electronic-structure simulations on the future exaflop supercomputing systems expected for the end of the decade. In this respect, the challenge is to develop high-performance implementations that can run efficiently on these machines.

XVII. ACKNOWLEDGMENTS
We would like to thank all the people who have contributed to Octopus and to the development and implementation of the applications presented in this article. In particular we would like to acknowledge Silvana Botti, Jacob Sanders, Johanna Fuks, Heiko Appel, Danilo Nitsche, and Daniele Varsano.
XA acknowledges that part of this work was performed under the auspices of the U.S. Department of Energy at Lawrence Livermore National Laboratory under Contract DE-AC52-07A27344. XA and AA-G would like to acknowledge the support received from Nvidia Corporation through the CUDA Center of Excellence program and from the US Defense Threat Reduction Agency under contract no. HDTRA1-10-1-0046. DAS acknowledges support from the U.S. National Science Foundation graduate research program and IGERT fellowships, and from ARPA-E under grant DE-AR0000180. MJTO acknowledges financial support from the Belgian FNRS through FRFC project number 2.4545.12 "Control of attosecond dynamics: applications to molecular reactivity". NH and IT received support from an Emmy Noether grant from Deutsche Forschungsgemeinschaft. JAR, AV, UDG, AHL and AR acknowledge finan-