Garnet Kin-Lic Chan*
Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, USA
First published on 11th July 2024
We describe the problems of quantum chemistry, the intuition behind classical heuristic methods used to solve them, a conjectured form of the classical complexity of quantum chemistry problems, and the subsequent opportunities for quantum advantage. This article is written for both quantum chemists and quantum information theorists. In particular, we attempt to summarize the domain of quantum chemistry problems as well as the chemical intuition that is applied to solve them within concrete statements (such as a classical heuristic cost conjecture) in the hope that this may stimulate future analysis.
This essay has four subsequent parts. First, we give a brief tour of quantum mechanics (QM) in chemical matter and review some applications of modern day interest. Second, we summarize a range of classical heuristics in quantum chemistry. Third, we give an intuitive perspective on how to think about complexity in quantum chemistry. Finally, we briefly discuss quantum algorithms for quantum chemistry and where to search for quantum advantage in the simulation of chemical matter.
A second source of confusion is that in quantum information discussions, chemistry usually means molecular chemistry, perhaps related to the popular view of chemistry as something to do with brightly colored solutions bubbling away in test-tubes. Yet in the chemical sciences, chemistry has long grown beyond only the molecular setting to encompass materials and biochemistry. In this broadened perspective, the defining characteristic of chemistry is the understanding of matter at the level of the chemical identity of the atoms, under the relatively accessible conditions of a chemical laboratory.
As quantum chemistry and quantum information come together, it is perhaps most interesting, and most useful, to use a broader definition of the term quantum chemistry, as the application of quantum mechanics to molecules, materials, and biomolecules under relatively accessible experimental conditions, which we collectively refer to here as chemical matter. In this essay, we will start with this broad definition of quantum chemistry. However, for reasons of scope and the expertise of the author, we will necessarily need to specialize and we focus only on the electronic structure aspects of quantum chemistry later on.
$H_{\mathrm{el}} = T_{\mathrm{el}} + V_{\mathrm{nuc-el}} + V_{\mathrm{el-el}} + V_{\mathrm{NN}}$   (1)
$H_{\mathrm{el}}(R)\,\Psi^{\mathrm{el}}_i(r;R) = E^{\mathrm{el}}_i\,\Psi^{\mathrm{el}}_i(r;R)$   (2)
Electronic structure is, however, not the only way in which quantum mechanics enters into chemistry. For example, the time-independent electronic Schrödinger equation has its time-dependent counterpart
$i\partial_t \Psi^{\mathrm{el}}(r;R) = H_{\mathrm{el}}\,\Psi^{\mathrm{el}}(r;R)$   (3)
In addition, the nuclei in the Born–Oppenheimer approximation may be thought of as particles moving in the effective classical potential that is the potential energy surface $E^{\mathrm{el}}_i(R)$. The stationary quantum solutions then satisfy the time-independent nuclear Schrödinger equation
$(T_{\mathrm{nuc}} + E^{\mathrm{el}}_i)\,\Phi^{\mathrm{nuc}}_{i,\alpha}(R) = E^{\mathrm{tot}}_{i,\alpha}\,\Phi^{\mathrm{nuc}}_{i,\alpha}(R)$   (4)
$i\partial_t \Phi^{\mathrm{nuc}}(R) = (T_{\mathrm{nuc}} + E^{\mathrm{el}}_i)\,\Phi^{\mathrm{nuc}}(R)$   (5)
As we discuss somewhat more technically in Section 3.7, the existence of locality in chemistry can reasonably be argued to be partly due to the laws of nature, and partly due to the lack of perfect quantum control over chemical matter, which limits the types of systems that are studied in chemical experiments. Whatever the origin, the locality principle limits the complexity of quantum effects: coherence and entanglement are limited to finite distances and finite times. The task of computational quantum simulations is to address quantum effects up to these finite scales.
The motivation for new methods in quantum chemistry naturally arises due to limitations in our capabilities in the above tasks. As all methods are approximate, the limitations are usually either of the kind that the approximation is not accurate enough, or the approximation is not cheap enough! These are clearly two sides of the same coin, but as many heuristics have a setting that one can adjust to balance cost versus accuracy, the first problem refers to the case where dialing up the accuracy leads to us exceeding our computational budget, while the second refers to the case where dialing down the accuracy still does not lead to a fast enough calculation (or else produces nonsense).
Discussing the methods and the limitations in more detail for all the above tasks exceeds the scope of this essay and the capabilities of this author. In the remainder, we focus primarily on the task of solving the electronic Schrödinger equation, the original domain of quantum chemistry.
Assuming that the protein is relatively rigid and the target site and general orientation (“pose”) of the drug are already known, computing the binding affinity, however, is still quite non-trivial. Simple approximate mean-field quantum chemistry methods – such as density functional theory (DFT, see Section 3.3) – in most cases yield sufficient accuracy in the electronic energy, and a density functional calculation of the energy can indeed today be carried out on small proteins and drugs.3 However, the challenge arises from the fact that the binding affinity is not a zero-temperature energy, but a free energy, that includes the entropic effects from the solvent and the protein. In actual binding events, the entropy and energy effects are often similar in magnitude with opposite sign, with the binding free energy being the small residual.4 The motion of the atoms can be assumed to be entirely classical, described by Newton’s equations, but in standard methodologies (such as free-energy perturbation theory which computes the relative affinities of ligands relative to a base ligand) one needs to simulate, even under optimistic assumptions, ∼100 ns of dynamics, or roughly 10^8 energies and forces. For this to complete in a realistic turnaround time (say 1 day of compute) each electronic energy and gradient must be computed in ∼1 millisecond.
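The throughput requirement quoted above follows from simple arithmetic; a minimal Python sketch (the 1 fs integration timestep and 1 day wall-clock budget are assumptions for concreteness, not values from the text):

```python
# Back-of-envelope throughput for free-energy sampling (assumed numbers).
ns_of_dynamics = 100                          # ~100 ns of molecular dynamics
timestep_fs = 1.0                             # assumed ~1 fs integration timestep
evaluations = ns_of_dynamics * 1e6 / timestep_fs   # 1 ns = 10^6 fs -> ~1e8 energy/force calls
wallclock_s = 24 * 3600                       # assumed 1 day turnaround

print(f"energy/force evaluations: {evaluations:.1e}")
print(f"time budget per evaluation: {wallclock_s / evaluations * 1e3:.2f} ms")
```

Running this gives ∼10^8 evaluations and a budget of just under 1 ms per energy and gradient, as stated above.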
Thus computing ligand affinities is a problem where standard QM simulations, as applied directly to the protein plus ligand complex, are simply not fast enough. Instead, practical solutions for free energy sampling involve empirical models trained to the QM data: these either abandon the QM description entirely for a parametrized “force-field” form of the potential energy surface (perhaps, today, enhanced with a general function approximator such as a neural network); or else simulate only a small part of the problem near the binding site quantum mechanically (and the rest with force fields); these methods can be combined with statistical mechanical “enhanced sampling” techniques to reduce the number of atomic configurations to be computed for the free energy. The goal of developing new quantum methods in this area (such as faster DFT algorithms or even cheaper mean-field methods) is to partially bridge this gap, to the point that the training techniques have sufficient data; the trained models can then be hoped to generalize accurately to the timescales of interest.
N₂ + 3H₂ → 2NH₃   (6)
N₂ + 8H⁺ + 8e⁻ + 16MgATP → 2NH₃ + H₂ + 16MgADP + 16Pᵢ   (7)
One oft-quoted statistic about the Haber–Bosch process is that it consumes an enormous amount of energy, e.g. 2 percent of the world’s energy supply. Efficient Haber–Bosch plants today use about 26 GJ per ton of ammonia.5 It is tempting to think that biological nitrogen fixation uses much less energy, but this is not simple to establish. For example, the above equation suggests 16 ATP molecules are consumed for every N2 molecule, translating under physiological ATP hydrolysis conditions (which can vary, leading to slightly different numbers6) to a nominal 24 GJ per ton of ammonia, a comparable energy cost. But this is not an end-to-end cost, e.g. it does not include the costs of synthesizing and maintaining the nitrogenase machinery (a significant component of the cellular proteome in nitrogen fixing cells) as well as general cellular function and homeostasis. In addition, there are other challenges besides energy consumption in industrializing a biological process, for example, the nitrogenase enzyme is extremely slow. The chemical motivation to study nitrogenase is thus less to produce an energy-efficient replacement of the Haber–Bosch process but rather because it is an interesting system in its own right, and perhaps it may motivate how to understand and design other catalysts that can activate and break the nitrogen–nitrogen triple bond under ambient conditions. In this context, some basic chemical questions to answer include (i) where do the various species in the reaction bind to the enzyme, and (ii) what is the step-by-step catalytic mechanism?
The focal point of the earliest quantum chemistry calculations has mainly been to determine the electronic structure of the Fe–S clusters that are the active sites of nitrogenase, which would seem to be the starting point to understand their catalytic activity. This problem is non-trivial because the Fe atoms contain 3d orbitals and the associated electrons are “strongly correlated” (discussed in detail in Section 3.6), posing a practical challenge for many common quantum chemistry approximations. At the current time, there is a reasonable understanding of the general qualitative features of the electronic structure of the iron–sulfur clusters from a combination of quantum chemistry calculations and experimental spectroscopy.7–9 However, the computational challenge of treating the strongly correlated electronic structure at a quantitative level means that the precise nature of the ground-state of the important FeMo cofactor (where nitrogen reduction takes place) is not fully resolved.
Computing the electronic ground state, although a useful reference point, is still of limited value in deciphering the mechanism. More directly useful quantities, while still computationally accessible, are the zero-temperature relative energetics, e.g. to determine proton binding sites and protonation states of the Fe–S clusters and their environments. This requires simulating not just the isolated Fe–S clusters but also the surrounding residues, a simulation scale of a few hundred atoms.10 The only practical quantum chemistry method for this task currently is density functional theory. Although it is well known that DFT can yield erroneous predictions in transition metal systems, one can still obtain useful information by looking for consistent results between different DFT approximations and constraining the results by experimental data. There is a great need for improved density functionals (or other methods) that can be applied at the scale of these problems; such improvements would directly impact our understanding here. Looking beyond zero-temperature energies to free energies, there are monumental challenges to overcome, similar to those discussed for ligand binding free energies, but now with the additional complication of the non-trivial electronic structure.
While many mysteries of high temperature superconductors remain unexplained, it is important to recognize that after almost four decades of work, there is substantial partial understanding from both theory and experiment. Limiting oneself only to ground-states and static properties, in the context of simple models such as the 1-band (Fermi)–Hubbard model, numerical calculations nowadays give a fairly comprehensive view of the potential ground-states, although in some parts of the phase diagram it is not entirely clear what the true ground-state order is due to close competition between different orders.11–13 We also note that none of the proposed ground-states are of some qualitatively “classically intractable variety” (i.e. the orders still have a succinct classical description, see Section 4.4); but the uncertainty arises because the precise refinement of their energies requires too large a cost. For the zero-temperature states, in the author’s view achieving a 10× improvement in accuracy over the current state-of-the-art would likely resolve most outstanding questions about this energetic competition.
In the context of material-specific predictions, by neglecting some of the physics it is also becoming possible to perform concrete calculations of the low-temperature properties of different material structures. Both parametrized Hamiltonians14,15 as well as fully multi-orbital ab initio approaches16,17 have been deployed for this task, and predictions of modest quality can be made. The techniques involved are extensions of the simulation methods used with simpler models, with a polynomial increase in computational complexity associated with the increased material realism. On the other hand, in practice one still needs to make choices about what physics to include and what to neglect. Much work remains to improve all aspects of these simulation methods in terms of speed, accuracy, and generality (e.g. to simultaneously include correlated electron, phonon, and finite temperature effects). We are still quite far from being able to make precise predictions about Tc in calculations that cover all microscopic mechanisms of potential interest.
The (non-relativistic) electronic Hamiltonian is written in atomic units as
$H_{\mathrm{el}} = -\frac{1}{2}\sum_i \nabla_i^2 \;-\; \sum_{i,A}\frac{Z_A}{|\mathbf r_i - \mathbf R_A|} \;+\; \sum_{i<j}\frac{1}{|\mathbf r_i - \mathbf r_j|} \;+\; \sum_{A<B}\frac{Z_A Z_B}{|\mathbf R_A - \mathbf R_B|}$   (8)

where lower-case indices label electrons and upper-case indices label nuclei.
As basic intuition, we first consider the energy scales of electrons in atoms. Atomic orbitals are defined as eigenstates of an (effective) 1-particle Hamiltonian (see mean-field methods below, Section 3.3), and we typically divide the orbitals into core, valence, Rydberg, and continuum states. The core states lie at an energy of O(Z²) (where Z is the nuclear charge); valence states at energies of O(1); Rydberg states at close to 0, and the continuum states at positive energies.
Chemistry mainly involves the valence orbitals. As atoms come together in a bond (in a single-particle mean-field description) the valence orbitals interact with each other and their energies split, and the splittings take a range of values, say O(0.01–1) (atomic units, i.e. Hartrees) in chemically relevant systems; we might denote this energy scale as t. The Coulomb matrix elements of the valence orbitals (on the same atom) are typically in the range of O(0.1–1) (atomic units); we can denote this U. Thus the effect of the Coulomb interactions on the mean-field picture of electronic structure can be understood in terms of the ratio of these energies U/t; for large U/t the Coulomb interaction is non-perturbative, or “strong”, while for small U/t it is a perturbation, or “weak”. We emphasize that in a material, it is the full range of splittings, not an individual energy level splitting such as the bandgap, which determines if the Coulomb interaction should be thought of as strong. This is because the number of electrons immediately around the bandgap (or Fermi surface) is vanishingly small compared to the total number of electrons and does not determine the overall electronic structure. The correct ratio is thus that of the Coulomb interaction to the bandwidth, which is O(U/t).
The implication of the weak and strong limits of the Coulomb interaction is that they lend themselves to different kinds of approximations. Note that strong does not necessarily mean qualitatively complicated, because non-perturbative effects may be captured by a modified starting point, such as a different mean-field theory. We return to this in Sections 3.3 and 4.4.
Fortunately, in the low-lying electronic states in chemical matter, the electrons are largely confined. This greatly simplifies basis construction because (i) we know where the electrons are spatially localized and (ii) we know the feature size of the electrons.
The most popular basis functions have historically been chosen to simplify the required Hamiltonian matrix elements in the basis. They include atomic basis sets (commonly Gaussian basis sets) and plane wave basis sets. In Gaussian bases, a linear combination of Gaussian functions is centered on each nucleus of the problem, i.e.
$\chi(\mathbf r) = (x - A_x)^i\,(y - A_y)^j\,(z - A_z)^k\, e^{-\alpha(\mathbf r - \mathbf A)^2}$   (9)

where $\mathbf A$ is the position of the nucleus on which the function is centered.
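The computational convenience of Gaussians comes from the Gaussian product theorem: the product of two Gaussians is another Gaussian, so matrix elements such as overlaps have closed forms. A toy numpy check (the exponents and centres below are arbitrary illustrative values, not taken from any standard basis set):

```python
import numpy as np

# Overlap of two (unnormalized) s-type Gaussians exp(-a|r-A|^2) and exp(-b|r-B|^2).
# The Gaussian product theorem gives the closed form
#   S = (pi/(a+b))^(3/2) * exp(-a*b/(a+b) * |A-B|^2).
a, b = 0.8, 1.3                        # illustrative exponents
A = np.array([0.0, 0.0, 0.0])
B = np.array([0.0, 0.0, 1.0])          # centres one bohr apart

analytic = (np.pi / (a + b)) ** 1.5 * np.exp(-a * b / (a + b) * np.sum((A - B) ** 2))

# Brute-force check by numerical integration on a real-space grid.
x = np.linspace(-6.0, 7.0, 131)
dx = x[1] - x[0]
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
g1 = np.exp(-a * ((X - A[0]) ** 2 + (Y - A[1]) ** 2 + (Z - A[2]) ** 2))
g2 = np.exp(-b * ((X - B[0]) ** 2 + (Y - B[1]) ** 2 + (Z - B[2]) ** 2))
numeric = np.sum(g1 * g2) * dx ** 3

print(analytic, numeric)               # the two agree to several digits
```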
Plane wave bases are constructed by placing the system of interest in a finite box of volume V (say cubic, for simplicity) with periodic boundary conditions. This suggests a natural basis function of the form
$\chi_{\mathbf G}(\mathbf r) = \frac{1}{\sqrt{V}}\, e^{i\mathbf G\cdot\mathbf r}$   (10)

where $\mathbf G$ is a reciprocal lattice vector compatible with the periodic box.
Regardless of the choice of single-particle basis, it is the sharpest features of the function to be represented that dominate the asymptotic rate of convergence. If the function is not infinitely differentiable, then in general the error of a basis expansion will converge algebraically with the number of basis functions M. In the eigenstates of the Hamiltonian, non-differentiable features arise when the Hamiltonian is singular. For mean-field theories, such as Hartree–Fock theory and density functional theory (Section 3.3) where the goal is to determine the single particle orbitals ϕ(r), the sharp features are the orbital cusps at the nuclei, which are not infinitely differentiable. Plane wave expansions then converge algebraically, but because in a Gaussian basis the functions can be centered on the nucleus, and the exponents can be chosen in a non-uniform manner, superalgebraic convergence can be achieved.
In the many-body wavefunction however, there is another cusp in the inter-electronic coordinate due to the electron–electron Coulomb singularity at coincidence. The presence of this cusp leads to a convergence with respect to basis size M that is ∼O(1/M) regardless of the choice of single-particle basis function above (for a longer discussion that gathers some of the known results, see Appendix E of ref. 18). This slow rate of convergence is known as the basis set problem, and remedies to this problem, in the form of introducing analytic functions with an explicit dependence on the inter-electron coordinate, are known as explicit correlation methods.19
Because electrons are identical particles, it is convenient to rewrite the Hamiltonian in terms of second quantization. Once we have chosen a basis (and assuming it has been orthogonalized, e.g. in the case of a Gaussian basis), the Hamiltonian becomes
$H = \sum_{ij} h_{ij}\, a^{\dagger}_i a_j + \frac{1}{2}\sum_{ijkl} v_{ijkl}\, a^{\dagger}_i a^{\dagger}_j a_l a_k$   (11)

with the one- and two-electron matrix elements

$h_{ij} = \int \mathrm d\mathbf r\; \phi^*_i(\mathbf r)\Big(-\tfrac{1}{2}\nabla^2 - \sum_A \frac{Z_A}{|\mathbf r - \mathbf R_A|}\Big)\phi_j(\mathbf r)$   (12)

$v_{ijkl} = \int \mathrm d\mathbf r_1\, \mathrm d\mathbf r_2\; \frac{\phi^*_i(\mathbf r_1)\,\phi^*_j(\mathbf r_2)\,\phi_k(\mathbf r_1)\,\phi_l(\mathbf r_2)}{|\mathbf r_1 - \mathbf r_2|}$   (13)
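To make the second-quantized form concrete, the sketch below builds the Hamiltonian matrix of a two-site Hubbard model (a minimal caricature of eqn (11), with a single hopping integral t and on-site repulsion U standing in for the full set of integrals, echoing the U/t discussion above) in the Sz = 0 two-electron basis and diagonalizes it with numpy; the values of t and U are arbitrary illustrative choices.

```python
import numpy as np

t, U = 1.0, 4.0   # hopping and on-site repulsion (illustrative; vary U/t to go from weak to strong)

# Two sites, one up and one down electron. Basis states:
# |0> = (up1, down1)   |1> = (up1, down2)   |2> = (up2, down1)   |3> = (up2, down2)
# Diagonal: U for doubly occupied sites; off-diagonal: -t hopping of a single electron.
H = np.array([
    [U,  -t,  -t,  0.0],
    [-t, 0.0, 0.0, -t ],
    [-t, 0.0, 0.0, -t ],
    [0.0, -t,  -t,  U ],
])

e = np.linalg.eigvalsh(H)
exact = 0.5 * (U - np.sqrt(U**2 + 16 * t**2))   # known closed-form ground-state energy
print(e[0], exact)                              # both ~ -0.828 for t = 1, U = 4
```

For small U/t the ground state is dominated by the delocalized (mean-field-like) configurations, while for large U/t the doubly occupied configurations are pushed up in energy, giving the strongly correlated limit.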
Conceptually, the simplest mean-field theory is Hartree–Fock theory. Here, the molecular orbitals are determined by variationally minimizing the energy E = 〈Φ|H|Φ〉/〈Φ|Φ〉, which leads to a non-linear eigenvalue problem, the Hartree–Fock equations. In an orthogonal basis this takes the form
$F(C)\,C = C\varepsilon$   (14)

where $C$ is the matrix of molecular orbital coefficients, $\varepsilon$ the diagonal matrix of orbital energies, and the Fock matrix $F$ depends on $C$ through the occupied orbitals.
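A minimal self-consistent field loop for eqn (14) is sketched below, using PySCF (assumed to be installed) only to supply the integrals for a small, arbitrarily chosen molecule and basis. In a non-orthogonal Gaussian basis the generalized problem FC = SCε is solved instead; production codes also add convergence acceleration such as DIIS, which is omitted here.

```python
import numpy as np
from scipy.linalg import eigh
from pyscf import gto

mol = gto.M(atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587", basis="sto-3g")
S = mol.intor("int1e_ovlp")                                # overlap matrix
hcore = mol.intor("int1e_kin") + mol.intor("int1e_nuc")    # one-electron Hamiltonian
eri = mol.intor("int2e")                                   # two-electron integrals (pq|rs)
nocc = mol.nelectron // 2                                  # closed shell assumed

D = np.zeros_like(S)                                       # start from the core-Hamiltonian guess
E_old = 0.0
for it in range(50):
    J = np.einsum("pqrs,rs->pq", eri, D)                   # Coulomb matrix
    K = np.einsum("prqs,rs->pq", eri, D)                   # exchange matrix
    F = hcore + 2 * J - K                                  # closed-shell Fock matrix
    E = np.einsum("pq,pq->", D, hcore + F) + mol.energy_nuc()
    if abs(E - E_old) < 1e-8:
        break
    E_old = E
    eps, C = eigh(F, S)                                    # generalized eigenvalue problem FC = SCe
    Cocc = C[:, :nocc]
    D = Cocc @ Cocc.T                                      # new density matrix

print("RHF energy (Hartree):", E)
```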
Another common mean-field theory, closely related to Hartree–Fock theory, is Kohn–Sham density functional theory. Because all information in a Slater determinant (other than its absolute phase) is contained in the single-particle density matrix γ, the Hartree–Fock approximation to the energy is a functional of the single-particle density matrix; E ≡ E[γ]. Density functional theory takes this one step further and writes the ground-state energy as a functional of the single-particle real-space density, $\rho(\mathbf r) = \sum_{ij}\gamma_{ij}\,\phi_i(\mathbf r)\phi_j(\mathbf r)$; E ≡ E[ρ]. Intuitively, this functional dependence is achieved because it is valid only for the ground-state, and because the set of electronic Hamiltonians can be labelled by the single-particle potentials $v_{\mathrm{ext}}(\mathbf r)$, a label of the same dimensionality as ρ(r).
In the Kohn–Sham version of density functional theory, which is the most widely applied form (and what people mean by DFT without further qualification) this energy is broken into several components
$E[\rho] = T_s[\rho] + V_{\mathrm{ne}}[\rho] + J[\rho] + E_{\mathrm{xc}}[\rho]$   (15)
The cost of Hartree–Fock and DFT calculations can be summarized as the cost to evaluate the energy in a basis (which is related to the cost of computing the Hamiltonian matrix elements) together with the cost to minimize (to a local minimum or stationary point) the Hartree–Fock or DFT energy. The most common way to find a minimizing solution is to solve the Hartree–Fock or Kohn–Sham non-linear eigenvalue problem. Depending on the basis representation, building the Fock matrix can be done in O(N³)–O(N⁴) time, while solving the Fock equations can be performed as a set of self-consistent diagonalizations, each of which takes O(N³) time. The number of iterations is unknown and this procedure only guarantees a stationary point of the energy; formally finding the global minimum, being a non-convex optimization, is computationally NP hard.
The formal NP hardness of mean-field optimization, however, provides an important example of the difference in intuition offered by worst case complexity and the practical experience of quantum chemists. In practice, we have a lot of information about the kinds of solutions we are looking for. For example, the principle of locality (see Section 3.7) suggests that it is useful to assemble a guess of the Hartree–Fock solution from solutions of atomic Hartree–Fock calculations, and starting from such guesses often provides quick convergence to a local minimum. In large systems, locality also reduces the cost in other ways, due to the sparsity of the Hamiltonian in a local basis. And of course, one can argue that large physical systems cannot reach their global minimum anyways, if such a minimum is hard to find.
The first is that often in real chemical applications, even if one is interested in only the zero-temperature energy differences, multiple electronic structure calculations must be performed to optimize the nuclear positions and explore the potential energy surface (see e.g. Section 2.7). Mean-field methods therefore enter as an initial, if imperfect, guide to this landscape.
The second reason is arguably deeper, and is related to the existence of different types of mean-field solutions. The non-convexity of the Hartree–Fock (or approximate Kohn–Sham) energy optimization means that multiple low-energy local minima can exist. As every Slater determinant is a type of classical state, in the sense that it has no entanglement in the molecular orbital basis, this means that there are many potential “classical” approximations to the quantum ground-state. In general, the local minimizing Slater determinants need not transform as an irreducible representation of the symmetry group of H. When they do not, we say the mean-field solution is a broken symmetry solution. But in the thermodynamic limit, the true ground-state may indeed also break symmetry, a phenomenon known as spontaneous symmetry breaking. (In fact, symmetry breaking characterizes many of the common electronic phases of matter, such as ferromagnets, antiferromagnets, superconductors and others).
Consequently, although a given mean-field calculation may not provide a reasonable description of the system of interest, the set of low-energy mean-field states may still contain an appropriate qualitative classical description among them. Selecting the appropriate mean-field reference recovers a simple and intuitive picture of the eigenstate that may be the starting point for more advanced treatments. We return to this below.
Weak correlations are sometimes referred to as “dynamic” correlations. On the atomic scale, they arise as fluctuations of occupancy between orbitals that are widely separated in energy (such as the valence and the continuum). On longer scales, they also include fluctuations involving orbitals which are spatially widely separated (and thus weakly coupled).
Although individually such fluctuations are small they can add up to a large effect, for example, by integrating over the continuum of orbitals to excite, or over a large domain in space. In practice, dynamic correlations are thus always quantitatively important. A traditional rule of thumb is that dynamic correlation introduces about 1 eV of correlation energy per pair of electrons in an occupied orbital.21 Excitations to high energy continuum states can modify the wavefunction on small length scales and thus help to capture features near the singularity of the electron–electron cusp in the wavefunction. Because of the need to describe excitations to the continuum, dynamic correlation requires large basis sets to capture faithfully.
The first is that the degeneracy of valence electron configurations (on the scale of the fluctuations) may only involve a few orbitals. This is encountered in chemical processes such as bond breaking, where the stretching of a given bond causes the valence orbitals associated only with that bond to become strongly correlated, or in a molecular photoexcitation, where only specific orbitals are involved in the excitation process, or in a finite magnetically coupled molecule with a finite number of coupled spins. Since only a small number of orbitals are involved in any given bond, excitation, or set of spin couplings, this limits the possible degeneracy.
The second simplification occurs when there is a large number of nearly degenerate valence orbitals in a material, as occurs for example, when there are d or f orbital bands. One might worry that one needs an asymptotically exponentially large number of configurations to mix, but this is a scenario where it is useful to consider the entire set of low-energy mean-field solutions, including the broken symmetry solutions. If there is only a small (i.e. non-exponentially growing) number of low-energy mean-field broken-symmetry solutions, then for many problems one can assume that the true ground-state in fact is described by one of them, spontaneously breaking symmetry. For example, imagine that for a lattice translationally invariant (crystal) Hamiltonian, one finds a set of translational symmetry breaking mean-field states at low energy. As the system size increases, if these states differ by more and more fluctuations because they correspond to spatial electronic patterns which are globally displaced with respect to each other, then they will not couple via a finite power of the Hamiltonian in the thermodynamic limit, and one can choose any one of them as the starting point to describe the eigenstate. (Should one wish to restore the translational invariance, one can apply the translation operator to the symmetry broken state, generating a linear, i.e. not exponentially large, number of additional configurations). There is an intrinsic length-scale introduced by the symmetry breaking mechanism: once the system size is sufficiently large that the coupling between different broken-symmetry solutions is smaller than the energy accuracy we care about (or our control over the energy of the state in experiment), typically O(kT), then we might as well choose to describe the system with a symmetry broken mean-field state. Then once the appropriate mean-field state is chosen, one can view the remaining correlation as weak correlation.
A third simplification applies even where there is an exponential degeneracy (or near-degeneracy) in the (broken symmetry) mean-field configurations. While this type of exponential degeneracy is not much encountered in systems that are currently of most interest in chemistry, we discuss it here because it does arise in real chemical matter, particularly in the context of magnetism, where it is referred to as frustration. The main simplification that appears here is that even when there is frustration, correlations in chemical matter remain quite local. This makes it possible to represent the low-energy eigenstates with techniques that forgo any mean-field starting point but build in locality, for example the tensor networks discussed in Section 3.9.
The vast majority of problems studied in chemical matter are gapped, even if only because they have a finite size. But it is important to note that even in critical systems, the correlations still decay at long distances, and thus are not as non-local as they could potentially be.
Another way to characterize quantum correlations is via the (bipartite) entanglement entropy S of the state. This counts (the logarithm of) the number of quantum degrees of freedom that are entangled across a spatial dividing surface (the “boundary”). In terms of entanglement entropy, locality is associated with an area law of entanglement entropy, S ∼ O(L), where L is the area of the dividing surface (i.e. only degrees of freedom near a surface can be entangled with the degrees of freedom on the other side). Critical systems show a small correction to the area law, S ∼ O(L log L).24
The main intuition for locality is simple, namely it stems from the (electronic) Hamiltonian operator, which, in a local basis, is quite local. For example, taking the second quantized expression in eqn (11), the one-electron matrix elements in an exponentially localized basis decay exponentially with the separation between basis functions i and j. The two-electron matrix elements that derive from the Coulomb interaction decay more slowly, like 1/r₁₂, but in electrically neutral matter, the presence of the nuclear charges on average eliminates the longest range effects. We are then interested in the low-energy eigenstates of the Hamiltonian with this local interaction. Because the ground-state can be obtained as a function of the Hamiltonian (think of the projector onto the ground-state, e−βH as β → ∞), or so the intuition goes, the low-energy eigenstates inherit this locality in their correlations.
But the above argument is not rigorously right and proving locality of ground-states for different classes of Hamiltonians without additional strong assumptions is notoriously difficult. To see where the difficulty might arise, consider the ground-state as limβ→∞e−βH|Φ〉 (where |Φ〉 is some arbitrary state). β is here imaginary time, which is naively analogous to an inverse temperature, so we might think that the “temperature” need only be significantly below the gap to the first excited state. But for a system of size L, even if we assume that the Hamiltonian has a constant gap between the ground and first excited state (i.e. independent of L) then one needs β ∼ O(L) to avoid the orthogonality catastrophe, as the overlap of |Φ〉 with the true ground-state |Ψ0〉 is e−O(L). Then, because the Hamiltonian is being applied O(L) times (e.g. in a polynomial approximation to e−βH), these repeated applications could in principle generate longer and longer correlations.
Regardless of this formal difficulty, chemical intuition is that for most problems we can view the ground-state of a chemical system as indistinguishable from that of a system at a low finite temperature (and, of course, all experiments are performed at a finite temperature). More precisely, this means we really want to model the canonical density operator Γ = e−βH at some large but system-size independent β (i.e. a fixed low temperature), and the empirical finding is that this will not yield very different properties from the ground-state. The use of a fixed, rather than system-size dependent, β is consistent with locality, but the statement that e−βH and the ground-state yield similar predictions for observables is only true if there is not some very large (exponential in the thermodynamic limit) number of excited states lying just above the ground-state (which would otherwise dominate due to entropy). (For formal connections between the density of states and area laws, see e.g. ref. 25.) This is violated in some frustrated quantum problems, such as quantum spin glasses, but these are not the typical systems encountered in chemical matter.
Ultimately, we can make a case that locality is prevalent in chemical matter for the combination of reasons that the electronic Hamiltonian is naturally local, and because of a lack of precise control: we do not actually simulate the ground-state, nor do we commonly control chemical experiments and matter sufficiently well, to model Hamiltonians where the low-temperature thermal state and ground-state yield different predictions. Precise quantum control of atoms and molecules is improving in the context of quantum computing. These intuitions about chemical matter may thus need to change in the future if and when the nature of chemical experiments also changes.
Finally, we observe that the locality of quantum correlations does not itself mean that solving the electronic structure problem to find the ground state can immediately be restricted to the cutoff length associated with the correlations alone. This is because even if there are no quantum correlations at all, there can still be non-trivial classical correlations over long distances.
As a simple example, we can consider a classical Ising model with antiferromagnetic interactions on a geometrically frustrated lattice (e.g. one with triangles). Then the ground-state is a classical state where the connected correlation functions all vanish, i.e. it is some arrangement of up and down spins, that is simply a product state. But we cannot find this product state just by satisfying antiferromagnetic constraints around each triangle (since they cannot be simultaneously satisfied, i.e. it is frustrated). Instead, finding the ground-state requires looking over longer length scales to minimize the frustration. In the worst case, encountered in classical spin glasses, one needs to consider the exponential space of configurations.
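The frustrated-triangle example can be made completely explicit by brute-force enumeration; a toy Python sketch (a single triangle with unit antiferromagnetic couplings, an illustrative choice):

```python
import itertools

# Antiferromagnetic Ising model on one triangle: E = J * sum over bonds of s_i s_j, J > 0.
J = 1.0
bonds = [(0, 1), (1, 2), (0, 2)]

results = []
for spins in itertools.product([-1, +1], repeat=3):
    E = sum(J * spins[i] * spins[j] for i, j in bonds)
    results.append((E, spins))

E_min = min(E for E, _ in results)
ground = [s for E, s in results if E == E_min]
print("ground-state energy:", E_min)         # -1: one of the three bonds is always frustrated
print("degenerate ground states:", ground)   # 6 of the 8 configurations are ground states
```

No configuration satisfies all three antiferromagnetic bonds at once, so the ground state is found only by balancing the frustration globally, not by satisfying each bond locally.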
Regardless, in many kinds of chemical matter, finding the ground-state locally in some finite subregion (e.g. measuring observables by simulating a portion of a molecule) does yield results compatible with simulating the ground-state problem as a whole. We can refer to this deeper manifestation of locality as strong locality. In the case of strong locality, the need to model quantum effects is entirely limited to the subregion itself.
• do they capture weak (dynamic) correlation, strong (multireference) correlation, or both,
• are they systematically improvable,
• is the energy extensive (i.e. does one obtain meaningful predictions for a large number of atoms or in the thermodynamic limit of materials),
• what is the cost scaling as a function of system size and as a function of accuracy,
• what is the absolute cost (i.e. what are the computational prefactors?),
• is the method applicable to lattice models only or can it be used with realistic Hamiltonians?
$|\Psi\rangle = \sum_I c_I\, |\Phi_I\rangle$   (16)

where the $|\Phi_I\rangle$ are Slater determinants (electron configurations) and the $c_I$ are linear variational coefficients.
The characteristics of configuration interaction methods are listed in Table 1. Today, configuration interaction is commonly used in molecular applications to describe strong correlation in valence orbitals (often an “active” space, i.e. all valence configurations), but it has also seen a resurgence of interest in the form of selected configuration interaction, where the |Φ_I⟩ are chosen in a problem-specific manner to sparsely span the Hilbert space, and the number of configurations (in a small molecule) is treated as a convergence parameter, to converge to the exact result.27,28 The lack of extensivity limits the application of configuration interaction to small problems.
Table 1 Characteristics of configuration interaction
| Weak and strong | Both |
| Systematically improvable | Yes |
| Extensive energy | Only with exponential cost |
| Cost scaling | Poly. to exp. depending on truncation |
| Prefactors | Very low |
| Lattice models and ab initio | Both |
$H(\lambda) = H_0 + \lambda V, \qquad E(\lambda) = E^{(0)} + \lambda E^{(1)} + \lambda^2 E^{(2)} + \cdots$   (17)
Perturbation theory has a number of nice properties as illustrated in Table 2. For example, it leads to an extensive energy (because E(λ) is extensive, so its derivatives are extensive). As the sum of the zeroth- and first-order energies is just the Hartree–Fock energy, the lowest non-trivial order is the second-order correction to the energy, conventionally denoted MP2 (Møller–Plesset second-order perturbation theory). This is a fast and often qualitatively accurate correction to the Hartree–Fock results in systems where the Hartree–Fock energy gap is large (as is often true in small molecules).29 Applying perturbation theory to higher order rapidly becomes expensive, and need not lead to a better result because the perturbation series usually diverges.30 Higher order perturbation theory contributions therefore usually come from one or more iterative techniques to resum certain classes of connected diagrams: these methods yield partial contributions to each order of perturbation theory up to infinite order.31
Table 2 Characteristics of perturbation theory
| Weak and strong | Weak (unless used with other methods) |
| Systematically improvable | Yes (if converges) |
| Extensive energy | Yes |
| Cost scaling | Poly. |
| Prefactors | Very low |
| Lattice models and ab initio | Both |
$|\Psi\rangle = e^{T}\,|\Phi_0\rangle$   (18)

$E = \langle \Phi_0|\, e^{-T} H e^{T} \,|\Phi_0\rangle, \qquad 0 = \langle \Phi_\mu|\, e^{-T} H e^{T}\, |\Phi_0\rangle$   (19)

$T = T_1 + T_2 + T_3 + \cdots$   (20)

where $T_n$ generates n-fold excitations out of the reference determinant $|\Phi_0\rangle$ and $|\Phi_\mu\rangle$ denotes an excited determinant.
Coupled cluster theory, for some low-order truncation of T (usually to the first two terms in eqn (20), denoted singles and doubles, and with an approximate correction for the next term, the triples, known as CCSD(T)), is the most widely used many-electron wavefunction method in quantum chemistry.32 This is because it has good formal and practical properties and provides a good balance between cost and accuracy (see Table 3); for example, when truncated to the singles and doubles level, the coupled cluster ansatz is still exact for any problem that breaks down into sets of independent two-electron problems. In practice, for problems that are not too strongly correlated, CCSD(T) gives results for ground-state energy differences approaching “chemical accuracy” and, even for more strongly correlated problems, assuming an appropriate Slater determinant starting point can be found (see Section 3.6), it provides a reasonably accurate treatment of the quantum fluctuations around such a determinant. The main drawbacks of coupled cluster are that its cost grows exponentially with the truncation order, and thus it can become impractically expensive when one does not have a good single Slater determinant starting point. (It can also fail in more significant ways that have not been fully analyzed, such as having no real-valued solutions at a given truncation order,33 and the solution conditions can be hard to converge.) These weaknesses manifest even if only a small number of Slater determinants (perhaps as few as two) are required in the multireference treatment, if the participating determinants are separated by a large number of excitations. As a practical method, therefore, it does not extend to the full set of problems covered by the simplifications discussed in the context of strong correlation in Section 3.6.
Table 3 Characteristics of coupled cluster theory
| Weak and strong | Weak to medium |
| Systematically improvable | Yes |
| Extensive energy | Yes |
| Cost scaling | Poly. |
| Prefactors | Low |
| Lattice models and ab initio | Both |
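For small molecules, the mean-field, perturbative, and coupled cluster hierarchy described above can be exercised directly with standard packages. A minimal sketch using PySCF (assumed to be installed; the molecule, geometry, and basis are arbitrary illustrative choices):

```python
from pyscf import gto, scf, mp, cc

# N2 near its equilibrium bond length in a modest basis (illustrative only).
mol = gto.M(atom="N 0 0 0; N 0 0 1.10", basis="cc-pvdz", unit="Angstrom")

mf = scf.RHF(mol).run()            # Hartree-Fock mean-field reference
pt2 = mp.MP2(mf).run()             # second-order perturbation theory (MP2)
ccsd = cc.CCSD(mf).run()           # coupled cluster singles and doubles
e_t = ccsd.ccsd_t()                # perturbative triples correction, giving CCSD(T)

print("RHF     ", mf.e_tot)
print("MP2     ", pt2.e_tot)
print("CCSD    ", ccsd.e_tot)
print("CCSD(T) ", ccsd.e_tot + e_t)
```

Near equilibrium the corrections shrink rapidly along this sequence, which is the practical signature of weak (dynamic) correlation around a good single-determinant reference.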
$|\Psi\rangle = \sum_{\{n\}} \Psi^{n_1 n_2 \cdots n_L}\, |n_1 n_2 \cdots n_L\rangle$   (21)

$|\Psi\rangle = \sum_{\{n\}} \mathbf A^{n_1}\mathbf A^{n_2}\cdots\mathbf A^{n_L}\, |n_1 n_2 \cdots n_L\rangle$   (22)

where $n_i$ is the occupancy of orbital i and each $\mathbf A^{n_i}$ is a matrix of dimension at most D × D, the bond dimension.
The structure of the MPS is quite different from that of the approximations so far discussed, which are based on limiting the complexity of excitations relative to a given Slater determinant |Φ0〉, such as the Hartree–Fock reference.35 Assuming the occupancy basis in eqn (22) is the Hartree–Fock orbital basis, the Hartree–Fock reference enters as just one of the possible occupancies, on much the same footing as the others. On the other hand, the flexibility of the amplitudes is severely limited as they must satisfy the near-product structure, and there is an inherent one-dimensional nature to the ansatz: more correlations can be captured between orbitals with indices i, j if |i − j| is small. In other words, MPS provide a parametrization of quantum states that efficiently enforce the correct structure of locality in one dimension. In model systems, it can be proved that the ground-states of gapped local Hamiltonians in 1D have an efficient representation as a matrix product state.36 This means that the bond-dimension needed to represent the state to some accuracy is independent of system size.
In practice, however, MPS and the density matrix renormalization group (DMRG) algorithm that provides a practical way to variationally optimize MPS37,38 have found applications significantly outside of pure 1D model Hamiltonians.39 The ansatz is always exact as D is increased, and this, coupled with the efficient formulation in terms of matrix multiplications, and the lack of bias towards a Hartree–Fock (or other mean-field) reference, means that it is a good replacement for full configuration interaction/exact diagonalization for problem sizes that are too large for such methods and when there is not another simpler alternative ansatz (Table 4). In quantum chemistry, it was one of the first widely used techniques for molecules with a large number of strongly correlated electrons,40–43 and remains an important technique for challenging problems today.44,45
Table 4 Characteristics of matrix product states/DMRG and tensor networks
| Weak and strong | Both; less efficient than other methods for weak |
| Systematically improvable | Yes |
| Extensive energy | Yes |
| Cost scaling | Poly. for fixed D |
| Prefactors | High (DMRG), very high (TN) |
| Lattice models and ab initio | Both for MPS/DMRG, lattice for TN |
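The compression at the heart of an MPS can be illustrated with nothing more than repeated SVDs on a state vector. In the toy numpy sketch below (the bond-dimension cap D = 16, the system size, and the particular states are arbitrary choices), a generic random state of L qubits needs the full bond dimension and is truncated lossily, while a product state compresses exactly to bond dimension 1.

```python
import numpy as np

def to_mps(psi, L, d=2, D=16, tol=1e-10):
    """Convert a state vector (length d**L) to MPS tensors by sequential SVD,
    truncating each bond to at most D. Returns the tensors and the bond dimensions."""
    tensors, chis = [], []
    chi = 1
    rest = psi.reshape(chi, -1)
    for _ in range(L - 1):
        rest = rest.reshape(chi * d, -1)
        U, s, Vh = np.linalg.svd(rest, full_matrices=False)
        keep = min(D, int(np.sum(s > tol)))        # discard tiny singular values, cap at D
        tensors.append(U[:, :keep].reshape(chi, d, keep))
        rest = s[:keep, None] * Vh[:keep, :]
        chi = keep
        chis.append(chi)
    tensors.append(rest.reshape(chi, d, 1))
    return tensors, chis

L = 10
rng = np.random.default_rng(0)

# A random state: generically volume-law entangled, so truncation at D = 16 is lossy.
psi_random = rng.normal(size=2**L)
psi_random /= np.linalg.norm(psi_random)

# A product (unentangled) state: exactly representable with bond dimension 1.
single = np.array([1.0, 1.0]) / np.sqrt(2)
psi_product = single
for _ in range(L - 1):
    psi_product = np.kron(psi_product, single)

print("bond dims, random state: ", to_mps(psi_random, L)[1])
print("bond dims, product state:", to_mps(psi_product, L)[1])
```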
Tensor networks refer to the natural generalization of the matrix product state beyond the 1D entanglement structure.46 Different kinds of tensor networks exist and some, such as projected entangled pair states, are now widely used with model Hamiltonians in two dimensions where they are a powerful alternative to MPS/DMRG. However, the algorithms that have been developed to work with them are currently only practical for model Hamiltonians; the overhead of more complicated long-range interactions is a formidable challenge.47 Extending these techniques to ab initio Hamiltonians remains an open scientific problem.
In variational Monte Carlo, the energy of a trial state Ψ is evaluated stochastically as

$E = \frac{\langle\Psi|H|\Psi\rangle}{\langle\Psi|\Psi\rangle} = \frac{\sum_n |\Psi(n)|^2\, E_{\mathrm L}(n)}{\sum_n |\Psi(n)|^2}, \qquad E_{\mathrm L}(n) = \frac{\langle n|H|\Psi\rangle}{\Psi(n)}$   (23)

by sampling configurations n with probability proportional to |Ψ(n)|² and averaging the local energy E_L(n).
Table 5 Characteristics of quantum Monte Carlo methods
| Weak and strong | Both, may require special trial |
| Systematically improvable | Yes (if trial is) |
| Extensive energy | Yes (if trial is) |
| Cost scaling | Poly. (if trial is) |
| Prefactors | High to very high |
| Lattice models and ab initio | Both |
The advantage of variational Monte Carlo is that very general functional forms can be used: it is often easier to find functions for which Ψ(n) can be efficiently evaluated, than to find ones for which 〈Ψ|Ψ〉 can be evaluated. (The first is analogous to evaluating a high-dimensional function value, while the second is analogous to evaluating a high-dimensional integral.) In recent years, the rapid development of neural networks as general function approximators has led to the use of neural network parametrizations of Ψ(n), which have been termed neural quantum states.49–52 Note that the amplitudes can be written in the antisymmetric space of Slater determinants, or simply in the space of product functions: in the latter case, care must be taken to ensure that Ψ(n) is antisymmetric with respect to particle interchange.
Usually, evaluating a sample Ψ(n) in variational Monte Carlo scales better than the cost to deterministically evaluate the energy using another approximate method. Common variants exhibit the same per-sample scaling as mean-field theory. On the other hand, one needs a sufficient number of samples to control the stochastic error, and in large systems with many parameters, one needs to optimize them in the presence of the stochastic error. Since stochastic errors in different systems do not generally cancel, it is often necessary to compute energies to a fixed precision, rather than a fixed precision per particle. For a system of size L, this adds an additional L² scaling to the computational cost.
One special aspect of quantum Monte Carlo methods, including variational Monte Carlo, is that they are the only methods discussed so far that can work in a continuous space (e.g. by specifying an N-particle basis state by 3N continuous numbers, the list of positions of the electrons).53 Thus quantum Monte Carlo methods need not use a single-particle basis.
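The sampling loop behind eqn (23) can be seen in a toy continuum example: a single particle in a one-dimensional harmonic well with a Gaussian trial function exp(−αx²/2). This is purely illustrative (the parameters, step size, and sample counts are arbitrary choices); for the exact choice α = 1 the local energy is constant, illustrating the zero-variance property of an exact trial state.

```python
import numpy as np

# VMC for H = -1/2 d^2/dx^2 + 1/2 x^2 with trial psi = exp(-alpha x^2 / 2).
# Exact ground state: alpha = 1, E = 0.5. Analytic variational energy: alpha/4 + 1/(4 alpha).
rng = np.random.default_rng(1)

def local_energy(x, alpha):
    # E_L = -1/2 psi''/psi + 1/2 x^2 = alpha/2 + x^2 (1 - alpha^2)/2
    return 0.5 * alpha + 0.5 * x**2 * (1.0 - alpha**2)

def vmc_energy(alpha, nsteps=200_000, step=1.0):
    x, samples = 0.0, []
    for i in range(nsteps):
        x_new = x + step * rng.uniform(-1, 1)
        # Metropolis acceptance with probability |psi(x_new)|^2 / |psi(x)|^2
        if rng.random() < np.exp(-alpha * (x_new**2 - x**2)):
            x = x_new
        if i > nsteps // 10:                      # discard a burn-in period
            samples.append(local_energy(x, alpha))
    e = np.array(samples)
    return e.mean(), e.std() / np.sqrt(len(e))    # naive error bar (ignores autocorrelation)

for alpha in (0.8, 1.0, 1.2):
    mean, err = vmc_energy(alpha)
    print(f"alpha = {alpha:.1f}: E = {mean:.4f} +/- {err:.4f}  (analytic {alpha/4 + 1/(4*alpha):.4f})")
```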
The second main family of ground-state quantum Monte Carlo methods are projector quantum Monte Carlo methods.48,53,54 These are based on an implicit representation of the ground-state wavefunction as limβ→∞exp(−βH)|Φ0〉, with a stochastic representation of e−βH. The general idea is to write
$e^{-\beta H} = \big(e^{-\tau H}\big)^{N}, \qquad \beta = N\tau$   (24)

$\langle n_N|\,e^{-\beta H}\,|n_0\rangle = \sum_{n_1\cdots n_{N-1}} \langle n_N|e^{-\tau H}|n_{N-1}\rangle \cdots \langle n_1|e^{-\tau H}|n_0\rangle$   (25)

so that the projection is expressed as a sum over paths of configurations, which can be sampled as a random walk.
The sign problem is generally removed by introducing a trial wavefunction which contains information on the sign; the way in which the trial wavefunction is used to remove the sign depends on the type of projector Monte Carlo, but in all cases the outcome of the walk no longer samples the exact wavefunction but some approximation to it. However, all methods employing a trial wavefunction converge to the exact result as the trial wavefunction improves.
Projector Monte Carlo avoids the need to optimize a large number of variables as in variational Monte Carlo. When a simple trial state is sufficient it thus inherits some of the favourable scaling of quantum Monte Carlo methods, and it also has the flexibility to incorporate complicated trial states which capture features of strong correlation, although the need to beat down the stochastic error with a large number of samples remains.
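Stripped of the stochastic sampling (and hence of the sign problem), the projection itself is just imaginary-time power iteration. A deterministic numpy/scipy sketch on a small Hermitian matrix (here the two-site Hubbard Hamiltonian from the earlier sketch, with the same illustrative t and U; any small matrix would do):

```python
import numpy as np
from scipy.linalg import expm

# Illustration of lim_{beta -> inf} exp(-beta H)|Phi0>: repeatedly apply the short-time
# propagator exp(-tau H) to an arbitrary starting vector and watch the energy converge.
t, U = 1.0, 4.0
H = np.array([[U, -t, -t, 0.0],
              [-t, 0.0, 0.0, -t],
              [-t, 0.0, 0.0, -t],
              [0.0, -t, -t, U]])

tau = 0.1
P = expm(-tau * H)                        # short-time propagator
phi = np.array([1.0, 0.0, 0.0, 0.0])      # arbitrary start with nonzero ground-state overlap

for step in range(1, 201):
    phi = P @ phi
    phi /= np.linalg.norm(phi)
    if step % 50 == 0:
        print(f"beta = {step * tau:5.1f}  energy = {phi @ H @ phi:.6f}")

print("exact ground-state energy:", np.linalg.eigvalsh(H)[0])
```

In the stochastic versions, the repeated application of the propagator is carried out by a population of walkers rather than by dense matrix algebra, which is what makes large systems accessible but also what introduces the sign problem.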
In the numerical simulations of electron correlation beyond mean-field theory, there are a wide variety of approximate methods that can be applied. Surveying the tables of characteristics of methods discussed, however, we can identify some immediate gaps, in particular in the treatment of systems with both a large amount of strong and weak correlation. This gap may not be one of principle: for example, there is no reason a priori to expect that tensor networks cannot be used in a more continuum like (complete basis) description,55 and perhaps, truncated coupled cluster methods with sufficiently high levels of truncation can be used to describe the strongly correlated problems of interest in chemistry; but such methods do not yet exist today, and will certainly need new ideas if they are to be practical in the future. We return to this point in Section 4.5.
The fact that there is a gap in capabilities does not conflict with the possibility of achieving an intuitive understanding of correlated electron systems: even for problems where mean-field methods are an appropriate description of the electronic structure (for example, in the ligand-binding example in Section 2.7) there can still be a formidable computational gap to making quantitatively precise predictions. At the same time, some of these gaps (particularly in the systems with both substantial strong and weak correlations44) seem to arise because the current set of classical heuristics do not fully embody the intuitive simplifications that exist in physical systems. This suggests that there are new conceptual directions to explore.
The methods above as usually employed are polynomial cost heuristics. Aside from the tensor network methods, the techniques do not incorporate locality as an assumption from the outset. Variants of these approaches that further incorporate locality can be devised, and under the (often chemically reasonable) assumption of strong locality (see Section 3.7), these reduce the cost scaling to linear in the system size. Although there is no formal theory of error associated with these heuristics, there is a lot of empirical understanding of when they work and when they do not. Thus the judicious application of these methods, sometimes in concert with each other, has helped establish the success of computational quantum chemistry.
Heuristics are usually run with polynomial cost but without a detailed understanding of the error; to discuss complexity, however, one needs to understand the associated error as well. In other words, we seek to obtain the cost C of running a heuristic to some desired error ε for a given system size L, i.e. the function C = f(L, ε).
The term chemical accuracy was originally coined in the context of chemical energy differences.56 When applied to total energies, one could argue that a more reasonable definition of chemical accuracy is in terms of the relative error ε/N (or, up to a multiplicative factor, ε/L). Ideally this applies to the energy of the valence electrons (where chemistry takes place), but in practice it is difficult to separate out such an energy component. Leaving that aside, the relative error is a more appropriate metric for chemical reactions, as these involve local changes, and it is also consistent with assuming the equivalence of statistical mechanical ensembles in the thermodynamic limit.
Nonetheless, almost all numerical calculations compute the total energy of a problem rather than the relative energies directly: relative energies are obtained as the difference of the total energies. We might then wonder whether computations should target a relative error or an absolute error in the total energy?
Because of the observed locality of chemical matter, it turns out that many heuristics for total energy computation can yield useful answers even if one only targets the relative error rather than the total error. For example, consider the basis set error in a chemical reaction. The basis set error in the total energy is a global quantity which (for a given basis resolution) simply increases with the size of a system. However, in a chemical reaction, where changes are only occurring locally, the energies of the regions far away cancel out between, say, the reactants and the products. Under the setting of strong locality, most of the classical approximations with the “extensivity” property in Section 3.9 produce an energy error which is approximately additive across the atoms in the simulation. Then only the local error matters in modeling a chemical reaction.
The main exception to the above is stochastic errors, which (without using special types of sampling) are uncorrelated between different calculations. Uncorrelated stochastic errors in calculations where an energy difference is to be taken must therefore be converged to a given absolute error, in order to obtain the same order of error in the energy differences.
Although the limit of L → ∞, ε → 0 is not chemically very relevant, it is still useful to understand the scaling of costs and errors up to the relevant cutoff scales, which we might denote Lc and εc. Of course it is difficult to agree exactly what these cutoffs are, especially for Lc. To limit the scope of discussion, we can consider only the case of modeling electron correlation in ab initio calculations. Then, one pragmatic perspective is that we would like to model correlated electron effects for the same systems that one can routinely perform mean-field calculations for, say ∼O(1000) atoms and O(10000) electrons, or a linear dimension of O(10) atoms in a three-dimensional problem. (Under chemical conditions, the minimum spacing of atoms is O(1) atomic units, thus this translates into a length scale as well.)
If a system exhibits strong locality, then one would expect that on scales larger than the associated Lc, the cost to simulate the system is O(L). If there were no physical structure to the problem before the cutoff scale is reached, then up to size Lc, the observed C(L) ∼ e^{O(L)}; subsequently, across a class of problems where Lc can be tuned, the cost scaling would be expected to grow like e^{O(Lc)}. But in the author’s experience, this is too pessimistic, as the various physical structures of correlation impose themselves before Lc is reached. For example, in the weak correlation limit, a truncated coupled cluster calculation may be sufficient, with O(poly(L)) scaling, before system sizes of O(Lc), while e.g. in tensor networks, as one tunes correlation functions towards a critical system (thereby increasing Lc), the ground-state entanglement does not change from an area-law entanglement to a volume-law entanglement at the critical point, but only acquires logarithmic corrections. Thus for a given error, we expect to see a smooth crossover between the functional form of C(L) at small L and for L > Lc.
We have already described the various kinds of electronic structure observed in the ground- and low-lying excited states in Section 3.9. These different qualitative starting points might be thought of as the references. Thus the reference error is associated with constructing an appropriate choice of starting point. The refinement error is the error associated with applying a heuristic after that starting point has been constructed.
In a molecule, a prototypical example of these two types of errors can be seen in the case of stretching a chemical bond. Consider the case of using the coupled cluster heuristic. This requires a Slater determinant starting point (such as the restricted Hartree–Fock reference) which we observe is usually qualitatively correct near the equilibrium geometry but becomes increasingly poor as the bond is stretched. The coupled cluster calculation (truncated to singles and doubles) is then very accurate at the equilibrium geometry, and the remaining error from the exact calculation (in this basis) can be viewed as the refinement error. However, the same coupled cluster calculation becomes poor as the bond is stretched.
Because the coupled cluster method is systematically improvable, we could in principle improve the result at long bond lengths by increasing the truncation level at additional cost. But, in the stretched region, one finds that another type of Hartree–Fock solution can be found (the unrestricted Hartree–Fock solution) and this qualitatively describes the energetics of the bond dissociation. Introducing the coupled cluster ansatz on top of the unrestricted Hartree–Fock solution then leads to much more accurate energies at stretched geometries (see Fig. 1).
Alternatively, we could construct a reference consisting of the important configurations around the Hartree–Fock reference. If we include all the valence excitations associated with the triple bond breaking, we obtain the complete active space self-consistent field (CASSCF) reference.26 This reference state contains (but is not limited to) all Slater determinants that become degenerate or nearly degenerate at long distance. We see that its energy is now qualitatively correct both at short and long bond lengths. Refining this by including singles- and doubles-type fluctuations on top of it (the MRCISD26 curve) yields an almost exact result.
Thus we can see that it is useful to think of the error associated with the choice of reference as a distinct error from the error associated with refining it. In the coupled cluster truncation, this is because the cost to find another Hartree–Fock solution that is appropriate for longer bond lengths will be much less than the cost to improve the coupled cluster calculation starting from a poor choice of reference. In the case of using the CASSCF reference, it is the fact that we are making an appropriate choice of Slater determinants to linearly combine which leads to the qualitatively correct curve; the refinement on top of this then requires only low-order fluctuations. In the materials setting, we can regard the idea of starting from the correct reference point as starting from a state that is in the correct quantum phase.
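The effect of the reference choice can be seen already at the mean-field level in a few lines for a stretched H₂ molecule (a minimal stand-in for the N₂ example above). The sketch below uses PySCF (assumed installed); the geometry, basis, and the initial-guess perturbation used to coax UHF into breaking spin symmetry are illustrative choices, and whether the symmetry actually breaks can depend on that guess.

```python
from pyscf import gto, scf

# Stretched H2: the restricted (RHF) and broken-symmetry unrestricted (UHF) references
# behave very differently towards dissociation.
mol = gto.M(atom="H 0 0 0; H 0 0 3.0", basis="sto-3g", unit="Angstrom")

mf_r = scf.RHF(mol).run()

mf_u = scf.UHF(mol)
dm = mf_u.get_init_guess()
dm[0][0, 0] += 0.1                  # bias the alpha density onto one atom to break symmetry
mf_u.kernel(dm)

# Reference value: twice the energy of an isolated H atom in the same basis.
e_atom = scf.ROHF(gto.M(atom="H 0 0 0", basis="sto-3g", spin=1)).run().e_tot

print("RHF (stretched):", mf_r.e_tot)
print("UHF (stretched):", mf_u.e_tot)
print("2 x H atom     :", 2 * e_atom)
```

The broken-symmetry UHF energy approaches twice the isolated-atom energy as the bond is stretched, i.e. it is a qualitatively correct reference for dissociation, whereas the restricted solution is not; the refinement step (e.g. coupled cluster) then only has to supply fluctuations around the appropriate reference.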
Accepting that there are these two types of errors, we may view a classical heuristic algorithm as containing two steps: first, finding the appropriate reference state (state preparation) and second, preparing it and refining its energy. The cost function then takes the form
$C = R_{\text{state prep}} \times C_{\text{refinement}}$   (26)
There are two logical possibilities. The first is that the starting point is intrinsically hard to describe classically i.e. the quantum state does not even have a succinct classical description. (By classically succinct we mean here that the state can be stored with a polynomial amount of classical information and that its relevant properties extracted with a polynomial amount of classical computation).
If there is no such succinct description, that means that the state cannot be qualitatively expressed or interrogated in a faithful form without a beyond classical, i.e. quantum, computer. It would be some kind of black box for which, at least for certain properties, we could not write down a “theory” in the usual sense. Although such states can certainly be imagined (and manipulated, say, on a quantum computer) to the best of the author’s knowledge, they have never been observed in the low energy states of chemical matter. Indeed, it is a remarkable feature of Nature that despite the vastness of the Hilbert space of a material in the thermodynamic limit, the number of ground-state phases observed is relatively small (e.g. insulators, metals, superconductors, various topological orders etc.) – at least compared to the size of the Hilbert space. Furthermore, simple qualitative wavefunctions (and theories) have been formulated to describe the observed phases, such as Fermi liquid theory, Landau’s theory of symmetry broken phases, the Bardeen–Cooper–Schrieffer wavefunction, the Laughlin wavefunction, and so on. As argued already in Section 3.6, we can even make an empirical claim that much of the chemical matter actively studied in chemistry can be thought of as qualitatively close to some mean-field state, perhaps with a broken symmetry, or with a polynomial number of connected fluctuations away from it. Thus, we conjecture that in chemical matter the starting point can be expressed in a classically succinct manner.
With the above assumption, the main remaining challenge comes from finding this classical starting point. One strategy is to search over the possibilities, i.e. enumerate the different starting points, apply a refinement (using one of the many classical heuristics) and see which yields the best answer (i.e. the lowest energy). Indeed, this is what is done in practice in particularly challenging problems. For example, in the study of the Fe–S clusters in nitrogenase, in practice one generates many different broken-symmetry mean-field solutions,10,57 each of which can then be refined using more sophisticated techniques, such as the density matrix renormalization group.8 Similarly, in condensed matter simulations of the Hubbard model, one applies different kinds of pinning fields and boundary conditions to “prepare” different ordered phases which are in close competition in the phase diagram.11
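Schematically, this search-over-references strategy has the following shape (a hypothetical sketch: `references` and `refine` stand in for, say, a set of broken-symmetry mean-field solutions and a DMRG or coupled cluster refinement step):

```python
def search_and_refine(references, refine):
    """Schematic search over candidate reference states.

    references : iterable of candidate starting points (e.g. different
        broken-symmetry mean-field solutions, pinning fields, or boundary
        conditions).
    refine : callable mapping a reference to a refined ground-state energy
        estimate (e.g. via DMRG or coupled cluster).
    Returns the lowest refined energy and the reference that produced it.
    """
    best_energy, best_ref = float("inf"), None
    for ref in references:
        energy = refine(ref)
        if energy < best_energy:
            best_energy, best_ref = energy, ref
    return best_energy, best_ref
```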
While we do not have a rigorous enumeration of the possible phases of matter, it seems reasonable to assume that in the worst case (over systems of chemical relevance) we can search over a number of possibilities that is exponential in the system size. This exponential complexity is realized in finding the ground state of classical spin glasses, for example, where ground-state determination is NP-hard. Given that we expect the starting point to be classically succinct, we can additionally conjecture that the state-preparation problem in chemical matter is then in fact NP-hard (loosely, classically exponentially hard). This is a simplification of the QMA-hardness (i.e. quantumly exponentially hard) that is expected for the general ground-state problem, and which is realized in quantum spin glasses. (In essence, this simplification is equivalent to stating that the worst-case quantum spin glass problems are not in the set of today’s chemically relevant problems.)
Of course, the above discussion does not mean that we will always encounter exponential complexity of state preparation in practice: most chemical systems studied are not spin glasses of any type. In those cases where empirical inference tells us that we can prepare a simple mean-field state, or perhaps a linear combination of such mean-field references (e.g. to describe a bond breaking), R_state prep = 1.
In ref. 58, we first considered a set of molecules for which the Hartree–Fock reference is a good starting point, and where the energy is further refined using coupled cluster theory. We can plot the error of the coupled cluster theory as a function of the truncation order, and estimate the cost of the associated coupled cluster calculation (from the cost of the most expensive tensor contraction). (To be precise, this is not a perfect accounting of costs, as it does not account for (i) the number of such contraction terms or (ii) the number of iterations needed to solve the amplitude equations. To our knowledge, (i) has not previously been examined, but for the most common case where the number of virtual (unoccupied) orbitals is much larger than the number of occupied orbitals, the number of such terms is O(1).)
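For orientation, such a leading-contraction cost metric can be sketched as follows (a rough rule of thumb, not the exact accounting used for the figures: truncation at excitation order n is assumed to have a most expensive contraction scaling as o^n v^{n+2} in the number of occupied (o) and virtual (v) orbitals, e.g. o^2 v^4 for CCSD and o^3 v^5 for CCSDT; as cautioned above, the number of terms and iterations is ignored):

```python
def cc_leading_cost(n_occ, n_virt, order):
    # Rough leading tensor-contraction cost of coupled cluster truncated at
    # excitation order `order`: ~ o^n * v^(n+2). Deliberately ignores the
    # number of distinct contractions and the number of amplitude iterations.
    return n_occ**order * n_virt**(order + 2)

# Illustrative numbers only: 10 occupied and 100 virtual orbitals.
for n in (2, 3, 4):  # CCSD, CCSDT, CCSDTQ
    print(n, cc_leading_cost(10, 100, n))
```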
In Fig. 2 we show data for the nitrogen molecule and the water molecule, at various bond lengths and (in the case of the nitrogen molecule) starting from restricted and unrestricted Hartree–Fock references. (The data for N2 starting from the unrestricted Hartree–Fock reference at the equilibrium geometry was previously discussed in ref. 58, but is here presented with a more precise accounting of computational cost.)
Fig. 2 Energy error of the nitrogen and water molecules at equilibrium (re) and stretched (multiples of re) bond lengths, as a function of the level of coupled cluster truncation, plotted against a computational cost metric. (a) and (b) show the data computed using RHF/CC and UHF/CC, respectively. Data taken from ref. 35 and 59.
The straight-line fits for the N2 and H2O molecules on the log–log plot of error (relative to the full configuration interaction result in the given basis) versus cost indicate that the functional form of the cost is O(poly(1/ε)). Because of the exponential ansatz of coupled cluster theory, for a gas of such non-interacting molecules, using the same parameters in the exponential operator yields the same relative error; the total empirical cost of coupled cluster for a gas of these molecules is therefore O(poly(L)) times a polynomial in the inverse relative error, which translates to a total cost of O(poly(L)poly(1/ε)). We might conjecture that this is the cost scaling of coupled cluster theory for so-called single-reference problems. In ref. 58, we give further evidence, using local coupled cluster theory,60 that this is indeed the appropriate scaling in large organic molecules.
An important feature of these error plots, however, is that the slope of the curves shows substantial system dependence: on the molecule, the bond length, and the starting reference that is chosen. At the equilibrium geometries, the slope is large, indicating a modest polynomial dependence on 1/ε, close to O(1/ε^2). However, as the bond lengths are stretched (to some multiple of the equilibrium length re) and the problem becomes more and more multireference, the convergence becomes slower. For some (but not all) stretched bond lengths, the convergence is less systematic than at the equilibrium geometry, and the effective error scaling, for the longest nitrogen bond length, is as poor as O(1/ε^7).
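The effective exponents quoted here can be read off from a log–log fit to the data; a minimal sketch (the function name is hypothetical, and paired lists of costs and energy errors such as those underlying Fig. 2 are assumed as input):

```python
import numpy as np

def effective_error_exponent(costs, errors):
    # Fit log(error) ~ a - k*log(cost). If error ~ cost^(-k), then the cost
    # to reach an error eps scales as (1/eps)^(1/k), so the effective
    # inverse-error exponent is 1/k (steep slope => small exponent, e.g. ~2;
    # shallow slope => large exponent, e.g. ~7).
    slope, _ = np.polyfit(np.log(costs), np.log(errors), 1)
    return 1.0 / (-slope)
```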
Taking now a rather different limit, we may consider systems for which a single mean-field reference is certainly a poor qualitative description. In Fig. 3, we show the error convergence of a tensor network ansatz (the projected entangled pair state, PEPS) for the 2D frustrated J1–J2 Heisenberg model, and the 4-leg and 8-leg 2D Hubbard model. The data on the J1–J2 model is analyzed here for the first time, as is that for the 2D Hubbard model for the 8-leg ladder. Additional data, for the 3D Heisenberg model, can be found in ref. 58.
Fig. 3 Log–log plot of energy accuracy versus PEPS bond dimension D for (a) the spin-1/2 frustrated J1–J2 Heisenberg model at J2/J1 = 0.5 on a 16 × 16 lattice, with the PEPS D = 10 energy as the reference,61 and (b) the 2D Hubbard model with onsite repulsion U = 8 at 1/8 hole doping, with extrapolated DMRG energies as the reference.62 In both cases the computational cost scales as O(D^6). The linear fits of log ε versus log D give ε ∼ 1/D^4 for the J1–J2 model (red dashed line) and ε ∼ 1/D^1.5 for the Hubbard model.
We find that, as a function of the flexibility of the ansatz, as expressed by the tensor bond dimension D, the relative error behaves like ε ∼ poly(1/D). Since the computational cost is O(L poly(D)), more precisely O(LD^6) in the variational Monte Carlo PEPS formulation used here,63 the overall cost of refinement is O(L/ε^1.5) in the J1–J2 Heisenberg model and O(L/ε^4) in the 2D Hubbard model, both consistent with an O(poly(L)poly(1/ε)) refinement cost. However, as in the coupled cluster examples, a range of exponents is observed, consistent with the general understanding that fermionic systems are harder to simulate with tensor network methods than spin systems.
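The conversion from the bond-dimension scaling to the cost scaling quoted above is direct:

$\varepsilon \sim D^{-p}, \qquad C \sim L D^{6} \;\;\Rightarrow\;\; C \sim L\,\varepsilon^{-6/p},$

so p = 4 for the J1–J2 model gives C ∼ L/ε^{1.5}, while p = 1.5 for the Hubbard model gives C ∼ L/ε^{4}.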
The above constitutes a very limited set of examples. In particular, while the calculations on strongly correlated models with tensor networks, and those on molecular systems with coupled cluster methods, correspond to two limits of strong and weak correlation (and are also representative of material versus finite molecular systems) we do not have much data for strongly correlated ab initio problems, where one needs to converge the description of both strong and weak correlation. This reflects the gap in current methodologies discussed in Section 3.9. Nonetheless, if we take the evidence in hand, and with a certain degree of optimism that this author possesses, we can argue that the empirical error convergence of appropriate classical heuristics to refine the energy will be O(poly(L)poly(1/ε)), although it is likely that in some problems there is a rather high power dependence on the inverse error.
Classical heuristics are methods that are used with O(poly(L)) runtimes, essentially by definition. However, the errors from running the heuristics will need to be better characterized for a full understanding of their effectiveness.
It appears to be useful to separate the error into two categories: a reference error, and a refinement error. We conjecture that the reference error can be eliminated by simply constructing an appropriate reference, which is some classically succinct state. Often, constructing such an appropriate reference is easy (e.g. one can just take a mean-field reference), but there are also problems where finding the reference will involve a classical search over a set of possibilities, which may be exponentially large in some difficult problems. The search over reference states may be where the exponential complexity of quantum chemistry ultimately lies, but our conjecture suggests that it is a type of classical exponential hardness, rather than a quantum exponential hardness.
Once a suitable reference is identified, limited numerical data suggests that the refinement can be done efficiently (although perhaps not practically) by current classical heuristics. Under these assumptions, the cost of classical approaches thus takes the conjectured form R_state prep × O(poly(L)poly(1/ε)). We hence refer to this as the classical heuristic cost conjecture.
To formalize the notion of advantage, we define the computational task as that of finding the ground-state energy of a many-electron quantum chemical problem of size L (for the kinds of chemical Hamiltonians we have been discussing) to some desired error ε. Quantum advantage then refers to a favourable relationship between the classical and quantum costs for this task. Usually, the most desirable advantage, and the one most discussed, is an asymptotic exponential quantum advantage (EQA).58 One way to define advantage is as the ratio of costs; EQA then means that asymptotically the ratio is e^{O(L)}. EQA is clearly achieved if the quantum algorithm has polynomial cost but the classical algorithm has exponential cost, but under this definition, if the classical and quantum algorithms both have exponential cost, we still obtain an exponential advantage if the exponent in the quantum case is smaller. A different way to assess advantage is to define the classical cost as a function of the quantum cost, i.e. C_classical = f(C_quantum), and ask whether this is a polynomial or an exponential function. Under this definition, if, say, C_classical = e^{cL} and C_quantum = e^{cL/2}, one would say that there is a quadratic speedup and no EQA. The latter definition is probably the more common (note, however, that ref. 58 uses the first definition). We also note that there are other kinds of advantage, such as polynomial quantum advantage or even constant-factor advantage.
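Written out, the example just given separates the two definitions:

$\dfrac{C_{\text{classical}}}{C_{\text{quantum}}} = \dfrac{e^{cL}}{e^{cL/2}} = e^{cL/2} \;\; \text{(exponential ratio: EQA under the first definition)}, \qquad C_{\text{classical}} = C_{\text{quantum}}^{2} \;\; \text{(quadratic speedup: no EQA under the second)}.$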
In principle, under common complexity assumptions, quantum computation provides a strict superset of classical computational power. Thus, assuming a quantum computer has the option to execute the classical algorithm at the same speed as the classical computer, there must always be an asymptotic theoretical advantage to using a quantum computer. However, there is more to establishing quantum advantage in the quantum chemistry setting than this theoretical statement, because we are generally interested only up to finite sizes and finite errors, as discussed in Section 4.2. As a consequence, the detailed form of the speedup, including the prefactors, is important. While asymptotic analysis to date has usually ignored these details, in part because some of the subleading costs result from the implementation of algorithms on hardware that does not yet exist, it is clear that non-asymptotic analysis must play an important role in demonstrating quantum advantage in quantum chemistry.
There are currently two main kinds of algorithms that have been proposed for simulating the quantum chemistry problem: (i) variational (or hybrid) algorithms, and (ii) algorithms based on phase estimation or quantum linear algebra. Although the first class of algorithms is often discussed in the context of noisy, non-error-corrected quantum computers, we assume below that they are executed on perfect quantum hardware.
Indeed, it is clear that not every plausible classical heuristic has polynomial inverse-error dependence, even under strong assumptions about the behaviour of chemical matter. For example, in a system exhibiting strict locality, one might choose to simulate the system as a set of non-overlapping fragments and add the fragment energies together, using, for example, a full configuration interaction solver for the ground state of each fragment. For any fixed fragment size M, the scaling of the method is then O(poly(L)) in system size (the polynomial dependence above linear arising from the long-range Coulomb interaction), but each fragment can be expected to have an energy error proportional to its surface-to-volume ratio. Since the cost to compute a fragment's energy with a full configuration interaction solver is e^{O(M)}, the cost of the method as a function of the relative error becomes e^{O(1/ε)}, translating to an overall total cost of e^{O(L)}e^{O(1/ε)} when considering the absolute error. This then provides room for a quantum algorithm to achieve EQA. Obtaining O(poly(1/ε)) error scaling clearly requires a careful heuristic approach, and we cannot prove that such a heuristic exists for all chemical matter.
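To spell out the counting (assuming, for simplicity, a fragment error that falls off as the inverse fragment size, as for a chain of one-dimensional fragments of length M):

$\varepsilon_{\text{rel}} \sim \dfrac{1}{M}, \qquad C_{\text{fragment}} \sim e^{O(M)} \;\;\Rightarrow\;\; C \sim \dfrac{L}{M}\, e^{O(M)} \sim \text{poly}(L)\, e^{O(1/\varepsilon_{\text{rel}})}.$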
Even if the classical heuristic cost conjecture holds, i.e. we can always find methods with O(poly(1/ε)) refinement error, the polynomial power may also be very large. This is seen in some of the numerical examples in Section 4.5 and presents a situation where a quantum algorithm can potentially achieve a large polynomial speedup. Given that we should always consider there to be a finite size cutoff Lc in chemistry anyway, a large polynomial speedup up to sizes of O(Lc) might be viewed as a more appropriate target than asymptotic exponential quantum advantage.
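For instance, comparing a classical refinement cost scaling as A/ε^{7} (the worst exponent seen above) with a quantum refinement cost scaling as B/ε (as provided, for example, by phase estimation, discussed below), where A and B are unspecified prefactors:

$\dfrac{C_{\text{classical}}}{C_{\text{quantum}}} \sim \dfrac{A}{B}\,\varepsilon^{-6},$

so the practical size of such a polynomial speedup at a fixed target error is set as much by the prefactors as by the exponents, which is one reason non-asymptotic analysis matters.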
There is also the class of problems where R_state prep ∼ e^{O(L)}. For these problems, there is again the possibility that the subsequent classical refinement is inefficient, leading to a similar mechanism for quantum speedup as described above. But assuming the classical energy can be refined efficiently, another possibility is a quantum speedup that arises from eliminating the repeated classical state preparation. However, in the author’s view, problems of this kind arise mainly when there are competing references (competing phases in a material), and such problems have the character of spin glasses, where it is already known that quantum algorithms for ground states must themselves have a similar exponential cost, essentially due to a similar state preparation problem. Spin glasses thus offer no room for an exponential speedup arising from state preparation, although EQA (in terms of the first definition) is still possible as a ratio of exponential costs. While one might search for more exotic matter in which the quantum state preparation problem is exponentially easier than the classical one (and there are certainly artificial Hamiltonians where this can be made to be the case64), we do not know of a chemically relevant example, and arguably, finding such a fine-tuned case would have limited impact on the discipline of quantum chemistry.
| Classical cost | Quantum speedup from | Advantage | Likelihood |
|---|---|---|---|
| R_state prep × O(poly(L)poly(1/ε)) | Refinement | Poly. | Common |
| R_state prep × O(poly(L)exp(O(1/ε))) | Refinement | Exp. | Possible |
| e^{O(poly(L))} × C_refinement | State prep. | Poly. | Common |
| e^{O(L)} × C_refinement | State prep. | Exp. | Unlikely |
The first class of algorithms is based on the variational minimization

$E = \min_{\theta} \langle \Psi(\theta)| H |\Psi(\theta)\rangle \qquad (27)$

over the parameters θ of a circuit ansatz |Ψ(θ)⟩.
Variational algorithms, such as the variational quantum eigensolver,65 have elicited an enormous amount of attention and are extremely popular. To some extent, this can be attributed to the fact that they are easy to understand: any circuit parametrization yields a viable ansatz, which may even be tested on one of today’s existing noisy quantum devices. On the other hand, they involve a non-linear optimization and carry very large prefactors due to the number of measurements required to reduce the statistical error (for some examples of numbers, see ref. 68). Even disregarding these constant prefactors, it is currently difficult to find quantitative evidence of quantum advantage with variational algorithms in problems of chemical matter. In principle, a circuit ansatz can represent states that cannot be efficiently represented on a classical device. However, this by itself does not mean there is a quantum advantage, because one needs to (i) first find the circuit representation that yields such a state through the optimization in eqn (27), and (ii) ensure that there is not some other classical representation of the wavefunction that yields a similar quality of the energy or other observable at lower cost.
The first issue has been much discussed in the quantum literature because, in general, optimizing unitary circuits is difficult, due to the “barren plateau” problem, in which gradients vanish over much of the parameter landscape and the optimization stalls at high energy.69 Indeed, this has led some to question the value of variational quantum algorithms.70 Here we might appeal to the classical intuitions established above to ameliorate this problem: namely, we can initialize with the same classically succinct reference states used in classical quantum chemistry, which may be viewed as the state preparation step of a variational quantum algorithm. Whether this solves or ameliorates the barren plateau problem remains to be established. Under the assumption that it does, there is still a possibility of polynomial quantum advantage with respect to the search over classical references (although the size of this advantage may be small). Given the above strategy, the cost of a variational algorithm may be analyzed as
$C = R_{\text{init}} \times C_{\text{refinement}} \qquad (28)$
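A minimal statevector sketch of this structure (illustrative only: the function names are hypothetical, the optimizer choice is arbitrary, and `prepare_state` stands in for a parametrized circuit; the initialization from a classically succinct reference follows the strategy discussed above):

```python
import numpy as np
from scipy.optimize import minimize

def vqe_energy(theta, hamiltonian, prepare_state):
    # <Psi(theta)| H |Psi(theta)> evaluated on an explicit statevector;
    # prepare_state maps circuit parameters to a normalized state vector
    # (here it stands in for execution on a quantum device).
    psi = prepare_state(theta)
    return float(np.real(np.conj(psi) @ (hamiltonian @ psi)))

def run_vqe(hamiltonian, prepare_state, theta_init):
    # theta_init chosen to reproduce a classically succinct reference state
    # (e.g. a mean-field state) rather than taken at random.
    result = minimize(vqe_energy, theta_init,
                      args=(hamiltonian, prepare_state), method="BFGS")
    return result.fun, result.x
```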
We can then assess the possibility of quantum advantage from using a variational quantum circuit for energy refinement. One study which attempts to do this is ref. 71, which compares variational quantum algorithms with different circuit architectures to classical MPS calculations on 1D (and pseudo-1D) model Hamiltonians. Because DMRG and MPS are recognized as among the best classical algorithms in the 1D context, we can argue that this is a reasonable classical comparison. Ref. 71 examined the expressivity (the number of variational parameters) and the theoretical cost needed to reach a desired accuracy in the ground-state energy. Extrapolating to high accuracy, and neglecting the difficulty of optimization, it showed that the best circuit ansatz considered offers a small polynomial advantage relative to a standard classical MPS. For example, the theoretical cost for the standard MPS to reach a relative precision ε in the Heisenberg model (a rectangular strip, fixed system size) was estimated to be roughly C ∼ O(1/ε^{3.1}), while with the best circuit structure it was found that C ∼ O(1/ε^{2.9}).
The above study provides partial evidence of a very modest polynomial quantum advantage in variational quantum algorithms arising from the error refinement task. But it should also be noted that the advantage is achieved in part because the chosen quantum circuit is a more structured and restricted ansatz than the classical MPS to which it is compared, and it remains to be seen whether analogous simplifications in the classical setting would eliminate this advantage. Further studies of this kind, which compare the asymptotic expressiveness and computational cost of different variational circuits against competing classical methods, are clearly needed.
For the second class of algorithms, based on quantum phase estimation (and related quantum algorithms), the cost may similarly be separated into a state preparation piece and an energy refinement (measurement) piece,

$C = R_{\text{state prep}} \times C_{\text{phase estimation}} \qquad (29)$
In terms of the overlap S of the prepared initial state with the true ground state, this leads to a total cost of the form

$C \sim \text{poly}(1/S) \times \text{poly}(L) \times \text{poly}(1/\varepsilon) \qquad (30)$
Much has been made of the complexity of ground-state preparation for phase estimation because, for an arbitrary Hamiltonian, it is known that preparing a state with good overlap with the desired ground state is generally hard even for a quantum computer (i.e. QMA-hard). But, assuming the intuitions we have argued for above, this is a red herring for chemical matter: no problem of real-world chemical relevance appears to have QMA-hard state preparation complexity.
As an aside, we note that preparing a state for phase estimation is both a harder and (potentially) an easier task than preparing a standard classical reference state. It is harder because one wants to prepare a state with asymptotically large overlap with the ground state (i.e. at worst O(poly(1/L))), while the usual standard classical reference states (such as a mean-field state, a coupled cluster wavefunction with fixed truncation order, or a tensor network state of some fixed bond dimension) actually have exponentially small overlap with the true state as L → ∞, giving an exponential cost in the phase estimation algorithm through the poly(1/S) factor. Thus one needs to prepare an initial state that is asymptotically better than the standard classical reference states.
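The exponential penalty can be made explicit with a simple product picture (an idealized assumption: a system of L weakly coupled units, with a local overlap s < 1 between the reference and the true state on each unit):

$S \sim s^{L} \;\;\Rightarrow\;\; \text{poly}(1/S) \sim e^{O(L)}.$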
On the other hand, it is potentially an easier task, because preparing a state of good overlap could, in principle, be easier than finding a reference from which a classical heuristic can easily refine the energy. To see an example of this, we can compare the difficulty of preparing a classical reference state such that perturbation theory converges with that of performing adiabatic state preparation. In the perturbation theory case, we consider the Hamiltonian H(λ) = H0 + λH1 (with the desired Hamiltonian corresponding to λ = 1) and the energy E(λ). For perturbation theory to converge, E(λ) must be analytic within the complex unit disc |λ| < 1, which requires the gap not to close anywhere in this disc. In the case of adiabatic state preparation, on the other hand, we consider an adiabatic path such as H(λ) = (1 − λ)H0 + λH1, and we only require that the gap does not close along the path λ ∈ [0, 1], a presumably weaker requirement.
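Stated side by side (with H1 denoting the perturbation in the first case and the target Hamiltonian in the second, following the two conventions above):

$\text{PT:}\;\; H(\lambda) = H_0 + \lambda H_1, \;\; E(\lambda)\ \text{analytic for}\ |\lambda| < 1; \qquad \text{ASP:}\;\; H(\lambda) = (1-\lambda)H_0 + \lambda H_1, \;\; \text{gap}(\lambda) > 0\ \text{for}\ \lambda \in [0, 1].$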
However, in ref. 58, we suggest that, at least in chemical matter, the difficulties of state preparation for classical heuristics and for phase estimation are closely related. For example, systems in which a classical search over reference states needs to be performed to find a reference for classical heuristics (such as the Fe–S clusters in nitrogenase) appear to engender a similar level of complexity in quantum state preparation methods for phase estimation, such as adiabatic state preparation. On the other hand, once a suitable classical reference for classical heuristics is known (which can be expected to have exponentially small overlap for large systems), the subsequent application of classical refinement not only rapidly improves the energy but also rapidly improves the overlap, yielding a state that may then be prepared on a quantum computer for phase estimation.
Moving on from the state preparation aspect, we now consider the energy refinement, which is performed by the phase estimation measurement. Phase estimation gives what is termed Heisenberg-limited error scaling, with cost O(1/ε), and this is generically the best error scaling one can obtain with a quantum algorithm. Under the classical heuristic cost conjecture, we assume a classical error scaling of O(poly(1/ε)), but it is possible for the polynomial exponent to be less than 1, and thus better than what is seen in phase estimation (e.g. as observed in the 1D Heisenberg model in ref. 55), as well as much greater than 1, as seen in some of the examples in Section 4.5.
Overall, however, the well-understood and provable error scaling of quantum phase estimation provides a potential source of polynomial quantum advantage across a wide range of chemical matter, even under the assumptions of the classical heuristic cost conjecture (and of course potentially greater advantage if the conjecture breaks down). Given the diversity of the polynomial inverse-error dependence seen in classical simulations, establishing this advantage concretely will require studying a wide range of examples. And because the actual problems of chemical matter are non-asymptotic in nature, the quantum algorithms must also be fully characterized with respect to their non-asymptotic costs.
We apply this reasoning to two families of quantum algorithms, variational quantum algorithms and quantum phase estimation (and related quantum algorithms). Both can be formulated in terms of a state preparation piece and a refinement component, creating a parallel functional form of the cost across classical heuristics, variational quantum algorithms, and quantum phase estimation. While the error convergence of variational quantum algorithms must be better established to understand the advantage that can be achieved, algorithms such as phase estimation, through their rigorous Heisenberg-limited error scaling, present a strong case for looking for quantum advantage in refinement. Certainly we can expect to see polynomial speedups in some problems, and searching for a breakdown of classical refinement and the classical heuristic cost conjecture is an avenue to exponential speedup. However, understanding the specific type of advantage will require analysis beyond asymptotics.
Ultimately, chemistry is an experimental science, and the domain of its problems and the associated complexities has been defined not by mathematics, but by the examples in nature and our experimental control. In formulating the conjectures of this essay, we have tried to capture some essence of the behaviour of chemical matter that is amenable to mathematical analysis. We look forward to new developments in these directions.