Supplementary Information for: Efficient and low-scaling linear-response time-dependent density functional theory implementation for core-level spectroscopy of large and periodic systems

We give here the K-edge basis convergence results. The energy difference between the first and second (ω2-ω1) excitation energies are reported across the pcseg and pcX basis sets using different standard hybrid functionals. The EOM-CCSD values comes from Ref. 1 and the experimental values from Refs. 2, 3, 4 and 5. The H2O oxygen K-edge convergence test is displayed on figure 1, NH3 nitrogen K-edge on figure 2, the CO carbon and oxygen K-edge on figures 3 and 4 and the C2H4 carbon K-edge on figure 5. Note that on all figures, basis sets range from doule zeta quality (pcseg-1, pcX-1) to quintuple zeta quality (pcseg-4, pcX-4). Wide clear bars correspond to the pcseg basis and thin dark bars to pcX.


Detailed SOC matrix elements
We give here the detailed spin-orbit coupling matrix elements in the auxiliary many electron wave functions 6 (AMEW) basis. To keep the equations concise, we work within the Tamm-Dancoff approximation and restricted closed-shell systems (extensions to full TDDFT and/or open-shell systems are straight forward).
We assume that the ground state densities and KS orbitals are the same for both spins, i.e. n 0 α (r) = n 0 β (r) and φ 0 jα (r) = φ 0 jβ (r). Then two types of excitation can take place, singlet excitation, where n − 7 . These conditions can be imposed by construction when taking linear combinations of the linear response orbitals, such that: describes singlet excitations and describes triplet excitations. This leads to two different eigenvalue problems, one for each type of excitation. From the solutions u − pi and v − pi , we define the singlet and triplet LR orbitals, respectively: The corresponding AMEW expansions for singlet and (spin-conserving) triplets can then be written as: where |Ψ 0 is the Slater determinant of the ground state orbitals,â iσ is the annihilation operator for a ground state KS orbital i with spin σ andr t,s iσ the creation operator for the corresponding singlet/triplet LR orbital. Explicitly , if we write the ground state SD as: the combined effect of r s,t iα †â iα on it is: The spin-flip triplets (with spin quantum numbers S = 1 and m s = ±1) are expressed as: and Given a one electron spin-orbit Hamiltionian H SO , matrix elements of the type Ψ S |H SO |Ψ 0 T need to be evaluated. Because triplet and singlet LR orbitals are not necessarily orthogonal to each other, we cannot simply apply the Slater rule for evaluating matrix elements. Instead, we have to use Löwdin's rule 8 . Note that within XAS LR-TDDFT, the sum over orbitals i in equations (5), (6), (9) and (10) is restricted to the core excited states, which amount to 3 in case of L 2,3 -edge spectroscopy. We label those core states with capital letters I, J to distinguish them. All non-zero necessary SOC matrix elements are given below.
SOC matrix elements between the ground state SD and excited triplets with secondary quantum number m s = −1, 0, 1: SOC matrix element between excited singlets and excited triplets with secondary quantum number m s = −1, 0, 1: SOC matrix elements between excited triplets (some combinations of m s quantum numbers yield zero coupling): One possible Hamiltonian to include the SOC effects is the ZORA Hamiltonian 9 which is expressed as: where c is the speed of light, V formally contains the nuclear field, the electron Coulomb and the xc potentials (although it is usually replaced by van Wüllen's model potential 10 ), p is the three-dimensional momentum operator and σ is the three-dimensional vector of the Pauli matrices.
It has been shown in the above cited paper that the matrix elements of the ZORA SOC with respect to spin-orbitals can be expressed as: where i is the unit imaginary number and ε lmn is the Levi-Civita tensor.
The CP2K implementation combines the general SOC matrix elements as listed above with the ZORA Hamiltonian.

Detailed kernel scaling
Evaluating the RI 3-center integrals (pq|µ) for the kernel involves a few optimization steps, which themselves come at a cost. First of all, a lot of screening takes place, as not all overlapping basis functions ϕ p (r), ϕ q (r) contribute to a particular kernel integral. In particular, for the Hartree and exchange-correlation kernel, the AO integrals are eventually contracted into a mixed AO and MO integrals (pi σ |µ), where φ i σ (r) is a core KS orbital. Since the core orbital is local in space, only a few AOs are actually involved. Thus, to save on expensive integral evaluations, a loop over all overlapping ϕ p (r), ϕ q (r) is done to check which need to be considered. This is also very present when projecting the density on the RI basis and 3-center overlap integrals (pqµ) need to be evaluated. In a certain measure, this also happens for the HFX kernel because of the finite range of the exchange potential.
In CP2K's matrix library DBCSR, matrices are split in atomic blocks which are themselves distributed over the processors. The default distribution is optimized for the load balance of the overlap matrix. This is however not the best for the evaluation of our RI integrals which are heavily localized around the current excited atom. Thus, an a newly optimized distribution for the calculation of the (pq|µ) integrals is determined for each excited atom. However, after evaluation, integrals must be redistributed to match the standard distribution, which comes with an overhead. Figures 6 and 7 show the detailed timings and scaling of integral evaluations, screening and redistributing for the water and sodium aluminate systems respectively. As expected, the actual integral evaluation scales linearly (measure scalings are 1.0 and 0.8). Due to increasing sparsity, the number of overlap matrix elements increases linearly with system size. However, since screening happens for each excited atom, its overall scaling becomes quadratic (measured at 2.2 and 2.1). Finally, the scaling of integral redistribution was measured at 1.8 and 1.6.

Representative input files
In the following pages, five representative CP2K input files are given. The first two files, in sections 4.1 and 4.2, were used for K-edge and L-edge validation, respectively. Because these are benchmark calculations, numerical accuracy is primordial. This is ensured by very high plane wave cutoffs, dense grids and minimal integral screening. The L-edge calculation has spin-orbit coupling enabled and therefore requires both singlet and triplet calculations (default is singlet only).
The input file in section 4.3 is representative for the ADMM acceleration benchmark study. It is essentially the same as the first two, with the addition of of the &AUXILIARY_DENSITY_MATRIX_METHOD section and AUX_FIT auxiliary basis sets.
The last two input files, in sections 4.4 and 4.5, are representative of periodic boundary calculations such as those for extended system and scaling studies, respectively. They are characterized by lower plane wave cutoffs, less dense grids and increase integral screening. Note that because of the PBCs, the truncated Coulomb potential is used. CO