Tracing potential energy surfaces of electronic excitations via their transition origins: application to Oxirane

We show that the transition origins of electronic excitations identified by quantified natural transition orbital (QNTO) analysis can be employed to connect potential energy surfaces (PESs) according to their character across a widerange of molecular geometries. This is achieved by locating the switching of transition origins of adiabatic potential surfaces as the geometry changes. The transition vectors for analysing transition origins are provided by linear response time-dependent density functional theory (TDDFT) calculations under the Tamm-Dancoff approximation. We study the photochemical CO ring opening of oxirane as an example and show that the results corroborate the traditional Gomer-Noyes mechanism derived experimentally. The knowledge of specific states for the reaction also agrees well with that given by previous theoretical work using TDDFT surface-hopping dynamics that was validated by high-quality quantum Monte Carlo calculations. We also show that QNTO can be useful for considerably larger and more complex systems: by projecting the excitations to those of a reference oxirane molecule, the approach is able to identify and analyse specific excitations of a trans-2,3-diphenyloxirane molecule.


I. Introduction
Kohn-Sham density functional theory (DFT) [1][2][3][4] can accurately reproduce many properties of molecular and solid state systems 5-7 as compared to experimental methods and higher-level theory, e.g. molecular structures, vibrational frequencies, atomisation energies and ionisation potentials. Meanwhile, its time-dependent extension, TDDFT [8][9][10][11] , particularly in the linear response (LR) formalism 10 , can deliver properties of excited states and optical absorption spectra of molecular systems. DFT and TDDFT calculations are increasingly able to treat large, complex biological systems, such as proteins, enzymes, and nucleic acids, due to the increasing availability of large computational facilities in recent years [12][13][14][15] . However, there remain serious obstacles to such uses: the intrinsic cubic scaling of traditional KS-DFT with respect to the number of electrons; the explosion of the size of the available configurational phase space associated with large, flexible molecules; and the challenges of accurate electrostatics of solvated systems 16 . When excited state properties are considered, further challenges come into play such as the proliferation of spurious charge transfer excitations, and the difficulty of identifying connected excited states on a potential energy surface.
Recent advances in linear-scaling (LS)-DFT [17][18][19][20][21][22][23][24] have made the study of very large molecular systems much more feasible. LS-DFT relies on the exponential decay of the density matrix, ρ(r, r ), with respect to |r − r | in a system with a finite band-gap [25][26][27] . In such systems, the density matrix, expressed using local orbitals, can be truncated by value or by range to render it local. In this work we use the ONETEP code 24 , where a minimal number of local optimisable functions, called nonorthogonal generalised Wannier functions (NGWFs), are optimised in situ to best represent the density matrix of the system. Recently, Linear-Response TDDFT Tamm-Dancoff approximation (TDA) calculations have also become available in ONETEP, using a similar density matrix formalism 28 . Such methods can provide valuable knowledge about the electronic excitations of very large molecular systems such as proteins and DNA segments. However, the low symmetry and complex geometry of biological systems exacerbates the aforementioned problem of tracing the potential energy surface of specific excited states. This work therefore focuses on quantified natural transition orbital (QNTO) analysis, recently proposed by Li et al. 29 , which offers progress towards this goal.
QNTO analysis rewrites the transition vectors of an electronic excitation identified by a theoretical calculation, e.g. TDDFT, in terms of the natural transition orbitals (NTOs) 11,30,31 . The projection of hole (NTOn-H) and electron (NTOn-E) orbitals of the nth NTO pair onto a standard orbital basis set such as natural bond orbitals (NBOs) 32-34 then helps quantify the transition origins of the NTOs of the excitation. Alongside the absorption energy and oscillator strength, knowledge of these transition origins helps provide a physical interpretation of the role of a given excitation dominated by single excitation character. Similar analysis may also be developed for single de-excitations, which is not as important in generic biological systems where de-excitations are usually negligible 11 , signified by the good approximation of TDDFT/TDA to TDDFT. In addition, even in large and complex systems, electronic excitations may involve numerous electronic transitions between valence and conduction orbitals; using NTOs as the orbital basis, the number of dominant electronic transitions is often reduced to one or two, helping clarify what transition components contribute to the electronic excitations. QNTO analysis can help to assess the variation of transition origins determined by the different theoretical methods available in an electronic excitation calculation. For example, this approach has been used to assess the per-formance of different exchange-correlation functionals in predicting transition origins by comparing to a high-level theoretical calculation 29 . The variations of transition origins caused by the presence of different molecular environments and/or different molecular geometries can also be investigated. For example, QNTO has been used to study the role of the DNA backbone in mediating the transition origins of electronic excitations of B-DNA 35 .
In the present paper, we introduce theoretical background to Linear-Response calculations in ONETEP (Sec. II), and show that QNTO analysis can be implemented with a non-orthogonal basis set such as the NGWFs used in ONETEP (Sec. III). In Sec. IV we then demonstrate two applications of LR-TDDFT/TDA/QNTO: (1) studying the photo-driven ring opening reaction of oxirane, and (2) identifying and comparing similar excitations between trans-2,3-Diphenyloxirane and oxirane. We demonstrate that QNTO analysis can be readily used to locate state crossings that are otherwise unclear from plotting potential energy curves. The results are consistent with previous experimental and theoretical studies, showing that the present method is a feasible and reliable approach for investigating excited state related reactions.

II. LR-TDDFT in ONETEP
Linear-scaling DFT as implemented in ONETEP 24,36 relies on a reformulation of the single-electron density matrix in terms of local orbitals, as follows where ψ p (r) are single-particle eigenstates with occupation f p , while φ α (r) are spatially-localised nonorthogonal orbitals. K αβ is a generalised matrix representation of the occupancies, known as the density kernel. Note that implicit summation over repeated Greek indices is used throughout this work. The density kernel is defined by K αβ = φ α |ρ KS |φ β , withρ KS being the KS singleparticle density operator. The nonorthogonal orbitals {φ α } and their duals {φ α } form a biorthonormal basis set such that φ α |φ β = δ β α . ONETEP employs a minimal number of NGWFs, each expressed in terms of periodic bandwidth limited delta functions (psincs) centred on the grid points of the simulation cell, with a spacing determined by a kinetic energy cutoff akin to that in a plane-wave code. 37 NG-WFs are localised to atom-centred spherical regions defined by a cutoff R φ . A total energy calculation proceeds by minimising the total energy simultaneously with respect to the matrix elements of the density kernel and the expansion coefficients of the NGWFs in terms of the psinc functions, via nested loops of conjugate gradients optimisations 38 . NGWFs are initialised to pseudoatomic orbitals which solve the pseudopotential for a free atom. We confine our attention to closed-shell systems (even number of electrons) with non-fractional occupancy, and the total electronic energy of the real system is written as The factor 2 accounts for the spin degeneracy and E DC is the standard "doublecounting" correction. An FFT box technique 36 is employed to allow one to construct matrix elements and density components in O(1) computational effort independent of system size.
As described in detail in Ref. 39, the minimisation is carried under the constraint of conserved total electron number, where S {v} αβ = φ α |φ β , and also the constraint of idempotent density matrix, which is equivalent to requiring the orthonormal orbitals have occupancy of 0 or 1. An important advantage of optimising the NGWFs in situ is that basis set superposition error (BSSE) is eliminated. 40 Sophisticated algorithms have been implemented so that a high performance can be achieved running on parallel computers 41 . Because the NGWFs are optimised in situ to describe the occupied part of the density matrix, they do not in general provide a good description of unoccupied states. Therefore, we instead generate and optimise in situ a second set of NGWFs, denoted as {χ α }, obtained by minimising the bandstructure energy associated with a chosen range of low-energy conduction band states, having first projected out the valence states. The bandstructure energy to be minimised is written as where H {c} αβ = χ α |Ĥ proj |χ β and K αβ {c} = χ α |ρ {c} |χ β . A full description of this procedure can be found in Ref. 42.
Recently Zuehlsdorff et al. proposed a formulation for linear-scaling LR-TDDFT/TDA (hereafter simply denoted as TDDFT) calculations, which uses the conduction and valence NGWFs as a basis for describing the response densities of electronic excitations 28 . This approach adapts the Casida formulation for finding transition vectors, X I , and excitation energies ω I by using the TDDFT equation, AX I = ω I X I , to a nonorthogonal local orbital basis. The quantity sought is the sum of the energies of the N ω lowest excitations A conjugate gradients minimisation of Ω is performed subject to the constraint of keeping the eigenvectors orthonormal, expressing each transition I using a transition density kernel K {1}I The transition density kernel, K αβ and the Hartree-exchange-correlation self-consistent field matrix is defined by: SCF (r) being the Hartree and XC response to the transition density as defined in Eq. 9 of Ref. 28.
The minimisation of Eq. 6 proceeds as described in Ref. 28. To avoid unwanted valence-valence transitions, we project out these unwanted components by constructing the kernel as Additionally, we define a combined NGWF set by amalgamating valence and conduction NGWF sets to create a joint set, {ζ α } = {φ α } ∪ {χ α }. Using this representation, and the corresponding overlap matrix S {j} , in place of {χ α } and S {c} , means that we no longer need the explicit representation of the conduction space density operator. Instead, the conduction space projector can be replaced by subtracting the valence density operator from the identity operator within the subspace of the full joint NGWF representation. This defines where S {jv} αβ = ζ α |φ β and S −1 {j} is the inverse of S {j} . We also note that in the TDDFT the KS-system is linked to the real-system by where |Ψ 0 and |Ψ I are the 'exact' many-body wavefunctions that are obtained from diagonalising the exact Hamiltonian in the configuration space. The linear dependence among the products {ψ * v (r)ψ c (r)} becomes likely as the orbital basis approaches completion 43 and the simple assignment Ψ 0 |â † vâc |Ψ I = x I cv may not be justified. Nevertheless, this assignment would become reasonable 10 if the number of orbitals involved in Ψ 0 |â † vâc |Ψ I is not too large, e.g. for low-lying excitations, and therefore the study of transition origins becomes possible via directly analysing x I cv or K αβ {1}I from the TDDFT calculation.

III. Quantified natural transition orbital (QNTO) analysis
In this section, we first introduce the basic formulation of QNTO analysis in terms of an orthonormal orbital basis such as KS-orbitals. We then detail how it is implemented in ONETEP based on a non-orthogonal NGWF basis. The standard notation ' †' is used throughout to denote conjugate transpose.

A. Natural transition orbitals
The NTO basis is an orbital basis that makes Σ = U † X I V diagonal, with non-negative real numbers √ λ n on the diagonal. In this representation the transition density operator is simplified to require summation over just one index: N N denotes the number of non-zero values √ λ n , which are ranked by magnitude.
√ λ n are called the singular values from the singular value decomposition (SVD) of X I = UΣV † . We also see that the U and V respectively diagonalise X I X † I and X † I X I , with the same eigenvalue set {λ n }, where X I X † I and X † I X I are density matrices for the electron and hole orbitals, respectively.
In TDDFT calculations in ONETEP, the local orbitals (NGWFs) are not an orthonormal basis. If the NTOs are expressed in terms of NGWFs as X I is then rewritten as and the target for decomposition becomes If we hope to directly employ an SVD algorithm for orthonormal basis, we would have to first construct Once the singular vector matrices MU and NV are obtained, we would then need to calculate M −1 and N −1 in order to solve for U and V. However, since the square root and inverse of a sparse matrix are in general much less sparse, their calculation would be very costly when dealing with large matrices. As an alternative approach, we can instead choose to diagonalise the X I X † I and X † I X I as shown in Eq. 13.
equations can be derived as Note that in the SVD of X I = UΣV † , the relative phase between each pair of left (NTOn-E) and right (NTOn-H) singular vectors are fixed: if the left vector is multiplied by (−1), the right vector must be multiplied by (−1) to remain a solution. Hence, the relative phase between each pair of eigenvectors needs to be determined after solving Eq. 16. The following relation from Eq. 15 can be used: If we define a matrix d n± as there is the relation: We may start from NTO1 and proceed iteratively through higher NTOs, checking which of the associated two eigenvectors obtained in Eq. 16, denoted as U α n and V β n , produces the smaller value in Eq. 19. If not, we simply multiply the whole vector V  to obtain a correct pair of V β 1 and U α 1 . As this is a separate operation from the calculation of √ λ n (Eq. 16), we can first solve for a number of eigenvalues n, where n ≤ N N , and subsequently define a number m, where (m ≤ n) of these where the associated singular vectors are of interest. This can reduce the computational effort compared to directly running an SVD algorithm given that many excitations will be dominated by just NTO1 or a small number of NTOn pairs.

B. Interpretation of the transition origins of NTOn electron-hole pairs
After obtaining the NTO-based transition vector element to any NTOn, n ≤ N N , next we want to interpret the transition origins of each NTOn electron-hole pair from the projection to a standard orbital set. In the present work, we use the natural bond orbitals (NBOs) 34 as the standard orbital set. By such a projection the composition of NTOn-H and NTOn-E can be quantitatively defined, and, for example, the precise contribution of an orbital of interest to an excitation can be computed. We may be interested in the hole-orbital involved in a 1 ππ * excitation of a molecule, and can compute its quantitative involvement in an excitation of the same molecule embedded in a chemical/physical environment with/without covalent bond formed.
Note that the transition vector, X I , can be based on different sets of reference orbitals in different calculations. Rewriting the transition vector using the unique NTO basis removes this reference dependence. Projection of NTOs onto a common orbital set such as NBOs then provides a unification of interpretative orbital set for these NTOs. For example, excitations in two systems of different environments and/or geometries can be compared. Using the root mean square deviation, σ, of the projection coefficients with respect to a common standard orbital set, two NTOs can be compared 35 . The transition origins of each excitation can be seen as its fingerprint, helping to identify excitations of similar character. The potential energy surfaces of excited states can then be constructed. We also note that the similarity of two NTOn-H(E) orbitals can be quantified by the magnitude of projection from one to the other. The projection sign can be arbitrary, as switching the overall phases of both NTOn-H and NTOn-E orbitals at the same time will result in them remaining a singular vector pair in Eq. 15. We will use this method for comparing two NTOn-H(E) orbitals in the present work.
We generate NBOs by applying the NBO 5.0 44 program as a postprocessing step after the ONETEP calculation. A relatively large atomic orbital (AO) basis set is used in ONETEP for generating NBOs, which reduces the electron spillage to a low level (typically 0.006%). In the systems studied here, the HOMO energy deviation is less than 0.03 eV when compared to a calculation using standard in-situ optimised NGWFs.
As an example we can examine the 2nd excitation of an oxirane molecule using TDDFT/QNTO. The geometry of the oxirane molecule, with CCO angle=100 • , is taken from a scan along the CCO angle as will be discussed in greater detail in Sec. IV. There are 9 valence (occupied) KS-orbitals in the calculation, and therefore the HOMO(LUMO) is indexed 9(10) etc. The transition density operator expressed in the KS-orbital basis results from summation over several pairs of orbitals: By contrast, in the NTO basis it is mostly accounted for by a single NTO pair: The transition density operator is dominated by the NTO1 transition, which accounts for about 97% of the total transition density, whereas in the KS-orbital basis the largest component, |ψ 10 0.72 ψ 8 |, only accounts for about 51% of the density. In Fig. 1 we plot the NTO1-H and NTO1-E orbitals as well as the most dominant NBOs for each.

IV. Results and Discussion
In this section, we discuss two applications of QNTO. In subsection A, we focus on tracing the potential energy surfaces associated with the ring opening process In all calculations a cut-off Coulomb approach 45 has been employed to avoid the influence of periodic images from neighbouring simulation cells. A minimal set of 1 NGWF for H atom and 4 NGWFs for C, N, and O atoms is used. All calculations are well-converged with respect to the variational parameters of the localisation radii: suitable values are 10 bohr for valence NGWFs and 16 bohr for conduction NGWFs. The kinetic energy cutoff for the grid spacing is well-converged at 1000 eV. Normconserving pseudopotentials are used to replace core electrons. The (A)LDA functional in the Perdew-Zunger parameterisation 46 is used throughout.    (Fig.  3).

A. Photoinduced ring opening of oxirane
The Gomer-Noyes mechanism, as depicted in Fig. 2, was postulated and experimentally confirmed 47,48 as a way to explain the photoinduced ring opening process of the oxirane molecule. More discussions about this textbook molecule of stereochemistry can be found, for example, in Ref. 49. In particular, the detailed statespecific information of the Gomer-Noyes mechanism has been examined in some detail through TDDFT excited state molecular dynamics (MD) simulations validated by high-quality quantum Monte Carlo calculations 50,51 , and several states have been found to be involved in mediating the ring-opening reaction.
Here, we perform a constrained optimisation scan along the CCO bond angle of oxirane. We relax all internal coordinates except for a fixed CCO bond angle, which is set to values in the range 65 • to 110 • . For each oxirane geometry in the scan, a TDDFT calculation is performed followed by QNTO analysis to obtain the transition origins of the first few low-lying excitations.
The potential energy curves of the low-lying singlet and triplet states are plotted in Fig. 3. For reference we also show the negative HOMO energy and the value of the HOMO-LUMO gap at each angle. Note that in TDDFT, use of the TDA avoids the problems associated with singlet and triplet instabilities 52 . The former can occur for systems which have strong static correlation, signified by the vanishing HOMO-LUMO gap, whereas the latter can occur for systems which have an unstable closed-shell ground state, reflected by a negative triplet excitation energy within the TDDFT/TDA. A caveat is that the calculation quality would deteriorate for those excitations of energies higher than the negative HOMO energy. As this tends to be underestimated by the TDDFT/LDA 53 , one may obtain fewer bound states than are present in the real system.
We note that from the excitation energies alone there

NTO1-H NTO1-E NTO1-H NTO1-E
FIG. 6. The corresponding potential energy curves for the map shown in Fig. 5 To put this in a form suitable for locating CX points, we construct a map (Fig. 4) showing the projections of the NTO1 pairs for each of the first three excitations (S 1 , S 2 , S 3 ) at each geometry onto their equivalents at different geometries, forming a 3 × 3 block for each pair of geometries. We refer to the geometry listed down the row of the map as the 'System' (Sys) and compare it a 'Reference' (Ref) geometry along the columns. To obtain a meaningful projection while accounting for the geometry change, the NTO1-H(E) orbitals are translated from the Sys-geometry to the Ref-geometry and renormalised before being used for projection.
The optimised atom-centred NGWFs |φ α;r from the calculation in the Sys-geometry are translated to the corresponding atom centres in the Ref-geometry to produce NGWFs |φ α;s→r . The NTO1-H orbital (Eq. 14) is then re-expressed in terms of these using the original coefficients V The translated orbital is then renormalised:

NTO1-H NTO1-E
FIG. 8. The corresponding potential energy curves for the map shown in Fig. 7. Unlike the case in Fig. 5, the switch of dominant NTO character cannot be seen from the plot of NTO1-H and NTO1-E, reflected also by the map (Fig. 7) where the projection results among 65, 66, and 67 cannot be seen to reveal a switch due to a smoother transition of dominant NTO1 character.
from the other is higher than 50%. One then sees where there are switches of transition origins of these adiabatic excited state curves, or changes of transition character of all excited states, from one block to another. The former case can indicate that there has been one or more CXs passed between the two excitations. The latter would be caused by a character change of the ground state itself. This can be due to the fact that a CX between S 0 and S 1 has been passed, e.g. 105 and 110, or that the molecular structures have been drastically different, e.g. 105 and 65.
One can further subdivide the geometry scan to precisely locate a given CX point. We demonstrate two such examples of switches of excitation which show subtly different behaviour. We also note that due to the scheme of TDDFT and the approximations used, the behaviour of TDDFT around CXs can be incorrect 55 , e.g. having too rapid energy variation. TDDFT has, however, been able to offer a reasonable description of oxirane photochemistry 50 , and we stress that the current method can help us locate those reaction-facilitating CX points, whose precise locations, if one is interested, may further be searched for using other high-level theoretical methods 50 . Fig. 5 shows an example where the switch of excitation character previously shown from the coarser map to occur between 60 • and 65 • is narrowed down to occurring abruptly between 62 • and 63 • . The corresponding potential energy curves for the two states involved, and the NTO1-H and NTO1-E orbitals before and after the switch, are plotted in Fig. 6, showing that the bands approach very closely around this point.
In other cases, however, when a more closely spaced geometry map is considered, the change of excitation character may be so smooth that there is no abrupt character switch occurring between two slightly differentiated geometries is found. An example is shown in Fig. 7. The two adiabatic excited state curves as plotted in Fig. 8 also do not come closer than 0.2 eV to each other. However, as the transition characters have switched for the two adiabatic excited states curves between CCO=65 • − 70 • , there must be a CX located nearby, even if it does not occur exactly along the scanned coordinate. For further analysis, the NTO1-H(E) orbitals of excitations from a geometry somewhat further away from those being considered can be used as a reference set, similar to a diabatic configurations set used for locating CXs 54 . For the map shown in Fig. 7, the NTO1-H and NTO1-E orbitals of the first two excitations at CCO=70 • can provide an appropriate reference set. We thus project the NTO1-H and NTO1-E orbitals at varying geometries onto these reference orbitals, giving: . (25) This shows that the two reference electron orbitals from CCO=70 • switch domination between CCO=65 • and 66 • , which could not have been determined from Fig.  8 alone.
We note that the exact point where the domination switches occur can be somewhat affected by the choice of reference orbitals used, although this does not in general prevent us from tracing down the CX to a small region in the scanned coordinate. In the language of diabatic configurations 54 , the situation can be analysed by considering the matrix elements of a HamiltonianĤ acting on a reduced subspace consisting of just two such configurations, denoted as |Φ 1 and |Φ 2 . The switch of domination of two diabatic configurations as a given geometry path is traversed indicates that the surface H 11 −H 22 = 0 has been crossed, though the exact shape of the surface of H 11 −H 22 = 0 will be affected by the reference diabatic configurations used.
For a CX between S 1 and S 0 , on the other hand, it would be reflected in the map by the fact that a whole column of excitations change transition character at the same time, because the ground state itself on which excitations are based has changed character. We are able to locate two such points in Fig. 4: between 107 • and 108 • , and between 111 • and 112 • , both related to the change of the ground state character expected from the Gomer-Noyes mechanism.
After locating the regions where two adiabatic excited state curves switch their characters, we can plot the adiabatic singlet states again, with the curves for the various excitations 're-connected' based on their transition origins, as shown in Fig. 9. State labels of 1 (n,3p z ), 1 (n3s) and 1 (n,3σ) based on the symmetry at the CCO=60 • geometry and propagated to other geometries based on the connection of the states.
It then becomes clear that population of the S 3 state at CCO=60 • can directly lead to a crossing between S 0 and S 1 . The energy decreases monotonically along with the widening CCO angle, which will result in a rapid ring opening reaction. This is consistent with the previous excited states MD study 50 which found that the population of the 1 (n,3p z ) state would facilitate the oc-TABLE II. Decomposition of NTO1-H(E) into NBOs for the first three excitations of oxirane at CCO=60 • . N indicates the excitation number. Proj denotes the projection result. NBO type gives information on the NBO feature. For each excitation, the first 6 ranked NBOs of lone pair (LB), bonding (BD), and anti-bonding (BD * ) types have been listed. The highly basis-dependent Rydberg-type NBOs (RY * ) that accounts for most of these excitations are not listed. Note that the projection magnitude of the 6th-ranked NBO has gone down to below 0.1 for all cases. Total means the total density contribution (square sum) of all the 6 NBOs listed. These NBOs are plotted in Fig. 10 currence of ring opening process. If the 1 (n,3s) state is populated instead, it has to overcome an energy barrier to a widening of CCO angle before meeting a crossing with the 1 (n,3p z ) state which can then lead to the ring opening of oxirane. Thus, this chemical process is more difficult and would produce a lower quantum yield. We identify that the ground state is re-populated from the 1 (n,3p z ) state after oxirane passing the critical point between 107 • and 108 • . Meanwhile, passing the critical point between 111 • and 112 • leads to another change of ground state character, which corresponds to a proton transfer process producing acetaldehyde, the product in the 3rd stage of the Gomer-Noyes mechanism (Fig. 2). This is also consistent with the MD study 50 finding that the proton shift occurs after the oxirane has been de-excited to the ground state. Under thermally elevated conditions this proton shift will likely occur at lower CCO angles: here we address only the quasistatic constrained geometries. A follow-up reaction breaking the C-C bond, the final stage in Fig. 2, would then be possible while remaining in the ground state, given an appropriate combination of inter-

Ref: Sys:
FIG . 11. A plot of the system molecule, trans-2,3-Diphenyloxirane, and the reference oxirane molecule that is extracted from the trans-2,3-Diphenyloxirane. The five atoms labelled are the common (core) atoms. nal motion 50 .
To demonstrate the chemical composition of these transitions, a decomposition of NTO1-H and NTO1-E into Natural Bond Orbitals for the first three excitations at CCO=60 • is shown in Table II. The AO basis used for generating NBOs is fairly large, producing 525 NBOs, but these are mostly of Rydberg (RY * ) type. In practice, the lone-pair (LP) and bonding (BD) NBOs describe the majority of the NTO1-H excitation, which in all three cases is well-described by just five main NBOs. For the NTO1-E orbitals of the three excitations, however, a substantial contribution comes from RY * orbitals, which are highly dependent on the AO basis set used in generating NBOs. The LP and BD NBOs for NTO1-H, and the most relevant BD * NBOs for NTO1-E are shown in Fig.  10.

B. Trans-2,3-Diphenyloxirane
In this subsection we focus on a significantly larger system, the trans-2,3-Diphenyloxirane molecule, which contains the same CCO subsystem as oxirane as its central motif. We show that we can use the QNTO method to compare the excitations with those of an oxirane molecule. As would be expected, molecules with different physical/chemical environments can have distinct properties. For example, aryl substitution of oxirane can lead to a different ring opening mechanism 56 . The fact that the trans-2,3-Diphenyloxirane, referred to as the system (Sys) molecule, is formed by two aryl substitutions on oxirane makes it a good example for examining the influence of chemical environment on the excitations of the core oxirane moiety using TDDFT/QNTO.
The structure of the Sys-molecule is first fully optimised without constraints. A equivalent geometry for the oxirane molecule, referred to as the reference (Ref) molecule, is then extracted from the optimised Sysmolecule by replacing the two aryl substitutents with hydrogen atoms and optimising the C-H bond lengths, at fixed bond angles. The widespread problem that the energy of charge-transfer excitations is underestimated by local density functionals such as the LDA 11 means in this case that we expect to find large numbers of chargetransfer excitations in the energy window of interest. In followed by renormalisation as in Eq. 23 to obtain |s , |sc , and |rc , respectively. The "s→r" in Eq. 26 denotes that the orbital is translated from the Sys-molecule to the Ref-molecule. U α 1 and V α 1 are defined by setting respectively the coefficients U α 1 and V α 1 in Eq. 14 to zero for those NGWFs not centred on the core atoms; the subscript 's' for Sys-excitations and 'r' for Ref-excitations indicate that the original NGWF coefficient matrices have been used.
By evaluating the overlap rc|sc between an NTOn-H or NTOn-E orbital of the Sys-molecule and the corresponding NTO of the Ref-molecule, one can then tell √ 3 for both the NTO1-H and NTO1-E. The first and second number in a "(.)" is the projection result (%) for NTO1-H and NTO1-E, respectively. 00 denotes 100. Numbers shown in boldface are the excitations plotted in Fig. 12. Given the freedom of orbital shape outside the core region, it would be no surprising to see that the 2nd and 4th Ref-excitation match with multiple Sys-excitations.
whether there is a strong overlap of the character in the core region. In addition, the overlaps sc|s and rc|r allow one to determine how large a proportion of the total excitation is accounted for by the component within the core region.
In the general cases where excitations are dominated by the NTO1, one can then say that the two excitations from different molecules are similar if the following three conditions hold: (1) a Sys-excitation must have a substantial sc|s for both NTO1-H and NTO1-E, (2) a Refexcitation must have a substantial rc|r for both NTO1-H and NTO1-E, and (3) the magnitude of rc|sc must be large.
We find that of all the 100 excitations of the larger system, there are 16 whose cNTO1-H and cNTO1-E respectively match with the cNTO1-H and cNTO1-E of one or more Ref-excitations. We choose a threshold of rc|sc ≥ 1/ √ 2 for both cNTO1-H and cNTO1-E. Overall there are 20 matching combinations, as some excitations overlap significantly with several excitations of the reference system. The 1st, 2nd, 4th, and 8th Ref-excitation are involved in these matching pairs.
If the threshold is lowered to rc|sc ≥ 1/ √ 3, the number of involved Sys-excitations becomes 25 and the number of matchings becomes 35. The 3rd and 7th Refexcitation also now match with some Sys-excitations. All the 35 excitation matchings are listed in Table III.
In Table III, we see that | rc|r | ≥ 1/ √ 2 for both NTO1-H and NTO1-E of the 1st-6th and 8th Refexcitation. As for sc|s , of all the involved 25 Sysexcitations discussed above only the 34th Sys-excitation (hereafter denoted as Sys-34) satisfies | sc|s | ≥ 1/ √ 2 for both NTO1-H and NTO1-E. In other words, for the re-maining 24 Sys-excitations there are ≥ 50% density contributions of NTO1-H and/or NTO1-E which come from outside of the core region. This would be unsurprising as there remain 22 atoms outside of the core region compared to 5 atoms in the core, and the de-localisation of the density contribution for many Sys-excitations is anticipated. If we lower the threshold to | sc|s | ≥ 1/ √ 3 for both NTO1-H and NTO1-E, the Sys-37 and 44 also fall into the set. Some excitation properties of the Sys-34, 37, and 44 along with two more excitations, Sys-71 and 81, respectively for comparing with the Ref-08 and Ref-03, are listed in Table IV and the NTO1-H(E) orbitals plotted in Fig. 12. It can be seen in Table IV that the NTO1 transition vector component, √ λ 1 , has generally decreased for the larger-sized Sys-molecule, although the NTO1 still accounts for at least 60% density contribution.
In Fig. 12 we see that the NTO1-H orbitals are essentially the same for the first 7 Ref-excitations. Moreover, the core parts of the Sys-excitations, Sys-34, 37, 44, and 81, fully coincide with the core parts of this NTO1-H orbital shared by the first 7 Ref-excitations, namely, | rc|sc | = 1.00. The differences between Sys-excitations and Ref-excitations mainly come from the more diffuse NTO1-E orbitals.
Strictly speaking, only the Sys-34 has a significant orbital component of NTO1-E coming from the core region, namely | sc|s | 2 ≥ 0.5. In Table IV Since the LDA can be inaccurate for reproducing the transition origins of molecular excitations, as demonstrated on thymine and thymidine molecules 29 , it would be interesting to perform a similar study using selfinteraction error corrected or reduced functionals incorporating the Hartree-Fock (HF)-exchange, such as the ω97X 7,29 , in the near future 57 , or using some technique to confine electronic transitions to a small spatial range to counteract the de-localisation error from the LDA.

V. Conclusions and future work
In this work we have developed QNTO analysis based on a non-orthogonal basis, as implemented in the linearscaling DFT/TDDFT code ONETEP. We use the QNTO method to analyse the transition vectors of electronic excitations of various molecular systems offered by TDDFT calculations.
We first studied the photo-induced ring-opening process of the oxirane molecule. Through a scan along the reactive CCO angle of an oxirane molecule followed by TDDFT calculations, a profile of several low-lying adiabatic excited state curves on which the excited state driven reaction may be carried can be constructed. However, these adiabatic excited state curves ranked by energies do not give us very much knowledge on how the reaction will proceed. We stress that the reaction process can be made much clearer if the excited state potential energy curves are constructed according to their excitation character instead of their energies. We demonstrated that the transition vectors of these low-lying excitations are dominated by the first NTO electron-hole pair (NTO1), and therefore the transition origins of the NTO1 can be used to represent the character of these excitations. Based on this approach, we have successfully located several special geometrical points along the scanned CCO coordinate where the transition origins of two adiabatic excited state curves switch, thus indicating that there must be a conical intersection located nearby. By reconnecting the adiabatic excited state energy points (calculated by TDDFT) based on transition origins instead of energetic ordering, we are then able to depict how the excited state-driven ring-opening process can proceed. This picture is consistent with the previous excited state molecular dynamics study that found that the population of the 1 (n,3p z ) state facilitates the ring-opening process, while the population of 1 (n,3s) state does not. We have demonstrated that the 1 (n,3p z ) state exhibits a monotonicallydecreasing energy curve along the widening CCO angle, leading to a ring-opening reaction.
QNTO analysis has also been employed to study an aryl-substituted oxirane, the trans-2,3-Diphenyloxirane molecule. The results demonstrate that the approach can be used to identify specific excitations among a large number of other excitations of a large/complex molecular system. By comparing the excitations of this larger molecule to those of oxirane, we have shown that a small number of mutually similar excitations can be identified and compared between the two molecules. The matching conditions that define similar excitations can be tuned by the magnitudes of: (1) the projection result of the core orbitals, cNTO1-H(E), between Sys-excitations and Refexcitations; (2) the proportion of the cNTO1-H(E) accounting for the whole NTO1-H(E) in the Sys-excitation; (3) the proportion of the cNTO1-H(E) accounting for the whole NTO1-H(E) in the Ref-excitation. For example, a Sys-excitation can have a similar cNTO1-E to that of a Ref-excitation, but the cNTO1-E of the Sys-excitation only accounts for a tiny proportion to the whole NTO1-H(E) of Sys-excitation. In such a case, the orbital-section of the NTO1-E of Sys-excitation coming from outside of the core region cannot be ignored and the two excitations can not be assigned as similar excitations. The same approach for finding similar excitations through the transition origins would be particularly powerful when there are numerous excitations in a large molecular system. For example, those excitations that are located in a specific geometric region can be readily identified, helping to analyse the chemical properties of that region in a specific energy range of light absorption.

VI. Acknowledgement
The authors would like to thank Louis Lee for very helpful discussions. All calculations in this work were carried out using the Cambridge HPC Service under EP-SRC Grant EP/J017639/1. JHL acknowledges the support of Taiwan Cambridge Scholarship. TJZ acknowledges the support of EPSRC Grant EP/J017639/1 and the ARCHER eCSE programme. NDMH acknowledges the support of the Winton Programme for the Physics of Sustainability.