Sajal Kumar Giri*, Lazaro Alonso*, Ulf Saalmann* and Jan Michael Rost*
Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, Dresden, 01187, Germany. E-mail: sajal@pks.mpg.de; lalonso@pks.mpg.de; us@pks.mpg.de; rost@pks.mpg.de
First published on 15th December 2020
We have constructed deep neural networks, which can map fluctuating photo-electron spectra obtained from noisy pulses to spectra from noise-free pulses. The network is trained on spectra from noisy pulses in combination with random Hamilton matrices, representing systems which could exist but do not necessarily exist. In [Giri et al., Phys. Rev. Lett., 2020, 124, 113201] we performed a purification of fluctuating spectra, that is, mapping them to those from Fourier-limited Gaussian pulses. Here, we investigate the performance of such neural-network-based maps for predicting spectra of double pulses, pulses with a chirp and even partially-coherent pulses from fluctuating spectra generated by noisy pulses. Secondly, we demonstrate that along with purification of a fluctuating double-pulse spectrum, one can estimate the time-delay of the underlying double pulse, an attractive feature for single-shot spectra from SASE FELs. We demonstrate our approach with resonant two-photon ionization, a non-linear process, sensitive to details of the laser pulse.
In a different vein, a trained neural network has been proposed to represent a (semi-)classical path integral for strong-field physics,10 removing the need to explicitly calculate a large number of classical trajectories to determine the photo-ionization cross section. The result is, however, still an approximation, as it is constructed semi-classically. Supplying training data for a network that can represent the full quantum path integral would most likely require a numerical effort higher than that of calculating the observables directly.
In general, training of a deep neural network needs a very large amount of non-trivial training data. To generate them experimentally requires substantial additional effort (see the streaking example above). To obtain such data without serious approximations within theory is often prohibitively expensive, as in the second example.
Acknowledging this situation, we have devised another approach: to calculate exactly and explicitly (with the time-dependent Schrödinger equation) photo-electron spectra with a large number of pulses and artificial systems, for which the calculation can be done very quickly. In this way we are able to supply learning data consisting of about 10⁷ spectra. A network trained with these synthetic systems is able to purify not only noisy test spectra that are unknown to the network but stem from the same class of synthetic systems the training was performed with; “real” spectra can be purified as well, which could come from experiment or, for this work, from a realistic full calculation with parameters for the helium atom. Moreover, noise is, in the context of machine learning, helpful when applied to non-linear photo-ionization: photo-excitation and ionization processes are subject to strict angular-momentum selection rules, thereby limiting the coupling of light to matter. If a light pulse contains noise and operates in a non-linear (at least two-photon absorption) regime, it will couple to a much larger part of the electron dynamics of the target. This helps to train the mapping better and enlarges the pool of training spectra naturally.
In general, all trained networks we will present map one type of spectrum into another (desired) one for a photo-ionization scenario of which only a few key elements need to be specified: the target system should have an excited state around the photon energy ω* above the ground state, and intensities of the light pulse should be such that two-photon processes dominate. It is not necessary to know more about the target system, as ideally all target systems accessible by the light as specified are covered by the learning space for the synthetic systems, represented by synthetic Hamilton matrices (SHMs). Therefore, one can also apply a trained network to an experimental spectrum from noisy pulses without detailed knowledge of the target system.
Once the design for training such networks with SHMs is set up, that is, the spectra for learning have been computed, it is not difficult to construct other maps with new networks, as the major effort is to supply the learning data which do not have to be changed, while training new networks is computationally relatively cheap. This allows us to provide several mappings in the following to predict spectra for ideal double, chirped and even highly structured partially coherent pulses from noisy spectra. Finally, we will introduce network based mapping for a typical SASE FEL situation: there, single-shot noisy spectra are recorded which depend on further, not explicitly known parameters, e.g., the geometrical orientation of the sample or the time-delay of double pulses used. Considering the latter situation, we reconstruct from noisy spectra simultaneously the noise-free spectra and the time-delay of the double pulse. While we cannot do this with the accuracy of the designated algorithms as described in the context of streaking above, we do not need any additional information but the spectrum itself.
The paper is organized as follows: in Section 2 we give details on the representation of the noisy pulses, explain how to construct the SHMs and describe our fast propagation scheme to solve the electronic Schrödinger equation to obtain the photo-ionization spectra. Section 3 details how the network is trained and set up, including measures on how to quantify errors in the reconstruction of spectra and a convenient way to parameterize them. In Section 4 we present the predictions of the photo-ionization spectra for various pulse forms. Section 5 discusses the single-shot FEL scenario. The paper ends with conclusions in Section 6.
There are many different possibilities for incorporating noise into a signal. We choose the partial-coherence method.11,12 With this method one can create noisy pulses whose average over an ensemble has a well-defined pulse shape. As experimentally demonstrated,12 these kinds of pulses represent pulses from SASE FELs well. In the following, we will use the pulse parameterisation
$f(t) = N\,G_T(t)\,F_\tau(t)$, | (1a) |
(1b) |
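As an illustration of how a noisy pulse of the form of eqn (1a) could be generated numerically, the following minimal sketch multiplies a Gaussian envelope G_T(t) with a noise factor F_τ(t). The construction of F_τ(t) from a Gaussian spectral amplitude with random phases, the FWHM convention for T, the unit-energy normalization and the carrier frequency `omega0` are all assumptions made for this example, not details taken from the text.

```python
import numpy as np

def gaussian_envelope(t, T):
    # Assumed convention: T is the full width at half maximum of the envelope.
    return np.exp(-2.0 * np.log(2.0) * (t / T) ** 2)

def noisy_pulse(t, T, tau, omega0, rng):
    """One noise realization f(t) = N * G_T(t) * F_tau(t) on a uniform time grid."""
    dt = t[1] - t[0]
    omega = 2.0 * np.pi * np.fft.fftfreq(t.size, d=dt)
    # Assumed noise model: Gaussian spectral amplitude centred at omega0 with
    # width ~ 1/tau, combined with uniformly distributed random phases.
    amplitude = np.exp(-0.5 * ((omega - omega0) * tau) ** 2)
    phases = rng.uniform(-np.pi, np.pi, size=t.size)
    F_tau = np.real(np.fft.ifft(amplitude * np.exp(1j * phases)))
    f = gaussian_envelope(t, T) * F_tau
    # Assumed normalization N: unit integral of f(t)^2.
    return f / np.sqrt(np.sum(f ** 2) * dt)

# Example call with hypothetical grid and carrier frequency (time in fs).
t = np.linspace(-10.0, 10.0, 4096)
f = noisy_pulse(t, T=3.0, tau=0.5, omega0=30.0, rng=np.random.default_rng(0))
```

Each call with a different random generator state yields a different noise realization under the same envelope.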
Any reasonable pulse can serve as a reference pulse, for which the map created by the network can predict the spectrum. Reasonable means in the present context that the reference pulse’s frequency spectrum is covered by the learning space of fluctuating spectra. The simplest choice is the Gaussian GT(t) in eqn (1) itself rendering the prediction equivalent to removing the fluctuations from the spectrum. Therefore, we call this type of map “purification”.6 In Section 5 we will purify fluctuating spectra from double pulses.
A simple and convenient way to realize this concept is to consider 1D dynamics with a soft-core potential. The corresponding active one-electron Hamiltonian for helium is given by
(2) |
With these eigenstates we calculate the matrix of the time-dependent Hamiltonian H(t) = H0 + A(t) in velocity gauge
(3) |
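$E_\alpha = 3^{[\xi_1-\gamma]}\,\tilde{E}_\alpha$ for $\tilde{E}_\alpha < 0$, $\alpha > 0$, | (4a) |
$V_{0\alpha} = 3^{\xi_2}\,\tilde{V}_{0\alpha}$ for $\tilde{E}_\alpha < 0$, | (4b) |
$V_{\alpha\beta} = 3^{\xi_3}\,\tilde{V}_{\alpha\beta}$ for $\tilde{E}_\alpha < 0$, $\tilde{E}_\beta > 0$, | (4c) |
$V_{\alpha\beta} = 3^{\xi_4}\,\tilde{V}_{\alpha\beta}$ for $\tilde{E}_\alpha > 0$, $\tilde{E}_\beta > 0$. | (4d) |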
The idea of SHM is an essential part of our approach which serves two purposes: (i) it allows us to supply a sufficient number of theoretical learning data for the network and (ii) it represents a large variety of systems which could exist in nature but do not necessarily. The SHM should be “dense enough” in the parameter space such that always the Hamilton matrix of a real system one is interested in can be interpolated between SHMs, as interpolation capability is a strength of neural networks (in contrast to extrapolation). Of course, one can formulate more sophisticated SHMs with more parameters, but for the present case the four random parameters are sufficient.
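A minimal sketch of how one SHM could be drawn from a base system is given below. It assumes that the prefactors in eqn (4) are powers of 3 with random exponents ξ1…ξ4, that the ξ are uniformly distributed in [−1, 1], and that γ is a fixed shift; the function name, the range of the ξ and the handling of couplings not listed in eqn (4) (left unchanged) are hypothetical choices for illustration.

```python
import numpy as np

def synthetic_hamilton_matrix(E_base, V_base, rng, gamma=0.0):
    """Rescale base energies and couplings with four random exponents.

    E_base: energies of the base system, index 0 is the ground state.
    V_base: Hermitian coupling matrix in the same basis.
    """
    xi = rng.uniform(-1.0, 1.0, size=4)        # assumed range of the xi
    E = np.asarray(E_base, dtype=float).copy()
    V = np.array(V_base, dtype=complex)

    bound = np.flatnonzero(E < 0)              # includes the ground state
    cont = np.flatnonzero(E > 0)
    excited = bound[bound > 0]                 # bound states with alpha > 0

    # (4a): excited bound-state energies
    E[excited] *= 3.0 ** (xi[0] - gamma)
    # (4b): couplings between ground state and excited bound states
    V[0, excited] *= 3.0 ** xi[1]
    V[excited, 0] *= 3.0 ** xi[1]
    # (4c): bound-continuum couplings (both triangles, keeps V Hermitian)
    V[np.ix_(bound, cont)] *= 3.0 ** xi[2]
    V[np.ix_(cont, bound)] *= 3.0 ** xi[2]
    # (4d): continuum-continuum couplings
    V[np.ix_(cont, cont)] *= 3.0 ** xi[3]
    return E, V, xi
```

Because every block is scaled symmetrically, the resulting coupling matrix stays Hermitian for any draw of the four parameters.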
Yet, we need to overcome one final obstacle, and that is the calculation of the spectra based on the SHMs. To obtain those spectra for arbitrary pulse forms A(t) requires us to solve the time-dependent Schrödinger equation (TDSE), which in turn implies that we need an extremely fast propagation scheme to be able to solve the order of 10⁷ TDSEs in a reasonable time.
With the time-independent Hamiltonian Hj = H0 + Aj, we can construct a short-time propagator which is valid over a time span δtj short enough such that a fixed Aj is a reasonable approximation. Therefore, the unitary short-time propagator can be obtained by direct integration,
$U_j = e^{-iH_j\delta t_j}$. | (5) |
The full propagator is now simply a concatenation of the short-time propagators over the respective time spans $\delta t_k$ (with $k = 1, \ldots, k_{\max}$) over which the discretized $A_j$ hold, where $\delta t_1 = t_1 - t_i$ and $\delta t_{k_{\max}} = t_f - t_{k_{\max}-1}$.
To make efficient use of the SHMs, it is imperative that we use the matrix elements from eqn (4) as they do not require explicit integration over wave functions. Hence, we diagonalize 〈α|Hj|β〉 = Eαδαβ + jδAVαβ in the basis of H0 to give its eigenenergies Ejγ and eigenfunctions leading to the short-time propagator
(6) |
Note that over the entire pulse A(t) certain Aj may occur more than once with different time intervals over which they are valid (if the local derivative dA(t)/dt|Aj is large, the time interval will be small and vice versa). Therefore it is worthwhile to compute the Ujαβ beforehand and keep them stored. They can be used for all pulses (the fluctuating ones as well as the reference one) for a Hamilton matrix specified by the elements in eqn (4). Furthermore, we do not calculate the full matrix of the propagator, which would involve many matrix products. It is sufficient to propagate the vector |0〉 of the initial state (the ground state of the system), which requires only the computation of matrix-vector products. Only in this way were we able to calculate the millions of spectra necessary to train the network.
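The propagation idea described above can be sketched as follows. The sketch is a simplification under stated assumptions: it uses a uniform time step `dt` and rounds A(t) to the nearest grid value jδA, whereas the text describes intervals of varying length; it caches the eigen-decomposition of each H_j so that repeated A_j are diagonalized only once, and it propagates only the ground-state vector with matrix-vector products. Function and variable names are hypothetical.

```python
import numpy as np

def propagate_ground_state(E, V, A_of_t, dt, dA):
    """Propagate |0> under H_j = diag(E) + j*dA*V with cached diagonalizations.

    E: eigenenergies of H0; V: Hermitian coupling matrix in the H0 basis;
    A_of_t: vector potential sampled on a uniform time grid with step dt.
    """
    H0 = np.diag(E).astype(complex)
    j_indices = np.rint(np.asarray(A_of_t) / dA).astype(int)
    cache = {}                                  # j -> (eigenvalues, eigenvectors)
    psi = np.zeros(len(E), dtype=complex)
    psi[0] = 1.0                                # start in the ground state
    for j in j_indices:
        j = int(j)
        if j not in cache:
            cache[j] = np.linalg.eigh(H0 + j * dA * V)
        w, S = cache[j]
        # Apply exp(-i H_j dt) using matrix-vector products only.
        psi = S @ (np.exp(-1j * w * dt) * (S.conj().T @ psi))
    return psi
```

Storing the eigen-decompositions rather than the finished matrices keeps the cache usable for any step length, since only the phase factors depend on the time interval.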
We calculate nmat = 40000 reference spectra from the same number of SHMs. For each reference spectrum, we calculate npul = 200 spectra (“fluctuating spectra”) from noisy pulses obtained with the partial-coherence method11 using a different noise realization for each SHM. Since solving the TDSE for a single spectrum takes only a few seconds thanks to the highly-optimized propagation scheme outlined in Section 2.4, this procedure can be executed despite the need to solve about 10⁷ TDSEs.
For each SHM, we average over all fluctuating spectra instead of using the individual fluctuating spectra $P_{kl}(E)$ computed from $H_{kl}(t)$, where k labels the Hamilton matrix and l the noisy pulse. We normalize all averaged fluctuating and reference spectra.
The resulting set of 40000 averaged fluctuating spectra constitutes a major part of the learning space to train the networks in Section 4 for the prediction of spectra from different pulse shapes.
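The averaging and normalization step for a single SHM might look like the short sketch below. It assumes that the fluctuating spectra are stored as an array of shape (npul, number of energy points) and that normalization means unit area under the spectrum; both the storage layout and the choice of norm are assumptions for illustration.

```python
import numpy as np

def averaged_normalized_spectrum(P_kl, E_grid):
    """Average the fluctuating spectra of one SHM and normalize to unit area.

    P_kl: array of shape (n_pul, n_E), one row per noisy pulse l.
    E_grid: electron energies on which the spectra are sampled.
    """
    P_avg = np.asarray(P_kl, dtype=float).mean(axis=0)
    # Trapezoidal rule for the area under the averaged spectrum.
    area = np.sum(0.5 * (P_avg[1:] + P_avg[:-1]) * np.diff(E_grid))
    return P_avg / area
```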
(7) |
The network maps the coefficients of the fluctuating spectra to those of the predicted underlying noise-free spectrum, {k} → {Ck}. The goal of the training is to minimize the difference between the predicted vector Ck for the noise-free spectrum and Crefk of the expected reference spectrum. The coefficients allow us to define a difference familiar from vector spaces as
(8a) |
(8b) |
(8c) |
(8d) |
Fig. 1 Sketch of training and use of a deep neural network with synthetic Hamilton matrices and noisy spectra. |
Implemented with the deep-learning library KERAS,13 a fully connected feed-forward artificial neural network is used to establish the mapping. It contains 5 layers with 60 neurons on each and was trained at a learning rate of 0.001 with 100 epochs, a batch size of 200 and a learning patience of 25. Each hidden layer neuron contains a ReLU activation function.14 The Adam optimizer15 is used to minimize the cost function, eqn (8a). The training success is quantified with the error functions eqn (8a) and (8d), which both decay logarithmically with the size of the learning data, typical for deep learning.16,17
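A network with the stated hyperparameters could be set up in Keras roughly as follows. This is a sketch, not the original implementation: the number of spectral coefficients `n_coeff`, the reading of "5 layers" as five hidden layers followed by a linear output layer, the mean-squared-error loss standing in for the cost function of eqn (8a), and the interpretation of the learning patience as an early-stopping patience are all assumptions.

```python
from tensorflow import keras

def build_network(n_coeff, n_hidden_layers=5, n_neurons=60, learning_rate=1e-3):
    """Fully connected feed-forward network mapping coefficients to coefficients."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_coeff,)))
    for _ in range(n_hidden_layers):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(n_coeff))      # linear output layer (assumed)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",                             # stand-in for eqn (8a)
    )
    return model

# Assumed meaning of "learning patience": stop when the validation loss
# has not improved for 25 epochs.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=25, restore_best_weights=True
)

# Training call with hypothetical arrays C_fluct and C_ref of shape (n_samples, n_coeff):
# model = build_network(n_coeff=C_fluct.shape[1])
# model.fit(C_fluct, C_ref, epochs=100, batch_size=200,
#           validation_split=0.1, callbacks=[early_stop])
```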
For further reference and to give an overview how successfully the trained networks can predict spectra for the different pulse shapes from the fluctuating spectra, we show to begin with in Fig. 2 the absolute distance errors (ε ≤ 2) of all predicted spectra. Note that for double pulses, the error decreases with increasing time-delay which is probably to be expected since it is easier to identify the time delay if it is larger. The smallest one Td = 4 fs, basically corresponds to a single pulse (recall that the width of each individual pulse is T = 3 fs). Interestingly, the sensitivity to the amplitude ratios of the double pulses is even larger than to the time delay: the spectrum from a 1st pulse which is stronger than the 2nd one is easier to predict than vice versa, with pulses of equal strength taking the middle position in terms of the error.
Fig. 2 Absolute distance error εtest, eqn (8d), of the different predicted spectra for test data: double pulses with time delay Td and amplitude ratios A1:A2 as indicated, chirped pulses with chirp parameter β, and partially coherent pulses with coherence time τ. |
The strongest sensitivity occurs for spectra from chirped pulses, where the ones with the most positive chirp (β = +3) are twice as difficult to predict as the ones with β = −3. We will come back to this point later. Finally, it is surprising that a spectrum from a partially-coherent pulse, which is naturally very “busy”, can be identified and therefore predicted from the (averaged) fluctuating spectra, even if the coherence time is shorter than that of the noise (τ = 0.5 fs), with similar accuracy as for longer coherence times of the reference spectrum. We will now discuss the spectra from the different pulse forms in detail.
$f_d(t) = N_d\,[A_1 G_T(t + T_d/2) + A_2 G_T(t - T_d/2)]\cos(\omega^{*} t)$, | (9) |
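The double pulse of eqn (9) can be evaluated on a time grid as in the sketch below. The Gaussian G_T(t) is written with T taken as the full width at half maximum, and the normalization N_d is left as a free parameter; both are assumptions, as is the carrier frequency used in the example call.

```python
import numpy as np

def gaussian(t, T):
    # Assumed convention: T is the full width at half maximum.
    return np.exp(-2.0 * np.log(2.0) * (t / T) ** 2)

def double_pulse_envelope(t, Td, A1, A2, T):
    """Envelope A1*G_T(t + Td/2) + A2*G_T(t - Td/2); the first pulse peaks at -Td/2."""
    return A1 * gaussian(t + Td / 2.0, T) + A2 * gaussian(t - Td / 2.0, T)

def double_pulse(t, Td, A1, A2, T, omega0, Nd=1.0):
    """Double pulse f_d(t) of eqn (9) with carrier frequency omega0."""
    return Nd * double_pulse_envelope(t, Td, A1, A2, T) * np.cos(omega0 * t)

# Example: amplitude ratio 2:1, delay 8 fs, width 3 fs (omega0 is hypothetical).
t = np.linspace(-20.0, 20.0, 4001)
f = double_pulse(t, Td=8.0, A1=2.0, A2=1.0, T=3.0, omega0=30.0)
```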
Fig. 3 shows predicted spectra for exemplary double pulses with pulse shapes indicated in gray. Comparison of black and blue curves also helps to develop a sense for what the quantitative distance errors in Fig. 2 mean for the quality of the predictions. The generally good agreement proves that the training of the network was successful and has generated an accurate map.
Fig. 3 Predicted photoelectron spectra (black) are compared to reference spectra (blue, dashed). All spectra are normalized. The corresponding reference pulses (gray) are shown in each panel. In each of the three figure matrices with 3 × 3 panels, time delays Td are 4 fs, 8 fs, and 12 fs, respectively, from left to right; and from top to bottom, pulse amplitude ratios A1:A2 are 1:2, 1:1, and 2:1, respectively, see eqn (9). Left matrix: prediction for a SHM from test data. The SHM is chosen such that Ep ≈ 3.84 × 10¹⁶ W fs cm⁻² and each prediction returns an absolute distance (numbers in the panels) Dkref, cf. eqn (8b), within the range of 30–70% in the error distribution. Middle matrix: prediction of noisy 3D helium spectra (composed of the sum of the two relevant angular-momentum channels s and d) through the trained network for pulses of pulse energy Ep = 1.6 × 10¹⁶ W fs cm⁻². Right matrix: same as middle matrix but for an energy of Ep = 6.4 × 10¹⁶ W fs cm⁻². |
However, the test data, although not used for training, belong to the same class of SHM that are used for training. A more realistic test is the prediction of a 3D helium spectrum as shown in Fig. 3 (middle), as this is similar to predicting spectra from experimental fluctuating pulses. In general, the prediction works very well, as one can see—only small details of the spectral structures are sometimes not resolved. This is remarkable, as the shapes of the spectra from the same reference pulses are quite different for the 1D system used for training and the 3D helium (compare the individual equivalent panels on the left and in the middle Fig. 3). This confirms the transferability of the network and underlines its interpolation capability.
Predictions become worse for increasing pulse energy as shown in the right part of Fig. 3. This is also true for the test data (not shown) but to a slightly lesser extent. While features are still reproduced, the predicted spectra are in general slightly too wide compared to the reference spectra.
$f_\beta(t) = N_\beta\,G_\beta(t)\cos(\varphi_\beta(t))$, | (10a) |
(10b) |
Fig. 4 Prediction of 3D helium spectra (black, solid) for chirped pulses, eqn (10). The reference 3D helium spectra are shown with blue dashed lines. |
Fig. 5 Prediction of 3D helium spectra (black) for partially coherent pulses (1). The reference 3D helium spectra are shown with blue dashed lines. |
Fig. 6 Properties of test reference spectra for the pulses from Fig. 2. Average ionization yield Pion (blue, right axis) and average mutual distance test (red, left axis). |
This section has shown that the trained networks can predict spectra from widely varying pulse forms well. The effort one has to invest into the deep neural networks for the prediction of the spectra depends on the diversity of spectra a certain pulse form is capable of generating.
We model fluctuating double pulses with noise-free double pulses and admixture of noisy double pulses,
$f_{dq}(t) = N_{dq}\,[G_T(t + T_d/2) + G_T(t - T_d/2)]\,[\cos(\omega^{*} t) + q F_\tau(t)]$, | (11) |
Since so far we have not extracted the time-delay of the pulses from the spectra, we verify in Section 5.1 that it is possible to identify the time-delay of double pulses from noise-free spectra generated by those pulses. In Section 5.2 we will address fluctuating spectra. We first determine the pulses’ time-delay Td encoded in single-shot spectra generated with noisy double pulses. Subsequently, we average the single-shot spectra with identified Td over small intervals of time-delay (1 fs) and purify these averaged spectra. Recall that purifying means that we remove the fluctuations from spectra by predicting the spectra generated from the respective noise-free pulse forms, in the present case from the noise-free double pulses.
Fig. 7 shows the training success with the SHMs as well as the transfer of the network to unknown 3D helium spectra. The trained network reproduces well the delays (results scatter along the ideal red line with an error given in the inset). For short Td the results deviate from the ideal line since the individual pulses in the double pulse have a width of T = 3 fs which limits the resolution towards small time-delays. Results for the reconstructed time-delay for full 3D helium spectra are given for Td of 4, 8, and 12 fs, respectively, and demonstrate the transferability of the network. The upper row shows the corresponding 3D helium spectra. Given the similarity of these spectra for different time-delays it is remarkable that the trained network can reliably extract the time-delays. We may conclude that we can map out the delay of the pulse from the spectrum it has produced with the help of the trained network.
The last step is to prove that the reconstruction and purification can be transferred to spectra unknown to the networks. To this end we take noisy single-shot spectra of 3D helium with three well-defined time-delays and pass them through the trained network for reconstruction of the time-delay. The scattered points in Fig. 9 show the reconstructed time-delays. We average the corresponding spectra over 1 fs around the three peak time-delays in the scattered points and pass the averaged spectra through the purification network to arrive at the three spectra on the right in red. They agree well with the corresponding reference spectra, averaged over the same intervals of time-delay (black). Hence, the trained networks should be able to reconstruct the time-delay and purify the corresponding fluctuating experimental spectra as they are produced by SASE FELs.
Fig. 9 Same as Fig. 8 but for 3D helium for which the network was not trained. The distribution of predicted time-delays shows three main peaks at 4, 8, and 12 fs. The single-shot spectra are averaged over all spectra with time-delays in an interval of 1 fs about the three peaks. The averaged spectra are passed through the trained network to obtain the corresponding purified spectra shown on the right (red). The three averaged reference spectra (black) are obtained in the same way. |
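The averaging of single-shot spectra over an interval of reconstructed time-delays can be sketched as below. It assumes that the 1 fs interval is centred on the peak delay (±0.5 fs) and that the single-shot spectra and their predicted delays are stored as arrays; the function name and data layout are hypothetical.

```python
import numpy as np

def average_around_delay(spectra, Td_pred, Td_center, width=1.0):
    """Average all single-shot spectra whose predicted delay lies near Td_center.

    spectra: array of shape (n_shots, n_E); Td_pred: predicted delays (n_shots,).
    Returns the averaged spectrum and the number of shots that entered it.
    """
    Td_pred = np.asarray(Td_pred, dtype=float)
    selected = np.abs(Td_pred - Td_center) <= width / 2.0
    if not selected.any():
        raise ValueError("no single-shot spectra within the requested interval")
    return np.asarray(spectra)[selected].mean(axis=0), int(selected.sum())
```

The averaged spectrum returned here would then be passed to the purification network, one call per peak delay.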
Here, we have taken this mapping capability to a new level by predicting from fluctuating spectra—which should come ultimately from experiment—the spectra which would be obtained with specific noise-free pulses, namely double pulses, chirped pulses and chaotic (partially-coherent) pulses. While generally the prediction works as well as the purification (prediction) for simple Gaussian pulses, the error analysis has revealed interesting differences for the different pulse shapes.
In a second application we have constructed a neural-network-based map which can extract the time-delay of double pulses from fluctuating single-shot spectra generated by those noisy double pulses. Finally, we could demonstrate that suitably trained networks can achieve both purification and extraction of the time-delay from fluctuating single-shot spectra as typically produced by SASE FELs. Clearly, neural networks open promising new ways to analyze noisy data, with a potential that is far from exhausted.
This journal is © The Royal Society of Chemistry 2021 |