Robert H. Coridan ᵃᵇ
ᵃ Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR 72701, USA. E-mail: rcoridan@uark.edu
ᵇ Materials Science and Engineering Program, University of Arkansas, Fayetteville, AR 72701, USA
First published on 5th August 2020
Disordered nanostructures in photoelectrodes can increase light absorption in photoelectrochemical system designs. Predicting their optical properties is an elusive task due to the immensity of unique configurations and the intrinsic variance of each. A neural network trained from a small subset of simulations can emulate the complex absorption properties of the entire configuration space for a model disordered system with quantifiable accuracy and computational efficiency.
Predictions of the optical or other properties of an inverse opal photoelectrode are relatively easy because it is crystalline. The optical properties of a disordered photoelectrode are determined by an average over all possible configurations of the electrode, or the ensemble. The number of possible configurations can be extremely large even for disorder on simple scales, and there is no a priori method of simplification. Designing an electrode structure that maximizes light absorption in a semiconductor device requires a search through a large number of structural parameters, with each point in the search involving a unique ensemble calculation. It is necessary to identify methods to approximate ensemble calculations for predicting the properties of a disordered material.
To address this need, this work describes a method for approximating the ensemble properties of disordered light-absorber structures of interest to photoelectrochemical (PEC) and other applications based on libraries of finite-difference time-domain (FDTD) simulations and machine learning (ML)-based emulation. A two-dimensional model for a photoelectrode is proposed, composed of a single semiconductor light absorber embedded in a lattice of close-packed dielectric scatterers. The model is referred to here as an omission glass photoelectrode. An ensemble of the omission glass is the set of all possible configurations of dielectric cylinders on n lattice sites with k of the cylinders omitted from the structure. An example of one configuration of an (n, k) = (41, 3) omission glass is shown in Fig. 1a. In the context of disordered photonics, k determines the density of scatterers, and therefore tailors the mean free path of light propagation. The omission glass photoelectrode is a useful system for studying ensemble calculations because the disorder is discrete and definite. The exact ensemble average of a property such as the absorption spectrum in the light absorber is a straightforward calculation, yet becomes computationally intractable for even small values of k. An ML algorithm can infer otherwise hidden correlations between absorption profiles from a training set of simulations and so act as an approximation of one or even many entire ensembles. A neural network regression algorithm was used to emulate the two-dimensional spatial distribution of absorption in the semiconductor for every possible configuration of a given (n, k) ensemble for the omission glass. This emulator acts as a function that maps a specific omission glass configuration to the spatial distribution of absorption in the embedded light absorber.
The statistical accuracy of the emulator is measured by comparing its predictions to a subset of the ensemble not used in its training (a test set) or on an entire ensemble where feasible. The result is an emulator that can predict the optical properties of a combinatorial number of disordered electrode structures with quantifiable accuracy.
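The size of an (n, k) ensemble follows from choosing which k of the n lattice sites are left empty. The following is a minimal Python sketch of that counting, assuming only the n = 41 lattice described here; the function names are illustrative and are not taken from the original work.

```python
from itertools import combinations
from math import comb

N_SITES = 41  # lattice sites in the omission glass considered in this work


def ensemble_size(n: int, k: int) -> int:
    """Number of distinct ways to omit k scatterers from n lattice sites."""
    return comb(n, k)


def iter_omissions(n: int, k: int):
    """Yield each configuration as a tuple of omitted site indices (0-based)."""
    return combinations(range(n), k)


# Ensemble sizes for k = 0 to 10 grow combinatorially with k.
sizes = {k: ensemble_size(N_SITES, k) for k in range(0, 11)}
for k, size in sizes.items():
    print(f"k = {k:2d}: {size:,} configurations")

# Explicit enumeration is only practical for small k, e.g. k = 3.
n_k3 = sum(1 for _ in iter_omissions(N_SITES, 3))
```

Enumerating configurations as tuples of omitted sites keeps the k = 0 to 4 ensembles easy to iterate over, while the closed-form count shows why larger k cannot be simulated exhaustively.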
Here, a single omission glass photoelectrode example is used to illustrate the ML emulator ensemble approach. The geometry of the omission glass photoelectrode is a 250 nm GaAs cylinder centered in a close-packed lattice of 250 nm SiO2 cylinders (n = 41, organized into 7 close-packed layers, Fig. 1a). FDTD simulations for calculating the steady-state electric field and absorption in the simulation volume were performed using the software package MEEP.17 Details of the implementation of the FDTD simulations and optical characterization of the close-packed, k = 0 photoelectrode are included in the ESI.† Each omission configuration for the k = 1, 2, 3, and 4 ensembles was simulated at incident wavelengths λ = 600 nm, 700 nm, and 800 nm (Fig. 1b). The brute-force, ensemble-averaged absorption spectra for k = 1 to k = 4 showed that increasing k from 0 to 4 has a small effect on absorption in the GaAs cylinder. Absorption increased at λ = 600 nm, from 0.231 to 0.276 (a 20% increase) and decreased at λ = 700 nm, from 0.311 to 0.264 (a 15% decrease). The per-configuration variability and absorption extrema for integrated absorption increased at all wavelengths for increasing k. The spatial organization of the k voids therefore can have a significant effect on the absorption for a given configuration within a single ensemble.
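The quoted percentage changes can be checked directly from the ensemble-averaged absorption values given above. This short sketch uses only those four reported numbers; the helper name is illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before


# Ensemble-averaged absorption in the GaAs cylinder, k = 0 -> k = 4 (from the text).
change_600 = relative_change(0.231, 0.276)  # lambda = 600 nm
change_700 = relative_change(0.311, 0.264)  # lambda = 700 nm

print(f"600 nm: {change_600:+.1%}")  # prints "600 nm: +19.5%"
print(f"700 nm: {change_700:+.1%}")  # prints "700 nm: -15.1%"
```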
A multilayer perceptron (MLP)-based emulator was used to predict the two-dimensional absorption profile in the GaAs absorber for a given configuration of SiO2 scatterers. An MLP is an example of a supervised machine learning algorithm, meaning that it is trained on a subset of input–output observations to allow for predictions on all possible input signals. Assigning an addressable index to each of the n lattice positions provides a unique 41-bit, binary input representation for each configuration in the ensemble: ‘1’ for a scatterer present at that site and ‘0’ for an omission (Fig. S1, ESI†). For k = 3, each configuration is represented by a unique list of 38 ones and three zeros. An MLP emulator trained on a set of FDTD simulations acts as a function that can predict the absorption profile for the 41-bit representation of any omission glass configuration. The emulators studied here were implemented using the MLPRegressor class in the scikit-learn Python library.18 Details of the representation of the simulation data and the MLP implementation parameters are described in the ESI.†
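The binary representation and the emulator interface can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this work: the 8 × 8 output grid, the hidden-layer size, and the synthetic targets that stand in for FDTD absorption maps are all placeholders, since the actual parameters are given only in the ESI.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

N_SITES = 41        # lattice sites, as in the text
IMG_SHAPE = (8, 8)  # ASSUMPTION: placeholder absorber grid, not the real resolution


def encode(omitted_sites, n_sites=N_SITES):
    """Binary vector: 1 = scatterer present, 0 = scatterer omitted."""
    bits = np.ones(n_sites, dtype=float)
    bits[list(omitted_sites)] = 0.0
    return bits


rng = np.random.default_rng(0)

# Random k = 3 configurations standing in for a simulated training library.
configs = [rng.choice(N_SITES, size=3, replace=False) for _ in range(200)]
X = np.array([encode(c) for c in configs])

# ASSUMPTION: synthetic targets replace the flattened FDTD absorption maps.
W = rng.normal(size=(N_SITES, IMG_SHAPE[0] * IMG_SHAPE[1]))
Y = X @ W / N_SITES

# ASSUMPTION: hyperparameters chosen only so that the sketch runs quickly.
emulator = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
emulator.fit(X, Y)

# Predict the 2D profile for a configuration with sites 0, 7, and 20 omitted.
new_config = encode([0, 7, 20])
profile = emulator.predict(new_config.reshape(1, -1)).reshape(IMG_SHAPE)
```

Each row of `X` is one 41-bit configuration and each row of `Y` is one flattened absorption map, so a single multi-output regressor returns the full spatial profile in one call.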
Beginning with the complete k = 1–4 ensembles allowed for the evaluation of the statistical accuracy of the MLP emulator compared to the true physical behavior of the ensemble. It also allowed for the evaluation of predictive accuracy in relation to the size of the training set and choice of included FDTD simulations. To supplement the k = 0–4 ensemble data, a library of randomly chosen FDTD simulations from each of the k = 5, k = 6, k = 8, and k = 10 ensembles was generated, including 20000 examples from each ensemble.
The per-pixel variance of a prediction relative to the true value, σpp, is a metric for quantifying the prediction accuracy of a trained emulator. The mathematical definition of σpp is included in the ESI.† Fig. 2 shows examples of the predicted (Apred) and FDTD-simulated (AFDTD) absorption profiles for randomly chosen k = 3 configurations, with predictions made by an MLP emulator trained on the complete set of k = 0–2 FDTD absorption profiles (862 total simulations) for λ = 600 nm. Additional examples of these predictions are shown in Fig. S2–S4 (ESI†). The spatial distribution and magnitude of Apred and AFDTD agreed qualitatively in general, but σpp quantified the accuracy of the prediction. A discussion regarding the statistical accuracy of MLP emulator predictions on single ensembles is included in the ESI.†
To measure large-scale prediction accuracy, the library of simulations (k ≤ 10) was used to train MLP emulators and to measure their prediction accuracy across each ensemble. The test set for each ensemble calculation of σpp used all simulations from the library excluding the ones used in the training set. Fig. 3a shows the effect that the distribution of simulations (λ = 600 nm) in a training set (Nset) of fixed size has on the accuracy. Each emulator was trained with a training set composed of the complete k = 0–2 ensembles and Nset = 3000 simulations from the rest of the library, randomly selected in equal number from the contributing ensembles. For example, ‘k ≤ 4’ included 1500 simulations each from the k = 3 and k = 4 ensembles, and ‘k ≤ 10’ included 500 simulations each from the k = 3, k = 4, k = 5, k = 6, k = 8, and k = 10 ensembles. σpp values were low and relatively constant for predictions on the k = 2–5 ensembles regardless of the training set composition. As the number of ensembles represented in the training set increased, the accuracy of predictions for ensembles with larger k-values improved. For k = 10, σpp decreased from 0.192 for the ‘k ≤ 4’ emulator to 0.153 for the ‘k ≤ 10’ emulator. σpp showed similar decreases for simulations from ensembles (k = 7, k = 9) that were not included in the training set.
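The training set compositions described above can be written as a small bookkeeping function. This sketch assumes only the rule stated in the text (complete k = 0–2 ensembles plus Nset simulations split equally among the contributing ensembles); the function and variable names are hypothetical.

```python
from math import comb


def training_set_counts(sampled_ks, n_set=3000, complete_ks=(0, 1, 2), n_sites=41):
    """Number of simulations taken from each ensemble for one training set."""
    if n_set % len(sampled_ks) != 0:
        raise ValueError("n_set must divide evenly among the sampled ensembles")
    per_ensemble = n_set // len(sampled_ks)
    # Complete ensembles contribute every configuration.
    counts = {k: comb(n_sites, k) for k in complete_ks}
    # Sampled ensembles contribute an equal share of n_set.
    counts.update({k: per_ensemble for k in sampled_ks})
    return counts


k_le_4 = training_set_counts([3, 4])
k_le_10 = training_set_counts([3, 4, 5, 6, 8, 10])

print(k_le_4)
print(k_le_10)
```

In both cases the total training set holds the 862 simulations of the complete k = 0–2 ensembles plus the 3000 sampled simulations; only the distribution of the sampled part changes.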
Increasing Nset produced a nearly uniform decrease in σpp for most of the ensembles (Fig. 3b). σpp increased slightly for predictions in the k = 2 ensemble, for which all of the simulations are included in the training set. The increase in σpp can be attributed to the relative decrease in the total fraction of k = 2 simulations in the total training set. Using the same procedure, the distributions of random samples from each of the ensembles (k = 3–10) were modified to include more simulations from ensembles with larger k-values. Increasing the relative number of contributions to the training set from ensembles with large values of k resulted in a further reduction of σpp compared to the uniformly distributed examples. σpp slightly increased for predictions on the k = 2–6 ensembles due to the relative decrease in the representation of those ensembles in the training set.
The integrated absorption (λ = 600 nm) for Apred and AFDTD for each ensemble in the test set is shown in Fig. 4. Each unique simulation in the test set is represented by a point in the scatter plot. The absorption values predicted are in statistical agreement for configurations with low true values of absorption, as indicated by the symmetric and narrow clustering around the diagonal line (slope = 1). The clustering tends to be lower than the diagonal line for high true values of absorption, indicating that an MLP emulator tends to underestimate the absorption for those configurations. The as-trained emulator predicts that the ensemble-averaged absorption in the omission glass will increase with increasing k, which is consistent with the FDTD-derived absorption. At k = 10, the emulator predicts an absorption of 0.292, a 27% increase over the k = 0 electrode. While there is a significant difference between the predicted and true absorption values (0.310), the emulator captures the relationship between k and total absorption (see Fig. S8 and S9, ESI,† for λ = 700 nm and 800 nm).
This work has demonstrated that a neural network algorithm can be used to emulate the complex optical absorption properties of a disordered electrode design. An ML algorithm can infer the hidden correlations between different electrode configurations to predict the spatial distribution of absorption in a small-volume light absorber. Entire sets of ensembles can be approximated by this method with quantifiable accuracy while reducing the simulation cost (number of total simulations for ensemble calculations) by more than five orders of magnitude over the brute-force approach of simulating entire ensembles. Specifically, an MLP emulator can predict the absorption profiles and assess the accuracy of prediction for the k = 0–10 ensembles of an omission glass (1.5 × 10⁹ unique configurations) with fewer than 10⁴ simulations.
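The scale of this cost reduction can be estimated from the combinatorics alone. The sketch below counts the configurations in the k = 0–10 ensembles for n = 41 and compares the total, which is on the order of 10⁹, against the upper bound of 10⁴ simulations quoted above; the variable names are illustrative.

```python
from math import comb, log10

n_sites = 41

# Total number of unique configurations across the k = 0 to 10 ensembles.
total_configs = sum(comb(n_sites, k) for k in range(0, 11))

# Upper bound on the number of simulations quoted in the text.
n_simulations = 10**4

# Orders of magnitude saved relative to simulating every configuration.
orders = log10(total_configs / n_simulations)

print(f"{total_configs:.1e} configurations")
print(f"{orders:.1f} orders of magnitude fewer simulations")
```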
These results demonstrate the potential for using ML in the design of photoelectrodes for PEC applications. This can improve the computational efficiency for predicting the optical performance or incident photon-to-carrier conversion efficiency (IPCE) performance of device designs based on disordered materials. Emulation may have significant impact for IPCE prediction, which in general depends on the spatial distribution of absorption. This is particularly important in semiconductors with low minority carrier diffusion lengths where light trapping strategies are commonly used.19,20 As a direct application of this work, the MLP emulator can speed up calculations to identify the best performing combination of n (or density of GaAs cylinders), k (the omission fraction), and diameters of GaAs and SiO2 cylinders for a PEC solar-to-fuels electrode. Emulation can be applied more generally to continuously disordered materials, though it is necessary to identify a representation that uniquely describes the system structure and corresponding absorption profile.
A drawback of the MLP approach, and of most neural network ML algorithms in general, is that the trained emulator does not provide physical insight into high- or low-performance structures or produce an analytical model that can be generally applied. An MLP algorithm simply acts as a highly parameterized fitting function that can infer correlations between electrode structures and absorption profiles. The ability to make structure–function predictions as described here can spur further statistical analysis to identify the principal factors affecting variance in the absorption profile, or otherwise be used to improve the accuracy of the emulators through rational choice of simulations for the training data. Understanding the factors that affect the absorption profile may encourage the development of new fabrication methods to enhance power conversion efficiency in PEC or photocatalytic devices.
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award Number DE-SC-0020301.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0cc04229c
This journal is © The Royal Society of Chemistry 2020