Sudhanshu Singh‡ *a, Rahul Kumar‡ *a, Soumyashree S. Panda b and Ravi S. Hegde c
aDepartment of Physics, Indian Institute of Technology, Gandhinagar, 382355, India
bDepartment of Information and Communication Technology, Pandit Deendayal Energy University, Gandhinagar, 382007, India
cDepartment of Electrical Engineering, Indian Institute of Technology, Gandhinagar, 382355, India. E-mail: hegder@iitgn.ac.in
First published on 1st July 2024
The vast array of shapes achievable through modern nanofabrication technologies presents a challenge in selecting the optimal design for achieving a desired optical response. While data-driven techniques, such as deep learning, hold promise for inverse design, their applicability is often limited as they typically explore only smaller subsets of the extensive range of shapes feasible with nanofabrication. Additionally, these models are often regarded as ‘black boxes,’ lacking transparency in revealing the underlying relationship between the shape and optical response. Here, we introduce a methodology tailored to address the challenges posed by large, complex, and diverse sets of nanostructures. Specifically, we demonstrate our approach in the context of periodic silicon metasurfaces operating in the visible wavelength range, considering a large and diverse set of shape variations. Our paired variational autoencoder method facilitates the creation of rich, continuous, and parameter-aligned latent space representations of the shape–response relationship. We showcase the practical utility of our approach in two key areas: (1) enabling multiple-solution inverse design and (2) conducting sensitivity analyses on a shape's optical response to nanofabrication-induced distortions. This methodology represents a significant advancement in data-driven design techniques, further unlocking the application potential of nanophotonics.
In the current landscape, the surge of activity in machine learning, deep learning, and other data-driven7 techniques aimed at overcoming the challenges posed by high dimensionality warrants attention.8–15 These methodologies typically involve training deep neural networks (DNNs) through supervised learning processes. Given sufficient training data, DNNs can effectively map the empirical relationship between nanophotonic geometries and their optical behavior. Once a model is trained, design methodologies can primarily be categorized16 into two main streams: (1) surrogate optimization-based inverse design and (2) all-DNN-based inverse design. In surrogate optimization,17 a DNN capable of accurately predicting spectral behaviors from given geometries (forming a one-to-one manifold18) can serve as a surrogate for a full-wave electromagnetic solver. Conversely, in the all-DNN method, the DNN directly provides a solution without the need for an optimization procedure. Both approaches offer dramatic speed improvements compared to formal inverse design methods reliant on electromagnetic solver calls. However, it is imperative that expediting the search process does not compromise the attainment of an optimal design. As researchers transition from the initial exploration phase, where the feasibility of this approach was convincingly demonstrated, critical attention is now being directed towards addressing the shortcomings inherent in the data-driven approach.17
One glaring limitation of these approaches lies in the fact that nearly all models are trained on relatively small subsets of the extensive array of shapes accessible through fabrication. Most reports in the literature have relied on easily parameterized geometries like polygons. Typically, in such smaller subsets, the range of optical responses is limited, potentially leading to an overestimation of the technique's effectiveness. A rudimentary approach to expanding the shape set involves considering binary images where each pixel can be toggled on or off.19 However, most shapes within this set do not yield feasible designs due to the absence of coherent structure, rendering it impractical to train accurate models with reasonably sized datasets. Liu and colleagues20 have proposed a generative network approach employing unsupervised learning to generate larger and meaningful shape sets. However, the reliance on a separate network to recognize feasible shapes limits the shape sets to single “blob”-like shapes. Subsequent studies by Liu et al.21 have notably extended the size of feasible subsets. Jiang et al.22 used GAN-based inverse design techniques for metagratings,23 using topology-optimized geometries24,25 to generate complex structures. Recent advancements, such as the Progressive Growing GAN (PGGAN) method integrated with self-attention layers by Wen et al.,26 show promise in producing fabrication-feasible and robust shape sets; however, these methods have complicated workflows and exhibit loss oscillations. The well-known challenges of adversarial training protocols27 and the need for a handcrafted network to guide the generation towards reasonable shapes remain shortcomings of these techniques. Thus, the challenge of training a DNN on a sufficiently broad shape set while simultaneously ensuring sample efficiency17,28 continues to be a significant hurdle to the widespread applicability of data-driven inverse design methods.
A second major challenge is that neural networks are essentially black boxes, and the interpretability of their predictions remains limited.29 A crucial step towards addressing this is representation learning, in which machine learning algorithms extract salient patterns from unprocessed data to produce simpler representations. Some researchers have turned to the use of autoencoders,30 a form of representation learning, to extract valuable insights from trained models.18,31 Kiarashinejad and colleagues32 used autoencoders to reduce computational complexity through dimensionality reduction while also improving understanding of the roles of individual design parameters. Zandehshahvar and colleagues33 proposed a novel metric-learning technique combining triplet loss and mean-squared error, which can enhance machine learning methods for the inverse design of nanophotonic devices and knowledge discovery. However, that work acknowledges the need for further research and optimization to effectively address the issues related to metric learning in nanophotonics. Furthermore, it does not specifically address the incorporation of structural features into the dataset, indicating a possible avenue for additional research. In the context of all-DNN-based inverse design, the issue of design “dead zones” resulting from the one-to-many nature of the response–structure mapping has been identified.34 The potential of conditionally trained generative adversarial networks (cGANs) or conditional (adversarial) autoencoders to dynamically encode multiple potential solutions, along with the benefits of representation learning,16,35 suggests that this approach warrants further refinement to facilitate inverse design within the all-DNN framework. In this contribution, we propose linked latent space representational learning to tackle the shortcomings mentioned above. The motivation for this approach stems from a simple observation: two structures, although geometrically different, may be considered similar if their optical responses share similarities.
Latent space representations provide a notion of similarity based on the Euclidean distance between two shapes in the learned latent space. By simultaneously training a latent representation of the shape and of the optical response, and then linking them through cross-training, our approach is able to grasp similarity relationships not only along the shape axis but also along the optical-response axis. The method does not place any restriction on the shape set used for training, allowing users to construct such a set based on their intuition and knowledge. Furthermore, by using variational autoencoding, continuous latent representations are learned. We demonstrate that this approach leads to rapid inverse design with possibly multiple candidate solutions ranked according to the sensitivity of each design to fabrication imperfections. While this concept has not yet been exploited in the context of inverse design, it is gaining a foothold in other data-science domains. Jo et al.36 introduced a groundbreaking technique that merges cross-modal association with multiple modal-specific autoencoders, enabling seamless integration of various modalities while preserving their encoded information within individual latent spaces. Their model's efficacy on a modest dataset underscores its suitability for semi-supervised learning; related cross-modal approaches have been applied by Yu and co-workers37 in natural language processing, Stein and co-workers38 in conditioned image generation, and Radhakrishnan and co-workers39 in medical diagnostics using multiple modalities. Closer to our domain, two reports deserve special mention: (1) Lu and co-workers40 introduced a novel application of paired Variational Autoencoders (VAEs) for integrating 2D small-angle X-ray scattering (SAXS) patterns and scanning electron microscopy (SEM) images; and (2) Yaman and co-workers41 have reported a shared dual-VAE approach to correlate gold nanoparticle cluster geometry with optical responses in hyperstructural darkfield microscopy.
The rest of the paper is organized as follows: after this introduction, in Section 2, we summarize the salient points of the methodology; in Section 3, we first examine the characteristics of the linked latent space representations. Finally, we showcase the utility of our method for rapid inverse design before concluding in Section 4.
Following the training phase, it is essential to highlight the numerous possibilities for data flow, as depicted in Fig. 1C. Data path 1–2 solely utilizes the top variational autoencoder for the reconstruction of the input geometry image, while data path 5–6 solely employs the bottom variational autoencoder for the reconstruction of the input spectra data. The key innovation of our work lies in the dataflow paths 4–2 and 3–6. Path 3–6 enables us to retrieve the spectrum of a given shape via its latent vector, while 4–2 enables the recovery of a shape via a spectrum latent vector. Although a given shape possesses a definite and unique spectral response, a given response may be attributable to one or more shapes. The neighboring points of a given latent vector in the spectral latent space may thus decode to one or more shapes dynamically. In the shape variational autoencoder, shapes are arranged in the geometry latent space to ensure that similar shapes are positioned close to each other. Similarly, similar-looking spectra are neighbors in the spectrum latent space. However, the inclusion of cross-linkages enables the network to associate two distinct shapes that may still yield similar optical responses. Without training the cross-linkages, we would only be able to group shapes based on their geometric similarity. By incorporating the cross-linkages, however, we are able to introduce a similarity metric based on the similarity of their responses as well. This enhancement allows for a more comprehensive understanding of the relationship between shapes and their optical responses, thereby enriching our ability to analyze and manipulate nanophotonic structures effectively. Fig. 1D illustrates the use-case of the trained encoders and decoders in nanophotonics inverse design. First, given a targeted spectral response, we can rapidly recover potentially multiple solution shapes. Second, each shape can then be assessed for the sensitivity of its optical response to fabrication-induced imperfections. Specifically, by sampling in the latent space neighbourhood of a given shape, we can simulate shape distortions and subsequently determine the variance in the spectral response. From a given set of target shapes, we can identify the shape least susceptible to fabrication-induced imperfections. This approach enables us to optimize the design of nanophotonic structures with enhanced robustness to fabrication constraints.
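To make the four data paths concrete, the following PyTorch sketch wires up two toy variational autoencoders and exercises each path. The layer sizes, input dimensions (a flattened 64 × 64 geometry image and a 200-point spectrum) and class names are illustrative assumptions, not the authors' implementation; only the 8-dimensional latent size follows the text.

```python
import torch
import torch.nn as nn

LATENT_DIM = 8  # the paper reports 8-dimensional latent spaces

class TinyVAE(nn.Module):
    """Toy stand-in for the shape/spectrum VAEs: one linear layer per stage."""
    def __init__(self, in_dim, latent_dim=LATENT_DIM):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * latent_dim)  # outputs [mu, log-variance]
        self.dec = nn.Linear(latent_dim, in_dim)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample

    def decode(self, z):
        return self.dec(z)

shape_vae = TinyVAE(in_dim=64 * 64)  # flattened geometry image (size assumed)
spec_vae = TinyVAE(in_dim=200)       # sampled reflection/transmission spectrum (length assumed)

shape_img, spectrum = torch.rand(1, 64 * 64), torch.rand(1, 200)
z_shape, z_spec = shape_vae.encode(shape_img), spec_vae.encode(spectrum)

shape_rec = shape_vae.decode(z_shape)        # path 1-2: geometry reconstruction
spec_rec = spec_vae.decode(z_spec)           # path 5-6: spectrum reconstruction
spec_from_shape = spec_vae.decode(z_shape)   # path 3-6: shape latent -> spectrum (forward prediction)
shape_from_spec = shape_vae.decode(z_spec)   # path 4-2: spectrum latent -> shape (inverse retrieval)
```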
Encoder: the input shape/spectra data is processed by the encoder network and transformed into a probability distribution within the latent space. This means that the VAE encodes the data as a range of possible values rather than a single point, allowing it to capture a variety of possible representations of the input shape/spectra data. To ensure that the latent space matches the desired distribution, usually a standard normal distribution, variational inference is employed to approximate the posterior distribution of the latent space representation. For this purpose, random sampling is performed to generate latent space points, and the distribution parameters, i.e., μ (mean) and σ (standard deviation), are optimized using the KL divergence loss. Because this sampling node in the computational graph is stochastic, gradients cannot be backpropagated through it directly. To enable smooth optimization during training, the reparameterization trick is employed, which allows gradients to backpropagate through the sampling process:
z = μ(x) + σ(x) × ε, where ε ∼ N(0, I)   (1)
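A minimal PyTorch sketch of eqn (1) and the associated KL regularizer is shown below; the batch size and tensor shapes are illustrative, and the diagonal-Gaussian KL expression is the standard closed form for a VAE.

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Eqn (1): z = mu(x) + sigma(x) * eps with eps ~ N(0, I); the stochastic
    draw is moved into eps so gradients flow through mu and sigma."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior, averaged over the batch."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()

mu, logvar = torch.zeros(4, 8), torch.zeros(4, 8)  # batch of 4 points in an 8-D latent space
z = reparameterize(mu, logvar)
print(z.shape, kl_to_standard_normal(mu, logvar).item())
```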
Decoder: the decoder network obtains samples from the latent distribution to reconstruct the original input shape/spectra data. During training, the model adjusts both the encoder and decoder to minimize reconstruction loss, which measures the difference between the original input and the reconstructed output. It also shapes the latent space to adhere to a specified distribution. This process balances two key components: reconstruction loss and the regularization term (often Kullback–Leibler divergence). Reconstruction loss ensures faithful input reproduction, while regularization molds the latent space to match the chosen distribution. By iteratively adjusting these parameters, the VAE learns to represent input data effectively, enabling accurate reconstructions and the generation of new samples by sampling from its learned latent distribution.
A Variational Autoencoder (VAE) not only converts the input data x into a latent space representation z and reconstructs it back to x̂, but also adds a regularization technique to the encoder. A prior distribution p(z) of the latent space is included in this regularization process. The purpose of this regularization is to restrict the latent representations to a particular distribution. Using an encoder function z ∼ Enc(x) = q(z|x) (the posterior distribution of the latent variable z given the input variable x), the VAE learns how to encode input data x into latent variables z during training. The decoder x̂ ∼ Dec(z) = p(x|z) (ref. 46) (where p(x|z) represents the likelihood of the input data x given the latent variable z) takes a latent variable z and reconstructs the original input data. Typically, these functions are designated as Enc for encoding and Dec for decoding. The reconstruction loss and the regularization term obtained from the prior distribution form the loss function of each VAE, represented by the symbol LVAE:
LVAE1 = C1‖X1 − X̂1(Z1)‖ + β1DKL(q(Z1|X1) ‖ p(Z1))   (2)
LVAE2 = C2‖X2 − X̂2(Z2)‖ + β2DKL(q(Z2|X2) ‖ p(Z2))   (3)
LTotal loss = LVAE1 + LVAE2 + D1‖X1 − X̂1(Z2)‖ + D2‖X2 − X̂2(Z1)‖   (4)
In training the model, we have six loss terms. These include two reconstruction losses, the KL divergences for both shape and spectrum, and two cross-reconstruction losses. The third term of eqn (4) represents the cross-reconstruction loss in which the shape is reconstructed from the spectrum latent space. Specifically, X1 is the input shape data, and X̂1(Z2) is the reconstructed shape data. In this reconstruction, we sample random points from the learned latent space (Z2) of the spectrum and pass them through the shape decoder (see ESI Fig. S3†). The fourth term of eqn (4) represents the cross-reconstruction loss in which the spectrum is reconstructed from the shape latent space. Here, X2 is the input spectrum data, and X̂2(Z1) is the reconstructed spectrum data. We sample random points from the learned latent space (Z1) of the shape and pass them through the spectrum decoder. This cross-sampling technique ensures that each latent space can effectively reconstruct data from the other domain, enhancing the model's ability to handle heterogeneous data. C1, C2, D1, D2, β1 and β2 are the regularization coefficients of each loss term. The values of these weights were determined through experimentation; in our implementation, we found C1 = 1, C2 = 1, D1 = 1, D2 = 1, β1 = 1 × 10−6 and β2 = 1 × 10−5 to be well suited. The detailed description of the training procedure is given in Fig. S4 of the ESI.†
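The following sketch shows how the six loss terms of eqn (2)–(4) could be assembled in PyTorch, using the coefficient values quoted above. The choice of mean-squared error as the reconstruction metric and the tensor names are assumptions made for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

# Coefficient values quoted in the text.
C1, C2, D1, D2 = 1.0, 1.0, 1.0, 1.0
BETA1, BETA2 = 1e-6, 1e-5

def kl(mu, logvar):
    """Diagonal-Gaussian KL divergence to a standard normal prior."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()

def total_loss(x1, x1_rec, x1_cross,      # shape: input, decoded from Z1, decoded from Z2
               x2, x2_rec, x2_cross,      # spectrum: input, decoded from Z2, decoded from Z1
               mu1, logvar1, mu2, logvar2):
    l_vae1 = C1 * F.mse_loss(x1_rec, x1) + BETA1 * kl(mu1, logvar1)  # eqn (2)
    l_vae2 = C2 * F.mse_loss(x2_rec, x2) + BETA2 * kl(mu2, logvar2)  # eqn (3)
    l_cross_shape = D1 * F.mse_loss(x1_cross, x1)                    # third term of eqn (4)
    l_cross_spec = D2 * F.mse_loss(x2_cross, x2)                     # fourth term of eqn (4)
    return l_vae1 + l_vae2 + l_cross_shape + l_cross_spec            # eqn (4)
```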
Fig. 2 shows a scatter-plot visualization of the 2D projection of the 8-dimensional shape and spectral latent spaces, colour-coded by the 21 classes of shapes considered in this study (the latent representation of each class is shown individually at https://github.com/22510064/UMAP-Representation-; see Fig. S5† for better visualization). A clustering of similar shapes and spectra, contrasted with the distinct separation of dissimilar ones within the embedding space,51 is observed, underscoring the model's adeptness in capturing the intricate intra- and inter-relationships between shapes and spectra and effectively segregating them into distinct clusters. In the examination of the shape latent space, a cohesive clustering of various geometric entities is observed, including ellipses (blue), double ellipses (orange), Perlin noise (pink), 2-fold (cyan), plus shapes (magenta), triangles (salmon), half moons (gold), and their corresponding cavities. These entities coalesce into a unified cluster with overlapping boundaries, signifying shared geometric characteristics (see ESI Fig. S5A†). However, distinct clusters emerge for rings (green), L-shapes (light gray), and C-shapes (lime), revealing multiple clusters that intertwine with other shapes. Furthermore, upon closer inspection, two distinct subclusters are identified within the H-shape (dark olive green), delineated by the arrangement of transverse lines. Similarly, the polygon shape (black) displays two distinct subclusters, one for symmetrical polygons and another for asymmetrical polygons. This differentiation underscores the unique characteristics inherent within subsets of the H and polygon shapes, with each subcluster demonstrating clear separation, indicative of diverse structural configurations. Additionally, a small subcluster featuring multiple concentric rings is observed within the ring category, further accentuating its unique shape characteristic within the latent space (see ESI Fig. S5B†). We have added plots using t-distributed Stochastic Neighbor Embedding52 (t-SNE) in Fig. S5C.† t-SNE is known for effectively revealing local neighbourhoods, offering a potentially clearer picture of local clusters than UMAP.
Fig. 2 Visualization of the 8-dimensional shape and spectrum latent spaces projected into a 2-dimensional space. (A) The shape latent space and (B) the corresponding spectrum latent space; both visualizations use the UMAP projection technique.50 (C) Points in both the shape and spectral latent spaces are colour-coded using the 21 shape classes.
Transitioning to the spectral latent space, a well-defined clustering pattern is discerned, encompassing spectral profiles such as ellipses (blue), double ellipses (orange), 2-fold shapes (cyan), plus shapes (magenta), half moons (gold), and their corresponding cavities, alongside rings (green), triangle cavities (violet) and H-shapes (dark olive green). Delineating discrete category boundaries proves more challenging for spectral profiles like Perlin noise (pink) and its cavities, L-shapes (light gray), C-shapes (lime), ring cavities (brown), and triangles (salmon). Furthermore, the polygon (black) exhibits two subclusters that manifest clear distinctions from other clusters, mirroring the clustering patterns observed in the shape latent space. This divergence underscores the inherent complexity in spectral signatures, where distinct shapes may yield similar spectra, as evidenced by instances where multiple shapes correspond to the same spectrum.
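A projection such as Fig. 2 can be produced with the umap-learn package along the following lines; the latent vectors and class labels below are random stand-ins for the trained 8-dimensional embeddings, and the UMAP hyperparameters are illustrative defaults rather than those used by the authors.

```python
import numpy as np
import umap                      # from the umap-learn package
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
latents = rng.normal(size=(2100, 8))    # stand-in for the trained 8-D latent vectors
labels = np.repeat(np.arange(21), 100)  # stand-in for the 21 shape-class labels

embedding = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(latents)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab20", s=3)
plt.xlabel("UMAP-1"); plt.ylabel("UMAP-2")
plt.title("2-D UMAP projection of the latent space, colour-coded by class")
plt.show()
```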
Next, we examine the continuity aspects of the learned latent space representations. Continuity in the learned latent spaces is crucial to ensure that latent vectors decode to meaningful shapes or spectra. This continuity allows for the generation of novel and meaningful shapes and spectra beyond the original training dataset while also facilitating smooth interpolation between designs. We test the continuity at a local scale as well as a global scale.53–55
We utilised local interpolation within the latent space of shapes by focusing on a specific data point and its nearby neighbours. This procedure entails sampling data points from a distribution centred around the chosen point, typically a normal distribution with a small standard deviation, as illustrated in Fig. 3A for the 2-fold image. These sampled data points are then decoded using the shape and linked spectrum decoders. Through this process, we can observe similar shapes and their corresponding reconstructed spectral responses for reflection in s- and p-polarized light, as depicted in Fig. 3B, and validate the reconstructed spectra against the original spectra (generated using the S4 solver). This illustrates the efficacy of the trained model in reconstructing spectra close to the original and highlights the smoothness and continuity of the latent space at the local level. By delving into the variability and diversity within this local region of the shape latent space, we can generate new shapes and corresponding spectra that maintain similarities to the original data points while introducing minor variations. This methodology facilitates the creation of diverse and novel samples, enriching the generative capabilities of the model and deepening our understanding of the underlying data distribution.16,56
For global interpolation, we select two distinct data points representing images with the half-moon cavity and L shapes, intentionally chosen to be distant in the latent space. Employing a linear interpolation algorithm,57,58 we sample the latent variables between these two points and subsequently decode the resulting sampled points using both the shape and linked spectrum decoders, as illustrated in Fig. 3C. The reconstruction of the interpolation between latent vectors of two geometrical shapes and their corresponding spectra reveals a smooth transition from one shape to another. Given that the Variational Autoencoder latent space follows a Gaussian distribution, it is expected to yield smoother and more diverse transitions between two geometrical shapes. Fig. 3D showcases the reconstructed shapes and corresponding predicted spectra alongside the original spectra (generated using the S4 solver) between the latent vector of the actual half-moon cavity and the L shape. As depicted, with each step, the cavity gradually shifts towards the L shape with slight noise at step 4 due to a small gap in the latent space. This suggests that the model has learned to disentangle the underlying factors of variation rather than merely memorising the training dataset.16,54 Similar continuity is also observed in the spectrum latent space at the local and the global levels (see ESI Fig. S6 and S7†). Due to the one-to-many mapping between the response and structure, neighborhood points in the spectral space can correspond to varying classes of shapes within the cross-link data path.
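Both interpolation experiments reduce to simple operations on latent vectors, as sketched below. The linear decoder, step count and noise scale are placeholders for illustration; the real shape decoder is a trained network and the actual latent points come from encoding dataset shapes.

```python
import torch
import torch.nn as nn

decoder = nn.Linear(8, 64 * 64)  # placeholder for the trained shape decoder

# Global interpolation: walk linearly between two distant latent points
# (e.g. the half-moon cavity and the L shape).
z_a, z_b = torch.randn(8), torch.randn(8)
global_walk = [decoder((1 - t) * z_a + t * z_b) for t in torch.linspace(0.0, 1.0, 8)]

# Local interpolation: sample a tight Gaussian neighbourhood around one latent point
# (e.g. the 2-fold shape of Fig. 3A) and decode each sample.
z_0 = torch.randn(8)
local_samples = [decoder(z_0 + 0.05 * torch.randn(8)) for _ in range(10)]
```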
Fig. 4A and B showcase the results of an inverse design process for polarization-independent spectral filters. The spectral responses of the output geometries, both predicted (given by the DNN) and actual (evaluated using S4), are compared with the target spectra. Two sets of results are presented: in the first, the input tensor is the same for both Fig. 4A(i) and (ii), yet different classes of geometries (cross and ring shapes) are obtained; the second set (Fig. 4A(iii) and (iv)) demonstrates a similar study but with only reflection spectra as input, again generating two distinct classes of shapes. Fig. 4B illustrates the inverse design of a polarization-independent dual-band reflection filter (dual-band notches in transmittance) with Gaussian target spectra. Finally, Fig. 4C showcases the results of an inverse design for polarization-dependent color filters. A single geometry set produces bandpass and band-stop filter characteristics for s- and p-polarization in transmission and reflection modes. On our workstation, a single inverse design step which yields ten viable shapes takes an average of approximately 3.2 seconds. Additionally, each of the discovered viable shapes is passed through the full electromagnetic solver, and only those shapes whose spectra closely match exact calculations are retained (this verification step takes an additional 2 minutes for 10 shapes).
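Conceptually, a single inverse-design step amounts to embedding the target spectrum, sampling its latent neighbourhood, and decoding candidate geometries, as in the following sketch. The placeholder linear networks, input dimensions and noise scale are assumptions; the S4 verification step is only indicated in a comment.

```python
import torch
import torch.nn as nn

# Placeholder linear networks standing in for the trained spectrum encoder and shape decoder.
spectrum_encoder = nn.Linear(200, 8)
shape_decoder = nn.Linear(8, 64 * 64)

target_spectrum = torch.rand(1, 200)          # desired optical response (e.g. a Gaussian notch)
z_target = spectrum_encoder(target_spectrum)  # embed the target in the spectral latent space

# Sample the latent neighbourhood and decode candidate geometries (data path 4-2);
# the one-to-many response-structure mapping means these may belong to different shape classes.
candidates = [shape_decoder(z_target + 0.1 * torch.randn_like(z_target)) for _ in range(10)]

# Each candidate would then be re-simulated with the full-wave solver (S4) and retained
# only if its exact spectrum closely matches the target (verification step not shown).
```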
For this experiment, two classes of inverse designed geometries, the plus and circle shapes, are utilized as the base shapes. Fig. 1E outlines the encoding process of the base shape through the shape encoder into latent space. In this space, random noise is introduced into the shape latent point, with amplitudes ranging from 0 to 1.8 in increments of 0.2. Subsequently, the corresponding shapes and spectra are reconstructed from these noisy latent points using the shape and spectrum decoders. The average time taken for the generation of these derived shapes is ∼42 s, whereas the verification of these spectra (which uses the S4 tool) takes ∼3.20 min. Fig. 5A and C illustrate a diverse array of these derived shapes, accompanied by their predicted spectra, showing the outcomes of this iterative process. As noise is incrementally introduced with each iteration, the derived shape further diverges from the base shape. To assess the sensitivity of the derived shapes, we compute the mean squared error (MSE) between the target spectra for inverse design and the predicted spectra of the derived shapes. Specifically, we iterate the generation process 15 times for the derived shape with the highest noise value; a statistical analysis is shown in Fig. S8.† The MSE values for the plus shape exhibit a wider distribution than those for the circle shape, indicating a higher sensitivity of the plus shape class.
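The sweep just described can be expressed as a short loop over noise amplitudes, as sketched below; the linear networks and input dimensions are placeholders for the trained models, and the MSE is computed against the inverse-design target spectrum.

```python
import torch
import torch.nn as nn

# Placeholder networks for the trained shape encoder and the two decoders.
shape_encoder = nn.Linear(64 * 64, 8)
shape_decoder = nn.Linear(8, 64 * 64)
spectrum_decoder = nn.Linear(8, 200)

base_shape = torch.rand(1, 64 * 64)   # e.g. the inverse-designed plus or circle geometry
target_spectrum = torch.rand(1, 200)  # target spectrum used in the inverse design
z_base = shape_encoder(base_shape)

# Perturb the latent point with increasing noise amplitude (0 to 1.8 in steps of 0.2),
# decode the distorted shape and its predicted spectrum, and score the deviation by MSE.
for step in range(10):
    sigma = 0.2 * step
    z_noisy = z_base + sigma * torch.randn_like(z_base)
    distorted_shape = shape_decoder(z_noisy)
    predicted_spectrum = spectrum_decoder(z_noisy)
    mse = torch.mean((predicted_spectrum - target_spectrum) ** 2)
    print(f"noise {sigma:.1f}: spectral MSE {mse.item():.4f}")
```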
Evidently, as illustrated in Fig. 5B and D, the ground-truth and predicted spectra of the derived plus shapes deviate more from those of the base shape than do the spectra of the derived circle shapes. Therefore, it can be inferred that the predicted optical response of the base circle shape is more resilient to fabrication tolerances and process variability, which inherently impact the final geometrical shape. The spectra of the final geometry closely align with the desired performance characteristics of the base shape's ground truth spectrum.
The reported training of locally and globally continuous cross-linked latent representations can take advantage of manifold18 and metric learning33 and also facilitate the search for novel responses. The versatility of this approach makes it easier to explore complex shape sets and enables innovative research across a variety of areas by combining multiple data modalities. Multimodal approaches70 like UNITER71 and triplet network training,72 along with scalable semi-supervised learning on graph-structured data,73 are highly relevant to our approach. Large language models (LLMs) are proving adept at interpreting scientific papers. We envision that, with the help of LLMs and the proposed methodology, it may be possible to learn the structure–response relationships in very large shape sets acquired from the published literature.
Footnotes
† Electronic supplementary information (ESI) available: Shape set details and dataset generation, CNN model training and hyperparameter optimization, and additional experimental details including sensitivity analysis. See DOI: https://doi.org/10.1039/d4dd00107a
‡ These authors contributed equally and are considered joint first authors.
This journal is © The Royal Society of Chemistry 2024