I. INTRODUCTION
We exploit these features to train GNNs to propagate electromagnetic field distributions in time for arbitrary domain sizes and material distributions. We stress that in this approach, we do not necessarily seek steady-state solutions of the field but, instead, seek to advance the field propagation by a certain amount of time. The main purpose of our GNNs is to learn the FDTD scheme while exploiting all benefits offered by GNNs, e.g., working on unstructured grids, extrapolating to larger domain sizes, and extrapolating to material distributions not seen in the training process.
This paper is structured as follows: first, we discuss our methods, including FDTD, graphs, and graph neural networks. Subsequently, we outline the physical setup and training data generation. Here, we follow two approaches and focus either on the prediction of the electric field only or on performing full-field predictions. Finally, we conclude with our results and sketch improvements, extensions, and applications.
II. METHODS
A. The finite-difference time-domain method
The finite-difference time-domain method (FDTD) is an iterative numerical update procedure to compute the evolution of an electromagnetic field in space and time.32 For reasons of consistency, we will briefly sketch the approach in 2D assuming a transverse-magnetic (TM) polarization, starting from the source-free Maxwell equations. Please note that, since different conventions exist, TM is understood in this contribution as the situation where, in an x–y plane, the considered electric field has a non-zero component perpendicular to this plane, while the magnetic field has non-zero components in this plane. An understanding of how the FDTD works is necessary to appreciate the subsequent formulation in the form of a graph, which is the natural starting point for a graph neural network to solve these equations.

For simplicity, we take as the point of departure Maxwell's equations for a non-dispersive material characterized in the spatial domain by permittivity ε(r), permeability μ(r), electric conductivity σ(r), and magnetic conductivity σm(r). These Maxwell equations read as

$$-\sigma_m(\mathbf{r})\mathbf{H}(\mathbf{r},t) - \mu(\mathbf{r})\frac{\partial \mathbf{H}(\mathbf{r},t)}{\partial t} = \nabla\times\mathbf{E}(\mathbf{r},t) = \hat{\mathbf{e}}_x\frac{\partial E_z(\mathbf{r},t)}{\partial y} - \hat{\mathbf{e}}_y\frac{\partial E_z(\mathbf{r},t)}{\partial x}, \qquad (1)$$

$$\sigma(\mathbf{r})\mathbf{E}(\mathbf{r},t) + \varepsilon(\mathbf{r})\frac{\partial \mathbf{E}(\mathbf{r},t)}{\partial t} = \nabla\times\mathbf{H}(\mathbf{r},t) = \hat{\mathbf{e}}_z\frac{\partial H_y(\mathbf{r},t)}{\partial x} - \hat{\mathbf{e}}_z\frac{\partial H_x(\mathbf{r},t)}{\partial y}. \qquad (2)$$

In these equations, E(r, t) is the electric field, H(r, t) is the magnetic field, σm(r) is the magnetic conductivity, σ(r) is the electric conductivity, and ε(r) and μ(r) are the permittivity and permeability, respectively. If dispersive materials are required to be considered, auxiliary differential equations would be needed to supplement these Maxwell equations. These auxiliary equations would describe the temporal evolution of a polarization or magnetization depending on the electric or magnetic field, respectively.
In our considered setting, i.e., the propagation of the fields in the x–y plane and with the restriction to TM polarization, the differential equations that describe the spatial and temporal evolution of the three involved field components are

$$-\sigma_m H_x(\mathbf{r},t) - \mu\frac{\partial H_x(\mathbf{r},t)}{\partial t} = \frac{\partial E_z(\mathbf{r},t)}{\partial y}, \qquad (5)$$

$$\sigma_m H_y(\mathbf{r},t) + \mu\frac{\partial H_y(\mathbf{r},t)}{\partial t} = \frac{\partial E_z(\mathbf{r},t)}{\partial x}, \qquad (6)$$

$$\sigma E_z(\mathbf{r},t) + \varepsilon\frac{\partial E_z(\mathbf{r},t)}{\partial t} = \frac{\partial H_y(\mathbf{r},t)}{\partial x} - \frac{\partial H_x(\mathbf{r},t)}{\partial y}. \qquad (7)$$

To render the equations manageable on a computer, space and time are discretized in the FDTD on a finite grid spaced by Δx = Δy and Δt. For the spatial discretization, we define a grid indexed by m and n for the x- and y-direction, respectively; thus, r = (x, y) = (mΔx, nΔy). The temporal step is bounded by the Courant stability condition, Δt ≤ Δx/(√2 c). In the lossless case (σ = 0), the update equation for the electric field then reads

$$E_z\big|_{m,n}^{q+1} = E_z\big|_{m,n}^{q} + \frac{\Delta t}{\varepsilon\,\Delta x}\left(H_y\big|_{m+\frac{1}{2},n}^{q+\frac{1}{2}} - H_y\big|_{m-\frac{1}{2},n}^{q+\frac{1}{2}}\right) - \frac{\Delta t}{\varepsilon\,\Delta y}\left(H_x\big|_{m,n+\frac{1}{2}}^{q+\frac{1}{2}} - H_x\big|_{m,n-\frac{1}{2}}^{q+\frac{1}{2}}\right). \qquad (8)$$

Nearly identical expressions can be found for Hx and Hy, respectively. A detailed derivation can be found in the literature.7,33 The idea of the FDTD is now to evolve the fields in a leap-frog scheme. To be specific, the spatial differences of the magnetic field allow advancing the electric field in time. Afterward, the spatial differences of the electric field allow advancing the magnetic field in time. To solve the PDE, we have to impose boundary conditions and potentially perfectly matched layers, i.e., an impedance-matched material that absorbs incoming radiation without any reflection. Such a material is necessary to imitate infinite open space on a finite grid. To execute an actual simulation, we also need to define sources of electromagnetic radiation. We provide more details on these aspects in the section titled Results since they are related to the training data generation.
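To make the leap-frog scheme concrete, the following minimal NumPy sketch advances Ez, Hx, and Hy on a staggered grid according to Eqs. (5)–(8) in the lossless case. The grid size, the hard line source, and the omission of perfectly matched layers are simplifying assumptions for illustration; this does not reproduce the full simulation setup used later.

```python
import numpy as np

c = 1.0                            # speed of light in natural units
wavelength = 1.0
dx = dy = 0.05
dt = dx / (np.sqrt(2) * c)         # 2D Courant stability limit
nx, ny, n_steps = 200, 200, 400

eps = np.ones((nx, ny))            # permittivity map (vacuum here)
mu = np.ones((nx, ny))             # permeability map
Ez = np.zeros((nx, ny))
Hx = np.zeros((nx, ny - 1))        # staggered (Yee) positions
Hy = np.zeros((nx - 1, ny))

for q in range(n_steps):
    # Advance H by half a step from spatial differences of Ez, Eqs. (5), (6).
    Hx -= dt / (mu[:, :-1] * dy) * (Ez[:, 1:] - Ez[:, :-1])
    Hy += dt / (mu[:-1, :] * dx) * (Ez[1:, :] - Ez[:-1, :])
    # Advance Ez by a full step from spatial differences of H, Eq. (8).
    Ez[1:-1, 1:-1] += dt / eps[1:-1, 1:-1] * (
        (Hy[1:, 1:-1] - Hy[:-1, 1:-1]) / dx
        - (Hx[1:-1, 1:] - Hx[1:-1, :-1]) / dy)
    # Hard source: time-harmonic line source near the left edge.
    Ez[5, :] = np.sin(2 * np.pi * (c / wavelength) * q * dt)
```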
B. Graph representation

As sketched in Sec. II A, the whole FDTD formalism is based on a local and iterative updating scheme. This makes graph neural networks appear very suitable for treating this problem: GNNs rely on the information exchange of neighboring nodes, which is ideal to capture the locality of the updating procedure. Moreover, FDTD works independently of the domain size, which translates to a GNN working on arbitrary input graphs.

The purpose of this section is a brief introduction to graph representations and graph neural networks, so that it becomes clear how and why GNNs are potentially useful for FDTD computations.
A graph G(V, E) comprises a set of nodes or vertices V = {v1, v2, …} and edges E ⊆ V × V. We consider here only undirected graphs; therefore, every edge represents an undirected link between two graph nodes, eij = (vi, vj) ∈ E. Both nodes and edges hold a set of attributes, the node features hv ∈ R^D and the edge features he ∈ R^C, respectively. In general, the number of features per node or edge can vary, leading to so-called heterogeneous graphs. In our case, however, the number of features is fixed, making our graphs homogeneous. Additionally, we assign spatial positions pi = (pi,x, pi,y) to the nodes that coincide with the simulation coordinates.
Graphs are very suitable to represent unstructured data, such as networks, molecules, or meshes.22–24 For the purpose of field propagation, we represent the electromagnetic field and material distribution as graphs. To keep things simple in this initial work, we concentrate on dielectric materials only, for which a spatially resolved dielectric function suffices as a description. We can think of two possible graphs.

One approach is a direct representation of the Yee grid; see Fig. 1(a). Here, the node features hv ∈ R² encode the electric field Ez(r, t0) and the permittivity ε(r). In addition, the edge features he ∈ R⁴ comprise the magnetic field components Hx,y(r, t0) and the relative Cartesian vector rij of linked nodes,

$$\mathbf{r}_{ij} = \left(p_{j,x} - p_{i,x},\; p_{j,y} - p_{i,y}\right). \qquad (9)$$

Another approach is a meshing of the domain [see Fig. 1(b)] with equivalent edge and node features. While, technically, there is no distinction between a regular grid and a mesh at the level of the graph, it makes a tremendous difference at the level of the FDTD. In the FDTD, frequently, a regular grid is used, which causes a staircase approximation of the surface of rounded objects. This approximation can lead to computational artifacts that can be avoided with such a mesh.

In both cases, the choice of resolution is important. It does not have to be as fine as the one chosen for the FDTD simulation, but a too-low resolution leads to poor results. This is not necessarily caused by bad predictions but rather by a poor interpolation of the predictions to obtain the fields at all points in space inside the domain. Overall, the resolution is a trade-off between exceedingly large graphs, which slow down the computation, and too few nodes to achieve accurate results at all points in space. In this work, we use a resolution found by manual tuning, choosing a sweet spot between these extremes.
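As an illustration of the grid-based representation, the following sketch assembles a Yee-grid graph with the node and edge features described above, here as a PyTorch Geometric Data object. The field-array shapes follow the staggered layout of the FDTD sketch above; the exact feature ordering and the use of PyTorch Geometric are assumptions of this sketch.

```python
import numpy as np
import torch
from torch_geometric.data import Data

def yee_grid_graph(Ez, eps, Hx, Hy, dx):
    # Ez, eps: (nx, ny); Hx: (nx, ny - 1); Hy: (nx - 1, ny), as in the
    # FDTD sketch above. Feature ordering is an assumption.
    nx, ny = Ez.shape
    idx = np.arange(nx * ny).reshape(nx, ny)
    # Node features h_v = (Ez, eps), Yee-grid node positions flattened row-wise.
    x = torch.tensor(np.stack([Ez.ravel(), eps.ravel()], axis=1),
                     dtype=torch.float)
    edges, feats = [], []
    for i in range(nx):
        for j in range(ny):
            if i + 1 < nx:   # horizontal edge: carries Hy, r_ij = (dx, 0)
                edges.append((idx[i, j], idx[i + 1, j]))
                feats.append((0.0, Hy[i, j], dx, 0.0))
            if j + 1 < ny:   # vertical edge: carries Hx, r_ij = (0, dx)
                edges.append((idx[i, j], idx[i, j + 1]))
                feats.append((Hx[i, j], 0.0, 0.0, dx))
    # For an undirected graph, reversed edges (with negated r_ij) would be
    # appended as well; omitted here for brevity.
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(feats, dtype=torch.float)
    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)
```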
C. Graph neural networks
Having discussed the classical FDTD algorithm used to advance electromagnetic fields in space and time, and how to represent the discretized Maxwellian problem in terms of a graph, we now provide a short introduction to graph neural networks. In short, the purpose of these networks will be to learn how to propagate a given input field in space and time.
A graph neural network (GNN) is an artificial neural network (ANN) that works on graphs. In contrast to conventional ANNs, such as linear or convolutional networks, a GNN does not rely on a specific shape and size of the input data. It works on any graph, independent of its number of nodes, edges, or connectivity, and it offers permutation invariance. A GNN can be interpreted as a graph operator F that acts on a graph G and returns a new graph G′ with updated features and potentially new nodes and edges. GNNs can perform tasks on different levels, specifically the node level, edge level, and graph level. We want to predict the electric field at each point in space after a fixed time step. This corresponds to a node-level regression task since the nodes encode the electric field in space. The graph that our network takes as input represents the field at a certain moment in time, and the network returns the field after one time step.
Our network comprises several components, as depicted in Fig. 2. First, two encoders lift the graph's node and edge features into a high-dimensional embedding space. Both encoders are fully connected feedforward networks of three linear layers with ReLU activation. The number of features increases with every MLP layer to 32, 64, and 128. Node and edge features must have the same dimension to enable the following generalized graph convolution (GCN) layers.34 To perform the graph convolution, each node vi gathers the node and edge features of every neighboring node vj : j ∈ N(i) and connecting edge eij, the so-called message passing. Subsequently, the messages are aggregated (message aggregation) and added to the original node features. Finally, a multi-layer perceptron (MLP) processes the features. With that, we obtain updated node features v′i. Following Ref. 34, the whole operation reads

$$\mathbf{h}'_{v_i} = \mathrm{MLP}\Big(\mathbf{h}_{v_i} + \max_{j\in\mathcal{N}(i)}\big[\mathrm{ReLU}\big(\mathbf{h}_{v_j} + \mathbf{h}_{e_{ij}}\big) + \epsilon\big]\Big), \qquad (10)$$

where ε is a small positive parameter to ensure non-zero division in the aggregation scheme, for which we use max aggregation. Additionally, we use a layer normalization before the non-linear activation, and we add residual connections after every GCN layer. The message passing allows information exchange between connected neighbors, which we use to adapt the updating scheme performed on the Yee grid. However, since we use several GCN layers, we achieve information exchange not only between directly connected nodes but also between nodes farther away. Finally, we have a single decoder, another three-layered MLP, that projects the resulting features of each node back to a single value, the updated electric field. Please note that the whole operation on the input graph G is independent of the number of nodes and edges; only the number of input features is fixed. In general, this is one of the main strengths of GNNs, which we exploit to treat different domain sizes with the same model.

We trained GNNs with different hyperparameters to find a suitable configuration. However, we observed only minor differences in the final performance; thus, a qualitative reproduction of our results can also be achieved with different parameters. In our experiments, we found that a minimum of ∼300 000 trainable parameters is needed.
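The following sketch summarizes the described architecture using PyTorch Geometric's GENConv layer, which implements the generalized graph convolution of Ref. 34 with max aggregation. The number of GCN layers and the decoder widths (mirroring the encoder) are not specified in the text and are assumptions here.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GENConv

class FieldGNN(nn.Module):
    def __init__(self, node_in=2, edge_in=4, hidden=128, num_gcn=8):
        super().__init__()
        def mlp(d_in, widths):
            layers, d = [], d_in
            for w in widths:
                layers += [nn.Linear(d, w), nn.ReLU()]
                d = w
            return nn.Sequential(*layers)
        # Encoders: three linear layers with ReLU, widths 32 -> 64 -> 128.
        self.node_enc = mlp(node_in, [32, 64, hidden])
        self.edge_enc = mlp(edge_in, [32, 64, hidden])
        # Generalized graph convolutions (Ref. 34) with max aggregation.
        self.convs = nn.ModuleList(
            [GENConv(hidden, hidden, aggr='max') for _ in range(num_gcn)])
        self.norms = nn.ModuleList(
            [nn.LayerNorm(hidden) for _ in range(num_gcn)])
        # Decoder: projects each node embedding back to the updated Ez value.
        self.decoder = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                     nn.Linear(64, 32), nn.ReLU(),
                                     nn.Linear(32, 1))

    def forward(self, x, edge_index, edge_attr):
        h = self.node_enc(x)
        e = self.edge_enc(edge_attr)
        for conv, norm in zip(self.convs, self.norms):
            # Layer norm before activation; residual connection per layer.
            h = h + torch.relu(norm(conv(h, edge_index, e)))
        return self.decoder(h)
```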
III. RESULTS
We train different GNNs to evolve a given electromagnetic field distribution Ez(r, t0), Hx,y(r, t0) for a distinct time step Δt. We follow two different approaches for this purpose. First, we train different GNNs to perform a single time step of electric field prediction, given a comparatively low spatial resolution. We test and evaluate this for both grids and meshes. Second, we want to be able to predict the full field. Therefore, we keep a high temporal and spatial resolution and deduce the magnetic fields from the predicted electric fields following the FDTD propagation procedure. The training is performed in parallel on four NVIDIA A100 Tensor Core graphics processing units (GPUs) with 40 GB of memory each. The training time varies, depending on the model specifications, between 40 min and 4 h.
We use the open-source Python FDTD implementation meep35 for data generation and testing. We consider a plane wave source at λ = 1 µm and arbitrarily shaped non-magnetic, non-conducting, non-dispersive, dielectric 2D scatterers of ε = 2.1025 in vacuum. The domain size varies between 1 and 3 µm side length, with perfectly matched layers as boundary conditions. We chose a spatial resolution of 60 pixels/wavelength to ensure physically correct results. The source is a time-harmonic plane wave propagating along the +x-direction into the considered domain. This comparably basic setting is most suitable for initial investigations. However, the method is not restricted to a specific source or material properties; an extension to dispersive materials and different sources is possible.
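A minimal meep setup along these lines could look as follows. The scatterer geometry (a simple cylinder instead of a random shape), the PML thickness, the source placement, and the run time are illustrative assumptions; the snippet only sketches the configuration described above.

```python
import meep as mp

cell = mp.Vector3(3, 3, 0)                        # up to 3 um side length
geometry = [mp.Cylinder(radius=0.4,               # placeholder scatterer
                        material=mp.Medium(epsilon=2.1025))]
sources = [mp.Source(mp.ContinuousSource(wavelength=1.0),
                     component=mp.Ez,             # TM: Ez out of plane
                     center=mp.Vector3(-1.2, 0),
                     size=mp.Vector3(0, 3))]      # line source -> plane wave along +x
sim = mp.Simulation(cell_size=cell,
                    geometry=geometry,
                    sources=sources,
                    boundary_layers=[mp.PML(0.5)],
                    resolution=60)                # 60 pixels/wavelength
sim.run(until=5)
# Extract the Ez field over the whole cell as training data.
ez = sim.get_array(component=mp.Ez, center=mp.Vector3(), size=cell)
```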
A. Electric field prediction

To evolve the electric field only for a fixed time step, we can use a low spatial resolution and a comparatively large time step Δt = λ/(4c). This Δt coincides with 15 high-resolution time steps in meep. Such a large time step can be chosen because the underlying training data were created with a high temporal resolution and are, therefore, physically accurate. The model then learns from correct data to take larger time steps. We construct graphs as structured grids, similar to the Yee grid, and as meshes. In this case, the edge features include both Hx and Hy values.
The training set comprises 125 different shapes, sampled at 20 different time steps of discretization Δt = λ/(4c). The shapes are generated from random noise after applying a Gaussian filter. Starting from the initial field distribution, either a structured grid or a meshed graph is constructed; see Fig. 1. For meshing, we use the Python library meshpy36 to create a triangular mesh. We control the number of mesh points by restricting the volume size. The domain edges and the scatterer boundary are added manually. Subsequently, the GNN performs the temporal evolution for Δt, leading to a transformed graph G′. The loss function compares the ground truth result of the FDTD with the node features of G′ and computes the mean squared error (MSE),

$$\mathrm{MSE}(G, \hat{G}) = \frac{1}{N_v}\sum_i^{N_v}\big(h_{v,i} - \hat{h}_{v,i}\big)^2 \qquad (11)$$
$$= \frac{1}{N_v}\sum_i^{N_v}\big(E_{z,i} - \hat{E}_{z,i}\big)^2. \qquad (12)$$

Here, the FDTD ground truth values are interpolated in the case of meshed graphs to obtain the field values at every node position. The whole dataset is split into 2000 and 500 samples for training and validation, respectively. We use the Adam optimizer37 and a learning rate scheduler, which reduces the learning rate by a factor of 0.5 if no further improvement of the validation loss is observed within 10 epochs. We train for a total of 200 epochs and observe the convergence of the validation and training losses.

To obtain the resulting field at every point in space, not only at the node positions, we linearly interpolate the node values to a grid.
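A corresponding training loop is sketched below. The dataset objects (train_set, val_set, holding graphs with ground-truth node targets y), the learning rate, and the batch size are assumptions; the loss, optimizer, scheduler settings, and epoch count follow the text.

```python
import torch
from torch_geometric.loader import DataLoader

# train_set, val_set: lists of torch_geometric Data objects (assumed given).
model = FieldGNN()                                # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr assumed
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=10)           # halve LR after 10 stagnant epochs
loss_fn = torch.nn.MSELoss()                      # Eqs. (11) and (12)

train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = DataLoader(val_set, batch_size=8)

for epoch in range(200):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        pred = model(batch.x, batch.edge_index, batch.edge_attr).squeeze(-1)
        loss = loss_fn(pred, batch.y)             # y: FDTD ground-truth Ez
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = sum(
            loss_fn(model(b.x, b.edge_index, b.edge_attr).squeeze(-1), b.y)
            for b in val_loader) / len(val_loader)
    scheduler.step(val_loss)                      # drive the LR schedule
```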
We test the resulting models on 20 different shapes and time steps and observe excellent agreement of the GNN-predicted fields with the FDTD simulations. In the case of square grid graphs, we reach a mean squared error of 7.8 × 10⁻⁴, whereas meshing results in 4.4 × 10⁻⁴. Figure 3 shows two examples, one for the grid (a) and one for the mesh (b). Here, the leftmost panel shows the initial state. The second panel shows the field predicted by the GNN after a time step Δt = λ/(4c). The third panel shows the field computed with the FDTD for reference purposes. Both predictions from the GNNs, considering grids and meshes, are in excellent agreement with the ground truth FDTD simulations. To quantify this agreement, we evaluate the normed relative error of the GNN-predicted field Ez(r) and the FDTD-computed field Ez,0(r) at t0 + Δt,

$$\delta(\mathbf{r}, E_z, E_{z,0}) = \frac{\left|E_z(\mathbf{r}) - E_{z,0}(\mathbf{r})\right|}{\left|E_{z,0}(\mathbf{r})\right| + \overline{\left|E_{z,0}\right|}}, \qquad (13)$$

where $\overline{|E_{z,0}|}$ is the average absolute value of the correct field with respect to r. The error is shown on the right of Figs. 3(a) and 3(b). The prediction is slightly worse for smaller propagation times due to transient temporal effects arising from the source initiation. These effects are particularly hard to capture with meshes, whose elements are coarser in the spatial domain, which results in interpolation errors when converting the graph representation back to fields. However, the prediction with grids of resolution 30 pixels/wavelength covers these effects quite well.

Now, we test the trained GNNs on domain sizes outside the trained range of 1–3 µm. Figure 4 shows an example with 5 µm side length using meshing. The mean relative error for this example is ≈1.4%, which is very low. In addition, the shape of the scatterer is rather specific compared to the ones used in training. This shows that our model has learned to adapt the propagation procedure of a given electromagnetic field: it extrapolates very well to larger domain sizes and even to complex shapes not seen in training. We achieve similar results with grid graphs.
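For completeness, Eq. (13) translates directly into a few lines of NumPy; the function below evaluates the pointwise relative error between predicted and reference fields given as arrays.

```python
import numpy as np

def relative_error(ez_pred, ez_true):
    """Normed relative error of Eq. (13), evaluated pointwise on arrays."""
    mean_abs = np.mean(np.abs(ez_true))   # average |Ez,0| over r
    return np.abs(ez_pred - ez_true) / (np.abs(ez_true) + mean_abs)

# Example: delta = relative_error(ez_gnn, ez_fdtd); delta.mean() gives the
# mean deviation quoted in the text (e.g., ~1.4% for the 5 um example).
```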
B. Full field prediction

Instead of only predicting Ez(r, t0 + Δt), we want to train models that are able to compute the full electromagnetic field. For that, we add two feature channels to the node decoder to enable the direct prediction of Hx,y(r, t0 + Δt).