Ankit Gaurav,a Xiaoyao Song,b Sanjeev Kumar Manhas,a and Maria Merlyne De Souza*b
aDepartment of Electronics and Communication Engineering, Indian Institute of Technology Roorkee, Roorkee, 247667, India
bDepartment of Electronic and Electrical Engineering, University of Sheffield, North Campus, Sheffield S3 7HQ, UK. E-mail: m.desouza@sheffield.ac.uk
First published on 7th December 2024
Efficient storage and processing are essential for temporal data processing applications to make informed decisions, especially when handling large volumes of real-time data. Physical reservoir computing provides an effective solution to this problem, making it ideal for edge systems. These devices typically necessitate compact models for device-circuit co-design. Alternatively, machine learning (ML) can quickly predict the behaviour of novel materials/devices without explicitly defining any material properties or device physics. However, previously reported ML device models are limited by their fixed hidden layer depth, which restricts their adaptability to predict the varying temporal dynamics of a complex system. Here, we propose a novel approach that utilizes a continuous-time model based on neural ordinary differential equations to predict the temporal dynamic behaviour of a charge-based device, a solid electrolyte FET, whose gate current characteristics show a unique negative differential resistance that leads to steep switching beyond the Boltzmann limit. Our model, trained on a minimal experimental dataset, successfully captures device transient and steady-state behaviour for previously unseen examples of excitatory postsynaptic current under inputs of variable pulse width lasting 20–240 milliseconds, with a low root mean squared error (RMSE) of 0.06. Additionally, our model predicts device dynamics in ∼5 seconds, with a 60% lower error than a conventional physics-based model, which takes nearly an hour on an equivalent computer. Moreover, the model can predict device-to-device variability in the device characteristics by a simple change in the frequency of the applied signal, making it a useful tool in the design of neuromorphic systems such as reservoir computing. Using the model, we demonstrate a reservoir computing system that achieves an error rate as low as 0.2% in the task of classification of spoken digits.
To address this challenge, various machine learning models based on multilayer perceptron (MLP) neural networks have been proposed.15–17 In multi-state devices, a single voltage can result in different current values corresponding to low and high resistance states, necessitating a separate model for each state. For instance, researchers have explored two distinct approaches using MLP neural networks to model memristors. One approach involves decoupling the switching and conducting behaviours,17 whereas the second emulates the physical equations governing state variables and current.16 Additionally, some models directly use the state variable as input to the MLP network for accurate predictions.15
In an alternative approach,18 a memristor device model based on long short-term memory (LSTM) was introduced that treats the device switching and conducting behaviour as a time-series problem. This approach eliminates the need for two separate models, as the LSTM output depends on the previous input and output states.
However, LSTM is limited by its dependence on the previously measured signals in the training dataset, necessitating separate models for distinct signal types (e.g. sine and random sine variations), which hinders its compactness and practical use in circuit simulations. These approaches increase model complexity, owing to the separate models required for switching/conducting behaviour or for distinct signal types, and demand a larger dataset to accurately capture the device dynamics. Additionally, such models encounter challenges in continuous-time modelling because their structure consists of a fixed number of discrete hidden layers. This inflexibility hampers their ability to adjust to changing temporal dynamics, reducing their effectiveness for the real-time processing and continuous learning required of truly neuromorphic systems.
Alternatively, a new family of deep neural network models called neural ordinary differential equations (neural ODEs) was introduced, which offers an effective approach to continuous-time modeling.19 Instead of specifying a discrete sequence of hidden layers as in traditional neural networks such as MLPs or LSTMs, neural ODEs parameterize the derivative of the hidden state using a neural network. Neural ODE models operate at continuous depth, which means they adapt their evaluation strategy to each input dynamically: unlike fixed-depth architectures, their effective depth varies with the problem at hand. The output is computed using a black-box differential equation solver for any given time t, making them also useful for irregularly sampled data. Moreover, neural ODEs are both parameter-efficient, typically requiring fewer parameters than traditional neural networks, and memory-efficient, because the initial state is sufficient to predict the dynamical state. This efficiency translates to better performance with smaller datasets during training.19
As an example, the transformation of the input state x0 using a discrete sequence of hidden layers as in traditional neural networks vs. neural ODEs is shown in Fig. 1a and b. Consider the transformation of input from x0 to x1 by a residual network (ResNet), as shown in Fig. 1a.
x1 = x0 + f1(x0, θ1) (1)
In a neural ODE, this discrete transformation is taken to its continuous limit, with a neural network parameterizing the derivative of the hidden state:

dx(t)/dt = f(x(t), t, θ) (2)
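As a concrete illustration of eqn (1) and (2), consider the minimal sketch below. It uses PyTorch with the torchdiffeq package; this software stack and all names in the snippet are our illustrative assumptions, not the implementation reported here.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

# one shared network f(x, theta) used in both updates
f = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 2))

# Eqn (1): a single discrete residual step, as in one ResNet block.
x0 = torch.randn(1, 2)
x1 = x0 + f(x0)

# Eqn (2): the same update in its continuous limit. "Depth" becomes the
# integration interval, and the solver picks the evaluation points.
class ODEFunc(nn.Module):
    def __init__(self, net):
        super().__init__()
        self.net = net
    def forward(self, t, x):        # dx/dt = f(x(t), t, theta)
        return self.net(x)

t = torch.linspace(0.0, 1.0, 10)    # arbitrary query times
x_t = odeint(ODEFunc(f), x0, t)     # black-box ODE solver; x_t[i] = x(t_i)
```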
Recently, a new approach was proposed for modeling spintronic devices using a modified neural ODE framework.21 This approach incorporates an embedding theorem, which reconstructs the state space of a dynamical system from a series of observations. By doing so, it addresses the challenge of unknown internal variables and external time-varying inputs by utilizing multiple successive previous states, which is equivalent to information provided by higher-order derivatives:
X(t) = (x1(t), x1(t − Δtd),…, x1(t − (n − 1)Δtd)) (3)
With the external time-varying input embedded in the same way, the modified neural ODE reads:

dX(t)/dt = f(X(t), I(t), t, θ), where I(t) = (I1(t), I1(t − Δtd),…, I1(t − (n − 1)Δtd)) (4)
The general schematic of the modified neural ODE function is shown in Fig. 1c, and the neural ODE network with the new function is shown in Fig. S1 (ESI†). Adding extra dimensions with time delay to the input space allows more complex functions to be learned with simpler flows. However, this increases the computational overhead, making the ODE solver work harder due to the added complexity.20 Once trained, the prediction speed of neural ODEs and ANODEs is influenced by model complexity and available computational resources; they are generally quite efficient during prediction, as most of the intensive computation occurs during training. Further, to deal with inconsistencies in the data, advanced ODE solvers designed for stability and consistency, such as Nesterov's accelerated gradient,22 can be used. Alternatively, data pre-processing techniques such as interpolation can be applied to fill in gaps in the data.
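To make the embedding of eqn (3) concrete, a small sketch follows; the helper name and sampling convention are our own, with the delay Δtd expressed in samples.

```python
import numpy as np

def delay_embed(x, n, delay):
    """Eqn (3): stack n delayed copies of a 1-D observable, newest first.
    'delay' is the time delay dt_d expressed in samples."""
    T = len(x) - (n - 1) * delay
    return np.stack([x[(n - 1 - k) * delay:(n - 1 - k) * delay + T]
                     for k in range(n)], axis=1)

x = np.arange(10.0)               # toy observable x1(t)
X = delay_embed(x, n=3, delay=2)  # each row is (x(t), x(t - 2), x(t - 4))
# X[0] == [4., 2., 0.]
```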
In a different approach,23 a physics-informed neural ODE was proposed. This method incorporates physical principles into reduced-order models (ROMs) by employing collocation-based loss terms. This strategy notably improves the performance in data-scarce and high-noise environments to predict the behaviour of complex systems viz., acoustics, gas and fluid dynamics, and traffic flows. Similarly in another study,24 a physics-enhanced neural ODE combined partially known mechanistic models (based on physical laws) with universal surrogate approximators (neural networks). This hybrid approach allows for more accurate modelling from limited datasets, particularly applied to industrial chemical reaction systems. Such studies demonstrate the adaptability of neural ODEs in accurately predicting the behaviour of complex systems, even when data is limited.
In this paper, we demonstrate a neural ODE based continuous-time model of an electronic charge-based system. We incorporate time-delay embedding of a single observable and an external input in a neural ODE, as proposed in ref. 21. Using this approach, we demonstrate that neural ODEs can accurately predict the complex behaviour of a three-terminal, non-filamentary, solid electrolyte-based thin film transistor (SE-FET) with a unique negative differential resistance in its gate current characteristics.7 The model achieves high accuracy in dynamically predicting the performance of the SE-FET after training on a minimal experimental dataset spanning only 240 seconds. Furthermore, we compare the performance of the neural ODE model with a previous conventional physics-based model25 and experimental measurements using inputs distinct from the training dataset. Our model is able to predict the temporal dynamics of the SE-FET such as the excitatory postsynaptic current (EPSC), measured as the time-dependent channel conductance after the application of a voltage pulse (presynaptic pulse) on the gate electrode, notably where previous multilayer perceptron and long short-term memory-based models of memristors fail.15–18 Moreover, we show that the transient response, steady-state response, and device-to-device variation can also be predicted using the neural ODE model without training a separate model. Additionally, we demonstrate an SE-FET-based reservoir system using the neural ODE model of the device, benchmarked on the standard spoken-digit recognition task.
The unique redox reaction occurring within the insulator of the SE-FET7 defines the device mechanism.26 The accumulation of positively charged vacancies near the channel end of the insulator, induced by a gate voltage, leads to an additional electrolytic capacitance. During the reverse sweep, this capacitance becomes negative25 due to a rapid collapse of the internal electric field in the device, resulting in steep switching, as shown in Fig. 2, without any filamentary process. The characteristics of the SE-FET, such as hysteresis, plasticity, negative capacitance, short-term memory and non-linearity, facilitate a broad spectrum of neuromorphic analogue computing applications such as vision, speech recognition, and time series forecasting.27,28 However, modelling the SE-FET gate current characteristics is challenging due to the inherent differences in the behaviour of charge carriers in insulators versus semiconductors.
In semiconductors, the current continuity equation governs the flow of electrons and holes, ensuring charge conservation and accounting for the drift and diffusion of charge carriers under the influence of electric fields and concentration gradients. Conversely, in insulators, the absence of free charge carriers necessitates consideration of alternative phenomena to explain any observed electrical behaviour. These include space charge effects,29 where charge accumulation at interfaces or within the material influences the electric field; ion generation and movement,30 involving ion migration within the insulator under an electric field; and tunneling,31 a quantum mechanical process whereby electrons pass through potential barriers they classically should not be able to cross. This introduces challenges in conventional TCAD tools, especially when trying to model current continuity simultaneously with these other phenomena. We have earlier used a simple point ion model of Mott and Gurney32,33 to define the motion of the ions in the insulator, coupled with a 1D-Poisson model to evaluate the charge in the channel. The rate of change of sheet charge density at the interface was modelled by a balance between the drift and diffusion current densities. However, the model does not include any gate current characteristics, including the unique redox reaction in the gate insulator (Fig. 2) that underlies the sub-60 mV decade−1 steep switching observed during the reverse sweep of the gate bias.25 Therefore, this model is not sufficient to capture the dynamic characteristics of the device.
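For orientation, the point-ion picture behind this earlier model can be sketched as below, using the standard textbook symbols (jump distance a, attempt frequency ν, activation energy Ua); the prefactor convention varies between sources, and these expressions are not reproduced from ref. 32 and 33.

```latex
% Point-ion hopping (Mott-Gurney): an ion of charge q hopping a distance a
% over a barrier U_a at attempt frequency \nu drifts in a field E as
\[
  v \;\propto\; a\,\nu\, e^{-U_a/k_B T}\,
      \sinh\!\left(\frac{qaE}{2k_B T}\right),
\]
% while the sheet charge density \sigma_s at the interface evolves through
% a balance of drift and diffusion current densities:
\[
  \frac{\partial \sigma_s}{\partial t} \;\propto\;
      J_{\mathrm{drift}} - J_{\mathrm{diff}}.
\]
```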
The neural ODE is trained by minimizing a mean squared loss between the predicted and measured states:

L(θ) = (1/N) Σi=1N ‖X̂(ti) − X(ti)‖2 (5)
To minimize this loss function, the gradients of the loss with respect to the parameters θ are computed using the adjoint sensitivity method, and θ is then updated using the adaptive moment estimation (Adam) optimization algorithm in a supervised manner.
Our neural ODE function is given by eqn (4) (i.e., X(t) = (x1(t),…, x1(t − (n − 1)Δtd)), I(t) = (I1(t),…, I1(t − (n − 1)Δtd))) and consists of a feedforward neural network, as shown in Fig. 1c, with three hidden layers, each featuring 200 neurons and a nonlinear activation. A fourth-order fixed-step Runge–Kutta solver with the 3/8 rule is used to solve this neural ODE function. To prevent overfitting, we use batch sampling without replacement to ensure diversity in the batches, and we add dropout layers, which randomly set a fraction of input units to zero during each update in training; this stops the network from becoming overly reliant on specific neurons. Additionally, we initialize the weights with a normal distribution to promote stable training. Together, these techniques make the model more robust and less likely to overfit.
For a clearer visualization, the entire procedure of our technique is laid out in Algorithm 1.
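A minimal sketch of this procedure is given below; it is not the authors' implementation. It assumes PyTorch with torchdiffeq, whose fixed-step 'rk4' solver implements the fourth-order 3/8 rule; the layer sizes, dropout, normal initialization, adjoint gradients, RMSE loss, and Adam optimizer follow the text, while the number of delayed states, step size, learning rate, and the toy data are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint

N_STATES = 4    # number n of delayed states (assumed; cf. Fig. 3c)
DT = 0.1        # sampling interval in seconds (assumed)

class ODEFunc(nn.Module):
    """dX/dt = f(X(t), I(t), t, theta), eqn (4): three hidden layers of
    200 neurons, dropout, and normally distributed initial weights."""
    def __init__(self, inputs, width=200):
        super().__init__()
        self.inputs = inputs    # recorded delay-embedded input, (T, N_STATES)
        self.net = nn.Sequential(
            nn.Linear(2 * N_STATES, width), nn.Tanh(), nn.Dropout(0.1),
            nn.Linear(width, width), nn.Tanh(), nn.Dropout(0.1),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, N_STATES))
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, std=0.1)

    def forward(self, t, X):
        # zero-order hold on the recorded input I(t)
        i = torch.clamp((t / DT).long(), max=len(self.inputs) - 1)
        return self.net(torch.cat([X, self.inputs[i]], dim=-1))

# toy stand-ins for the 240 s of measured, delay-embedded drive and response
t_grid = torch.arange(0.0, 240.0, DT)
I_rec = torch.rand(len(t_grid), N_STATES)     # delay-embedded dVGS record
target = torch.rand(len(t_grid), N_STATES)    # delay-embedded dIDS record

func = ODEFunc(I_rec)
opt = torch.optim.Adam(func.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    pred = odeint(func, target[0], t_grid,
                  method='rk4', options={'step_size': DT})  # RK4, 3/8 rule
    loss = torch.sqrt(((pred - target) ** 2).mean())        # RMSE
    loss.backward()              # gradients via the adjoint method
    opt.step()
```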
Fig. 3 Training and testing of a neural ODE model of the SE-FET. (a) Normalized random voltage (green line), in the form of ΔVGS (gate to source voltage), is applied as input to the neural ODE. (b) The corresponding experimentally measured normalized ΔIDS (drain to source current) (grey line) is used to train the neural ODE, for which a low training error of 0.013 (RMSE) is obtained (red dotted line). (c) Training error versus iterations for different numbers of previous states used in training; all achieve a low training error with RMSE ≤ 0.02. (d) and (e) Testing the dynamic characteristics of the SE-FET and comparison with the physics-based model25 and experiment. A normalized voltage pulse, in the form of ΔVGS, is given as input to the neural ODE. The neural ODE model shows a low prediction error of 0.06 (RMSE) when compared to experiment (blue line). Physics not fully captured in ref. 25 leads to a discrepancy with experiment, as shown by the green line. (f) Testing the normalized excitatory postsynaptic current (ΔEPSC) of the neural ODE model against experiment.7 The presynaptic pulse widths vary from 20 to 240 ms at 50% duty cycle and consist of 10 normalized gate pulses (ΔVGS = 1) in each case. For this task, a low prediction error of 0.0093 (RMSE) is obtained.
Fig. 3d shows positive write pulses of +2 V (normalized to ΔVGS = 1) with a pulse width of 180 ms repeated continuously 8 times, different from those used in training. For this test, a low prediction error of 0.06 (RMSE) is achieved, as shown by the red line in Fig. 3e. For a lower number of previous states, the error, as shown in Fig. S3 (ESI†), ranges from 0.14 to 5.68 (RMSE). A higher number of previous states is equivalent to the information provided by higher-order derivatives, which leads to better performance albeit at a longer training time.34 On the other hand, our previous physics-based model, shown by the green line in Fig. 3e, fails to capture the dynamics, with a prediction error of 0.15 (RMSE). With this new approach (neural ODE), the prediction error is reduced by 60% compared to our physics-based approach25 (see the device fabrication and mechanism section). Additionally, our model predicts device dynamics in approximately 5 seconds, compared to nearly an hour for the physics-based model on an equivalent computer. In Fig. 3f, the EPSC response of the device is compared with the predicted results, with a prediction error of 0.009 (RMSE). The EPSC of the SE-FET is measured as the change in channel conductance over time following a voltage pulse applied to the gate electrode. If the EPSC signal persists for a few seconds to tens of minutes, it is analogous to short-term memory; conversely, if it endures for several hours to a lifetime, it represents long-term memory. A stimulus train of different pulse widths (20 to 240 ms) at 50% duty cycle, consisting of 10 normalized gate pulses (ΔVGS = 1), is used in each case. A longer pulse width results in higher EPSC values and an extended retention time. Both characteristics are captured well by the model.
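For illustration, a stimulus train of this form can be generated as in the sketch below; the function name, sampling step, and grid are our assumptions rather than the measurement settings.

```python
import numpy as np

def pulse_train(width_s, n_pulses=10, dt=0.01, amplitude=1.0):
    """n_pulses rectangular pulses of the given width at 50% duty cycle
    (period = 2 * width), normalized amplitude dVGS = 1."""
    period, on = int(round(2 * width_s / dt)), int(round(width_s / dt))
    v = np.zeros(n_pulses * period)
    for k in range(n_pulses):
        v[k * period : k * period + on] = amplitude
    return v

v_short = pulse_train(0.020)   # 20 ms pulses, the shortest case in Fig. 3f
v_long = pulse_train(0.240)    # 240 ms pulses, the longest case
```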
Further, we evaluate the model using an extended input sequence comprising 5000 time steps, with each time step corresponding to 100 ms, consistent with experimental measurements. For this test, the neural ODE accurately captures both the transient and steady-state behaviour of the SE-FET. Remarkably, this is achieved with a low error of 0.03 (RMSE), without specifically training the model for steady-state dynamics, as shown in Fig. 4a and b. Initially, the response of the SE-FET is in a transient state up to 2500 time steps, showing a gradual increase in ΔIDS, beyond which the device response enters steady state. Furthermore, experimentally we find that when different SE-FET devices are subjected to the same input, their responses follow the same trend and differ only in magnitude. We can artificially recreate this behaviour in our model by feeding the same input at different frequencies, as shown in Fig. 4c: input pulses applied at a lower frequency expose the device to the signal for longer periods, resulting in a higher magnitude, and vice versa. To validate this approach, we compare the model output to that of three different SE-FET devices, at 12.5 Hz (80 ms), 14.28 Hz (70 ms), and 20 Hz (50 ms), emulating the same level of device-to-device variation as in experiment with RMSE = 0.03 or lower (Fig. S4, ESI†). With this simple modification, we can use the model to study any application with multiple devices, including device-to-device variation, without training a separate model for each device.
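A sketch of this frequency trick, reusing pulse_train() from above and assuming the quoted millisecond values are pulse periods at 50% duty cycle:

```python
# one trained model; re-timing the same input emulates different devices
for f_hz in (12.5, 14.28, 20.0):          # periods of 80, 70 and 50 ms
    v = pulse_train(width_s=0.5 / f_hz)   # 50% duty: on-time = half a period
    # feeding v to the trained neural ODE emulates one device instance:
    # lower frequency -> longer exposure per pulse -> larger response
```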
Table 1 summarizes various machine learning based device models. The previously reported MLP15–17 and LSTM18 based memristor models operate in discrete time, where information flows through a fixed number of layers (depth) during each forward pass; their depth is determined by the architecture and remains constant regardless of the complexity of the input.35 Neural ODEs, by contrast, replace discrete layers with a continuous-time dynamical system, implicitly and continuously adapting their depth.19 This adaptability makes them suitable for predicting device dynamics across different pulse widths or frequencies using a single trained model. Additionally, the SE-FET devices exhibit short-term memory characteristics, allowing them to switch from low to high resistance states in an analogue manner over time without any input. Thus, continuous-time dynamical systems such as neural ODEs are more effective in capturing the device dynamics and short-term memory. Additionally, whether applied to spintronic devices21 or SE-FETs, they can be effectively trained with limited experimental data, which makes them valuable for predicting the behaviour of complex systems in environments where data is scarce. In an alternative approach,36 physics-informed neural networks (PINNs) have been used to model system dynamics, but they differ in how they handle the underlying dynamics. PINNs embed physical laws directly into the training process, using differential equations as constraints; this allows them to model physical systems using both observed data and governing physics equations. Neural ODEs, meanwhile, evolve hidden states continuously through a differential equation solver, learning system behaviour solely from data without the need for explicit physical laws, which makes them ideal for data-driven applications where the device physics is unknown. In essence, PINNs blend known physics with learning, whereas neural ODEs extract dynamics purely from data.
Table 1 Summary of machine learning based device models

| Device | Machine learning | Data quantity | Continuous-time modeling | Proposed for | Ref. |
|---|---|---|---|---|---|
| TaN/HfO2/Pt memristor | LSTM | Large | Inherently not suited | Process tuning | 18 |
| HfOx memristor | MLP | Moderate | No | HSPICE circuit simulation | 15 |
| Memristor | MLP | Moderate | No | Transient circuit simulation | 17 |
| TaN/HfO2/Pt memristor | MLP | Moderate | No | HSPICE circuit simulation | 16 |
| Spintronic | Neural ODEs | Small | Yes | Efficient alternative to micromagnetic simulations | 21 |
| ZnO/Ta2O5 solid electrolyte FET | Neural ODEs | Small | Yes | Temporal dynamic modeling | This work |
In conclusion, we demonstrate the application of our model in an example of physical reservoir computing (RC). Physical dynamic reservoirs can efficiently process temporal inputs with low training costs by leveraging the short-term memory of the device for in-memory computation.27 Our framework and process flow for an SE-FET-based reservoir system for the task of recognition of spoken digits are highlighted in Fig. 5. It consists of three sections: input, reservoir, and a readout function. A specific input, processed via a mask into a temporal signal of duration t, is fed into the reservoir, which consists of the neural ODE model of m SE-FET devices. The connection between each temporal input and each SE-FET is fixed. The SE-FET responses are sampled continuously such that the output of the present state also depends upon its previous history. The sampled reservoir output nodes Sn^m, where the superscript denotes the device number (m) and the subscript (n) runs over all the sampled states of a given device (e.g. S0^1, S1^1,…), are used to train the weights (Wout) of the readout network using logistic regression. We use this framework to investigate the impact of various reservoir parameters on performance, such as data representation, single vs. multi-device reservoirs, and device variation. Using the model, we perform a standard benchmark task of recognition of isolated spoken digits using the NIST TI46 database,37 which consists of 500 samples.
Fig. 5 Our framework and process flow of the SE-FET-based reservoir system for the classification of spoken digits. |
Before the audio files of isolated spoken digits (0–9) are fed into the reservoir as input, they are preprocessed using Lyon's passive ear model, based on human cochlear channels (see Supplementary note 1 for methods, ESI†). Preprocessing with Lyon's passive ear model transforms each audio sample into a set of 64-dimensional vectors (corresponding to the frequency channels) with up to 42 time steps. One example of the original waveform of spoken digit 5 is shown in Fig. 6(a), and its preprocessed cochleagram from Lyon's passive ear model, consisting of 64 channels, is shown in Fig. 6(b), where a lower channel number captures higher frequency components and vice versa. The preprocessed input is then converted into an analog voltage stream by concatenating all 64 channels. The converted analog voltage stream is applied to the SE-FET reservoir in its transient state, and its response is sampled every 0.1 second, as shown in Fig. 6(c) for spoken digit 5. The procedure is repeated for all 500 samples.
In this implementation, a single-device reservoir achieves an overall mean accuracy of 99.80%. However, a major drawback of a single-device reservoir is the time taken to process each isolated spoken digit. For example, the input consists of 64 channels, each 42 time steps long, and one step takes 0.1 seconds to process; the total time required to process the entire input is therefore 64 × 42 × 0.1 = 268.8 seconds. To resolve this problem, we can use a multi-device SE-FET reservoir system consisting of 64 devices, one per channel, so that the channels are processed by different devices in parallel. Each of these 64 devices is assigned a different input frequency in the range of 10 Hz to 20 Hz to create device-to-device variation. Each individual channel is fed to its device in parallel, reducing the overall time required to process one isolated spoken digit from 268.8 seconds to 42 × 0.1 = 4.2 seconds. The response of the reservoir for channel numbers 23 and 64 is shown as an example in Fig. 7; the same procedure is undertaken for all other channels. A multi-device reservoir results in an overall mean accuracy of 99%. Furthermore, for a performance summary of SE-FET based RC under various conditions, such as data representation and single vs. multi-device reservoirs, see Supplementary note 2 (ESI†).
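A sketch of the readout stage under these choices follows; shapes and names are illustrative, and reservoir() is a hypothetical stand-in for the trained neural ODE device models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def readout_features(utterances, reservoir, n_devices=64):
    """Flatten the sampled states S_n^m of all devices into one feature
    vector per utterance; one device per cochlear channel, in parallel."""
    feats = []
    for cochleagram in utterances:        # shape (64 channels, <=42 steps)
        states = [reservoir(cochleagram[ch], device=ch)
                  for ch in range(n_devices)]
        feats.append(np.concatenate(states))
    return np.asarray(feats)

# X = readout_features(ti46_samples, reservoir)    # 500 TI46 utterances
# clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])  # W_out
# error_rate = 1.0 - clf.score(X[test], y[test])
```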
![]() | ||
Fig. 7 The response of the neural ODE model based multi-device reservoir for (a) channel number 23 and (b) channel number 64, shown as examples.
In our recently reported work28 on the same task, each channel was divided into 14 sub-sections, mainly because it is practically impossible to measure 2^42 different input patterns. This led to the unnecessary addition of a reset pulse in experiment for a reasonable mask length of a 3-bit sequence read at a time. Such constraints do not exist in the present work, thanks to the neural ODE model. We were also able to reduce the processing time to 4.2 seconds by parallel processing, compared to more than an hour for a single spoken digit previously.28 Moreover, with the help of the neural ODE model we were able to simulate reservoir computing in an analog mode without digitizing the input, which removes the extra preprocessing needed to convert analog to digital signals while preserving the integrity of the analog information. Our SE-FET based reservoir computing achieved a low error rate of 0.2% for a single-device reservoir and 1% for a multi-device reservoir, performing on par with or better than earlier published works based on other physical devices, as shown in Table S3 (ESI†).
However, real-world variations due to manufacturing defects and environmental changes can reasonably be expected to affect the performance of the model. This can be mitigated by incorporating such variations into the training data, helping the model generalize better under different conditions. Additionally, implementing anomaly detection algorithms can identify and filter out data points likely caused by these defects or changes, ensuring cleaner data for training. Further, integrating device degradation into the model will require gathering data that reflects degradation; this could involve collecting data from devices at various stages of their lifecycle to show changes in performance metrics over time. Moreover, creating features that quantify degradation, such as the age of the device or its usage frequency, could be beneficial.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4tc03696d |
This journal is © The Royal Society of Chemistry 2025 |