Nicholas
Armendarez
a,
Md Sakib
Hasan
*b and
Joseph
Najem
*a
aDepartment of Mechanical Engineering, The Pennsylvania State University, University Park, PA, USA. E-mail: jsn5211@psu.edu
bDepartment of Electrical and Computer Engineering, The University of Mississippi, University, MS, USA. E-mail: mhasan5@olemiss.edu
First published on 4th December 2024
Memristive physical reservoir computing is a promising approach for solving data classification and temporal processing tasks. This method exploits the nonlinear dynamics of physical, low-power devices to achieve high-dimensional mapping of input signals. Ion-channel-based memristors, which operate with similar voltages, currents, and timescales as biological synapses, are promising due to their rich dynamics, especially for use in biological edge settings. Accurate modeling of their dynamics is essential for optimizing network hyperparameters ex situ to save time and energy. Here, a generalized sigmoidal growth model of ion-channel memristor conductance is presented and shown to be more accurate in predicting dynamics than linear or logistic models. Using the exact solution of the proposed sigmoidal model, the MNIST handwritten digit classification task is optimized and trained ex situ, then tested in situ with the same trained weights. This approach achieved an experimental testing accuracy of 90.6%.
In particular, physical reservoir computing (PRC) has recently been a focus of research, with demonstrations of new ways to realize the reservoir layer using power-efficient dynamic devices.4–13 The basis of these techniques lies in the reservoir computing (RC) paradigm, where recurrent neural network (RNN) training is dramatically simplified by creating a “reservoir layer” of a random, high dimensional, and nonlinear network that has the critical feature of “fading” memory. This property indicates that the current state of the nonlinear reservoir strongly depends on recent inputs and past states, with this dependence decreasing at further time points in the past.14–17 The RC layer is typically connected to a readout layer through fixed, trained weights.14 PRC entails creating this reservoir layer using the physics of excitable materials and devices to generate nonlinear dynamics. Many devices have been utilized as reservoir layers in various configurations;8,18 however, in all cases, input is supplied to the device, and the internal states are some physically measurable quantities, such as voltage or current. These measured states dynamically update based on the device's nonlinear response to present and previous inputs.8 A standard device used to build PRC layers is the volatile memristor, which is typically utilized in a parallel architecture where each device is self-recurrent but uncoupled to adjacent devices.5,6,11,12,19 Memristors are two-terminal electronic components whose resistance is modulated based on past electrical stimulation.20 “Volatile” memristors, in particular, return to some base-level resistance after removing stimulation,5 which helps satisfy the “fading” memory requirement for RC systems.
Many types of devices have been shown to exhibit volatile memristance, including metal–oxide,5–7 spintronic,9 perovskite,21,22 ion-channel,11,23 and nanofluidic pores.24 Among these devices, ion-channel-based memristors are of particular interest due to the similarity in operation to biological synapses,23 which also operate with voltage-gated ion channels. Recent works have established alamethicin and monazamycin as volatile memristive ion channels and demonstrated their use in PRC.11,12,25 The architecture of these memristor PRC systems is shown in Fig. 1a. The dynamics of ion-channel-based memristors appear linear with respect to their state variable update on longer timescales and at near steady-state values but show pronounced nonlinearities on shorter timescales and when far from the steady-state value.25,26 Models for the dynamic update in conductance of these devices range from simple, single-state, linear models23,26 to three11 or even five-state nonlinear models.27 These models have tradeoffs regarding the number of physical phenomena represented versus the ease of solving. Generally, linear models offer exact solutions that scale easily in simulation and have been shown to describe functional reservoir dynamics when deviations from the steady-state are sufficiently small.12 Using a model with an exact solution is advantageous so that the readout weights can be trained ex situ using large datasets and then implemented with the responses from physical devices for inference. The ease of simulation allows hyperparameters, such as input encoding, to be tuned outside physical experiments.
Conversely, nonlinear, multistate models provide increased response accuracy far from the steady state, which can be seen in similar memristor devices using monazamycin peptides.25 Notably, for devices that are initially in a low conductance state subjected to a large voltage step, the experimental increase in conductance follows a sigmoidal rather than an exponential response up to the steady state value for short timescales. Nonlinear models can capture this initial transient, as shown in Fig. 2. Besides more accurately describing the dynamics of the memristors themselves, a nonlinear model is desirable due to the critical role of nonlinearity in RC. Conventional RC characteristically utilizes state nonlinear dynamics at each node,14 and nonlinear dynamics are indeed required when the task to be solved is, itself, highly nonlinear. However, the existing multistate models are not analytically solvable due to the nonlinear coupling of state variables. The lack of an exact solution makes ex situ training and hyperparameter tuning impractical as architectures scale larger. Therefore, it is imperative to model ion channel memristors using a model with explicit nonlinear dynamics and exact solutions, making it better suited for computational purposes.28
Due to the conflicting requirements of state nonlinearity and exact solubility, we investigate which known first-order nonlinear ordinary differential equations (ODEs) might apply to our alamethicin-based memristor.23 Given the biological nature of our memristor, we focus on the logistic ODE and the more generalized Richard's Differential Equation (RDE). These nonlinear models are prevalent in biological systems and have exact solutions.29,30 We demonstrate the effectiveness of using an RDE model to accurately capture the dynamics of alamethicin channels in the transient phase. Furthermore, we use the RDE model's exact solution to train a reservoir layer on the full dataset of MNIST handwritten digits encoded as voltage squarewave trains. We optimize this training by tuning the voltage encoding hyperparameters, achieving higher accuracy in this task than previously reported in the literature.5,11 Finally, to validate the model's accuracy and showcase the efficacy of utilizing the nonlinear dynamics of memristors, we experimentally test the reservoir on the MNIST task using the same readout layer trained ex situ and show that the level of accuracy is largely maintained outside of the simulation environment. Fig. 1 graphically explains the mixed training and inference method. Additionally, a more in-depth flowchart describing the workflow can be found in Fig. S5 ESI.†
![]() | (1) |
![]() | (2) |
![]() | (3) |
Na(t) = Ce−t/τ + Nss | (4) |
C = Na(t0) − Nss | (5) |
This solution is valid from any positive starting value of Na, assuming that voltage is constant from t0 to t.
As mentioned in the Introduction, while this linear model captures the dynamics of alamethicin near and at steady-state, it fails to capture a significant part of the initial transient response. Fig. 2 shows that the initial response of alamethicin to input voltage is sigmoidal rather than a simple exponential growth. Therefore, an alternative model is required to capture these dynamics. The significant nonlinearity that needs to be accounted for by such a model is the increase in low-state rates of conductance. The conventional linear model of alamethicin does not predict the exponential rise in conductance in the short time immediately after a voltage is applied. To capture this dynamic, we must introduce a nonlinear state equation.
![]() | (6) |
![]() | (7) |
While the logistic equation captures a sigmoidal temporal rise in response to a step voltage, it requires symmetry around the inflection point. Experimentally, we observe that the inflection point behaves asymmetrically, typically closer to the beginning of the rise than the middle. We observed this behavior in Fig. 2a, where the transition from concave up to concave down happens closer to the beginning of the response than to the steady state. Therefore, we propose that a more general logistic equation, such as Richard's differential equation, can capture this dynamic.
![]() | (8) |
![]() | (9) |
![]() | (10) |
![]() | (11) |
This solution is valid from any starting value of Na, assuming that voltage is held constant from t0 to t.
Because we are considering only first-order models, we can graphically visualize the differences between the models by examining the phase portrait. The two states of the system are pore density and the rate of change of pore density. We model a single trajectory in the phase space at every input voltage. Fig. 3 demonstrates the differences between the various models.
Alamethicin memristors operated far from the equilibrium state must consider an effect known as electrowetting, which describes that the contact area between lipid monolayers tends to increase with applied voltage.23 Because the models considered here predict the areal density of pores, we must consider that area changes will affect the total number of pores in the memristor, directly affecting the conductance. Electrowetting is well studied, and parameters for the exact solution to the electrowetting equations exist in the literature.23
Furthermore, significant discrepancies between the rates of pore exit in response to sub-threshold and supra-threshold voltages have been modeled conditionally with a different set of rate parameters depending on whether the applied voltage surpasses the threshold of the memristor.12 We also include this two-rate effect in our models and find the parameters for the sub-threshold rates from the literature.12
The steady-state pore density also empirically shows deviations from an exponential voltage function at high activation levels (see Fig. S1 ESI†). We account for this slight difference by fitting the steady-state pore density function to a sigmoidal rather than exponential (Methods). This approach better fits the experimental data but requires an additional fit parameter.
Finally, because experimental data displays stochastic noise due to fluctuations in the number of inserted ion channels, which is proportional to the number of inserted pores,26 we include a white noise term to randomly add between −4% and 4% of the calculated number of pores. The inclusion of a noise term acts not only to better represent the stochasticity in the experimental results but also helps to regularize RC training, increasing testing accuracy at the expense of some amount of training accuracy.16 Furthermore, experimental noise for the conductance of the memristors increases inversely to the level of voltage applied. This relationship is characterized in Fig. S2 (ESI†).
The primary consideration for solving the MNIST task with a memristor reservoir layer is the number and way reservoir states are obtained. Reservoir states are the measured values of the memristors that have changed in response to the voltage waveform input. These state values will be sent to the readout layer for classification. Typically, images are row-scanned to create the voltage waveforms. However, taking only a single state per row is not feasible for accurate prediction, as a memristor could reach too many possible conductances after 20 pulses to distinguish them easily.5,11 Effectively, more states are needed to classify the images, and various methods can create these extra states. One method is to break the rows into shorter segments of only a few pixels, allowing for more states and easily distinguishable conductances.5 Alternatively, the whole row waveform can be sent to the memristor, and multiple states can be sampled throughout the stimulation at regular intervals or “virtual nodes” (VN). The VN technique couples the state of the previous VNs in a row to the presently measured one, which increases the connectivity of the reservoir.4,11 Furthermore, additional memristors can increase the number of states by encoding the row information multiple times with different input parameters such as pulse width and off-time. However, in this work, we consider only single input encoding reservoir formulations to avoid synchronization issues of data streams. Utilizing multiple memristors with different dynamic properties could obtain similar results using a single input encoding, avoiding this synchronization issue.12 To make up for the restriction of only a single input encoding, we encode voltage waveforms from the rows and the columns of the digitized images and extract VNs at each fourth pixel. Utilizing 40 parallel memristors (20 for the rows and 20 for the columns) and sampling a VN at every fourth pixel, we obtain 200 total states for each image. With ten classes of digits (0 to 9) and a linear readout layer with ten additional bias terms, there are 2010 total trained parameters. Four parameters control the input encoding of the pixel sequences into voltage waveforms: Von, Voff, ton, and toff, which represent on-voltage, off-voltage, on-time, and off-time, respectively. These parameters transform the discrete input pixel sequence into a temporal voltage waveform. Each scan generates a sequence that is 20 pixels long. For the kth pixel, the voltage is set to Voff for time toff, followed by either Von or Voff for time ton, conditional on if the kth pixel is a 1 or a 0 respectively. The conversion from pixel sequences to voltage waveforms can be seen in Fig. 4.
We performed a coarse grid search in simulation over these four hyperparameters to find input encoding parameters that would best run in the physical devices. For each point in the grid search, the reservoir was simulated using the new model and trained on the full 60000-image MNIST training dataset. The training was a single epoch of stochastic gradient descent utilizing lasso regularization with a regularization parameter of 0.1. The grid search reveals that Von and ton had the most significant effect on the accuracy of the reservoir, while Voff had a lesser effect and toff had the slightest effect. Fig. 5 shows a 3-D cross-section of the 4-D search for the three most effecting parameters and a 2-D cross-section of that 3-D space showing the effect of the two most influencing parameters. The full 4-D space can be seen in Fig. S3 (ESI†). A region of similarly high accuracies was found within the grid search, with the best parameters being Von = 140 mV, Voff = 20 mV, ton = 4.7 ms, and toff = 0.3 ms. This grid search in simulation ran for 4.85 hours on a 48-core server node. This demonstrates the impressive time-saving potential of ex situ optimization of physical reservoir layers, as the equivalent in situ training would have required 293 hours of continuous run time (Note 1 ESI†). This represents an effective 60-times reduction in hyperparameter optimization time.
We used the “optimal” parameters obtained from the grid search to train two reservoirs, one with the RDE model and one with the linear one. We experimentally generated inference data in physical memristors using voltage waveforms constructed with the best results we obtained from the input encoding grid search. We use 1000 testing images for inference. Fig. 6 compares the experimental data to the RDE and linear models in response to an example row of input data. This comparison highlights that the nonlinear RDE model better approximates the experimental data in the far-from-steady state encodings required in the MNIST task. Fig. S4 (ESI†) shows the relative error between the model and experimental data for every pixel sequence used.
After every fourth pixel-equivalent pulse, we sampled the memristors and divided the current output by the voltage input to obtain the conductance at that VN. We collected VN conductances for each unique sequence of pixels in the testing dataset. Next, we constructed state matrices for each test image from the VNs, which included 200 states. We then fed these state matrices into the readout layer in software utilizing the trained weights from the simulated reservoir. Fig. 7 highlights the importance of using the RDE model. When the linear model trained weights are used, the classification fails, predicting only the number seven. By comparison, the total accuracy of the testing set using the RDE model obtained weights was 90.6.
To generate data for phase space extraction, we subjected alamethicin memristors to square waveforms and recorded the current response. At rest, no ion channels existed in the membrane, and upon application of supra-threshold voltage, ion channels formed within the lipid membrane until they reached a steady state. Square waveforms were held constant at the investigation voltage for 100 ms to ensure the memristor had enough time to get a steady state. Data was collected for voltages between 0 and 150 mV in increments of 2 mV; however, only supra-threshold voltages (9–0 mV with the formulation of peptide and salt concentrations used) produced a measurable response that we used for fitting. After each investigation voltage, the memristor was subjected to 0 mV for 10 seconds to ensure it would return to a fully insulating state before the next voltage was applied.
Once we had collected the current data, it was processed using a custom MATLAB script. First, the investigation voltage divided the current response to obtain the conductance. Next, we accounted for the change in the area of the memristor by taking a single measurement of the membrane diameter on the upright microscope. Typically, diameter measurements would be taken continuously over the length of the experiment to obtain real-time data on area change due to electrowetting. However, as the voltage pulses were only 100 ms in these experiments, there was not enough time to obtain adequate data on the area changes as the frame rate of the measurement camera was 30 frames per second. Instead, we measured the area at the beginning of the experiment and used literature values of electrowetting23 to model the relative change in area based on the applied voltages at a 0.1 ms time step. To calculate the pore density of alamethicin, we divided the conductance values by the calculated area at each time point and by the average conductance of a single alamethicin pore, represented by Gu which is found from the literature to be about 5 nS per pore.26
Further, we numerically differentiate this data by dividing the change between points by the timestep (0.1 ms). To minimize the effect of noise, we applied a moving average filter with a window of 50 time steps (5 ms) before numerically differentiating. Pore density and its time derivative are the two states of the system to which we fit the model.
We used a single nonlinear least squares fitting routine for each data set generated by the memristor at the respective investigation voltage. In the case of the logistic model, two parameters (α and Nss) were fit, and in the case of the RDE model, three parameters (β, Nss, and Z) were fit. These parameters were then fit to voltage functions. The parameters α and β were fit to exponential voltage functions, and the parameter Nss was fit to a logistic voltage function of the form:
![]() | (12) |
The data to train and test the network in this study, MNIST handwritten digits, are available at https://yann.lecun.com/exdb/mnist/.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4nr03439b |
This journal is © The Royal Society of Chemistry 2025 |