Hyoseob Kim‡a, Kyungho Hong‡b, Sungjoon Kimc, Woo Young Choi*b and Min-Hwi Kim*ad
aDepartment of Intelligent Semiconductor Engineering, Chung-Ang University, Seoul, Republic of Korea
bDepartment of Electrical and Computer Engineering and Inter-university Semiconductor Research Center (ISRC), Seoul National University, Seoul, 08826, Republic of Korea. E-mail: wooyoung@snu.ac.kr
cDepartment of AI Semiconductor Engineering, Korea University, Sejong, 30019, Republic of Korea
dSchool of Electrical and Electronics Engineering, Chung-Ang University, Seoul, 06974, Republic of Korea. E-mail: minhwi@cau.ac.kr
First published on 18th June 2025
We demonstrate ultra-low-power spiking neural network (SNN) inference on an RRAM crossbar array by applying network lightweight techniques, and predict average power consumption using a highly accurate array-level model. A 24 × 24 crossbar array was fabricated using non-filamentary HT-RRAM, and quantized and pruned weights were transferred to the array. The compact model of HT-RRAM was used as a synaptic device to simulate a crossbar array model of the same scale as the fabricated array, and the same network lightweight techniques were applied in the simulation. Both the crossbar array and the array model successfully transferred over 94% of the weights within an error margin of 2 nS, and the SNN inference results over 25 time steps showed highly consistent output currents. With this reliable array model, power consumption during MNIST inference was estimated for arrays with lightweight techniques applied. Based on our experimental results, the power consumption of image inference operations is predicted to be 243 nW with weight quantization only and 222 nW with weight quantization and pruning, across 10 classes. These findings suggest that ultra-low-power operation can be achieved in the RRAM array through the application of lightweight network techniques.
New concepts
In this study, we aimed to predict the power consumption during spiking neural network inference in a crossbar array, employing network lightweight techniques for ultra-low power operation, using a highly accurate model. A 24 × 24 crossbar array was fabricated using non-filamentary resistive random access memory based HT-RRAM, and weights with quantization and pruning applied were transferred to the array. Based on our highly accurate and reliable array circuit model, the power consumption was obtained for image inference in hardware-based neural networks employing low-power resistive memories and network lightweight techniques. Based on our experimental results, the power consumption of image inference operations is predicted to be 243 nW with the weight quantization only and 222 nW with the weight quantization and pruning, across 10 classes. We believe that our research and discoveries will serve as a cornerstone for enhancing the performance of ultra-low-power hardware visual inference systems utilizing next-generation resistive memory and establishing a framework for predicting power consumption.
One of the most promising approaches is to emulate the biological neuron and synapse structures in the hardware. By using crossbar array architectures, ANN can rapidly process data in parallel through vector–matrix multiplication (VMM) between stored weights and incoming input signals.5–8 In particular, the biological brain operates with minimal power consumption by using spike pulses of uniform amplitude and width, enabling event-driven operation based on temporal differences. Therefore, implementing a spiking neural network (SNN) in hardware-based ANN allows for much higher energy efficiency compared to conventional deep neural networks (DNNs) that rely on vector-based input signals.9,10 Synaptic devices responsible for weight storage may utilize conventional CMOS-based non-volatile memory (NVM).11,12 However, due to its limitations in speed and scalability, many researchers are focused on next-generation analog memories such as resistive RAM (RRAM), ferroelectric RAM (FRAM), and spin-transfer torque resistive RAM (STT-RAM).13,14 Among these, RRAM has gained considerable attention for its suitability for array implementation due to its high multi-level cell capability and excellent endurance.15–19
Filamentary RRAM, the most widely studied type among oxygen-based RRAM devices, offers advantages in terms of long data retention and high endurance compared to other types. However, notable current variability with the applied set and reset voltages can interfere with precise weight transfer, directly affecting inference accuracy. Additionally, as device scaling is pursued for high-density integration, the forming voltage tends to increase, and the conductivity in the low-resistance state (LRS) after filament formation does not decrease proportionally relative to that of the high-resistance state (HRS). These characteristics can negatively impact conductivity uniformity across the array and interfere with accurate weight transfer when implementing multi-bit functionality.20–22
Moreover, several challenges arise during the fabrication of RRAM arrays. Since RRAM performance is highly dependent on the chemical composition of materials, any inconsistency of stoichiometry during deposition can result in imprecise resistance changes in response to voltage. Also, if the thin film is not deposited uniformly over the entire wafer area, inconsistencies in device characteristics may occur between array elements. These issues become even more critical in multilayered RRAM devices. Minor changes in process conditions or incomplete process control can weaken the uniformity of switching behavior across the array. This makes accurate weight transfer difficult and necessitates additional time and cost for optimization. Therefore, it is essential to establish a high-precision array model that can predict the variation of operational characteristics of synaptic devices before array fabrication.
In this study, we used an oxygen vacancy-based non-filamentary RRAM to minimize issues such as non-uniform conductivity and scaling limitations encountered in filamentary RRAM. The 24 × 24 RRAM crossbar array was fabricated using HT-RRAM with HfO2 and TiOx as insulating layers, employing it as a synaptic device. To enable ultra-low-power operation, mimicking the efficiency of biological neural networks, a lightweight network was realized by applying quantization and pruning to the pre-trained positive and negative weights, which were then transferred to the array. Subsequently, a 24 × 24 crossbar array was modeled using the PySpice tool, applying the compact model of HT-RRAM to accurately simulate the characteristics of the device under various operational conditions. The weight transfer and MNIST inference results obtained from the physical array and the array model were compared, demonstrating that the array model accurately reflects the behavior of the physical array. A schematic overview of this process is illustrated in Fig. 1. Finally, the average power consumption for 10 classes (from 0 to 9) was calculated using the array model, revealing differences in power consumption when quantization-only and quantization with pruning were applied.
Fig. 2a illustrates the fabrication process of the HT-RRAM device. The process was conducted on a 4-inch heavily doped p-type silicon wafer, with a 300 nm silicon oxide layer formed through a wet oxidation process. First, the bottom electrode (BE) was deposited using an electron-beam evaporation system, where 10 nm of Ti and 50 nm of Pt were deposited. Patterning was carried out through a lift-off process. Next, TiOx, which serves as the resistive switching layer (RSL), was deposited to a thickness of 9 nm via oxygen reactive sputtering. To analyze the electrical properties as a function of the oxygen concentration in the RSL, TiOx was grown at three oxidation conditions (0.3, 0.7, and 1.1 sccm). In the third step, a barrier layer of HfO2 was deposited to a thickness of 4.5 nm using atomic layer deposition (ALD). In the fourth step, the top electrode (TE) was deposited using the electron-beam evaporation system, with 10 nm of Ti and 50 nm of Pt. Patterning was again carried out through the lift-off process. Finally, the residual oxide was etched away to ensure the stable performance of the device. Additionally, a reference filamentary RRAM device, comprising a Pt/Ti/HfO2/Pt/Ti stack, was fabricated without the deposition of TiOx. Fig. 2b presents the high-resolution transmission electron microscopy (HR-TEM) image and energy dispersive X-ray spectroscopy (EDS) line scanning of the fabricated HT-RRAM device. The HR-TEM image confirms that all layers were stacked as intended, indicating that the fabrication process was successful. For a more detailed analysis, the atomic percent obtained from EDS line scanning clearly shows the presence of approximately 4.5 nm of HfO2 between Ti and TiOx, while TiOx was confirmed to have a thickness of around 9 nm. Additionally, Fig. 2c presents the 3D schematics of both the HT-RRAM and the reference RRAM.
To analyze the non-filamentary switching behavior of the HT-RRAM, we compared the reference and HT-RRAM devices after performing a DC sweep. As shown in Fig. 3a, the reference RRAM device showed an abrupt set at 1 V and a gradual reset at −1 V after the forming process, which are characteristic of filamentary RRAM behavior. This behavior is consistent with that of the TiOx-based RRAM, as shown in Table S1 (ESI†). For the HT-RRAM devices with oxygen composition of 0.3, 0.7, and 1.1 sccm, the I–V characteristics were measured under the same voltage conditions applied to the top electrode, up to hard breakdown. The corresponding results are presented in Fig. 3b–d. In contrast to the reference RRAM device, the HT-RRAM exhibited a gradual increase in current upon the application of the set voltage under all three oxygen partial pressure conditions. This behavior confirms that the HfO2 layer effectively suppresses the formation of conductive filaments within the TiOx layer. While several previously reported RRAM devices share a similar HfO2 and TiOx bilayer structure, our device uniquely operates without a compliance current or forming voltage, due to its non-filamentary switching mechanism. Comparative details are summarized in Table S2 (ESI†).
Fig. 2 (a) Fabrication process flow of the HT-RRAM. (b) TEM image and EDS line scan of HT-RRAM. (c) Comparison of stack structure and layer thickness between the HT-RRAM and reference RRAM.
Moreover, under all three oxygen partial pressure conditions, the set voltage was increased in steps of approximately 0.2 V, from 3 V to just before hard breakdown, enabling multi-level conductance control. However, hard breakdown occurred at an average of 4.4 V and permanently altered the conductance in the LRS, making it essential to avoid such conditions during switching operations. Therefore, the voltage range for safe multi-level conductance switching is limited to 3 V to 4.3 V.
To compare the electrical characteristics of HT-RRAM under three oxygen compositions, each I–V curve obtained with a 4 V applied voltage was overlaid, as shown in Fig. 3e. At 1 V, the 1.1 sccm condition exhibited the widest current window, which can be attributed to its higher maximum current under the same set voltage. These results indicate that higher oxygen partial pressure within the RSL increases the probability of oxygen vacancy generation, resulting in a relatively higher conductivity. Such characteristics lead to an expanded current window range, suggesting that oxygen partial pressure directly impacts the size of the current window in the switching layer.
Lastly, the scalability effects of the HT-RRAM device were assessed by measuring cells with four distinct sizes: 2.5 μm × 2.5 μm, 5 μm × 5 μm, 10 μm × 10 μm, and 20 μm × 20 μm. This approach allowed for a comparative analysis of the devices' performance across varying cell areas, providing insights into how scaling impacts the electrical characteristics of non-filamentary RRAM. All devices were initially set to the HRS, and a 4 V bias was applied to the TE to switch them to the LRS. Under the same bias conditions, larger cell areas exhibited higher current levels. Fig. 3f presents box charts showing the current densities of the LRS and HRS measured from 10 devices per cell size. The current density was calculated by dividing the current measured at a 1 V read voltage by the respective cell area. The box charts reveal that current density is nearly unaffected by cell size, demonstrating a key characteristic of non-filamentary RRAM devices. Unlike filamentary RRAM devices, where current flows through highly localized conductive filaments, non-filamentary devices exhibit uniform current flow across the entire cell area, resulting in consistent current density. This uniformity highlights a significant advantage for high-density integration: even with a reduction in device size, the electrical characteristics remain consistent, making non-filamentary devices ideal for array fabrication and miniaturization.
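The area normalization behind Fig. 3f can be sketched in a few lines of Python. This is only an illustration: the read currents below are hypothetical values constructed to scale with cell area (they are not measured data), and the function name `current_density` is introduced here for convenience.

```python
def current_density(current_a, side_um):
    """Current density in A/cm^2 for a square cell with the given side length."""
    area_cm2 = (side_um * 1e-4) ** 2  # 1 um = 1e-4 cm
    return current_a / area_cm2


# The four cell sizes reported in the text (side length in um).
sides_um = [2.5, 5.0, 10.0, 20.0]

# Hypothetical read currents at 1 V, assumed to scale with cell area.
ref_current_a = 80e-9  # assumed current for the 20 um cell (illustrative only)
currents_a = [ref_current_a * (s / 20.0) ** 2 for s in sides_um]

densities = [current_density(i, s) for i, s in zip(currents_a, sides_um)]
```

If the current is carried uniformly over the cell area, as expected for non-filamentary switching, every entry of `densities` comes out the same, which is the behavior the box charts are meant to show.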
Synaptic devices intended for application in crossbar arrays benefit from enhanced multi-level functionality when the current window is wider within the same switching voltage range. To meet these requirements, we aimed to implement an array using the 1.1 sccm HT-RRAM, which demonstrated a larger current window under these conditions. Initially, the multi-level characteristics of the 1.1 sccm HT-RRAM were evaluated. Based on the switching voltages identified in the I–V characteristics, ISPP (incremental step pulse programming) was applied within a range that avoids hard breakdown. Set pulses, starting at 3 V and increasing in 25 mV increments up to 4.3 V, and reset pulses, starting at −1 V and decreasing in 50 mV increments down to −2.5 V, were sequentially applied to the device in its initial state. Between each pulse, a read pulse of 1 V was applied, and the resulting current values were recorded and presented in Fig. 4a. The width of all pulses was set to 30 ms.
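The ISPP pulse schedule described above can be written out explicitly. The following is a minimal sketch using the amplitudes and increments given in the text; the helper names `ispp_amplitudes` and `with_reads` are hypothetical, and the sketch only generates the pulse list (it does not drive an instrument).

```python
def ispp_amplitudes(start_v, stop_v, step_v):
    """Pulse amplitudes from start_v to stop_v (inclusive) in steps of |step_v|."""
    n_steps = round(abs(stop_v - start_v) / abs(step_v))
    sign = 1 if stop_v >= start_v else -1
    return [round(start_v + sign * k * abs(step_v), 6) for k in range(n_steps + 1)]


def with_reads(amplitudes, kind, read_v=1.0):
    """Interleave a read pulse after every programming pulse."""
    sequence = []
    for amp in amplitudes:
        sequence.append((kind, amp))
        sequence.append(("read", read_v))
    return sequence


set_pulses = ispp_amplitudes(3.0, 4.3, 0.025)      # 3 V up to 4.3 V, 25 mV steps
reset_pulses = ispp_amplitudes(-1.0, -2.5, 0.050)  # -1 V down to -2.5 V, 50 mV steps
one_cycle = with_reads(set_pulses, "set") + with_reads(reset_pulses, "reset")
```

With these parameters one cycle contains 53 set pulses and 31 reset pulses, each followed by a 1 V read.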
The results of three cycles revealed a repeatable current range of approximately 0 nA to 80 nA, demonstrating the potential for 3-bit weight tuning (G1 to G8) with 10 nA intervals at 1 V. In addition, the gradual current increase observed in the long-term potentiation (LTP) region suggests that it is suitable for reaching specific target states, whereas the abrupt current drop in the long-term depression (LTD) region indicates that it should be used solely for resetting the conductance state. Based on these findings, Fig. 4b presents the cumulative probability of each tuned state using the incremental step pulse verification algorithm (ISPVA). Conductance was measured at a 1 V read voltage, and the results include data from 50 devices for each state. Notably, 94% of the total weights were tuned within an error margin of 2 nS, while the remaining values were tuned within a 4 nS error margin. These error margins are well within the acceptable range, given the target weight interval of 10 nS. This indicates that the device can be quantized and programmed as a 3-bit weight in a 24 × 24 array configuration.
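A program-and-verify loop of the ISPVA kind can be sketched as follows. This is a simplified illustration, not the measurement code used in the study: `ToyCell` is a hypothetical stand-in whose conductance depends only on the largest set amplitude applied (it is not the HT-RRAM compact model), and the reset branch reflects the observation above that the LTD region is used only to return to the initial state.

```python
class ToyCell:
    """Hypothetical cell for illustration only (not the HT-RRAM compact model)."""

    def __init__(self):
        self.g_ns = 0.0

    def set_pulse(self, v):
        # Assumed response: conductance grows linearly with set amplitude above 3 V.
        level = max(0.0, (v - 3.0) / 1.3 * 80.0)
        self.g_ns = max(self.g_ns, level)

    def reset(self):
        self.g_ns = 0.0

    def read(self):
        return self.g_ns


def program_verify(cell, target_ns, tol_ns=2.0, v_start=3.0, v_stop=4.3,
                   v_step=0.025, max_pulses=200):
    """Ramp the set amplitude until the read conductance is within tolerance."""
    v = v_start
    for n_pulses in range(max_pulses):
        g = cell.read()                      # verify step
        if abs(g - target_ns) <= tol_ns:
            return g, n_pulses
        if g > target_ns:                    # overshoot: reset and restart the ramp
            cell.reset()
            v = v_start
        else:                                # undershoot: apply the next set pulse
            cell.set_pulse(v)
            v = min(v + v_step, v_stop)      # never exceed the safe set range
    raise RuntimeError("target state not reached within max_pulses")


results = {}
for target in range(10, 90, 10):             # G1 to G8: 10 nS to 80 nS
    results[target] = program_verify(ToyCell(), float(target))[0]
```

The 2 nS tolerance and the 3 V to 4.3 V range are taken from the text; everything about the cell response is an assumption.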
Element | Value | Description |
---|---|---|
Rs | 51 × 10^5·exp[−(abs(VArea1) − 2.52)/0.35] Ω | Non-linear switch resistance |
Ri | 87 × 10^2·exp[−(abs(VArea2) − 4.52)/0.485] Ω | Non-linear insulator resistance |
RC | 100 Ω | Contact resistance |
Cs | 0.1 fF | Switch capacitance |
Ci | 0.1 fF | Insulator capacitance |
S1–Sn | VT (threshold voltage): 0.85–1.55 V | Switch resistances (HRS to LRS) |
 | VH (hysteresis voltage): 2.4 V | |
 | RON (on resistance): 30 kΩ–62 MΩ | |
 | ROFF (off resistance): 10^12 Ω | |
Results from DC sweep simulation demonstrate that the compact model exhibited a gradual set behavior during a voltage sweep from 0 V to 4.2 V, while reset behavior occurred during a sweep down to −1.6 V. To further evaluate the feasibility of implementing multi-level capability, an ISPP simulation was conducted. In this simulation, set pulses were sequentially applied starting at 3 V and increasing in 25 mV increments up to 4.2 V. Subsequently, reset pulses were applied, starting at −1 V and decreasing in 50 mV increments down to −1.6 V, effectively replicating the actual measurement scheme. Fig. 5b presents the current values read at a 1 V pulse between each programming pulse. The pulse width was set to 2 ms. Over three cycles, the current values were observed to consistently range from approximately 0 nA to 80 nA. This behavior closely matches that of the actual device, confirming its potential for 3-bit weight tuning (G1 to G8) with 10 nA intervals at 1 V. These results indicate that the compact model was designed successfully to operate in a way closely aligned with the behavior of the actual device.
The signals and the weights stored in the first layer perform VMM operations to produce currents, which are integrated and fired by neurons to generate spikes for the second layer. The second layer, a fully connected 24 × 10 network, outputs currents through VMM operations involving the spikes from the first layer and its stored weights. This layer is implemented using an HT-RRAM based crossbar array and an array model using PySpice, as shown in Fig. 6b and c. The array model, based on the compact model, was designed to incorporate line resistance. Further details are provided in Fig. S2 (ESI†).
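The integrate-and-fire step between the two layers can be illustrated with a short sketch. The text does not specify the neuron's threshold, leak, or reset behavior, so the choices below (no leak, reset to zero after firing, and the name `integrate_and_fire`) are assumptions made only for illustration.

```python
def integrate_and_fire(currents, threshold):
    """Return a 0/1 spike per time step for a non-leaky integrate-and-fire neuron."""
    membrane = 0.0
    spikes = []
    for current in currents:
        membrane += current          # integrate the VMM output current
        if membrane >= threshold:    # fire once the threshold is reached
            spikes.append(1)
            membrane = 0.0           # assumed reset-to-zero after a spike
        else:
            spikes.append(0)
    return spikes


# Arbitrary example currents over six time steps (illustrative values only).
example_spikes = integrate_and_fire([0.4, 0.4, 0.4, 0.0, 0.9, 0.2], threshold=1.0)
```

Each first-layer output current sequence would be converted this way into the spike train that drives one wordline of the second layer.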
In both cases, the system is designed to accept spike trains over 25 time steps and perform VMM operations at each time step to produce current outputs. To represent both positive (IG+) and negative (IG−) weight values, the network requires differential pairs of 24 × 10 crossbar arrays. As illustrated in Fig. 6c, a single 24 × 24 crossbar array is used, where positive weights range from IG+ (1, 1) to IG+ (10, 24), occupying rows 1 to 10 and columns 1 to 24, while negative weights range from IG− (1, 1) to IG− (10, 24), occupying rows 11 to 20 and columns 1 to 24.24 Input spikes are applied to 24 wordlines (input nodes), and the VMM operations direct the resulting outputs to 20 bitlines (output nodes). The outputs for positive and negative weights are subtracted, and the current sum determines the correct digit based on the output node with the highest current. The overall classification accuracy is calculated using the same method. The array model simulations were conducted under identical conditions.
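The differential read-out described above can be sketched in plain Python. This is an idealized illustration that ignores line resistance and device non-linearity; the spike amplitude `v_spike`, the function names, and the example conductances are assumptions, not values from the study.

```python
def differential_vmm(g_pos_ns, g_neg_ns, spikes, v_spike=1.0):
    """Output currents (nA) for one time step: I = sum over inputs of (G+ - G-) * V."""
    return [
        sum((gp - gn) * s * v_spike for gp, gn, s in zip(row_pos, row_neg, spikes))
        for row_pos, row_neg in zip(g_pos_ns, g_neg_ns)
    ]


def classify(g_pos_ns, g_neg_ns, spike_train, v_spike=1.0):
    """Accumulate differential currents over all time steps and pick the largest."""
    totals = [0.0] * len(g_pos_ns)
    for spikes in spike_train:
        step = differential_vmm(g_pos_ns, g_neg_ns, spikes, v_spike)
        totals = [t + s for t, s in zip(totals, step)]
    return totals.index(max(totals)), totals


# Illustrative 10 x 24 conductance maps (nS): only output node 6 has a net positive weight.
g_pos = [[10.0] * 24 for _ in range(10)]
g_neg = [[10.0] * 24 for _ in range(10)]
g_pos[6] = [80.0] * 24

# 25 time steps with every input spiking (an arbitrary test pattern).
train = [[1] * 24 for _ in range(25)]
predicted, totals = classify(g_pos, g_neg, train)
```

Here conductances in nS multiplied by a voltage in V give currents in nA, and the predicted digit is the index of the output node with the highest accumulated differential current.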
Therefore, we performed software-based pre-training using Python simulations (ANN with ReLU activation) to ensure that all layers include both positive and negative weights. This off-chip learning approach allows only the inference operation to be executed on the physical array, making it advantageous for low-power operation in resource-limited edge devices. Two types of pre-training were performed: 3-bit quantization (QT) and 3-bit quantization & 40% pruning (QT & PR). Fig. 6d shows the weight histograms for all layers under FP32 and QT conditions, with QT presented separately for each layer. Whereas FP32 weights are distributed across a wide range, QT restricts weights to specific discrete values. For QT & PR, a pruning ratio of 40% was applied to the FP32 weights using a pruning method, and quantization was performed afterwards.25–27 Fig. 6e illustrates the weight histograms for the second layer after applying QT and QT & PR. In the case of QT & PR, weights near −0.25 and 0.25 disappear due to pruning. All pre-trained weights were designed to maintain an accuracy of 90% during ANN inference, as shown in Fig. 6f, which compares the accuracy for each case.
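The prune-then-quantize order described above can be illustrated with a small sketch. The text does not state which pruning criterion or quantizer was used, so magnitude-based pruning and a signed uniform 3-bit quantizer with a step of 0.25 are assumptions chosen here for illustration; the function names and example weights are hypothetical.

```python
import math


def prune_by_magnitude(weights, ratio):
    """Zero out the given fraction of weights with the smallest magnitude (assumed criterion)."""
    n_pruned = int(len(weights) * ratio)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_pruned]:
        pruned[i] = 0.0
    return pruned


def quantize_3bit(weights, step=0.25):
    """Signed uniform quantizer with 8 codes (-4..3), i.e. levels -1.0 to 0.75 (assumed)."""
    quantized = []
    for w in weights:
        code = math.floor(w / step + 0.5)   # round half up
        code = max(-4, min(3, code))        # clip to the 3-bit code range
        quantized.append(code * step)
    return quantized


fp32 = [0.9, -0.3, 0.05, -0.6, 0.26, -0.02, 0.4, -1.2, 0.12, 0.7]  # made-up weights
pruned = prune_by_magnitude(fp32, 0.40)   # 40% pruning applied to FP32 weights first
qt_pr = quantize_3bit(pruned)             # quantization afterwards
qt_only = quantize_3bit(fp32)
```

In this toy example the four smallest-magnitude weights are removed before quantization, so the QT & PR result contains more zeros than the QT-only result.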
Subsequently, the pre-trained weights were optimized for HT-RRAM characteristics by mapping them to conductance values ranging from 10 nS to 80 nS with 10 nS intervals. SNN simulations were performed to verify that these optimized weights maintained their accuracy. Furthermore, the SNN simulation converted the ANN inputs of each layer into spike trains, facilitating the ANN-to-SNN conversion process.
Following the conversion, different methods were employed to transfer the optimized weights to the fabricated HT-RRAM array and the array model. The fabricated RRAM array focused on precise transfer, whereas the array model was designed to minimize simulation time. As shown in Fig. 7a, the RRAM crossbar array used the half-bias method to control the conductance of individual cells.28 In this method, half of the target voltage with opposite polarity was applied to the TE and bottom electrode (BE) using an SMU pulse. A rectangular pulse with a 20 ms width was used to adjust the conductance, and a 1 V read pulse was applied to measure the adjusted conductance. If the measured value exceeded the allowable tolerance from the target state, set and reset pulses were applied iteratively. The pulse voltage was adjusted based on measured error to quickly reach the target state. Furthermore, the voltage applied to half-selected devices was automatically limited to half of the target voltage, preventing unintended conductance changes and enabling accurate weight transfer.
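The voltage seen by each cell under the half-bias scheme can be worked out with a short sketch. It assumes ideal lines and that unselected lines are held at 0 V, which the text does not state explicitly; the function name is introduced here for illustration.

```python
def half_bias_cell_voltages(n_rows, n_cols, sel_row, sel_col, v_target):
    """Voltage across every cell (TE line minus BE line) for one selected cell."""
    # Selected TE line gets +V/2, selected BE line gets -V/2, all other lines 0 V (assumed).
    te_v = [v_target / 2 if r == sel_row else 0.0 for r in range(n_rows)]
    be_v = [-v_target / 2 if c == sel_col else 0.0 for c in range(n_cols)]
    return [[tv - bv for bv in be_v] for tv in te_v]


cell_v = half_bias_cell_voltages(24, 24, sel_row=3, sel_col=5, v_target=4.0)
```

The selected cell sees the full 4 V, cells sharing its row or column see 2 V, and all other cells see 0 V. For set operations, half of even the largest amplitude used (4.3 V) stays below the 3 V at which set pulses begin in the ISPP scheme, which is consistent with the half-selected cells remaining undisturbed.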
Furthermore, Kim et al. proposed a fast weight transfer method designed for real-time online learning in RRAM-based neuromorphic systems, demonstrating significant improvements in the efficiency of weight update operations.29 In this study, the same method was applied to minimize the simulation time for weight transfer. Fig. 7b illustrates the fast weight transfer method. A triangular pulse with a 2 μs width was applied to the selected wordline at the target voltage, while the bitline was driven with either ground (GND) or an inhibit voltage (VIH) to control writing or inhibition. For inhibited wordlines, half of the target voltage was applied to effectively suppress write disturbances in unselected cells. The key advantage is that devices sharing the same weight level on a wordline are transferred simultaneously, which not only ensures reliable data writing but also reduces the weight transfer simulation time by half compared to the half-bias method.
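The grouping idea behind the fast transfer method, writing all cells on a wordline that share a weight level in one pulse, can be sketched as follows. This covers only the bookkeeping, not the pulse physics; the function names and the inhibit voltage value are hypothetical.

```python
def group_by_level(target_row_ns):
    """Map each target level on one wordline to the bitline indices that share it."""
    groups = {}
    for col, level in enumerate(target_row_ns):
        groups.setdefault(level, []).append(col)
    return groups


def bitline_drive(n_cols, selected_cols, v_inhibit):
    """Selected bitlines are grounded; all others receive the inhibit voltage."""
    selected = set(selected_cols)
    return [0.0 if c in selected else v_inhibit for c in range(n_cols)]


row_targets = [10, 20, 10, 0, 20]                 # made-up target levels (nS)
groups = group_by_level(row_targets)
drive_for_10 = bitline_drive(len(row_targets), groups[10], v_inhibit=1.5)
```

With this grouping, the number of write pulses per wordline is bounded by the number of distinct levels rather than by the number of cells.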
Using the above methods, we completed the weight transfer and evaluated the accuracy of the transferred weights. Fig. 8 illustrates the change in conductance for the 240 devices programmed in the RRAM crossbar array and the array model, classified by weight levels. In both the measured and simulated QT and QT & PR cases, the distinction between LRS and HRS is clearly observed. HRS conductance values are consistently distributed near 0 nS regardless of the weight level, while LRS values show conductance changes corresponding to their respective weight levels within the allowable variance. Notably, in the QT & PR case, the 20 nS weight level is absent due to pruning. This also indicates that our HT-RRAM operates within a smaller conductance range compared to RRAM arrays reported over the past five years, highlighting its advantage in achieving ultra-low-power operation (see Table S3, ESI†). Fig. 9 presents a violin plot that compares the measured and simulated results for both positive and negative weights. The root mean square error (RMSE) and mean absolute error (MAE) are 1.1 nS and 0.85 nS for the measured QT results, and 0.57 nS and 0.48 nS for the measured QT & PR results. This indicates that, despite a few outliers, most of the weights were transferred within the allowable error margin. The simulation results also closely matched the measured data, confirming the strong alignment of the array model.
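The two error metrics quoted above follow their standard definitions and can be computed as in the sketch below. The conductance values in the example are made up for illustration; the 2 nS tolerance is the one used in the text.

```python
import math


def transfer_errors(target_ns, measured_ns, tol_ns=2.0):
    """Return (RMSE, MAE, fraction within tolerance) between target and transferred weights."""
    diffs = [m - t for t, m in zip(target_ns, measured_ns)]
    n = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    within = sum(abs(d) <= tol_ns for d in diffs) / n
    return rmse, mae, within


# Made-up example: four target levels and the conductances read back after transfer (nS).
rmse, mae, within = transfer_errors([10, 20, 30, 40], [11, 19, 30, 43])
```

RMSE weights outliers more heavily than MAE, so a gap between the two indicates that a few cells deviate more than the rest.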
Fig. 10 illustrates the weight maps of the target, the RRAM crossbar array, and the array model after weight transfer under both QT and QT & PR cases. The weight map represents the calculated values of IG+–IG−, and Fig. S3 (ESI†) displays the detailed weight maps for IG+ and IG−. These results confirm that the weights were successfully transferred to the RRAM crossbar array. Moreover, the results demonstrate that a reliable array model has been established, closely replicating not only the measurement data but also its associated error margins.
After completion of the weight transfer process, VMM operations were performed using test image 6. Fig. 11a shows the current values of the output nodes for the RRAM crossbar array measurement and the array model simulation. In both cases, the highest current output was observed for digit 6, and the current outputs from the other output nodes exhibited closely matching trends. This implies that the array model not only emulates weight transfer with high fidelity, but also performs SNN inference in a manner aligned with the physical array. Notably, during inference, all bitlines in both the simulated array model and the physical array are grounded, thereby preventing the occurrence of sneak paths. More details are illustrated in Fig. S7 (ESI†). Additionally, to further validate the VMM operations of the array model, classification tests were conducted using 200 random MNIST test images (20 images per digit) across all digits. The confusion matrices in Fig. 11b and c present the accuracy of inference for each digit under the QT and QT & PR conditions. QT achieved an average inference accuracy of 95% across all classes, while QT & PR maintained an average accuracy of 90% despite pruning. These results are highly consistent with the pre-training accuracy obtained through Python simulations, as shown in Fig. 6f. Thus, the array model implemented in this study can be considered reliable.
As illustrated in Fig. S6 (ESI†), during the inference process, the current measured on a single wordline in the array is equal to the sum of the currents flowing out from the 20 connected bitlines. Consequently, the power consumed by this wordline over time can be considered the sum of the power consumed by the 20 connected RRAM devices, as presented in eqn (1).30,31
$$P_i(t) = V_i(t)\,I_i(t) = V_i(t)\sum_{j=1}^{20} I_{i,j}(t) = \sum_{j=1}^{20} P_{i,j}(t) \qquad (1)$$
As presented in eqn (2), integrating the time-dependent power consumption of a wordline over the entire inference period and then dividing by that period gives the average power consumption (PAVG.i) of a single wordline. Furthermore, because the array implemented in this study contains 24 wordlines, summing the PAVG.i values of all wordlines gives the average power consumption of the array (PAVG.array), as shown in eqn (3).
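$$P_{\mathrm{AVG}.i} = \frac{1}{T}\int_{0}^{T} P_i(t)\,\mathrm{d}t \qquad (2)$$
$$P_{\mathrm{AVG.array}} = \sum_{i=1}^{24} P_{\mathrm{AVG}.i} \qquad (3)$$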
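The averaging procedure of eqn (1)–(3) can be sketched in discrete time as follows. This is an idealized illustration, not the PySpice array model: it assumes grounded bitlines, negligible line resistance, linear cells (I = G·V), and a spike that holds the wordline at `v_spike` for a whole time step. The function name and the example values are hypothetical.

```python
def average_array_power(spike_train, conductance_s, v_spike=1.0):
    """Average array power in W, summed over wordlines (discrete-time form of eqn (1)-(3))."""
    n_steps = len(spike_train)
    total_w = 0.0
    for i, row in enumerate(conductance_s):       # one row per wordline
        g_row = sum(row)                          # all cells on this wordline, in S
        # Power at each step: V_i(t) times the summed cell currents G * V_i(t).
        p_steps = [(spike_train[t][i] * v_spike) ** 2 * g_row for t in range(n_steps)]
        total_w += sum(p_steps) / n_steps         # time average for wordline i
    return total_w


# Toy example: 2 wordlines x 3 cells at 10 nS; wordline 0 always spikes, wordline 1 half the time.
g = [[10e-9] * 3, [10e-9] * 3]
train = [[1, 1], [1, 0], [1, 1], [1, 0]]
p_avg_array = average_array_power(train, g)
```

In this toy case the first wordline contributes 30 nW and the second 15 nW on average, which illustrates how both fewer spikes and lower total conductance (for example through pruning) reduce the array power.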
The time-dependent power consumption of each individual wordline in the array can be visualized, as shown in Fig. S4 (ESI†). The figure also demonstrates that more frequent voltage application leads to increased power consumption. Using this method, we finally computed the power consumption during SNN inference. Fig. 12a shows the total power consumption over the inference time in the array and the corresponding PAVG.array for the case when the input digit is 0. Fig. 12b compares the PAVG.array values during inference on three randomly selected MNIST images for each digit (class) under the QT and QT & PR cases. The total inference time was 25 time steps, and all images were correctly classified. For all images, QT & PR consumed less power than QT. On average, the PAVG.array values for QT and QT & PR were 243 nW and 222 nW, respectively, which corresponds to an approximately 8.6% reduction in power consumption with QT & PR. Although increasing the pruning ratio beyond the current 40% could further decrease the power consumption, it may also lead to a decrease in inference accuracy, as illustrated in Fig. S5 (ESI†), which shows the trade-off between the pruning ratio and the accuracy of inference. Therefore, determining the optimal pruning ratio for applications that require low power consumption necessitates a careful balance between accuracy and power consumption.
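As a quick arithmetic check of the reported saving, the relative reduction can be computed from the two average power values of 243 nW (QT) and 222 nW (QT & PR) given in the abstract and conclusion. The function name is introduced here for illustration.

```python
def percent_reduction(baseline, reduced):
    """Relative reduction of `reduced` with respect to `baseline`, in percent."""
    return (baseline - reduced) / baseline * 100.0


p_qt_nw = 243.0      # average array power with quantization only
p_qt_pr_nw = 222.0   # average array power with quantization and 40% pruning
reduction_pct = percent_reduction(p_qt_nw, p_qt_pr_nw)
```

The result, 21/243, is about 8.6%, in agreement with the reduction stated in the text.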
For ultra-low-power consumption, we applied two types of lightweight network, QT and QT & PR, to pre-trained FP32 weights. The weights were then successfully transferred to the fabricated 24 × 24 HT-RRAM crossbar array. In simulation, transferring the same lightweight networks to the array model produced weight maps that were highly similar to those of the physical crossbar array, including error margins. The SNN inference results for digit 6 using both the RRAM crossbar array and the array model revealed that the highest current output occurred at the same output node, with closely matching current patterns across all nodes. This confirms that we have implemented a highly reliable array model. Simulation-based verification of VMM operations demonstrated that QT and QT & PR achieved average accuracies of 95% and 90%, respectively, on 200 random MNIST images. Finally, the average power consumption calculated during inference (25 time steps) showed that the two types of lightweight networks consumed 243 nW and 222 nW, respectively, confirming that QT & PR consumed approximately 8.6% less power than QT.
These results suggest that ultra-low-power SNN inference can be achieved not only through network quantization but also by applying pruning techniques. Moreover, the array model developed in this study demonstrated its ability not only to predict operational characteristics but also to accurately estimate error margins when implementing HT-RRAM devices at the array scale. Additionally, the modeling suggests the potential to implement various RRAM devices at the array level.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5mh00086f
‡ H. Kim and K. Hong contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025 |