Open Access Article
Ugo Bruno‡ab, Daniela Rana‡cd, Chiara Ausilio ab, Anna Mariano§a, Ottavia Bettucci¶a, Simon Musall ce, Claudia Lubrano||cd and Francesca Santoro*acd
aTissue Electronics, Istituto Italiano di Tecnologia, 80125, Naples, Italy
bDipartimento di Chimica, Materiali e Produzione Industriale, Università di Napoli Federico II, 80125, Naples, Italy
cInstitute of Biological Information Processing – Bioelectronics, IBI-3, Forschungszentrum Juelich, 52428, Germany
dNeuroelectronic Interfaces, Faculty of Electrical Engineering and IT, RWTH Aachen, 52074, Germany
eFaculty of Medicine, Institute of Experimental Epileptology and Cognition Research, University of Bonn, Bonn, Germany
First published on 30th April 2024
Organic neuromorphic platforms have recently received growing interest for the implementation and integration of artificial and hybrid neuronal networks. Achieving closed-loop operation and learning/training processes like those of the human brain remains a major challenge, especially when exploiting time-dependent biosignalling such as neurotransmitter release. Here, we present an integrated organic platform capable of cooperating with standard silicon technologies to achieve brain-inspired computing via adaptive synaptic potentiation and depression in a closed-loop fashion. The microfabricated platform was interfaced with, and used to control, a robotic hand, which ultimately learned to grasp differently sized objects autonomously.
New concepts
In this manuscript, a brain-inspired closed-loop system is demonstrated for the accomplishment of motor control and actuation tasks through a learning process mediated by a neurotransmitter. This system integrates well-established silicon technologies with organic materials, which are more suitable for communication with biological neurons. The intelligence and the decision-making process are completely delocalized, as they lie in the local adaptation of a neuromorphic OECT. Such adaptation is closed-loop controlled using a PID control law, and it mimics the neurotransmitter-mediated synaptic plasticity of biological neural networks (BNNs). In the existing literature, sensing and motion control have been achieved by exploiting organic neuromorphic architectures, with the goal of recapitulating the autonomous local learning typical of human neural processing (Krauhausen, I. et al. Organic neuromorphic electronics for sensorimotor integration and learning in robotics. Science Advances 7, eabl5068, 2021). Two main aspects are still missing in these applications: first, a neuromorphic control-loop architecture, which is strongly desirable for the adaptive responsiveness of the system to external stimuli; second, the use of biological signalling to drive synaptic plasticity during the learning process, which is typical of the human brain and useful for the active integration of these technologies in a biological environment. In this new concept, dopamine is the signal used to strengthen the artificial synapse, which is integrated in a closed-loop system capable of adaptive and reinforcement learning with object-specific recognition and training.
In addition, neuromorphic architectures, by exploiting neural primitives, may enable real-time interaction with the surrounding environment. Indeed, state-of-the-art neuromorphic controllers may either leverage neuromorphic models to determine a control law,26,27 or rely on spiking neural networks (SNNs) to implement long-standing control laws, such as proportional, integral and derivative (PID) controllers, on silicon neuromorphic chips.28,29 Crucially, such approaches still fail to recapitulate the autonomous adaptation that characterizes neural processing. As a result, while neuromorphic sensing and motion control in organic neuromorphic architectures have been demonstrated,7,30 a closed-loop-controlled neuromorphic system is still missing. Indeed, the implementation of a neuromorphic closed-loop architecture is strongly desirable, as it would enable fully autonomous systems that adapt in response to external stimulation. Here, we present a simple and direct approach to the realization of a neuromorphic closed-loop system, in which silicon and organic materials cooperate to accomplish real-world tasks in real time. In the presented closed-loop system, a silicon microcontroller controls actuation and motion, while the intelligence and the decision-making process are completely delocalized, as they lie in the local adaptation of a neuromorphic OECT. Such adaptation is closed-loop controlled using a PID control law, and it mimics the neurotransmitter-mediated synaptic plasticity of biological neural networks (BNNs). Dopamine (DA) oxidation was exploited to strengthen/weaken the artificial synapse, which controls in real time the opening and closure of a robotic hand in a closed-loop configuration. Finally, autonomous reinforcement learning based on a reward/punishment protocol is implemented. Here, the organic synaptic device ‘learns’ how to drive the hand to grasp different-sized objects.
The proposed architecture demonstrated how an organic brain-inspired platform can cooperate with both biological and electronic systems, establishing a neuromorphic closed-loop system.
In brief, the neural signalling was recapitulated by applying square voltage pulses at the gate terminal, emulating action potentials (APs) of biological synapses. In a biological synapse, when an action potential reaches the axon terminal of the pre-synaptic neuron, neurotransmitters are released in the synaptic cleft. Such neuroactive molecules, then, bind to specific receptors of the post-synaptic neuron, eliciting an electrical response.31
In the organic synaptic device, upon the application of the gate potential, ions would migrate from the electrolyte to the polymeric channel of the transistor, resembling the neurotransmitter release. In addition, such ions would dope/de-dope the polymeric channel of the transistor, modulating its current, and mirroring the post-synaptic potential response.
In addition, continuous stimulation/inhibition of neural connections results in potentiation/depression of specific neural pathways, resulting in the so-called synaptic plasticity.31 Here, the ENODe could emulate such synaptic potentiation and depression, by controlling in real time the local concentration of a neurotransmitter. A faradaic reaction was indeed exploited to elicit a charge transfer mechanism, doping/de-doping the channel of the ENODe in a non-volatile way (Fig. 1b, c and d, e, respectively).
Among the variety of voltage-oxidizable neurotransmitters23,32 that could be employed in the non-volatile conductance modulation of the ENODe channel, dopamine (DA) was chosen, as this molecule is crucial in the closed-loop circuitry of motor-learning in the brain, in which it is used to enforce positive behavioural outcomes, such as the correct execution of a movement.33,34
First, open-loop non-volatile potentiation and depression (i.e., synaptic plasticity) of the ENODe were demonstrated and characterized. When DA was present in the electrolyte, the application of a gate bias, matching the neurotransmitter's oxidation potential,32 favoured an oxidation reaction and a consequent release of protons and electrons in the electrolyte.35 Cations generated in such reaction would elicit a charge-transfer process to both gate and channel of the organic transistor, reducing the PEDOT:PSS and finally decreasing the device's conductance (Supplementary Discussion S1, ESI†).24 As a result, the redox reaction permanently de-doped the ENODe channel, as the current decreased with the number of pulses (Fig. 1b). The change in the baseline of the current (Fig. 1b) represented the synaptic potentiation of the neuromorphic device, as it mirrored the long-term strengthening of biological synapses that takes place as a response to an increased stimulation.36
Notably, this was a concentration-dependent process (Fig. 1c). A linear dependence of the synaptic potentiation on the concentration of DA employed in the faradaic process was observed in the range 5–30 μM. However, at higher values (50 to 100 μM), the channel conductance modulation would exhibit a saturation behaviour without any significant increase (Fig. S1, ESI†). Here, protons released during the DA oxidation reaction penetrated the bulk of the CP (Fig. 1c), progressively reducing the PEDOT:PSS and de-doping the transistor channel24 whereas the saturation behaviour might occur because of the limited oxidative species present at the gate electrode.23
In addition, synaptic depression could be recapitulated here by introducing H2O2 in the electrolyte solution (Fig. 1d) that oxidizes PEDOT:PSS, reversing the DA-mediated (30 μM) de-doping, restoring the initial conductance level of the ENODe (Supplementary Discussion S1, ESI†).
As shown in Fig. 1e, the conductance variation was only weakly dependent on the concentration of H2O2: the synaptic depression slightly increased with increasing H2O2 concentration (up to 60 mM), while it remained constant for higher concentrations. Such weak dependency of the synaptic depression on the concentration of the analyte may imply that the H2O2-mediated oxidation of PEDOT:PSS was elicited by a reaction occurring at the surface of the material (Fig. 1d; Fig. S2, ESI†).37 Importantly, the difference in the numerical values of synaptic potentiation and depression was correlated to the aforementioned PEDOT:PSS reduction and oxidation mechanisms. In the former, the oxidation of the neurotransmitter could immediately reduce the polymer, dramatically reducing its conductance. In the latter, the H2O2 in the electrolyte slowly oxidized the PEDOT:PSS, recovering its conductance.
Subsequently, to achieve real-time control of the synaptic potentiation/depression, a Y-shaped microfluidic module (see Methods) was coupled to the transistor to regulate the DA and H2O2–gate interface and therefore exploit the flow rate as a control variable. In fact, as previously reported, the flow rate might increase the number of species available for the oxidation, while preventing fouling at the gate electrode.23 While no significant difference in synaptic potentiation was observed for a 30 μM DA solution between static conditions and low flow rates (0.1 ml min−1), high flow rates (0.5 and 1 ml min−1) induced an almost linear increase of the conductance modulation (Fig. 2a). Furthermore, a linear dependence of synaptic depression on flow rate was observed when employing the H2O2 solution, suggesting that a higher flow rate increased the amount of solution that actively washed the surface of the polymeric channel (Fig. 2b).
Then, to determine the simultaneous effect of DA and H2O2 solutions on the synaptic modulation, different flow rate conditions were numerically simulated (Fig. 2c–e) and correlated to the measured channel conductance variation (Fig. 2f–h). Notably, different distributions of DA and H2O2 were achieved in the microfluidic channel by changing the ratio between the flow rates uDA and uH2O2. When uDA ≪ uH2O2 (Fig. 2c), the DA-mediated synaptic potentiation (ΔG = 9.9%, Fig. 2f) was comparable to the potentiation obtained with the same concentration of neurotransmitter under static conditions (Fig. 1c). When uDA = uH2O2, half of the microfluidic channel volume was filled with DA, increasing the synaptic potentiation in comparison to the static case (ΔG = 17.4%, Fig. 2g). Lastly, when uDA ≫ uH2O2, DA was present in the whole microfluidic channel, strongly increasing the synaptic potentiation (ΔG = 28.5%, Fig. 2h).
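As a rough illustration of how the flow-rate ratio sets the DA distribution, the sketch below assumes two co-flowing laminar streams of equal viscosity, so that the share of the channel occupied by DA is approximately its share of the total flow rate. This relation, the function name `da_fraction` and the example flow rates are assumptions made for illustration; the distributions reported in Fig. 2c–e come from numerical simulations, not from this formula.

```python
def da_fraction(u_da: float, u_h2o2: float) -> float:
    """Approximate fraction of the channel occupied by the DA stream.

    Assumption (not from the paper): co-flowing laminar streams of equal
    viscosity, so each stream occupies a share proportional to its flow rate.
    """
    total = u_da + u_h2o2
    if total <= 0:
        raise ValueError("at least one flow rate must be positive")
    return u_da / total


if __name__ == "__main__":
    # Illustrative flow rates in ml/min (hypothetical values).
    for u_da, u_h2o2 in [(0.01, 1.0), (0.5, 0.5), (1.0, 0.01)]:
        frac = da_fraction(u_da, u_h2o2)
        print(f"u_DA={u_da}, u_H2O2={u_h2o2}: DA fraction ~ {frac:.2f}")
```

Under this approximation the three regimes described above appear as a fraction close to 0, equal to 0.5, and close to 1, respectively.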
The flow rate could therefore be selected as a control parameter to regulate the long-term modulation of the ENODe and a closed-loop system including an organic synaptic device was implemented (Fig. 3a).
Here, the completion of a desired task (i.e., hand closure/opening) corresponded to a certain channel current (setpoint), and the competing DA/H2O2 flow rates were closed-loop controlled to reach this setpoint. At each loop iteration, the channel current was measured (IMEAS) and compared to the setpoint (ISET), to quantify the error (e = ISET – IMEAS) of the closed-loop system. This error described how far the actual value of current was from the setpoint. At this stage, a PID controller interpreted the error and computed a control law u, as a linear combination of the error, its integral and derivative over time (Fig. S3, ESI†).
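For reference, the textbook continuous-time form of such a control law can be written as follows. The gain symbols K_P, K_I and K_D are introduced here for illustration only; their values are not stated in this section.

```latex
e(t) = I_{\mathrm{SET}} - I_{\mathrm{MEAS}}(t), \qquad
u(t) = K_P\, e(t) + K_I \int_0^{t} e(\tau)\,\mathrm{d}\tau + K_D\, \frac{\mathrm{d}e(t)}{\mathrm{d}t}
```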
The control law u determined the optimal flowrates of DA and H2O2 to minimize the error e. Finally, a square voltage pulse was applied at the gate terminal (VGS), oxidizing DA (if present in the microfluidic channel), changing the channel conductance and, consequently, the channel current. The current was measured again, determining a new value of IMEAS, and triggering a new iteration. This process was repeated until the channel current reached the setpoint (e < ε).
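The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' control software (which is described in the ESI): the PID gains, the mapping from u to the two flow rates (`split_flows`) and the linear device response (`toy_device`) are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float = 1.0) -> float:
        """Linear combination of the error, its integral and its derivative."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def split_flows(u: float, u_max: float = 1.0) -> tuple[float, float]:
    """Hypothetical mapping of u to (DA, H2O2) flow rates.

    u < 0: current must decrease, so DA is delivered (potentiation).
    u > 0: current must increase, so H2O2 is delivered (depression).
    """
    u_da = min(max(-u, 0.0), u_max)
    u_h2o2 = min(max(u, 0.0), u_max)
    return u_da, u_h2o2


def toy_device(i_meas: float, u_da: float, u_h2o2: float, k: float = 0.05) -> float:
    """Hypothetical channel current after one gate pulse (arbitrary units)."""
    return i_meas - k * u_da + k * u_h2o2


def regulate(i_set: float, i_meas: float, eps: float = 1e-3, max_iter: int = 1000):
    """Iterate until |I_SET - I_MEAS| < eps; returns (final current, iterations)."""
    pid = PID(kp=2.0, ki=0.1, kd=0.1)  # illustrative gains only
    for n in range(max_iter):
        error = i_set - i_meas
        if abs(error) < eps:
            return i_meas, n
        u_da, u_h2o2 = split_flows(pid.step(error))
        i_meas = toy_device(i_meas, u_da, u_h2o2)
    return i_meas, max_iter


if __name__ == "__main__":
    final, n = regulate(i_set=0.5, i_meas=1.0)
    print(f"reached {final:.4f} after {n} iterations")
```

The sign convention follows the definition e = ISET – IMEAS: a negative error calls for DA (current decrease) and a positive error for H2O2 (current increase).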
Notably, the channel current IDS was measured after the application of a pulse (Fig. 3b), not including the transient response of the transistor due to ions injected from the electrolyte to the polymeric channel (rising edge of VGS) and then migrating back to the electrolyte (falling edge of VGS).38
In addition, the value of IMEAS was sent to a microcontroller that determined a control signal as an input to servo motors, driving a 3D-printed robotic hand (Fig. S4, ESI†).
The robotic hand could be closed or opened by adjusting ISET, transducing synaptic potentiation/depression of the ENODe into motor commands: initially, (i.e., before execution of the closed-loop system) the value of IMEAS sent to the microcontroller was encoded as a complete closure of the hand. If, depending on ISET, synaptic potentiation occurred, IMEAS progressively decreased. Such current decrease would be encoded by the microcontroller as the opening command of the robotic hand. Conversely, in case of synaptic depression, the IMEAS increase would be encoded as a closing command of the hand.
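One simple way to picture this encoding is a clamped linear map from channel current to servo angle. The sketch below is an assumption for illustration: the function name, the linear form, and the angle range (0° fully open, 180° fully closed) are hypothetical and not taken from the microcontroller firmware.

```python
def current_to_angle(
    i_meas: float,
    i_closed: float,
    i_open: float,
    angle_closed: float = 180.0,
    angle_open: float = 0.0,
) -> float:
    """Map a channel current to a servo angle (hypothetical linear encoding).

    i_closed is the initial current, encoded as complete closure;
    i_open is the lower current encoded as full opening.
    """
    if i_closed == i_open:
        raise ValueError("i_closed and i_open must differ")
    frac = (i_meas - i_open) / (i_closed - i_open)
    frac = min(max(frac, 0.0), 1.0)  # clamp to the valid servo range
    return angle_open + frac * (angle_closed - angle_open)


if __name__ == "__main__":
    # Arbitrary units; potentiation lowers the current and opens the hand.
    for i in (1.0, 0.75, 0.5):
        print(i, current_to_angle(i, i_closed=1.0, i_open=0.5))
```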
Fig. 3c shows a closed-loop system example where two setpoints were arbitrarily chosen (I1 and I2, dashed horizontal lines). Initially, as ISET was lower than IMEAS, the system required synaptic potentiation to complete the regulation task. DA was released inside the electrolyte (Fig. 3c), while square voltage pulses were applied at the gate terminal, oxidizing the neurotransmitter and decreasing the ENODe conductance by reducing PEDOT:PSS. Concurrently, the robotic hand gradually opened as IMEAS decreased, until full opening was achieved (Fig. 3d). Eventually, the measured current IMEAS reached the setpoint and the regulation task was successfully completed, i.e., IMEAS ≅ ISET. Once the value I1 was reached, ISET changed to I2. Here, as the setpoint was higher than the measured current, the system required synaptic depression. The control system responded to such variation by releasing H2O2 (Fig. 3c, blue trace), oxidizing the surface of PEDOT:PSS and increasing the channel current until it reached the new setpoint (Video S1, ESI†). The workflow of the control software is reported in Supplementary Discussion S2 (ESI†).
Finally, by integrating a pressure sensor in the closed-loop architecture (Fig. 4a), reinforcement learning39 was introduced. The goal of this system was to learn to recognize and grasp objects of different sizes, using reward and punishment signals, as shown in Fig. 4a.
The complexity of the task increased because of the variable object size, which required different degrees of motor movement to close the hand. Initially, the hand was completely open and the current IMEAS determined the closure of the robotic hand. The extended system could sense the environment through the sensor, which indicated whether the hand was able to grasp the object or not. In case of a failed grasp, a punishment signal in the form of a square voltage pulse was applied at the gate terminal, oxidizing DA present in the electrolyte of the device. The punishment signal caused IMEAS to decrease, leading to a further closure of the hand. Conversely, when the hand was able to grasp the object, a reward action was provided by biasing the gate terminal at zero voltage, keeping a stable channel current and preventing a further closure of the hand.
A large object (tennis ball) and a small object (ping pong ball) were used in the reinforcement learning experiment. The pressure was continuously read from the sensor to detect contact with the test object, while the angle of the motors was recorded and used as a hand closure parameter. The channel current (measured after each reward/punishment signal), instead, indicated the learning of the neuromorphic ENODe (Fig. 4b).
First, the tennis ball was employed (Fig. 4b-i), and the robotic hand closed (based on the value of IMEAS) in the attempt of grasping it. The task initially failed (Fig. 4c-i) and consequently a punishment signal was supplied, oxidizing DA, and decreasing IMEAS. As this process was iterated, the oxidation of the neurotransmitter progressively caused the hand to close (increasing motor angle). When the ENODe learnt how to grasp the tennis ball (i.e., when the pressure sensor detected that the hand correctly touched the object), the reward was provided, and the robot stopped moving (constant motor angle). Then, the ping pong ball was introduced (Fig. 4b-ii). Because the closed-loop system was not trained to grasp such a small object (Fig. 4c-ii), the hand failed in executing the assigned task. Therefore, a punishment signal was supplied, eliciting DA oxidation, and progressively closing the hand (motor angle increases). Punishment signals were continuously provided until the hand completely grasped the smaller ball and completed the learning protocol. Further details are reported in Fig. S5 (ESI†).
Last, the first object (tennis ball) was introduced again. Here, considering that the ENODe had already learned how to grasp it before, the object was immediately grabbed correctly (Fig. 4b-iii and 4c-iii and Video S2, ESI†).
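The reward/punishment protocol described above can be summarized with a toy simulation. Everything numerical here is a hypothetical placeholder: the current drop per punishment pulse, the current-to-angle gain and the closure angles at which each ball is touched are invented for illustration and do not reproduce the measured data.

```python
def train_grasp(
    i_meas: float,
    required_angle: float,
    i_start: float = 1.0,
    delta_i: float = 0.02,
    gain: float = 900.0,
    max_pulses: int = 200,
) -> tuple[float, int]:
    """Apply punishment pulses until the (simulated) pressure sensor detects contact.

    Returns the channel current after learning and the number of punishments.
    In this toy model the closure angle grows as the current decreases.
    """
    punishments = 0
    # Contact is assumed once the closure angle reaches required_angle.
    while gain * (i_start - i_meas) < required_angle:
        if punishments >= max_pulses:
            raise RuntimeError("grasp not learned within max_pulses")
        i_meas -= delta_i  # punishment: gate pulse oxidizes DA, current drops
        punishments += 1
    # Reward: gate held at 0 V, so the current (and the hand) stay unchanged.
    return i_meas, punishments


if __name__ == "__main__":
    TENNIS, PING_PONG = 60.0, 120.0  # hypothetical closure angles in degrees
    i = 1.0  # arbitrary units, hand fully open
    for name, angle in [("tennis", TENNIS), ("ping pong", PING_PONG), ("tennis", TENNIS)]:
        i, n = train_grasp(i, angle)
        print(f"{name}: {n} punishment(s), current now {i:.2f}")
```

Because the channel current is retained between objects, the second presentation of the tennis ball needs no punishment in this model, which mirrors the immediate grasp reported above.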
:1 wt/wt with a crosslinker, and cured at 80 °C for 1 h (PDMS, Sylgard 184). Finally, through the etching process, two symmetrical PEDOT:PSS stripes 7 × 17 mm wide were deposited 2 mm apart. Then the devices were immersed in milliQ water for 1 h to allow for the complete swelling of the PEDOT:PSS prior to further measurements.
Three consecutive measurements were performed, resulting in the sum of 18 applied voltage pulses, and conductance variations were computed and averaged for each device. At least N = 3 devices were measured with this procedure to obtain the results shown in the manuscript.
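The averaging procedure can be sketched as below. The definition of the conductance variation used here (absolute relative change, in percent) is an assumption, since this section does not define ΔG explicitly, and the helper names and conductance values are hypothetical placeholders rather than measured data.

```python
from statistics import mean, stdev


def delta_g_percent(g_initial: float, g_final: float) -> float:
    """Relative conductance variation in percent (assumed definition of ΔG)."""
    return 100.0 * abs(g_final - g_initial) / g_initial


def summarize(devices: list[list[tuple[float, float]]]) -> tuple[float, float]:
    """Average ΔG over the measurements of each device, then across devices.

    `devices` holds, per device, a list of (G before, G after) pairs.
    Returns (mean, sample standard deviation) across devices.
    """
    per_device = [
        mean(delta_g_percent(g0, g1) for g0, g1 in runs) for runs in devices
    ]
    return mean(per_device), stdev(per_device)


if __name__ == "__main__":
    # Placeholder conductances (arbitrary units): 3 devices x 3 measurements.
    devices = [
        [(1.0, 0.9)] * 3,
        [(1.0, 0.8)] * 3,
        [(2.0, 1.4)] * 3,
    ]
    avg, sd = summarize(devices)
    print(f"ΔG = {avg:.1f} ± {sd:.1f} % (N = {len(devices)})")
```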
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3mh02202a
‡ These authors contributed equally.
§ Current address: Institute of Polymers, Composites and Biomaterials (IPCB) − CNR, Viale J. F. Kennedy 54, Mostra D'Oltremare Pad 20, Naples, 80125 Italy.
¶ Current address: Department of Materials Science and Milano-Bicocca Solar Energy Research Centre (MIB-SOLAR), Via Cozzi 55, I-20125, Milano, Italy.
|| Current address: The International Clinical Research Center of St. Anne's University Hospital in Brno (FNUSA-ICRC), Pekařská 53, 656 91 Brno-střed, Czechia.
This journal is © The Royal Society of Chemistry 2024