Cameron S. Movassaghi‡§*ab, Katie A. Perrotta‡a, Maya E. Curryc, Audrey N. Nashnera, Katherine K. Nguyena, Mila E. Weselyde, Miguel Alcañiz Fillolf, Chong Liua, Aaron S. Meyerg and Anne M. Andrews*abgh
aDepartment of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA. E-mail: aandrews@mednet.ucla.edu; csmova@g.ucla.edu
bCalifornia NanoSystems Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
cInstitute of Society and Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
dDepartment of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90095, USA
eDepartment of Psychology, University of California, Los Angeles, Los Angeles, CA 90095, USA
fInteruniversity Research Institute for Molecular Recognition and Technological Development, Universitat Politècnica de València - Universitat de València, Camino de Vera s/n, Valencia, 46022, Spain
gDepartment of Bioengineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
hDepartment of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, and Hatos Center for Neuropharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA
First published on 10th June 2025
Voltammetry is widely used to detect and quantify oxidizable or reducible species in complex environments. The neurotransmitter serotonin epitomizes an analyte that is challenging to detect in situ due to its low concentrations and the co-existence of similarly structured analytes and interferents. We developed rapid-pulse voltammetry for brain neurotransmitter monitoring due to the high information content elicited from voltage pulses. Generally, the design of voltammetry waveforms remains challenging due to prohibitively large combinatorial search spaces and a lack of design principles. Here, we illustrate how Bayesian optimization can be used to hone searches for optimized rapid pulse waveforms. Our machine-learning-guided workflow (SeroOpt) outperformed random and human-guided waveform designs and is tunable a priori to enable selective analyte detection. We interpreted the black box optimizer and found that the logic of machine-learning-guided waveform design reflected domain knowledge. Our approach is straightforward and generalizable for all single and multi-analyte problems requiring optimized electrochemical waveform solutions. Overall, SeroOpt enables data-driven exploration of the waveform design space and a new paradigm in electroanalytical method development.
A grand challenge in chemical neuroscience is to uncover the functional and dysfunctional interplay between neurotransmitters in the brain.12 Voltammetry is broadly used to characterize and quantify electroactive neurotransmitter release and reuptake using brain-implanted microelectrodes during biological perturbation,13–15 including in humans.6 Recent progress has focused on developing novel electrode materials, coatings, or data analysis procedures to improve the selectivity and sensitivity of real-time neurochemical monitoring in behaving subjects.13,16–23 Meanwhile, voltammetry waveform development (i.e., selecting optimal waveform parameters for detecting particular analytes) has remained essentially unchanged for decades. It relies principally on historic performers (e.g., pre-patterned waveforms), heuristics, and grid searches.24–29
For neurochemistry applications, historic performers include fast-scan cyclic voltammetry (FSCV) triangle or N-shape (i.e., sawtooth) waveforms for detecting evoked dopamine8 or serotonin,30 respectively, in vivo. The N-shape waveform improved serotonin detection over the FSCV waveform by increasing the scan rate to 1000 V s−1 and altering the holding potentials.31 Modifying these waveform parameters impacts sensitivity, selectivity, and temporal resolution.24,32–34 For example, increasing the switching potential from 1.0 V to 1.3 V renews the electrode surface and enhances serotonin detection.24 The development of fast-cyclic square-wave voltammetry has improved the sensitivity and selectivity of dopamine35 and serotonin36 detection by superimposing triangle and N-shape waveforms, respectively, on pre-patterned staircase waveforms. Other waveform modifications have led to fast-scan controlled-adsorption voltammetry and multiple cyclic square-wave voltammetry to determine basal dopamine37 or serotonin levels.38,39 These approaches required separate waveforms to measure different analytes over different timescales and were derived from the prior triangle and N-shape waveforms in a guess-and-check manner (Fig. 1, top).
We developed rapid pulse voltammetry (RPV) to enable multi-analyte monitoring (e.g., simultaneous serotonin and dopamine detection) across timescales (i.e., quantification of basal and stimulated neurotransmitter levels using the same waveform in the same recording session).40 Rapid pulse voltammetry utilizes background-inclusive (i.e., non-background-subtracted) data, requiring novel waveform design to produce informative background currents.41 This custom design contrasts with other popular pulse voltammetry techniques (e.g., normal, differential, staircase), which apply pre-patterned waveforms over longer timescales (s to min).42 While also based on characteristic oxidation and reduction potentials derived from the triangle and N-shape waveforms, rapid pulses (i.e., 2 ms), rather than fast linear sweeps, reduce fouling and produce informative faradaic and non-faradaic currents. The resulting current–time fingerprints from our original generation (OG) RPV waveform40 yield analyte-specific information that can be used by partial least squares regression (PLSR) or other supervised regression models (e.g., artificial neural networks, elastic net) to distinguish analytes and predict their concentrations. Because the OG waveform was inspired by heuristics from the voltammetric electronic tongue (VET) field for ‘soft’ sensing (e.g., intermediate and counter pulses),43–45 we refer to this as VET-inspired design (Fig. 1, middle).
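To make the regression step concrete, the sketch below illustrates mapping background-inclusive current–time fingerprints to analyte concentrations with PLSR in scikit-learn. The array shapes, component count, and synthetic data are assumptions for illustration, not the exact pipeline used here.

```python
# Minimal sketch: predicting analyte concentrations from RPV current-time
# fingerprints with partial least squares regression (scikit-learn).
# Array shapes, component count, and synthetic data are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

n_train, n_points = 20, 5500                        # e.g., a 5.5 ms waveform sampled at 1 MHz
X_train = rng.normal(size=(n_train, n_points))      # current fingerprints (nA)
Y_train = rng.uniform(0, 1000, size=(n_train, 2))   # [serotonin, dopamine] (nM)

pls = PLSRegression(n_components=5)   # component count would be chosen by cross-validation
pls.fit(X_train, Y_train)

X_test = rng.normal(size=(4, n_points))
Y_pred = np.clip(pls.predict(X_test), 0, None)      # concentrations cannot be negative
print(Y_pred)
```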
Having shown that our VET-inspired OG waveform outperformed conventional waveforms,40 we sought a generalizable and expandable approach to designing and optimizing rapid pulse (and other types of) waveforms. Because tuning specific waveform parameters improves analyte-specific currents,13,24,46 we hypothesized that enhanced RPV waveforms for serotonin and dopamine co-detection (and many more analytes) exist but remain undiscovered due to the lack of design principles needed to explore intractably large waveform search spaces.
We focused first on detecting serotonin to address this waveform space problem (vide infra). Serotonin is involved in modulating mood, anxiety, and reward-related behavior via interconnecting brain circuits.47–51 Serotonin is an essential gut hormone. It also plays a role in spinal pain transmission and immune function.52–55 Serotonin is a challenging target to detect using voltammetry due to its relatively low physiological concentrations (high pM to low nM),48 colocalization with other neurotransmitters having similar redox profiles (e.g., dopamine, norepinephrine), and irreversible oxidation byproducts56 that can foul electrodes. We further hypothesized that a waveform development paradigm to discover optimized serotonin waveforms would generalize to other neurochemicals, other types of analytes, and their combinations.
When developing RPV or other complex waveforms, a prohibitively large number of waveform step or segment combinations prevents exhaustive empirical investigation, even for a small number of steps or segments. Step potentials, lengths, order, and hold times are all variables for investigation when exploring and improving pulse waveforms; minor modifications of each variable can have complex effects on electrochemical signals due to changes in the surface roughness, fouling propensity, and functionalization (e.g., anionic oxide groups) of carbon fiber microelectrodes.24,32 The use of various electrode materials, carbon allotropes, and polymeric coatings further complicates this landscape.57 While a ‘guess and check’ approach has yielded the handful of useful conventional and VET-inspired waveforms mentioned above, one-parameter-at-a-time or randomized58,59 optimization approaches do not take advantage of the rich information diversity encoded in complex waveforms, leaving the overall waveform search space relatively unexplored.
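To illustrate the scale of the combinatorial problem, the short calculation below counts candidate four-step waveforms under assumed discretizations (a 1.5 V potential window at 1 mV resolution and hold times up to 2 ms at 0.1 ms resolution); the ranges are illustrative, not the constraints used in this work.

```python
# Back-of-the-envelope size of a four-step rapid-pulse design space.
# Potential window and hold-time range are assumed for illustration only.
n_steps = 4
n_potentials = int(1.5 / 0.001)   # 1.5 V window at 1 mV increments
n_hold_times = int(2.0 / 0.1)     # hold times up to 2 ms at 0.1 ms increments

combinations = (n_potentials * n_hold_times) ** n_steps
print(f"{combinations:.1e} candidate waveforms")   # ~8e17, far beyond exhaustive search
```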
Recently, Bayesian optimization has been used to navigate intractable physicochemical search spaces when combined with experimental training data.60–65 This adaptive experimental approach presents an opportunity to pair machine learning with electroanalysis to create a new waveform development paradigm (Fig. 1, bottom). Here, we present a Bayesian optimization workflow (SeroOpt) that generates fit-for-purpose voltammetry waveforms for selective serotonin detection. To our knowledge, a systematic machine-learning-based approach to designing, testing, and optimizing analyte-specific waveforms has not yet been reported. We show that analyte-specific waveform information depends on specific potentials occurring in a particular order and timing, confirming the need for a parsimonious search approach across parameter dimensions. Our active learning approach outperformed randomly designed and domain expert-designed waveforms after only a handful of iterations. Our methods can be straightforwardly extended to designing any voltammetry waveform for any electroactive analyte to discover new and perhaps non-intuitive waveforms optimized for application-specific metrics. To encourage widespread adoption, we provide data, tutorial code notebooks, and videos at github.com/csmova/SeroOpt (https://github.com/csmova/SeroOpt), as well as our corresponding open-source voltammetry acquisition and analysis software66 at github.com/csmova/SeroWare (https://github.com/csmova/SeroWare) and github.com/csmova/SeroML (https://github.com/csmova/SeroML).
To initialize a model of the relationship between waveform and objective (i.e., the optimization metric), six waveforms were randomly generated using the constraints above (Fig. 2, step 1). The choice of six waveforms was arbitrary and within the number of waveforms that could be experimentally evaluated in a single-day experiment. We refer to this collection of random initialization waveforms as string 1 (S1).
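A minimal sketch of how such an initialization string might be generated is shown below, assuming four-step waveforms parameterized by step potentials E1–E4 and hold times τ1–τ4 drawn uniformly within box constraints; the bounds are placeholders, not the constraints used here.

```python
# Randomly generate an initialization string of six four-step waveforms.
# Bounds are placeholder assumptions; the actual constraints are defined in the Methods.
import numpy as np

rng = np.random.default_rng(1)

E_BOUNDS = (-0.4, 1.1)    # assumed step-potential bounds (V)
TAU_BOUNDS = (0.5, 2.0)   # assumed hold-time bounds (ms)

def random_waveform():
    E = np.round(rng.uniform(*E_BOUNDS, size=4), 3)       # potentials to nearest 1 mV
    tau = np.round(rng.uniform(*TAU_BOUNDS, size=4), 1)   # hold times to nearest 0.1 ms
    return np.concatenate([E, tau])                       # embedding: [E1..E4, tau1..tau4]

string_1 = [random_waveform() for _ in range(6)]          # six random waveforms (S1)
```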
| Set | Sample | DA (nM) | 5-HT (nM) | 5-HIAA (μM) | DOPAC (μM) | Ascorbate (μM) | pH (units) | KCl (mM) | NaCl (mM) |
|---|---|---|---|---|---|---|---|---|---|
| Training | Blank | 0 | 0 | 0 | 0 | 0 | 7.3 | 3.5 | 147 |
| | A | 300 | 0 | 6 | 80 | 200 | 7.3 | 3.5 | 147 |
| | B | 1000 | 20 | 10 | 70 | 110 | 7.3 | 3.5 | 147 |
| | C | 0 | 120 | 6 | 90 | 190 | 7.3 | 3.5 | 147 |
| | D | 450 | 350 | 4 | 0 | 130 | 7.3 | 3.5 | 147 |
| | E | 600 | 500 | 1 | 10 | 170 | 7.3 | 3.5 | 147 |
| | Blank | 0 | 0 | 0 | 0 | 0 | 7.3 | 3.5 | 147 |
| | F | 160 | 250 | 2 | 20 | 180 | 7.3 | 3.5 | 147 |
| | G | 700 | 300 | 0 | 0 | 100 | 7.3 | 3.5 | 147 |
| | H | 80 | 160 | 10 | 60 | 100 | 7.3 | 3.5 | 147 |
| | I | 20 | 60 | 0 | 50 | 160 | 7.3 | 3.5 | 147 |
| | J | 40 | 40 | 2 | 100 | 120 | 7.3 | 3.5 | 147 |
| | Blank | 0 | 0 | 0 | 0 | 0 | 7.3 | 3.5 | 147 |
| | K | 800 | 10 | 8 | 30 | 150 | 7.3 | 3.5 | 147 |
| | L | 500 | 0 | 0 | 0 | 100 | 7.3 | 3.5 | 147 |
| | M | 0 | 250 | 0 | 0 | 100 | 7.3 | 3.5 | 147 |
| | N | 0 | 0 | 10 | 0 | 100 | 7.3 | 3.5 | 147 |
| | O | 0 | 0 | 0 | 50 | 100 | 7.3 | 3.5 | 147 |
| | P | 0 | 0 | 0 | 0 | 100 | 7.3 | 3.5 | 147 |
| | Blank | 0 | 0 | 0 | 0 | 0 | 7.3 | 3.5 | 147 |
| Test | T1 | 750 | 50 | 1 | 85 | 200 | 7.3 | 3.5 | 147 |
| | T2 | 100 | 400 | 5 | 9 | 200 | 7.3 | 3.5 | 147 |
| | T3 | 400 | 200 | 5 | 85 | 190 | 7.3 | 3.5 | 147 |
| | T4 | 70 | 30 | 5 | 35 | 200 | 7.3 | 3.5 | 147 |
| | Blank | 0 | 0 | 0 | 0 | 0 | 7.3 | 3.5 | 147 |
| Challenge (pH) | T1 pH | 750 | 50 | 1 | 85 | 200 | 7.1 | 3.5 | 147 |
| | Blank pH | 0 | 0 | 0 | 0 | 0 | 7.1 | 3.5 | 147 |
| | T2 pH | 100 | 400 | 5 | 9 | 200 | 7.2 | 3.5 | 147 |
| | Blank pH | 0 | 0 | 0 | 0 | 0 | 7.2 | 3.5 | 147 |
| Challenge (a.c.) | T3 a.c. | 400 | 200 | 5 | 85 | 190 | 7.3 | 120 | 31 |
| | Blank a.c. | 0 | 0 | 0 | 0 | 0 | 7.3 | 120 | 31 |
The PLSR model predicted the test and challenge set sample concentrations of serotonin and dopamine (Fig. 2, steps 2 and 3; see Methods for definitions of training, test, and challenge samples). These predictions were used to calculate the eight optimization metrics listed (Fig. 2, step 4; defined in Table S1†). All metrics were calculated on all waveforms in each string, unless otherwise noted (Fig. 2, steps 2–4). We focus on the results for the second waveform (W2) of each string, which is optimized across strings for the serotonin test set prediction accuracy metric. The latter is the mean absolute error in the PLSR model predictions of test samples T1–4 (including a blank; Table S1†), thus creating a minimization task (maximum accuracy implies minimum error). We chose mean absolute error rather than relative error due to the presence of the blank (true null concentration).
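As a concrete example, the serotonin test set accuracy metric corresponds to the mean absolute error between the known and PLSR-predicted serotonin concentrations for the test samples; the predicted values in the sketch below are hypothetical.

```python
# Serotonin test set accuracy = mean absolute error over test samples (and a blank).
# Predicted values below are hypothetical placeholders.
from sklearn.metrics import mean_absolute_error

true_5ht = [50, 400, 200, 30, 0]   # T1-T4 and a blank (nM), from Table 1
pred_5ht = [62, 355, 214, 41, 8]   # hypothetical PLSR predictions (nM)

mae_nM = mean_absolute_error(true_5ht, pred_5ht)
print(f"serotonin test set error: {mae_nM:.1f} nM")   # lower is better (minimization task)
```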
The choice of test set accuracy as an optimization metric was motivated by several factors. First, we pursued single-objective optimization for simplicity and because (at the time of analysis) user-friendly open-source software for multi-objective human-in-the-loop optimization was lacking. Given the need to choose a single metric, test set accuracy is attractive because it is a direct measure of waveform performance, unlike alternatives such as PLSR model-specific metrics (e.g., scores clustering), which are less physically meaningful and would limit the extensibility of our method. By using physically meaningful parameters, such as test set accuracy, our workflow remains model-agnostic (i.e., any model that performs supervised regression prediction can be used). For similar reasons of retaining metrics in raw form, we chose not to combine multiple metrics into a single objective task (e.g., scalarization69).
Second, we encoded selectivity in our test and challenge set design. Our calibration curve varies the concentrations of all analytes and interferents across the training, test, and challenge sets used to build and evaluate the PLSR models (Table 1). If the PLSR model for a given waveform confuses any interferent for serotonin, this will be represented in the test or challenge set accuracy metric for serotonin and will contribute to the mean absolute error. Thus, serotonin test and challenge set accuracy is a proxy for selectivity in varying dopamine, 5-hydroxyindoleacetic acid (5-HIAA), ascorbate, 3,4-dihydroxyphenylacetic acid (DOPAC), pH, and K+/Na+ concentrations (see Methods).
Lastly, other analytical figures of merit that could be used as optimization metrics (sensitivity, limit of detection (LOD), linear range, etc.) are irrelevant if model accuracy and selectivity are not first established. For example, we included LOD as an alternative optimization metric (Fig. 2). The selectivity performance of LOD-optimized waveforms (inferred via test and challenge set accuracy) was poor. Thus, we did not continue to optimize for LOD in subsequent campaigns but were still able to utilize these waveforms as training data by calculating their other metrics. For these reasons, we focused on test set accuracy. Specifically, we focused on serotonin (5-HT) because it is historically more difficult to detect by voltammetry than dopamine. Serotonin concentrations are approximately 10-fold lower than dopamine in striatum,48 and serotonin has complex redox mechanisms and fouling processes.30
Regardless, we included other optimization metrics in our workflow rather than solely serotonin test set accuracy to explore which metrics have an objective landscape that is ‘optimizable’. As this was a first attempt, we had no guarantee that the serotonin test set accuracy was a viable choice of metric. We also wanted to investigate other analytes and metrics for future use with multi-objective optimization. For example, we included dopamine-specific metrics in the scheme for comparison with our original RPV work40 because serotonin/dopamine co-detection is a long-term goal for multi-objective optimization.70
To maximize the training data produced in an experimental day, we calculated the performance of all waveforms on all metrics in each string, regardless of which metric a waveform was designed to optimize. For example, the optimal serotonin test set accuracy waveform (W2) in each string was used to calculate the serotonin test set accuracy metric. Still, the performance of this waveform on the dopamine, pH, and altered cation (a.c.) accuracy metrics was also recorded. This approach allows additional waveforms (albeit waveforms not optimized specifically for that metric) to be tested per string rather than solely the one ‘optimized’ waveform for each metric. Performing single objective optimization in this parallel manner explores ‘optimizable’ metrics while obtaining additional training data per string in a simple yet sample-efficient manner. For example, if test set accuracy failed as an optimizable metric for serotonin, we could pivot to an alternative metric exhibiting promising optimization progress (e.g., serotonin pH or a.c. accuracy, or serotonin LOD), with training data already aggregated across all waveforms for that metric.
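The sketch below illustrates this bookkeeping: every waveform in a string is scored on every metric so that each single-objective optimizer can later draw on the full pool of training data. The metric functions and data structures are simplified placeholders (only two of the eight metrics are shown).

```python
# Score every waveform in a string on every metric, regardless of which metric
# the waveform was generated to optimize. Everything here is a placeholder sketch.
from sklearn.metrics import mean_absolute_error

def serotonin_test_mae(preds):   # preds: {"5HT_true": [...], "5HT_pred": [...], ...}
    return mean_absolute_error(preds["5HT_true"], preds["5HT_pred"])

def dopamine_test_mae(preds):
    return mean_absolute_error(preds["DA_true"], preds["DA_pred"])

METRICS = {"5HT_test_accuracy": serotonin_test_mae,
           "DA_test_accuracy": dopamine_test_mae}   # eight metrics in the full workflow

def score_string(string_predictions):   # {waveform_id: prediction dict}
    return {wf: {name: fn(preds) for name, fn in METRICS.items()}
            for wf, preds in string_predictions.items()}   # training data for all optimizers
```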
The eight waveforms output from the first optimization loop of this workflow, each optimized on a single one of the eight metrics using the training data generated from S1 (Fig. 2, steps 4–6), are shown as string 2 (S2). Because S1 was randomly generated to initialize the surrogate model, S2 represented the first iteration of optimized waveforms produced by the workflow.
We repeated the optimization loop by obtaining experimental calibration curve data using each new S2 waveform. We then calculated the individual optimization metrics, aggregated the data with the previous string(s) (e.g., all S3 waveforms were predicted using all S1 and S2 data, one metric at a time), and predicted the next set of optimal S3 waveforms for each metric (Fig. 2, steps 7 and 8). This process was repeated again to generate four waveform strings in total (Fig. 2, step 9). We refer to the group of strings as S1–4. Each string had eight waveforms (W1–8) corresponding to the eight separate metrics, except the initial string (S1), which had only six randomly generated waveforms (arbitrary). All four strings and their associated waveforms were collectively referred to as run 1 (R1).
Even though R1S4W2 was only 5.5 ms long, it outperformed the OG waveform, which was 8 ms. Given the similarity in pulse potentials, the increase in data fidelity was attributed partly to changes in the hold times of each step; that is, Bayesian optimization was able to generate better-performing choices of τ. While a 2.5 ms difference in overall pulse length was ostensibly negligible at data rates of 1 MHz, this equates to a reduction of 2500 data points per scan. This reduction can easily save gigabytes of data that otherwise would need to be stored, and save computation time wasted during multi-hour experiments. Decreasing the overall length of the rapid pulse sequence also opens opportunities to increase the temporal resolution to >10 Hz or design more complex combinations of pulses with additional steps, while retaining 10 Hz sampling.
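A quick estimate under assumed storage and session parameters (8-byte samples, 10 Hz repetition, a 3 h recording) illustrates the scale of these savings; the byte depth and session length are assumptions.

```python
# Estimated data saved by shortening each pulse sequence by 2.5 ms at 1 MHz sampling.
# Sample depth (8-byte floats) and session length (3 h at 10 Hz) are assumptions.
points_saved_per_scan = 2.5e-3 * 1e6            # 2500 samples per scan
scans = 10 * 3600 * 3                           # 10 Hz repetition over a 3 h recording
bytes_per_sample = 8                            # e.g., float64 storage

gb_saved = points_saved_per_scan * scans * bytes_per_sample / 1e9
print(f"~{gb_saved:.1f} GB saved per session")  # ~2.2 GB
```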
We do not attribute the success of the optimized waveform to chance, as the convergence plot (Fig. 3b) shows that for each optimization string (S2–S4), the waveform optimized for serotonin test set accuracy (W2) found a new minimum for serotonin prediction error during each iteration. This improvement across strings suggests that the surrogate model is learning a reasonable representation of the optimization landscape for serotonin accuracy. Convergence plots for all metrics and runs are provided (Fig. S3†).
While sample T2 for R1S4W2 still had a mean absolute error of ∼50 nM (13% error, 2.8% coefficient of variation (CV)), predictions were improved compared to the OG waveform (22% error, 3.4% CV). Continuing the optimization campaign for additional iterations might have minimized the remaining error further. However, the T2 samples had lower DOPAC and higher 5-HIAA concentrations than other test samples. These similarly structured interferents may have had confounding effects on the serotonin concentration predictions. Moreover, these samples may have suffered from analyte instability due to degradation or adsorption to sample vial surfaces.
Regardless, Fig. 3a represents a single trial of the waveform on a single electrode, performed during the optimization campaign. Meanwhile, Fig. 3c–e represent a reproducibility study performed across three total trials using two separate electrodes. These panels demonstrate a more dramatic improvement in the accuracy and precision of R1S4W2 compared to the OG waveform. For example, across these three runs, sample T2 had 0.7% error and 14% CV. Meanwhile, T2 for the OG waveform had 34% error and 42% CV. Given that the microelectrodes were hand-made, different electrodes were used across strings, and dynamic surface changes occur at electrode surfaces, variability in concentration predictions is expected. Nonetheless, compared to the OG waveform, SeroOpt produced a more precise and accurate waveform that generalized across electrodes and replicates.
While the OG waveform confounded changes in pH and Na+/K+ in the challenge set, the R1S4W2 waveform did not suffer similar pitfalls (see samples T2 pH 7.2, T3 a.c., blank a.c. for each waveform in Fig. 3a). We discuss the performance of test and challenge set samples further in Fig. S4a and b.† This result was not because the waveform failed to sense a change in current with varying cation concentrations, i.e., it was not ‘electrochemically silent’.72 Increases in current (hundreds of nA) were evident when aCSF a.c. blanks were injected compared to normal aCSF blanks (Fig. S4c†). Similar responses were noted for pH blanks.
To investigate whether the initial finding that R1S4W2 outperformed the OG waveform was precise and robust, the waveforms and training/test/challenge sets were run in triplicate using two different electrodes (Fig. 3c–e). We determined that the R1S4W2 waveform increased prediction accuracy for test samples 1–4 by ∼20% compared to the OG waveform. We found that the agnostic behavior towards pH was reproducible for R1S4W2 and not the OG waveform. However, we noticed that the T3 a.c. challenge sample accuracy was not reproducible across electrodes for either waveform. We attribute this to variations in electrode fabrication. Standardizing the fabrication of fast voltammetry electrodes, along with multi-objective optimization with reproducibility as a metric, will help to alleviate this issue. Regardless, the performance of R1S4W2 as an early optimization candidate, showing enhanced test and challenge set accuracy, demonstrates the success and future promise of the SeroOpt workflow.
In all cases, except for the first run of pH and a.c. challenge samples, the average serotonin test/challenge set errors were lower when using the optimized serotonin waveforms (W2, 4, 6, 8 for S2, 3, 4 of R1 and R2), compared to the averages for the randomly generated S1 waveforms of R1 and R2 (Fig. 4). The error minima were lower in all cases for the optimized waveforms; random search never produced a better waveform than Bayesian optimization. Moreover, while each W2 waveform in R1 improved across strings, R2S2W2 immediately found a 5-fold lower minimum than the starting initialization. Thus, new random initialization waveforms lead to the discovery of new optimized waveforms in new local minima.
These results suggest the following. Bayesian optimization produces better waveforms than randomly generated or chemist-designed waveforms. Moreover, Bayesian optimization finds waveforms corresponding to error minima better than random chance. The Bayesian optimization surrogate model (i.e., Gaussian process) effectively models the relationship between voltammetry waveforms and performance, as the minima only occurred for waveforms optimized specifically for serotonin detection metrics (e.g., W2, 4, 6; Table S3†). For example, the average serotonin accuracy was ∼45 nM using the randomly generated waveforms. By optimizing for any serotonin parameter (test set accuracy, a.c. accuracy, pH accuracy, detection limit), serotonin accuracy, on average, improved to 34 nM (24% improvement). While an ostensibly small return on investment, this is only the first iteration of this protocol, and the optimized waveforms consistently outperformed the few standard alternatives to waveform design.
Data for all waveforms and metrics are provided (Tables S2 and S3†). We noticed that for serotonin accuracy (W2), the predicted waveforms between R1 and R2 looked similar, especially for S3 and S4 (Fig. 5, inset). The serotonin accuracy waveforms share characteristics with the OG waveform across R1 and R2. They step from low to high potentials on the oxidative steps and from high to low potentials on the reductive steps. By S4, all waveforms prefer the ‘intermediate’ anodic pulse step concept described in the VET literature, in which a relatively low amplitude E1 step before a higher amplitude E2 step prevents signal saturation and enhances concentration discrimination.44 Further, most waveforms exhibited a large amplitude counter-pulse (e.g., a large difference between E2 and E3 to complete the redox cycle).67 The fact that the model learned these domain knowledge heuristics across the four iterations suggests it can also learn more complex, higher-order interactions.
Waveform optimizations occurred with relatively small changes in E and τ, even for waveforms as simple as the four-step waveforms shown here. Tuning waveforms can result in dramatic differences in the predictive performance of the resulting models. The effects of varying and reorganizing pulse parameters have remained relatively unexplored in a systematic, multivariate manner, as done here. For example, R1S4W6 and R1S3W8 differed by ≤0.04 V and ≤0.9 ms in E and τ (Table S2†). Yet, R1S3W8 outperformed R1S4W6 for serotonin test set, pH, and ion accuracy, with up to nearly a 50% reduction in error (Table S3†).
To test whether these performance increases were due to differences in electrodes across strings (separate electrodes were used across strings to encourage generalizability across electrodes), we compared two similar waveforms tested on the same electrode: R2S1W2 and R2S1W3. These waveforms differed by ≤0.21 V and ≤1.2 ms, yet R2S1W2 outperformed R2S1W3 in all serotonin metrics (Tables S2 and S3†). Thus, small and seemingly “insignificant” changes in step potentials and hold times can produce significant accuracy differences. These findings support the importance of a technique like Bayesian optimization to tune parameters with fine-grained adjustments.
The order of the steps in the rapid pulse also matters. For example, R1S1W1 and R1S4W3 are nearly identical, except for the order of their pulses. Yet, R1S1W1 outperformed R1S4W3 in all serotonin detection metrics, by up to five-fold (Tables S2 and S3†).
The PDPs for the aggregated runs (R1 and R2 combined) and the individual runs are shown for the serotonin test set accuracy metric (Fig. 6a, S5 and S6,† respectively). We focus on the aggregated models because these have more total samples and, thus, are more likely to uncover meaningful relationships. The 2D plots on the diagonal represent the average effect on the metric of varying each individual waveform parameter. Generally, the more a PDP plot for a particular feature varies, the more important that feature is. Conversely, flat lines indicate either unimportant or interacting features.
The aggregated data PDPs (Fig. 6a) confirm a complex and interacting optimization landscape. For example, E3 oscillates, E4 is parabolic, and E1 and τ1 are monotonically decreasing or increasing, respectively. The 3D contour plots below the diagonal represent the average effects on each metric while varying two waveform parameters. Because we minimize error, the purple shading represents the optimal (minima) regions, while the yellow regions represent maxima.
Interpreting the PDPs has some weaknesses. First, PDPs represent averages, meaning heterogeneous interactions can be obfuscated (e.g., an effect on one half of the data may be averaged out by an opposite effect on the other half). Thus, non-varying parameters in PDPs could be misinterpreted. To check for this, we examined individual conditional expectation (ICE) plots, which show the individual contributions that make up the averages in the PDP plots.73 As expected, the 2D PDPs (blue lines, Fig. 6a) match the average ICE curves (blue lines, Fig. 6b). The individual instances (gray lines, Fig. 6b) show that there are heterogeneous effects hidden by the PDP averages for some parameters. For example, τ1, E3, and E4 have traces that do not all follow the same general trends. Thus, the effects of varying these parameters depend on heterogeneous interactions with the other waveform parameters. Meanwhile, the remaining parameters, E1, E2, τ2, τ3, and τ4, follow the same general trends (flat lines suggesting non-interacting waveform parameters).
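As one way to reproduce this kind of diagnostic, the sketch below fits a Gaussian process surrogate to a synthetic waveform-embedding/error dataset and overlays PDP and ICE curves with scikit-learn; the data, kernel, and feature indices are stand-ins, not the optimizer's actual surrogate.

```python
# PDP/ICE interpretation of a Gaussian process surrogate (synthetic stand-in data).
# Kernel, data, and feature names are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(2)
X = rng.uniform(size=(30, 8))                        # [E1..E4, tau1..tau4], scaled to [0, 1]
y = np.sin(3 * X[:, 2]) + X[:, 3] ** 2 + 0.1 * rng.normal(size=30)   # mock serotonin error

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

feature_names = ["E1", "E2", "E3", "E4", "tau1", "tau2", "tau3", "tau4"]
# kind="both" overlays individual ICE traces (gray) on the average PDP curve (blue)
PartialDependenceDisplay.from_estimator(
    gp, X, features=[0, 2, 3], kind="both", feature_names=feature_names)
plt.show()
```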
As an alternative to PDP and ICE plots, we used Shapley additive explanations (SHAP) plots.73 The SHAP values enable interpretations of how features contribute to individual model predictions. The SHAP plots confirmed that the essential features were E3, E4, τ1, and E1. Fig. 6c shows the spread of the SHAP value per feature. Further, the heterogeneous effects, particularly in E3 and E4, are confirmed by the different colors of the feature values that do not cluster on a single side.
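A corresponding SHAP sketch is shown below, again using a synthetic surrogate as a stand-in; the KernelExplainer is model-agnostic and only assumes access to a predict function.

```python
# SHAP summary of surrogate feature contributions (synthetic stand-in data).
import numpy as np
import shap
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(3)
X = rng.uniform(size=(30, 8))                                     # [E1..E4, tau1..tau4]
y = np.sin(3 * X[:, 2]) + X[:, 3] ** 2 + 0.1 * rng.normal(size=30)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

feature_names = ["E1", "E2", "E3", "E4", "tau1", "tau2", "tau3", "tau4"]
explainer = shap.KernelExplainer(gp.predict, X)                   # model-agnostic explainer
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X, feature_names=feature_names)    # per-feature impact spread
```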
Other approaches can be used to design waveforms (e.g., first principles, chemometric screening, design of experiments). However, these approaches suffer from limits on the complexity they can handle computationally, an exponential number of experiments required to optimize individual parameters, resource intensity (labor, time, materials, etc.), and an inability to account for confounding waveform parameter interactions.91 Our attempts to use feature selection to identify critical waveform step potentials and lengths were complicated by the magnitude of the current response and the pulse pattern (Fig. S7†). The difficulty in designing electrochemical waveforms arises partly because each pulse (voltage and step length) influences the state of the interface between the solution and the working electrode. This interface evolves during and between pulses. The effect of an individual pulse depends not only on its characteristics (E and τ) but also on preceding pulses.
We introduced an experimental design framework to embed voltammetry waveforms and their corresponding electroanalytical performance into a Bayesian optimization workflow to overcome these limitations. Rather than optimizing for a particular electrochemical response (e.g., peak oxidative current of a single analyte), the accuracy of the supervised regression models was optimized directly by including model accuracy metrics as the objectives. We explored which model metrics were optimizable by simultaneously performing parallel single-objective optimization loops across eight metrics (Fig. 2). We found that serotonin test set accuracy optimization was sample-efficient, reproducible, and outperformed domain-guided and randomly designed waveforms across multiple metrics (Fig. 3).
We demonstrated that in two separately initialized optimization campaigns, consisting of four strings or ‘rounds’ of optimization, we generated waveforms selective for serotonin in the presence of interferents (Fig. 4). Previous applications of Bayesian optimization in other fields achieved improvements in as few as three or four string-like iterations (i.e., low data regimes). Thus, the behavior we observed was anticipated.76,82,92,93 Notably, our selectivity challenges were more arduous yet efficient than standard waveform validation schemes that test only a single interferent or interferent concentration after a waveform is developed for an analyte of interest.
Future efforts could include longer optimization campaigns. In the present work, our stop criteria were somewhat arbitrary; we empirically noticed improvements in predictive accuracy by string 4, and other studies have found improvements in <5 iterations. Thus, we stopped after four strings to analyze the results. Based on the convergence plots, we identified that waveform accuracy metrics were unlikely to improve once they reached <10 nM error, even if the waveform was found early in the campaign (e.g., within the first ten waveforms; Fig. 4e and S3†). This suggests a possible signal-to-noise limit in the single-digit nanomolar range, consistent with previous voltammetry methods.24 Thus, a campaign should be stopped early if the metric reaches known or reasonable instrument detection limitations. Further, only one metric (dopamine pH robustness, run 2) failed to improve after any iterations (Fig. S3,† bottom). Thus, in our hands, ∼30 waveforms (the total number of waveforms tested across four strings, per run) indicated whether a given metric would improve. A campaign may be halted if a metric fails to improve after 30 waveforms.
Selectivity is a significant barrier to effective waveform design, especially for background-inclusive and multi-analyte waveforms. Most voltammetry approaches achieve selectivity by either training a machine-learning model, modifying a waveform, or changing the electrode material. Rather than independently adopting one of the latter approaches, our data-driven waveform design uses the predictive performance of a machine learning model as feedback to modify waveform parameters – the black box model decides what waveform would generate more accurate PLSR predictions.
In addition to 5-HIAA, DOPAC, and ascorbate, monovalent cation concentrations (i.e., Na+, K+, H+) fluctuate in the brain extracellular space with neural stimulation due to the biophysics of membrane polarization and repolarization, transporter dynamics, and elevated O2 consumption (and CO2/carbonic acid/H+ production) associated with synchronized action potentials.94 Thus, these species represent key interferents to test in the presence of analytes, as electrodes will likely encounter changes in cation concentrations under real-world (in vivo) conditions.
The literature suggests that specific voltage pulses can deconvolute monoamine neurotransmitter responses from cation changes.95–97 Thus, we hypothesized our search space would contain cation and interferent agnostic waveforms. We expected to find waveforms whose voltammograms, modeled in low-dimensional space by PLSR, are selective for features specific only to the analytes of interest (dopamine and serotonin) and not those affected by interferents. Training across such interferents is unnecessary if a waveform-model combination can ignore cation interferent effects (i.e., is cation agnostic). Thus, we implicitly built the search for agnostic waveforms into our Bayesian workflow by introducing the concept of a challenge set.
Challenge set samples illustrated that SeroOpt can implicitly identify interferent-agnostic waveforms, i.e., without requiring explicit training samples (Fig. 3a). While the literature has demonstrated cationic interferent agnostic waveforms,72,95–97 our approach required no manual or additional data processing and instead discovered agnostic waveforms automatically. Combining the information content of an optimized waveform with a powerful machine learning model (PLSR) enabled this agnostic behavior.
Because step potential,44,67 step order,43 and hold time98 or hold potential96 can impact waveform performance, other pulse techniques that layer steps at constant potentials and times could maximize their performance by tuning these parameters in the manner presented here.45 Adding more pulses could deteriorate model performance, as useless steps add noise to the data.45 Thus, careful selection of the number of steps is paramount. We confirmed this by noting performance differences across waveforms with only slight parameter differences. We attribute this behavior to the unique faradaic and non-faradaic processes occurring at sub-ms timescales.72,95,97,99
Optimizing individual pulse step lengths means that the transient redox response from each preceding pulse becomes the starting state for the succeeding pulse, as opposed to letting the current decay to steady state. A non-steady-state approach has been shown to discriminate compounds more efficiently using VETs. Yet, a lack of methods for optimizing individual step lengths has prevented the broad adoption of this practice. Differentiating dopamine from norepinephrine has been accomplished using pulses with differences as small as 0.1 V, though without systematic design patterns.100
Potential mechanisms underlying interferent agnostic waveforms include diffusion layer depletion of the interfering species by the onset pulse (E1/τ1),101 and other differentiating information provided by unique pulse sequences and transient responses of the rapid pulses to the model.95,97,98 More optimization campaigns, interpretability techniques, and numerical simulation of species at electrode surfaces could uncover the phenomena at play.
Regardless, the finding that interferent agnostic waveforms can be identified and optimized, especially when forgoing background subtraction, shows the utility of capacitive currents historically categorized as “nonspecific”. These findings show that analyte-specific information from appropriately designed waveforms occurs in the background current. This information is captured by our model without explicit training, even in the presence of interferents that affect the double layer. Previous reports have shown that pH and Na+/K+ fluxes can cause hundreds to thousands of nM prediction errors in vitro.95,102 For the same fluxes, our waveform-model combinations show only tens of nM error or less, and do not require explicit training, specialized waveform augmentation, or data analysis.
We noticed that across runs and interpretability methods, E1 or τ1 (onset pulse/time), E2 and E3 (pulse/counter pulse67), and E4 (holding potential) were repeatedly ranked as the most critical features for the surrogate models of serotonin test set accuracy. These parameters represent four known heuristics: τ1 and E1 (onset time/intermediate potential; useful for selectivity and diffusion layer depletion),101 counter pulse potential (E3, useful for analyte confirmation),67 and holding potential (E4, useful for analyte accumulation, sensitivity, and reduced serotonin fouling).32 The E3 parameter completes the redox cycle of the analytes, as it is the first cathodic step after a series of anodic steps. While the relationship of E3 with other parameters is complex and affected by their choices, in general, moderate, sequential reductive steps (e.g., E3 ∼ −0.2 V) are optimal. Previous work found that a −0.1 V cathodic limit, as opposed to −0.4 V, was optimal for serotonin detection by limiting analyte polymerization, which resulted in electrode fouling.24 As mentioned for E1, an intermediate voltage of E3 may also act as a more selective step for serotonin reduction amidst its possible interferents, or have beneficial effects on the diffusion layer environment relevant to the subsequent E4 step.
Based on these results, future waveform optimization studies should include training sets of interfering analytes that are as comprehensive as possible, as done here, and should not use one-factor-at-a-time optimization, which is currently the most common approach. The setting of one parameter influences the optimal settings for the remaining parameters (Fig. 6). An interesting area of future exploration would be to determine whether these effects generalize to waveforms with more than four steps, i.e., whether the first cathodic step remains the key step to optimize for 6-, 8-, 10-step, or larger waveforms. Further meta-analyses of these behaviors will provide essential insights into unexpected electrochemical optimization design patterns.
Small amplitude onset pulses have been shown to improve the deconvolution and differentiation of ions such as H+,97 Na+, and K+,95 along with small amplitude onset sweeps for drift and pH.72,103 Again, carefully designed waveform tuning can result in explicitly and implicitly interferent-agnostic waveforms. Other waveform parameters deemed unimportant in this study might appear so because of the imposed constraints limiting the full exploration of parameter space or our relatively small sample size. Further, the interpretability methods are estimates of the surrogate model, which is itself an estimate. Thus, our interpretations should be treated as correlations, not causation.
The SeroOpt paradigm is immediately extendable to more than four steps (eight parameters) to create more complex waveforms. Future research into other optimization metrics, supervised regression and surrogate models/kernels, and additional analytes is underway.104,105 For example, pulses have been shown to differentiate norepinephrine from dopamine.100
We note the extendibility of our waveform embedding approach. This embedding can be used for any waveform type, such as sweeps, where the parameter values represent the slope (scan rate) of each segment, along with parameters for start and stop potentials. Pulse and sweep designs can also be combined.101 Similar approaches could also extend to embedding AC voltammetry parameters (e.g., amplitude, phase).106 Thus, rather than starting from a historic performer and exploring new waveforms one factor at a time, entirely new waveforms can be discovered de novo.
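To make the embedding idea concrete, the sketch below shows one way pulse steps and sweep segments could be encoded as a flat parameter vector for an optimizer; the class and field names are illustrative, not SeroOpt's actual encoding.

```python
# One possible parameter-vector embedding for pulse and sweep waveform segments.
# Field names and units are illustrative, not the exact encoding used in SeroOpt.
from dataclasses import dataclass

@dataclass
class PulseStep:
    potential_V: float        # step potential E
    hold_ms: float            # hold time tau

@dataclass
class SweepSegment:
    start_V: float
    stop_V: float
    scan_rate_V_per_s: float  # slope of the linear sweep

def embed(segments):
    """Flatten a list of pulse/sweep segments into an optimizer-ready vector."""
    vec = []
    for seg in segments:
        if isinstance(seg, PulseStep):
            vec += [seg.potential_V, seg.hold_ms]
        else:
            vec += [seg.start_V, seg.stop_V, seg.scan_rate_V_per_s]
    return vec

x = embed([PulseStep(0.2, 1.0), PulseStep(0.9, 1.5), SweepSegment(0.9, -0.2, 400.0)])
```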
Our approach will accelerate waveform development for new single- and multi-analyte panels in environments that hinder selectivity or other difficult-to-optimize metrics. Further exploration of waveforms with agnostic behavior and for multi-analyte co-detection is underway. Applications of Bayesian optimization or alternative machine-learning guided workflows to electrochemical reaction studies and battery technology development have delivered robotics and other automated instrumentation solutions. An area of future work could be to develop an automated flow cell/waveform optimization pipeline to fully ‘close the loop’.65,107,108 To aid other investigators in this pursuit, we provide data, tutorial code notebooks, and videos at github.com/csmova/SeroOpt (https://github.com/csmova/SeroOpt), as well as our corresponding open-source voltammetry acquisition and analysis software66 at github.com/csmova/SeroWare (https://github.com/csmova/SeroWare) and github.com/csmova/SeroML (https://github.com/csmova/SeroML).
To our knowledge, we report the first application of active learning to electrochemical waveform design. Our study represents one of the largest-scale investigations of neurochemical detection waveforms. Using a data-driven approach, we generated a waveform for serotonin detection that outperformed our expert-designed waveform and randomly generated waveforms across various metrics. We demonstrated the ability to search for interferent-agnostic waveforms using a priori design of ‘challenge’ samples. We attributed the success of SeroOpt to the efficient fine-grained tuning of voltage and temporal waveform parameters by Bayesian optimization, each having complex interaction effects. Lastly, we interpreted our model with three separate techniques to confirm our model was learning a representation of the waveform optimization landscape that aligned with heuristics and domain knowledge.
Electrode tips were cleaned with HPLC-grade isopropanol (Sigma Aldrich #34863) for 10 min. Electrodes were then overoxidized by applying a static 1.4 V potential for 20 min.110 Low-density EDOT:Nafion solution was made by first preparing a 40 mM EDOT (3,4-ethylenedioxythiophene; Sigma Aldrich, St. Louis, MO; 483028) stock; 100 μL of this stock was added to 200 μL of Nafion (Ion Power, Inc., Tyrone, PA; LQ-1105) and diluted with 20 mL of acetonitrile.16 A triangle waveform (1.5 V to −0.8 V to 1.5 V) was applied using a CHI Instruments Electrochemical Analyzer 15× at 100 mV s−1 to generate a PEDOT:Nafion coating on each electrode.
Fig. 7 Workflow for parallel Bayesian optimization of voltammetric waveforms with intrinsic interferent selectivity.
Standard concentrations were selected using a fractional factorial box design (Table 1). This is a chemometric approach that designs a multi-dimensional ‘box’ spanning analytes, their concentrations, and experimental conditions of interest.91,111 We selected a fractional approach to bias towards low analyte concentrations and small relative changes. High accuracy and precision in the nM range are important for monitoring basal and stimulated neurotransmitter levels using a single technique.
The fractional approach avoids a full factorial design, which would require orders of magnitude (and prohibitively) more calibration samples. In contrast, traditional calibration sets are information-poor and can lead to spurious correlations when training a multiplexed method with overlapping signals from analytes and interferents.91 The training and test sets effectively spanned the concentrations and combinations of analytes of interest without correlation (Fig. S8†). Ascorbate was included in all samples (except blanks) for its antioxidant properties. The concentrations of dopamine, serotonin, 5-HIAA, DOPAC, and ascorbate were varied over physiologically relevant ranges throughout so that the model could be trained and tested across all analytes.
Solutions of aCSF were purged with nitrogen for at least ten minutes before sample preparation. All training and test samples were prepared from stocks stored at −80 °C on the day of experiments. All solutions were adjusted to the corresponding pH each day prior to aliquoting. All solutions were kept covered from light and on ice during the experiments.
We define a training set (i.e., calibration set) as known concentration analyte mixtures, i.e., “standards”, used to train a PLSR model. A test set is defined as known concentration analyte mixtures that were not used during training but instead held out and used to measure model performance. Test set samples only include samples with conditions occurring in the training set (i.e., the same buffer conditions). We define “challenge” samples as additional test set samples prepared under conditions not included or varied in the training set, such as varied pH and cationic buffer salt concentrations (Table 1; see Data analysis). We define an injection blank or zero (0) as an injected solution containing only aCSF.
Training, test, and challenge sets were injected (∼1 mL into a 500-μL loop) into a flow cell using a six-port valve (Fig. 7). The valve was switched to the inject position for ∼20 s per injection. The time between injections was ≥200 s, depending on the waveform and time for the current to return to baseline. Samples were injected in a pseudo-randomized but consistent order. Within each string, the waveform calibration curves were completed across consecutive days. All waveforms within a string were acquired with the same electrode. A different electrode was used for each string to ensure the robustness of the waveform optimization. All waveforms were conditioned for ≥10 min in aCSF before acquiring data.
In-house software was developed for RPV as described in a previous publication.40 The software has since been published and named SeroWare, and is described elsewhere.66
The acquisition function (expected improvement) was then minimized using the ‘ask’ interface to generate a vectorized waveform to be experimentally queried. Kernel hyperparameters (i.e., length scale, smoothness) and the acquisition function were optimized automatically by the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm in the software package. The acquisition function returned a vectorized waveform that was then converted to SeroWare format for data acquisition. After experimental results were obtained with the predicted waveform, the metrics of all previous waveforms were aggregated with the newest metrics, and the Bayesian optimizer was updated using the ‘tell’ interface to set new query points via the ‘ask’ interface.
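A minimal sketch of this ask/tell loop using scikit-optimize is given below; the parameter bounds, iteration count, and experimental scoring function are placeholders rather than the settings used in this work.

```python
# Human-in-the-loop Bayesian optimization of a four-step waveform with scikit-optimize.
# Bounds, iteration count, and the scoring function are placeholder assumptions.
import numpy as np
from skopt import Optimizer
from skopt.space import Real

space = [Real(-0.4, 1.1, name=f"E{i}") for i in range(1, 5)] + \
        [Real(0.5, 2.0, name=f"tau{i}") for i in range(1, 5)]

def run_experiment_and_score(E, tau):
    # Placeholder for the wet-lab step: acquire a calibration curve with this waveform,
    # fit PLSR, and return the serotonin test set mean absolute error (nM).
    return float(np.sum((E - 0.3) ** 2) + np.sum((tau - 1.0) ** 2))

opt = Optimizer(space, base_estimator="GP", acq_func="EI", acq_optimizer="lbfgs",
                n_initial_points=6)                    # six random waveforms seed the surrogate

for _ in range(10):                                    # 6 random + 4 optimized queries
    x = opt.ask()
    E, tau = np.round(x[:4], 3), np.round(x[4:], 1)    # 1 mV and 0.1 ms increments
    opt.tell(x, run_experiment_and_score(E, tau))      # update surrogate; EI picks next query
```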
In this work, increments of voltage were rounded to the nearest 0.001 V, and increments of time were rounded to the nearest 0.1 ms. Built-in partial dependence functions in scikit-learn and scikit-optimize were used to interpret the model, along with the SHAP Python package.
We found this process increased the accuracy and precision of the PLSR predictions. It was generalizable to test set samples. We attribute this to a low-dimensional representation of drift learned by the model (Fig. S9†). All concentration predictions were constrained to be ≥0 (i.e., domain knowledge dictates concentrations cannot be negative). Negative concentration predictions were replaced with 0.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5dd00005j
‡ These authors contributed equally to this work.
§ Present address: Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048.
This journal is © The Royal Society of Chemistry 2025