Open Access Article
Xiao Huang
a,
William F. DeGrado
b,
Michael J. Therien
a and
David N. Beratan
*acd
aDepartment of Chemistry, Duke University, Durham, NC 27708, USA. E-mail: david.beratan@duke.edu
bDepartment of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143, USA
cDepartment of Biochemistry, Duke University, Durham, NC 27710, USA
dDepartment of Physics, Duke University, Durham, NC 27708, USA
First published on 22nd April 2026
Electron bifurcation reactions separate electron pairs into high and low potential pools, and these reactions are central to the bioenergetics of living systems. Here, we used kinetic analysis and machine learning to analyze a diverse set of structural and electrochemical landscapes that may guide the design of molecular architectures that could serve as experimental targets that would function to bifurcate holes using light. We find that strong electrostatic repulsion between the holes enhances the quantum yield for bifurcation but reduces the energy efficiency of the process. We find that the quantum yield for hole bifurcation is enhanced by positioning the hot-hole pathway cofactor farther from the hole bifurcation site than its cold-hole pathway counterpart. This integrated design and optimization approach provides design strategies for de novo structures that could realize light-drive hole bifurcation, advancing the aim of employing bioinspired electron bifurcation for energy conversion, photocatalysis, and electrocatalysis. Beyond the specific light-driven hole-bifurcation architecture, our combined kinetic–statistical–machine-learning approach is transferable to other multi-particle, multi-site charge-transport network design challenges, opening paths for designing photochemical and catalytic networks, as well as for designing functional redox networks.
Realizing electron or hole bifurcation in a de novo protein framework5–8 represents an unrealized challenge, although some basic design principles are emerging.9,10 EB delivers electrons into redox pools at different potentials and, as such, is an open-system reaction. A challenge faced by the EB networks is to avoid electron leakage (or short circuiting) between the low and high potential pathways. Short circuiting avoidance makes the design and synthesis of electron bifurcating networks very challenging.3,4,9,11 In our earlier analysis of light-driven hole bifurcation (HB),10 we described architectures (with specific electrochemical gradients and inter-cofactor distances) that could support light-driven HB. Our earlier photoinduced HB design relied on having two nearby tryptophan residues that are oxidized sequentially by photoactivated (ruthenium) chromophores bound to the de novo protein's surface. Rapid sequential Trp oxidation would produce electrostatic repulsion between two nearby radical cation states, energizing the system. As each hole migrates along a spatially separated pathway, the first and second holes would depart the bifurcation site at two different potentials. The first hole to depart is expected to be more strongly oxidizing, or “hot” due to hole–hole repulsion, while the second hole to leave will be less oxidizing, or “cold”. These holes are directed onto spatially separated transport pathways that lead to terminal sites where they could carry out further redox chemistry. The design is shown schematically in Fig. 1 and 2.
![]() | ||
| Fig. 1 Schematic representation of a light-driven de novo hole bifurcating system. Two flashes, in rapid succession, oxidize nearby Trp residues using photo-generated Ru(III) species attached to the protein (via a flash-quench scheme).14 The photogenerated holes hop on “hot” and “cold” hole transport pathways. The holes arrive at the termini of the pathways. The redox cycle is completed when the initial electron quencher re-reduces these terminal species.14 This figure is reproduced from ref. 10, available under a CC-BY 4.0 license, and is copyrighted by X. Huang, P. Zhang, J. L. Yuly, W. F. DeGrado, M. J. Therien, and D. N. Beratan. | ||
![]() | ||
| Fig. 2 An illustrative molecular model of a designed de novo hole bifurcating protein. Two ruthenium flash-quench chromophores (FQC) are covalently bound on the protein surface nearby two Trp (W1 and W2) residues. Following W1 and W2 photooxidation, hole transfer proceeds along two spatially separated pathways: the hot-hole path (W1 → WH1 → WH2 → PMn(CF3)4, shown in the red-arrow pathway) and the cold-hole path (W2 → WL1 → WL2 → Tyr˙-HisH+, in the blue-arrow pathway). The two hole transfer routes terminate on acceptors tuned to different reduction potentials. This design demonstrates how light can initiate bifurcation in a synthetic, protein-based construct. This figure is reproduced from ref. 10, available under a CC-BY 4.0 license, and is copyrighted by X. Huang, P. Zhang, J. L. Yuly, W. F. DeGrado, M. J. Therien, and D. N. Beratan. | ||
Earlier kinetic analysis found that the distances and electrochemical potentials of the cofactors determine the quantum yields for delivering the holes to the chain termini. Even small changes to hopping site distances or electrochemical potentials can dramatically alter the network's bifurcation efficiency. Earlier studies of open system EB in the steady state focused on networks with uniform spacings between the nearest neighbor hopping sites, three (infinite) charge reservoirs, and equal magnitude slopes for the high- and low-electrochemical potential chains.9,12,13 The aim of the present study is to discover molecular designs that could produce high HB quantum yields (while maintaining acceptable energy efficiency) on closed HB networks without enforcing the simplifying constraints used in the earlier models for open system EB. This optimization focuses on the molecular architectures illustrated in Fig. 1 and 2. That is, we aimed to optimize light driven HB in de novo proteins where one high potential and one low potential electron hopping pathway direct charge from the two-hole bifurcation locus.
We combined kinetic network analysis, robust statistical analyses of large hole bifurcation designs, and machine learning models to study hole bifurcation target systems. Using Bayesian optimization of HB network quantum yield and random (unbiased) sampling of the design space, we learned how structural and energetic design parameters influence HB network quantum yields and energy efficiencies. This approach to bifurcation network design produced new insights with respect to earlier EB designs that were focused on near-reversible, open-system biological structures that operate in the steady state in contact with electron reservoirs. Our optimization of closed-system, light-driven hole bifurcation finds that high quantum yield HB can be realized with acceptable energy efficiency across a surprisingly wide range of energy and distance landscapes.
While our immediate focus is on two-hole bifurcation networks in de novo proteins, the challenge that we address, namely how to design functional kinetic networks assisted by machine-learning, is of broader interest. The design strategy that we describe can be mapped to other multi-site, multi-particle charge-transport or bifurcation networks in proteins, molecular materials, or supramolecular assemblies.
The rate constants (kij) for non-adiabatic hole transfer (HT) between pairs of redox active cofactors i and j were calculated using non-adiabatic electron transfer theory, accounting for one high-frequency (quantum) vibrational mode, shown in eqn (1):15,16
![]() | (1) |
For a given hole transfer reaction, 〈Vij2〉 is the thermally averaged electronic coupling; ΔG(0)ij is the free energy change; λij is the outer-sphere reorganization energy; D is the Huang–Rhys factor, equal to λin/(ħω), where λin is the reorganization energy of the high frequency mode; and ħω is the energy spacing for the (harmonic) high-frequency mode (see Section S1 for the estimated values for the parameters in eqn (1) and (2)).
The mean-squared electronic coupling, 〈Vij2〉, is typically approximated using an exponential decay model reflecting its sensitivity to the inter-cofactor distance Rij,17 shown in eqn (2)
| Vij = V0e−βRij | (2) |
Here, Rij is the edge-to-edge distance between the donor and acceptor heavy atoms.
The occupancy status of each cofactor within the network for any given redox microstate is represented by a vector S. For a system with N cofactors, [C1, C2, …, CN], S is formulated as eqn (3):
| S = [n1, n2, …, nN] | (3) |
The time evolving hole population on the network was simulated using a kinetic master equation based on the system microstates. The probability P(Si,t) of being in microstate Si (defined by the hole occupancy of each cofactor) at time t evolves according to eqn (4):
![]() | (4) |
| P(t) = eKtP(0) | (5) |
![]() | (6) |
The diagonal elements of the rate matrix, kii, are the negative sums of all rates from microstate Si to all other microstates:
![]() | (7) |
The quantum yield (QY) for HB is defined as the probability for the system to reach the final state where a hole resides on each of the two terminal acceptors for a long time (t = 1 s). This is calculated by summing the probabilities of all microstates S where both terminal cofactors are occupied:
![]() | (8) |
We define the energy efficiency metric, η, as in eqn (9):
![]() | (9) |
E(0)PMn in eqn (5) is the electrochemical potential of the hot-hole acceptor,
is the potential of the second tryptophan (W2) at the bifurcation site (this potential is relevant after the first hole leaves), and Erepulsion is the electrostatic repulsion energy between the two holes at their site formation (W1˙+ and W2˙+). In our earlier study, Erepulsion has been shown to be as much as 1.6 eV according to the DFT calculations, and tunable as a function of edge-to-edge distance between W1˙+ and W2˙+.10
Table 1 shows the range of electron transfer rate parameters (reorganization energies, electrochemical potentials, distances) that are selected to span biologically and synthetically plausible ranges. The parameter ranges in Table 1 were selected to span the structural and energetic space accessible to de novo proteins. The redox potential ranges for tryptophan and the terminal acceptors were derived from literature reports and previous experimental measurements in various protein environments. Our workflow involved: (1) defining the plausible physiological range for each parameter, (2) performing large-scale sampling of these ranges, and (3) calculating the resulting kinetic performance for each set.
We used two complementary strategies to explore the parameter ranges for HB described in Table 1. The first 100
000 parameter sets were generated through uniform random sampling across the parameter ranges indicated in Table 1. This random dataset provides unbiased sampling of the parameter space. Then, in order to improve the exposure of high quantum yield design in the dataset, we performed data augmentation by generating 30
000 more parameter sets, which have high quantum yields (>0.9) identified via Bayesian optimization (BOp), using quantum yield for successful HB as the objective function (see Section S2 for BOp details). We generated a total of 130
000 distinct configurations. These datasets allow us to explore crucial factors for the design of high-efficiency HB networks.
The 130
000 candidate networks represent theoretical “redox landscapes” defined by the parameters in Table 1. Each data point in the training set is a unique combination of distances and potentials, rather than a specific atomistic molecular structure. This approach allows us to discover general design principles that can subsequently be mapped onto specific amino acid sequences and residue positions.
000 enumerated bifurcating networks. These networks were generated as described in Section 2.2. These networks aim to provide sufficient context to establish an understanding of how to control the quantum yield and energy efficiency of HB networks. Correlations between each network's structural and energetic parameters and its performance metrics (quantum yield and energy efficiency) were quantified using Pearson's correlation coefficients and visualized as a heatmap—a standard approach for preliminary feature screening in multivariate studies.24 This linear regression framework enabled us to determine which parameters exert statistically significant effects on quantum yield and energy efficiency, to establish whether each effect is positive or negative (i.e., its sign), and to quantify the strength (effect size) of these relationships under the assumption of linear dependence (as a first-order approximation to capture the primary trends, despite the underlying nonlinearity) between inputs and outputs (see Section S3 for full regression details).
As we described in Section 2.2, we generated 130
000 HB network design candidates. We divided them with an 8
:
2 ratio for the training and testing sets. The training set input information is the set of network edge-to-edge distances, reduction potentials, reorganization energies, electrostatic repulsion energies, and the derived parameter ΔR (ΔR = RH − RL, to show the numerical relationships between RH and RL).
| Predicted high yield | Predicted low yield | |
|---|---|---|
| True high yield (>0.9) | True positive (TP) | False negative (FN) |
| True low yield (≤0.9) | False positive (FP) | True negative (TN) |
With these outcomes defined, we assessed classifier performance using four standard metrics: accuracy, precision, recall, and the F1-score.26 Accuracy measures the overall fraction of correctly classified parameter sets, shown in eqn (10):
![]() | (10) |
Precision quantifies the reliability of positive predictions, shown in eqn (11):
![]() | (11) |
![]() | (12) |
![]() | (13) |
Each of these metrics captures a distinct but complementary aspect of model performance. Accuracy provides a global measure of correct classifications, precision ensures that predicted high-yield networks are trustworthy, recall guards against missing genuinely promising designs, and the F1-score balances these competing goals. Considering all four metrics together is particularly important for HB network analysis, since the parameter space is large and highly nonlinear. In this context, false positives (low-yield networks incorrectly predicted as high yield) may waste experimental resources, whereas false negatives (high-yield networks overlooked by the model) risk discarding potentially valuable designs. By jointly evaluating all four metrics, we ensure that the classifier not only identifies high-yield candidates with confidence but also avoids systematically excluding viable configurations that could advance HB experimental realization.
Since this is a theoretical design study aiming to establish fundamental principles, the “true” labels for the ML model are based on the MJL kinetic model. To account for potential discrepancies with future experimental results, we focused on identifying “sweet-spot” ranges rather than single optimal points. These broad ranges suggest that high HB performance is robust to the typical fluctuations and errors inherent in biological redox parameters.
We also studied how lowered geometric symmetry between the hot- and cold-hole pathways influences quantum yield (quantified by correlations between quantum yield and the distance difference ΔR = RH − RL, where RH and RL are the edge-to-edge distances from the bifurcation site to the first hot- and cold-pathway cofactors). The value of the parameter ΔR has a robust positive correlation with the quantum yield (r = 0.46), suggesting that a network with a hot-pathway cofactor more distant from the bifurcation site than the cold-pathway cofactor produces higher quantum efficiencies for HB reaction. Taken together, these linear correlations provide two important design insights: (1) sufficiently large electrostatic repulsion is needed to drive high yield hole bifurcation (though this comes with a trade-off, as high repulsion simultaneously lowers the energy efficiency), and (2) maintaining a larger ΔR stabilizes the preferential routing of the hot hole onto the hot-pathway (rather than short-circuiting onto the cold-pathway), thereby enhancing the overall hole bifurcation yield.
000 HB network structures in the training set (e.g., inter-cofactor distances, reduction potentials, electrostatic repulsion, reorganization energy). The classifier achieved 96.2% accuracy (along with similar high score on both precision and recall) on a held-out test set (i.e., the subset of data not used during model training, reserved for independent performance evaluation) (see Table 3 for result details), demonstrating that this model is adequate to predict HB network quantum yields. We then applied SHAP analysis to this model in order to quantify the individual and combined influences of each feature on the predicted quantum yield.
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| High yield (>0.9) | 0.91 | 0.91 | 0.91 | 5520 |
| Low yield (≤0.9) | 0.98 | 0.97 | 0.98 | 20 480 |
| Accuracy | 0.96 | 26 000 |
||
| Macro Avg. | 0.94 | 0.94 | 0.94 | 26 000 |
| Weighted Avg. | 0.96 | 0.96 | 0.96 | 26 000 |
SHAP analysis assesses the contribution of each input parameter (i.e., inter-cofactor distances, reduction potentials, electrostatic repulsion, reorganization energy) to the XGBoost classifier's prediction, which distinguishes between networks yielding high (>0.9) versus low (≤0.9) quantum yield (Fig. 4). The summary plot in Fig. 4a shows the SHAP value for each parameter across all 130
000 generated HB network design candidate data samples. On the horizontal axis, a negative SHAP value indicates that a given parameter value shifts the model's output toward predicting a high-yield configuration, while a positive SHAP value favors a low-yield prediction. The color of each point encodes the parameter's magnitude (red = high, blue = low). For example, higher values of Erepulsion, RH, and ΔR consistently produce negative SHAP values, confirming that strong hole–hole repulsion and a larger RH–RL gap are key drivers of high quantum yield. Interestingly, larger total pathway distances R also yield negative SHAP values in this classifier—trained specifically to distinguish configurations with quantum yields above 0.9—suggesting that within the subset of already high-performing designs, shorter individual step distances can be associated with slight variations in yield that are compensated by other optimized parameters.
We ranked the network parameters by their overall influence on quantum yield, quantified as the mean absolute SHAP value across all samples, which provides a global measure of feature importance in predicting high quantum yield (Fig. 4b). In line with the Pearson correlation results, R, Erepulsion, and RH emerge as the most critical determinants of high quantum yield. This ranking informs experimental design. For example, de novo protein scaffolds can be engineered systematically to fine-tune these three parameters, focusing first on modulating R and RH by residue positioning. Changing the amino acid composition or geometry near the bifurcation site can be used to tune Erepulsion, either by changing the local dielectric environment, introducing charged or polar residues, or altering the proximity or orientation of the bifurcating tryptophans.
When we just examine cases that produced quantum yields >0.9, we find broad “sweet-spot” ranges for these high quantum yields (i.e., yields in the 20th–80th percentile of the candidates within the high-yield subset): R between 5.4 and 8.8 Å, Erepulsion between 1.25 and 1.71 eV, and RH between 11.2 and 14.5 Å. These parameter windows, which are remarkably wide, suggest that high-performing HB networks are robust to modest changes in structure and energetics, easing the practical design of the targeted de novo proteins.
While the identified “sweet-spot” for Erepulsion (1.25–1.71 eV) is high, such values are physically realizable within the specialized environments of de novo protein cores. The highest value of the sweet-spot has been verified in our previous work.10 By engineering the scaffold to maintain a low local dielectric constant (ε ≈ 2–4) and positioning the bifurcating tryptophans within a hydrophobic pocket shielded from the aqueous interface, the electrostatic repulsion between photogenerated radical cations can be significantly enhanced. Furthermore, our SHAP analysis indicates that the high-yield classification is robust to modest fluctuations in reorganization energy, suggesting that these designs can remain functional despite the dynamic nature of the protein matrix.
To implement these findings, designers can tune Erepulsion by modifying the local dielectric environment—for example, by replacing surface-exposed residues with hydrophobic ones to shield the bifurcation site. Distances like RH and R can be precisely controlled by selecting specific i, i + 7 or i, i + 11 positions on an α-helix to position the cofactors. This provides a concrete roadmap for converting the identified ‘sweet spots’ into experimental protein sequences.
![]() | ||
Fig. 5 Simulated quantum yield dependence on key HB network parameters. Both plots utilize representative constant values for several parameters, based on optimized or typical values from our previous work:10 RL = 5 Å, inter-cofactor path distance = 5 Å, λ = 1.0 eV, , E(0)PMn = 1.6 V (and other intermediate potentials as specified in the simulation code). For plot (a), repulsion is fixed at 1.6 eV; for plot (b), RH is fixed at 10 Å. (a) Quantum yield initially increases as RH increases from 5 Å. This improves the condition RH > RL, kinetically favoring the initial transfer of the higher-energy (hot) hole towards the hot pathway (WH1) over the cold pathway (WL1), thereby promoting successful bifurcation. The yield peaks within an optimal range (9–14 Å), consistent with the “sweet spot” identified by SHAP analysis. As RH increases further (>14 Å), the yield drops significantly because the rate of the first crucial hole transfer step decreases exponentially with distance due to the decay of the electronic coupling term 〈Vij2〉 in the MJL rate expression (eqn (1)), becoming too slow to compete effectively with potential recombination or undesired pathways within the simulation time. (b) Quantum yield shows a threshold dependence on the electrostatic repulsion energy. Below 0.8 eV, the yield is negligible, indicating insufficient driving force for efficient bifurcation. The yield rises sharply between 0.8 eV and 1.3 eV as repulsion provides the critical energy to drive the hot hole transfer, plateauing near unity for repulsion >1.3 eV. Concurrently, the energy efficiency (orange curve), calculated via eqn (9), decreases monotonically with increasing repulsion. Plotting both metrics visually demonstrates the crucial design trade-off: maximizing yield requires high repulsion (>1.3 eV), while maximizing energy efficiency favors low repulsion. The optimal range for both high quantum yield and acceptable energy efficiency, therefore, lies in a compromise region, consistent with the SHAP analysis sweet spot (1.25–1.71 eV), where yield is high but the energetic cost of repulsion is not excessive. (a) Quantum yield dependence on RH. Besides RH, which is varied from 5 Å to 20 Å, all other parameters are held constant at the representative values listed in the main caption. (b) Quantum yield and energy efficiency dependence on repulsion energy. Besides the repulsion energy, which is varied from 0.5 eV to 1.6 eV, all other parameters are held constant at the representative values listed in the main caption. | ||
The quantum yield shows a distinctive optimal range with respect to the distance RH (see Fig. 5a). Increasing RH (while keeping RL short) enhances the yield by directing the first (hot) hole to its high-potential pathway, consistent with the positive correlation found for quantum yield versus ΔR = RH − RL. However, beyond an optimal distance (14 Å in this simulation), the HB yield drops sharply. This decline is attributed to the exponential decay of the hole transfer rate with distance, making the second step hole transfer from W2 to WL1 too slow for productive HB. The peak performance found in this simulation aligns well with the optimal range of RH values identified in the broader SHAP analysis.
The critical role of electrostatic repulsion at the bifurcating site is indicated in Fig. 5b. The quantum yield (blue curve) shows a clear threshold behavior as a function of Erepulsion, rising rapidly when the repulsion between the holes is in the ∼0.8–1.1 eV window. This finding indicates that a sufficient repulsion strength is needed to provide the thermodynamic driving force required to initiate effective HB. Once hole–hole repulsion is sufficiently large (1.3 eV),10 the HB quantum yield approaches unity. The energy efficiency (orange curve) defined by eqn (9), however, shows the opposite trend. That is, the energy efficiency decreases monotonically as the hole–hole repulsion increases (that is, much of the hole–hole repulsion energy is dissipated). Fig. 5b shows key trade-offs in designing HB systems: maximizing quantum yield necessitates a high driving force to deliver the first hole, but growing the driving force for HB comes at the cost of overall energy efficiency. The repulsion energy optimal range for quantum yield identified by SHAP (1.25–1.71 eV) represents a balance among achieving a high quantum yield and an acceptable energy efficiency; near unit quantum yield can only be realized with an energy efficiency of 30–40%. If we require the design to realize very high energy efficiency, the quantum yield for EB would be very small.
The design examples in Fig. 5 underscore the non-linear relationships involved in designing high-performance HB networks, reinforcing the value of comprehensive statistical analysis and machine learning in network design.
The strategy that we described here (large-scale network enumeration → kinetic simulation → correlation/statistical screening → interpretable ML feature-analysis) constitutes a general design framework for engineered charge-transport networks. While our study focused on light-driven hole bifurcation in de novo proteins, the approach can be extended to design other complex multi-site networks. Networks of this kind might also be used for complex materials assembly or chemical catalysis, enabling the design of functional far-from-equilibrium systems.
| This journal is © the Owner Societies 2026 |