DORA-XGB: an improved enzymatic reaction feasibility classifier trained using a novel synthetic data approach

Abstract

Retrobiosynthesis tools harness the inherent promiscuities of enzymes for the de novo design of novel biosynthetic pathways to key small molecules. Many existing pathway search algorithms rely on exhaustively enumerating the space of all possible enzymatic reactions using generalized rules, followed by an extensive analysis of the ensuing reaction network to extract candidate pathways for experimental validation. While this approach is comprehensive, the permissiveness of such reaction rules means that many false-positive reactions are generated. Here, we have developed DORA-XGB, an enzymatic reaction feasibility classifier. DORA-XGB can be used within our DORAnet framework to assess whether newly enumerated enzymatic reactions and pathways would be feasible. To curate a training dataset for our model, we extracted enzymatic reactions from public databases and screened them for their general thermodynamic feasibility. We then considered alternate reaction centers on known substrates to strategically generate infeasible reactions with high confidence, thereby circumventing the lack of negative data in the literature. In training our model, we also experimented with various molecular fingerprinting techniques and configurations for assembling reaction fingerprints, taking into account not just primary substrate and primary product structures, but cofactor structures as well. Our model's utility is demonstrated through favorable benchmarking against a previously published classifier, the successful recovery of newly published reactions, and the ranking of previously predicted pathways for the biosynthesis of propionic acid from pyruvate.
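The idea of assembling a reaction fingerprint from substrate, product and cofactor structures can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 64-bit block size and the block ordering are assumptions made for illustration, and a toy substring hash stands in for a real molecular fingerprint such as those a cheminformatics toolkit would provide. The example reaction, the decarboxylation of pyruvate to acetaldehyde and CO2, is likewise chosen only for illustration.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of length 1..max_len into a fixed-size bit vector.

    Toy stand-in (assumption) for a real structural fingerprint.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def reaction_fingerprint(substrate, product,
                         lhs_cofactors=(), rhs_cofactors=(), n_bits=64):
    """Concatenate four blocks: substrate | LHS cofactors | product | RHS cofactors.

    Keeping each role in its own block preserves reaction direction.
    This ordering is one hypothetical configuration among several possible ones.
    """
    def combined(smiles_list):
        # Bitwise OR of the fingerprints of all species on one side.
        block = [0] * n_bits
        for smi in smiles_list:
            block = [a | b for a, b in zip(block, toy_fingerprint(smi, n_bits))]
        return block

    return (toy_fingerprint(substrate, n_bits) + combined(lhs_cofactors)
            + toy_fingerprint(product, n_bits) + combined(rhs_cofactors))


# Pyruvate -> acetaldehyde + CO2 (illustrative example only)
fp = reaction_fingerprint("CC(=O)C(=O)O", "CC=O", rhs_cofactors=("O=C=O",))
print(len(fp))  # 256
```

A vector of this kind, paired with a feasible or infeasible label, is the sort of input a feasibility classifier could be trained on. Because the left-hand and right-hand species occupy separate blocks, the forward and reverse reactions map to different vectors.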

Article information

Article type: Paper
Submitted: 12 Jul 2024
Accepted: 31 Oct 2024
First published: 02 Nov 2024
This article is Open Access
Creative Commons BY license

Mol. Syst. Des. Eng., 2024, Advance Article

Y. Chainani, Z. Ni, K. M. Shebek, L. J. Broadbelt and K. E. J. Tyo, Mol. Syst. Des. Eng., 2024, Advance Article, DOI: 10.1039/D4ME00118D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
