Issue 12, 2025

Leveraging large language models for enzymatic reaction prediction and characterization

Abstract

Predicting enzymatic reactions is crucial for applications in biocatalysis, metabolic engineering, and drug discovery, yet it remains a complex and resource-intensive task. Large Language Models (LLMs) have recently demonstrated remarkable success across scientific domains through their ability to generalize knowledge, reason over complex structures, and leverage in-context learning strategies. In this study, we systematically evaluate the capability of LLMs, particularly the Llama-3.1 family (8B and 70B), on three core biochemical tasks: Enzyme Commission (EC) number prediction, forward synthesis, and retrosynthesis. We compare single-task and multitask learning strategies, employing parameter-efficient fine-tuning via LoRA adapters. Additionally, we assess performance across different data regimes to explore adaptability in low-data settings. Our results demonstrate that fine-tuned LLMs capture biochemical knowledge, with multitask learning enhancing forward- and retrosynthesis predictions by leveraging shared enzymatic information. We also identify key limitations, such as challenges with the hierarchical EC classification scheme, highlighting areas for further improvement in LLM-driven biochemical modeling.
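To illustrate the parameter-efficient fine-tuning idea the abstract refers to: LoRA freezes a pretrained weight matrix W and learns only a low-rank update ΔW = (α/r)·B·A, so that far fewer parameters are trained than in full fine-tuning. The sketch below shows the core linear-algebra mechanism in NumPy; the matrix dimensions, the scaling α/r, and the zero initialization of B follow the standard LoRA formulation, but the concrete sizes are illustrative assumptions, not values from this paper.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass through a linear layer with a LoRA adapter.

    W       : frozen pretrained weight, shape (d_out, d_in)
    A       : trainable down-projection, shape (r, d_in), small Gaussian init
    B       : trainable up-projection, shape (d_out, r), zero init
    alpha/r : scaling factor applied to the low-rank update
    """
    # Rank-r update; in real implementations B @ A is applied factorized,
    # never materialized as a dense matrix.
    delta_W = (alpha / r) * (B @ A)
    return x @ (W + delta_W).T

# Illustrative dimensions (not from the paper).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8
W = rng.normal(size=(d_out, d_in))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))      # trainable
B = np.zeros((d_out, r))                        # zero init: adapter starts as a no-op
x = rng.normal(size=(2, d_in))

# With B initialized to zero, the adapted layer reproduces the frozen layer exactly,
# so training starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)
```

In practice one would attach such adapters to the attention projections of a Llama-3.1 model via a library such as Hugging Face PEFT, training only A and B while the base weights stay frozen.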

Graphical abstract: Leveraging large language models for enzymatic reaction prediction and characterization

Article information

Article type
Paper
Submitted
08 May 2025
Accepted
21 Sep 2025
First published
30 Oct 2025
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, 4, 3588-3609

L. Di Fruscia and J. M. Weber, Digital Discovery, 2025, 4, 3588 DOI: 10.1039/D5DD00187K

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
