Decoding substrate specificity determining factors in glycosyltransferase-B enzymes – insights from machine learning models

Abstract

Substrate specificity is an essential characteristic of any enzyme's function, and understanding the factors that determine it is crucial for enzyme engineering. Unlike an enzyme's structure, which is directly impacted by its sequence, substrate specificity relates to sequence only indirectly, because it also depends on structural aspects that dictate substrate accessibility and active site dynamics. In this study, we explore the performance of classifier-based machine learning models trained on curated sequence and structural data for a class of glycosyltransferases (GTs), namely GT-Bs, to understand the factors that determine their substrate specificity. GTs enable the transfer of sugar moieties to other biomolecules such as oligosaccharides or proteins and are found in all kingdoms of life. In plants, GTs participate in the biosynthesis of plant cell wall biopolymers (e.g., hemicelluloses and pectins) and are an integral part of the enzymatic machinery that enables the storage of carbon and energy as plant biomass. To elucidate the substrate specificity of uncharacterized GT-Bs, we constructed multi-label machine learning models (Support Vector Classifier, K-Nearest Neighbors, Gaussian Naïve Bayes, Random Forest) that incorporate both sequence and structural features. These models achieve good predictive accuracies on test datasets. However, despite our use of structural information, we highlight that there is further scope for improvement in training these models to draw interpretable relationships between sequence, structure and substrate specificity determining motifs in GT-Bs.
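To make the idea of a multi-label classifier concrete, the following is a minimal sketch of multi-label k-nearest-neighbor prediction in plain Python. It is not the authors' pipeline: the feature vectors, the label names (`donor_A` to `donor_D`), the majority-vote rule, and the helper names `predict_labels` and `hamming_loss` are all assumptions made for illustration only.

```python
from collections import Counter
from math import dist

# Toy training set (hypothetical): each enzyme is a made-up numeric feature
# vector paired with the set of substrate labels it accepts. Real models
# would use curated sequence and structural descriptors instead.
ALL_LABELS = {"donor_A", "donor_B", "donor_C", "donor_D"}
TRAIN = [
    ((0.90, 0.10, 0.20), {"donor_A"}),
    ((0.80, 0.20, 0.10), {"donor_A", "donor_B"}),
    ((0.85, 0.15, 0.25), {"donor_A"}),
    ((0.10, 0.90, 0.80), {"donor_C"}),
    ((0.20, 0.80, 0.90), {"donor_C", "donor_D"}),
    ((0.15, 0.85, 0.70), {"donor_C"}),
]


def predict_labels(query, train=TRAIN, k=3):
    """Return every label carried by a strict majority of the k nearest neighbors."""
    # Rank training enzymes by Euclidean distance to the query vector.
    neighbors = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    # Count how many of the k neighbors carry each label.
    votes = Counter(label for _, labels in neighbors for label in labels)
    return {label for label, count in votes.items() if count > k / 2}


def hamming_loss(true_sets, pred_sets, all_labels=ALL_LABELS):
    """Fraction of (sample, label) pairs where prediction and truth disagree."""
    # The symmetric difference counts labels that are missed or wrongly added.
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(all_labels))


if __name__ == "__main__":
    print(predict_labels((0.88, 0.12, 0.20)))
    print(hamming_loss([{"donor_A", "donor_B"}], [{"donor_A"}]))
```

Because an enzyme can accept more than one substrate, the prediction is a set of labels rather than a single class, and a per-label metric such as Hamming loss scores partially correct predictions more informatively than exact-match accuracy.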

Article information

Article type: Paper
Submitted: 21 Oct 2024
Accepted: 02 Jul 2025
First published: 04 Jul 2025
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, Advance Article

S. G. Hennen, Y. J. Bomble, B. R. Urbanowicz and V. S. Bharadwaj, Digital Discovery, 2025, Advance Article, DOI: 10.1039/D4DD00338A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
