Leveraging active site information for deep learning prediction of enzyme-substrate Michaelis constants
Abstract
The Michaelis constant (KM) is a key parameter in enzymology. Its experimental measurement is often low-throughput and costly, whereas machine learning (ML) can learn patterns from existing data and predict KM values at high throughput. In this work, we introduce a novel approach that explicitly incorporates enzyme-substrate interface information by encoding the enzyme’s active site as a feature. Using a simple multilayer perceptron (MLP) with a gated layer, we demonstrate that this explicit active site information enables our model, Active Site for KM (AS4Km), to achieve competitive performance on the independent GMKM test set despite its relatively simple architecture. Ablation studies confirm that active site features significantly enhance generalization to unseen data and distant enzyme sequences. Furthermore, our analysis highlights a critical limitation in current enzymology databases: predictive performance relies heavily on substrate identity because of low substrate diversity and a bias towards active enzyme-substrate complexes. Overall, our results show that combining a data-driven approach with explicit interaction interface features yields competitive KM predictions for enzyme-substrate complexes, and that AS4Km may be able to assist in the identification of novel substrates for known enzymes.
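For readers less familiar with the quantity being predicted, the following is a minimal sketch of the standard Michaelis-Menten rate law, in which KM is the substrate concentration at which the reaction rate reaches half of its maximum. It is not part of AS4Km itself; the function name, parameter names, and numerical values are illustrative assumptions and do not come from this work.

```python
def michaelis_menten_rate(substrate_conc: float, v_max: float, k_m: float) -> float:
    """Initial reaction rate under the Michaelis-Menten rate law.

    v = v_max * [S] / (K_M + [S])

    All names here are hypothetical and chosen for illustration only.
    """
    if substrate_conc < 0 or v_max < 0 or k_m <= 0:
        raise ValueError("concentrations and v_max must be non-negative, K_M positive")
    return v_max * substrate_conc / (k_m + substrate_conc)


# Illustrative values only (not measurements from any dataset).
v_max = 10.0  # maximum rate, arbitrary units
k_m = 2.5     # Michaelis constant, same units as substrate concentration

# When [S] equals K_M, the rate is half of v_max.
rate_at_km = michaelis_menten_rate(k_m, v_max, k_m)

# When [S] is far above K_M, the rate approaches v_max.
rate_saturating = michaelis_menten_rate(1000.0 * k_m, v_max, k_m)
```

A lower KM means the enzyme approaches its maximum rate at lower substrate concentrations, which is why predicted KM values can help rank candidate substrates for a known enzyme.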