Prediction of protein pKa with representation learning

Hatice Gokcan; Olexandr Isayev

doi:10.1039/D1SC05610G

Author affiliations

Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA (both authors)
Corresponding author: Olexandr Isayev (E-mail: olexandr@olexandrisayev.com)

Abstract

The behavior of proteins is closely related to the protonation states of their residues. Therefore, prediction and measurement of pK_a are essential for understanding the basic functions of proteins. In this work, we develop a new empirical scheme for protein pK_a prediction that is based on deep representation learning. It combines machine learning with atomic environment vectors (AEVs) and the learned quantum mechanical representation from the ANI-2x neural network potential (J. Chem. Theory Comput. 2020, 16, 4192). The scheme requires only the coordinates of a protein as input and estimates the pK_a separately for each of the five titratable amino acid types. The accuracy of the approach was analyzed with both cross-validation and an external test set of proteins, and the results were compared with those of the widely used empirical approach PROPKA. The new empirical model achieves mean absolute errors (MAEs) below 0.5 pK_a units for all amino acid types. It surpasses the accuracy of PROPKA and performs significantly better than the null model. The model is also sensitive to local conformational changes and molecular interactions.
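The abstract states that the scheme needs only atomic coordinates, which are turned into atomic environment vectors. To illustrate how coordinates alone can yield a fixed-length descriptor, here is a minimal pure-Python sketch of the radial part of an ANI-style AEV (Gaussian radial terms multiplied by a cosine cutoff, summed per neighbor element). This is an assumption-laden illustration, not the authors' featurization: the helper names `radial_aev` and `cosine_cutoff` are hypothetical, the element list, shifts, `eta` and `r_cut` are illustrative rather than the ANI-2x hyperparameters, and the angular terms and learned ANI-2x representation are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Smooth cutoff: 0.5*cos(pi*r/r_cut) + 0.5 for r < r_cut, else 0."""
    if r >= r_cut:
        return 0.0
    return 0.5 * math.cos(math.pi * r / r_cut) + 0.5


def radial_aev(
    center: int,
    species: Sequence[str],
    coords: Sequence[Coord],
    elements: Sequence[str] = ("H", "C", "N", "O", "S"),
    shifts: Sequence[float] = (0.9, 1.6, 2.3, 3.0, 3.7, 4.4),
    eta: float = 16.0,
    r_cut: float = 5.2,
) -> List[float]:
    """Radial block of an ANI-style AEV for the atom at index `center`.

    For each neighbor element Z and radial shift R_s:
        G[Z][R_s] = sum_j exp(-eta * (R_ij - R_s)**2) * f_c(R_ij)
    where j runs over atoms of element Z other than `center`.
    Default hyperparameters are illustrative, not the ANI-2x values.
    """
    blocks: Dict[str, List[float]] = {z: [0.0] * len(shifts) for z in elements}
    for j, xyz in enumerate(coords):
        # Skip the central atom itself and elements outside the fixed list.
        if j == center or species[j] not in blocks:
            continue
        r = math.dist(coords[center], xyz)
        fc = cosine_cutoff(r, r_cut)
        if fc == 0.0:
            continue  # neighbors beyond the cutoff contribute nothing
        for k, r_s in enumerate(shifts):
            blocks[species[j]][k] += math.exp(-eta * (r - r_s) ** 2) * fc
    # Concatenate per-element blocks in a fixed order: a fixed-length vector
    # regardless of how many neighbors the atom has.
    return [g for z in elements for g in blocks[z]]


# Toy carboxylate-like fragment (hypothetical coordinates in angstroms).
species = ["C", "O", "O", "C"]
coords = [(0.0, 0.0, 0.0), (1.25, 0.0, 0.0), (-0.62, 1.08, 0.0), (-0.76, -1.30, 0.0)]
vec = radial_aev(1, species, coords)
print(len(vec))  # 5 elements x 6 shifts = 30
```

Because the output length depends only on the element list and the number of shifts, descriptors for atoms with different numbers of neighbors can be fed to the same regressor; atoms outside the cutoff leave the vector unchanged, which is what makes the description local.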

This article is part of the themed collections: Celebrating five years of ChemRxiv and Most popular 2022 organic chemistry articles


Article information

DOI: https://doi.org/10.1039/D1SC05610G
Article type: Edge Article
Submitted: 12 Oct 2021
Accepted: 29 Jan 2022
First published: 01 Feb 2022
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry


Chem. Sci., 2022, 13, 2462-2474


H. Gokcan and O. Isayev, Chem. Sci., 2022, 13, 2462, DOI: 10.1039/D1SC05610G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

Chemical Science