A review of mathematical representations of biomolecular data

Duc Duy Nguyen; Zixuan Cang; Guo-Wei Wei

doi:10.1039/C9CP06554G

Duc Duy Nguyen,^a Zixuan Cang^a and Guo-Wei Wei*^abc

Author affiliations

* Corresponding authors

^a Department of Mathematics, Michigan State University, MI 48824, USA
E-mail: weig@msu.edu

^b Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA

^c Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA

Abstract

Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including the Critical Assessment of Structure Prediction (CASP) and the Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithms, efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify biomolecular structural complexity and reduce ML dimensionality have emerged as prime winners in the D3R Grand Challenges. This review is devoted to recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches: algebraic topology, differential geometry, and graph theory. We elucidate how physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis in this review on protein–ligand binding predictions, although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the prediction of solubility, solvation free energies, toxicity, partition coefficients, and protein folding stability changes upon mutation.

This article is part of the themed collections: Emerging AI Approaches in Physical Chemistry, PCCP Perspectives and 2020 PCCP HOT Articles

Article information

https://doi.org/10.1039/C9CP06554G

Article type: Perspective

Submitted: 03 Dec 2019

Accepted: 17 Jan 2020

First published: 22 Jan 2020

Phys. Chem. Chem. Phys., 2020, 22, 4343-4367

