Machine-guided representation for accurate graph-based molecular machine learning†
Abstract
Graph-based machine learning has received significant attention in chemistry-related fields because the atoms and chemical bonds of a molecule can be represented as a mathematical graph. However, many molecular properties are sensitive to changes in the molecular structure. As a result, molecular properties follow a mixed distribution in molecular space, which makes molecular machine learning difficult. Despite this, the problem has not been investigated in either chemistry or computer science. To tackle it, we propose a robust, machine-guided molecular representation based on deep metric learning (DML), which automatically generates an optimal representation for a given dataset. To this end, we first adopt DML for molecular machine learning by integrating it with graph neural networks (GNNs) and devising a new objective function for representation learning. In experimental evaluations, machine learning algorithms using the proposed method achieved better prediction accuracy than state-of-the-art GNNs. The proposed method was also effective on extremely small datasets, a notable result because many real-world applications suffer from a lack of training data.
- This article is part of the themed collection: Emerging AI Approaches in Physical Chemistry