Classification of spatially resolved molecular fingerprints for machine learning applications and development of a codebase for their implementation

Mardochee Reveil; Paulette Clancy

doi:10.1039/C8ME00003D

Mardochee Reveil*^a and Paulette Clancy

Author affiliations

* Corresponding authors

^a Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA
E-mail: mr937@cornell.edu

Abstract

A direct mapping between material structures and properties, across various classes of materials, is often the ultimate goal of materials researchers. Recent progress in machine learning has opened a path to developing such mappings from empirical data. This opportunity has created a need for advanced structural representations suitable for use with current machine learning algorithms. A number of such representations, termed “molecular fingerprints” or descriptors, have been proposed over the years for this purpose. In this paper, we introduce a classification framework to better explain and interpret existing fingerprinting schemes in the literature, with a focus on those with spatial resolution. We then present the implementation of SEING, a new codebase for computing those fingerprints, and we demonstrate its capabilities by building k-nearest neighbor (k-NN) models for force prediction that achieve a generalization accuracy of 0.1 meV Å⁻¹ and an R² score as high as 0.99 at testing. Our results indicate that simple and generally overlooked k-NN models could be very promising compared to approaches such as neural networks, Gaussian processes, and support vector machines, which are more commonly used for machine learning-based predictions in computational materials science.
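The workflow summarized in the abstract (compute a spatially resolved fingerprint, then predict a force with a k-NN model and score it with R²) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: it does not use SEING or its API, the function names (radial_fingerprint, knn_predict, r2_score, toy_force) are hypothetical, the fingerprint is a generic Gaussian radial descriptor with a cosine cutoff, and the data are a synthetic two-atom system with a Lennard-Jones-like force in reduced units rather than the systems studied in the paper.

```python
import math

def radial_fingerprint(center, neighbors, r_centers, eta=4.0, r_cut=4.0):
    """Toy radial fingerprint (an assumption, not SEING's definition):
    Gaussian-smeared neighbor distances sampled at each value in r_centers."""
    features = []
    for r_s in r_centers:
        total = 0.0
        for pos in neighbors:
            r = math.dist(center, pos)
            if r < r_cut:
                # Smooth cosine cutoff so contributions vanish at r_cut.
                f_cut = 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)
                total += math.exp(-eta * (r - r_s) ** 2) * f_cut
        features.append(total)
    return features

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training fingerprints closest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def toy_force(d):
    """Synthetic target: Lennard-Jones-like pair force in reduced units."""
    return 24.0 * (2.0 / d ** 13 - 1.0 / d ** 7)

R_CENTERS = [1.0, 1.5, 2.0, 2.5]  # hypothetical Gaussian centers

def fingerprint_for_distance(d):
    # One atom at the origin with a single neighbor at distance d along x.
    return radial_fingerprint((0.0, 0.0, 0.0), [(d, 0.0, 0.0)], R_CENTERS)

# Training set: a fine grid of separations; test set: offset separations.
train_d = [1.0 + 0.01 * i for i in range(151)]
test_d = [1.005 + 0.05 * j for j in range(30)]

train_x = [fingerprint_for_distance(d) for d in train_d]
train_y = [toy_force(d) for d in train_d]
test_x = [fingerprint_for_distance(d) for d in test_d]
test_y = [toy_force(d) for d in test_d]

predictions = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
score = r2_score(test_y, predictions)
print(f"R^2 on the synthetic test set: {score:.4f}")
```

The sketch predicts a scalar force magnitude only; predicting force vectors, as in the paper, would additionally require a fingerprint that carries directional information, which is not attempted here.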

This article is part of the themed collection: Machine Learning and Data Science in Materials Design

Article information

https://doi.org/10.1039/C8ME00003D

Article type

Paper

Submitted

09 Jan 2018

Accepted

19 Feb 2018

First published

20 Feb 2018

Citation

Mol. Syst. Des. Eng., 2018, 3, 431-441

Molecular Systems Design & Engineering
