Statistically representative databases for density functional theory via data science

Pierpaolo Morgante; Roberto Peverati

doi:10.1039/C9CP03211H

Pierpaolo Morgante^a and Roberto Peverati*^a

Author affiliations

* Corresponding authors

^a Chemistry Program, Florida Institute of Technology, 150 W. University Blvd., Melbourne, Florida, USA
E-mail: rpeverati@fit.edu

Abstract

The amount of data and the number of databases for the assessment and parameterization of density functional theory methods have grown substantially in the past two decades. In this work, we introduce a novel cluster analysis technique for density functional theory calculations of the electronic structure of atoms and molecules, with the goal of creating new statistically significant databases with broad chemical scope and a manageable number of data-points. By analyzing a population of almost 350k data-points without a priori chemical assumptions, we create a new database called ASCDB containing only 200 data-points. This new database holds the same chemical information as the larger population of data from which it is obtained, but at a computational cost that is reduced by several orders of magnitude. The labelling of the significant chemical properties is performed a posteriori on the resulting 16 subsets, classifying them into four areas of chemical importance: non-covalent interactions, thermochemistry, non-local effects, and unbiased calculations. The analysis of the results and their transferability shows that ASCDB is capable of providing the same information as larger collections of data (such as GMTKN55, MGCDB84, and Minnesota 2015B) for several density functional theory methods and basis sets. In light of these results, we suggest the use of this new small database as a first inexpensive tool for the evaluation and parameterization of electronic structure theory methods.
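The general idea of shrinking a large population of data-points to a small representative subset can be illustrated with a toy sketch. The abstract does not describe the clustering procedure used to build ASCDB, so the code below is not the authors' method: it assumes a plain k-means stand-in, a synthetic population in which each data-point is a vector of errors from three hypothetical methods, and a hypothetical helper `pick_representatives` that keeps the member closest to each cluster centroid.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: dist2(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def pick_representatives(points, k, seed=0):
    """Hypothetical helper: index of the member nearest each centroid."""
    centroids, labels = kmeans(points, k, seed=seed)
    reps = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:  # empty clusters contribute no representative
            reps.append(min(members, key=lambda i: dist2(points[i], centroids[c])))
    return reps


# Synthetic population (assumption): 600 data-points, each a vector of
# errors from three hypothetical methods, drawn from four error profiles.
rng = random.Random(42)
profiles = [(0.5, 1.0, 2.0), (3.0, 0.5, 1.0), (1.0, 4.0, 0.5), (2.0, 2.0, 2.0)]
population = [
    [rng.gauss(mean, 0.3) for mean in profile]
    for profile in profiles
    for _ in range(150)
]

reps = pick_representatives(population, k=8)
subset = [population[i] for i in reps]
print(f"kept {len(subset)} of {len(population)} data-points")
```

Clustering on the error vectors alone, with no chemical labels attached, loosely mirrors the "without a priori chemical assumptions" idea in the abstract; any chemical interpretation of the clusters would have to be assigned afterwards.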

Article information

DOI: https://doi.org/10.1039/C9CP03211H
Article type: Paper
Submitted: 06 Jun 2019
Accepted: 15 Aug 2019
First published: 15 Aug 2019

Phys. Chem. Chem. Phys., 2019,21, 19092-19103

P. Morgante and R. Peverati, Phys. Chem. Chem. Phys., 2019, 21, 19092 DOI: 10.1039/C9CP03211H

Physical Chemistry Chemical Physics
