Issue 35, 2019

Statistically representative databases for density functional theory via data science

Abstract

The amount of data and the number of databases for the assessment and parameterization of density functional theory methods have grown substantially in the past two decades. In this work, we introduce a novel cluster analysis technique for density functional theory calculations of the electronic structure of atoms and molecules, with the goal of creating new statistically significant databases with broad chemical scope and a manageable number of data points. By analyzing a population of almost 350k data points without a priori chemical assumptions, we create a new database, called ASCDB, containing only 200 data points. This new database holds the same chemical information as the larger population of data from which it is obtained, at a computational cost that is several orders of magnitude lower. The significant chemical properties are labelled a posteriori on the resulting 16 subsets, which are classified into four areas of chemical importance: non-covalent interactions, thermochemistry, non-local effects, and unbiased calculations. The analysis of the results and of their transferability shows that ASCDB can provide the same information as larger collections of data, such as GMTKN55, MGCDB84, and Minnesota 2015B, for several density functional theory methods and basis sets. In light of these results, we suggest the use of this new small database as a first inexpensive tool for the evaluation and parameterization of electronic structure theory methods.
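
The abstract describes selecting a small, representative subset from a large pool of data points by cluster analysis, without a priori chemical labels. The details of the authors' procedure are not given in this summary, so the following is only a generic sketch of the idea under stated assumptions: each data point is (hypothetically) described by a numeric feature vector, for instance the errors of several methods on that point; plain k-means (Lloyd's algorithm, here with deterministic farthest-first seeding) groups the points; and the member nearest each cluster mean is kept as that cluster's representative, weighted by the cluster size. The function names and the toy data are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def farthest_first(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels


def representatives(points: List[Vector], k: int) -> List[Tuple[int, int]]:
    """For each cluster, return (index of the member closest to the cluster
    mean, cluster size); the size can act as a weight for that member."""
    labels = kmeans(points, k)
    reps = []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        centre = mean([points[i] for i in idx])
        best = min(idx, key=lambda i: dist2(points[i], centre))
        reps.append((best, len(idx)))
    return reps


if __name__ == "__main__":
    # Hypothetical toy data: 9 "data points", each with 2 features
    # (e.g. errors of two methods), forming 3 well-separated groups.
    pts = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0),
           (10.0, 9.0), (10.0, 10.0), (10.0, 11.0),
           (19.0, 0.0), (20.0, 0.0), (21.0, 0.0)]
    reps = representatives(pts, k=3)
    print(reps)  # [(1, 3), (7, 3), (4, 3)]
    # Compare a size-weighted mean over the representatives with the full-set mean.
    total = sum(w for _, w in reps)
    sub = [sum(pts[i][d] * w for i, w in reps) / total for d in range(2)]
    print(mean(pts), sub)
```

In this symmetric toy example the size-weighted mean of the three representatives equals the mean of all nine points; for real data such agreement is only approximate and depends on the chosen features and number of clusters. Farthest-first seeding is used instead of random seeding only to make the sketch deterministic.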

Article information

Article type: Paper
Submitted: 06 Jun 2019
Accepted: 15 Aug 2019
First published: 15 Aug 2019

Phys. Chem. Chem. Phys., 2019, 21, 19092-19103

P. Morgante and R. Peverati, Phys. Chem. Chem. Phys., 2019, 21, 19092 DOI: 10.1039/C9CP03211H