Machine learning in computational chemistry: interplay between (non)linearity, basis sets, and dimensionality
Abstract
Machine learning (ML) based methods and tools have firmly established themselves in physical chemistry, in particular in theoretical and computational chemistry and in materials chemistry. The generality of popular ML techniques such as neural networks or kernel methods (Gaussian process regression, kernel ridge regression, and their flavors) has permitted their application to diverse problems, from the prediction of properties of functional materials (catalysts, solid-state ionic conductors, etc.) from descriptors, to the construction of interatomic potentials (where ML is now routinely used in applications) and electron density functionals. These ML techniques are assumed to possess the superior expressive power of nonlinear methods and are often used "as is", with concepts such as "non-parametric" or "deep learning" invoked without a clear justification of their need or advantage over simpler and more robust alternatives. In this Perspective, we highlight some interrelations between popular ML techniques and traditional linear regressions and basis expansions, and demonstrate that in certain regimes (such as very high dimensionality) these approximations can collapse. We also discuss ways to recover the expressive power of a nonlinear approach, to aid the selection of hyperparameters with the help of high-dimensional model representation, and to obtain elements of insight while preserving the generality of the method.
- This article is part of the themed collections: 2023 PCCP Reviews and Insightful Machine Learning for Physical Chemistry
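The interrelation between kernel methods and linear regression in a basis expansion that the abstract alludes to can be sketched numerically. The following toy example (our own illustration, not taken from the article; the Gaussian kernel, data, and regularization strength are arbitrary choices) shows that the standard kernel ridge regression (KRR) dual solution coincides with ordinary linear regression in the basis of kernel functions phi_i(x) = k(x, x_i) centered on the training points:

```python
import numpy as np

# Toy data: a smooth 1D target sampled at a few training points.
X = np.linspace(-1.0, 1.0, 6)[:, None]   # training inputs
y = np.sin(3.0 * X[:, 0])                # training targets

def rbf_kernel(A, B, length=0.15):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * length ** 2))

lam = 1e-3                               # ridge regularization strength
K = rbf_kernel(X, X)                     # Gram matrix on the training set

# Route 1: the usual KRR dual solution, f(x) = sum_i alpha_i k(x, x_i).
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Route 2: linear regression in the basis phi_i(x) = k(x, x_i), with the
# RKHS-norm penalty lam * w^T K w. The design matrix is Phi = K here, so
# the regularized normal equations read (K^T K + lam K) w = K^T y.
w = np.linalg.solve(K @ K + lam * K, K @ y)

# Both routes give the same predictor, evaluated here at test points.
X_test = np.linspace(-0.9, 0.9, 5)[:, None]
Phi_test = rbf_kernel(X_test, X)         # kernel basis at the test points
pred_dual = Phi_test @ alpha             # KRR prediction
pred_basis = Phi_test @ w                # linear-model-in-basis prediction

print(np.max(np.abs(pred_dual - pred_basis)))  # agreement to numerical precision
```

The two solves are algebraically identical whenever K is invertible, which makes concrete the point that the kernel model is a linear expansion in data-centered basis functions; the nonlinearity resides entirely in the basis.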