Data-driven machine learning model for the prediction of oxygen vacancy formation energy of metal oxide materials
Abstract
Metal oxides are widely used in chemistry, physics, and materials science. The oxygen vacancy formation energy is a key parameter for describing the chemical, mechanical, and thermodynamic properties of metal oxides. Obtaining this energy quickly and accurately remains a challenge for both experimental and theoretical researchers. Herein, we propose a machine learning model for predicting the oxygen vacancy formation energy, built on data-driven analysis and a set of simple descriptors. Starting from a database of oxygen vacancy formation energies for 1750 metal oxides with sufficient structural diversity, we define new descriptors that avoid the shortcomings of molecular fingerprints, molecular graphic descriptors, and site descriptors. The descriptors have clear physical meanings and are widely applicable. Multiple linear regression analysis is then used to screen for the features most important to model development, and two strongly associated features are identified. The selected descriptors serve as input for training 21 machine learning models, from which the most accurate is selected and developed further. Systematic error analysis shows that least squares support vector regression performs best in predicting the oxygen vacancy formation energy, and its prediction accuracy is further verified on an external dataset. Our work establishes a novel and simple computational approach for accurately predicting the oxygen vacancy formation energy of metal oxides and highlights the utility of data-driven analysis for metal oxide materials research.
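For readers unfamiliar with least squares support vector regression, the following is a minimal, dependency-free sketch of its standard formulation, in which training reduces to solving a single linear system for a bias and one coefficient per training sample. It is not the implementation used in this work: the Gaussian kernel, the values of GAMMA and SIGMA, the function names, and the synthetic two-feature data are all assumptions chosen purely for illustration.

```python
import math

# Hypothetical hyperparameters, chosen only for this illustration.
GAMMA = 100.0  # regularization strength (larger = closer fit to the data)
SIGMA = 1.0    # width of the Gaussian kernel


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ls_svr(X, y, gamma, sigma):
    """Fit LS-SVR by solving the linear system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return the bias b and the list of coefficients alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf_kernel(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]


def predict_ls_svr(X_train, bias, alphas, x_new, sigma):
    """Predict f(x) = b + sum_i alpha_i * k(x, x_i)."""
    return bias + sum(
        a * rbf_kernel(x_new, xi, sigma) for a, xi in zip(alphas, X_train)
    )


# Synthetic stand-in data: two descriptors per sample and a made-up smooth
# target. These are NOT real oxygen vacancy formation energies.
X_train = [[0.1 * i, 0.2 * j] for i in range(4) for j in range(4)]
y_train = [1.0 + 0.5 * x1 - 0.3 * x2 ** 2 for x1, x2 in X_train]

bias, alphas = fit_ls_svr(X_train, y_train, GAMMA, SIGMA)
prediction = predict_ls_svr(X_train, bias, alphas, [0.15, 0.3], SIGMA)
print(f"predicted target at [0.15, 0.3]: {prediction:.4f}")
```

Unlike standard support vector regression, which requires solving a quadratic program, this formulation needs only a linear solve. In practice a numerical library routine would replace the hand-written elimination, and the hyperparameters would be tuned by cross-validation.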