Understanding the ML black box with simple descriptors to predict cluster–adsorbate interaction energy†
Density functional theory (DFT) is currently one of the most accurate yet practical theories for gaining insight into the properties of materials. Despite its success, its computational cost remains the main hurdle. In recent years, there has been a trend of combining DFT with machine learning (ML) to reduce the computational cost without compromising accuracy. Finding the right set of descriptors, ones that are simple to understand and give insight into the problem at hand, lies at the heart of any ML problem. In this work, we demonstrate the use of nearest neighbor (NN) distances as descriptors to predict the interaction energy between a cluster and an adsorbate. The model is trained on clusters ranging in size from 5 to 75 atoms. When training and testing are carried out on mutually exclusive cluster sizes, the mean absolute error (MAE) in predicting the interaction energy is ∼0.24 eV. The MAE reduces to 0.1 eV when the training and testing sets include information from the complete size range. Furthermore, when the same set of descriptors is tested on individual sizes, the MAE reduces further to ∼0.05 eV. We bring out the correlation between the dispersion in the nearest neighbor distances and the variation in MAE for individual sizes. Our detailed and extensive DFT calculations provide a rationale for why nearest neighbor distances work so well. Finally, we demonstrate the transferability of the ML model by applying the same recipe of descriptors to systems of different elements (Na10), bimetallic systems (Al6Ga6, Li4Sn6, and Au40Cu40), and different adsorbates (N2, O2, and CO).
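To make the idea of nearest neighbor distances as descriptors concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how many neighbors are used, how the distances are ordered, or which regression model is trained, so the function names, the choice of `k`, the ascending sort, and the toy coordinates below are all assumptions made purely for illustration. The second helper shows how the mean absolute error quoted above is defined.

```python
import numpy as np


def nn_distance_descriptor(positions, site_index, k=3):
    """Return the k shortest distances from one atom to all other atoms.

    Hypothetical helper: `site_index` marks the atom taken as the
    adsorption site, and `k` is an assumed number of neighbors.
    """
    positions = np.asarray(positions, dtype=float)
    # Distance from the chosen site to every atom in the cluster.
    distances = np.linalg.norm(positions - positions[site_index], axis=1)
    # Drop the zero distance of the site to itself.
    distances = np.delete(distances, site_index)
    if k > distances.size:
        raise ValueError("k exceeds the number of neighboring atoms")
    # Sorting gives a fixed-length vector that does not depend on atom order.
    return np.sort(distances)[:k]


def mean_absolute_error(predicted, reference):
    """Mean of |predicted - reference| over all samples."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


# Toy geometry in arbitrary units (not taken from the paper).
toy_cluster = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
])

descriptor = nn_distance_descriptor(toy_cluster, site_index=0, k=3)
print(descriptor)  # [1. 2. 3.]

# Toy numbers only, to show the metric; these are not results from the paper.
print(mean_absolute_error([1.0, 2.0], [1.5, 1.5]))  # 0.5
```

In a workflow like the one described, one such vector per adsorption site would serve as the input features of a regression model, with the DFT interaction energy as the target; the model itself is left out here because the abstract does not name it.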