Self-learning entropic population annealing for interpretable materials design
Abstract
In automatic materials design, samples obtained from black-box optimization offer scientists an attractive opportunity to gain new knowledge. Statistical analyses of these samples are often conducted, e.g., to discover key descriptors. Because most black-box optimization algorithms are biased samplers, post hoc analyses of their samples may lead to misleading conclusions. To address this problem, we propose a new method called self-learning entropic population annealing (SLEPA), which combines entropic sampling with a surrogate machine learning model. Each SLEPA sample carries a weight, so that the samples can be used to correctly estimate the joint distribution of the target property and a descriptor of interest. In a short peptide design task, we compared SLEPA with pure black-box optimization in estimating the residue distributions at multiple thresholds of the target property. Black-box optimization gave better estimates in the tail of the target property, whereas SLEPA was more accurate over a wide range of thresholds. Our results show how statistical consistency can be reconciled with efficient optimization in materials discovery.
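To make the role of the weights concrete, here is a minimal sketch of how weighted samples from an entropic (flat-histogram) sampler can recover a conditional descriptor distribution that naive, unweighted counting gets wrong. It is not the SLEPA algorithm: it omits population annealing and the surrogate model and assumes an idealized sampler whose density of states g(y) is known exactly. The toy space (length-10 binary sequences), the property (number of 1s), the descriptor (whether the first position is 1), and all function names are hypothetical stand-ins chosen so the exact answer can be computed in closed form.

```python
import math
import random

L = 10  # toy sequence length (hypothetical stand-in for a peptide)


def property_value(x):
    # Toy "target property": number of 1s in the sequence (hypothetical).
    return sum(x)


def descriptor(x):
    # Toy "descriptor": whether the first position is 1 (hypothetical).
    return x[0]


def flat_histogram_sample(rng):
    """Draw a property level uniformly, then a uniform sequence at that level.

    This mimics an idealized entropic (flat-histogram) sampler: every
    property value is visited equally often, so rare property values are
    oversampled relative to uniform sampling over all 2**L sequences.
    """
    y = rng.randrange(L + 1)
    ones = set(rng.sample(range(L), y))
    return [1 if i in ones else 0 for i in range(L)]


def estimate(samples, threshold, weighted):
    """Estimate P(descriptor = 1 | property >= threshold) from samples."""
    num = den = 0.0
    for x in samples:
        y = property_value(x)
        if y < threshold:
            continue
        # Importance weight: density of states g(y) = C(L, y), i.e. the ratio
        # of the uniform target distribution to the flat-histogram proposal
        # (up to a constant that cancels in the ratio num / den).
        w = math.comb(L, y) if weighted else 1.0
        num += w * descriptor(x)
        den += w
    return num / den


def exact(threshold):
    """Exact value under uniform sampling: P(x[0] = 1 | y) = y / L."""
    num = sum(math.comb(L, y) * y / L for y in range(threshold, L + 1))
    den = sum(math.comb(L, y) for y in range(threshold, L + 1))
    return num / den


rng = random.Random(0)
samples = [flat_histogram_sample(rng) for _ in range(200_000)]
t = 5
print(f"exact    {exact(t):.3f}")
print(f"weighted {estimate(samples, t, weighted=True):.3f}")
print(f"naive    {estimate(samples, t, weighted=False):.3f}")
```

For threshold 5 the exact value is 3820/6380 (about 0.599). The weighted estimate should land close to it, while the unweighted estimate drifts toward 0.75 because the sampler visits every property level equally often. This is the kind of bias in post hoc analyses that the weights are meant to remove.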