Coherent collections of rules describing exceptional materials identified with a multi-objective optimization of subgroups
Abstract
Useful materials are often statistically exceptional, and they can be overlooked by artificial intelligence (AI) models that attempt to describe all materials simultaneously. Such global models perform well for the majority of materials but do not necessarily capture the useful ones. Subgroup discovery (SGD) identifies descriptions of subsets of materials, called subgroups (SGs), that are associated with exceptional values of a chosen property. SGD can therefore capture exceptional materials better than widely used global AI techniques. Previous studies focused on the single SG that maximizes an objective function establishing a tradeoff between the SG size and the exceptionality of the distribution of property values within the SG. However, this optimization does not yield a unique solution: many SGs typically have similar objective-function values. Here, we identify a “Pareto region” of SGD solutions presenting a multitude of size-exceptionality tradeoffs. The approach is demonstrated by learning descriptions of perovskites with a high bulk modulus.
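To make the size-exceptionality tradeoff concrete, the following minimal sketch selects the non-dominated candidates from a list of subgroups scored on two objectives. It is an illustration only, not the procedure used in this work: the rule names, the numerical scores, and the helper `pareto_front` are hypothetical, and both objectives are assumed to be normalized so that larger values are better.

```python
from typing import List, Tuple

# Each candidate is (rule label, relative size, exceptionality); all values are made up.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one is at least as good in both
    objectives (size and exceptionality) and strictly better in at least one.
    """
    front = []
    for name, size, exc in candidates:
        dominated = any(
            (s2 >= size and e2 >= exc) and (s2 > size or e2 > exc)
            for _, s2, e2 in candidates
        )
        if not dominated:
            front.append((name, size, exc))
    # Sort by increasing size so the tradeoff reads from small to large subgroups.
    return sorted(front, key=lambda c: c[1])


# Hypothetical candidate subgroups for illustration.
candidates = [
    ("rule_a", 0.60, 0.10),  # large subgroup, barely exceptional
    ("rule_b", 0.30, 0.45),
    ("rule_c", 0.10, 0.80),  # small subgroup, highly exceptional
    ("rule_d", 0.25, 0.40),  # dominated by rule_b
    ("rule_e", 0.05, 0.70),  # dominated by rule_c
]

front = pareto_front(candidates)
for name, size, exc in front:
    print(f"{name}: size={size:.2f}, exceptionality={exc:.2f}")
```

In this toy setting, a single scalar objective would return one rule, whereas the non-dominated set keeps several rules that each represent a different compromise between subgroup size and exceptionality.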