Composition-property extrapolation for compositionally complex solid solutions based on word embeddings
Abstract
Predicting the properties of unknown materials with multiple principal elements (high entropy alloys/compositionally complex solid solutions) is crucial for accelerating materials discovery. We present and discuss three models that use experimentally measured electrocatalytic performance data from two ternary systems (Ag-Pd-Ru; Ag-Pd-Pt) to predict electrocatalytic performance in the shared quaternary system (Ag-Pd-Pt-Ru). As a starting point, we apply Gaussian Process Regression (GPR) with composition (including both Ag and Pd) as the feature, achieving an initial correlation coefficient for the prediction (r) of 0.63 and a determination coefficient (r²) of 0.08. Second, we present a version of the GPR model that uses word embedding-derived materials vectors as features. Using materials-specific embedding vectors significantly improves the predictions, as is evident from an improved r² of 0.65. The third model is based on a 'standard vector method', which synthesizes weighted vector representations of material properties as features and then creates a reference vector, yielding a very good correlation with the quaternary system's material performance (r of 0.94). Our approach demonstrates that existing experimental data, combined with the latent knowledge of word embedding-derived representations of materials, can be used effectively for materials discovery, where data is typically scarce.
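The baseline figures above pair a moderate correlation coefficient (0.63) with a much lower determination coefficient (0.08), which would not be possible if the latter were simply the square of the former. The sketch below illustrates how the two metrics can diverge, assuming the determination coefficient is defined as 1 - SS_res/SS_tot (a common convention, not confirmed by the text). The function names and the synthetic data are purely illustrative and are not taken from the study.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def determination_coefficient(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Synthetic (not experimental) data: the predictions follow the trend of the
# measurements perfectly but with the wrong slope and offset.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = 0.5 * y_true + 2.0

r = pearson_r(y_true, y_pred)
r2 = determination_coefficient(y_true, y_pred)
print(f"r = {r:.3f}, determination coefficient = {r2:.3f}")
```

In this toy case the predictions are perfectly correlated with the measurements, yet the determination coefficient is only 0.625 because it also penalizes errors in scale and offset. Under the assumed definition, a model can therefore capture a trend (moderate r) while still predicting absolute values poorly (low determination coefficient).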