Composition-property extrapolation for compositionally complex solid solutions based on word embeddings

Abstract

Mastering the challenge of predicting the properties of unknown materials with multiple principal elements (high entropy alloys/compositionally complex solid solutions) is crucial for accelerating materials discovery. We present and discuss three models that use experimentally measured electrocatalytic performance data from two ternary systems (Ag-Pd-Ru and Ag-Pd-Pt) to predict electrocatalytic performance in the shared quaternary system (Ag-Pd-Pt-Ru). As a starting point, we apply Gaussian Process Regression (GPR) with composition as the feature, which includes both Ag and Pd, achieving an initial correlation coefficient (r) of 0.63 and a coefficient of determination (r²) of 0.08 for the prediction. Second, we present a version of the GPR model that uses materials vectors derived from word embeddings as features. Using these materials-specific embedding vectors significantly improves the predictions, as evident from an improved r² of 0.65. The third model is based on a 'standard vector method', which synthesizes weighted vector representations of material properties as features and then creates a reference vector; this reference vector correlates very well with the performance of the quaternary system (r = 0.94). Our approach demonstrates that existing experimental data, combined with the latent knowledge contained in word-embedding-derived representations of materials, can be used effectively for materials discovery, where data are typically scarce.
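The idea behind the third model can be illustrated with a minimal sketch. It rests on assumptions that are ours, not taken from the paper: a material's vector is the atomic-fraction-weighted sum of per-element word-embedding vectors, the reference vector is the performance-weighted mean of the normalized training-material vectors, and candidate compositions are scored by cosine similarity to that reference. The element vectors, compositions and performance values are hypothetical placeholders; in practice the element vectors would come from a word-embedding model trained on materials-science text. The `pearson_r` helper shows how such scores could be compared with measured performance.

```python
import math

# Hypothetical 3-dimensional element embeddings (placeholders only). Real
# word-embedding vectors have hundreds of dimensions and come from a model
# trained on materials-science text.
ELEMENT_VECTORS = {
    "Ag": [0.9, 0.1, 0.0],
    "Pd": [0.2, 0.8, 0.1],
    "Pt": [0.1, 0.7, 0.6],
    "Ru": [0.0, 0.3, 0.9],
}
DIM = len(next(iter(ELEMENT_VECTORS.values())))


def material_vector(composition):
    """Atomic-fraction-weighted sum of element vectors (assumed scheme)."""
    vec = [0.0] * DIM
    for element, fraction in composition.items():
        for i, value in enumerate(ELEMENT_VECTORS[element]):
            vec[i] += fraction * value
    return vec


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def reference_vector(compositions, performances):
    """Performance-weighted mean of normalized material vectors (assumed)."""
    ref = [0.0] * DIM
    total = sum(performances)
    for comp, perf in zip(compositions, performances):
        for i, v in enumerate(normalize(material_vector(comp))):
            ref[i] += perf * v / total
    return ref


def score(composition, ref):
    """Similarity of a candidate composition to the reference vector."""
    return cosine(material_vector(composition), ref)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Placeholder ternary training data (NOT measured values).
    train = [{"Ag": 0.2, "Pd": 0.5, "Ru": 0.3},
             {"Ag": 0.4, "Pd": 0.3, "Pt": 0.3}]
    perf = [1.0, 2.0]
    ref = reference_vector(train, perf)
    # Hypothetical quaternary candidates, ranked by similarity to the reference.
    candidates = [{"Ag": 0.25, "Pd": 0.25, "Pt": 0.25, "Ru": 0.25},
                  {"Ag": 0.1, "Pd": 0.4, "Pt": 0.3, "Ru": 0.2}]
    for comp in candidates:
        print(comp, round(score(comp, ref), 3))
```

Because the scores are cosine similarities rather than predicted performance values, they mainly rank candidates; comparing them with measurements through a correlation coefficient such as r, which is insensitive to linear rescaling, avoids having to map them onto the performance scale.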



Article information

Article type: Paper
Submitted: 23 Apr 2025
Accepted: 19 May 2025
First published: 19 May 2025
This article is Open Access under a Creative Commons BY license.

Digital Discovery, 2025, Accepted Manuscript


L. Zhang, L. Banko, W. Schuhmann, A. Ludwig and M. Stricker, Digital Discovery, 2025, Accepted Manuscript, DOI: 10.1039/D5DD00169B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

