Accelerating the prediction of remanent polarization in multicomponent ferroelectrics by using variational autoencoder-based data augmentation†
Abstract
As potential next-generation power systems, ferroelectric capacitors have been thus widely studied, and artificial intelligence (AI) is becoming an efficient tool for searching new systems. As a key parameter that directly affects the energy storage density (Wrec) of capacitors, obtaining low remanent polarization (Pr) is important. To enhance the processing of high-dimensional and nonlinear data and to predict key parameters, this study employs a strategy that integrates data augmentation with feature selection. Based on the atomic structure, electronic configuration, and crystal structure of (K1−x−y−zNaxBayCaz)(Nb1−u−v−wZruTiv)O3, we selected 46 initial features. Subsequently, using a conditional variational autoencoder (CVAE), we synthesized 20 000 new data points from 234 original samples to expand the dataset and verify the credibility of the generated data. Finally, through a machine learning strategy, multiple algorithm models were established for training and prediction Pr; the determination coefficient (R2) of the XGBoost (XGB) model was 0.94 for training and predicting Pr, and through a series of feature selection processes, ultimately four kinds of key descriptors that affect Pr were identified: Matyonov–Batsanov electronegativity, Shannon ionic radius, tolerance factor, and core electron distance (Schubert) of A-site elements. The model accurately predicted the properties of two ceramic systems, including samples with elements beyond the original input space, and the model still showed strong predictive ability. This study not only offers valuable insights for enriching sparse datasets in materials science via data augmentation but also demonstrates an effective strategy for accelerating the prediction of remnant polarization in complex ferroelectric systems.