Open Access Article
Victor Posligua
*a,
Karina Landivar
a,
Elena R. Remesal
a,
Gerda Rogl
b,
Peter F. Rogl
b,
Javier Fdez Sanz
a,
Jesús Prado-Gonjal
c,
Antonio M. Márquez
a and
Jose J. Plata
*a
aDepartamento de Química Física, Facultad de Química, Universidad de Sevilla, Seville, E-41012, Spain. E-mail: jplata@us.es
bInstitute of Materials Chemistry, University of Vienna, Waehringerstraße 42, Wien, A-1090, Austria
cDepartamento de Química Inorgánica, Universidad Complutense de Madrid, Madrid, Spain
First published on 20th April 2026
The integration of artificial intelligence and machine learning is rapidly transforming the landscape of materials discovery, facilitating unprecedented acceleration in the exploration of vast chemical spaces and the prediction of material properties. However, the adoption of these advanced techniques has not been uniform across all subfields of materials science; thermoelectrics, for instance, has experienced a relatively slower penetration. This lag can be attributed to several inherent challenges, including the physical complexity of thermoelectric phenomena, the scarcity and reliability issues of available data, and limitations concerning the applicability of general models. To address these challenges, in this work, we have developed a machine learning model, based on neural networks, specifically for the accurate prediction of the thermoelectric figure of merit (zT) in skutterudites. This model demonstrates high accuracy, with prediction errors approaching the range of experimental uncertainties reported for zT measurements. Furthermore, it offers the crucial capability of extracting design rules grounded in the underlying physics and chemistry of these materials, providing valuable insights for optimization. Most importantly, our model is applicable for the high-throughput screening of extensive chemical spaces, facilitating the efficient discovery of novel and high-performance thermoelectric materials.
Thermoelectric efficiency is typically measured by the thermoelectric figure of merit, zT. This magnitude directly depends on electronic (Seebeck coefficient and electrical conductivity) and thermal (thermal conductivity) properties. The dual nature of zT, encompassing both electronic and thermal transport phenomena, presents a key challenge: the strong interdependence between these properties, which makes enhancing zT particularly difficult.8 Additionally, transport properties are strongly sensitive to synthesis and processing conditions. Small changes in variables such as chemical composition,9 grain size,10 or carrier concentration11 can significantly alter some of these properties, even by orders of magnitude. This is why some of the earliest applications of ML in thermoelectrics have focused on the specific prediction of thermal conductivity,12,13 Seebeck coefficient,14,15 or electrical conductivity.16
The lack of sufficient and reliable databases for thermoelectric materials is another impediment to the widespread adoption of ML in this field. The sensitivity of transport properties to a large set of variables makes it extremely difficult to establish a complete enough dataset that not only includes thermoelectric properties but also a sufficiently detailed characterization of the samples. During the last years, the scientific community is addressing this issue through: (i) automatic data mining and curation of existing experimental reports,17–19 and (ii) theoretical predictions, which complement experimental efforts by generating large datasets of material properties through simulations.20 However, both experimental and theoretical databases have to be used with caution. On the one hand, experimental measurement of transport properties is associated with moderate uncertainties. It is well documented that measurement discrepancies for zT, in different round-robin studies can vary between ±11.5% and ±17.0% from the averages, depending on the temperature range and the material.21,22 Some pioneering works have attempted to address this issue through the in-house synthesis of large experimental datasets, with properties measured under the same conditions.23 This approach has been successfully applied to medium-sized chemical spaces, such as subsets of Ag–Cu-based chalcopyrites, but remains feasible only for a few laboratories and can be very time- and resource-intensive for larger chemical spaces. On the other hand, theoretical prediction of transport properties is severely hampered by the trade-off between accuracy and computational cost. Over the past decade, it has been demonstrated that first-principles-based predictions of transport properties can be highly accurate,24–27 however this accuracy is closely linked to a high computational cost, which impedes the generation of very large datasets. The main challenge here is the development of high-throughput frameworks that combine accuracy, low computational cost, and automation.28
The obstacles mentioned above have led to the creation of only a few ML models for predicting zT. While some models focus on specific materials and explore minor modifications,29–31 others aim to be more broadly applicable.32–34 However, most of these models remain mostly structure-agnostic, which highlights the necessity for a more focused strategy to address two indirect obstacles: synthesizability and dopability.35 By confining the chemical space to a well-defined structure with easily verifiable synthesizability, while still being large enough to allow for the discovery of improved candidates, more applicable ML models can be developed. This has been done before with Heusler alloys,36,37 which present a large compositional space, a well defined structure and its stability can be initially assessed with simple valence electron counting;38 however, beyond synthesizability, finding high zT also requires the possibility of doping control. Tuning the right carrier concentration is critical for optimizing zT. Skutterudites are a good example of a material family that fits perfectly well with the characteristics mentioned above. Not only are their structures governed by fundamental electron-counting rules,39 but the use of multiple fillers can: (i) tremendously expand their chemical space and (ii) be used to tailor their carrier concentration. In this work, we have created an ML-based model to predict the thermoelectric figure of merit of skutterudites and search for new high-performance thermoelectric materials.
The database construction began with a collection of experimental data on skutterudites synthesized between 1996 and 2022.43 Data from this collection were further augmented through manual digitalization of relevant literature within that collection, as well as utilizing the Starry Data repository.18 The data are manually curated to ensure the use of actual rather than nominal compositions and to exclude samples with significant segregation or secondary phases, which are typically poorly characterized. This enhances the reliability of the dataset for machine learning model development. In total, the database includes approximately 4000 skutterudites with diverse compositions, carrier concentrations at 300 K (n300) and temperature (T) profiles, along with labels like thermoelectric properties (e.g., zT). For this study, we considered skutterudite composition that could incorporate up to five distinct atoms in the filler position, three cations and three anions, enabling a wide range compositional variability (Fig. S1 in SI). The data were stored and managed using MongoDB,44 a NoSQL database, to ensure scalability and efficient querying.
Atomic descriptors were extracted using RDKit,45 Mendeleev46 and Pymatgen47 libraries, specifically chosen to highlight essential compositional attributes. These descriptors include ionization potential (ip), electron affinity (ea), electronegativity (elecneg), valence electrons (val_e) and atomic mass (mass) of the filler-ratlers (f), anions (a), and cations (c), providing a detailed representation of the chemical and compositional attributes of the materials. Properties such as ionization potential, electron affinity, and electronegativity offer insights into the electronic structure and bonding characteristics that are critical for optimizing carrier transport in thermoelectric materials. Meanwhile, the mass of constituent atoms plays an important role in tuning phonon scattering, thereby modifying lattice thermal conductivity.
In addition to these primary descriptors, derived features were calculated to enhance the representation of compositional variability and its impact on thermoelectric properties. Weighted averages (aver) and standard deviations (dev) of descriptors such as ionization potential, electron affinity and electronegativity were computed using atomic fractions. These measures capture both the central tendency and variability in the compositional properties. Ratios, such as anion-to-cation fractions, were derived to reflect charge carrier balance, while the total number of valence electrons in the system provided additional insight into its electronic structure and potential contribution to thermoelectric properties.
The final feature set comprised 37 descriptors, including intrinsic properties like T and n300, along with the derived features explained above (a detailed list of all descriptor is provided in Table S1 in the SI). To assess potential redundancy and interdependencies among the features, we computed the Pearson correlation matrix across all descriptors and the target variable zT (Fig. S2). This analysis revealed some moderate correlations between chemically related descriptors, e.g., valence electrons and electronegativity. However, no features were removed to preserve the interpretability of the model and allow the network to capture potentially nonlinear interactions between descriptors. For instance, although features such as electronegativity, ionization potential, and electron affinity are correlated, their importance may differ depending on whether the thermoelectric performance of p-type or n-type semiconductors is being evaluated, and they have been simultaneously used in previous studies.34
![]() | (1) |
is the mean residual and xstd is the standard deviation of the residuals.Residuals with E values within the range [−3, 3] were considered acceptable and retained in the filtered development set, as they fall within the expected range of a normal distribution. Residuals outside this interval were classified as outliers and removed from further model development (Fig. 1a). This outlier treatment was performed before splitting the filtered development dataset into training and validation subsets. For the present dataset, the LightGBM-based residual analysis revealed approximately 4005 acceptable data points and 75 outliers. Further inspection of the excluded points suggested potential causes such as repeated counting, incorrect typification or digitalization, and measurement inaccuracies. Removing these entries reduced inconsistencies in the development dataset and improved the overall reliability of the subsequent model training.
Since the outlier detection step relies on a supervised LightGBM regressor trained on the development dataset, it may in principle bias the retained data toward compositions that are more consistent with the overall trends learned by the model. In the present study, however, this effect is limited because only a small fraction of entries is removed and a separate robustness check showed that the identified outliers changed only marginally when the LightGBM residual analysis was repeated with and without prior descriptor standardization. In addition, the excluded points are discarded entirely rather than being relabeled or reused. Furthermore, the independent external test set is not subjected to the LightGBM filtering, so the external benchmark remains unaffected by this outlier selection step.
Feature normalization was performed using the StandardScaler function fitted only on the training set. The same scaling parameters were then applied to transform the validation and external test sets. This procedure ensures that the normalization step is learned exclusively from the training data and avoids potential scaling leakage into the validation or external benchmarks.
In this way, the workflow (see Fig. 2) consists of: (i) generation of the descriptor matrix from the curated skutterudite dataset, (ii) residual analysis using LightGBM to identify and remove outliers from the development dataset, (iii) splitting of the filtered development pool into training and validation sets, (iv) feature normalization using the StandardScaler function fitted on the training set and applying the same transformation to the validation and external test sets and (v) training and evaluation of the model.
Hyperparameter optimization was carried out using a custom script, allowing efficient exploration of the parameter space while maintaining computational feasibility. The optimized hyperparameters included:
• Number of hidden layers: tested from 1 to 20 to determine the optimal model depth for capturing complex relationships.
• Activation function: evaluated tanh, ReLU, eLU and LeakyReLU with varying slopes. LeakyReLU with a slope of 0.05 was selected for its ability to handle negative values while maintaining gradient flow.
• L2 regularization: strengths ranging from 10−5 to 0.75 were tested to balance weight penalization and model complexity. A value of 0.1 was chosen as it mitigated overfitting by discouraging overly complex models, which was particularly beneficial given the moderate dataset size.
• Dropout rate: explored between 0.05 and 0.75, with 0.05 identified as optimal to balance neuron co-adaptation and model complexity.
• Learning rate: values from 10−6 to 5 × 10−2 were examined to explore both fine-tuning and rapid convergence scenarios, with 6 × 10−6 providing stable and efficient convergence.
• Batch size: sizes ranging from 2 to 128 were considered, with 32 balancing computational efficiency and gradient stability.
The final model consisted of 10 hidden layers with 37 neurons per layer and was trained using the AdamW optimizer.51 Model performance was evaluated using the root mean squared error (RMSE) and the coefficient of determination R2. RMSE was used as the primary metric because it directly quantifies the prediction error on the same scale as the target variable, while R2 was used as a complementary measure of how well the variance in the experimental data was captured by the model. The number of training epochs was selected from the convergence behaviour of the training and validation RMSE shown in Fig. 1b. Based on this analysis, 6500 epochs were used for the final reported model. Validation metrics were therefore used only as an internal estimate during model development, whereas generalization was assessed separately using independent test and benchmark sets. To assess the sensitivity of the internal validation metrics to the particular 80/20 split, this procedure was repeated using five different random seeds (7, 13, 21, 42 and 84). Across these runs, the validation RMSE was 0.079 ± 0.012 and the validation R2 was 0.931 ± 0.022, indicating that the model performance is reasonably stable with respect to the specific train/validation partition. To ensure reproducibility, deterministic settings were used during training, including fixed random seed initialization and TensorFlow operation determinism, so that repeated runs under identical conditions produced consistent results.
While these metrics confirm the overall performance of the model, a closer inspection of the training and validation predictions (Fig. 3a and b) reveals a consistent underestimation of high zT values, particularly above 1.2. This deviation likely stems from the absence of descriptors capturing performance enhancements introduced during post-synthesis processing, e.g., high-pressure torsion,61 spark plasma sintering,62 cold sintering process,63 spinodal decomposition64 or secondary phase segregation,65 that are known to significantly boost zT. These factors are often inconsistently reported or not systematically quantified in the literature and were thus excluded from the final descriptor set used for model development. As a result, the model learns only from compositional and structural descriptors, limiting its ability to capture extreme zT enhancements arising from non-compositional factors. However, this conservative bias may prove advantageous for screening applications by minimizing false positives and prioritizing compositions that perform well under standard synthesis conditions.
![]() | ||
| Fig. 3 Model performance on (a) training (blue), (b) validation (red), (c) external test set (orange) and (d) comparison of round robin experimental data (black squares with associated uncertainty bars) with predicted zT values (green triangles) for Co0.97Ni0.03Sb3.22 | ||
To contextualize these results, we benchmarked our deep learning model against a LightGBM regressor trained on the same curated training and validation sets (excluding outliers). The LightGBM model yielded lower training RMSE (0.041) and higher R2 (0.982), but its validation metrics (RMSE = 0.122, R2 = 0.848) showed a greater gap between training and validation performance. This suggests that the model may be overfitting, effectively memorizing patterns rather than learning generalizable trends. In contrast, the deep learning model, though slightly less accurate on the training set, exhibited better validation performance, indicating its enhanced ability to capture nonlinear and complex interactions among the compositional descriptors.
For an individual sample, a positive SHAP means that the corresponding descriptor pushes the predicted zT to a higher value, whereas a negative SHAP value means that it lowers the prediction. The magnitude of the SHAP value indicates how strongly that descriptor influences the model output. In the SHAP summary plots (Fig. 4 and 6), each point represents one sample and is colored according to the descriptor value, from low (blue) to high (red). This allows the trends to be interpreted directly: if high descriptor values are mainly associated with positive SHAP values, increasing that descriptor tends to increased the predicted zT; conversely, if high descriptor values are mainly associated with negative SHAP values, increasing that descriptor tends to reduce the predicted zT. SHAP therefore provides a clear way to connect physically motivated descriptors with the model predictions, while still describing learned associations rather than strict causality.
![]() | ||
| Fig. 4 SHAP summary plot for validation and external test set, highlighting the most impactful features and how their values influence predicted zT. | ||
As shown in Fig. 4, the top-ranked features span thermal, electronic and compositional domains. Temperature, T, and semiconductor type, p_n, emerge as key drivers, consistent with their known influence on charge carrier behavior and transport mechanisms. The SHAP color gradient for T indicates that higher temperatures (in red) are systematically associated with increased SHAP values, meaning that the model has learned to predict higher zT at elevated temperatures.78 This aligns with previous reports classifying skutterudites as mid- to high-temperature thermoelectric materials, which perform optimally in the 700–800 K range. The semiconductor type feature, p_n, is binary (0 for p-type, 1 for n-type) and the SHAP distribution shows that n-type samples (in red) consistently yield positive SHAP values. This suggests the model has captured the tendency of n-type skutterudites to achieve higher zT, consistent with their favorable conduction band filling and ability to exploit secondary pockets.79 This trend is further explored in a focused analysis of double-filled skutterudites presented later.
Some features linked to anions exhibits a broad SHAP distribution, specially those with high values contributing positively to zT. Notably, the standard deviation of the ionization potential and the standard deviation of the electron affinity are ranked among most important features. This suggests that local fluctuations in anionic site potential, likely tied to electronegativity variation, modulate both the electronic structure and scattering processes. Thus, chemical inhomogeneity in anion environments can affect carrier mobility and lattice thermal conductivity in filled skutterudites.
Substituting Sb with atoms of different ionization potentials and electron affinities modifies the typical four-member anion rings in the skutterudite framework, potentially impacting both the electronic structure and phonon transport. The antibonding states of these rings are the responsible of the secondary pocket that increases the power factor in skutterudites.80 A distortion of these bonds can modify the energy separation between bonding and antibonding states, thereby changing the energy of the secondary pocket.
As proof of concept, the electronic structure of CoSb3 doped with Ge and Se was analyzed to understand the potential influence of doping on zT. First, a 2 × 2 × 2 supercell was modeled, including Ge and Se atoms at opposite corners of one anionic four-membered ring. The unfolded band structure and Ge and Se projections (Fig. 5a) were then calculated for direct comparison with pristine CoSb3 (Fig. S3). In addition to a small reduction of the band gap, both the main and secondary pockets of the conduction band were not well-defined, suggesting that doping with Ge and Se breaks the degeneracy of these states. If Ge and Se states are projected and their intensity is magnified, the splitting of states becomes clearer. Additionally, Se states at H are almost at the same energy as the main pocket. To confirm this trend, the same doping was performed in a smaller cell, increasing the Ge and Se content (Fig. 5b). The effects mentioned previously are reinforced here: the band gap is drastically reduced, and the splitting of the states of both pockets is enhanced. A third pocket, almost at the same energy as the main pocket centered at Γ, is observed at H with a strong contribution from Ge and Se. Based on these results, it can be highlighted that appropriate doping of the anionic sites can not only modify the energy of the secondary pocket but also lead to a band convergence phenomenon that should enhance the power factor. Very recently, Wang et al. reported what they term “electrical compensation-bonding modulation” method, in which Sb4 rings are reconfigured through doping with Te, Ge, and even Se.81 This strategy not only modulates the bond hierarchy, improving transport properties and opening the door to the use of electronegative fillers, but also achieves zT values close to 2. Our model reproduces these results with high accuracy up to zT around 1.2, beyond which it slightly underestimates the values (Fig. S4). These discrepancies can be related to the absence of a significant amount of data in the training set with zT values above 1.2; however, they demonstrate that the model can predict with good accuracy new strategies based on previously undescribed physico-chemical phenomena.
![]() | ||
| Fig. 5 Unfolded band structures and Ge and Se projections for (a) CoSb2.97Ge0.015Se0.015 and (b) CoSb2.75Ge0.125Se0.125. | ||
Overall, this SHAP analysis not only validates the importance of expected thermoelectric descriptors such as T and semiconductor type but also reveals nuanced trends in compositional and electronic variability. These observations can not only complement previous experimental discoveries—providing a better interpretation of the roles of descriptors (e.g., ionization potential, filler mass, and anion electronegativity)—but also anticipate and discover new trends when combined with high-throughput screening searches led by the ML model.
Among the top 100 compounds (Table 1), experimental reports exist for some compositions. For instance, Salvador et al. reported Ca,Yb double-filled skutterudites.82 Although they did not synthesize a sample with the precise stoichiometry Yb0.1Ca0.2Co4Sb12, they characterized a close composition by electron probe microanalysis as Yb0.12Ca0.15Co4Sb12.09, with a zT value of 0.75 at 700 K.82 Using this experimental stoichiometry, the model slightly overestimates the predicted zT, possibly due to secondary phases such as Yb2O3 and CaO reported in the samples (Fig. S5). As expected, better agreement is found when experimental samples do not contain secondary phases. For instance, the low Yb content in Y0.05Ybx–CoSb3 minimizes the appearance of nanoprecipitates or phase segregation,83 obtaining better agreement between our predictions—adapted to the actual compositions—and the experimental values (Fig. S5). Other experimental reports supporting the double-filler skutterudite screening include Ca,Ba–CoSb3,84 Ca,Ce–CoSb3,85 Ba,Sr–CoSb3,86 and Yb,Ce–CoSb3.87 Most importantly, beyond its predictive accuracy, the model demonstrates that it can be used to efficiently explore large chemical spaces to identify high-zT candidates.
| Composition | zT | Composition | zT | Composition | zT | Composition | zT |
|---|---|---|---|---|---|---|---|
| Gd0.1Y0.2Co4Sb12 | 1.16 | Sr0.1Tb0.2Co4Sb12 | 1.06 | Sm0.1Ca0.2Co4Sb12 | 1.04 | Ba0.1Y0.2Co4Sb12 | 1.01 |
| Sr0.1Li0.2Co4Sb12 | 1.15 | Mg0.1Y0.2Co4Sb12 | 1.06 | Ba0.1Ca0.2Co4Sb12 | 1.04 | Eu0.1Gd0.2Co4Sb12 | 1.00 |
| Y0.1Dy0.2Co4Sb12 | 1.15 | Ca0.1Al0.2Co4Sb12 | 1.06 | Sr0.1Ce0.2Co4Sb12 | 1.04 | Tl0.1Eu0.2Co4Sb12 | 1.00 |
| Eu0.1Y0.2Co4Sb12 | 1.14 | Tb0.1Li0.2Co4Sb12 | 1.06 | Ba0.1K0.2Co4Sb12 | 1.04 | Tl0.1Tb0.2Co4Sb12 | 1.00 |
| Gd0.1Yb0.2Co4Sb12 | 1.13 | Sr0.1Sm0.2Co4Sb12 | 1.06 | Ta0.1Sr0.2Co4Sb12 | 1.04 | Sn0.1Yb0.2Co4Sb12 | 1.00 |
| Li0.1Ca0.2Co4Sb12 | 1.12 | Sm0.1Yb0.2Co4Sb12 | 1.06 | Pr0.1Y0.2Co4Sb12 | 1.04 | Mg0.1Yb0.2Co4Sb12 | 0.99 |
| Sm0.1Y0.2Co4Sb12 | 1.12 | Eu0.1Dy0.2Co4Sb12 | 1.06 | Y0.1Li0.2Co4Sb12 | 1.03 | Tb0.1Gd0.2Co4Sb12 | 0.99 |
| Ce0.1Yb0.2Co4Sb12 | 1.11 | Eu0.1Ce0.2Co4Sb12 | 1.06 | Al0.1Tl0.2Co4Sb12 | 1.03 | Yb0.1Ca0.2Co4Sb12 | 0.98 |
| Tb0.1Y0.2Co4Sb12 | 1.11 | Sn0.1Gd0.2Co4Sb12 | 1.06 | Ca0.1Tb0.2Co4Sb12 | 1.03 | Al0.1Eu0.2Co4Sb12 | 0.98 |
| Ce0.1Y0.2Co4Sb12 | 1.10 | Eu0.1Yb0.2Co4Sb12 | 1.06 | Nd0.1Dy0.2Co4Sb12 | 1.03 | Al0.1Sm0.2Co4Sb12 | 0.97 |
| Y0.1Yb0.2Co4Sb12 | 1.10 | Pr0.1Ce0.2Co4Sb12 | 1.06 | Tl0.1Gd0.2Co4Sb12 | 1.03 | Tl0.1Pr0.2Co4Sb12 | 0.96 |
| Tb0.1Yb0.2Co4Sb12 | 1.10 | Gd0.1Ce0.2Co4Sb12 | 1.05 | Yb0.1Li0.2Co4Sb12 | 1.03 | Al0.1Nd0.2Co4Sb12 | 0.96 |
| Sn0.1Y0.2Co4Sb12 | 1.09 | Mg0.1Tb0.2Co4Sb12 | 1.05 | Mg0.1Dy0.2Co4Sb12 | 1.02 | Al0.1Pr0.2Co4Sb12 | 0.96 |
| Tb0.1Ce0.2Co4Sb12 | 1.08 | Pr0.1Ca0.2Co4Sb12 | 1.05 | Sr0.1Nd0.2Co4Sb12 | 1.02 | K0.1Yb0.2Co4Sb12 | 0.96 |
| Dy0.1Yb0.2Co4Sb12 | 1.08 | Sr0.1Gd0.2Co4Sb12 | 1.05 | Pr0.1Dy0.2Co4Sb12 | 1.02 | Al0.1Yb0.2Co4Sb12 | 0.94 |
| Ce0.1Dy0.2Co4Sb12 | 1.08 | Sr0.1Eu0.2Co4Sb12 | 1.05 | Ta0.1Ca0.2Co4Sb12 | 1.02 | Mg0.1Sn0.2Co4Sb12 | 0.91 |
| Sr0.1Yb0.2Co4Sb12 | 1.08 | Nd0.1Y0.2Co4Sb12 | 1.05 | Ca0.1Dy0.2Co4Sb12 | 1.02 | Yb0.1Ta0.2Co4Sb12 | 0.90 |
| Gd0.1Dy0.2Co4Sb12 | 1.07 | Sn0.1Tb0.2Co4Sb12 | 1.04 | Mg0.1Gd0.2Co4Sb12 | 1.02 | Tl0.1Ba0.2Co4Sb12 | 0.90 |
| Ba0.1Li0.2Co4Sb12 | 1.07 | Sr0.1Dy0.2Co4Sb12 | 1.04 | Ca0.1Eu0.2Co4Sb12 | 1.02 | Sr0.1Ba0.2Co4Sb12 | 0.90 |
| Tl0.1Yb0.2Co4Sb12 | 1.07 | Nd0.1Ca0.2Co4Sb12 | 1.04 | Pr0.1Sr0.2Co4Sb12 | 1.01 | Ce0.1Sn0.2Co4Sb12 | 0.90 |
| Ce0.1Ca0.2Co4Sb12 | 1.07 | Sm0.1Dy0.2Co4Sb12 | 1.04 | K0.1Sr0.2Co4Sb12 | 1.01 | Pr0.1Mg0.2Co4Sb12 | 0.90 |
| Y0.1Ca0.2Co4Sb12 | 1.07 | Nd0.1Yb0.2Co4Sb12 | 1.04 | Ce0.1Li0.2Co4Sb12 | 1.01 | Ba0.1Al0.2Co4Sb12 | 0.90 |
| Sm0.1Ce0.2Co4Sb12 | 1.07 | Pr0.1Yb0.2Co4Sb12 | 1.04 | Li0.1Dy0.2Co4Sb12 | 1.01 | Gd0.1Pr0.2Co4Sb12 | 0.90 |
| Sn0.1Dy0.2Co4Sb12 | 1.07 | Tb0.1Dy0.2Co4Sb12 | 1.04 | Sn0.1Ta0.2Co4Sb12 | 1.01 | Mg0.1Ta0.2Co4Sb12 | 0.89 |
| Nd0.1Ce0.2Co4Sb12 | 1.06 | Sn0.1Ca0.2Co4Sb12 | 1.04 | Sn0.1Li0.2Co4Sb12 | 1.01 | Sr0.1Tl0.2Co4Sb12 | 0.89 |
Previous SHAP analysis was performed using a large data set in which all features were considered and modified, which can make it more difficult to extract design principles based on filler optimization. That is why a SHAP analysis has been performed on the double-filled skutterudite list (Fig. 6). The analysis, performed separately for p- and n-type samples to reveal potential differences in performance optimization, used a background sample size of 1000 to ensure a balance between computational efficiency and attribution stability. While in the previous SHAP analysis, the main features were well-known variables such as T, semiconductor type, or the amount of fillers, here all these variables have been kept fixed, so the analysis focused on filler features. First, feature importance varies depending on the type of skutterudite. For p-type skutterudites, the variables with the greatest impact on zT are the standard deviation of the fillers' masses, the standard deviation of the fillers' ionization potentials, and their average electronegativity. The ionization potential of the fillers influences the position of the filler states in the valence band, thereby modifying the band topology and carrier concentration. Moreover, the presence of rattlers with different masses should reduce the lattice thermal conductivity. Meanwhile, for n-type skutterudites, the three most important features—in addition to the standard deviation of the fillers' masses and ionization potentials—are the standard deviations of the fillers' electron affinities. Filler-related features such as electron affinity and electronegativity define the position of the filler states in the conduction band. Additionally, large differences in electron affinity are linked to fillers with different sizes and masses, which can produce more efficient phonon scattering. In both types of semiconductors, the standard deviation of the fillers' masses ranks among the top three features. Following this analysis, we investigated the influence of fillers on thermal conductivity by exploring the phonon density of states of filled skutterudites containing some of the most frequent candidate fillers from Table 1. In each case, the projected phonon density of the rattling atoms was calculated to evaluate the potential for dual-frequency resonant phonon scattering (Fig. 7 and S6).88 On one hand, heavy atoms such as Yb or Dy present their phonon density of states at very low frequencies, increasing scattering rates of acoustic modes and reducing their group velocities.89 These rattlers are key to reduce the large contribution of the acoustic modes to the thermal conductivity of these materials, as can be observed in the cumulative thermal conductivity of CoSb3 (Fig. 7). On the other hand, lighter atoms such as Mg and Ca present their vibrational pDOS in the 2–6 THz range enhancing scattering phenomena of low energy optic modes. The contribution of low energy optical modes is not as important as acoustic modes but represents around 15–20% of the total κl. Top candidates of the list combine both, heavy and light atoms, thereby promoting broadband phonon scattering essential for significant thermal conductivity reduction. This phenomenon can also be observed in some of the synthesized n-type samples with the highest reported zT, where 3 or 4 rattlers (In, Sr, Ba, Yb) are combined.69,90 Our DFT calculations corroborate that most of these rattlers present resonant frequencies in different areas of the spectra (Fig. 7 and S6), maximizing the number of scattering processes, while the ML model is in good agreement with experimental results (Fig. S7). These results highlight how the model provides critical insights into the design principles for optimizing the thermoelectric performance of skutterudites through strategic filler selection.
To address data scarcity, a combination of digital repositories and manual digitalization of works comprising diverse compositions and carrier concentrations led to a dataset with more than 4000 entries. Data curation constitutes the cornerstone of the methodology regarding reliability. Each experimental entry was analyzed, with particular attention paid to accurately representing actual chemical compositions, rather than nominal ones, and carrier concentrations. To further ensure data integrity and prevent noise from propagating into the model, feature normalization and statistical outlier detection techniques were implemented. We leveraged LightGBM for residual analysis and applied the 3σ method to identify and exclude outliers, thereby minimizing inconsistencies and improving the overall reliability of the development dataset. The inherent physical complexity of thermoelectric phenomena, stemming from the strong interdependence between electronic and thermal transport properties, is addressed by using meaningful features. These features are directly related to thermal and electrical transport properties, including elemental properties and derived features such as weighted averages and standard deviations of ionization potential, electron affinity, and electronegativity. Furthermore, the application of SHAP analysis is critical for verifying that the model not only performs predictions but also implicitly acquires and reflects the fundamental physical laws governing thermoelectricity, such as the influence of temperature and semiconductor type on zT. The applicability of the model is ensured by two key pillars. First, the selection of skutterudites, with stability determined by simple, well-defined rules and carrier concentration levels tunable through the nature and concentration of their rattlers, ensures the model's capability to discover new materials. Second, the exclusive use of simple compositional features and elemental properties ensures that the process of exploring new candidates is computationally fast and resource-efficient.
To the best of our knowledge, our manually curated dataset represents the largest collection of skutterudites for thermoelectric applications, which underpins the strong potential of our ML model. We have demonstrated the model's high accuracy, achieving an RMSE of 0.068 and an R2 of 0.947 during validation. These metrics are comparable to, or even surpass, those obtained by other ML models focused on different families of thermoelectric materials. Most importantly, the errors associated with the model align well with the inherent experimental uncertainties typically observed in thermoelectric measurements, which can vary significantly across different laboratories and characterization protocols. In part, the model's accuracy stems from the data curation, which involved: (i) minimizing the presence of poorly characterized samples or those reported with secondary phases, (ii) including large variability in temperature, carrier concentration, and composition across the entire periodic table, and (iii) selecting samples with low, medium, and high zT values to ensure the model performs well over the widest possible range.
Beyond its accuracy, it has been shown that the model has captured a deep understanding of the physical and chemical phenomena controlling thermoelectric performance. For instance, the model identifies how anion substitutions, such as doping with Ge and Se, can drastically modify the topology of the conduction band, leading to band convergence phenomena that enhance the power factor and consequently zT. Similarly, regarding rattlers, the model discerns the potential benefits of combining rattlers with differing resonant frequencies to enhance phonon scattering processes and efficiently reduce thermal conductivity. Finally, this ML model proves to be a powerful tool for exploring vast chemical spaces and efficiently identifying new thermoelectric materials with high zT, while requiring affordable computational resources. We have demonstrated that the model can identify compositions that have recently been experimentally reported as promising. To further contribute to the scientific community, we are currently developing an online API, which will facilitate researchers to use our model for identifying promising candidates for their experimental synthesis.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5ta08841k.
| This journal is © The Royal Society of Chemistry 2026 |