Open Access Article
Anh D. Phan *a,b, Vu Bich Hanh c, Ngo T. Que a, Nguyen T. T. Duyen c, Do T. Nga d and Baicheng Mei e
a Center for Materials Innovation and Technology, VinUniversity, Hanoi 100000, Vietnam. E-mail: anh.pd@vinuni.edu.vn; adphan35@gmail.com
b College of Engineering and Computer Science, VinUniversity, Hanoi 100000, Vietnam
c Faculty of Materials Science and Engineering, Phenikaa University, Hanoi 12116, Vietnam
d Institute of Physics, Vietnam Academy of Science and Technology, 10 Dao Tan, Giang Vo, Hanoi 100000, Vietnam
e School of Materials Science and Engineering, Beijing Institute of Technology, Beijing 100081, China
First published on 27th May 2026
This work develops a data-driven framework for predicting the thermal conductivity of metals and multicomponent alloys and for inversely proposing compositions that meet a target conductivity. We compile what is, to our knowledge, the largest experimental dataset for this task, containing 6259 data points that span 49 elements and temperatures from 0 to 1400 K. Using alloy composition and temperature as inputs, we train and benchmark several regression models and obtain high predictive accuracy, with R2 > 0.99 and RMSE of 6–9 W m−1 K−1. The approach remains quantitatively reliable for challenging cases, including dilute-doped Mg alloys and commercial steel over broad temperature ranges. Based on the trained forward model, we propose an inverse-design workflow that efficiently searches composition space and suggests candidate alloys expected to achieve a specified thermal conductivity at a given temperature. The inverse search can also identify composition windows in which near-target conductivity is maintained over a finite concentration range, which facilitates experimental validation and scalable processing.
Experimental methods for measuring thermal conductivity can typically be classified into steady-state and transient categories, depending on whether a time-invariant temperature distribution is established in the sample during the measurement. Steady-state methods, such as the guarded hot plate and heat-flow meter,7,8 impose a constant heat flux and measure the resulting temperature gradient. They are widely used for low-conductivity materials and plate-like samples, including polymers, thermal insulators, foams, aerogels, and porous ceramics. However, their accuracy is often limited by heat losses, particularly through convection and radiation, which makes these methods less suitable for high-conductivity materials or thin samples. In contrast, transient methods determine the thermal conductivity from the time-dependent temperature response rather than waiting for a steady state to be reached. Typical techniques include the hot strip,9 hot wire,10 transient plane source,11 and laser flash12 methods. They are generally faster and applicable to a wide range of materials, but their accuracy depends on the assumed heat-transfer model, boundary conditions, and sample homogeneity.
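The two measurement principles can be made concrete with the textbook relations behind them: Fourier's law for a steady-state plate measurement, k = QL/(AΔT), and Parker's adiabatic laser-flash relation, α ≈ 0.1388 L²/t½ followed by k = αρc_p. The sketch below is illustrative only and is not part of this work's workflow; the function names and the input values (a 2 cm plate and a copper-like 2 mm disc) are hypothetical, and real analyses add heat-loss and finite-pulse corrections.

```python
PARKER_COEFF = 0.1388  # ~1.37 / pi**2, Parker's adiabatic half-rise-time constant


def steady_state_conductivity(heat_flow_w, thickness_m, area_m2, delta_t_k):
    """Fourier's law for 1D steady conduction through a plate: k = Q L / (A dT)."""
    if area_m2 <= 0 or delta_t_k <= 0:
        raise ValueError("area and temperature difference must be positive")
    return heat_flow_w * thickness_m / (area_m2 * delta_t_k)


def laser_flash_conductivity(thickness_m, t_half_s, density_kg_m3, cp_j_kg_k):
    """Parker's adiabatic laser-flash analysis.

    Diffusivity alpha = 0.1388 L^2 / t_half from the rear-face half-rise time,
    then k = alpha * rho * c_p. No heat-loss or finite-pulse correction is applied.
    """
    if t_half_s <= 0:
        raise ValueError("half-rise time must be positive")
    alpha = PARKER_COEFF * thickness_m**2 / t_half_s
    return alpha * density_kg_m3 * cp_j_kg_k


# Hypothetical inputs: a 2 cm plate and a 2 mm copper-like disc.
k_plate = steady_state_conductivity(heat_flow_w=10.0, thickness_m=0.02, area_m2=0.01, delta_t_k=5.0)
k_disc = laser_flash_conductivity(thickness_m=2e-3, t_half_s=5e-3, density_kg_m3=8960.0, cp_j_kg_k=385.0)
print(f"steady state: {k_plate:.1f} W m^-1 K^-1, laser flash: {k_disc:.1f} W m^-1 K^-1")
```

The steady-state estimate needs only a heat flow and a temperature difference, whereas the transient estimate needs a model of the time response (here the adiabatic one-dimensional solution), which is why its accuracy hinges on the assumed boundary conditions.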
To complement experiments, thermal conductivity is also investigated using atomistic simulations, most commonly density functional theory (DFT) and molecular dynamics (MD). DFT-based approaches predict the lattice thermal conductivity by combining second- and third-order interatomic force constants with the phonon Boltzmann transport equation,13–16 or by using ab initio MD with the Green–Kubo formalism.17 Although these approaches are physically rigorous, their computational cost limits calculations to relatively small system sizes. This is especially problematic for lightly doped materials, where representing dilute impurities without artificial impurity–impurity interactions requires very large supercells. It is also difficult for DFT-based approaches to describe thermal behavior in complex, multicomponent, or chemically disordered materials, where large-scale structural configurations are essential. Consequently, predicting the thermal conductivity of commercial products such as steels and high-entropy alloys, in which many alloying elements and microstructural features coexist, remains challenging for simulations. Capturing strong anharmonicity at high temperatures may also require advanced treatments. In addition, DFT phonon-transport calculations focus on the lattice term and do not directly account for the electronic thermal conductivity, which is important in metals and many alloys. MD simulations can describe finite-temperature dynamics, but their accuracy depends on the quality of the interatomic potential. Moreover, classical MD simulations do not include the quantum statistics of lattice vibrations, which can reduce accuracy at low temperatures.18–20 These limitations motivate alternative approaches that are both accurate and computationally efficient.
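The Green–Kubo route mentioned above obtains the thermal conductivity from equilibrium fluctuations of the heat flux J, κ = V/(3k_BT²) ∫₀^∞ ⟨J(0)·J(t)⟩ dt. The sketch below is an illustrative post-processing step, not the code used in the cited work; the function name and the input convention (J in W m⁻², sampled every dt from an equilibrium trajectory) are assumptions. It averages the autocorrelation over all time origins and truncates the integral at a user-chosen lag; in practice the truncation time and the statistical error need careful convergence checks.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def green_kubo_conductivity(flux, dt, volume, temperature, max_lag, k_b=K_B):
    """kappa = V / (3 k_B T^2) * integral_0^{max_lag*dt} <J(0).J(t)> dt.

    flux: (n_steps, 3) heat-flux samples J (W m^-2) from an equilibrium run,
    dt: sampling interval (s), volume (m^3), temperature (K),
    max_lag: number of lags at which the integral is truncated.
    Returns (kappa, acf) where acf[k] = <J(0).J(k*dt)>.
    """
    flux = np.asarray(flux, dtype=float)
    n = flux.shape[0]
    if not 0 < max_lag < n:
        raise ValueError("need 0 < max_lag < n_steps")
    # Average the dot product over every available time origin for each lag.
    acf = np.array([np.mean(np.sum(flux[: n - lag] * flux[lag:], axis=1))
                    for lag in range(max_lag + 1)])
    # Trapezoidal rule on the uniform lag grid.
    integral = dt * (acf.sum() - 0.5 * (acf[0] + acf[-1]))
    return volume * integral / (3.0 * k_b * temperature**2), acf


# Trivial numerical check (not physical data): a constant flux gives a flat
# autocorrelation, so the integral is simply acf[0] * max_lag * dt.
kappa, acf = green_kubo_conductivity(np.full((100, 3), 2.0), dt=0.5, volume=1.0,
                                     temperature=1.0, max_lag=10, k_b=1.0)
print(kappa)
```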
Motivated by the cost and practical constraints of experiments and atomistic simulations, machine learning (ML) has emerged as an efficient approach for predicting thermal properties. By learning from available datasets, ML models can rapidly estimate thermal conductivity and enable high-throughput screening of composition space. However, most ML studies based on experimental alloy data21,22,25–28 have been trained on relatively small datasets (typically a few hundred up to ∼1200 samples) and focus on a single alloy family such as Al-based21 or Mg-based22 systems. Although these models can achieve good accuracy within their training domain, their applicability to other alloy chemistries and temperature ranges remains uncertain. Other studies rely primarily on MD-based23 or larger-scale computational datasets.24,29 Such predictions can inherit biases from the underlying computational assumptions and differ quantitatively from experimental data. Several approaches require complex, expert-designed descriptors.22,25–28 A recent study26 reported promising predictions of temperature-dependent thermal conductivity for additively manufactured metallic alloys; however, its scope is largely limited to a small number of alloy families within a specific processing domain. Moreover, publicly available manufacturer datasheets often report only nominal, reference-grade properties, while key processing details and product-specific specifications are not fully disclosed. Using such data can introduce variability and bias in model training. Consequently, the applicability of existing models and data to broader alloy spaces and wider temperature ranges remains limited.
These gaps raise several key questions. (1) Can a single ML model trained on experimental data reliably predict thermal conductivity across diverse alloy chemistries, covering both low- and high-conductivity regimes over a broad temperature range? (2) How well does such a model generalize to practical materials, including dilute-doped systems and commercial materials with complex compositions, where low impurity levels may still produce measurable changes in thermal transport? In particular, can the influence of low impurity concentrations on thermal conductivity be predicted? (3) Can chemical composition together with measurement temperature provide a minimal and transparent input representation that achieves near-experimental accuracy without relying on expert-engineered descriptors? (4) Can the forward model serve as a reliable surrogate for inverse design, not only suggesting compositions that achieve a target thermal conductivity at a given temperature, but also identifying composition-tolerant “windows” where the target property is maintained under realistic deviations in alloy fractions? Answering these questions calls for a large and diverse dataset with broad temperature coverage. Such a dataset not only improves predictive reliability but also provides interpretable insights into the key drivers of thermal transport and supports inverse-design approaches that remain practical for synthesis and scale-up.
In this study, we address these questions by constructing what is, to our knowledge, the largest experimental dataset currently available for ML prediction of thermal conductivity in metals and alloys. The dataset spans many alloy chemistries over a wide temperature range and covers a much broader range of thermal conductivity values than prior studies. Using this dataset, we train ML models with only chemical composition and measurement temperature as inputs, which reduces model complexity while maintaining high predictive accuracy. Beyond forward prediction, we develop an inverse-design workflow that identifies candidate alloy compositions achieving a target thermal conductivity. Finally, we validate the approach by comparing model predictions with independent experimental data.
Fig. 1 (Color online) The machine-learning workflow for forward prediction and inverse design.
Fig. 2 (Color online) Distribution of (a) measurement temperature and (b) thermal conductivity values in the dataset.
To improve consistency across sources, we standardized all units and temperature scales and converted each alloy composition into a 49-dimensional vector of elemental atomic percentages normalized to sum to 100%. Entries with missing or ambiguous compositions, inconsistent units, or nonphysical values were removed during manual curation. We note that cross-source differences in processing history, sample form, and measurement procedure are not always reported in sufficient detail to be fully homogenized. Any remaining variability is therefore treated as unavoidable experimental noise, and its impact is assessed through validation/test splits and external benchmarks. A representative summary of the dataset is provided in Table 1.
| Alloy | TC (W m−1 K−1) | Temperature (K) | Al | Ag | Fe | Si | Bi | Sn |
|---|---|---|---|---|---|---|---|---|
| Bi95Ag5 | 9.61 | 323 | 0 | 5 | 0 | 0 | 95 | 0 |
| Ag92.01Si7.99 | 239.62 | 732 | 0 | 92.01 | 0 | 7.99 | 0 | 0 |
| Al94.7Si5Fe0.3 | 165.5 | 298.15 | 94.7 | 0 | 0.3 | 5 | 0 | 0 |
| Fe94.91Al5.09 | 30.8 | 564.17 | 5.09 | 0 | 94.91 | 0 | 0 | 0 |
| Ag44.3Bi42.9Sn12.8 | 13.92 | 373 | 0 | 44.3 | 0 | 0 | 42.9 | 12.8 |
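The composition encoding described above (a vector of atomic percentages normalized to 100%, followed by the measurement temperature) can be sketched as follows, assuming formula strings written as in Table 1, i.e., each element symbol followed by its atomic percentage. The six-element ordering is a hypothetical subset taken from the Table 1 columns, since the full 49-element list is not given in this section, and the function names are illustrative. Malformed formulas and elements outside the feature set raise an error, mirroring the removal of ambiguous entries during curation.

```python
import re

import numpy as np

# Hypothetical feature order: a six-element subset from Table 1, not the paper's 49-element list.
ELEMENTS = ("Al", "Ag", "Fe", "Si", "Bi", "Sn")
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d+(?:\.\d+)?)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)")


def composition_vector(formula, elements=ELEMENTS):
    """Parse e.g. 'Al94.7Si5Fe0.3' into atomic percentages (ordered by `elements`) summing to 100."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"ambiguous or malformed composition: {formula!r}")
    amounts = dict.fromkeys(elements, 0.0)
    for symbol, value in _TOKEN.findall(formula):
        if symbol not in amounts:
            raise ValueError(f"element {symbol!r} is outside the feature set")
        amounts[symbol] += float(value)  # repeated symbols are summed
    total = sum(amounts.values())
    if total <= 0.0:
        raise ValueError("composition must contain a positive amount")
    return np.array([100.0 * amounts[el] / total for el in elements])


def feature_row(formula, temperature_k):
    """Model input: composition vector followed by the measurement temperature (K)."""
    if temperature_k < 0.0:
        raise ValueError("temperature must be non-negative (K)")
    return np.append(composition_vector(formula), temperature_k)


print(feature_row("Al94.7Si5Fe0.3", 298.15))
```

Normalizing inside the parser means formulas given in weight-free relative amounts (e.g. "Bi1Ag1") and in atomic percent end up on the same scale; converting weight-percent data to atomic percent would be a separate, earlier step.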
| Size of data | Type of data | Range of data (W m−1 K−1) | DL/ML model | R2 (%) | RMSE (W m−1 K−1) | Reference |
|---|---|---|---|---|---|---|
| 271 | Experiment | [84, 243] | Gradient boosting | 88 | 12.03 | 21 |
| | | | CatBoost | 88 | 12.21 | |
| | | | XGBoost | 91 | 10.58 | |
| | | | Stacking ensemble algorithm | 83 | 14.54 | |
| | | | KNN | 50 | 19.6 | |
| | | | Linear regression | 61 | 21.98 | |
| | | | Decision tree | 83 | 14.41 | |
| | | | AdaBoost algorithm | 80 | 15.73 | |
| | | | Random forest | 84 | 14.18 | |
| 1139 | Experiment | [8.1, 167.0] | XGBoost | 97.0 | — | 22 |
| 120 | MD simulation | ∼[2, 5] | SVR | 91.0 | 1.128 | 23 |
| 79 200 | Simulation | ∼[200, 700] | Linear regression | — | 101 | 24 |
| | | | Ridge regression | — | 101 | |
| | | | LASSO regression | — | 101 | |
| | | | Support vector regression | — | 36 | |
| | | | Feed-forward neural networks | — | 7 | |
| | | | CNN | — | 7 | |
| 279 | Experiment | [0.24, 35] | XGBoost | 79.0 | 1.98 | 25 |
| | | | SVM | 82.0 | 1.80 | |
| | | | KNN | 81.0 | 1.87 | |
| | | | Kernel ridge | 81.0 | 1.87 | |
| | | | Gaussian process | 81.0 | 1.86 | |
| 294 | Experiment | [8.8, 343] | Random forest | 92.28 | ∼2.6 | 26 |
| | | | Gradient boosting | 90.86 | ∼2.5 | |
| | | | XGBoost | 96.18 | 1.63 | |
| | | | Kernel ridge | 69.53 | ∼4.6 | |
| | | | Lasso | 70.23 | ∼4.6 | |
| 350 | Experiment | — | LSTM | 88.66 | 8.36 | 27 |
| | | | Linear regression | 80.96 | 26.49 | |
| | | | Kernel ridge | 81.03 | 26.37 | |
| | | | Stochastic gradient descent | 82.41 | 17.49 | |
| | | | Linear SVR | 84.79 | 11.98 | |
| | | | Sigmoid SVR | 94.32 | 20.61 | |
| | | | RBF SVR | 75.47 | 14.62 | |
| | | | Poly SVR | 74.96 | 12.02 | |
| | | | Decision tree | 53.48 | 19.37 | |
| | | | Gradient boosting decision trees | 81.58 | 10.36 | |
| | | | Random forest | 87.67 | 9.63 | |
| | | | LightGBM | 73.98 | 12.99 | |
| | | | ANN | 85.93 | 8.72 | |
| | | | RNN | 87.48 | 8.37 | |
| | | | CNN | 87.99 | 8.406 | |
| | | | Random forest | 87.67 | 9.64 | |
| 756 | Experiment | [10.9, 83.8] | Bayesian neural network | — | 3.9 | 28 |
| 5412 | DFT simulation | ∼[0, 115] | Gradient boosting | 76.60 | 7.63 | 29 |
| 6259 | Experiment | [0.18, 480] | XGBoost | 99.07 | 9.12 | This work |
After hyperparameter optimization, the best configuration of each algorithm is evaluated on fixed training, validation, and test sets using an 80:20 split. Predictive performance is quantified using the coefficient of determination (R2) and the root-mean-square error (RMSE), defined as
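$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2} \tag{2}$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted thermal conductivities of the $i$-th data point, $\bar{y}$ is the mean of the measured values, and $N$ is the number of data points in the evaluated set.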
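Both metrics are straightforward to compute; a minimal NumPy sketch, with R² = 1 − SS_res/SS_tot and RMSE the square root of the mean squared residual, is shown below. The function names are illustrative; for single-output data the R² value should match scikit-learn's `r2_score`.

```python
import numpy as np


def r2_score_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (here W m^-1 K^-1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r2_score_np(measured, predicted), rmse(measured, predicted))
```

Because RMSE carries the units of the target, it is more sensitive to errors on high-conductivity samples, whereas R² is normalized by the spread of the data; reporting both, as done here, guards against either metric looking favorable on its own.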
Fig. 3 shows parity plots of predicted versus measured thermal conductivity for the training and test sets. On the training set, the tree-based models reproduce the experimental data very closely, with most predictions concentrated near the parity line. Extra Trees shows the closest agreement with the training data, and XGBoost, CatBoost, and Random Forest also reproduce the training data well. On the test set, these models retain high accuracy. Extra Trees achieves the best performance, with R2 = 99.61% and RMSE = 5.68 W m−1 K−1, while CatBoost and XGBoost reach R2 = 99.12% (RMSE = 8.83 W m−1 K−1) and R2 = 98.97% (RMSE = 9.58 W m−1 K−1), respectively. Gradient Boosting and Random Forest also perform well, whereas Decision Tree exhibits the largest error. These results indicate that alloy composition and measurement temperature contain sufficient information for accurate data-driven prediction of thermal conductivity. We also evaluated an expanded feature set using Matminer-derived composition descriptors following ref. 29. In this scheme, descriptors are generated directly from the chemical formula by aggregating elemental properties,29,31 including electronegativity, covalent radius, valence electron counts, and periodic-table attributes, into statistics over the constituent elements. This procedure yields 181 input features in total. However, adding Matminer composition descriptors does not improve test-set accuracy compared with the composition–temperature input (see Table S3).
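A benchmark of this kind can be sketched with scikit-learn. This is a minimal sketch on synthetic data: the hypothetical binary-alloy feature layout [at.% A, at.% B, T], the synthetic target, and the hyperparameters are assumptions, not the paper's dataset or tuned settings. XGBoost and CatBoost could be added to the `models` dictionary in the same way through `xgboost.XGBRegressor` and `catboost.CatBoostRegressor` if those packages are installed.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 1000
# Hypothetical binary A-B alloy: atomic % of A, atomic % of B, temperature (K).
frac_a = rng.uniform(0.0, 100.0, n)
temperature = rng.uniform(0.0, 1400.0, n)
X = np.column_stack([frac_a, 100.0 - frac_a, temperature])
# Synthetic target (NOT real data): conductivity rises with A content and falls with T.
y = 50.0 + 3.0 * frac_a - 0.05 * temperature + rng.normal(0.0, 2.0, n)

# 80:20 train/test split with a fixed seed so the comparison is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Extra Trees": ExtraTreesRegressor(n_estimators=300, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    scores[name] = (r2_score(y_test, pred), rmse)
    print(f"{name:18s} R2 = {100 * scores[name][0]:.2f}%  RMSE = {rmse:.2f}")
```

Ensembles of randomized trees need no feature scaling, which suits raw atomic percentages and temperatures in kelvin living on very different numeric ranges.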
To further validate the Extra Trees model on external benchmarks, we compare its predictions with independent experimental data taken from refs. 32 and 33. As shown in Table 3, the predicted thermal conductivities of Mg99.67488Al0.32440Si0.00017Ca0.00024Fe0.00030 agree closely with experimental values32 over 348–498 K, with deviations below 1.5%. For Mg98.82551Zn1.17147Si0.00106Ca0.00130Fe0.00067, the model captures the weak temperature dependence of the experimental thermal conductivity but overestimates its magnitude by about 8–12 W m−1 K−1 (roughly 6–10%) over the same range. We test two additional Mg–Al–Zn alloys, Mg96.9Al2.7Zn0.4 and Mg94.1Al5.5Zn0.4, against experimental data from ref. 33; the comparisons are shown in Fig. 4. In all Mg-based cases, the model captures the correct temperature trends, and the remaining discrepancies are moderate. These findings suggest that composition and temperature alone can provide reliable screening-level predictions even for dilute alloying additions, although some chemistries may benefit from additional training data or refined descriptors to reach fully quantitative agreement.
| Temperature (K) | Mg98.82551Zn1.17147Si0.00106Ca0.00130Fe0.00067: experiment (W m−1 K−1) | Mg98.82551Zn1.17147Si0.00106Ca0.00130Fe0.00067: prediction (W m−1 K−1) | Mg99.67488Al0.32440Si0.00017Ca0.00024Fe0.00030: experiment (W m−1 K−1) | Mg99.67488Al0.32440Si0.00017Ca0.00024Fe0.00030: prediction (W m−1 K−1) |
|---|---|---|---|---|
| 348 | 123.47 | 133.63 | 130.10 | 130.94 |
| 398 | 122.05 | 134.51 | 130.60 | 131.41 |
| 448 | 125.24 | 133.35 | 128.30 | 129.97 |
| 498 | 125.35 | 134.21 | 129.03 | 130.80 |
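The per-temperature deviations behind this comparison can be recomputed directly from Table 3 with a few lines of arithmetic. The values below are transcribed from the table; the dictionary layout and the `deviations` helper are illustrative.

```python
import numpy as np

temperatures_k = np.array([348, 398, 448, 498])
# Values transcribed from Table 3 (W m^-1 K^-1).
mg_zn = {"experiment": np.array([123.47, 122.05, 125.24, 125.35]),
         "prediction": np.array([133.63, 134.51, 133.35, 134.21])}
mg_al = {"experiment": np.array([130.10, 130.60, 128.30, 129.03]),
         "prediction": np.array([130.94, 131.41, 129.97, 130.80])}


def deviations(data):
    """Absolute (prediction - experiment) and relative (%) deviations."""
    abs_dev = data["prediction"] - data["experiment"]
    rel_dev = 100.0 * abs_dev / data["experiment"]
    return abs_dev, rel_dev


for name, data in [("Mg98.83Zn1.17 (+Si, Ca, Fe)", mg_zn), ("Mg99.67Al0.32 (+Si, Ca, Fe)", mg_al)]:
    abs_dev, rel_dev = deviations(data)
    for t, a, r in zip(temperatures_k, abs_dev, rel_dev):
        print(f"{name}  T = {t} K  delta = {a:+.2f} W m^-1 K^-1 ({r:+.1f}%)")
```

Run as is, this gives prediction minus experiment of +8.11 to +12.46 W m−1 K−1 for the Zn-containing alloy and +0.81 to +1.77 W m−1 K−1 (below 1.5%) for the Al-containing alloy.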
Fig. 4 (Color online) Experimental thermal conductivity data (open symbols) and our Extra Trees predictions (solid curves) as a function of temperature for Mg96.9Al2.7Zn0.4 and Mg94.1Al5.5Zn0.4. Experimental data are taken from ref. 33.
Although we standardize units and compositions across sources, residual variability due to differences in sample form, processing history, and measurement protocols cannot be fully removed because such metadata are not consistently reported. Therefore, part of the prediction error reflects unavoidable experimental noise. The model performance is expected to be strongest in composition–temperature regimes that are well represented in the dataset. It may degrade for sparsely sampled alloy classes such as highly multicomponent high-entropy alloys. Expanding experimental coverage and incorporating relevant metadata when available are expected to further reduce uncertainty and improve quantitative accuracy.
We now evaluate the Extra Trees model on a commercial steel with complex, multicomponent chemistry. Fig. 5a compares our predictions with experimental thermal conductivity data for a plain carbon steel containing minor Mn, Si, P, and S additions. The predictions reproduce the high conductivity at low temperature, the strong reduction at intermediate temperatures, and the low-conductivity plateau at high temperature. The quantitative agreement between ML predictions and experiment is good overall and is most reliable at low and intermediate temperatures, where the dataset is densest.
Fig. 5 (Color online) Experimental data (symbols) and our predicted thermal conductivity (solid curves) of (a) plain carbon steel (type 316)35 and (b) NiCo and NiCoFe alloys.36
In Fig. 5b, we apply the model to the binary alloy Ni50Co50 and the medium-entropy alloy Ni33.33Co33.33Fe33.34. The predicted thermal conductivities agree well with the experimental data and reproduce the monotonic decrease with increasing temperature for both alloys. The model also captures the large separation between the higher-conductivity binary NiCo alloy and the lower-conductivity ternary NiCoFe alloy over a wide temperature range. By contrast, when we tested a high-entropy alloy with a larger number of equiatomic components, the model overestimated the thermal conductivity. This is likely due to the limited number of multicomponent materials in the training dataset. Improving prediction accuracy for high-entropy alloys therefore requires more experimental data and remains an important direction for future work.
We assess feature importance using the mean absolute SHAP value, where SHAP (SHapley Additive exPlanations) assigns each input feature a contribution to an individual model prediction.34 As shown in Fig. 6, temperature has the largest impact on the predicted thermal conductivity. This is expected because heat transport in metals and alloys is strongly temperature dependent. Increasing temperature enhances phonon scattering and reduces the lattice contribution, while the electronic contribution also changes with temperature because electron–phonon scattering increases and the electrical resistivity varies accordingly.
Among the elemental features, the analysis reveals that Cu, Ag, Al, and Au have the largest influence on the predicted thermal conductivity, and Ni, Pd, and Fe also contribute significantly. This suggests that variations in the concentrations of these elements drive the largest changes in thermal conductivity within our dataset. The trend is physically reasonable: Cu, Ag, Al, and Au are intrinsically high-thermal-conductivity metals, so increasing their content typically raises the electronic contribution to heat transport and shifts an alloy toward the high-conductivity regime. In contrast, Ni, Pd, and Fe are transition metals with partially filled d-bands that can strongly affect carrier scattering and electronic structure in alloys. Changes in their concentrations can modify the density of states near the Fermi level and enhance alloy-disorder scattering, both of which reduce electronic thermal transport and help distinguish lower-conductivity compositions. These mechanisms help explain why Cu/Ag/Al/Au and Ni/Pd/Fe emerge as key contributors in the SHAP analysis and why the model can capture transitions between low- and high-conductivity regimes.
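To make the SHAP definition concrete, the sketch below computes exact interventional Shapley values for one sample by enumerating feature subsets against a background set, and then averages their magnitudes into a global importance. It is a brute-force illustration whose cost grows as 2^M, so it is only practical for a handful of features; for tree ensembles with many inputs, efficient algorithms such as the `shap` library's `TreeExplainer` are used instead. The function names and the toy model are hypothetical and do not reproduce the paper's Fig. 6.

```python
import itertools
import math

import numpy as np


def exact_shap(predict, x, background):
    """Exact interventional Shapley values for one sample.

    predict: callable mapping an (n, M) array to n predictions.
    x: (M,) sample to explain; background: (n, M) reference rows.
    v(S) = mean_b predict(z_b), with z_b = x on features in S and background row b elsewhere.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    m = x.size

    def value(subset):
        z = background.copy()
        idx = list(subset)
        if idx:
            z[:, idx] = x[idx]
        return float(np.mean(predict(z)))

    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def mean_abs_shap(predict, X, background):
    """Global importance per feature: mean |SHAP| over the rows of X."""
    rows = np.asarray(X, dtype=float)
    return np.mean([np.abs(exact_shap(predict, row, background)) for row in rows], axis=0)


# Hypothetical toy model on [at.% Cu, at.% Ni, T]: Cu raises, Ni and T lower conductivity.
def toy_model(Z):
    return 50.0 + 3.0 * Z[:, 0] - 1.0 * Z[:, 1] - 0.05 * Z[:, 2]


rng = np.random.default_rng(0)
samples = np.column_stack([rng.uniform(0, 100, 20), rng.uniform(0, 100, 20), rng.uniform(300, 1200, 20)])
print(mean_abs_shap(toy_model, samples, background=samples))
```

A useful sanity check is the efficiency property: for every sample, the SHAP values sum to the prediction minus the mean background prediction.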
A practical question for synthesis is whether an inverse-designed candidate remains close to the target thermal conductivity when its composition deviates slightly from the nominal recipe. To evaluate this sensitivity, we vary the atomic fraction of one element in each binary candidate and use the trained model to predict the thermal conductivity across the full composition range, as shown in Fig. 7. AuSi and WAg alloys exhibit broad plateau-like regions (Fig. 7a and b), where the predicted conductivity remains approximately 80 W m−1 K−1 over a relatively wide interval of Au or W fraction. Such plateaus are attractive for fabrication because modest composition errors are unlikely to move the property away from the target. In contrast, AgPt reaches 80 W m−1 K−1 only within a narrow composition window (Fig. 7c), indicating that precise control of the Ag fraction is needed. Meanwhile, MgBi shows an almost monotonic dependence of thermal conductivity on Mg fraction. Small composition shifts therefore lead to noticeable property changes, making this alloy a less favorable option for producing an 80 W m−1 K−1 alloy without high compositional precision. Overall, the inverse-design workflow identifies not only target-matching compositions but also composition-tolerant windows that are better aligned with practical synthesis and scale-up.
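The composition-tolerance check described above amounts to scanning a binary A–B line at fixed temperature and locating contiguous intervals where the prediction stays within a tolerance of the target. The sketch below is a minimal version under stated assumptions: it uses simple grid scanning rather than the paper's inverse-search algorithm, assumes a forward model that accepts rows of [at.% A, at.% B, T] (a hypothetical reduced feature layout), and uses illustrative toy curves and an illustrative 5 W m−1 K−1 tolerance.

```python
import numpy as np


def scan_binary(predict, temperature_k, n_points=101):
    """Predict conductivity of A_x B_(1-x) on a uniform grid of x at fixed temperature.

    `predict` maps an (n, 3) array of [at.% A, at.% B, T] rows to conductivities,
    e.g. the .predict method of a fitted regressor (assumed feature layout).
    """
    x = np.linspace(0.0, 1.0, n_points)
    rows = np.column_stack([100.0 * x, 100.0 * (1.0 - x), np.full(n_points, temperature_k)])
    return x, np.asarray(predict(rows), dtype=float)


def tolerant_windows(x, pred, target, tol):
    """Contiguous (x_start, x_end) intervals where |pred - target| <= tol on the grid."""
    ok = np.abs(np.asarray(pred, dtype=float) - target) <= tol
    windows, start = [], None
    for i, inside in enumerate(ok):
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            windows.append((float(x[start]), float(x[i - 1])))
            start = None
    if start is not None:
        windows.append((float(x[start]), float(x[-1])))
    return windows


def widest_window(windows):
    """Width of the widest window in fraction units (0.0 if the target is never reached)."""
    return max((end - start for start, end in windows), default=0.0)


# Hypothetical forward models with the two qualitative behaviours discussed above.
def plateau_model(rows):  # flat around 80 W m^-1 K^-1 near x = 0.5
    return 80.0 + 200.0 * (rows[:, 0] / 100.0 - 0.5) ** 3


def monotonic_model(rows):  # steadily decreasing with the A fraction
    return 200.0 - 4.0 * rows[:, 0]


target, tol = 80.0, 5.0
for name, model in [("plateau-like", plateau_model), ("monotonic", monotonic_model)]:
    x, pred = scan_binary(model, temperature_k=300.0)
    windows = tolerant_windows(x, pred, target, tol)
    print(name, windows, f"widest = {widest_window(windows):.2f}")
```

Ranking candidate pairs by the width of their widest window favors plateau-like systems over steep, monotonic ones, which is the practical criterion for tolerance to composition errors during synthesis.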
This journal is © The Royal Society of Chemistry 2026