Open Access Article
Prakash Thakolkaran,†a Yiwen Zheng,†b Yaqi Guo,a Aniruddh Vashisth*b and Siddhant Kumar*a
aDepartment of Materials Science and Engineering, Delft University of Technology, 2628 CD Delft, The Netherlands. E-mail: Sid.Kumar@tudelft.nl
bDepartment of Mechanical Engineering, University of Washington, Seattle, WA, USA. E-mail: Vashisth@uw.edu
First published on 17th October 2025
The thermal conductivity of covalent organic frameworks (COFs), an emerging class of nanoporous polymeric materials, is crucial for many applications, yet the link between their structure and thermal properties remains poorly understood. Analysis of a dataset containing over 2400 COFs reveals that conventional features such as density, pore size, void fraction, and surface area do not reliably predict thermal conductivity. To address this, an attention-based machine learning model was trained, accurately predicting thermal conductivities even for structures outside the training set. The attention mechanism was then utilized to investigate the model's success. The analysis identified dangling molecular branches as a key predictor of thermal conductivity, leading us to define the dangling mass ratio (DMR), a descriptor that quantifies the fraction of atomic mass in dangling branches relative to the total COF mass. Feature importance assessments on regression models confirm the significance of DMR in predicting thermal conductivity. These findings indicate that COFs with dangling functional groups exhibit lower thermal transfer capabilities. Molecular dynamics simulations support this observation, revealing significant mismatches in the vibrational density of states due to the presence of dangling branches.
A key property dictating the applications of COFs is thermal transfer. For example, low thermal conductivity is desired for thermoelectrics to maintain large internal thermal gradients and increase efficiency.19 On the other hand, a high thermal conductivity is desired for gas adsorption and separation where efficient thermal transfer is important for the longevity and stability of the nanoporous membranes.20 COFs are promising nanoporous candidates for such applications as the ability to design their crystalline topology (e.g., lattice type, pore size) and chemistry via the choice of knots and linkers opens up a large and diverse space of thermal conductivities.1,2,21–23 Therefore, it is of great interest to gain a deeper understanding of the thermal transfer mechanism and to obtain strong structure–property trends, which in turn enables the application-specific design of COFs.
Two approaches can be used to elucidate the thermal structure–property relationships of COFs: experimental trial-and-error methods which involve synthesis and characterization,24,25 and computational high-throughput screening which relies on first-principles calculations.26–28 With COFs offering practically an unlimited design space, an experimental trial-and-error approach to screen new COF candidates (including developing new synthesis routes for each candidate) is prohibitively inefficient. Moreover, experimentally synthesized COFs will always contain crystalline defects,29,30 which greatly influence the thermal conductivity. This makes it difficult to relate the thermal transfer mechanisms to the geometrical and chemical make-up of a COF structure. In contrast, molecular dynamics (MD) simulations31 offer a high-throughput virtual screening alternative to lab-based experiments. However, despite recent advances in computing hardware, even virtual screening can be prohibitively inefficient for a large design space. This is highlighted by the example that generating a dataset of just 2471 two-dimensional COFs and their thermal conductivity for this study required 1.3 million CPU-hours in cloud computing (translating into approximately four months and $70 000 in time and cost, respectively). This underscores the need for efficient structure–property maps that are accurate, generalizable, and interpretable.
Recently, Islamov et al.32 performed a high-throughput screening study of over 10 000 MOF structures and their thermal conductivity (κ) using MD. They found that the majority of the MOFs possess κ < 1 W m−1 K−1 with a few exceptions possessing ultra-high thermal conductivity (κ > 10 W m−1 K−1). While COFs are generally more thermally stable (owing to the strong covalent bonding of the building blocks), a recent high-throughput screening study by Thakur et al.33 on over 10 000 COFs demonstrated that, generally, COFs also possess a similar range of κ, with the majority of the structures exhibiting κ < 1 W m−1 K−1. Interestingly, MOFs and COFs possess similar structure–property trends. (i) Increasing the pore size is mostly sufficient to achieve a low thermal conductivity. (ii) Additionally, one can increase the void fraction, the surface area, or include heavy atoms to further decrease thermal conductivity. (iii) However, to increase the thermal conductivity, there are many factors that need to align. Islamov et al.32 demonstrated that while lowering the pore size and increasing the density results in a higher ceiling of attainable thermal conductivity, other factors, such as the topology, mass-mismatch, and linker length also need to be accounted for. Thakur et al.33 came to similar conclusions alongside the observation that aligning the polymeric chains in the structure to the heat flow direction raises the ceiling of attainable thermal conductivity. However, these hand-crafted guidelines and ad hoc correlations are insufficient for developing an accurate and generalizable predictive model that captures the thermal conductivity structure–property relationships of COFs. Our study shows that no combination of commonly used descriptors consistently predicts thermal conductivity. Consequently, a definitive method for identifying COFs with tailor-made thermal properties for various applications remains elusive.
To address this knowledge gap, we turn to machine learning (ML) with an emphasis on interpretability and explainability. Previous efforts in ML-assisted design for thermally conductive organic materials include the discovery of polymers with high κ through a hierarchical feature selection process,34 a reinforcement learning approach using SMILES-based representations,35 and methods leveraging molecular fingerprints as input features, such as fine-tuning a pre-trained regression model followed by screening,36 training a neural network,37 and employing an active learning approach.38 Hu et al.39 provide a review of recent efforts and outlook on ML for thermally conductive organic materials. For the ML modeling of structure–property maps of porous polymers (including but not limited to COFs) there are primarily two routes:
• high interpretability, limited accuracy: using hand-crafted, pre-extracted high-level descriptors of the crystalline network (e.g., pore size, density, surface area, atomic composition) as input to classical regression algorithms;40–42
• high accuracy, limited interpretability: using information-rich but raw graph representation of the crystalline network (where nodes denote atoms with features such as position and atom type, and edges denote bonds with the bond order as the feature) as input to end-to-end deep learning-based regression algorithms.43–46
Here, we bridge the two routes to develop an accurate predictive model using deep learning while also combining interpretability insights from deep learning with prior knowledge and descriptors to explain the thermal conductivity structure–property relationships of COFs (see Fig. 1 for an overview).
In the following, we first conduct a large-scale data analysis of the thermal conductivity structure–property relations of COFs and identify the deficiencies in commonly used descriptors. Next, we provide a deep learning model that accurately predicts the thermal conductivity of COFs. Subsequently, an analysis of the attention scores of the deep learning model uncovers the presence of dangling atoms in the crystalline network of COFs as a strong and so-far missing key predictor for thermal conductivity. Additional physics-based analysis sheds light on the important role of dangling atoms in lowering the thermal conductivity of COFs by disrupting heat transfer pathways. We close by utilizing the ML model for efficient high-throughput screening and identifying COFs with extreme thermal conductivities.
From the above initial analysis, we notice that the geometrical descriptors used here are insufficient to provide solid trends with respect to thermal conductivity. For example, with all the optimal descriptor values, i.e., a large density, a low pore size, a large void fraction, and an intermediate GSA, it is still not guaranteed that we will obtain a COF with a high κ. To elucidate this further, we choose four COFs – denoted by (i)–(iv) – and compare them in Fig. 2a–d.
COF (i) has the highest thermal conductivity in the dataset with κ = 4.025 W m−1 K−1. The structure has a relatively small pore size of 13.71 Å (see Fig. 2a), a medium density of 0.928 g cm−3 (see Fig. 2b), a relatively high void fraction of 0.844 (see Fig. 2c), and an intermediate surface area of 7381 m2 g−1. In contrast, the second COF (ii) possesses a similar pore size of 9.88 Å, a similar density of 0.938 g cm−3, a similar void fraction of 0.869, and a similar surface area of 7171 m2 g−1, but exhibits thermal transfer capabilities almost seven times lower than that of (i) with κ = 0.603 W m−1 K−1.
The third COF (iii) has a pore size of 33.73 Å in the intermediate-to-high range, a low density of 0.408 g cm−3, a high void fraction of 0.932, and a slightly larger surface area of 8741 m2 g−1. Following the trends described in the literature,33,48 the geometrical descriptors are not optimal for a high κ. Nonetheless, COF (iii) has a rather high κ = 1.96 W m−1 K−1. Lastly, COF (iv) has an exceptionally large pore size of 80.68 Å and a low density of 0.262 g cm−3 but a high void fraction of 0.958. Due to the extreme geometry, COF (iv) has a very low thermal conductivity of κ = 0.289 W m−1 K−1.
Furthermore, we trained classical ensemble regression models using these descriptors to predict thermal conductivity (detailed results provided in SI Section 2) and observed that the models yield poor prediction accuracy. In summary, the correlation analysis with the previously introduced descriptors and the presented examples illustrate the fact that the thermal conductivity structure–property relationships are complex and call for further examination.
To enable universal transfer learning, the transformer encoder in PMTransformer is pre-trained on an extremely large dataset of 1.9 million hypothetical porous materials to predict easily obtainable yet essential properties such as topology, void fraction, and building block prediction. This approach ensures the encoder captures critical information necessary for accurately predicting other, more complex properties in downstream tasks with much smaller datasets. For more details regarding the pre-training of PMTransformer, refer to ref. 46.
In this study, we leveraged the pre-trained transformer encoder of the PMTransformer model to perform transfer learning for the prediction of the thermal conductivity of COFs. We fine-tune the transformer model using the pre-trained weights as a starting point and jointly train a prediction head based on a multi-layer perceptron architecture to map the output of the transformer encoder to the thermal conductivity of COFs. We used the mean squared error (MSE) as a loss function for training. The dataset of 2471 COFs is randomly split into two subsets: 90% of the data is used for training (out of which 10% for validation), and the remaining data is used for testing. We repeat the training across five different random seeds, where each seed initializes the prediction head randomly while keeping the transformer initialized with the same pre-trained weights. Additional details of the training protocol and an ablation study are provided in SI Section 3.
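The data split and loss described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' training code: the function names `split_indices` and `mse`, the rounding of subset sizes, and the use of a seed for shuffling are all hypothetical choices (the text does not say whether the split itself changes between seeds).

```python
import random


def split_indices(n_samples, seed, train_frac=0.9, val_frac_of_train=0.1):
    """Shuffle sample indices and split them into train / validation / test.

    Assumption: 90% of the data forms the training portion, 10% of that
    portion is held out for validation, and the rest is the test set.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_trainval = int(round(train_frac * n_samples))
    n_val = int(round(val_frac_of_train * n_trainval))
    val = indices[:n_val]
    train = indices[n_val:n_trainval]
    test = indices[n_trainval:]
    return train, val, test


def mse(y_true, y_pred):
    """Mean squared error between reference and predicted conductivities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Example with the dataset size quoted in the text (2471 COFs).
train_idx, val_idx, test_idx = split_indices(2471, seed=0)
```

With these rounding choices, the 2471 structures divide into 2002 training, 222 validation, and 247 test samples; the exact counts used in the study may differ.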
We demonstrate the model's prediction accuracy on the test dataset in Fig. 3, where it achieves a goodness-of-fit R2 of 0.909 ± 0.006 and a mean absolute error (MAE) of 0.075 W m−1 K−1 for predicting κ. The model's strong performance suggests that there may be gaps in the feature sets typically used to describe COFs. It also highlights the possibility that additional key descriptors, beyond those commonly used, could play an important role in predicting thermal conductivity, thus warranting further exploration of structure–property relationships.
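For readers who want to reproduce the reported metrics on their own predictions, the following sketch computes R2 and MAE and aggregates a metric over repeated runs. It assumes that the "±" in the text denotes a standard deviation across the five seeds, which the text does not state explicitly; the helper names are hypothetical.

```python
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the units of the target (W m^-1 K^-1)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(values):
    """Mean and sample standard deviation of a metric over several runs."""
    return statistics.mean(values), statistics.stdev(values)
```

Calling `summarize` on the per-seed R2 values would yield a "mean ± spread" pair of the kind quoted above.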
Additionally, we identify the main branch of a COF as the shortest continuous path connecting the boundary points necessary for periodicity. We then classify all atoms that extend from the main branch but are not part of it as dangling mass. More details on how we classify each atom as either part of the main branch or part of a dangling side branch are presented in SI Section 4. We observe distinct patterns in the attention assigned to various atomic sites within a COF's graph representation, which correspond to the location of the dangling masses. Two representative examples are shown in Fig. 4 and 5.
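To make the notion of a dangling side branch concrete, the sketch below uses a deliberately simpler stand-in technique: iterative leaf pruning on the bond graph. This is not the shortest-path procedure of SI Section 4. It only removes tree-like side groups (for example –NO2 or –CN attached to a ring) and would not flag a pendant ring; the function name and the adjacency-dictionary input are hypothetical.

```python
def prune_dangling(adjacency):
    """Return the atoms removed by repeatedly deleting degree-1 nodes.

    `adjacency` maps each atom index to a list of bonded atom indices.
    Atoms that survive the pruning lie on cycles (or on paths between
    cycles) and are treated here as the backbone.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    removed = set()
    queue = [node for node, d in degree.items() if d <= 1]
    while queue:
        node = queue.pop()
        if node in removed:
            continue
        removed.add(node)
        for nbr in adjacency[node]:
            if nbr not in removed:
                degree[nbr] -= 1
                if degree[nbr] == 1:
                    queue.append(nbr)
    return removed


# Toy graph: a six-membered ring (atoms 0-5) carrying a nitro-like group,
# i.e. atom 6 bonded to ring atom 0 and to two terminal atoms 7 and 8.
ring_with_side_group = {
    0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0],
    6: [0, 7, 8], 7: [6], 8: [6],
}
dangling_atoms = prune_dangling(ring_with_side_group)
```

In the toy graph, the three side-group atoms are pruned while the ring remains, mirroring the qualitative picture of a functional group hanging off the main branch.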
Consider the first example (see Fig. 4a and b). The top row illustrates a COF structure (Fig. 4a) from the test dataset with a thermal conductivity of 1.105 W m−1 K−1 and no dangling masses except hydrogen atoms. Additionally, the atom-wise attention scores are uniformly distributed and low for all the carbon atoms with some elevated attention to the nitrogen atoms. In the second row, we illustrate a COF structure (Fig. 4b) with the same topology as the previous one, but with a much lower thermal conductivity of 0.533 W m−1 K−1. Its attention profile reveals that certain atom groups receive special attention from the transformer, specifically the –NO2 functional group on the benzene ring that is dangling from the main branch of the COF. Notably, the same groups of atoms that exhibit higher attention scores are also classified as dangling masses (see Fig. 4b attention profile & dangling mass).
In the second example (Fig. 5a and b), the COF structure on the top does not contain any dangling mass except hydrogen atoms and has a thermal conductivity of 2.715 W m−1 K−1 with a highly uniform attention profile (see Fig. 5a). Analogously, we pick a second COF structure with the same topology and comparable geometrical descriptors. Once more, we observe that the second structure contains a significant amount of dangling mass (i.e., –CN branches extending from the rings) with corresponding elevated attention scores and has a lower thermal conductivity of 1.611 W m−1 K−1.
The same trend is observed for numerous examples, with additional ones presented in the SI, Section 5. We hypothesize that this increased attention is indicative of the deep learning model's understanding that these dangling masses are significant in predicting thermal conductivity. Furthermore, it suggests that the presence of dangling masses disrupts the heat transfer pathways and thereby reduces the thermal conductivity through the material. A similar effect of dangling mass lowering the thermal conductivity has been reported previously in single polymer chains53 and amorphous polymers.54 This effect, while known in disordered systems, has not been examined in crystalline COFs. Here, we demonstrate that it remains relevant even in ordered, porous frameworks, inviting future investigation with high-fidelity modeling methods (such as density functional theory) to examine the underlying mechanism in more detail.
To further validate the structure–property relationship interpreted from the deep learning model, we examine the impact of dangling atoms on the vibrational density of states (VDOS) for the aforementioned contrasting COF examples. VDOS, which characterizes the distribution of vibrational modes in a system as a function of frequency, is known to impact thermal transfer properties. Overlaps in VDOS profiles between different atoms have been shown to affect these properties in both COFs55 and MOFs.56
Through MD simulations, the VDOS is calculated using the Fourier transform of the normalized velocity autocorrelation function of specific groups of atoms (see SI Section 6 for details). Fig. 4 and 5 (third column) show the VDOS profile of carbon, nitrogen, and oxygen atoms in the representative COF examples. For any COF, we define a VDOS overlap metric S as the ratio of the area under the curve (AUC) of the minimum VDOS across all atom types (at each frequency) and the AUC of the maximum VDOS across all atom types (at each frequency), i.e.,
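$$S = \frac{\int \min_{i}\, \mathrm{VDOS}_i(f)\, \mathrm{d}f}{\int \max_{i}\, \mathrm{VDOS}_i(f)\, \mathrm{d}f} \qquad (1)$$
where $i$ runs over the atom types and $f$ denotes the frequency.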
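The overlap metric S can be evaluated numerically from tabulated VDOS curves. The sketch below assumes the curves for all atom types are sampled on a common frequency grid and uses trapezoidal integration; both the function names and the dictionary-based input format are hypothetical, and the study's own implementation may differ.

```python
def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1)
    )


def vdos_overlap(freqs, vdos_by_type):
    """Ratio of the AUC of the pointwise minimum to that of the pointwise maximum.

    `vdos_by_type` maps an atom type (e.g. "C", "N", "O") to its VDOS values
    sampled at the frequencies in `freqs`.
    """
    curves = list(vdos_by_type.values())
    lower = [min(values) for values in zip(*curves)]
    upper = [max(values) for values in zip(*curves)]
    return trapezoid(lower, freqs) / trapezoid(upper, freqs)
```

By construction S lies between 0 (the atom types never vibrate at the same frequencies) and 1 (all VDOS curves coincide).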
In the COF structures, which lack or have minimal dangling atoms, the VDOS profiles of the atoms are broad and overlap significantly, e.g., S ≈ 0.41 and 0.49 for examples in Fig. 4a and 5a, respectively. This results in an even distribution of vibrational modes across a wide frequency spectrum for all atoms. In contrast, the COFs with substantial dangling atoms exhibit a VDOS profile with minimal overlap, e.g., S ≈ 0.08 and 0.09 for examples in Fig. 4b and 5b, respectively. Notably, dangling atoms significantly reduce the VDOS overlap in both low-frequency (0 to 20 THz) and high-frequency (40 to 60 THz) modes. The vibrational modes of individual atoms in these COFs are confined to much narrower frequency bands, which leads to a less harmonized VDOS profile. The lack of overlap suggests that the vibrations are highly localized around the dangling atoms, which causes the phonon waves to be scattered, disrupting their ability to transfer heat efficiently through the material.58 We interpret the mismatch in the VDOS profiles as an energy barrier for phonon transport, where phonons encounter resistance, preventing smooth energy transfer across atoms. This suggests that the absence of dangling mass is crucial for enhancing thermal conductivity in COFs. Dangling atoms create these mismatches in VDOS, which in turn hinders phonon transport and reduces thermal conductivity. A similar trend is observed in additional pairs of COFs (Fig. S9, SI) and the correlation between κ and S for a selected representative set of eight COFs is presented in Fig. S6c (SI).
To further confirm the effect of dangling atoms on phonon dispersion in COFs, we calculate the phonon spectral energy density (pSED) of the example COFs and their dangling counterparts, as presented in Section 5 in the SI. The pSED of COFs containing more dangling atoms exhibits higher magnitudes and broadening bands, indicating stronger anharmonicity and vibrational scattering in the vibrational modes, consequently resulting in reduced thermal conductivity.
To quantify the role of dangling masses in the structure–property relationship for thermal conductivity of COFs, we introduce the dangling mass ratio (DMR). The DMR ∈ [0, 1] quantifies the ratio between the mass of dangling atoms and the total mass of the COF, i.e.,
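$$\mathrm{DMR} = \frac{m_{\mathrm{dangling}}}{m_{\mathrm{total}}} \qquad (2)$$
where $m_{\mathrm{dangling}}$ and $m_{\mathrm{total}}$ denote the mass of the dangling atoms and the total mass of the COF, respectively. Higher DMR values indicate a greater proportion of dangling masses relative to the COF's total mass. In SI Section 1.3, we qualitatively show the relationship between DMR, the VDOS overlap S, and κ.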
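Once each atom has been labelled as dangling or not, the DMR reduces to a mass-weighted fraction. The following sketch assumes the labels are already available (the classification itself is described in SI Section 4 and is not reproduced here); the function name, the input format, and the small table of standard atomic masses are illustrative choices.

```python
# Standard atomic masses in g/mol for the elements used in this example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def dangling_mass_ratio(elements, is_dangling):
    """Mass of atoms flagged as dangling divided by the total mass.

    `elements` is a list of element symbols and `is_dangling` a parallel
    list of booleans supplied by the caller.
    """
    total_mass = sum(ATOMIC_MASS[e] for e in elements)
    dangling_mass = sum(
        ATOMIC_MASS[e] for e, flag in zip(elements, is_dangling) if flag
    )
    return dangling_mass / total_mass


# Toy fragment: two backbone carbons plus a dangling nitro-like group.
toy_dmr = dangling_mass_ratio(
    ["C", "C", "N", "O", "O"], [False, False, True, True, True]
)
```

A structure with no flagged atoms gives DMR = 0, and the ratio approaches 1 only if nearly all mass sits in side branches.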
We then introduce DMR as an additional feature, alongside pore size, density, void fraction, and surface area, and train classical ensemble regression models, such as the Random Forest,59 Gradient Boosting,60 XGBoost,61 and AdaBoost62 and outline their performances in Table 1. These simpler models are employed not to compete with the PMTransformer in predictive accuracy, but to provide directly interpretable feature importance scores and help in understanding the relevance of DMR. In contrast, extracting meaningful and quantitative feature importance from the PMTransformer is non-trivial due to its complex model architecture and the absence of an explicit correspondence between input features and learned representations. To assess the importance of individual features, including DMR, we analyzed their Gini importance and conducted a permutation feature importance analysis.63 These values quantify the contribution of each feature to the overall predictive power of the model. The Gini importance measures how much each feature reduces the Gini impurity or randomness when making predictions in the ensemble model.59 On the other hand, permutation feature importance measures the impact of a feature by assessing how much the model's performance decreases when the values of that feature are randomly shuffled. A greater drop in performance indicates that the feature is more important in predicting the target variable.59 We note that for the best performing regressor, the Random Forest, DMR emerges as the second most influential predictor with 27.8% Gini importance, closely following behind density (see Table 2). Similarly, the permutation feature importance analysis also highlights DMR as a key feature, further supporting its significant role in predicting thermal conductivity, with its importance consistently ranking just below density. 
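The permutation feature importance described above can be illustrated without any particular regression library. The sketch below is a generic, minimal version under stated assumptions: it scores a model by R2, shuffles one feature column at a time, and averages the resulting drop in score over several repeats. The function names, the number of repeats, and the toy model are hypothetical and unrelated to the Random Forest used in the study.

```python
import random


def r2(y_true, y_pred):
    """Coefficient of determination used as the performance score."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R2 when each feature column is randomly shuffled.

    `predict` maps a list of feature rows to a list of predictions;
    `X` is a list of rows (lists of floats) and `y` the reference targets.
    """
    rng = random.Random(seed)
    baseline = r2(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(X):
    """Toy model that depends only on the first feature."""
    return [2.0 * row[0] for row in X]


X_toy = [[float(i), float((7 * i) % 5)] for i in range(8)]
y_toy = [2.0 * row[0] for row in X_toy]
toy_importances = permutation_importance(toy_predict, X_toy, y_toy)
```

In the toy case, shuffling the unused second feature leaves the score unchanged, so its importance is zero, whereas shuffling the first feature degrades the fit. The same logic underlies the ranking of density, DMR, and the other descriptors.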
Finally, we conducted SHAP analysis64 on the Random Forest to better understand the impact of each feature on the individual test data points. As shown in Fig. 6, the analysis reveals that higher DMR values are associated with lower predicted κ, confirming the trend we observed. The consistency of these findings across different methods—Gini, permutation feature importance, and SHAP—reinforces the importance of DMR as a key factor in predicting thermal conductivity. We note that the computed importance scores are inherently linked to the design space and dataset introduced by Mercado et al.47 Specifically, if the dataset did not contain structures with dangling masses, the DMR feature would naturally show little to no importance. Thus, the relevance of DMR as a descriptor depends on its variability within the dataset under consideration.
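Table 1 Performance (R2) of the classical ensemble regression models trained with DMR as an additional feature

| Regression model | R2 |
|---|---|
| Random Forest | 0.728 ± 0.069 |
| Gradient Boosting | 0.688 ± 0.070 |
| XGBoost | 0.691 ± 0.035 |
| AdaBoost | 0.416 ± 0.125 |

Table 2 Gini importance and permutation feature importance (PFI) of each feature for the Random Forest

| Feature | Gini importance | PFI |
|---|---|---|
| Density | 0.397 | 0.670 |
| DMR | 0.278 | 0.511 |
| LPD | 0.185 | 0.455 |
| Void Fraction | 0.070 | 0.093 |
| GSA | 0.069 | 0.022 |
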
638 two-dimensional COFs in the unlabelled dataset by Lan et al.65 This dataset covers a vastly different design space with a variety of linkers and knots previously unseen by our trained model.
The screening is performed with the objective of facilitating the identification of COFs with extremely high or low thermal conductivity without resorting to computationally expensive simulations. We observe an average screening rate of 0.07 seconds per COF structure (see SI Section 7 for details on computational runtimes), thereby enabling a speedup of almost seven orders of magnitude compared to MD-based screening.
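The order of magnitude of this speedup can be checked with a back-of-the-envelope calculation from figures quoted in this article (1.3 million CPU-hours for 2471 COFs, 0.07 seconds per ML prediction). The sketch assumes that MD cost is counted in CPU-seconds and compared directly with the ML inference time, which the text does not specify; the variable names are illustrative.

```python
import math

cpu_hours_total = 1.3e6      # MD cost of the labelled dataset (from the text)
n_cofs = 2471                # number of COFs in that dataset (from the text)
ml_seconds_per_cof = 0.07    # average ML screening time (from the text)

# Assumption: average MD cost per structure, expressed in CPU-seconds.
md_cpu_seconds_per_cof = cpu_hours_total / n_cofs * 3600.0

speedup = md_cpu_seconds_per_cof / ml_seconds_per_cof
orders_of_magnitude = math.log10(speedup)
```

Under this assumption the estimate lands in the vicinity of seven orders of magnitude; the exact figure depends on how CPU time and wall-clock time are compared.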
Once potential candidates with extreme thermal conductivities are identified using the PMTransformer, we then calculate their thermal conductivities through MD simulations for these selected few, rather than for the entire datasets. This approach circumvents reliance on the PMTransformer's absolute predictions by validating promising candidates with high-fidelity MD calculations. The results are summarized in Fig. 7. Our screening process on the dataset by Mercado et al.47 revealed COFs exhibiting thermal conductivities up to twice the maximum value observed in the training set, but stemming from the same design space. By screening the dataset by Lan et al.,65 we discover COFs with thermal conductivities five times as high as the ones in our training set. Additionally, we discovered several COFs with thermal conductivities lower than any values previously recorded in our dataset. A visual inspection reveals that these COFs feature a combination of large pore sizes, low density, and a substantial presence of dangling masses. For all the identified COFs, we observe that thermal conductivity shows a strong trend with both dangling mass ratio (DMR) and VDOS overlap ratio (S), which is in agreement with the previous observations.
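The candidate-selection step of this workflow amounts to ranking the predicted conductivities and forwarding only the extremes to MD. A minimal sketch follows, assuming predictions are stored in a dictionary keyed by a structure identifier; the function name, the identifiers, and the choice of `k` are hypothetical, since the text does not state how many candidates were selected.

```python
def select_extremes(predictions, k):
    """Return the k lowest and k highest predicted structures.

    `predictions` maps a structure identifier to its predicted thermal
    conductivity; both returned lists are ordered by ascending prediction.
    """
    ranked = sorted(predictions, key=predictions.get)
    return ranked[:k], ranked[-k:]


# Toy predictions (made-up identifiers and values, for illustration only).
toy_predictions = {"cof_a": 0.3, "cof_b": 2.1, "cof_c": 0.9, "cof_d": 4.0}
lowest, highest = select_extremes(toy_predictions, k=1)
```

Only the structures in `lowest` and `highest` would then be passed on to MD for validation, which is what keeps the overall screening cost small.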
Fig. 7 High-throughput screening. (a) Distribution of κ in the training dataset. Dashed lines indicate minimum, mean, and maximum κ across the training dataset. (b and c) Distribution of κ of selected COF candidates identified in high-throughput screening for low and high κ in the unlabeled datasets of (b) Mercado et al.47 and (c) Lan et al.65 While the former dataset shares the same design space as the training dataset, the latter has a different design space. All thermal conductivities shown here are computed using MD. (d) Representative COFs identified through high-throughput screening with κ lower and higher than the minimum and maximum, respectively, across the training dataset. Also indicated are their corresponding DMR and S values.
Further analysis using the transformer's attention mechanism revealed that COFs with higher amounts of dangling atoms exhibited lower thermal conductivities due to disrupted heat transfer pathways, identifying a novel and significant predictor of thermal conductivity. The random forest regressor, identified as the best-performing regression ensemble model, was used primarily as a simple tool to assess feature importance rather than to compare predictive performance with the PMTransformer. It was analyzed using Gini importance, permutation feature importance, and SHAP values. These analyses consistently highlighted the dangling mass ratio as the second most important predictor of thermal conductivity. The vibrational density of states analysis further supported these findings, showing that dangling masses introduce mismatched vibrational modes that hinder thermal transfer. While our current analysis focuses on two-dimensional monolayer COFs, additional mechanisms may emerge in three-dimensional architectures and warrant a separate investigation.
We further leveraged the deep learning model to efficiently screen thousands of COFs, both within and beyond the design space we analyzed, to identify candidates with extreme thermal conductivities. This approach provides a rapid and reliable method as well as valuable insights for designing COFs with tailored thermal characteristics. To support further research, we release the dataset of COF thermal conductivities and encourage the research community to utilize and expand upon it. Finally, we emphasize the need to bridge high interpretability with high accuracy in machine learning models to navigate the complex structure–property space of COFs and other nanoporous materials, thereby enabling the design of optimal materials for applications such as gas separation and thermal management.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5dd00126a.
Footnote
† These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025