Amirhadi Alesadi†a, Zhaofan Li†b, Amara Arshadc and Wenjie Xia*b
aDepartment of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58108, USA
bDepartment of Aerospace Engineering, Iowa State University, Ames, Iowa 50011, USA. E-mail: wxia@iastate.edu
cMaterials and Nanotechnology, North Dakota State University, Fargo, ND 58108, USA
First published on 20th August 2025
We present a cheminformatics model for predicting the glass-transition temperature (Tg) of conjugated polymers using four interpretable molecular descriptors. The model achieves high predictive accuracy (R² ≈ 0.85), and molecular dynamics simulations validate the descriptor–Tg relationships. This integrated framework enables rational design of conjugated polymers with tailored glass-transition properties.
Predictive frameworks for polymer materials generally fall into two categories: models based on geometric features (e.g., connectivity and topology)9,11 and cheminformatics-based approaches12 that utilize physicochemical descriptors. In our previous work,9 we developed a machine learning (ML) model trained on a diverse dataset of conjugated polymers (CPs), which achieved an R² of ∼0.85 for Tg prediction and identified key structural patterns such as side-chain composition and aromatic ring connectivity. However, such ML models often rely on geometric heuristics and may lack physical interpretability.
While quantitative structure–property relationship (QSPR) methods have shown excellent performance in predicting the thermal and mechanical properties of non-conjugated polymers,13 their application to CPs has focused mostly on optoelectronic performance and mechanical flexibility. In contrast, the use of QSPR to predict the Tg of CPs remains limited. It is unclear whether descriptor-based models can reliably capture Tg trends in CPs, particularly given their complex structures and rigidity effects. This motivates the present study, which combines interpretable molecular descriptors with MD simulation validation to build a physically grounded QSPR model for Tg prediction in CPs.
In this study, we build on our previous compiled dataset9 of CPs to develop a QSPR model that predicts Tg directly from the chemical structure of the building block. The model identifies key descriptors related to backbone flexibility, electronic delocalization, and steric characteristics, offering insights beyond geometric heuristics used in prior ML-based models. We assess predictive performance and compare descriptor relevance to features highlighted in earlier work.9 To support the physical basis of these descriptors, we also perform molecular dynamics (MD) simulations that elucidate how backbone rigidity influences chain mobility and packing, reinforcing the mechanistic understanding of Tg in CPs.
The dataset, previously used in our ML model,9 consists of 154 polymers and small-molecule acceptor units, spanning a Tg range from −30 °C to 220 °C (Fig. 1). It includes both flexible and rigid polymers, ensuring diverse structural representation. The majority of polymers exhibit Tg values between 0 °C and 170 °C, encompassing a broad spectrum of structural variations, from relatively flexible backbones influenced by side-chain mobility to more rigid systems with fused-ring structures and strong intermolecular interactions. This diversity allows the structure–Tg relationship to be mapped comprehensively and helps the QSPR model capture key thermal-behavior trends. While this dataset represents one of the largest available for Tg prediction in CPs, we acknowledge its limitations in size and origin. The proposed descriptor-based framework is modular and can be readily expanded as new experimental data become available, which will further improve model accuracy and generalizability. Further details on dataset distribution and molecular structures are provided in the SI (Table S1).
Fig. 1 Distribution of the experimental glass-transition temperature (Tg) values of conjugated polymers used in this study. The solid red line represents the kernel density estimation.
To select and evaluate the QSPR regression models, we use R² and Q²(5-fold) as statistical measures of model accuracy and generalizability. While R² quantifies how well the model fits the training data, it is not by itself sufficient to confirm predictive reliability, because adding more descriptors can artificially inflate R² through overfitting. To guard against this, we develop multiple QSPR models, each incorporating between 1 and 10 descriptors, and assess their performance using training R², cross-validation Q²(5-fold), and test-set R². As shown in Fig. 2A, increasing the number of descriptors initially improves both R² and Q²(5-fold), indicating a better model fit and stronger generalizability. However, beyond four descriptors the test-set R² declines, suggesting that additional descriptors contribute no meaningful information and instead introduce noise, leading to overfitting. The model with four descriptors achieves the best balance between predictive power and generalizability, making it the optimal choice for Tg prediction in CPs:
Tg (°C) = −130.14 − 1240.36 RBF + 238.95 SpMin1_Bh(i) − 84.18 HATS6e + 27.10 B08[N–S]
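The descriptor-count sweep described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (only the sample size, 154, matches the dataset), the models are plain ordinary least squares, descriptors are added in a fixed, pre-ranked order (the variable-selection procedure is not described here), and Q²(5-fold) is computed as 1 − PRESS/TSS on the training set.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def q2_kfold(X, y, k=5, seed=0):
    """Cross-validated Q^2 = 1 - PRESS / TSS; each sample is predicted by a
    model fitted without the fold that contains it."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        y_cv[fold] = predict(fit_ols(X[train], y[train]), X[fold])
    return 1.0 - np.sum((y - y_cv) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Synthetic data (assumption): 10 candidate descriptors, only the first four
# carry signal. These are NOT the paper's data.
rng = np.random.default_rng(42)
n, p = 154, 10
X = rng.normal(size=(n, p))
y = 60 + X[:, :4] @ np.array([25.0, 15.0, -10.0, 5.0]) + rng.normal(scale=8.0, size=n)

# 70/30 training/test split
perm = rng.permutation(n)
tr, te = perm[: int(0.7 * n)], perm[int(0.7 * n):]

results = []  # (R2_train, Q2_5fold, R2_test) for models using the first m descriptors
for m in range(1, p + 1):
    Xtr, Xte = X[tr, :m], X[te, :m]
    coef = fit_ols(Xtr, y[tr])
    results.append((r2(y[tr], predict(coef, Xtr)),
                    q2_kfold(Xtr, y[tr], k=5),
                    r2(y[te], predict(coef, Xte))))
    print(f"{m:2d} descriptors: R2_train={results[-1][0]:.3f} "
          f"Q2_5fold={results[-1][1]:.3f} R2_test={results[-1][2]:.3f}")
```

With nested OLS models the training R² can only rise as descriptors are added, whereas Q²(5-fold) and test-set R² stop improving once noise-only columns enter, which is the pattern used to choose the four-descriptor model.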
Fig. 2 (A) Statistical analyses of R² and Q² as a function of the number of descriptors for QSPR models with 1–10 descriptors, for training and test sets. The black dashed line indicates the selected 4-descriptor model. Correlation plots between the observed and predicted Tg values of polymers for (B) the 4-variable QSPR model, (C) Alesadi's ML model,9 and (D) Xie's predictive model.11
Here, RBF is the rotatable bond fraction, i.e., the fraction of bonds that allow free rotation. SpMin1_Bh(i) is the smallest eigenvalue of the Burden matrix weighted by ionization potential. HATS6e is an H-GETAWAY descriptor derived from the distribution of atomic electronegativities.15,16 B08[N–S] encodes the presence or absence of a nitrogen–sulfur atom pair at a topological distance of eight bonds.17 Y-randomization tests and applicability-domain analysis confirm that the QSPR model is statistically robust, avoids spurious correlations, and is reliable for Tg prediction. Key dataset details and validation metrics are provided in the SI. The following sections further discuss its predictive performance, molecular interpretation, and applicability in the design of CPs.
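The idea behind a Y-randomization test can be illustrated with a short NumPy sketch: the responses are shuffled, the same descriptors are refitted, and the scrambled R² values are compared with the real one. The synthetic data, the sample size (roughly a 70% training split of 154), and the number of permutations (200) are assumptions for illustration; a model free of chance correlation should clearly beat every scrambled fit.

```python
import numpy as np

def ols_r2(X, y):
    """Training R^2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_iter=200, seed=0):
    """Return the real-model R^2 and the R^2 of models refitted on shuffled y."""
    rng = np.random.default_rng(seed)
    r2_true = ols_r2(X, y)
    r2_scrambled = np.array([ols_r2(X, rng.permutation(y)) for _ in range(n_iter)])
    return r2_true, r2_scrambled

# Synthetic stand-in for a four-descriptor training set (not the paper's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(108, 4))
y = X @ np.array([20.0, 10.0, -8.0, 4.0]) + rng.normal(scale=5.0, size=108)

r2_true, r2_scr = y_randomization(X, y)
print(f"real R2 = {r2_true:.3f}; scrambled R2: mean {r2_scr.mean():.3f}, max {r2_scr.max():.3f}")
```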
Fig. 2B shows the correlation between predicted and experimental Tg values using a 70/30 training/test split. The QSPR model achieves R² = 0.89 for training and 0.85 for testing, confirming strong predictive power. With just four descriptors, the model effectively captures structural diversity and predicts Tg directly from the monomer structure. Compared to our previous ML-based model9 (R² = 0.85 overall, Fig. 2C), the QSPR approach offers slightly higher accuracy, particularly in distinguishing subtle variations in backbone rigidity and electronic effects. Xie's empirical model,11 based on a single mobility parameter (Fig. 2D), performs well for CPs with low-to-moderate Tg but achieves only R² ≈ 0.4 on our dataset and struggles in particular with high-Tg CPs (Tg > 150 °C). This suggests that inter-chain interactions, backbone rigidity, and steric effects play larger roles in these materials, which single-parameter models may not fully capture. Thus, our QSPR model provides a more comprehensive and accurate framework for predicting Tg across diverse CPs, making it a reliable tool for materials design. It should be noted that this study is computational in nature and relies on the experimental Tg values of CPs (including homopolymers and donor–acceptor polymers with similar structural features of alternating copolymers) reported in the literature. While the model demonstrates strong predictive performance relative to established baselines, future validation using newly synthesized CPs would further strengthen its general applicability.
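Because the model is linear, a prediction amounts to evaluating the fitted four-descriptor equation on a monomer's descriptor values. The helper below is a hypothetical convenience function (`predict_tg` is our name, not part of any library); the descriptor values in the usage lines are arbitrary placeholders, and real inputs must come from the same descriptor software used to fit the model.

```python
def predict_tg(rbf, spmin1_bh_i, hats6e, b08_ns):
    """Tg (deg C) from the four-descriptor QSPR equation.

    Inputs are assumed to be computed exactly as in the descriptor software
    used to fit the model; values from other implementations may not match.
    """
    return (-130.14
            - 1240.36 * rbf
            + 238.95 * spmin1_bh_i
            - 84.18 * hats6e
            + 27.10 * b08_ns)

# Placeholder descriptor values, for illustration only.
# B08[N-S] is binary, so switching it on shifts the prediction by its coefficient.
delta = predict_tg(0.1, 2.0, 0.5, 1) - predict_tg(0.1, 2.0, 0.5, 0)
print(round(delta, 2))  # 27.1
```

The coefficients also indicate the sensitivity of the prediction: an increase of 0.01 in RBF lowers the predicted Tg by about 12.4 °C, whereas switching B08[N–S] from 0 to 1 raises it by 27.1 °C.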
We next discuss each of the identified descriptors and their role in QSPR model Tg prediction for CPs.
RBF. The rotatable bond fraction (RBF) quantifies backbone flexibility by measuring the fraction of freely rotating bonds. A higher RBF value indicates greater conformational freedom, leading to increased chain mobility and free volume, which in turn lowers Tg. Conversely, a lower RBF reflects a more rigid backbone that restricts segmental motion and increases Tg. This trend aligns with our previous ML-based Tg model, where the alkyl side-chain fraction showed a negative correlation with Tg due to its disruption of chain packing. However, our QSPR model represents backbone flexibility more comprehensively by accounting for both side-chain effects and steric hindrance from the backbone. The inverse relationship between RBF and Tg is particularly evident in systems with conjugated aromatic units, fused rings, and sterically hindered groups, all of which reduce the RBF and elevate Tg by restricting rotational freedom. The extent of this effect depends on whether aromatic rings are isolated, fused, or bridged: isolated rings still allow some dihedral rotation, whereas fused and bridged rings impose significant steric constraints, further reducing the RBF and increasing Tg. Prior studies18–20 have shown that CPs with extended fused-ring backbones (low RBF) exhibit higher Tg owing to limited segmental motion, supporting the RBF as a reliable measure of structural rigidity. By contrast, alkyl side-chains counteract backbone rigidity by introducing rotatable bonds, thereby increasing the RBF and lowering Tg. Longer side-chains expand free volume and reduce inter-chain interactions, lowering Tg further.10
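A simplified RBF calculation is sketched below in plain Python. Several choices are assumptions, not taken from the paper: a bond counts as rotatable if it is a single, non-ring bond between two non-terminal heavy atoms; ring membership is detected by checking whether the two atoms remain connected when the bond is removed; and the denominator is the total bond count including bonds to hydrogen. Descriptor packages apply more detailed rules, so the numbers are only illustrative.

```python
from collections import defaultdict, deque

def _connected(adj, start, goal, skip_edge):
    """BFS reachability from start to goal while ignoring one undirected edge."""
    seen, queue = {start}, deque([start])
    while queue:
        a = queue.popleft()
        if a == goal:
            return True
        for b in adj[a]:
            if {a, b} == skip_edge or b in seen:
                continue
            seen.add(b)
            queue.append(b)
    return False

def rotatable_bond_fraction(heavy_bonds, h_counts):
    """Assumed RBF = rotatable bonds / all bonds (hydrogen bonds included).

    heavy_bonds: list of (i, j, order) between heavy atoms (1.5 = aromatic).
    h_counts: number of hydrogens attached to each heavy atom.
    """
    adj = defaultdict(set)
    for i, j, _ in heavy_bonds:
        adj[i].add(j)
        adj[j].add(i)
    n_rot = 0
    for i, j, order in heavy_bonds:
        in_ring = _connected(adj, i, j, {i, j})      # still connected => ring bond
        terminal = len(adj[i]) == 1 or len(adj[j]) == 1
        if order == 1 and not in_ring and not terminal:
            n_rot += 1
    return n_rot / (len(heavy_bonds) + sum(h_counts))

butane = ([(0, 1, 1), (1, 2, 1), (2, 3, 1)], [3, 2, 2, 3])
hexane = ([(k, k + 1, 1) for k in range(5)], [3, 2, 2, 2, 2, 3])
benzene = ([(k, (k + 1) % 6, 1.5) for k in range(6)], [1] * 6)
for name, (bonds, hs) in {"butane": butane, "hexane": hexane, "benzene": benzene}.items():
    print(name, round(rotatable_bond_fraction(bonds, hs), 3))
```

Under these rules, lengthening the alkyl chain from butane to hexane raises the fraction from 1/13 to 3/19, while the aromatic ring contributes no rotatable bonds, mirroring the side-chain and fused-ring trends discussed above.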
To establish a molecular-level understanding of the relationship between backbone flexibility and Tg, we conducted coarse-grained molecular dynamics (CG-MD) simulations21,22 using a chemistry-specific CG model informed by the all-atom (AA) model of PDPPT (Fig. 3A and B, detailed in the SI). Notably, PDPPT is included in the QSPR dataset (Fig. 1), enabling a direct conceptual link between descriptor-based predictions and simulation results. The simulations systematically varied the backbone torsional stiffness (Krotation) and bending rigidity (Krigidity) to quantify their respective effects on thermal behavior. Fig. 3C presents density vs. temperature curves for different backbone flexibility conditions, where Tg is determined from the intersection of linear fits to the low-temperature (glassy) and high-temperature (melt) regimes. According to free volume theory, increasing backbone flexibility by lowering Krigidity suppresses chain packing efficiency, increases available free volume, and thereby reduces Tg.23 Reduced torsional stiffness slightly increases density and modestly lowers Tg, while reduced bending rigidity causes a larger density increase and a pronounced Tg drop. These results align with previous CG-MD simulations by Xu and co-workers,24,25 who used the generalized entropy theory to show that backbone stiffness modulates Tg by altering configurational entropy, packing frustration, and segmental relaxation, supporting the entropy-driven mechanism observed here.
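The Tg extraction from a density–temperature curve can be sketched as two least-squares lines, one fitted in the glassy regime and one in the melt regime, whose intersection defines Tg. The fit windows are user-chosen assumptions, and the data below are synthetic with a built-in slope change at 400 K, so the sketch only checks the arithmetic rather than reproducing Fig. 3C.

```python
import numpy as np

def tg_from_density(T, rho, T_glass_max, T_melt_min):
    """Tg as the intersection of linear fits to rho(T) in two user-chosen windows:
    T <= T_glass_max (glassy regime) and T >= T_melt_min (melt regime)."""
    T, rho = np.asarray(T, dtype=float), np.asarray(rho, dtype=float)
    glass, melt = T <= T_glass_max, T >= T_melt_min
    m_g, c_g = np.polyfit(T[glass], rho[glass], 1)  # slope, intercept
    m_m, c_m = np.polyfit(T[melt], rho[melt], 1)
    # m_g*T + c_g = m_m*T + c_m  =>  T = (c_m - c_g) / (m_g - m_m)
    return (c_m - c_g) / (m_g - m_m)

# Synthetic density curve with a kink at 400 K (illustrative numbers only).
T = np.arange(200.0, 601.0, 20.0)
rho = 1.0 + np.where(T < 400.0, -2e-4, -6e-4) * (T - 400.0)
tg_est = tg_from_density(T, rho, T_glass_max=340.0, T_melt_min=460.0)
print(round(tg_est, 1))  # 400.0
```

With real, noisy MD data the estimate depends on how far the fit windows sit from the transition, so it is worth checking its sensitivity to those bounds.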
To further assess the role of backbone flexibility in segmental mobility, we calculate the Debye–Waller factor (⟨u²⟩), which quantifies fast segmental motion on picosecond timescales (Fig. 3D). A higher ⟨u²⟩ indicates increased local mobility, corresponding to greater free volume and enhanced chain dynamics.9,14 Across all cases, lowering Krotation leads to a moderate increase in ⟨u²⟩, whereas decreasing Krigidity significantly enhances local mobility, reflecting weaker constraints on local molecular motion. The inset of Fig. 3D shows mean-squared displacement (MSD) curves, where the vertical dashed line at t = 4 ps marks the caging time at which ⟨u²⟩ is determined. This trend is consistent with a plasticizer-like effect, in which increased backbone flexibility promotes molecular rearrangement, leading to looser packing and lower Tg.26 These findings confirm that backbone flexibility directly influences Tg through its effects on molecular packing and segmental mobility, supporting the inverse correlation between the RBF and Tg observed in our QSPR model.
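A minimal NumPy sketch of the ⟨u²⟩ calculation follows: the MSD is averaged over beads and time origins and read off at the 4 ps caging time by linear interpolation. The array layout (frames × beads × 3, unwrapped coordinates), the frame spacing, and the synthetic uniform-drift trajectory are assumptions for illustration; a real CG trajectory would show a ballistic rise followed by a caging plateau.

```python
import numpy as np

def msd_multi_origin(positions, max_lag):
    """MSD versus lag (in frames), averaged over beads and all time origins.
    positions: unwrapped coordinates with shape (n_frames, n_beads, 3)."""
    msd = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = positions[lag:] - positions[:-lag]
        msd[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return msd

def debye_waller(positions, dt_ps, t_cage_ps=4.0):
    """<u^2> taken as the MSD at the caging time, linearly interpolated."""
    max_lag = int(np.ceil(t_cage_ps / dt_ps)) + 1
    if positions.shape[0] <= max_lag:
        raise ValueError("trajectory too short for the requested caging time")
    msd = msd_multi_origin(positions, max_lag)
    return float(np.interp(t_cage_ps, np.arange(max_lag + 1) * dt_ps, msd))

# Synthetic check: 50 beads drifting at 0.1 length units per ps along x,
# so MSD(t) = (0.1 t)^2 and MSD(4 ps) = 0.16.
rng = np.random.default_rng(0)
start = rng.uniform(0.0, 30.0, size=(50, 3))
t = np.arange(11)[:, None, None]            # 11 frames, 1 ps apart
pos = start[None, :, :] + t * np.array([0.1, 0.0, 0.0])
u2 = debye_waller(pos, dt_ps=1.0)
print(round(u2, 4))  # 0.16
```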
SpMin1_Bh(i). This descriptor represents the smallest eigenvalue of the Burden matrix weighted by ionization potential and shows a positive correlation with Tg. This trend may suggest that higher ionization potential—often associated with more electron-deficient and less polarizable monomers—correlates with reduced chain mobility. One possible explanation is that electron-deficient backbones exhibit greater electronic rigidity or limit π-electron delocalization, which in turn constrains segmental motion and elevates Tg. Alternatively, this correlation may arise from changes in chain packing or interchain interactions driven by electronic effects. These factors will be investigated separately in future studies to better understand the role of electronic properties in governing polymer thermal behavior.
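To make the descriptor concrete, the sketch below builds a simplified Burden-type matrix for a hydrogen-depleted ring and takes its smallest eigenvalue. Assumptions: diagonal entries are first ionization energies scaled by the carbon value, bonded pairs receive 0.1 × bond order (1.5 for aromatic bonds), and all non-bonded pairs receive 0.001, in the spirit of Burden's construction. Descriptor software applies further conventions, so these numbers will not reproduce SpMin1_Bh(i) values; thiophene and thiazole serve only as examples.

```python
import numpy as np

# First ionization energies (eV), standard tabulated values for these elements.
IONIZATION_EV = {"C": 11.26, "N": 14.53, "O": 13.62, "S": 10.36}

def burden_matrix(atoms, bonds, weights=IONIZATION_EV, ref="C"):
    """Simplified Burden-type matrix for a hydrogen-depleted graph (assumed conventions):
    diagonal = weight / weight of carbon; bonded pairs = 0.1 * bond order;
    all other off-diagonal pairs = 0.001."""
    n = len(atoms)
    B = np.full((n, n), 0.001)
    for idx, symbol in enumerate(atoms):
        B[idx, idx] = weights[symbol] / weights[ref]
    for i, j, order in bonds:
        B[i, j] = B[j, i] = 0.1 * order
    return B

def sp_min1(B):
    """Smallest eigenvalue; eigvalsh returns eigenvalues in ascending order."""
    return float(np.linalg.eigvalsh(B)[0])

ring = [(k, (k + 1) % 5, 1.5) for k in range(5)]  # five-membered aromatic ring
thiophene = burden_matrix(["S", "C", "C", "C", "C"], ring)
thiazole = burden_matrix(["S", "C", "N", "C", "C"], ring)
for name, B in [("thiophene", thiophene), ("thiazole", thiazole)]:
    print(name, round(sp_min1(B), 4))
```

As a sanity check, the smallest eigenvalue of any symmetric matrix cannot exceed its smallest diagonal entry.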
HATS6e. HATS6e is a 3D H-GETAWAY descriptor that encodes a leverage-weighted autocorrelation of atomic Sanderson electronegativities at a topological lag of six bonds, capturing both spatial and electronic characteristics of the molecular structure.27,28 In our model, HATS6e exhibits a negative correlation with Tg, suggesting that higher charge localization, reflected by greater variation in electronegativity along the polymer backbone, may reduce intermolecular cohesion and increase chain mobility. This effect may result from disrupted π–π stacking in conjugated systems or weakened dipolar interactions in polar polymers. This trend is supported by experimental observations that highly fluorinated CPs, which carry strongly electronegative substituents, tend to pack poorly (lower crystallinity) and exhibit greater chain mobility (potentially lower Tg).29 Our results are also consistent with previous MD and ML-based9 findings that the presence of halogen atoms (e.g., fluorine and chlorine) correlates negatively with Tg, likely due to increased free volume and reduced cohesion.
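The HATS construction can be sketched from its GETAWAY definition: the leverages h_ii are the diagonal of the molecular influence matrix H = M(MᵀM)⁻¹Mᵀ built from centred 3D coordinates, and HATS_k sums (w_i h_ii)(w_j h_jj) over atom pairs separated by k bonds. The geometry and weights below are hypothetical (descriptor packages use tabulated, carbon-scaled Sanderson electronegativities on optimized, hydrogen-filled structures), and using a pseudo-inverse so that planar inputs do not fail is our own choice.

```python
import numpy as np
from collections import deque

def topological_distances(n, bonds):
    """All-pairs shortest-path bond counts via BFS on an unweighted graph."""
    adj = [[] for _ in range(n)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    D = np.full((n, n), -1, dtype=int)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if D[s, b] < 0:
                    D[s, b] = D[s, a] + 1
                    queue.append(b)
    return D

def leverages(coords):
    """Diagonal of H = M (M^T M)^+ M^T, with M the centred Cartesian coordinates."""
    M = coords - coords.mean(axis=0)
    H = M @ np.linalg.pinv(M.T @ M) @ M.T
    return np.diag(H)

def hats(k, coords, bonds, w):
    """HATS_k = sum over pairs i < j at topological distance k of (w_i h_ii)(w_j h_jj)."""
    wh = np.asarray(w, dtype=float) * leverages(coords)
    D = topological_distances(len(wh), bonds)
    i, j = np.where(np.triu(D == k, 1))
    return float(np.sum(wh[i] * wh[j]))

# Hypothetical 8-atom chain laid out on a helix (non-coplanar), bonds i-(i+1).
idx = np.arange(8)
coords = np.column_stack([1.5 * idx, np.cos(idx), np.sin(idx)])
bonds = [(i, i + 1) for i in range(7)]
w = np.ones(8)
w[3] = 1.2  # illustrative "more electronegative" atom, not a tabulated value
print(round(hats(6, coords, bonds, w), 4))
```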
B08[N–S]. The B08[N–S] descriptor, a topological measure of nitrogen and sulfur connectivity, shows a positive correlation with Tg, suggesting that N–S interactions may contribute to polymer rigidity and intermolecular cohesion. While direct evidence is limited, prior studies indicate that polymers with nitrogen and sulfur heteroatoms often exhibit higher Tg, possibly due to dipole interactions and secondary bonding effects. Our model captures a similar trend, aligning with reports that benzothiadiazole- and thiophene-containing CPs tend to have elevated Tg.9 Although heteroatoms have been identified in QSPR studies as factors influencing Tg, further validation is needed to fully clarify their role in restricting segmental mobility and enhancing intermolecular interactions.
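Since B08[N–S] is a binary atom-pair descriptor, it reduces to a shortest-path search on the hydrogen-depleted molecular graph. The sketch below returns 1 if any nitrogen and sulfur atoms are exactly eight bonds apart and 0 otherwise; the function name and the toy chain graphs are hypothetical, chosen only to show the distance criterion.

```python
from collections import deque

def pair_at_distance(symbols, bonds, a="N", b="S", k=8):
    """Return 1 if some atom of element a lies exactly k bonds (shortest path)
    from some atom of element b, else 0."""
    adj = [[] for _ in symbols]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    for s, sym in enumerate(symbols):
        if sym != a:
            continue
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue  # no need to search beyond distance k
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if any(symbols[v] == b and d == k for v, d in dist.items()):
            return 1
    return 0

# Toy linear chains: N-(C)7-S puts N and S eight bonds apart; N-(C)6-S, seven.
chain9 = ["N"] + ["C"] * 7 + ["S"]
bonds9 = [(i, i + 1) for i in range(8)]
chain8 = ["N"] + ["C"] * 6 + ["S"]
bonds8 = [(i, i + 1) for i in range(7)]
print(pair_at_distance(chain9, bonds9), pair_at_distance(chain8, bonds8))  # 1 0
```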
In conclusion, we developed a QSPR model using four interpretable molecular descriptors (RBF, SpMin1_Bh(i), HATS6e, and B08[N–S]) to accurately predict the glass-transition temperature (Tg) of conjugated polymers. These descriptors capture backbone flexibility, electronic properties, charge distribution, and heteroatom interactions. The model achieves strong predictive accuracy (R² ≈ 0.85), comparable to our previous ML-based model.9 CG-MD simulations support the descriptor–Tg relationships, while the model's generalizability to new polymer classes remains to be validated in future studies. For next-generation CPs with increasingly complex chemistries, this QSPR framework offers improved robustness and interpretability, making it well-suited to guide the rational design of advanced polymer materials.
This work was supported by the National Science Foundation (NSF) under Award No. 2237063.
Footnote
† These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025