Open Access Article
Xiaojie Feng†ᵃ, Xiaoying He†ᵇ, Jiayi Zhuᵇ, Li-Hong Linᶜ, Qiaoyan Shangᵇ, Zheng-Hong Luoᶜ, Yin-Ning Zhou*ᶜ and Fangyou Yan*ᵃ
ᵃ School of Chemical Engineering and Materials Science, Tianjin University of Science and Technology, Tianjin 300457, P.R. China. E-mail: yanfangyou@tust.edu.cn
ᵇ School of Marine and Environmental Science, Tianjin University of Science and Technology, Tianjin 300457, P.R. China
ᶜ State Key Laboratory of Synergistic Chem-Bio Synthesis, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China. E-mail: zhouyn@sjtu.edu.cn
First published on 15th December 2025
The rational design of polyester materials plays a crucial role in the development of functional polymers with tailored properties. In this work, we introduce a novel symmetry-guided molecular design strategy, a symmetry-aware, parameter-controlled design paradigm that both broadens and rationalizes the accessible chemical space of functional molecules. By introducing a pairwise atomic symmetry index (PASI) and applying targeted modifications to small molecules, a library of 10 614 diacids and 9983 diols is constructed, enabling a systematic expansion of the polyester chemical space into previously unexplored regions. The combinatorial pairing of these diacids and diols generates over 100 million polyester structures. High-throughput prediction of the glass transition temperature (Tg) with the Tg-QSPR model yields a distribution consistent with the typical thermal behavior of polyester materials. To validate the design methodology, a two-level verification process is performed. The predicted Tg values are first examined using molecular dynamics (MD) simulations and subsequently confirmed by differential scanning calorimetry experiments. The calculated Tg values show good agreement with both MD simulations (average absolute error (AAE) of 17.54 °C) and experimental measurements (AAE of 16.45 °C). These results further confirm the reliability and robustness of the proposed approach. This study not only provides an effective strategy for the large-scale generation of a polyester library and the screening of property-targeted polyesters, but also carries broader chemical implications beyond polyester design, offering potential insights for the development of functional molecules.
The emergence of polymer informatics has opened new opportunities for the expansion of polymers.17–21 The rapid development of polymer informatics has led to research into the computer-aided design of high-performance polymers. Researchers can explore the relationship between the structure and properties (e.g., thermal performances,22–27 transfer performances,28–33 electrical performances,34–36 mechanical performances,37–40 and optical performances41–43) of novel polymers to meet the needs of different fields.
Several frameworks have been developed to generate and evaluate polymers. Existing frameworks, such as the Open Macromolecular Genome (OMG)44 and Small Molecules into Polymers (SMiPoly),45 provide collections of commercially available or literature-derived monomers and define canonical polymerization pathways, enabling the construction of virtual polymer libraries. Generative models based on the Variational Autoencoder (VAE) framework46 can be applied to the inverse design of polymers with targeted topologies and properties. High-throughput and data-driven strategies are also applied to accelerate the discovery of functional polymers. For example, Yu et al.47 constructed a virtual space of over 100 000 polyimides and identified nine promising candidates for high-temperature energy storage through computational screening and molecular dynamics (MD) simulations. Similarly, He et al.48 generated over 95 000 polyester candidates by combining diacids and diols and experimentally validated a quantitative structure–property relationship (QSPR) model for glass transition temperature (Tg).
Despite these advances, most current strategies rely mainly on retrosynthetic or combinatorial approaches, and systematic monomer design remains underexplored. To address this gap, we introduce a symmetry-guided monomer design strategy leveraging the pairwise atomic symmetry index (PASI) to guide the generation of novel monomers. By explicitly incorporating atomic-level symmetry constraints, this approach enables systematic exploration of polyester chemical space, providing a conceptual framework for rational polyester design that complements existing generative polymer methodologies.
Building on this conceptual framework, we apply the PASI-guided strategy to develop a practical monomer design workflow. In this study, we focus on small-molecule modification and use the Tg of polyesters as a case study to broaden the chemical space and enable targeted screening of polyesters (Fig. 1). First, small molecules for designing diacids and diols are obtained by systematically modifying the collected organic molecules. Second, the concept of PASI is introduced for the first time to address the issue of symmetry in atomic pairs within molecules. Guided by the PASI theory and incorporating the modified fragments, diacids and diols are designed systematically. Subsequently, a library of hypothetical polyesters is generated through the enumeration of all possible diacid–diol combinations. To validate the design methodology, the Tg-QSPR model48 is used to conduct a high-throughput screening of the virtual polyester library. Mechanistic and chemical insights are then drawn from the distribution of the predicted Tg values together with the corresponding chemical structures. MD simulations and experimental validation are then performed. This design strategy not only enhances the efficiency of polyester design but also provides innovative ideas and methods for discovering polymer materials.
000 organic molecules are collected from the National Institute of Standards and Technology (NIST) database.49 Owing to the complex structural features of some organic molecules, the organic structures are systematically modified according to the following rules (Fig. 1a): (1) exclusion of the charged species and cis–trans isomers; (2) removal of halogen atoms from organic molecules; (3) removal of metal atoms from organic molecules; (4) removal of intrinsic functional groups, including carboxyl, hydroxyl, ester, and amino groups; and (5) removal of duplicate small molecules. Furthermore, based on an analysis of the polyester database derived from PoLyInfo50 (see the SI for details; Fig. S1), the modified molecules (H-suppressed structures) are further screened according to the following principles (Fig. 1a): (1) the total number of heavy atoms, defined as non-hydrogen atoms including carbon (C), nitrogen (N), oxygen (O), phosphorus (P), and sulfur (S), in small molecules <80; (2) the molecular weight of small molecules <1000; (3) the maximum step (i.e., the longest topological distance) of small molecules <50; (4) the number of C atoms in small molecules <50; (5) the number of O atoms in small molecules <10; and (6) the number of N atoms in small molecules <10.
The synthetic accessibility score (SAscore) metric is used to assess the synthetic difficulty of a compound during the chemical synthesis process by analyzing its structural features.51 Synthetic accessibility analysis enables researchers to screen and design substances more effectively, thereby enhancing the success rate and efficiency of novel material development. Generally, compounds with a lower SAscore are more readily synthesized, requiring relatively simpler reaction conditions and fewer synthetic steps. To reduce the synthetic complexity, 4116 small molecules with SAscores of less than 4.0 are selected for the subsequent design of polyester monomers. Detailed distribution information is provided in Fig. S2.
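The screening thresholds listed above, together with the SAscore cutoff, can be expressed as a simple filter. The following is a minimal sketch rather than the authors' implementation: it assumes that the descriptors have already been computed by a cheminformatics toolkit, and the `Descriptors` container, the `passes_screen` helper, and the example values are hypothetical.

```python
from dataclasses import dataclass

# Exclusive upper bounds taken from the screening rules stated in the text.
LIMITS = {
    "heavy_atoms": 80,     # non-hydrogen atoms (C, N, O, P, S)
    "mol_weight": 1000.0,  # molecular weight
    "max_step": 50,        # longest topological distance
    "n_carbon": 50,
    "n_oxygen": 10,
    "n_nitrogen": 10,
    "sa_score": 4.0,       # synthetic accessibility score
}


@dataclass
class Descriptors:
    """Precomputed descriptors of one modified small molecule (assumed inputs)."""
    heavy_atoms: int
    mol_weight: float
    max_step: int
    n_carbon: int
    n_oxygen: int
    n_nitrogen: int
    sa_score: float


def passes_screen(d: Descriptors) -> bool:
    """Return True only if every descriptor is strictly below its limit."""
    return all(getattr(d, name) < limit for name, limit in LIMITS.items())


# Hypothetical descriptor values, for illustration only.
small = Descriptors(12, 162.2, 7, 10, 1, 1, 2.3)
bulky = Descriptors(85, 1150.0, 40, 60, 12, 3, 5.1)
screened = [passes_screen(small), passes_screen(bulky)]
```

Keeping the limits in one dictionary makes each threshold easy to adjust if a different polymer class calls for a different screening window.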
(1) Calculate the following for atom i: the topological distance (D)52 between atom i and all atoms; the branched degree (bra); the sum of bond orders (∑bds), and the product of bond orders (∏bds). Additionally, record the atomic number (Z) and the number of bonded hydrogens (#H).
(2) The attribute tuples (D, Z, bra, ∑bds, ∏bds, #H) are sorted in ascending order following a lexicographic comparison scheme. Specifically, D is compared first; if entries have the same D value, Z is compared next, and the comparison proceeds sequentially through the remaining attributes. It should be noted that these attributes are treated as a set of parallel equivalence conditions rather than a weighted linear combination.
(3) Calculate the PASI between atoms i and j, as described in eqn (1).
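$$\mathrm{PASI}_{ij} = \frac{\text{number of identical rows in the sorted matrices of atoms } i \text{ and } j}{\text{total number of rows}} \qquad (1)$$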
A representative example is provided to illustrate the PASI (Fig. 2). First, the atomic information (Z, bra, ∑bds, ∏bds, and #H) for all atoms is obtained to construct the atomic information matrix. The D values between a given atom and all atoms are then computed and combined with this information to form the initial matrix of that atom. Each matrix is sorted in ascending lexicographic order according to the sequence (D, Z, bra, ∑bds, ∏bds, and #H). Finally, the sorted matrices of two atoms are compared, and the ratio of identical rows to the total number of rows is defined as the PASI between the two atoms. In this example, atoms a and b have identical matrices, giving a PASI of 1.0, whereas atoms a and c share no identical rows, resulting in a PASI of 0. This example demonstrates how PASI quantitatively captures topological equivalence based on parallel atomic attributes.
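Steps (1)–(3) can be illustrated with a short pure-Python sketch on a hydrogen-suppressed molecular graph. This is not the authors' code. The graph encoding (`atoms`, `bonds`), the function names, the reading of the branched degree as the number of bonded heavy atoms, and the position-by-position comparison of the sorted rows are all assumptions made for this example, and the molecule is assumed to be a single connected fragment.

```python
from collections import deque
from math import prod


def build_graph(atoms, bonds):
    """atoms: list of (Z, n_H); bonds: list of (i, j, bond_order)."""
    nbrs = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        nbrs[i].append((j, order))
        nbrs[j].append((i, order))
    attrs = []
    for i, (z, n_h) in enumerate(atoms):
        orders = [o for _, o in nbrs[i]]
        # (Z, bra, sum of bond orders, product of bond orders, #H);
        # bra is taken here as the heavy-atom degree (an assumption).
        attrs.append((z, len(orders), sum(orders), prod(orders), n_h))
    return nbrs, attrs


def distances_from(src, nbrs):
    """Topological distances from src by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, _ in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sorted_rows(i, nbrs, attrs):
    """Rows (D, Z, bra, sum_bds, prod_bds, #H) for atom i, sorted lexicographically."""
    dist = distances_from(i, nbrs)
    return sorted((dist[k],) + attrs[k] for k in range(len(attrs)))


def pasi(i, j, atoms, bonds):
    """Fraction of rows that coincide in the sorted matrices of atoms i and j."""
    nbrs, attrs = build_graph(atoms, bonds)
    rows_i = sorted_rows(i, nbrs, attrs)
    rows_j = sorted_rows(j, nbrs, attrs)
    same = sum(a == b for a, b in zip(rows_i, rows_j))
    return same / len(rows_i)


# Propane, hydrogen-suppressed: C0-C1-C2 with (Z, #H) per atom.
propane_atoms = [(6, 3), (6, 2), (6, 3)]
propane_bonds = [(0, 1, 1), (1, 2, 1)]
pasi_terminal = pasi(0, 2, propane_atoms, propane_bonds)
pasi_mixed = pasi(0, 1, propane_atoms, propane_bonds)
```

For propane, the two terminal carbons have identical sorted matrices and give a PASI of 1.0, whereas a terminal carbon and the central carbon share no identical rows and give 0.0, mirroring the two limiting cases described for Fig. 2.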
10 614 diacids and 9983 diols are successfully designed by introducing carboxyl and hydroxyl groups at the symmetric positions. Comprehensive details are provided in the SI (Data.xlsx). The diacids are labeled as A1-A10614, while the diols are labeled as B1-B9983. Although constraining the design to PASI = 1.0 reduces the design space, this is an intentional and adjustable choice. The PASI enables quantitative control of atomic-level topological symmetry, allowing the design space to be flexibly expanded or contracted according to the application.
Fig. 3 Distributions of PASI values for (a) diacid and (b) diol sites in the monomer dataset (sourced from He et al.48).
In terms of data scale and chemical diversity, the diacid and diol monomers included in several representative frameworks are compared, as summarized in Table 1. SMiPoly45 collected 1083 small molecules extracted from the literature, including 81 diacids and 63 diols, whereas OMG44 screened 3.1 million molecules from the eMolecules database and identified 1911 diacids and 6581 diols. In this work, the PASI-guided design strategy generates 10 614 diacids and 9983 diols. Fig. 4a visualizes the Morgan fingerprint features (radius = 2, fpSize = 2048) of the diacids and diols, projected using the t-distributed stochastic neighbor embedding (t-SNE) algorithm.53–55 Compared with the existing datasets, the monomers designed in this work span a broader chemical space. This highlights that our method introduces a symmetry-aware, parameter-controlled design paradigm that both broadens and rationalizes the accessible chemical space.
Fig. 4 Information on the designed diacids and diols. (a) Chemical space visualization of the designed diacids and diols in datasets OMG,44 SMiPoly,45 He et al.,48 and this work. (b) Counts of the designed diacids and diols across different molecular weight ranges. (c) Distribution of ring atom ratios in the designed diacid molecules. (d) Distribution of ring atom ratios in the designed diol molecules. (e) Distribution histogram of the SAscore of the designed diacids and diols.
Additionally, the PASI-guided monomers were evaluated by searching the designed diacids and diols in the PubChem database (https://pubchem.ncbi.nlm.nih.gov/), which contains 122 million compounds. The results show that 77.9% of the designed diacids and 67.8% of the designed diols are not present in PubChem, indicating high novelty. These findings confirm that PASI-guided selection effectively explores previously unreported chemical space.
Furthermore, Fig. 4b shows the molecular weight distribution of the diacids and diols. The molecular weight of the diacids is primarily concentrated in the range of 150 to 540 g mol−1, while the molecular weight of the diols is mainly distributed between 120 and 480 g mol−1. Ring atom distributions (Fig. 4c and d) reveal that over 60% of monomers contain cyclic substructures, with the ratio of ring atoms to heavy atoms (RA/HA) spanning a wide range. This variability allows systematic tuning of polyester properties. For example, lower RA/HA values enhance flexibility and processability, while higher RA/HA values improve rigidity and thermal stability. Fig. 4e shows that the SAscores of both the diacids and diols are concentrated between 1.7 and 4.0, indicating that the designed diacids and diols are synthetically feasible under suitable conditions, which should in turn facilitate their synthesis.
Finally, over 100 million virtual polyester structures are generated computationally by identifying the characteristic functional groups, carboxyl (–COOH) and hydroxyl (–OH), in the simplified molecular input line entry system (SMILES) strings of the monomers and pairing each diacid with each diol. In the future, symmetry constraints could also be combined with additional expert knowledge as new design principles to control the physicochemical properties of polymers (e.g., chain rigidity or crystallinity).
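The size of the virtual library follows directly from pairing every diacid with every diol. The sketch below only enumerates monomer labels lazily, using the A/B naming given earlier in the text; it does not build polymer SMILES, and the helper name `polyester_pairs` is hypothetical.

```python
from itertools import islice, product

N_DIACIDS = 10614  # diacids A1-A10614
N_DIOLS = 9983     # diols B1-B9983


def polyester_pairs(n_diacids, n_diols):
    """Lazily yield (diacid, diol) label pairs without storing the full library."""
    for a, b in product(range(1, n_diacids + 1), range(1, n_diols + 1)):
        yield f"A{a}", f"B{b}"


# Number of diacid-diol combinations in the full enumeration.
total_pairs = N_DIACIDS * N_DIOLS

# Inspect only the first few pairs instead of materializing all of them.
first_three = list(islice(polyester_pairs(N_DIACIDS, N_DIOLS), 3))
```

Here `total_pairs` evaluates to 105 959 562, consistent with the "over 100 million" structures reported. A generator is used so that downstream property prediction could be streamed in batches rather than held in memory.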
![]() | (2) |
Fig. 5a presents the distribution histogram of the polyester Tg values. This distribution trend aligns with the typical thermal stability characteristics of polyester materials (He et al.48), suggesting that this design strategy is feasible and effective. It is worth noting that some Tg values lie beyond the plotted range (−200 °C to 400 °C): 0.0189% of the polyesters have Tg values below −200 °C, and 0.005% have values above 400 °C. Such extreme values are likely due to model-induced deviations when the model operates outside its applicability domain. Fig. 5b–d show representative polyester samples selected from different Tg ranges. Analysis of these structures reveals a clear trend: polyesters with higher Tg values typically contain a higher fraction of cyclic units (e.g., aromatic or alicyclic rings), whereas those with lower Tg values generally contain fewer ring units and often feature longer aliphatic chains. This behavior arises from the intrinsic rigidity of cyclic groups, which restricts local segmental mobility and consequently increases the Tg. In contrast, longer aliphatic chains increase conformational flexibility and enhance segmental mobility, ultimately leading to lower Tg values. In addition, several representative Tg values of commonly used commercial polyesters from open reports and model predictions are listed in Table S1 to provide a reference for the Tg range of the designed polyesters.
A total of 19 polyesters (Fig. 6a and b) were selected based on their predicted Tg values, which are randomly distributed within the range of −80 °C to 180 °C. This ensures that the MD validation covers a broad chemical space. The detailed MD simulation results are provided in Fig. S3 in the SI. Fig. 6c shows the correlation between the Tg values obtained from MD simulations and those predicted by the Tg-QSPR model. The shaded region represents the convex hull of the Tg-QSPR model. All MD data points fall within this convex hull, indicating that the MD predictions are consistent with the reasonable distribution domain of the Tg-QSPR model and further confirming the rationality of the design strategy. The maximum absolute error (AEmax, SI eqn (S1)) is 38.94 °C, and the average absolute error (AAE, SI eqn (S2)) is 17.54 °C, closely matching the model's AAE (17.72 °C). These findings suggest that the selected polyesters preliminarily exhibit the targeted thermal properties. It should be noted that, as not all PCFF parameters are directly available in LAMMPS, missing terms are generated using the automated conversion script insight2lammps.pl (https://www.MatSci.org). This process may result in minor deviations in bond-angle or torsional parameters, which can have a slight impact on the MD-predicted Tg values.
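The convex-hull check described above can be sketched as follows. This is an illustrative implementation only: it assumes that the hull is built in a two-dimensional plane of reference Tg versus calculated Tg, uses Andrew's monotone chain algorithm as a stand-in for whatever routine was actually used, and the coordinates below are hypothetical rather than data from this work.

```python
def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain because it repeats the other chain's start.
    return lower[:-1] + upper[:-1]


def inside_hull(point, hull):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[k], hull[(k + 1) % n], point) >= 0 for k in range(n))


# Hypothetical reference points (reference Tg, calculated Tg) in degrees Celsius.
reference = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]
hull = convex_hull(reference)

# Hypothetical validation points.
check_inside = inside_hull((30, 40), hull)
check_outside = inside_hull((150, 50), hull)
```

A point for which every edge test is non-negative lies within the region spanned by the reference data, which is the sense in which a validation point can be said to fall inside the hull.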
Fig. 6 Summary of MD simulations and experimental validation. (a) Polyester structures with only MD simulations. (b) Polyester structures with both MD simulations and experimental validation. (c) A comparison of the calculated Tg, MD-predicted Tg, and experimental Tg, with Dataset B sourced from He et al.48
Fig. 6c illustrates a comparison of the calculated Tg values (Calc.), MD-predicted Tg values, and experimental Tg values (Exp.). Similarly, all experimental data points fall within the convex hull. The AEmax between the Calc. and Exp. values is 36.02 °C, with an AAE of 16.45 °C. A similar consistency is observed between the MD and Exp. values (AEmax of 42.25 °C and AAE of 19.55 °C). These results demonstrate that the Tg-QSPR model produces consistent results with both experimental measurements and MD simulations. They further confirm the effectiveness of the proposed polyester design strategy, providing a reliable approach to the high-throughput screening and rational design of polyesters with the desired thermal properties.
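The two error metrics quoted above can be computed as shown below. SI eqn (S1) and (S2) are not reproduced here, so this sketch assumes the usual definitions, namely the maximum and the mean of the absolute deviations; the function names and the Tg values are hypothetical.

```python
def absolute_errors(calculated, reference):
    """Absolute deviation for each calculated/reference pair."""
    if len(calculated) != len(reference):
        raise ValueError("inputs must have the same length")
    return [abs(c - r) for c, r in zip(calculated, reference)]


def ae_max(calculated, reference):
    """Maximum absolute error (assumed form of AEmax)."""
    return max(absolute_errors(calculated, reference))


def aae(calculated, reference):
    """Average absolute error (assumed form of AAE)."""
    errors = absolute_errors(calculated, reference)
    return sum(errors) / len(errors)


# Hypothetical Tg values in degrees Celsius, for illustration only.
tg_calc = [10.0, 50.0, 120.0]
tg_exp = [20.0, 45.0, 100.0]

worst = ae_max(tg_calc, tg_exp)   # largest single deviation
average = aae(tg_calc, tg_exp)    # mean deviation over all samples
```

Reporting both values is useful because the AAE summarizes typical agreement, while AEmax exposes the worst-case sample that an average would hide.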
Using the PASI-guided strategy, 10 614 diacids and 9983 diols with SAscores ranging from 1.7 to 4.0 were designed.
Combinatorial enumeration of these designed diacids and diols generated over 100 million polyester structures, greatly enriching the diversity of candidate materials. A high-throughput evaluation of the Tg across the designed polymer library reveals a consistent trend with the typical thermal behavior observed in polyester materials. This statistical trend supports the effectiveness of the proposed monomer-design based methodology. Furthermore, the strategy was validated through a two-level verification process, in which the Tg values predicted by the Tg-QSPR model were first examined by MD simulations and subsequently confirmed by DSC experiments. The calculated Tg values show good agreement with both MD simulations (AEmax of 38.94 °C and AAE of 17.54 °C) and experimental measurements (AEmax of 36.02 °C and AAE of 16.45 °C). This consistency further confirms the reliability and robustness of the design approach that significantly expands the chemical space of polyesters. The expanded polyester library is expected to accelerate real-world polymer discovery and enable the development of high-performance materials for packaging, biomedical devices, and sustainable plastics.
It is worth emphasizing that diacids and diols, as highly reactive key intermediates, play an important role in the construction of complex organic molecules such as drug compounds and fine chemicals. Therefore, this strategy also carries broader chemical implications beyond polyester design, offering potential insights for the development of functional molecules.
Supplementary information (SI): additional results. See DOI: https://doi.org/10.1039/d5sc07720f.
Footnote
† Both authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2026