Huan Tran,*a Chiho Kim,a Rishi Gurnani,a Oliver Hvidsten,a Justin DeSimpliciis,a Rampi Ramprasad,a Karim Gadelrab,b Charles Tuffile,b Nicola Molinari,b Daniil Kitchaevb and Mordechai Kornbluthb
a Matmerize Inc., Atlanta, GA 30332, USA. E-mail: huan.tran@matmerize.com
b Robert Bosch LLC, Watertown, MA 02472, USA
First published on 3rd July 2025
Polymer composite performance depends significantly on the polymer matrix, additives, processing conditions, and measurement setups. Traditional physics-based optimization methods for these parameters can be slow, labor-intensive, and costly, as they require physical manufacturing and testing. Here, we introduce a first step in extending Polymer Informatics, an AI-based approach proven effective for neat polymer design, into the realm of polymer composites. We curate a comprehensive database of commercially available polymer composites, develop a scheme for machine-readable data representation, and train machine-learning models for 15 flame-resistant, mechanical, thermal, and electrical properties, validating them on entirely unseen data. Future advancements are planned to drive the AI-assisted design of functional and sustainable polymer composites.
Designing polymer composites, i.e., rationally identifying formulations that meet predefined criteria for specific applications, is traditionally challenging, costly, and time-intensive, as candidates must be physically synthesized and tested.1 Because of the inherent complexity of these materials, physics-based evaluation methods like molecular dynamics simulations and finite-element analysis are highly intricate, while quantum mechanical approaches such as density functional theory remain largely out of reach. Empirical models and rules, such as group contribution, the “rule of mixtures”,24–27 the “Cox-Merz rule”,28 and the “Halpin–Tsai equations”,29,30 provide practical alternatives within specific domains but come with their own limitations.31 Accordingly, a new robust evaluation method is essential to support experiments in polymer composite design.
Since the 2010s, machine-learning (ML) techniques have emerged as valuable complements to traditional approaches in materials science.32–40 In the field of polymer composites, ML has been used to accelerate simulations41 and predict physical properties42 such as conductivity,43,44 tensile strength,45–47 fracture behavior,48 and ductility.45,46 The training data of these models are predominantly experimental,45,46,48 while some were generated using the finite element method.43,44 The data volume is typically small, ranging from fewer than ten47 to a few dozen,45,46,49 and up to a few hundred at most.48 Evidently, data shortage is a major challenge for the accelerated design of polymeric materials.39
This work aims to develop a set of robust ML models for polymer composites. To this end, we compiled and curated a database of over 5000 polymer composites, fabricated in laboratories and/or industry, with multiple measured properties. Using this database, six multi-task ML models were trained and deployed to predict 15 properties in 4 groups: flame-resistant, mechanical, thermal, and electrical characteristics. The developed models perform well on validation data that were curated separately and kept unseen throughout the entire process. We believe that ML, when combined with sufficiently large and diverse datasets and suitable representations, offers a pathway toward the accelerated design of polymer composites.
Fig. 2 Two sources of polymer composites data curated for this work are (a) research articles and (b) technical datasheets/brochures provided by the manufacturers/distributors of commercialized products. Panel (a) was adapted from ref. 50 with permission while panel (b) was taken from a product brochure obtained from https://www.albis.com.
Data provided in technical datasheets of commercialized polymer composites are generally less detailed. Fig. 2(b) shows the top part of the brochure of “Ultramid® B3G7 R02”, a product of BASF. This material is labeled as PA6-GF33, implying that it consists of Nylon 6 (PA6) as the polymer matrix and 33% of glass fibers (GF). Such conventions are fairly standard across the polymer composite industry,21 although interpretations are not always straightforward. In the case of “ALCOM® PA66 910/1.3 CF/GF30”, a product of MOCOM Compounds Corporation, the label PA66-(CF + GF)30 implies that it contains a PA66 polymer matrix and a total of 30% of glass fibers and carbon fibers (CF), but their separate fractions are unknown. Likewise, in the label (ABS + PA6)-GF8 used for “Terblend® N NG-02 EF” (supplied by INEOS Styrolution), the polymer matrix is a blend of ABS and PA6, but their fractions are also unavailable. In another example, the label PA6-GF30 FR used for “ALTECH PA6 A 2030/140 GF30 FR” (also provided by MOCOM) indicates that this material contains some flame retardants (FR), but does not provide their identity or amounts.
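The label conventions quoted above can be turned into machine-readable fields with a small parser. The following is a minimal sketch, not the curation procedure used in this work: the function name `parse_label` and the returned keys are hypothetical, and the parser only covers the four label patterns mentioned in the text. Fractions that a label does not disclose are returned as `None` rather than guessed.

```python
import re

def parse_label(label):
    """Parse a datasheet label such as 'PA6-GF33' into its parts.

    Hypothetical helper: it handles only the label patterns quoted in the
    text and returns None for fractions the label does not disclose.
    """
    text = label.strip()
    # A trailing 'FR' flags flame retardants of unknown identity and amount.
    has_fr = text.endswith(" FR")
    if has_fr:
        text = text[:-3].strip()
    matrix_part, _, filler_part = text.partition("-")
    # '(ABS + PA6)' is a blend whose individual fractions are not given.
    polymers = [p.strip() for p in matrix_part.strip("()").split("+")]
    fillers, filler_total = {}, None
    match = re.fullmatch(r"\(?([A-Z+ ]+?)\)?(\d+(?:\.\d+)?)", filler_part.strip())
    if match:
        names = [n.strip() for n in match.group(1).split("+")]
        filler_total = float(match.group(2))
        if len(names) == 1:
            fillers[names[0]] = filler_total
        else:
            # Only the combined loading is known, e.g. '(CF + GF)30'.
            fillers = {name: None for name in names}
    return {
        "polymers": polymers,
        "fillers": fillers,
        "filler_total": filler_total,
        "has_fr": has_fr,
    }
```

With this sketch, `parse_label("PA66-(CF + GF)30")` reports both CF and GF with unknown individual fractions and a combined loading of 30, while `parse_label("PA6-GF30 FR")` sets `has_fr` to `True` without inventing a flame-retardant identity.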
Such incomplete information is expected to impede the targeted models in certain ways, for example, by introducing some level of uncertainty in the models' inferences. Nevertheless, if the database is large enough, the undesirable effects of missing data might be partially mitigated. On the other hand, data extracted from technical datasheets are critically important for our users, as they pertain to materials that are currently available on the market and can be readily purchased in large quantities.
Our polymer composite database, curated from the two major sources and summarized in Table 1, contains 15 datasets for 15 flame-resistant, mechanical, thermal, and electrical properties. The flame-resistant datasets were curated from hundreds of research articles while the mechanical, thermal, and electrical datasets were extracted from about 10000 technical datasheets, manually collected for about 5000 commercialized polymer composites. The reported properties were measured under some widely recognized standards, e.g., ASTM E1354 (Cone calorimeter) and ASTM E662 (smoke chamber) for the flammability properties and ISO 527-1/-2 for the mechanical properties. Therefore, testing/measurement conditions are consistent across different sources for the same property/group of properties of the polymer composites.
Class | Property | Standard | Unit | Data range | Data size |
---|---|---|---|---|---|
Flame resistant | TTI | ASTM E1354 | s | 3.0–281.3 | 527 |
Flame resistant | PHRR | ASTM E1354 | kW m−2 | 12.9–1876 | 576 |
Flame resistant | AHRR | ASTM E1354 | kW m−2 | 58–750 | 100 |
Flame resistant | THR | ASTM E1354 | MJ m−2 | 2.5–609 | 316 |
Flame resistant | Ds | ASTM E662 | — | 0.1–857 | 474 |
Flame resistant | Dmax | ASTM E662 | — | 1.0–964 | 124 |
Mechanical | E | ISO 527-1/-2 | MPa | 7.4–38[…] | 4098 |
Mechanical | σbreak | ISO 527-1/-2 | MPa | 12–329 | 2738 |
Thermal | Tg | ISO 11357-1/-2 | °C | −109–337 | 608 |
Thermal | Tm | ISO 11357-1/-3 | °C | 122–388 | 2044 |
Thermal | αlong | ISO 11359-1/-2 | 10−6 K−1 | −2.4–250 | 3373 |
Thermal | αtran | ISO 11359-1/-2 | 10−6 K−1 | 1.17–230 | 2889 |
Electrical | ε100 Hz | IEC 62631-2-1 | — | 2.5–15.0 | 813 |
Electrical | ε1 MHz | IEC 62631-2-1 | — | 2.5–7.0 | 797 |
Electrical | Ebd | IEC 60243-1 | kV mm−1 | 15–50 | 611 |
As discussed above, the description of the materials, needed for the inputs of the ML models, is generally more complete in the research articles than in technical datasheets. The identity and the composition of the polymer matrix and additives are available in the flame-resistant datasets. However, such information is not always available in the mechanical, thermal, and electrical datasets. In some entries, the compositions of the polymer matrix blend and the additives may be missing. Notably, for entries involving flame retardants, no information on their identity and composition is available. A snapshot of the flame-resistant, mechanical, thermal, and electrical datasets is given in Fig. 3, while more information on the polymer matrices, the additives, and the flame retardants can be found in the ESI.†
Fig. 3 Top ten base polymer matrices in the four groups of polymer composite datasets curated and used for this work.
Traditionally, each ML model is trained independently on a single dataset in a procedure known as single-task (ST) learning. In contrast, multi-task (MT) learning combines multiple related datasets to train a single model, leveraging potential correlations among material properties rooted in physical and chemical laws. Technically, these datasets are stacked together, and the property that each entry belongs to is indicated by an additional selector vector appended to the standard descriptors. The combined dataset can be used with any learning algorithm. In this work, MT learning is referred to as “physics-informed” (Pi) learning, as it uses augmentation data to implicitly convey these physics-containing correlations without requiring explicit mathematical expressions. The Pi/MT approach differs from “physics-enforced” learning methods, which rely on directly encoding the correlations, given in terms of specific mathematical expressions, into the model. This study examines the Pi/MT approach against traditional ST learning for developing the targeted ML models (see Section 3 for details).
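The stacking step described above can be illustrated with a short sketch. It assumes that every dataset shares the same numerical descriptor columns and that the selector vector is a one-hot encoding of the property; the function name `stack_tasks` and the toy numbers are hypothetical and are not taken from the database of this work.

```python
import numpy as np

def stack_tasks(datasets):
    """Stack per-property datasets into one multi-task training set.

    `datasets` maps a property name to (X, y), where X has shape
    (n_samples, n_features) and all X share the same descriptor columns.
    A one-hot selector vector identifying the property is appended to
    every row.
    """
    names = list(datasets)
    X_rows, y_rows = [], []
    for index, name in enumerate(names):
        X, y = datasets[name]
        X = np.asarray(X, dtype=float)
        selector = np.zeros((X.shape[0], len(names)))
        selector[:, index] = 1.0  # marks which property each row targets
        X_rows.append(np.hstack([X, selector]))
        y_rows.append(np.asarray(y, dtype=float))
    return np.vstack(X_rows), np.concatenate(y_rows), names

# Toy example with made-up values: two descriptors, two properties.
toy = {
    "Tg": (np.array([[0.30, 1.36], [0.00, 1.13]]), np.array([60.0, 55.0])),
    "Tm": (np.array([[0.30, 1.36]]), np.array([220.0])),
}
X_mt, y_mt, tasks = stack_tasks(toy)
print(X_mt.shape)  # (3, 4): two descriptors plus a two-component selector
```

The stacked arrays can then be passed to any regression algorithm; a material that appears in both datasets contributes one row per property, distinguished only by the selector columns.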
Feature | Description | Applicable to |
---|---|---|
cat_polym | Categorical, PA6, ABS, PBT, etc. | All models |
num_gf | Numerical, weight fraction of glass fibers | All models |
num_cf | Numerical, weight fraction of carbon fibers | All models |
num_gb | Numerical, weight fraction of glass beads | Thermal, mechanical, & electrical models |
num_md | Numerical, weight fraction of minerals | Thermal, mechanical, & electrical models |
num_density | Numerical, material density (g cm−3) | Thermal, mechanical, & electrical models |
cat_impact | Categorical, yes/no, if impact modifier included or not | Thermal, mechanical, & electrical models |
cat_condition | Categorical, dry/conditioned, measurement condition | Thermal, mechanical, & electrical models |
cat_rif1 – cat_rif2 | Categorical, identity of other reinforcements if included | Flame-resistant models |
num_rif1 – num_rif2 | Numerical, weight fraction of other reinforcements if included | Flame-resistant models |
cat_adv1 – cat_adv2 | Categorical, identity of other additives if included | Flame-resistant models |
num_adv1 – num_adv2 | Numerical, weight fraction of other additives if included | Flame-resistant models |
cat_fr1 | Categorical, yes/no, if first flame retardant included | Thermal, mechanical, & electrical models |
cat_fr1 | Categorical, identity of first flame retardant if included | Flame-resistant models |
num_fr1 | Numerical, weight fraction of first flame retardant | Flame-resistant models |
cat_fr2 – cat_fr4 | Categorical, identity of other flame retardants if included | Flame-resistant models |
num_fr2 – num_fr4 | Numerical, weight fraction of other flame retardants | Flame-resistant models |
num_cone_heatflux | Numerical, incoming heat flux (kW m−2) in ASTM E1354 test | TTI, PHRR, AHRR, & THR models |
num_cone_thickness | Numerical, thickness (mm) of the sample in ASTM E1354 test | TTI, PHRR, AHRR, & THR models |
num_smoke_heatflux | Numerical, incoming heat flux (kW m−2) in ASTM E662 test | Ds & Dmax models |
num_smoke_thickness | Numerical, thickness (mm) of the sample in ASTM E662 test | Ds & Dmax models |
cat_flaming | Categorical, true/false, flaming mode in ASTM E662 test | Ds & Dmax models |
num_smoke_time | Numerical, time (s) of the optical smoke density measurement | Ds & Dmax models |
The flame-resistant models share several descriptors with the thermal, mechanical, and electrical models, including cat_polym, num_gf, num_cf, and cat_fr1. For cat_fr1 specifically, this descriptor specifies the identity of the first flame retardant, if present, while num_fr1 gives its composition. This numerical descriptor is unique to the flame-resistant models due to the absence of such data, as discussed above, in models of the other properties. Since materials in the flame-resistant datasets can contain up to four flame retardants, additional descriptors (cat_fr2, num_fr2, cat_fr3, num_fr3, cat_fr4, num_fr4) were included. Similarly, to account for up to two additional reinforcements and two additives beyond glass and carbon fibers, the descriptors cat_rif1, num_rif1, cat_rif2, num_rif2, cat_adv1, num_adv1, cat_adv2, and num_adv2 were used.
Beyond material descriptors, additional features are required for the specific tests measuring flame-resistant performances. Cone calorimeter tests, conducted under ASTM E1354, measure time to ignition (TTI), peak heat release rate (PHRR), average heat release rate (AHRR), and total heat release (THR). Two key parameters of the tests, i.e., the incoming heat flux and the sample thickness, are described by num_cone_heatflux and num_cone_thickness. Likewise, smoke chamber tests, following ASTM E662, measure optical smoke density (Ds) and maximum optical smoke density (Dmax) under flaming or non-flaming mode. Therefore, for Ds and Dmax models, num_smoke_heatflux, num_smoke_thickness, cat_flaming (flaming vs. non-flaming mode), and num_smoke_time (measurement time) are included, as Ds is time-dependent.
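To make the descriptor scheme concrete, the sketch below assembles a flat feature record for a single cone calorimeter data point using the feature names of Table 2. It is an illustration only: the function name `featurize_cone_record`, the padding of unused flame-retardant slots with `"none"` and `0.0`, and the example values are assumptions, and the additional reinforcement and additive slots (cat_rif, num_rif, cat_adv, num_adv) are omitted for brevity.

```python
def featurize_cone_record(polymer, gf=0.0, cf=0.0, flame_retardants=(),
                          heatflux=None, thickness=None, max_fr=4):
    """Build a flat descriptor dict for a cone-calorimeter data point.

    Feature names follow Table 2; padding empty flame-retardant slots with
    "none" and 0.0 is an assumption made for this sketch.
    """
    if len(flame_retardants) > max_fr:
        raise ValueError(f"at most {max_fr} flame retardants are supported")
    features = {"cat_polym": polymer, "num_gf": gf, "num_cf": cf}
    for slot in range(1, max_fr + 1):
        if slot <= len(flame_retardants):
            name, fraction = flame_retardants[slot - 1]
        else:
            name, fraction = "none", 0.0
        features[f"cat_fr{slot}"] = name
        features[f"num_fr{slot}"] = fraction
    features["num_cone_heatflux"] = heatflux    # kW m-2, ASTM E1354
    features["num_cone_thickness"] = thickness  # mm, ASTM E1354
    return features

# Hypothetical entry: PA6 with 30 wt% glass fibers and one flame retardant.
record = featurize_cone_record(
    "PA6", gf=0.30, flame_retardants=[("FR-A", 0.10)],
    heatflux=50.0, thickness=3.0,
)
```

A record for the Ds and Dmax models would be built the same way, replacing the two cone-test fields with num_smoke_heatflux, num_smoke_thickness, cat_flaming, and num_smoke_time.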
This choice of descriptors may not be ideally comprehensive or complete, potentially omitting useful information if SMILES strings or polymer categories are available and usable. Nevertheless, for the curated data, this technical solution offers not only respectable model performance (discussed in Section 2.4) but also the convenient simplicity needed by the majority of model users.
As expected, the physics-informed MT models are systematically better than the corresponding ST models in multiple measures of performance, including the coefficient of determination R2, the absolute root-mean-square error aRMSE, and the relative root-mean-square error rRMSE, defined as the ratio of aRMSE to the full range of the true data. While aRMSE cannot be compared across different datasets and models, rRMSE is more reliable for this purpose. These 3 performance metrics, computed on the training data, are summarized in Table 3. Of the 15 models, 12 reach R2 > 0.9 and 2 others have R2 > 0.8; rRMSE for all of these is about 5–6% or below. The electric strength model has a moderate R2 = 0.57 and rRMSE ≃ 12%. This result is reasonable and promising, given that our database suffers from unavoidable missing information and that electric strength is governed by multiple physical processes spanning multiple length and time scales, which makes it highly challenging to understand.52–54 These 5 models, visualized in Fig. 4, are available in PolymRize™.55
Model | Training R2 | Training aRMSE | Training rRMSE | Validation R2 | Validation aRMSE | Validation rRMSE |
---|---|---|---|---|---|---|
TTI | 0.95 | 9.9 | 0.036 | 0.73 | 17.7 | 0.071 |
PHRR | 0.94 | 86.1 | 0.046 | 0.74 | 154.7 | 0.124 |
AHRR | 0.96 | 32.5 | 0.047 | 0.81 | 57.3 | 0.124 |
THR | 0.97 | 17.9 | 0.029 | 0.34 | 35.05 | 0.172 |
Ds | 0.99 | 18.4 | 0.021 | 0.78 | 116.2 | 0.142 |
Dmax | 0.99 | 25.1 | 0.026 | 0.89 | 69.7 | 0.105 |
E | 0.97 | 944 | 0.025 | 0.98 | 624 | 0.030 |
σbreak | 0.91 | 16.3 | 0.052 | 0.92 | 14.0 | 0.065 |
Tg | 0.98 | 8.76 | 0.020 | 0.97 | 8.54 | 0.038 |
Tm | 0.98 | 6.78 | 0.025 | 0.98 | 3.13 | 0.033 |
αlong | 0.92 | 13.6 | 0.054 | 0.92 | 11.2 | 0.064 |
αtran | 0.83 | 14.3 | 0.062 | 0.52 | 13.7 | 0.121 |
ε100 Hz | 0.97 | 0.65 | 0.052 | 0.81 | 1.3 | 0.096 |
ε1 MHz | 0.85 | 0.25 | 0.055 | 0.48 | 0.41 | 0.140 |
Ebd | 0.57 | 4.21 | 0.120 | 0.14 | 5.04 | 0.219 |
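The three metrics reported in Table 3 can be computed as in the sketch below. It assumes the standard definition of the coefficient of determination, R2 = 1 − SS_res/SS_tot, and takes rRMSE as aRMSE divided by the range of the true values, as defined in the text; the function name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (R2, aRMSE, rRMSE) with rRMSE = aRMSE / range of true data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = float(np.sum(residual ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    armse = float(np.sqrt(np.mean(residual ** 2)))
    rrmse = armse / float(y_true.max() - y_true.min())
    return r2, armse, rrmse
```

As a consistency check on the tabulated values, the TTI training aRMSE of 9.9 s over the 3.0–281.3 s data range of Table 1 gives 9.9/278.3 ≈ 0.036, which matches the training rRMSE listed for TTI.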
These deployed models were then validated on 15 completely unseen datasets curated independently. For each of them, the data were featurized and the targeted properties were predicted and compared with the ground truth. Predictions for time to ignition TTI, peak heat release rate PHRR, average heat release rate AHRR, total heat release THR, optical smoke density Ds, maximum optical smoke density Dmax, tensile modulus E, stress at break σbreak, glass transition temperature Tg, melting temperature Tm, longitudinal coefficient of thermal expansion αlong, transverse coefficient of thermal expansion αtran, relative permittivity at 1 MHz ε1 MHz, relative permittivity at 100 Hz ε100 Hz, and breakdown electric strength Ebd on the unseen validation data are shown in Fig. 5. For all of the models, the predictions agree very well with the ground truth, with aRMSE comparable to that reported in Table 3. In summary, all 5 MT models for 15 flame-resistant, mechanical, thermal, and electrical properties can reasonably predict the unseen data, suggesting that the training data of these models are sufficiently large and diverse to represent the common cases of polymer composites.
The main rationale of the Pi/MT approach is that the target ML models can be improved by deliberately supplying, and thus “informing”, the training process with data of related properties.39 There is, in principle, no limit on the nature or the volume of the augmented data. Moreover, the expected correlations among the datasets need not be formulated as explicit mathematical expressions. With these two major advantages, the physics-informed MT approach is expected to be widely used in the research area of polymer composites.39
From the ML perspective, the physics-informed MT learning approach consistently outperformed traditional ST learning, where each model is independently developed for a single property. Prior studies56,57 suggest that MT architectures can capture hidden correlations among related properties, and this work supports that view. Nevertheless, small data size and large data noise, both of which are common in practice, can suppress the correlations and limit the MT learning efficiency. Addressing these issues remains open for future work.
Manual data curation, as performed here, is unsustainable given the abundance of polymer composite data. Advances in natural language processing, including large language models, named entity recognition, normalization, relation extraction, and co-referencing, may soon offer scalable solutions. Additionally, representing base polymers by name or label, as done in this study, is suboptimal. Future improvements could involve acquiring SMILES strings51 for all polymers and extending chemical fingerprinting schemes36,37 to better handle cross-linking polymers and other complex classes, further advancing model performance.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4py01417k
This journal is © The Royal Society of Chemistry 2025