Open Access Article
Joseph A.
Pugar†
a,
Calvin
Gang
b,
Isabelle
Millan
a,
Karl
Haider
c and
Newell R.
Washburn
*ab
aDepartment of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. E-mail: washburn@andrew.cmu.edu
bDepartment of Chemistry, Carnegie Mellon University, Pittsburgh, PA 15213, USA
cCovestro AG, Pittsburgh, PA 15205, USA
First published on 4th November 2025
Polyurethane prepolymers are essential intermediates in the production of polyurethane foams, films, and elastomers, with viscosity playing a critical role in formulation, processing, and manufacturing. Despite its importance, there are no models that quantitatively predict the viscosity of a given polymer as a function of monomer chemistry. Traditional empirical models can effectively capture viscosity trends but often require extensive experimental datasets and provide limited interpretability, particularly when applied to novel formulations. Here, we explored regression options for representing polymer chemistry and for modeling the form of the temperature dependence. Monomers can be represented as a formulation in which they are labeled by monomer name or in a physicochemical framework where they are labeled by molecular characteristics derived from experimental measurements or computational methods. The overall form of the temperature-dependent viscosity can be modeled through a generic regressor function or by assuming the empirical form of the Andrade equation. A 39-sample training library was used to evaluate both approaches, with machine learning models achieving a coefficient of determination (r2) of 0.71 for the chemical model testing data and 0.86 in predicting the Andrade equation parameter, which provides interpretable access to the continuous viscosity-temperature curve, for previously untested compositions. While chemically defined models offer a direct path to high-accuracy predictions within known compositional spaces, physicochemical informed models provide deeper insight into structure–property relationships, facilitating extrapolation to novel materials. This work underscores the tradeoffs between empirical and physics-informed modeling strategies and offers a structured approach to integrating domain knowledge into predictive frameworks for complex material systems.
The need to accurately predict prepolymer viscosity has motivated the development of both composition (chemical) and physics-informed (physicochemical) modeling approaches. Recent advances in machine learning–assisted modeling have demonstrated the potential to predict viscosity evolution in polyurethane systems across a range of chemistries and processing routes, from composite molding to modified asphalt binders.17–20 These studies illustrate how data-driven models, often coupled with rheokinetic analysis, can accelerate formulation optimization and process control. Composition models, driven primarily by formulation-specific data, offer practical and accurate predictions within known compositional spaces. Furthermore, design-of-experiments (DOE) approaches—such as factorial designs and response-surface methodologies—provide a systematic framework for exploring key formulation factors and their interactions with minimal experimental runs.21,22 However, empirical models lack interpretability and fail to generalize beyond the data used in training, limiting their utility in deeply studying the material library or for generalizing the findings to discover novel materials. In contrast, physics-informed models leverage molecular-scale features (some of which can be directly borrowed from existing empirical or physical models), such as hydrogen bonding interactions and molecular weight distributions, to establish a more mechanistic understanding of structure–property relationships.23,24 By integrating physicochemical insights into DOE, one can both guide the selection of experimental factors and interpret complex interaction effects, thereby reducing the experimental burden while enhancing the mechanistic validity of the resulting models. Building on this foundation, the present work extends these concepts to prepolymer viscosity prediction by comparing both composition-driven and physicochemical modeling frameworks.
These models hold potential for extrapolating viscosity predictions to novel formulations, as illustrated in the schematic in Fig. 1, albeit at the cost of greater complexity in data acquisition and feature engineering. This duality presents a fundamental trade-off in the predictive modeling of complex systems, demonstrated in the following analysis with PUR prepolymer viscosity. Balancing these contrasting objectives requires strategic consideration of the intended application—whether the goal is formulation optimization within a known space or the discovery of new material combinations with improved performance characteristics. In this study, we aim to address this balance by presenting a dual modeling framework that leverages both chemical and physicochemical perspectives, offering insights into how these approaches can be strategically integrated to optimize both prediction accuracy and generalizability.
![]() | ||
Fig. 2 Urethane reaction for varying diisocyanate amounts. (A) Prepolymer reaction with 2 : 1 NCO : OH stoichiometry. (B) Semi-prepolymer reaction with 4 : 1 NCO : OH stoichiometry. | ||
The chemical diversity of polyurethane prepolymers further complicates viscosity prediction. Common polyol types (e.g., polyether, polyester, and polycarbonate) differ substantially in their hydrogen bonding potential, crystallinity, and hydrolysis resistance, impacting the mechanical and thermal performance of the final elastomer.1,24,28,29 Polyesters are most often used in elastomer formulations where the mechanical toughness that results from their ability to form strong inter-chain hydrogen bonds and their crystallinity is desired. Polyethers are amorphous and do not participate as significantly in hydrogen bonding, which results in a more compliant material, but they have the advantage of being more resistant to hydrolysis than polyester based PU elastomers. Polycarbonate polyols offer the combination of mechanical performance and hydrolysis resistance and therefore are used as building blocks for high-toughness PUR elastomers. Other processing considerations stem from polyesters and polycarbonates having high glass transition temperatures (Tg) due to backbone ordering and high polarity while polyethers have much lower Tg. During polymerization, the diisocyanates connect the polyols via urethane linkages to extend the chains to higher molecular weights. While doing so, thermodynamic and electronic dissimilarities between the alternating segments can alter chain dynamics with effects from intermolecular length scales to bulk network mobility.2,30–32 These same factors that govern the physical properties of the solid materials are also expected to influence melt properties, such as thermodynamics, dipole forces, and hydrogen bonding.
The temperature-dependent viscosities of polymer melts have been studied and modeled in a variety of different approaches. For moderate to high molecular weight polymers, complex viscoelastic responses dependent on shear history and shear rate complicate the creation of models that can capture behavior over a broad range of processing temperatures.33,34 For low molecular weight (less than a few thousand g mol−1) and under low-shear conditions, a first approximation can be made with an Arrhenius law or the related Andrade equation describing the viscosity (η) of liquids as a function of temperature (eqn (1)).
![]() | (1) |
![]() | (2) |
Similarly, the incorporation of the Tg into the temperature-dependent viscosity is also present in the WLF equation, which relates viscosity at the glass transition temperature (ηTg) to any temperature (T) with a shift factor (aT). The constants (C1, C2) are specific material constants usually found experimentally. The relationship is outlined in eqn (3).
![]() | (3) |
A further extension of both the Andrade and WLF equations for polymer solution viscosity is the Mark–Houwink (MH) equation (eqn (4)), which uses the molecular weight of the polymer (M) and fitted parameters (K, α) defined by the chemical environment. When tabulated, the two parameters are often accompanied by the temperature at which the experiment was conducted and thus provide an approximate temperature range for the accuracy of using the values to predict viscosity.
| η = KMα | (4) |
The empirical and analytical models used to describe PUR viscosity commonly rely on descriptors such as free volume, molecular configuration, and molecular size. However, the vast chemical diversity of PURs, encompassing numerous polyol backbones (e.g., polyethers, polyesters, and polycarbonates) and a wide range of diisocyanate structures, introduces substantial complexity into viscosity modeling. Even among industrially standard formulations, the number of variables influencing viscosity is considerable, making empirical curve-fitting for each new formulation inefficient.
To address this challenge, we explored two distinct but complementary modeling strategies. The first approach leverages composition vectors to represent each unique sample: a chemical or “black-box” approach more traditional to formulation optimization. These models require minimal domain knowledge and provide high accuracy within the compositional space on which they are trained. However, they lack physical interpretability and cannot extrapolate to different monomers. In contrast, the second approach aims to capture the underlying physicochemical drivers of viscosity by explicitly modeling the temperature-dependent response (e.g., via Andrade equations) and relating the fitted parameters to molecular-scale features. Here, variables derived from thermodynamics, electronic structure, and molecular geometry—such as dipole moment, polarizability, and topological surface area—are used to construct interpretable, mechanistically grounded models. This approach is more time-intensive and requires significant feature engineering, but it offers a clearer link between chemical structure and macroscopic behavior, enabling greater generalizability to untested formulations. Together, these approaches illustrate a strategic tradeoff between predictive convenience and mechanistic insight. The chemical model is well-suited for rapid screening and optimization within known design spaces, while the physicochemical model provides a pathway for materials discovery and formulation guidance when experimental data are sparse or unavailable. Bridging these perspectives offers a robust framework for both empirical deployment and scientific understanding.
The goal of the training library design was to maximize the range of polarity of the polyol backbone, the variety of chemical structures of the diisocyanate, and the range of stoichiometries used to prepare the prepolymer leading to a distribution of different prepolymer chain lengths, all while staying within the compositional bounds of industrial applications. One of each common polyol chemistries (ether, ester, carbonate) was incorporated into the library design. Secondly, the diisocyanate chemical structure was varied to incorporate differing thermodynamic incompatibilities throughout the dataset as well as the reactivity of functional groups. Lastly, for a fixed chemical structure combination, a series of different %NCO by weight formulations were made to create a series of viscosities within a suitable range. The final library had a total of 39 unique prepolymers. Table 1 contains a list of each prepolymer formulated.
| Sample IDs | Polyol | Diisocyanate | %NCO(s) |
|---|---|---|---|
| P_44M_4 (6, 8, 10) | PTMEG | 4,4-MDI | 4, 6, 8, 10 |
| P_MLQ_4 (6, 8, 10) | PTMEG | Mondur MLQ | 4, 6, 8, 10 |
| D_44M_4 (6, 8, 10) | Desmophen 2000 | 4,4-MDI | 4, 6, 8, 10 |
| D_MLQ_4 (6, 8, 10) | Desmophen 2000 | Mondur MLQ | 4, 6, 8, 10 |
| C_44M_6 (8, 10) | Desmophen-C 2202 | 4,4-MDI | 6, 8, 10 |
| C_MLQ_4 (6, 8, 10) | Desmophen-C 2202 | Mondur MLQ | 4, 6, 8, 10 |
| P_TD80_4 (7, 10) | PTMEG | TD80 | 4, 7, 10 |
| D_TD80_7 (10) | Desmophen 2000 | TD80 | 7, 10 |
| P_TDS_4 (7, 10) | PTMEG | TDS | 4, 7, 10 |
| P_I_5 (9) | PTMEG | Desmodur I | 5*, 9* |
| P_W_5 (9) | PTMEG | Desmodur W | 5*, 9* |
| D_I_5 (9) | Desmophen 2000 | Desmodur I | 5*, 9* |
| D_W_5 (9) | Desmophen 2000 | Desmodur W | 5*, 9* |
Samples were made using the first of the 2-step synthesis procedure for PUR polymerization. The macro polyols were heated in an 110 °C oil bath for 10 minutes before the stoichiometrically predetermined amount of diisocyanate was introduced to the mixture via syringe. The prepolymer reaction continued for 2 hours under the same temperature conditions and constant mechanical stirring at 200 RPM. After the allotted reaction time, each prepolymer liquid was poured into a polypropylene centrifuge tube, capped, and placed on the bench top.
After one week of ambient storage, each sample's temperature-dependent shear viscosity was measured with a DHR Rheometer. Each sample was heated to 40 °C and poured onto a Peltier plate which was subsequently heated to 80 °C with a (2 °C) per min temperature ramp. While the sample was heated, a 40 mm diameter 1° cone geometry sheared the sample at a constant 1 s−1 shear rate with viscosity and temperature data points acquired every 10 seconds. This low shear rate was deliberately chosen to isolate effects driven by chemical structure and thermophysical interactions, minimizing the confounding effects of shear-thinning behavior or entanglement typical of high molecular weight polymers. Each unique formulation was synthesized and tested once to generate a single viscosity–temperature curve, reflecting the common constraint of limited material availability in industrial screening and formulation environments.
Subsequently, each formulation's raw viscosity-temperature data was fit using the Andrade expression (eqn (2)), in which A represents the viscosity of the material as T → ∞, B represents the activation threshold for the material's viscous behavior, and C represents the reference temperature for viscous behavior or significant free volume change in the polymer material (Tg). The B-parameter for each unique sample was found by minimizing the chi-squared statistic between the raw data collected from the rheometer and an Andrade expression when C was set equal to each sample's base polyol Tgi.e., polytetrahydrofuran (PTMEG) = −80 °C, poly(ethylene adipate) diol (PEAD, commercial name: Desmophen 2000) = −73 °C, poly(hexamethylene carbonate) diol (PHMCD, commercial name: Desmophen-C 2202) = −64 °C. The A parameter was found to be randomly distributed values about a 0.001 Pa s mean and was not further considered during the modeling. Each error minimization resulted in an r2 ≈ 0.99 fit or better for the Andrade expression fittings. The collection of B-parameter values was therefore used to build GPR models trained to predict this viscosity activation energy parameter provided a physicochemical feature space modeled from the known chemical structure and stoichiometry of the prepolymer sample. To model the physicochemical feature space, stoichiometry and relative reactivities of the two isocyanate functional groups on the diisocyanate (provided by Covestro LLC) were used to convert percent weight NCO into the average degree of polymerization and average prepolymer molecular weight features. Density Functional Theory (DFT) calculations and cheminformatics computations were used to generate electronic, size, and shape descriptors of the prepolymers.
As discussed, molecular weight is the hallmark descriptor for viscosity in macromolecular fluids and with any condensation polymerization, the degree of polymerization given the stoichiometry of the motifs can be approximated with eqn (5) using equivalent weight ratios, assuming complete conversion. The average degree of polymerization of the prepolymers can also be used to approximate the molecular weight of the average prepolymer (PPMW) chain with eqn (6). Either of these transformations could be suitable proxies for chain length.
![]() | (5) |
| PPMW = DP(Mpolyol + Mdiisocyanate) | (6) |
Quantum chemical calculations of ground state electronic parameters of the repeat units of synthesized polymers have been a popular methodology for describing the electronic properties of chain segments.35 Motivated by the substantial literature on the roles of phase segregation and hydrogen bonding given the polarity of polyol backbone and hard segment structure, each motif's dipole moment (μ), polarizability (α), hyperpolarizability (β), and electronic energy (EE) were targets of DFT calculations. The repeat unit structures were drawn directly into the AMPAC 10.1 software, and the energy calculations were run with the Gaussian 16W engine. The B3LYP/6–311++G(2d,p) basis set was used for all calculations, performed in the gas phase with no implicit solvent model, using default tight SCF convergence criteria. No empirical dispersion correction was applied. All DFT calculations were executed using Gaussian 16 (Rev. C.01), with typical CPU time per optimization under 10 minutes on a 6-core processor.36 Electronic, shape and size cheminformatics descriptors were also calculated with the RDKit package.37 Both DFT and cheminformatics-based calculations were performed on either the polyol repeat units or the urethane-terminated prepolymer structures (Fig. 2). The calculated features were extracted from their respective output files and tabulated alongside the polyol Tg, DP, and PPMW in the candidate feature space (Table 2). Lastly, the reactivities of the first and second isocyanate groups were provided by collaborators at Covestro LLC.
| Feature | Description | Source |
|---|---|---|
| T g | Glass transition temperature of the base polyol | Data sheet |
| DP | Degree of polymerization of the prepolymer | Stoichiometry |
| PPMW | Molecular weight of the prepolymer | Stoichiometry |
| μ | Dipole moment | DFT |
| α | Polarizability | DFT |
| β | Hyperpolarizability | DFT |
| EE | Electronic energy | DFT |
| TPSA | Topological polar surface area | ChemI |
| ISF | Inertial shape factor | ChemI |
| R G | Radius of gyration | ChemI |
| M V | Molar volume | ChemI |
| Log P | Partition coefficient | ChemI |
| k 1 | Reactivity of the first isocyanate group | Covestro LLC |
| k 2 | Reactivity of the second isocyanate group | Covestro LLC |
Lastly, to reduce the dimensionality of the models built from the physicochemical feature space (Table 2) and improve model interpretability, we removed co-linear features and only kept the top ranking of the remaining after a SHapley Additive Explanations (SHAP) analysis. The co-linear removal consisted of taking every pair of features with > 0.8 Pearson scores and discarding the one with the lower co-linearity to the response variable. The SHAP analysis was subsequently performed on the remaining dataset to extract feature importance within a fit model. SHAP values provide a model-agnostic method for quantifying the marginal contribution of each feature to the prediction of the output parameter by computing the average effect of including each feature across all permutations of the feature set.38 This analysis allowed us to rank features by their global importance and identify the subset most strongly influencing model predictions. Using this approach, we iteratively down-selected the original high-dimensional feature library down to just the top three dominant physicochemical descriptors. These selected features captured the essential structure–property relationships governing viscous flow activation in the prepolymers and were sufficient to retain high model performance while enabling interpretation grounded in molecular behavior. For completeness, SHAP analysis was also performed on the chemical model to evaluate the contribution of temperature and formulation-specific components (e.g., specific polyol identity) to the prediction task. While these results are inherently less interpretable due to categorical input encoding, they are provided in the SI.
In a complementary approach, we introduced the physicochemical feature space into the model framework to attempt to capture mechanistic structure–property relationships with the η(T) response. Following feature selection (Fig. A1) and 5-fold cross-validation, this model achieved a moderate testing performance rtest2 = 0.73, RMSE = 16.58 Pa s (RF rtest2 = 0.60, RMSE = 20.12 Pa s) utilizing temperature, PPI, PolyLogP, k2, and IsoDM suggesting viscosity behavior can be captured with the right combination of temperature, size, polyol and isocyanate physicochemical features. However, it generalized poorly to withheld validation sets (H12MDI and IPDI), yielding an rval2 of −0.60, RMSE = 9.04 Pa s (RF rval2 of −0.33, RMSE = 8.22 Pa s). This discrepancy reflects a key limitation of models trained directly on discrete viscosity data: they risk overfitting and fail to capture the latent physical and chemical drivers of viscous flow. Motivated by this, we next turn to modeling a material's activation energy which provides a more generalizable and interpretable framework for predicting viscosity across diverse chemistries.
To develop a more generalizable model, the raw viscosity–temperature profiles of each of the 39 unique prepolymers were fit to a general exponential express of Andrade form. The B parameter, from the least squares fit of raw viscosity-temperature data, corresponding to the activation threshold for viscous flow, was used as the regression target. The models were trained to predict this B value using a sparse set of three physicochemical features identified via SHAP analysis (Fig. 4). The beeswarm–style plot indicates the importance of the average prepolymer molecular weight PPMW, the topological polar surface area of the polyol repeat unit Polyol TPSA, and the inertial shape factor of the isocyanate motif Iso ISF. At this time during modeling all other features were disregarded due to sufficient correlation with other features or because their presence did not increase model performance.
Performance of the physicochemical model is summarized in Fig. 5. On the training set, composed of prepolymers excluding those based on IPDI and H12MDI, the model achieved an rtrain2 = 0.96 RMSE = 24.76 °C (RF rtrain2 = 0.98 RMSE = 15.71 °C). On the validation set, which included previously unobserved IPDI- and H12MDI-based formulations, the model retained strong performance rval2 = 0.86 RMSE = 65.89 °C (RF rval2 = 0.78 RMSE = 83.53 °C), indicating successful extrapolation beyond the training distribution. The model's sparse physical feature set provides interpretable insight into the molecular mechanisms underlying viscosity activation. The dominant role of PPMW is consistent with empirical relationships such as the Mark-Houwink-Sakurada equation, which links molecular weight to viscosity through entropic constraints on chain mobility. Polyol TPSA captures the contribution of polar, hydrogen-bonding surface area to interchain interactions, effectively differentiating polyether, polyester, and polycarbonate polyols (further demonstrated by the clustering in this feature in Fig. 4). Iso ISF reflects differences in diisocyanate geometry (between aromatic and aliphatic diisocyanate structures) and polarity that may influence segmental rigidity and inter-chain orientation.
The chemical model, trained using GPR and RF on an expanded dataset of 117 data points, achieved a high degree of accuracy with no feature engineering and minimal preprocessing of the dataset. Its success highlights the value of structured composition vectors and serves as a benchmark for rapid screening applications. However, such models are fundamentally limited in their ability to extrapolate to new chemistries or uncover mechanistic trends (Fig. 1). To address these limitations, we transformed the modeling task using the Andrade equation, reducing each prepolymer's viscosity–temperature profile to a single activation parameter B, which encodes the energy barrier for flow. This scalar response variable, grounded in physical chemistry, was then modeled using a sparse set of three physicochemical descriptors selected using SHAP-based feature importance ranking from an initial high-dimensional library that included DFT- and cheminformatics-derived descriptors.
Each final model feature carries physical significance. PPMW captures chain size and entropic resistance to flow and aligns with classical scaling laws such as the Mark–Houwink–Sakurada equation, where polymer viscosity scales sublinearly with molecular weight (0.5 ≤ α ≤ 0.8). Polyol TPSA, a geometric proxy for polar surface area, captures hydrogen bonding potential and inter-chain interactions, distinguishing polyether, polyester, and polycarbonate architectures. Iso ISF, a shape-based descriptor, quantifies the anisotropy of the diisocyanate motif and reflects how steric factors and segmental symmetry influence packing, rigidity, and ultimately viscous resistance. Collectively, these three features span chain length, intermolecular cohesion, and molecular geometry—dimensions that directly influence viscous activation.
The resulting GPR model trained on this physicochemical feature space demonstrated strong generalizability, achieving r2 = 0.96 on the training set and r2 = 0.86 on a validation set comprising previously untested diisocyanate chemistries specifically H12MDI and IPDI. Its ability to interpolate across untrained regions of this defined chemical space suggests that the model captures the latent structure–property relationships governing viscous flow in aliphatic and cycloaliphatic diisocyanates. However, other chemical diversity and physicochemical gaps may still exist in the model e.g., capturing the nature of highly polar or highly flexible polyol backbones in PDMS-based or natural oil-based polyols are likely to result in predictive failure. The SHAP-based feature selection further emphasizes this boundary: polarizability, hydrogen bonding capacity, and backbone rigidity emerged as dominant factors influencing viscosity, highlighting the need to sample more chemically diverse formulations. This limitation represents a clear opportunity for future dataset expansion aimed at improving model robustness across broader formulation spaces.
By comparing these two approaches—the empirical chemical model and the physicochemical model—we highlight the trade-offs surrounding generalizability. The chemical model is quick to deploy, requires no feature engineering, and performs well within the bounds of known compositions. In contrast, the physicochemical model requires domain knowledge, descriptor curation, and greater upfront effort, but yields models that not only generalize beyond the initial library but also offer mechanistic insight into how structure governs flow, all the while providing the investigator with a fitted parametric equation for the prepolymer viscosity–temperature dependence η(T) = f(B).
While promising, this study also carries important limitations. Each formulation was synthesized and tested once, resulting in a small sample size with pseudo-replication introduced by temperature increments. The viscosity measurements were performed at a constant low shear rate, restricting applicability to Newtonian flow regimes. Additionally, although our library included chemically diverse diisocyanates it did not include more complex polyols, chain extenders, or additives/fillers that are often encountered in industrial formulations. These constraints reflect the realities of early-stage formulation and screening, where material throughput is limited, but they also define clear directions for expanding and validating this dual-modeling framework in future work.
Supplementary information: additional tables and figures. See DOI: https://doi.org/10.1039/d5dd00287g.
Footnote |
| † Present address: Department of Surgery, University of Chicago, Chicago, IL 60615. |
| This journal is © The Royal Society of Chemistry 2025 |