Open Access Article
Rigoberto Advincula,* Ilia Ivanov, Rama Vasudevan, Rajeev Kumar, Panagiotis Christakopoulos, Marileta Tsakanika, Jihua Chen, Jan Michael Carillo, Qinyu Zhu and Bobby Sumpter
Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA. E-mail: advincularc@ornl.gov
First published on 12th August 2025
Creating and curating new data to augment heuristics is an emerging approach to materials science. Highly improved properties are advantageous even with “commodity polymers” that do not need to undergo new synthesis, high-temperature processes, or extensive reformulation. With artificial intelligence and machine learning (AI/ML), optimizing synthesis and manufacturing methods will enable higher throughput and innovative, directed experiments. Simulation and modeling to create digital twins with statistical and logic-derived design, such as the design of experiments (DOE), will be superior to trial-and-error approaches when working with polymer materials. This paper describes and demonstrates protocols for understanding hierarchical approaches to optimizing the polymerization and copolymerization process via AI/ML to target specific properties, using model monomers such as styrene and acrylate. The key is self-driving continuous flow chemistry reactors with sensors (instruments) and real-time ML in an online monitoring set-up that enables a feedback loop. We provide initial results using ML refinement of the classical Mayo–Lewis equation (MLE), time-series data, and the build-up of an autonomous flow reactor system as a future data-generating station. More importantly, this work lays the groundwork for precision control of the copolymerization process. In the future, it should be possible to undertake collaborative human–AI-guided protocols for the autonomous fabrication of new polymers guided by the literature and available data sources targeting new properties.
This paper will give an example of ML-driven approaches in the synthesis, nanostructuring, and characterization of the polymerization process that is critical for any AI/ML-driven optimization and advanced manufacturing, using model monomers such as styrene and acrylate.3–5 At the Center for Nanophase Materials Sciences (CNMS), Oak Ridge National Laboratory (ORNL), we have gained much insight into developing the methods and instrumentation for ML-driven reactions with advanced characterization methods and access to high-performance computing (HPC) resources.6 This will have significant implications for using large language models (LLM) and other training/learning sets for optimizing specific properties and applications in soft matter. With continuous flow reaction chemistry,7,8 and/or in tandem with time-stamped batch or semi-batch reactions, it is possible to incorporate a new instrumentation design application programming interface (API) in mechatronics or robotics. In situ or operando hyphenated analytical tools can be automated to provide a continuous feedback loop for optimization. Digital twins developed with atomistic to coarse-grained methods, including molecular dynamics (MD) and density functional theory (DFT), make it possible to translate the molecular or macromolecular level connectivity towards optimized synthesis protocols. The plan is to integrate (Fig. 1): (1) a controlled and well-designed continuous flow chemistry reactor or separator (plug flow, packed bed, membrane separated, etc.); (2) an ML-driven workflow for in situ multimode chemical/materials characterization9 that can include instruments such as NMR, ESR, IR, Raman, UV-Vis, GC-MS, and HPLC, with flow adaptors; (3) use of an edge server or computer for integration of control and feedback/analysis/data storage of the control variables; (4) a core software stack that can be enabled for deep learning (DL) and reinforcement learning (RL); and (5) access to on-demand computing architectures that can dispatch calculations to the compute resources needed. These computational resources include light-weight edge, mid-level edge (NVIDIA DGX-2), and connectivity to high-performance computing (HPC) or even exascale computing resources (Frontier). We demonstrate preliminary results and potential on how this ML-driven autonomous reactor system can enhance the ability to deliver homopolymers, copolymers, site-substituted molecules, and deuterated molecules.
Fig. 1 Here, we describe the steps for developing an AI/ML-driven autonomous continuous flow reactor synthesis, using model monomers such as styrene and acrylate. (The figure is from ref. 10 without change, under Creative Commons license CC BY-NC-ND 4.0.11) More engineering parametrization of organic or polymerization reactions can be done in a plug flow and continuous flow reactor system with a built-in sensor and edge server that controls and monitors the reaction.
The microstructure and copolymer composition are essential in controlling the properties of polymer materials. The Mayo–Lewis equation (MLE) transformed copolymerization practice and theory by correlating comonomer and copolymer composition through reactivity ratios (rr). Various forms of linear to nonlinear equations, all based upon the terminal model (TM) of copolymerization, have been developed to estimate rr values through the best fit of experimentally determined copolymer compositions with comonomer composition and/or monomer conversion, for monomers including styrene, acrylate, and others. Linear regression methodologies have been replaced by powerful nonlinear numerical methods that provide a statistically refined estimation of the rr. However, the MLE is not suitable for systems where it fails to capture the influence of penultimate unit effects, depolymerization, and the reaction medium (e.g., solvent, concentration, pH). Discrepancies reported between the initial monomer/initiator and final copolymer composition can also be linked to kinetic fundamentals. Still, there is a lack of appreciation of reaction engineering and unit operations for copolymer synthesis. There is a need to demonstrate the use of ML and autonomous ML methods in continuous flow chemistry to update the MLE.12
Several key questions can be asked:
(1) How can the various mechanistic models for polymerization (step-growth, chain-addition, metathesis, etc.) be augmented by ML-driven control of the reaction in a unit operation?
(2) How can we harness theory and simulation (MD, DFT, SCFT, etc.) to create the polymerization process's effective and hierarchical digital twins?
(3) How can we utilize LLMs appropriately or effectively in defining the objectives or the training data for specific polymerization reactions, or in converting them into a ChatGPT-like experience? Is this necessary?
(4) What are the recent breakthroughs at the intersection of materials science, chemistry, and applied computing that can specifically accelerate the discovery of new polymer synthesis methods and properties?
(5) What are the critical challenges in developing self-driving autonomous reactors with homogeneous or heterogeneous platforms (with sensors and instruments) that are fundamentally relatable to physical and chemical principles in the polymerization process?
(6) How can we bridge the gap of decision-making timescales with the initially set ML-driven variables and, at the same time, take advantage of LLMs in polymer literature? How helpful is this exercise, and is the human-in-the-loop necessary?
A recent ORNL-led workshop, “Shaping the Future of Self-Driving Autonomous Laboratories”, brought together scientists, computer experts, chemists, and other AI/ML experts to discuss the need for more AI-driven autonomous laboratories.13 In polymer science, self-driving laboratories have high potential to contribute much to science and manufacturing. In the future, there is high potential for automated metadata collection systems for polymer science powered by AI and for the creation of hybrid AI systems that combine data-driven learning with fundamental scientific principles. A big open question, however, is the need for human oversight and expertise, the “expert-in-the-loop”, while leveraging automation, ensuring that human scientific intuition and creativity are preserved and enhanced rather than eliminated in polymer science.
Despite being extensively adopted in the pharmaceutical industry, continuous-flow polymerization is just emerging as a powerful technology for precision polymer synthesis.10 The advantages of rapid mixing and good heat/mass transfer in a tubular reactor offer instantaneous initiation or termination of polymerizations with fast kinetics. One of the most critical parameters for polymerization under flow is the efficacy of mixing based on the residence time distribution (RTD); a small RTD will result in well-controlled polymerization with respect to monomer conversion, polymer molecular weight, and dispersity.10
For automated reactor sampling, we used a Mettler-Toledo EasySampler for programmable, unattended, representative aliquot sampling and a JKEM programmable robotic dosing and sampling platform for dosing beyond pipettes and syringes. These are used for time-sampling batch reactions. For continuous flow reaction chemistry, we have used the following: (1) a Phoenix Flow Reactor system with high-temperature/high-pressure capabilities to enable synthesis in a vast parameter space not achievable with standard laboratory equipment (which is limited to lower pressures, ambient temperature, and less variable flow rates). This system is capable of temperatures up to 450 °C and pressures up to 200 bar. The H-Genie hydrogen generator provides on-demand hydrogenation, with various modules: a gas module, mixer module, pressure module, and a systems controller and software. (2) A Vapourtec reactor: a continuous flow or plug flow chemistry system (R-series) with temperature, pump (pressure), and mixing control. (3) The Snapdragon Chemistry (Cambrex) system is the most modular and integrated, with its laboratory operating system (LabOS). It is operable up to large-scale GMP commercial manufacturing. All the systems, beyond our demonstration for polymerization, are helpful for reaction kinetics screening, photochemistry and electrochemistry, and continuous extraction, separation, and crystallization.
The pumps and temperature controllers were controlled via the existing LabOS API used by the vendor (SnapDragon), which utilizes a web server interface from which customized experimental workflows can be set up within the environment. However, this setup is somewhat limited in flexibility and features. To facilitate programming via Python, we utilized simple HTTP requests to interrogate sensor readings and adjust the necessary flow rates directly via Python's requests module, which we ran from a separate machine that could communicate with the server running LabOS. This way, we could separate the ‘action’ commands from the central command and control server. We further connected several independent instruments to this server, including the FT/IR, Raman, UV-Vis, and a quartz crystal microbalance (QCM) system, to make characterization data available during the experiment. Although Python APIs were available for both the UV-Vis and QCM systems, making integration simple, neither the FT/IR nor the Raman system allowed for external control owing to API limitations. Instead, we relied on automated batch data collection where spectra are continually acquired every N minutes and stored in a folder, which was interrogated based on time stamps to link with the currently active experiment. Data visualization was provided to the user through Plotly Dash14 web applications, and data standardization was accomplished through the pycroscopy ecosystem's sidpy package.15 This arrangement enabled easy creation of customized experimental protocols, entirely written in Python and utilizing one or several of the available characterization tools. A connection to a GPU-enabled server (Nvidia DGX-2 workstation) was available for advanced data analysis.
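The control pattern described above can be illustrated with a minimal sketch. The endpoint paths, JSON keys, and folder layout used here are illustrative assumptions, not the actual LabOS interface; only the general approach (HTTP polling with the requests module plus timestamp-based matching of batch-acquired spectra) reflects the set-up.

```python
# Minimal sketch (not the production code): polling a LabOS-style web API with
# Python's requests module and linking batch-acquired spectra by timestamp.
# Endpoint paths, JSON keys, and the folder layout are illustrative assumptions.
import time
from pathlib import Path

import requests

BASE_URL = "http://labos-server.local:8080"   # hypothetical control server


def read_sensor(sensor_id: str) -> float:
    """Interrogate a single sensor reading over HTTP."""
    resp = requests.get(f"{BASE_URL}/sensors/{sensor_id}", timeout=5)
    resp.raise_for_status()
    return float(resp.json()["value"])


def set_flow_rate(pump_id: str, ml_per_min: float) -> None:
    """Adjust a pump flow rate via the same web interface."""
    requests.post(
        f"{BASE_URL}/pumps/{pump_id}/flow_rate",
        json={"value": ml_per_min},
        timeout=5,
    ).raise_for_status()


def spectra_since(folder: str, t_start: float) -> list[Path]:
    """Find FT/IR or Raman spectra written after the experiment started,
    relying only on file modification times (no instrument API needed)."""
    return sorted(
        p for p in Path(folder).glob("*.spc") if p.stat().st_mtime >= t_start
    )


if __name__ == "__main__":
    t0 = time.time()
    set_flow_rate("pump_A", 0.5)                  # mL/min, illustrative value
    print("reactor T =", read_sensor("reactor_temperature"))
    print("new spectra:", spectra_since("./raman_batch", t0))
```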
Based on the Mayo–Lewis model, we established a minimal coarse-grained molecular dynamics (CGMD) simulation protocol of batch copolymerization. Using the LAMMPS simulation package,16 we implemented the fix bond/create command to facilitate the creation of bonds during the polymerization process.17–19 The simulations used an implicit solvent model, assuming good solvent conditions for both polymer species. The initial conditions include a predetermined quantity of dimer initiators and free monomers randomly distributed in the simulation box. Instead of directly setting up the reaction rate constants in the kinetic equations, we assigned the probability of each bond formation event. We captured the varying reactivity ratios associated with the limiting cases outlined in Table 1. Specifically, the probabilities of the various bond-forming events are denoted as P1, P2, P3, and P4, respectively. Thus, these probabilities allow the calculation of reactivity ratios r1 and r2 as r1 = P1/P2 and r2 = P3/P4, respectively.
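As a companion to the CGMD protocol, the terminal-model addition probabilities can be illustrated with a minimal kinetic Monte Carlo sketch in Python (this is not the LAMMPS implementation; the chain length, feed composition, and reactivity ratios below are illustrative):

```python
# Minimal kinetic Monte Carlo analogue of the bond-formation probabilities
# used in the CGMD protocol; a sketch, not the LAMMPS implementation.
# Given a chain end of type 1 or 2, the next monomer is chosen with the
# terminal-model probability of adding the same vs. the other monomer,
# so that r1 = P1/P2 and r2 = P3/P4 are recovered on average.
import random


def grow_chain(n_units, f1, r1, r2, seed=0):
    """Grow one copolymer chain of n_units from a feed with mole fraction f1
    of monomer 1, using terminal-model (Mayo-Lewis) addition probabilities."""
    rng = random.Random(seed)
    f2 = 1.0 - f1
    chain = [1 if rng.random() < f1 else 2]
    for _ in range(n_units - 1):
        end = chain[-1]
        if end == 1:
            p_same = r1 * f1 / (r1 * f1 + f2)   # P(chain ending in 1 adds 1)
        else:
            p_same = r2 * f2 / (r2 * f2 + f1)   # P(chain ending in 2 adds 2)
        chain.append(end if rng.random() < p_same else 3 - end)
    return chain


if __name__ == "__main__":
    for label, (r1, r2) in {"alternating": (0.001, 0.001),
                            "random": (1.0, 1.0),
                            "blocky": (1000.0, 1000.0)}.items():
        seq = grow_chain(40, f1=0.5, r1=r1, r2=r2, seed=42)
        print(label, "".join("AB"[u - 1] for u in seq))
```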
The Q–e scheme, developed by Alfrey and Price,26 relates the reactivity ratios of monomers to their electronic and steric parameters. The Q (reactivity parameter) and e (polarity parameter) values are assigned to each monomer to capture electronic and steric influences on reactivity. Monomers with similar Q–e values tend to copolymerize more randomly, while those with disparate values lead to more blocky structures.24,26 The Mark–Houwink–Sakurada correlation links reactivity ratios with intrinsic viscosity parameters to understand how monomer choice and reactivity ratios affect the molecular weight distribution in copolymers. It is often used with the Flory–Huggins interaction parameter to predict polymer properties related to solubility and compatibility in copolymer blends.27–29
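A minimal sketch of the Alfrey–Price relations is given below; the Q–e values used are approximate literature entries for illustration only (styrene is the reference monomer, Q = 1.0, e = −0.8):

```python
# Sketch of the Alfrey-Price Q-e scheme: reactivity ratios estimated from
# tabulated Q (resonance/reactivity) and e (polarity) parameters.
# The monomer values below are approximate literature entries used only
# for illustration.
import math


def q_e_reactivity_ratios(Q1, e1, Q2, e2):
    """r1 = (Q1/Q2) exp[-e1 (e1 - e2)], r2 = (Q2/Q1) exp[-e2 (e2 - e1)]."""
    r1 = (Q1 / Q2) * math.exp(-e1 * (e1 - e2))
    r2 = (Q2 / Q1) * math.exp(-e2 * (e2 - e1))
    return r1, r2


if __name__ == "__main__":
    # styrene (1) with methyl methacrylate (2), approximate Q-e values
    r1, r2 = q_e_reactivity_ratios(Q1=1.0, e1=-0.8, Q2=0.74, e2=0.40)
    print(f"r1 = {r1:.2f}, r2 = {r2:.2f}")
```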
In principle, all existing approaches for predicting the morphology of the copolymer can be combined. The computational output of the digital twin comprises the time dependence of reactant and product concentrations, reactivity ratios (calculated using the Kelen–Tüdős method if not known), copolymer composition (based on the reactivity ratios), molecular weight, and polydispersity.
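For reference, the Kelen–Tüdős estimation step can be sketched as follows; the feed/copolymer ratios below are synthetic placeholders generated from assumed reactivity ratios, not experimental data:

```python
# Sketch of the Kelen-Tudos linearization used when reactivity ratios are not
# tabulated. Inputs are low-conversion feed ratios x = [M1]/[M2] and measured
# copolymer ratios y = d[M1]/d[M2]; the data here are synthetic placeholders.
import numpy as np


def kelen_tudos(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    G = x * (y - 1.0) / y
    F = x**2 / y
    alpha = np.sqrt(F.min() * F.max())
    eta = G / (alpha + F)
    xi = F / (alpha + F)
    # eta = (r1 + r2/alpha) * xi - r2/alpha, so a linear fit yields r1 and r2
    slope, intercept = np.polyfit(xi, eta, 1)
    r2 = -intercept * alpha
    r1 = slope + intercept
    return r1, r2


if __name__ == "__main__":
    x = np.array([0.25, 0.5, 1.0, 2.0, 4.0])        # feed ratios [M1]/[M2]
    r1_true, r2_true = 0.52, 0.46                   # assumed "true" values
    y = x * (r1_true * x + 1.0) / (r2_true + x)     # Mayo-Lewis copolymer ratio
    print(kelen_tudos(x, y))                        # recovers ~(0.52, 0.46)
```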
For example, the changes in the reactivity ratio of methacrylic acid are commonly explained in terms of solvent–monomer association, radical stabilization, solvent electron and H-donating properties, as well as solvent dielectric properties.30–33
Analysis of the reactivity ratio values of monomers participating in copolymerization reactions can be used to predict the copolymer sequence.22–25,34 For instance, when the reactivity ratios of monomers M1 and M2 are both around one (r1 = r2 ∼ 1), one should expect a random copolymer sequence. An alternating copolymer product is anticipated for r1, r2 < 0.3; an M1-preferred homopolymer for r1 > 1.0, r2 < 0.1; an M2-preferred homopolymer for r1 < 0.1, r2 > 1.0; a ladder copolymer for −0.01 < r1 < 0.05 and −0.01 < r2 < 0.05; and a gradient copolymer for, e.g., r1 = 0.5, r2 = 2.0, or 0.4 < r1 < 0.7 with 0.2 < r2 < 0.4.
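These rules of thumb can be captured in a small classifier; the thresholds below follow the ranges quoted above and are heuristic, and, as noted next, the boundaries are gradual rather than sharp in practice:

```python
# Rule-of-thumb classifier mapping terminal-model reactivity ratios to the
# copolymer sequence expected from the ranges quoted above. Thresholds are
# heuristics; real systems show gradual transitions between these regimes.
def predict_sequence(r1: float, r2: float) -> str:
    if r1 > 1.0 and r2 < 0.1:
        return "M1-preferred homopolymer tendency"
    if r1 < 0.1 and r2 > 1.0:
        return "M2-preferred homopolymer tendency"
    if -0.01 < r1 < 0.05 and -0.01 < r2 < 0.05:
        return "ladder copolymer"
    if r1 < 0.3 and r2 < 0.3:
        return "alternating copolymer"
    if 0.7 <= r1 <= 1.3 and 0.7 <= r2 <= 1.3:
        return "random (statistical) copolymer"
    if ((r1 <= 0.7 and r2 >= 1.5) or (r1 >= 1.5 and r2 <= 0.7)
            or (0.4 < r1 < 0.7 and 0.2 < r2 < 0.4)):
        return "gradient copolymer"
    return "mixed/intermediate microstructure"


if __name__ == "__main__":
    for r1, r2 in [(1.0, 1.0), (0.02, 0.02), (0.5, 2.0), (2.0, 0.05)]:
        print(f"r1={r1}, r2={r2}: {predict_sequence(r1, r2)}")
```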
One would not expect a sharp transition between the regions with different sequences but a gradient transition from one sequence to another (producing a mixture of copolymer morphologies). Applying these extended conditions to selecting reactivity ratios for styrene copolymerization extracted from CopolyDB, we generated a graph of log-scale reactivity ratios where the predicted copolymer sequence is color-coded. Tuning the reactivity ratios of the monomers towards a target copolymer sequence may be challenging to achieve, especially in the case of gradient copolymers.
Ultimately, assuming copolymer production is the target of the research effort, the yield and the rate of copolymer production can be optimized using digital twins. Simulated synthetic data generated by the digital twin can be used as a training dataset for supervised learning targets, such as predicting rate constants/reactivity ratios from monomer chemical descriptors and reaction conditions. Uncertainty quantification would allow analysis of the sensitivity of input reaction variables for a better understanding of reaction control options.
The structure of the monomers participating in copolymerization determines the value of the reactivity ratio, which in turn defines the structure of the copolymer (using the Mayo–Lewis reaction mechanism). We compared the extracted RDKit molecular descriptors of the monomers (MolWt, log P, and TPSA) and the monomer reactivity ratios summarized in the CopolDB.35 The results are summarized in the form of a correlation matrix (red indicates positive and blue negative correlations) in Fig. 2.
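A sketch of this descriptor extraction and correlation step is shown below; the SMILES strings are real monomers, but the reactivity ratio column holds placeholder values (pair-specific in reality) rather than the CopolDB entries:

```python
# Sketch of the descriptor/reactivity-ratio correlation analysis: RDKit
# descriptors (MolWt, logP, TPSA) are computed per monomer and correlated with
# reactivity ratios. The r values here are placeholders, not CopolDB entries.
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors

monomers = {
    # name: (SMILES, placeholder reactivity ratio)
    "styrene": ("C=Cc1ccccc1", 0.52),
    "methyl methacrylate": ("C=C(C)C(=O)OC", 0.46),
    "butyl acrylate": ("C=CC(=O)OCCCC", 0.20),
}

rows = []
for name, (smiles, rr) in monomers.items():
    mol = Chem.MolFromSmiles(smiles)
    rows.append({
        "monomer": name,
        "MolWt": Descriptors.MolWt(mol),
        "logP": Descriptors.MolLogP(mol),
        "TPSA": Descriptors.TPSA(mol),
        "reactivity_ratio": rr,
    })

df = pd.DataFrame(rows).set_index("monomer")
print(df.corr())   # correlation matrix analogous to Fig. 2
```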
It was determined that the solvent has a significant effect on the kinetics, thermodynamics, and mechanism of the copolymerization reaction, even though it is not explicitly accounted for in the Mayo–Lewis reaction mechanism. The mechanism of the solvent effect on the copolymerization involves changes in material solubility, reaction thermodynamic parameters, and variations in the propagation and termination rates (particularly in viscous solvents with high dielectric constants). H-bonding interactions with the monomer may also influence copolymerization. Fig. 3 illustrates the variation in the values of the reactivity ratios for the copolymerization of methacrylic acid and styrene. In protic solvents, the carbonyl group of methacrylic acid will act as an H-bond acceptor, changing the stability of the radical, reducing the electron deficiency of the double bond, and decreasing the reactivity ratio of the methacrylic acid. This effect is minimal in aprotic solvents, and the reactivity ratio of methacrylic acid shows larger values. The tunability of the rr, using the solvent effect, enables control over polymer morphology without changing the structure of the monomer. Fig. 3b shows the variation in the copolymer morphologies anticipated in the reaction of methacrylic acid and styrene carried out in different solvents.
The changes in rr of methacrylic acid are explained in terms of solvent–monomer association, radical stabilization, solvent electron- and H-donating properties, as well as solvent dielectric properties.30–33 As the solvent's electron-donating properties increase, methacrylic acid's reactivity ratio decreases by ∼23%, from 0.56 (solvent-free) to 0.47 (DMSO). In contrast, the reactivity ratio of styrene steadily increases by a factor of 2.8 (0.224 solvent-free to 0.63 in DMSO).30 A line between the corresponding reactivity ratios of the monomers provides a visual reference that, while solvents affect the methacrylic acid reactivity ratio, in four reactions the reactivity ratio of styrene is affected as well and exceeds that of methacrylic acid.33
Using the previously established correlation between rr and copolymer sequence, we anticipate that changes in the solvent could vary the copolymer sequence from random in acetone and 1,4-dioxane (balanced reactivity ratios) to alternating in tetrachloromethane and chloroform (MAA moderately self-reactive, and styrene avoids homopolymerization) and gradient in acetonitrile (strong reactivity imbalance leading to an MAA-rich backbone). Based on the correlation between solvent polarity, hydrogen bonding, and reactivity ratio, we identified DMF as a good candidate for the synthesis of a gradient copolymer: it is a polar aprotic solvent that stabilizes MAA through dipole–dipole interactions without the adverse effects of hydrogen bonding, increases the reactivity ratio of styrene, and has good solubility for the monomers. Furthermore, green solvents, whose polarity can be tuned while avoiding distillation costs, could be considered as copolymerization media.36
All the reactions were done in homogeneous solutions in batch-time series aliquot or plug flow continuous flow reaction chemistry. The polymerization reactions were done as follows:
(1) For the polymerization of poly(sulfobetaine methacrylate) (PSBMA) in batch and flow, we used potassium persulfate (K2S2O8) as the initiator. The polymerization was conducted in H2O at 70 °C.
(2) For the polymerization of poly(2-vinyl pyridine propane sulfone) (P2VPPS), the initiator was again K2S2O8 in H2O at 80 °C, and we ran the polymerization for 24 hours with batch-time series aliquot sampling.
(3) For the copolymerization of styrene and butyl methacrylate, we used 10 mixtures of the monomers, from a St:BuMA ratio of 1:9 up to 9:1. We used AIBN as the initiator for this polymerization and BTC as the RAFT agent. The polymerization was conducted in bulk at 60 °C with batch-time series aliquot sampling.
(4) For the copolymerization of 2-vinylpyridine propane sulfone (2VPPS) and hydroxyethyl methacrylate (HEMA), we used a 50:50 molar ratio with K2S2O8 as the initiator in H2O at 60 °C, 70 °C, and 80 °C, in both batch-time series aliquot and plug flow continuous flow reactions.
(5) For the homopolymers P2VPPS and PSBMA, we collected samples using the EasySampler in batch-time series aliquot and plug flow continuous flow reactions. The flow reactor experiment was conducted to synthesize PSBMA, with samples collected at flow rates from 0.2 mL min−1 up to 1 mL min−1 in steps of 0.1 mL min−1.
Representative data for the homopolymerization of P2VPPS under flow, from 0.2 mL min−1 up to 1 mL min−1 in steps of 0.1 mL min−1, showed the correlation of monomer concentration and percentage conversion, which confirms that a plug flow environment in solution (using a Thalesnano set-up) closely resembles a batch-time series in a quiescent experiment under ambient temperature and pressure (Fig. 4).
We considered a design of experiments protocol for both homo- and co-polymerization variables (T, ratio of monomers, and flow rate), and this was also demonstrated in the case of HEMA and VSB. We used stochastic and chemical master equation (CME) modeling of polymerization (limited to low concentrations and computationally demanding). In principle, we can determine the activation energy of copolymerization (batch) and perform a kinetic Monte Carlo simulation of the probability of the polymer MWD from different monomer ratios and their reactivities. An example of NMR data that distinguishes copolymerization and the corresponding change in the monomer concentration is shown in Fig. 5. It is possible to obtain an Arrhenius plot of the natural logarithm of the rate constant against the inverse temperature of the copolymerization.
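The Arrhenius analysis can be sketched in a few lines; the rate constants below are placeholder values rather than the measured copolymerization data:

```python
# Sketch of the Arrhenius analysis: fit ln(k) against 1/T to extract an
# apparent activation energy. The rate constants are placeholder values.
import numpy as np

R = 8.314  # gas constant, J mol^-1 K^-1

T = np.array([333.15, 343.15, 353.15])        # 60, 70, 80 degC in K
k = np.array([1.2e-4, 2.9e-4, 6.5e-4])        # illustrative rate constants, s^-1

slope, intercept = np.polyfit(1.0 / T, np.log(k), 1)
Ea = -slope * R                                # activation energy, J/mol
A = np.exp(intercept)                          # pre-exponential factor

print(f"Ea = {Ea / 1000:.1f} kJ/mol, A = {A:.2e} s^-1")
```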
Fig. 6 demonstrates the capability of CGMD simulations to reproduce the different limiting cases described in Table 1. The left panel displays a representative snapshot of the initial condition, where initiators are shown as larger beads within the simulation box. We highlight three distinct cases: homopolymers (r1 = r2 = 1000), random copolymers (r1 = r2 = 1), and alternating copolymers (r1 = r2 = 0.001). A key advantage of CGMD is that it allows not only tracking of polymerization kinetics but also access to microscopic details such as chain sequences, chain length distributions, polydispersity, etc. As a demonstration, we present the chain sequences obtained for the random copolymer case, showcasing the ability of our approach to capture sequence-level information. Direct comparisons with experiments and setting up a digital twin using the CGMD require introducing effects of fluid flow, mapping the time scales from the simulations to the experiments, and introducing termination and transfer processes during the copolymerization.
We revisited the classical Mayo–Lewis equation (MLE) as a predictor of the type of copolymer obtained based on relative reactivity in chain-addition reactions: random copolymers (statistical), alternating copolymers (regular pattern), and block copolymers (long sequences of blocks), with each monomer type arranged linearly (diblock, triblock) or in more complex architectures. The copolymer will have several controllable properties, including solubility, glass transition temperature (Tg), crystallinity, amphiphilicity, microphase separation, and copolymer synthesis route. For example, graft copolymers allow for combining properties from the backbone and graft polymers, e.g., high-impact polystyrene (HIPS) via polybutadiene grafted onto the polystyrene backbone. Thus, the ultimate correlation is that chain architecture and composition influence their properties, including their chemical, mechanical, thermal, and rheological behavior. Without defining the distinct mechanistic routes, e.g., living polymerization, grafting, sequential addition, macroinitiator or macromonomer, etc., we focused on the classical monomer reactivity ratios (rr) of the MLE to determine the ability of a growing polymer chain to add its own monomer vs. the other monomer and thereby influence the final copolymer composition and sequence distribution. This is very useful in predicting and controlling copolymer properties. The MLE or copolymerization equation in chain-addition reactions defines the instantaneous copolymer composition as a function of the monomer feed composition and rr. Reactivity ratio diagrams visually represent copolymer composition as a function of monomer feed composition.
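Such a diagram can be generated directly from the terminal-model composition equation; the (r1, r2) pairs below are illustrative examples, not fitted values:

```python
# Sketch of a reactivity ratio (copolymer composition) diagram: the
# instantaneous copolymer composition F1 is computed from the feed composition
# f1 with the Mayo-Lewis (terminal model) equation for illustrative (r1, r2).
import numpy as np


def instantaneous_F1(f1, r1, r2):
    """Mayo-Lewis instantaneous copolymer composition."""
    f2 = 1.0 - f1
    return (r1 * f1**2 + f1 * f2) / (r1 * f1**2 + 2 * f1 * f2 + r2 * f2**2)


f1 = np.linspace(0.01, 0.99, 9)
for r1, r2 in [(1.0, 1.0), (0.5, 0.5), (2.0, 0.5)]:
    F1 = instantaneous_F1(f1, r1, r2)
    print(f"r1={r1}, r2={r2}:", np.round(F1, 2))
```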
Chain addition copolymerizations may be performed using heterogeneous and homogeneous methods, e.g., solution, bulk, suspension, emulsion, slurry, and gas phase processes. This can also be done in batch, semi-batch, and continuous-flow reactors. It should be noted, however, that the monomer concentrations local to the active centers control the copolymerization and not the average concentrations in the reactor. For example, this can affect the multisite nature of the catalysts in heterogeneous copolymerization reactions where the propagation constants vary from site to site, affecting polydispersity. However, control of the microenvironment of local concentration can bias the MWD. Copolymerization is still a chemical process subject to unit operation with controlled temperature, pressure, solubility, and, in the case of flow, flow rate, and dosing. The goal is to maintain and eventually obtain novel material properties.
Our attempts to use simulation and time-series copolymerization showed the possibilities for control and for an ML-driven experiment. We first designed a digital twin for copolymerization using the MLE, as it allows us to explore an extensive range of variable conditions. The digital twin provides visualization of reactant/product concentration–time profiles, a correlation matrix, and the effects of the reactivity ratio, first to maximize the polymer yield as a function of temperature. Note that an extensive range of reactivity ratios and monomer concentrations can be screened and compared with values retrieved by LLMs. It is possible to compare this with QSAR/QSPR modeling, clustering, and similar analysis of the reactivity ratio and the structure–functional properties of the monomer against the total mass or MW of the polymer. We employed ML modeling to ascertain the effect of monomer descriptors on the monomer property (reactivity ratio), which ultimately defines the copolymer structure of the polymer. A correlation matrix depicts, for example, that higher-MW species are less mobile for diffusion in solution. The positive correlation between monomer MW and polymer molecular weight varies based on the pseudo-kinetic rate constant method. This demonstrates how the polymerization process evolves at different reactivity ratios and temperatures, even after 1000 seconds of reaction. This applies not only to systems described by the terminal copolymerization model but also to higher-order Markov chain statistics such as the penultimate model. Thus, ML modeling allows us to generate kinetic profiles for specific copolymer structures: random, block, ladder, and gradient. The kinetics of monomer consumption is another factor. Modeling the effect of reaction variables to optimize polymer yield and polydispersity can take advantage of the penultimate model's Markov chain statistics.
The MLE considers a monomer mixture of two components and the four reactions that can occur at the reactive chain end, terminating in either monomer placement. Each possibility can be defined by its reaction rate constant. The rr for each propagating chain end is defined as the ratio of the rate constant for adding a monomer of the species already at the chain end to the rate constant for adding the other monomer, with the concentrations of the components defined. The equation then gives the two monomers' relative instantaneous rates of incorporation. The ratio of active center concentrations can be found using the steady-state approximation, meaning that the concentration of each type of active center remains constant. It is essential to distinguish the rate of formation of active centers of each monomer vs. the rate of their destruction. The ratio of monomer consumption rates is essentially a descriptor for the MLE. Using mole fractions to express the concentrations makes it easier to define the composition of the copolymer formed at each instant. However, the feed and copolymer compositions can change as polymerization proceeds. The limiting cases establish the type of copolymer that will be obtained (random, alternating, “blocky,” or just a mixture of two homopolymers). The composition drift is not easily accounted for, which is less ideal than an azeotrope system, where feed and copolymer composition are the same. Still, it is helpful to form databases of calculated reactivity ratios (plotting the copolymer equation and using nonlinear least squares analysis). This generally involves several polymerizations at varying monomer ratios, with the copolymer composition determined by appropriate analytical methods (NMR, IR, or GC-MS). The polymerizations can also be carried out at low conversions, so monomer concentrations can be assumed constant. Refinements with the Kelen–Tüdős or Fineman–Ross methods by linearization of the MLE can bias the results closer to the model data. Another method is the Q–e scheme, a semi-empirical method for predicting reactivity ratios that involves introducing proportionality constants and considering the monomer reactivity based on resonance stability and polarity. At this point, there is no discussion of how kinetic bias or the thermodynamics of the mechanism can be shifted by solubility, temperature, pressure, flow (turbulent or laminar), and dosing rate, and how these factors can affect the MLE. It is this possibility that gives an ML-driven autonomous flow chemistry set-up the potential to control the copolymerization phenomena further and augment the MLE protocol.
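For reference, the terminal-model relationships described in words above take the standard textbook form (with r1 = k11/k12 and r2 = k22/k21, monomer feed mole fractions f1, f2, and instantaneous copolymer composition F1):

```latex
% Terminal-model (Mayo-Lewis) copolymerization equation, textbook form.
\[
\frac{\mathrm{d}[M_1]}{\mathrm{d}[M_2]}
  = \frac{[M_1]\left(r_1[M_1] + [M_2]\right)}{[M_2]\left(r_2[M_2] + [M_1]\right)},
\qquad
F_1 = \frac{r_1 f_1^{2} + f_1 f_2}{r_1 f_1^{2} + 2 f_1 f_2 + r_2 f_2^{2}} .
\]
```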
Can we use the rr alone to model copolymerization via ML? Literature-reported data in databases such as the Polymer Handbooks are often insufficient. There is a need to extract this information from publications and LLMs to introduce it as a descriptor. A more detailed look at the “reactivity ratio” data summarized in the databases allows some interesting insights into reaction recipes that may have been previously overlooked. Another insight is that the sequence and microstructure of copolymers, from random to gradient, can be biased by solvent or solubility control. The value of the monomer rr is not constant, as evidenced by the composition drift. We demonstrate the dependency of the reactivity ratios of two monomers on their solvent environment. We have also illustrated the effect of polarity and hydrogen bonding on the reactivity ratio. Predicting the copolymer sequence and microstructure may be possible based on these values. We have already demonstrated two of the three key components, and our next effort will focus on training the LLM on synthesis protocols.
To demonstrate the use of LLMs, we used a retrieval-augmented generation (RAG) framework with the FORGE LLM to differentiate and provide selection protocols from the prompts.37 The RAG-DB training was explicitly done for copolymerization. FORGE must be integrated with a curated database. FORGE is an open foundation LLM with up to 26B parameters, trained on 257B tokens from over 200M scientific articles, with performance either on par with or superior to other state-of-the-art comparable models. It was initially developed at ORNL. Using ChatGPT-4.0 prompting on the embedded copolymerization database, we used a RAG protocol that includes reactants, products, specific T, P, and solvent, and searched for references and patents. The search yielded 5823 copolymerization items; 247 referred to butyl acrylate–styrene reactions, for example, derived from publications, patents, abstracts, etc.
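The retrieval step of such a RAG workflow can be sketched generically; the embed() function below is a placeholder (the FORGE-specific embedding and prompting APIs are not reproduced), and the database entries are illustrative:

```python
# Minimal sketch of the retrieval step in a RAG workflow over a curated
# copolymerization database. embed() is a placeholder for whatever embedding
# model is used; retrieval is plain cosine similarity with numpy.
import numpy as np


def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a bag-of-characters vector. A real pipeline
    would call an embedding model instead."""
    vec = np.zeros(128)
    for ch in text.lower():
        vec[ord(ch) % 128] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-12)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k database entries most similar to the query; these would
    be injected into the LLM prompt as context."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in documents]
    order = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in order]


if __name__ == "__main__":
    db = [
        "Styrene/butyl acrylate radical copolymerization, AIBN, 60 C, bulk.",
        "Methacrylic acid/styrene copolymerization in DMSO, reactivity ratios.",
        "Ring-opening polymerization of lactide with Sn(Oct)2 catalyst.",
    ]
    print(retrieve("butyl acrylate styrene reactivity ratios", db))
```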
Fig. 7 is representative of the type of data that we obtain from this set-up: optical diagnostics follow the thermal copolymerization of VSB and HEMA with K2S2O8 as the initiator, using a thermally controlled cell block to measure UV-vis absorption spectra as a function of monomer concentration and time during copolymerization. The UV-vis absorption spectra of homopolymerizing VSB, with noticeable isosbestic points and kinetic traces taken at 10 nm intervals, indicate simultaneous processes within the reaction. Principal component analysis (PCA) of the absorption spectra enables the choice of the essential components of the kinetics experiment to follow. We differentiated a component fraction trace to follow the kinetic evolution of the copolymerization process.
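The PCA step can be sketched as follows; the spectra matrix here is synthetic (two overlapping bands evolving in time) rather than the measured UV-vis data:

```python
# Sketch of the PCA step used to pick which spectral components to follow
# kinetically. The spectra are synthetic: a "monomer" band decays while a
# "polymer" band grows, mimicking time-stamped UV-vis data.
import numpy as np
from sklearn.decomposition import PCA

wavelengths = np.linspace(250, 600, 200)          # nm
times = np.linspace(0, 1, 50)                     # normalized reaction time


def band(center, width):
    return np.exp(-((wavelengths - center) / width) ** 2)


monomer = band(280, 20)
polymer = band(320, 30)
# Each row is one spectrum at one time point.
spectra = np.outer(1 - times, monomer) + np.outer(times, polymer)
spectra += 0.01 * np.random.default_rng(0).normal(size=spectra.shape)

pca = PCA(n_components=3)
scores = pca.fit_transform(spectra)               # component fraction traces vs. time
print("explained variance:", np.round(pca.explained_variance_ratio_, 3))
print("PC1 trace (first 5 points):", np.round(scores[:5, 0], 3))
```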
In summary, fine control of the copolymerization of two types of monomers by AI/ML has the potential to produce a family of new functional polymers. Developing AI/ML-centric synthesis and characterization tools will open new scientific opportunities to generate high-quality, unique data sets and ML models that further progress in cross-disciplinary discovery and innovation.
In the future, for continuous flow reactor copolymerization, single-objective optimization can be achieved with algorithms such as Simplex and Stable Noisy Optimization by Branch and Fit (SNOBFIT), which seeks a global optimum of the objective function by iteratively exploring the search space and adaptively refining the sampling points. Bayesian optimization (BO) algorithms can be used to explore highly complex and non-linear chemical and reaction spaces for new microstructures, including branching, even from sparse datasets. Bayesian methods utilize probabilistic models to strike a balance between exploration and exploitation, making them well suited for handling noisy and computationally expensive objective functions. The goal is to develop ML algorithms for “self-optimizing” flow reactors integrated with online reaction monitoring. Multi-objective optimizations are standard for flow chemistry and other chemical engineering processes; therefore, Pareto Efficient Global Optimization (ParEGO) methods can be instrumental in addition to genetic algorithms. Thompson Sampling Efficient Multi-Objective (TS-EMO) optimization can also tackle multi-objective optimization in automated flow-chemistry processes; it uses Gaussian processes as surrogates, together with Thompson sampling, the hypervolume quality indicator, and the Non-Dominated Sorting Genetic Algorithm II (NSGA-II), to decide the next evaluation point after each iteration.
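A minimal single-objective BO loop of the kind described above is sketched below; the objective function is a synthetic stand-in for an online conversion measurement, and the bounds and iteration count are illustrative:

```python
# Minimal sketch of a Bayesian optimization loop for a "self-optimizing" flow
# reactor: a Gaussian process surrogate plus an expected-improvement
# acquisition over one control variable (flow rate). The objective is a
# synthetic stand-in for an online conversion measurement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def measure_conversion(flow_rate):
    """Placeholder for the real online measurement (e.g., NMR/IR conversion)."""
    return np.exp(-((flow_rate - 0.45) ** 2) / 0.02) + 0.02 * np.random.randn()


candidates = np.linspace(0.2, 1.0, 81).reshape(-1, 1)   # mL/min grid
X = np.array([[0.2], [0.6], [1.0]])                     # initial experiments
y = np.array([measure_conversion(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):                                     # autonomous iterations
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / (sigma + 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, measure_conversion(x_next[0]))

print(f"best flow rate ~ {X[np.argmax(y)][0]:.2f} mL/min, conversion {y.max():.2f}")
```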
Flow chemistry approaches have been applied to many monomers and polymerization mechanisms for polymer synthesis, including anionic, cationic, radical, and ring-opening.7,39,40 Compared to traditional batch processing, chemical synthesis performed in continuous flow gives better reproducibility and shorter time scales due to efficient mixing, heat and mass transfer, and variable flow rates that can accelerate reaction rates. Other desirable attributes include real-time reaction monitoring, storage of intermediates, increased product quality, and enhanced safety, since material flows within channels from stock containers to reactors. Some of the present drawbacks of the standard continuous flow approach, such as the rate at which reactions occur, can be optimized. Finally, this approach is well suited for automation. It can potentially enable intelligent reaction systems to operate safely, efficiently, and in production 24 hours per day.
Footnote
† This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://www.energy.gov/downloads/doe-public-access-plan).
This journal is © The Royal Society of Chemistry 2026