Open Access Article
Lijie Ding,a Chi-Huan Tung,a Zhiqiang Cao,a Zekun Ye,b Xiaodan Gu,c Yan Xia,b Wei-Ren Chen a and Changwoo Do *a
a Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA. E-mail: doc1@ornl.gov
b Department of Chemistry, Stanford University, Stanford, CA 94305, USA
c School of Polymer Science and Engineering, Center for Optoelectronic Materials and Devices, The University of Southern Mississippi, Hattiesburg, MS 39406, USA
First published on 21st May 2025
Ladder polymers consisting of fused rings in the backbone have very limited conformational freedom, which results in properties very different from those of traditional linear polymers. However, accurately determining their size and chain conformations from solution scattering remains a challenge. The chain conformations of kinked ladder polymers are largely governed by the structures and relative orientations or configurations of the repeat units, unlike conventional polymer chains, whose bending angles between repeat units follow a unimodal Gaussian distribution. Meanwhile, traditional scattering models for polymer chains do not account for these unique structural features. This work introduces a novel approach that integrates machine learning with Monte Carlo simulations to construct a model that can describe the geometry of a type of kinked CANAL ladder polymer. We first develop a Monte Carlo simulation model for sampling the configuration space of CANAL ladder polymers, where each repeat unit is modeled as a biaxial segment. Then, we establish a machine learning-assisted scattering analysis framework based on Gaussian process regression. Finally, we conduct small-angle neutron scattering experiments on a CANAL ladder polymer solution to apply our approach. Our method uncovers structural features of such ladder polymers that conventional methods fail to capture.
Small-angle scattering experiments,7 including X-ray scattering8 and neutron scattering,9,10 are often used to study the characteristics of polymer systems and to unveil single-polymer structure using dilute polymer solutions. The scattering data are often analyzed using various polymer models to extract polymer parameters, e.g. contour length, radius of gyration and persistence length. However, traditional polymer models, such as Gaussian coils11 or worm-like chains,12 are inadequate for capturing the distinctive features of ladder polymers, since they are designed to model single-stranded polymers and disregard bending. These models do not fully represent the inherent rigidity and extended conformation of the ladder polymer, and thus fail to provide an accurate depiction of the ladder polymer structure.
To overcome these challenges and provide an accurate description of the ladder polymer structure using scattering data, we build a new model for the simplest CANAL ladder polymer, consisting only of fused norbornyl and benzocyclobutene units, produced from norbornadiene and dibromo-p-xylene. This model accounts for the biaxial nature of its monomer structure and its inherent rigidity. Due to the complexity of this model, it is difficult to derive the analytical form of the scattering function, which is typically required for fitting scattering data using traditional approaches. To address this, we leverage the power of machine learning (ML)13 and Monte Carlo (MC)14 simulations.
Recent advancements in ML have enabled numerous applications in materials science, including the analysis of scattering data15 without knowing an explicit analytical form. This approach relies on large data sets that include scattering functions and corresponding polymer parameters, allowing ML to learn the relationship between them. Meanwhile, MC can be used to build such data sets. Given a set of polymer parameters, such as contour length and bending rigidity, we can use MC simulation to generate an ensemble of polymer conformations and calculate the structure factor, or scattering function. This combination of ML and MC provides a powerful framework for analyzing complex polymer systems and has proven useful for various single-stranded polymer systems16–19 and other soft matter systems.20 Among other works, SCAN automates structural analysis using predefined particle shape models, while CREASE employs genetic algorithms and surrogate ML to reconstruct 3D features (such as domain size, shape, orientation, and spatial distributions) from scattering profiles.21–24 Other ML approaches have been used for particle tracking in soft materials25 and for surface scattering analysis.26 Nevertheless, these methods do not provide insight into the model-specific parameters of systems like the ladder polymer and cannot capture the unique structural nuances of such systems.
In this paper, we present a framework for analyzing the scattering data of ladder polymers using ML. We first introduce a model of the ladder polymer in which the biaxiality, inherent rigidity and arrangement of successive monomers all play crucial roles in determining the polymer conformation. We then carry out MC simulations to generate a large data set of scattering data and train an ML model, Gaussian process regression27 (GPR), to obtain the mapping between scattering data and polymer parameters. Finally, we synthesize ladder polymer samples, measure the scattering function using small-angle neutron scattering (SANS) experiments, and apply our method to extract important polymer parameters for the measured sample. In contrast to conventional Gaussian process-based data inversion approaches,28–30 our approach avoids the potentially large computational cost of posterior sampling and predicts each polymer parameter separately.
[…], where û is along the long axis of the segment, or the polymer tangent direction, and v̂ is along the segment short axis and perpendicular to û. A polymer is then modeled as a chain of L segments, where L is the contour length in units of the monomer length B.
Unlike in conventional polymers, successive segments of the ladder polymer, i.e. the catalytic arene-norbornene annulation (CANAL) polymer, tend to form an angle, as shown in Fig. 2(a and b). We introduce another two unit vectors, û′ and v̂′, to represent this preferred orientation for the successive segment. For two connecting segments i and j, the angle between (û, v̂) and (û′, v̂′) is the inherent bending and twisting. For the specific model we are concerned with, v̂ = v̂′, and we denote the inherent bending by cos(α) = û·û′. There is an energy cost when (ûᵢ₊₁, v̂ᵢ₊₁) tilts away from the preferred orientation (û′ᵢ, v̂′ᵢ), giving the polymer energy E = […], where the bending is […], the twisting is […], and Kt and Kb are the twisting and bending moduli, respectively.
Finally, the preferred orientation of the successive segment may not stay on the same side at each segment. Comparing Fig. 2(a and b), when successive preferred orientations stay on the same side, we say the segments are connected by syn links; the polymer then rolls up and becomes coil-shaped, as shown in Fig. 2(c). On the contrary, if they flip sides, i.e. are connected by anti links, the polymer tends to be more extended, as shown in Fig. 2(d). We define the probability of a link being an anti link as the anti rate Ra.
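The effect of syn and anti links on the overall shape can be illustrated with a minimal sketch. It assumes a rigid, planar chain with no thermal fluctuations and no twisting, and it assumes that an anti link simply flips the side toward which the next inherent bend turns. The function `build_planar_chain` and this flipping rule are illustrative assumptions, not the simulation model used in this work.

```python
import numpy as np

def build_planar_chain(n_segments, alpha, anti_rate, rng):
    """Joint positions, shape (n_segments + 1, 2), in units of the segment length B.

    Assumptions: rigid planar chain, no fluctuations, and an anti link
    flips the side of the next inherent bend while a syn link keeps it.
    """
    heading = 0.0   # direction of the current segment's long axis
    side = 1        # +1 or -1: side toward which the inherent bend turns
    positions = [np.zeros(2)]
    for _ in range(n_segments):
        step = np.array([np.cos(heading), np.sin(heading)])
        positions.append(positions[-1] + step)
        if rng.random() < anti_rate:   # this link is an anti link
            side = -side
        heading += side * alpha
    return np.array(positions)

rng = np.random.default_rng(0)
alpha = 2 * np.pi / 12
syn_chain = build_planar_chain(12, alpha, anti_rate=0.0, rng=rng)
anti_chain = build_planar_chain(12, alpha, anti_rate=1.0, rng=rng)

# All-syn links close into a ring; all-anti links give an extended zigzag.
syn_end_to_end = np.linalg.norm(syn_chain[-1] - syn_chain[0])
anti_end_to_end = np.linalg.norm(anti_chain[-1] - anti_chain[0])
```

With these assumptions, the all-syn chain returns to its starting point, while the all-anti chain has an end-to-end distance close to its contour length, which mirrors the qualitative contrast between Fig. 2(c) and Fig. 2(d).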
Given a contour length L, inherent bending angle α, anti rate Ra, bending modulus Kb and twisting modulus Kt, the ensemble of ladder polymer configurations is determined. The configuration can be captured by the intra-polymer structure factor, given by:7,9
[…] (1)
where rᵢ is the position vector of segment i and […]. In addition, we also calculate the radius of gyration Rg² = […], with 〈⋯〉i,j denoting the average over all pairs of segments. We will use MC and ML to understand the relationship between the structure factor S(QB) and the other polymer parameters (Ra, α, L, Rg², Kt, Kb).
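As a concrete illustration, the sketch below computes an orientationally averaged structure factor and the squared radius of gyration from a set of segment positions. It assumes the standard Debye form, S(Q) = (1/N²) Σᵢ Σⱼ sin(Q rᵢⱼ)/(Q rᵢⱼ), normalized so that S → 1 as Q → 0, and the pair-average identity Rg² = ½〈rᵢⱼ²〉. Both forms, and the function names, are assumptions made for illustration.

```python
import numpy as np

def pair_distances(positions):
    """Matrix of distances r_ij between all pairs of segment positions."""
    diff = positions[:, None, :] - positions[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def structure_factor(positions, q_values):
    """Debye sum (1/N^2) * sum_ij sin(q r_ij) / (q r_ij) for each q."""
    r = pair_distances(positions)
    qr = q_values[:, None, None] * r[None, :, :]
    # np.sinc(x) = sin(pi x) / (pi x) and equals 1 at x = 0 (the i = j terms).
    return np.sinc(qr / np.pi).mean(axis=(1, 2))

def radius_of_gyration_sq(positions):
    """Rg^2 = (1/2) <r_ij^2>, averaged over all pairs (i, j)."""
    r = pair_distances(positions)
    return 0.5 * np.mean(r**2)

# Example: a straight rod of 12 unit-length segments (13 joints).
rod = np.column_stack([np.arange(13.0), np.zeros(13), np.zeros(13)])
q_grid = np.geomspace(0.07, 3.0, 100)
s_rod = structure_factor(rod, q_grid)
rg_sq_rod = radius_of_gyration_sq(rod)
```

Averaging over all N² index pairs, including i = j, is what makes the structure factor approach 1 at low Q, consistent with the Guinier behavior used later for normalization.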
[…] and […], as they are independent in the polymer energy […], which follow the Boltzmann distribution P(E) ∼ exp(−E/kBT). After sampling {(lᵢ, θᵢ, ϕᵢ)} for all segments based on their distributions, we calculate (ûᵢ, v̂ᵢ) and rᵢ for each polymer segment, then check the self-avoidance criteria […] for all pairs of segments; only configurations satisfying these criteria are kept.
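The accept-or-reject step can be sketched as follows. The exact self-avoidance criterion is not reproduced here, so this example assumes that every pair of non-bonded joints must be at least one segment length B apart. The freely jointed `random_walk` proposer is only a stand-in for the sampling of the ladder polymer model, and both function names are hypothetical.

```python
import numpy as np

def is_self_avoiding(positions, min_distance=1.0):
    """True if all non-bonded pairs (|i - j| >= 2) are at least min_distance apart.

    The threshold (one segment length) is an assumed criterion.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(positions), k=2)
    return bool(np.all(dist[i, j] >= min_distance))

def random_walk(n_segments, rng):
    """Stand-in proposer: a freely jointed chain of unit-length steps."""
    steps = rng.normal(size=(n_segments, 3))
    steps /= np.linalg.norm(steps, axis=1, keepdims=True)
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

rng = np.random.default_rng(1)
proposals = [random_walk(8, rng) for _ in range(200)]
# Keep only the configurations that satisfy the self-avoidance check.
accepted = [conf for conf in proposals if is_self_avoiding(conf)]
```

Because each configuration is drawn independently and then filtered, the accepted set is an unbiased sample of the self-avoiding ensemble under the assumed criterion, at the cost of discarding rejected proposals.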
[…] X = {ln S(QB)train} and X* = {ln S(QB)test} are the training set and test set, and Y and Y* are the corresponding polymer parameters (Ra, α, L, Rg²). In our case, we use 70% of the data set F = {ln S(QB)} as the training set and the remaining 30% as the test set. The joint distribution for a Gaussian process is given by eqn (2):
[…] (2)
[…] are used, in which l is the correlation length, σ is the variance of observational noise and δ is the Kronecker delta function.
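To make the roles of l, σ and δ concrete, the following is a minimal NumPy sketch of Gaussian process prediction. It assumes a squared-exponential kernel exp(−|x − x′|²/(2l²)) plus a noise term σδ on the training covariance, and a constant prior mean equal to the training average. These are common textbook choices and are assumptions here, as are the function names.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale):
    """Squared-exponential kernel exp(-|x - x'|^2 / (2 l^2))."""
    sq = (np.sum(xa**2, axis=1)[:, None]
          + np.sum(xb**2, axis=1)[None, :]
          - 2.0 * xa @ xb.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def gp_predict(x_train, y_train, x_test, length_scale, noise):
    """Posterior mean and variance of a GP conditioned on the training data."""
    # The Kronecker delta puts the noise variance on the diagonal only.
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale)
    k_ss = rbf_kernel(x_test, x_test, length_scale)
    prior_mean = y_train.mean()   # constant prior mean (an assumption)
    chol = np.linalg.cholesky(k_tt)
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - prior_mean))
    mean = prior_mean + k_st @ weights
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy check on a smooth one-dimensional function.
x_train = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
y_train = np.sin(x_train[:, 0])
x_test = np.linspace(0.1, 6.0, 15)[:, None]
mean, var = gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-6)
max_error = np.max(np.abs(mean - np.sin(x_test[:, 0])))
```

The predictive variance returned alongside the mean is what provides an error bar on each inverted parameter.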
[…] {ln S(QB)} by generating conformations of ladder polymers using MC for 6000 random combinations of (Ra, α, L, Kt, Kb) and calculating the corresponding Rg² and S(QB). The S(QB) are calculated for 100 different QB ∈ [0.07, 3], such that the ln QB grid is uniformly spaced in this interval. The polymer parameters are sampled as Ra ∼ U(0, 1), α ∼ […], L ∼ U(4, 50), Kt ∼ U(50, 100) and Kb ∼ U(50, 100), where U(a, b) is the uniform distribution on the interval [a, b]. In practice, these simulations are carried out in parallel on different CPUs, and each simulation takes up to half an hour to complete. Natural units are used, such that lengths are in units of the segment, or monomer, length B, and energies are measured in units of the thermal energy kBT. We first study the effect of the polymer parameters on the structure factor, then validate the feasibility of ML inversion for each polymer parameter, train a GPR, and test it using MC-generated data. Finally, we carry out a SANS experiment and apply the trained GPR to the experimentally obtained structure factor.
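The sampling grid described above can be set up in a few lines. This sketch builds the 100-point QB grid that is uniform in ln QB and draws the parameters whose intervals are stated in the text. Drawing L as an integer number of segments is an assumption, and α is omitted because its sampling interval is not given here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 6000

# 100 points in [0.07, 3], uniformly spaced in ln(QB).
qb = np.geomspace(0.07, 3.0, 100)

params = {
    "Ra": rng.uniform(0.0, 1.0, n_samples),
    "L": rng.integers(4, 51, n_samples),      # integer segment counts (assumed)
    "Kt": rng.uniform(50.0, 100.0, n_samples),
    "Kb": rng.uniform(50.0, 100.0, n_samples),
}
```

Each row of parameters would then be passed to one independent MC simulation, which is what makes the data generation straightforward to run in parallel.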
[…] {ln S(QB)} to polymer parameters Y = {(Ra, α, L, Rg², Kt, Kb)}, following a similar ML inversion framework,15 we carry out principal component analysis of the 6000 × 100 matrix F by decomposing it into F = UΣVᵀ using singular value decomposition (SVD), where U, Σ, and V are matrices of size 6000 × 6000, 6000 × 100, and 100 × 100, respectively. V consists of the singular vectors, and the entries of Σ² are proportional to the variance of the projection of F onto the corresponding principal vectors in V.
As shown in Fig. 4(a), the singular values decay rapidly with rank, suggesting that projecting ln S(QB) ∈ F onto the space spanned by the leading singular vectors gives a good approximation of the entire ln S(QB). Fig. 4(b) shows the first three singular vectors (V1, V2, V3), and Fig. 4(c) demonstrates that the projection of ln S(QB) onto these top three singular vectors recovers the original ln S(QB) very well.
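The decomposition and the rank-3 reconstruction can be sketched with NumPy. The matrix `F` below is synthetic stand-in data built from three latent factors plus small noise, not the simulated scattering data, and the economy-size SVD (`full_matrices=False`) is used for convenience since it gives the same singular values and vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_curves, n_q = 600, 100

# Synthetic stand-in for F: three latent factors plus small noise.
latent = rng.normal(size=(n_curves, 3))
basis = rng.normal(size=(3, n_q))
F = latent @ basis + 1e-3 * rng.normal(size=(n_curves, n_q))

# Rows of Vt are the singular vectors; s holds the singular values.
U, s, Vt = np.linalg.svd(F, full_matrices=False)
explained = s**2 / np.sum(s**2)       # fraction of variance per component

coords = F @ Vt[:3].T                 # projection onto the top three vectors
F_rank3 = coords @ Vt[:3]             # reconstruction from three coordinates
rel_error = np.linalg.norm(F - F_rank3) / np.linalg.norm(F)
```

A small `rel_error` together with a rapidly decaying `explained` plays the same role as the checks in Fig. 4(a) and 4(c): a few coordinates per curve are enough to represent the data set.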
By projecting F = {ln S(QB)} onto the singular vector space of (V0, V1, V2), each ln S(QB) becomes a coordinate in the three-dimensional space (FV0, FV1, FV2), and the entire set of coordinates provides a good proxy for the raw data set F. By plotting the distribution of the polymer parameters Y in the (FV0, FV1, FV2) space, Fig. 5 provides insight into the feasibility of ML inversion for each polymer parameter, in which the corresponding values are represented by the color distribution.
As shown in Fig. 5(a–d), the polymer parameters (Ra, α, L, Rg²) are well spread out in the (FV0, FV1, FV2) space, indicating a well-defined inverse mapping from ln S(QB) to these parameters and making them good inversion targets. On the contrary, Fig. 5(e and f) show that the distributions of the bending and twisting moduli Kb and Kt are rather random, suggesting that they cannot be easily extracted from ln S(QB). This is in line with our expectation, as the conformation of the ladder polymer is not sensitive to the wiggling around the inherent bending angle α, since α is very large compared to the flexibility of the chemical bond.
[…] ln S(QB) established, we test such inversion using simulation data. We divide the data set F = {ln S(QB)} randomly into two parts: a training set {ln S(QB)train} consisting of 70% of F and a test set {ln S(QB)test} made up of the remaining 30%. We optimize the hyperparameters of the GPR model using the training set for each polymer parameter and then extract the corresponding polymer parameters (Ra, α, L, Rg²) from each ln S(QB) ∈ {ln S(QB)test}. The scikit-learn Gaussian process library36 was used for the training. Table 1 shows the optimized hyperparameters for each polymer parameter, obtained by maximizing the log marginal likelihood,27 as shown in Fig. 6.
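A train-and-test loop of this kind can be sketched with scikit-learn. The features and target below are synthetic stand-ins rather than scattering data, and the kernel choice (`RBF` plus `WhiteKernel`), the starting hyperparameters and `normalize_y=True` are illustrative assumptions, not the settings used in this work. Fitting a `GaussianProcessRegressor` optimizes the kernel hyperparameters by maximizing the log marginal likelihood.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 300 feature vectors and one smooth target.
X = rng.uniform(-1.0, 1.0, size=(300, 3))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.01 * rng.normal(size=300)

# 70% training set, 30% test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One regressor per target; hyperparameters are tuned during fit().
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)

y_pred, y_std = gpr.predict(X_test, return_std=True)
score = r2_score(y_test, y_pred)
fitted_kernel = gpr.kernel_   # optimized correlation length and noise level
```

Training a separate regressor for each polymer parameter, as described above, would repeat the `fit` call once per target with its own optimized kernel.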
| | l | σ |
|---|---|---|
| Ra | 6.497 × 10⁻¹ | 6.301 × 10⁻⁴ |
| α | 3.570 × 10⁻¹ | 1.585 × 10⁻² |
| L | 1.043 | 4.442 × 10⁻³ |
| Rg² | 3.843 | 1.447 × 10⁻⁷ |
Fig. 7 shows the comparison between the polymer parameters (Ra, α, L, Rg²) obtained from ML inversion and the corresponding MC references. The data agree very well and lie close to the diagonal line, with a coefficient of determination (r² score) close to 1. The high precision highlights the effectiveness of extracting key parameters from the structure factor and further confirms the robustness of our GPR model. These results also indicate that, for our model, these polymer parameters can be extracted from the scattering curve independently, whereas additional constraints may be needed in some cases, e.g. for charged polymers.18
Fig. 7 Comparison of polymer parameters in simulation and inverted by machine learning. (a) Anti rate Ra. (b) Inherent bending angle α. (c) Contour length L. (d) Radius of gyration Rg².
Fig. 8 shows the normalized form factor measured in the SANS experiment and the ML-implied curve. The SANS-measured S(QB) shows a clear flat region at low Q in the log–log plot, allowing us to fit for the normalization coefficient using the Guinier approximation7,37 S(QB) ∼ exp(−(QRg)²/3), and the monomer length B obtained by molecular structure optimization allows us to rescale the horizontal axis. By feeding the normalized experimental ln S(QB) to the trained GPR, we obtain the polymer parameters (Ra, α, L, Rg²), as shown in Table 2, and then run MC simulations with these parameters to reconstruct the ML-implied S(QB). The SANS-measured S(QB) and the ML-implied one agree closely. As shown in Fig. 8, the black line, which is reproduced using the mean values of the GPR-predicted polymer parameters, agrees with the experimental data very well, and the gray region is reproduced by taking the extremes of the error bar of each polymer parameter. We note that the experimentally measured SANS data show high noise in the low-Q range, which is a common issue due to low neutron counts. Since the polymer structure factor is known to have the universal Guinier form at low Q, we first fit the Guinier region37 to find the normalization factor of the scattering density and replace the low-Q data with the smooth Guinier form before feeding them into the GPR. In addition, while the SANS-measured S(QB) exhibits different noise levels at different Q due to neutron counting and instrument error, we only used the mean for the inversion, as is conventional in SANS analysis. The impact of this noise can be reduced by running longer, more costly SANS experiments or by using a higher sample concentration to improve the signal. The current method cannot account for the error bars in the experimental data when applying the GPR, which can result in a larger uncertainty for the extracted parameters.
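The Guinier normalization step can be sketched as a straight-line fit of ln I against Q², since ln I ≈ ln I₀ − Q²Rg²/3 at low Q. The data below are synthetic and noise-free, and the cutoff `q_max` and the function name `guinier_fit` are illustrative assumptions. In practice the fit range would have to be chosen so that the Guinier approximation holds.

```python
import numpy as np

def guinier_fit(q, intensity, q_max):
    """Fit ln I = ln I0 - q^2 Rg^2 / 3 for q <= q_max; return (I0, Rg^2)."""
    mask = q <= q_max
    slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
    return np.exp(intercept), -3.0 * slope

# Synthetic low-Q data with a known prefactor and radius of gyration.
q = np.linspace(0.01, 0.5, 50)
true_i0, true_rg_sq = 4.2, 2.0
intensity = true_i0 * np.exp(-(q**2) * true_rg_sq / 3.0)

i0, rg_sq = guinier_fit(q, intensity, q_max=0.3)
normalized = intensity / i0                 # normalized so that S -> 1 at low Q
smooth_low_q = np.exp(-(q**2) * rg_sq / 3.0)  # Guinier form to replace noisy low-Q data
```

The fitted prefactor sets the normalization of the measured curve, and the fitted Guinier form can stand in for the noisy low-Q points before the curve is passed to the regressor.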
| | Ra | α | L | Rg² |
|---|---|---|---|---|
| Machine learning inversion | 0.14 ± 0.07 | 0.89 ± 0.05 | 11.6 ± 2.8 | 2.07 ± 0.28 |
| Molecular structure optimizationᵃ,38 | N/A | 0.96 ± 0.08 | N/A | N/A |
| Flexible cylinder fitting12,39 | N/A | N/A | 12.0 ± 0.8 | N/A |
| Guinier approximation fitting7,37 | N/A | N/A | N/A | 1.82 ± 0.15 |

ᵃ The atomistic structure of the ladder polymer with 4 monomer units was optimized using the Forcite module with the COMPASS force field in Materials Studio 8.0, BIOVIA.
Table 2 shows the four GPR-predicted polymer parameters of our synthesized CANAL ladder polymer, along with a comparison with parameters obtained from other traditional methods. Note that our ML inversion method can extract all parameters simultaneously, and the parameters that the traditional methods can extract, (α, L, Rg²), show strong agreement with their results. Moreover, due to the special monomer structure of the CANAL ladder polymer, the anti rate Ra is a unique parameter that can only be obtained using the ML inversion method. The ML inversion method suggests that our sample is relatively short, with only about 12 segments, and that the radius of gyration is just Rg² ≃ 2, fairly small for such a contour length compared to semiflexible chains.40 This discrepancy is explained by the low anti rate Ra ≃ 0.14, which suggests that the monomers are mostly connected through syn links, making the polymer roll up, as shown in Fig. 9. The tendency of ladder polymers to adopt more coiled structures has also been observed in other systems.41
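For a sense of scale, the extracted Rg² can be compared with two textbook limits for a chain of the same contour length, assuming lengths in units of B and taking L ≈ 11.6 from Table 2. This is a rough illustrative estimate, not part of the analysis: the rod formula is exact only for a thin continuous rod, and the ideal-chain formula is the long-chain limit.

```latex
\begin{align}
  \text{rigid rod:}\quad
    R_g^2 &= \frac{L^2}{12} = \frac{(11.6)^2}{12} \approx 11.2, \\
  \text{ideal (Gaussian) chain of } L \text{ unit segments:}\quad
    R_g^2 &\approx \frac{L}{6} = \frac{11.6}{6} \approx 1.9 .
\end{align}
```

The value Rg² ≃ 2 obtained from the inversion is far below the rod limit and close to the ideal-chain estimate, which is consistent with a rolled-up conformation.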
Fig. 9 Sample ladder polymer configurations generated using MC with (L, Ra, α) = (12, 0.14, 0.89) and Kt = Kb = 100.
Using the ML-extracted polymer parameters, we can regenerate sample configurations using MC. The CANAL ladder polymer sample we synthesized is expected to roll up into a coil or ring shape due to its low anti rate. Further studies using single-polymer imaging with scanning tunneling microscopy44 (STM) or ultra-high-resolution atomic force microscopy45 (AFM) would be highly beneficial. Moreover, the sample we used in this work has only inherent bending; this ML inversion method can also be applied in the future to other CANAL ladder polymers with both inherent bending and twisting.
We also note that the structure of the CANAL ladder polymer we studied is dominated by the inherent bending angle and the anti rate; the effects of the bending modulus Kb and twisting modulus Kt are too weak to be extracted for this system. For the study of Kb and Kt, ladder polymers whose monomers are connected in a flat manner are more suitable, as are conjugated polymers,46,47 whose twisting can be more significant due to the existence of single bonds.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5dd00051c
This journal is © The Royal Society of Chemistry 2025