Hesam Makki*a,
Colm Burkea,
Christian B. Nielsenb and
Alessandro Troisi
*a
aDepartment of Chemistry, University of Liverpool, Liverpool L69 3BX, UK. E-mail: h.makki@liverpool.ac.uk; a.troisi@liverpool.ac.uk
bDepartment of Chemistry, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
First published on 14th May 2025
The molecular design of semiconducting polymers (SCPs) has been largely guided by varying monomer combinations and sequences by leveraging a robust understanding of charge transport mechanisms. However, the connection between controllable structural features and resulting electronic disorder remains elusive, leaving design rules for next-generation SCPs undefined. Using high-throughput computational methods, we analyse 100+ state-of-the-art p- and n-type polymer models. This exhaustive dataset allows for deriving statistically significant design rules. Our analysis disentangles the impact of key structural features, examining existing hypotheses, and identifying new structure–property relationships. For instance, we show that polymer rigidity has minimal impact on charge transport, while the planarity persistence length, introduced here, is a superior structural characteristic. Additionally, the predictive power of machine learning models trained on our dataset highlights the potential of data-driven approaches to SCP design, laying the groundwork for accelerated discovery of materials with tailored electronic properties.
New conceptsWe demonstrate that it is finally possible to rationalise the entire class of semiconducting polymers and make predictions on their charge mobility with a known confidence level. A new methodology presented here enables the generation of accurate predictive models for hundreds of polymers, bringing the power of digital discovery into the field of semiconducting polymers. Data analysis of such a large and homogeneous set provides new, statistically significant, structure–property relationships. We find, for instance, that polymer rigidity is not important for its charge transport properties, which are instead dominated by a structural parameter we introduce: the planarity persistence length. We also resolve debates that have long puzzled researchers, such as the relative importance of conformational and electrostatic disorder. Combining our methodology with machine learning techniques we realise the possibility of screening thousands of polymers providing an unprecedented tool for their design and discovery. |
A major opportunity to develop fully predictive models of SCPs lies in the availability of high-quality computational methods, established by various research groups,13–23 prove to be consistently capable of generating accurate descriptions of the local microstructure and electronic properties of SCPs.4,6,18,24,25 Until now, these models have been deployed on very few polymers, typically one polymer per investigation, and they are commonly used to complement experimental investigations on benchmark systems.4,6,10,18,25,26 In this work, however, we establish that such methods can be used at scale, to create a fairly exhaustive map of the relationship between chemical structure and electronic properties for the entire class of SCPs. The availability of a homogenous (and expandable) dataset also enables the derivation of statistically significant design rules, gradually replacing the traditional approach of formulating research hypotheses tested on a limited number of cases. To this end, we generate more than 100 state-of-the-art models of p- and n-type SCPs, including their microstructure and electronic properties. Through disentangling the structural features and analysing their specific correlations with the calculated electronic properties, we test several existing hypotheses regarding the structure–property relationships and introduce previously unrecognised structural characteristics strongly correlated with intra-chain charge carrier mobility. Additionally, our results illustrate how machine learning (ML) models, built on these computed results, can provide rapid and statistically significant predictions for the design of new polymers.
![]() | ||
Fig. 1 (a) Structures and acronyms of monomers used in this study. CH3 groups are used to indicate the connection with the alkyl side chain. (b) Schematic representation of the Embedded Chain (“embd”) model used for DOS and LL calculations, as described in ref. 32. (c) Example results for a p-type polymer based on the Embedded Chain model, showing DOS(E) and LL(E) with definitions of key metrics: band-edge (E0.05%), tail-width (WT = |E0.5% − E0.05%|, highlighted in orange), and bandwidth (WB = |E50% − E0.5%|, highlighted in green). (d) Correlation graph between WB/WT and LL (Spearman's rank correlation coefficient, S = 0.8). (e) Comparison of LL values calculated from the “embd” and “iso” input models. |
The methodology for performing simulations across a large number of systems builds on a workflow validated in ref. 32 and reported in the Method section below. Long polymer chains are modelled within an environment containing the corresponding RUs (referred to as the “soup” model) to replicate the bulk electrostatic and steric environment. As shown in the ESI,† Section S3, the quantities described throughout this work accurately reflect bulk properties and remain unaffected by the increase in the number of RUs of the chain beyond the 10-RU SCPs used in this study. Notably, microstructures of several SCPs generated by the bulk models using the same workflow have been previously compared to GIWAXS, showing excellent agreement.24,25,32 The electronic structure is computed for individual polymer chains embedded within an electrostatic environment (labelled as “embd” in the figures), represented with suitable point charges (Fig. 1(b)). To isolate the effects of the electrostatic environment, we also analyse the electronic structure of isolated chains in the same conformation but without embedding (labelled as “iso”).
The DOS(E), a critical descriptor for SCPs, captures the distribution of electronic states and their accessibility for charge carriers. To analyse 100+ cases we should introduce more concise descriptors of the DOS. Specifically, for p-type SCPs, we compute the energies E0.5%, E5%, and E50%, which correspond to the hole filling of 0.5%, 5%, and 50% of the valence band, respectively. We define the tail-width as WT = E0.5% − E0.05% and the (half) bandwidth as WB = E50% − E5% (Fig. 1(c)). An analogous definition is applied for n-type polymers and their conduction band. At each energy, it is possible to define a localisation length LL(E), computable from the electronic structure results.33 Evaluated at the band edge LL(E0.5%) (see Fig. 1(c)), this quantity expresses the delocalisation of the states more relevant for transport,34,35 and we refer to it simply as LL in the remainder of this work.
In our work, we employ LL as a proxy for mobility, which is justified by multiple lines of both experimental and computational evidence demonstrating its strong correlation with key electronic properties relevant to charge transport in semiconducting polymers, e.g., mobility. For example, Fig. 1(d) demonstrates the expected strong correlation between WB/WT and LL (Spearman's rank correlation coefficient, S = 0.8), highlighting how either DOS tail or localisation characteristics can be linked with charge mobility.35 Furthermore, other modelling studies based on model reduction calculations showed that LL and computed mobility are strongly correlated.36
The comparison of LL values calculated from the “iso”, LL(iso), and “embd”, LL(embd), models (Fig. 1(e)), underscores the impact of the environmental electrostatic effect on charge transport, with the expected trend: LL(iso) > LL(embd). This finding highlights that, while the inter-chain electrostatic environment plays a role, a significant portion of the charge localisation behaviour originates from the chain conformation itself. Note that we distinguish between n-type and p-type polymers in the scatter plot presentations, although we do not find significant differences between the two classes in any of the discussions presented below. On the basis of this hierarchy, the following discussion will first explore the contribution of chain conformation to LL by analysing the “iso” models, followed by the impact of electrostatic environment, whose contribution can be studied in isolation by comparing the “embd” and “iso” results. This approach allows us to directly examine the disorder arising from the intra- and inter-chain effects.
By generating over 100 realistic models and calculating properties correlated with mobility, we gain a unique opportunity to identify SCP structural features that contribute to high mobility. Our focus is on features that can be deliberately engineered to guide the synthesis of new polymers. For example, while polymer rigidity has long been considered a desirable property for SCPs,37–39 Fig. 2(a) reveals only a weak correlation (S = 0.49) between persistence length (Lp), a proxy for polymer rigidity, and LL. Among the structural properties analysed, the one most strongly correlated with LL (S = 0.75) is the chain planarity persistence length (Lpp), which quantifies how planar a polymer backbone remains along its length (Fig. 2(b)). To determine this, we calculate the angle distribution between monomers along the chain, combining all the relative angles across 200 input chain models per SCP, and identify the length at which the planarity is lost (see Method). Fig. 2(c) shows that Lpp, unlike Lp, exhibits a strong correlation with LL, and it is therefore a superior predictor of charge delocalisation, an idea proposed in more qualitative terms in studies of individual polymers.10,40,41
This can be rationalised by considering the introduction of disorder in a tight-binding Hamiltonian, which leads to localisation of the resulting orbitals that increases with disorder.42 A variation in the dihedral angle in a conjugated polymer chain induces changes in both the off-site (inter-monomer coupling) and on-site Hamiltonian elements which, in turn, causes disorder and state localisation. In ESI,† Section S4.6 we employ a tight-binding model Hamiltonian to more quantitatively investigate the dependence of LL and Lpp on the dihedral angle distribution.
A key advantage of Lpp as a descriptor is its rapid computability, making it useful for screening polymers prior to synthesis. As shown in Fig. S4.3.3 in the ESI,† Lpp can be derived almost exactly (S = 0.99) from the dihedral angle distribution of a polymer's conjugated fragments within a single RU, obtained from MD simulation trajectories. Furthermore, as demonstrated in Fig. 2(d), Lpp can be predicted with high accuracy (S = 0.95) using a small MD simulation of dimers (ESI,† Section S4.4), or estimated with reasonable accuracy (S = 0.86) for initial screening through torsional potentials in the gas phase (ESI,† Section S4.5) – a fast and routine calculation, as detailed in Table S4 of the ESI,† which compares the associated computational cost and accuracy. The quantities Lp and Lpp are often mistakenly conflated in discussions of design rules, while they are largely uncorrelated, as explicitly shown in Fig. 2(e). In practice, chain stiffness is not a particularly desirable attribute, whereas long-range planarity is crucial. To illustrate the difference between these two descriptors we visualise selected SCPs across Lp and Lpp ranges, using the common analogy43 with different pasta shapes: (i) SCPs with both large Lp and Lpp, resembling uncooked fettuccine, flat and stiff ribbons; (ii) SCPs with large Lp but small Lpp, comparable to uncooked fusilli; stiff but twisted chains (iii) SCPs with small Lp but large Lpp, similar to cooked fettuccine, locally flat but soft; and (iv) SCPs with both small Lp and Lpp, resembling cooked fusilli pasta, soft and twisted chains.
We now consider an approach to quantify the relative importance of electrostatic disorder, arising from intra- and inter-chain electrostatic interactions, and the electronic coupling disorder, which is caused by disruptions in π-conjugation along the polymer backbone, to the total disorder. To determine the two contributions from the DOS, we fit a model DOS to the computed DOS using a purely electronically disordered Hamiltonian of ten states, with one state |i〉 per site (RU) i and nearest-neighbour coupling
Mapping an extremely complex system into a reduced model with only four physically intuitive parameters offers significant advantages in rationalising the SCP materials class. For instance, it is evident that the on-site disorder, σ(α), is consistently larger than the disorder in the inter-site coupling, σ(β), with an average σ(α)/σ(β) ratio of 5.2 (the distribution is shown in Fig. S6, ESI†). This approach allows us to determine the range of parameters observed across the systems studied and identify the achievable optimum for each.44 For example, σ(α) spans from 0.09 to 0.31 eV, while β ranges from 0.02 to 0.21 eV (Fig. S6, ESI†). Since there is a negligible correlation between them (S = 0.05), they can be optimised separately to maximise the ratio β/σ(α), which has a very strong correlation (S = 0.83) with the localisation length shown in Fig. 3(b). It is worth noting that β is a property that can be readily computed for an isolated chain, while σ(α) has an intra-molecular component, driven by the individual chain conformation discussed earlier, and an inter-molecular component which is explored next.
It was noted in Fig. 1(e) that the electrostatic environment has an impact on the LL(embd) ranging from small to considerable, one of the elements complicating the straightforward rationalisation of SCP properties. In the 4-parameter model, this translates into a broad distribution of on-site disorder values, with σ(α)EMB–σ(α)ISO ranging from 0.025 to 0.20 eV across the polymers considered, with nearly identical average values for p-type and n-type systems. Interestingly, it is possible to predict the electrostatic effect of the environment based on the electrostatic properties of the isolated RU. As shown in Fig. 3(c), there is a very strong correlation (S = 0.85) between σ(α)EMB–σ(α)ISO and the dipole moment of the isolated RU, μGAS, a simple indicator of the electrostatic “noise” caused by the environment (ESI,† Section S3.3). This consideration can be integrated into polymer design since also this quantity can be rapidly obtained from the chemical structure of the RU.
The study presented so far shows that better SCPs can be designed to remain planar over a longer distance, have large effective coupling along the chain, and be subject to a smaller electrostatic disorder from the environment. Because of the sizeable dataset, we can derive a machine learning (ML) method to predict LL from easy-to-compute quantities, enabling an ultra-rapid screening of potentially interesting polymers. To estimate the LL, we use three descriptors: (1) coupling (β) computed from a single point electronic structure calculation (ESI,† Section S5), (2) μGAS from atomic charges and optimised structure of the RU, and (3) the Lpp computed from a small-scale MD simulation of dimers (ESI,† Section S4). We used the Random Forest algorithm45 to predict LL(embd) from these descriptors, selecting one-third of the dataset as a test set to validate the model (see details in Method). Such a simple model based on only three independent input parameters can effectively rank SCPs based on LL(embd) (Fig. 4(left)), with no SCPs from the top third of LL(embd) values being misclassified into the lowest third category. This is a promising outcome for such computationally inexpensive calculations. The ML study serves as a robust coarse-level screening tool, more than an order of magnitude faster than full calculations (an analysis of the computational cost is provided in ESI,† Section S7). It is ideal for screening thousands of SCPs, while full calculations-including MD and QC – can be reserved for the best candidates selected after the initial ML evaluation. Additionally, a Random Forest model also provides the relative importance of its input parameters, which can be translated into the relative strength of each design principle. As shown in Fig. 4(right), while coupling and electrostatic effects of the environment exhibit comparable impacts, the planarity persistence length emerges as the dominant factor due to its strong correlation with LL, as illustrated in Fig. 2(c). Furthermore, the present ML model is developed for undoped linear SCPs and different classes require dedicated alternative data sets. In particular, the highly relevant area of doped SCPs would require the inclusion of additional factors such as the chemical nature of the dopant, their concentration and interaction with the charge carriers.
While β and μGAS are intrinsic molecular properties, Lpp is also primarily governed by the chemical structure of the polymer, as supported by the strong correlation observed across different calculations, including torsional potential calculations (gas phase), “soup” simulations, and melt models (ESI,† Section S4). Nevertheless, Lpp may still be influenced by processing under certain conditions. Although our current analysis focuses on identifying key structure–property relationships under near-equilibrium conditions, processing-induced changes in planarity and charge transport could be a valuable direction for future investigation. The MD-based methods employed here can be extended to explore how conditions such as solvent evaporation or thermal annealing impact Lpp, opening up pathways for experimental control and optimisation. Furthermore, the present ML model is developed for intrinsic, undoped linear SCPs and extension to doped systems, where additional factors such as dopant interactions and concentration must be considered, is an important direction for future work.
This work provides a hierarchy of methodologies that can be used to screen polymers depending on the desired accuracy: descriptors for screening thousands, the atomistic soup model for hundreds, or full atomistic models for tens of SCPs. The throughput and automation of these methods are unprecedented for semiconducting polymers and enable, for the first time, a continuous feedback loop between modelling and experimentation. Theoretical predictions can be now considered much faster than synthesis and characterisation. This throughput also facilitates the adoption of advanced ML methods for result analysis, not only based on patterns in chemical structures but also on the most relevant and easily computable physical properties, advancing in silico design beyond the current paradigm of incremental optimisation of existing material structures.46 Finally, there are no inherent limitations in applying these methods to extract various properties (e.g., optical,47 thermoelectric,48 and mechanical49), including electronic processes such as doping50 or inter-chain charge transfer.25
Simulation workflow used across 105 SCPs includes four stages: (i) force field development and “soup” model construction, (ii) 1 μs molecular dynamics (MD) simulation at 500 K and 1 bar, (iii) 200 samples taken from the trajectory made in (ii) and cooled down to 300 K in 1 ns to generate input models for quantum chemistry (QC) calculations, and (iv) QC calculations on all input models to obtain average density of states (DOS) and localisation length (LL). A summary of the methods used is provided in ESI,† Section S2, and details in ref. 32. Model validation is explained in ESI,† Section S3.
Polymer persistence length (Lp) is defined as the distance over which correlations in the direction of the tangent are lost. We use a standard way of calculation:51 the angles, θ, between the tangents of monomers at distances ranging from L1 to LN (where N is the total number of monomers in chain) are calculated and averaged over 200 chain models. An exponential decay function, 〈cos(θ)〉 = exp(−LN/Lp), is then fitted. From this fit, Lp is determined. Details including examples of polymers with relatively low and high Lp are provided in ESI,† Section S4.1.
Polymer planarity persistence length (Lpp) defined as the length along the polymer over which chain planarity is lost. This is quantified by calculating the relative angles between the normal vectors to the monomer planes along the chain. Details of the method are provided in ESI,† Section S4.2. Instead of extracting the actual angles between all monomers along the chain, e.g., θ1−2, θ1−3,…, θ1−N, N = number of monomers in a chain, to create angle distributions, one can also estimate Lpp by convolving the angle distributions of only neighbouring monomers from MD simulation, LMDpp, or Boltzmann distribution obtained from torsional potential calculated in gas phase, LGASpp. The details of these methods are provided in ESI,† Sections S4.3–S4.5.
Reduced models of the polymer electronic structure were generated in two stages: (i) finding initial values of parameters α and β by fitting the eigenvalues of a tight-binding disorder-less Hamiltonian to the DFT-computed orbital energies of the periodic, rigid 10-mer, (ii) using a simulated annealing algorithm to fit DOS derived from a 6-parameters tight-binding Hamiltonian including disorder to DOS computed from full QM/MM models. Further details of the above procedure are provided in ESI,† Section S5.
For machine learning (ML) study, we employed a Random forest regressor to predict the LL(embd) of SCPs using a dataset containing three molecular descriptors: β (coupling, calculated by method explained ESI,† Section S5), Lpp (obtained from small MD simulation of dimers explained in ESI,† Section S4.5), and μGAS (from single-point calculation on repeat unit in vacuum). The data was subjected to multiple train-test splits using five different randomised splits to ensure robustness in the model's performance evaluation. Each split involved scaling the features using a standard scaler, training the Random Forest model with 100 estimators, and subsequently predicting the target variable on the test set. The model's performance was assessed using mean squared error (MSE), cross-validated MSE (with five folds), and Spearman's rank correlation coefficient.
All scripts necessary to reproduce the data presented in the paper, including those for density of states (DOS), localization lengths (LL), descriptors (such as tail and band (half) widths (WT and WB), LL(E), persistence length (Lp), planarity persistence length (Lpp), dipole moments (μGAS)), reduced tight-binding models, and machine learning analysis based on random forest, are available at https://github.com/HMakkiMD/SCP_HT. The codes for QC/MD calculations encompassing force field development and model construction, input generation for QC calculations, and codes for density of states and localisation length calculations are available at https://github.com/HMakkiMD/GAMMPS.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5mh00485c |
This journal is © The Royal Society of Chemistry 2025 |