Mapping binary copolymer property space with neural networks

We map the property space of binary copolymers to understand how copolymerisation can be used to tune the optoelectronic properties of polymers.


Introduction
Conjugated polymers are a highly versatile class of organic materials that can be used in a wide variety of applications such as photovoltaics, 1-5 light-emitting diodes, 6,7 field-effect transistors, 8 batteries, 9 supercapacitors, 10 thermoelectrics, 11,12 and photocatalysts. 13-18 All of these applications exploit a combination of the optoelectronic and/or redox properties of the polymers, the earth-abundance of their constituents, and the relatively facile tunability of polymer properties. Generally, property tuning of conjugated polymers is performed through copolymerisation: combining different building blocks to yield a repeating motif, which is replicated to form the polymer chain. The properties of the resulting copolymers arise from a combination of those of the building blocks, although the exact connection between the two, or between the properties of the copolymer and the related homopolymers, is not clear. Models that aim to explain this connection for the optoelectronic properties in terms of the donor and acceptor character of building blocks have been proposed in the literature, but these are generally qualitative in nature. 19-21 While the ability to tune polymer properties through copolymerisation is an attractive attribute of polymer chemistry, exploring the resulting compositional space presents a dimensionality problem that arises from the large number of available monomers and is exacerbated by increasing copolymer complexity. To illustrate this numerically, consider a pool of 500 different monomers. Combining these monomers in all possible ways results in 125 250 binary copolymer compositions, increasing to over 250 000 when we consider that each repeat unit (if asymmetric) has two isomers. With more complex repeat units, i.e. three- and four-component copolymers, 4,5 we arrive at billions of possible combinations.
From a materials design standpoint, these astronomically large numbers make it impossible to explore the copolymer compositional space experimentally, even with high-throughput robotic synthesis and characterisation techniques, or computationally, particularly with more complex polymer repeat units, using standard approaches based around Density Functional Theory (DFT).
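The counts quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes compositions are counted as unordered selections of monomers with repeats allowed (so the homopolymers are included) and that treats the isomer doubling as an upper bound; the helper name `n_compositions` is ours.

```python
from math import comb

def n_compositions(n_monomers: int, n_components: int) -> int:
    """Number of unordered monomer selections of a given size, repeats allowed.

    Counting multisets means A-B and B-A are one composition, and A-A
    (the homopolymer) is included.
    """
    return comb(n_monomers + n_components - 1, n_components)

pool = 500
binary = n_compositions(pool, 2)
# Upper bound: assumes every repeat unit has two ordered isomers.
binary_with_isomers = 2 * binary
ternary = n_compositions(pool, 3)
quaternary = n_compositions(pool, 4)

print(binary)               # 125250
print(binary_with_isomers)  # 250500
print(ternary)              # 20958500
print(quaternary)           # 2635531375
```

Under this counting convention the billions are reached at four components; accounting for monomer sequence within the repeat unit would increase the ternary and quaternary numbers further.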
Naturally, we can overcome the copolymer dimensionality problem with a fast enough way of determining relevant properties for known copolymer compositions. A first step towards this was a move from DFT to semi-empirical methods, which allowed for the screening of short oligomers for high-efficiency organic photovoltaic materials. 22-24 In recent years, machine-learning techniques have emerged as a promising way of tackling analogous problems in other areas of organic and inorganic materials design, 25-33 and conceptually could allow for the exploration of much larger compositional spaces, unlimited by polymer length. In this context, (supervised) machine learning involves 'training' a model with examples of molecules/materials for which the properties are known. Once trained, the model essentially acts as a function able to map molecular structure and/or composition to material properties. However, use of these techniques is often prohibited by the requirement for large amounts of clean, high-quality data with which to conduct training. We could obtain training data from electronic structure calculations, where, in the context of organic materials, DFT is the standard. However, DFT is simply too computationally intensive to use for large numbers of conjugated copolymers, where representative oligomer models can contain upwards of 150 atoms. Indeed, recent work 34 on non-conjugated polymers using Gaussian Process Regressors trained on DFT data highlighted the challenge of exploring a wide chemical space with large numbers of possible compositions, as well as restrictions on the type of machine learning algorithms that are feasible, due to the limited size of the training data set that is computationally affordable.
Until recently, using semi-empirical methods, as discussed above, to generate this data could mean significantly reduced performance of a given machine learning model due to their lower accuracy with respect to DFT. 35 However, we recently showed that xTB, 36-38 a recently developed family of density functional tight binding methods, when calibrated to a small, representative subset of (time-dependent) DFT-derived results, provides highly accurate copolymer optoelectronic properties at a computational cost reduced by at least three orders of magnitude relative to DFT. 35 Further, we used the resulting high-throughput approach to demonstrate the weak dependence of the predicted properties on the exact polymer conformation. 39 In turn, these two observations suggest that (i) xTB can be used to generate DFT-quality training data and (ii) 3D structural models of polymer chains may not be necessary for the prediction of optoelectronic properties (i.e. we can ignore conformation effects while focussing only on composition, see below), permitting the use of 2D molecular representations as descriptors.
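The calibration idea can be pictured as a least-squares line fit between values from the cheap method and reference values for the same small subset of polymers. The following is a minimal sketch only: the numbers are invented, the function names are ours, and the published calibration parameters are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration subset (eV): semi-empirical values paired with
# reference values for the same polymers. These numbers are invented.
xtb_subset = [-6.2, -5.8, -5.5, -5.1, -4.9]
dft_subset = [-5.9, -5.6, -5.3, -5.0, -4.8]

slope, intercept = fit_line(xtb_subset, dft_subset)

def calibrate(xtb_value):
    """Map a raw semi-empirical value onto the reference scale."""
    return slope * xtb_value + intercept

calibrated = [calibrate(v) for v in xtb_subset]
```

Once the two parameters are fixed on the subset, `calibrate` can be applied to any number of new semi-empirical values at negligible cost, which is what makes the tiered strategy attractive.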
Here we show how high-quality training data obtained via xTB, in combination with 2D molecular descriptors (in this case, Extended-Connectivity (Morgan) Fingerprints 40), can be used to train a neural network model capable of the simultaneous, near-instant prediction of the key optoelectronic properties of copolymers with very high accuracy (RMSE < 0.12 eV). Using this model, we explore the binary copolymer property space spanned by a pool of 586 monomeric units that are compatible with Yamamoto, Suzuki-Miyaura or Stille coupling (see Fig. 1b for examples), generating around 350 000 possible unique copolymer structures. This library was compiled from commercially available aromatic dibromides and distannanes, as well as non-commercially available building blocks from the organic photovoltaics literature. With this large volume of data, we are able to identify general features of the property space of binary copolymers and their homopolymer counterparts, test the ideas behind common synthetic strategies used to yield low-optical-gap materials, and explore the extent to which polymer properties can be tuned through copolymerisation.

Properties of interest and polymer models
The optoelectronic properties of a conjugated polymer may be characterised by the key quantities 41 outlined in Fig. 1a. These are the ionisation potential (IP), the energy required to remove an electron from the polymer; the electron affinity (EA), the energy released upon adding an electron to the polymer; and the optical gap, the minimum energy at which the polymer absorbs light to form an interacting electron-hole pair (exciton). Two additional quantities may be derived from these: the fundamental gap, the energy required to form a completely non-interacting electron-hole pair; and the exciton binding energy, a measure of the interaction energy between the excited electron and hole in the exciton (the difference between the optical and fundamental gaps). Note that, throughout the text, we generally focus on the negative of IP and EA (−IP and −EA), which map directly onto the commonly used HOMO (−IP) and LUMO (−EA) concepts, which are often used as approximations to these quantities. Additionally, we approximate the optical gap as the lowest energy excitation (S0 → S1) for all polymers.
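The relationships between these quantities are simple enough to state in code. The sketch below assumes all values are in eV and takes the exciton binding energy as the fundamental gap minus the optical gap, so that it is positive; the function names and the example numbers are ours.

```python
def fundamental_gap(neg_ip: float, neg_ea: float) -> float:
    """Fundamental gap = IP - EA, written in terms of -IP and -EA (eV)."""
    return neg_ea - neg_ip

def exciton_binding_energy(neg_ip: float, neg_ea: float, optical_gap: float) -> float:
    """Fundamental gap minus optical gap (eV)."""
    return fundamental_gap(neg_ip, neg_ea) - optical_gap

# Hypothetical polymer: -IP = -5.5 eV, -EA = -2.5 eV, optical gap 2.25 eV.
gap = fundamental_gap(-5.5, -2.5)
binding = exciton_binding_energy(-5.5, -2.5, 2.25)
print(gap, binding)  # 3.0 0.75
```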
In line with previous work, 18,42-45 we model polymer materials as long-chain oligomers, with the environment of an oligomer in the bulk polymer approximated in the xTB calculations by a dielectric continuum. In previous work we showed that such a model yields accurate −IP, −EA and optical gap values compared with experimental measurements derived from photoelectron spectroscopy 44 and UV-vis absorption spectra. 18,45

Training data generation
The generation of training data follows a tiered strategy, where a relatively small, diverse subset of copolymers is used to calibrate the accurate trends in properties given by a family of semi-empirical methods to the absolute values given by DFT. Within this family of semi-empirical, density functional tight-binding methods, GFN-xTB 37 is used for structural optimisation of the neutral polymers. For −IP/−EA calculations, we use IPEA-xTB, 38 a variant of GFN-xTB specially parameterised by Grimme and co-workers for the calculation of −IP and −EA values. For optical gaps, we employ the tight binding simplified Tamm-Dancoff approximation (sTDA) 36 applied to orbitals and orbital eigenvalues obtained from xTB (sTDA-xTB), 46 an approach capable of ultrafast computation of entire UV-vis absorption spectra. All GFN-xTB and IPEA-xTB calculations were performed using the xtb code, 47 while the sTDA results were obtained using the stda code. 48 All GFN-xTB and IPEA-xTB calculations, but not sTDA calculations, used the generalised Born surface area solvation model, with the default parameters for benzene distributed with the xtb code, so as to approximate the environment of a polymer chain in an amorphous polymeric solid. The xTB −IP, −EA and optical gap values are calibrated to those predicted by B3LYP 49-52 using a linear model and our previously published parameters for the low dielectric permittivity case. 35 Structures for the xTB calculations are generated in a 3-step approach.
Starting from a 2D simplied molecular-input lineentry system (SMILES) 53 representation of each monomeric unit, linear polymer structures were generated using the Supramolecular Toolkit (stk), 54,55 a Python library for the assembly, structure generation and property calculation of supramolecules, which takes base functionality from RDKit. stk allows for exible copolymer formation from arbitrary monomer units, control over monomer sequence within repeat units, and the automatic generation of different structural isomers where asymmetric monomer units (e.g. 2,5 linked pyridine) are concerned. In all cases, we restrict repeat units to two monomer units and the polymer chains to 8 monomer units in total, a length that we have previously shown to provide approximately converged optoelectronic properties. 44 Where asymmetric monomer units are concerned, we generate both possible ordered isomers. In a second step, a conformer search is performed using the stochastic Experimental-Torsion Distance Geometry with additional basic knowledge (ETKDG) 56 method, where we typically generate 500 conformers per polymer. The resulting conformers undergo a subsequent optimisation and energy ranking procedure using the Merck Molecular Force Field (MMFF) 57 as implemented in RDKit, 58 where the lowest energy conformer according to MMFF is selected for the xTB calculations.

Neural network training and evaluation
Although all xTB calculations are performed on long-chain oligomer models, we use trimers to generate molecular descriptors in the form of fixed-dimensional bit vectors using Extended-Connectivity Fingerprints (ECFP). These bit vectors are obtained directly from the 2D SMILES representations of each trimer using RDKit. Using trimers instead of the entire oligomer chain to obtain molecular fingerprints dramatically reduces the computational effort required for fingerprinting, while preserving all of the sub-structural information of the polymer. The use of 2D SMILES rather than representations of the 3D structures of the polymers is supported by the weak dependence of the optoelectronic properties of the polymer on the conformational degrees of freedom, 35,39 already alluded to in the introduction (see also Fig. S1†). Though we explored different bit lengths and fingerprint radii, it may be assumed that results were obtained using a 2048-bit, radius 2 fingerprint, unless otherwise stated. The neural network itself has two hidden layers of 128 neurons each, using rectified linear (ReLU) 59 activation functions throughout. To avoid overfitting, the neural network is regularised using dropout. 60 Each of the training hyper-parameters, the dropout fraction, as well as the neural network architecture, were obtained by 100 iterations of a random search across the hyper-parameter space (for details, see the ESI†). The network was trained to minimise the mean absolute error (MAE) of the predicted IP, EA and optical gap values using the Adam optimisation algorithm as implemented in TensorFlow. 61 The model was evaluated using a simple 50% train-test split of ~50 000 polymer structures for which the target properties are calculated. The fingerprinting, model construction, and model training can be reproduced using a freely available, easy-to-use Python interface. 62
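To make the architecture concrete, the sketch below writes out the forward pass of a network with the stated shape (2048 inputs, two hidden layers of 128 ReLU units, three outputs) together with dropout and the MAE loss, using NumPy only. It is not the TensorFlow model described above: training with Adam is omitted, the weight initialisation is our assumption, and random bit vectors stand in for real fingerprints and target values.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights (He-style scaling, an assumption) and zero biases."""
    weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)

# 2048-bit fingerprint -> 128 -> 128 -> 3 targets (-IP, -EA, optical gap).
layers = [init_layer(2048, 128), init_layer(128, 128), init_layer(128, 3)]

def forward(x, dropout=0.0, training=False):
    """Forward pass: ReLU hidden layers with inverted dropout, linear output."""
    h = x
    for w, b in layers[:-1]:
        h = np.maximum(h @ w + b, 0.0)
        if training and dropout > 0.0:
            keep = rng.random(h.shape) >= dropout
            h = h * keep / (1.0 - dropout)  # rescale so the expected value is unchanged
    w, b = layers[-1]
    return h @ w + b

def mean_absolute_error(pred, target):
    """MAE averaged over all samples and all three target properties."""
    return float(np.mean(np.abs(pred - target)))

# Random bit vectors and targets stand in for real fingerprints and xTB data.
fingerprints = rng.integers(0, 2, size=(4, 2048)).astype(float)
targets = rng.normal(size=(4, 3))
predictions = forward(fingerprints)
loss = mean_absolute_error(predictions, targets)
```

Dropout is only active when `training=True`; at prediction time the full network is used, which is why the inverted-dropout rescaling is applied during training rather than at inference.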

Model generation and performance
The nal model was obtained via a 'data enrichment' process, whereby predictions made for all polymers by the initial model were projected onto 2D property spaces (e.g. ÀIP vs. ÀEA). Areas towards the edge of these property projections with a low density of points (i.e. shallow ÀIP, deep ÀEA and low optical gap) were identied. Monomer units, which were statistically over-represented in these regions, were combined exhaustively with each other and the properties of the resulting copolymers calculated. A fraction (50%, approximately 900 additional examples) of the resulting data is then applied in re-training the neural network model. Here, this procedure is only conducted once, but it is conceivable that it could be performed over many iterations to generate more robust models from more limited training data. Fig. S2 † shows the effect of this data enrichment process. Generally, we see that points at the extrema of the property projection plots tend to be exaggerated (e.g. ÀEA values are under-estimated) prior to re-training.
The resulting neural network model performs very well across the entire range of properties and property values, with a root mean square error (RMSE) of less than 0.12 eV when predicting −IP, −EA and optical gap simultaneously (Fig. 2a and b). This represents a significant improvement in performance over previous attempts for polymers, 34 while covering a compositional space that is larger by several orders of magnitude. Comparing to a linear regression model obtained with an identical ECFP bit length and radius (Fig. 2c and d), we see that the neural network outperforms the linear model significantly for all properties (the linear regression model yields an RMSE of 0.30 eV overall). This comparison demonstrates that the neural network model captures some degree of non-linearity when mapping molecular substructures to optoelectronic properties. For high-throughput screening purposes, the neural network model accuracy is perhaps even greater than required, with absolute values as well as relative ordering of polymer properties adequately recovered. Further, high-throughput workflows, which rely on a cost-efficient method to screen very large numbers of structures, generally involve a stage after the large-scale screening in which a promising subset of systems is taken forward and treated at a more computationally intensive level of theory. In this case, however, it appears that this step could effectively be rendered unnecessary by the inherent model accuracy. Fig. S3† shows model performance when predicting differences in optical gap between isomers using different fingerprint bit lengths and radii. While we observe improvements in this quantity at longer bit lengths and radii, no significant improvement of the overall model performance is observed and, indeed, increasing these parameters may be detrimental to model generality. On the other hand, effects of monomer isomerism (in the case of asymmetric monomer units) are far better (albeit still roughly) captured at longer radii.
This is consistent with the idea that distinctions between repeat unit isomers can only be made effectively when considering larger molecular fragments. In the future, some form of feature engineering could potentially be used to account for monomer isomerism more explicitly.

Comparing the property space of homo and binary copolymers
The large and varied data set at our disposal means that we can empirically probe the optoelectronic property space of binary copolymers and how it differs from that of homopolymers. The optoelectronic property space is a 3D space spanned by vectors corresponding to a polymer's −IP, −EA and optical gap values. The fundamental gap is by definition equal to the difference between −IP and −EA and hence not a free parameter. Fig. S4† shows an image of this property space, showing that all polymers lie in an almost 2D plane embedded in the 3D space. The quasi-two-dimensional nature of the optoelectronic property space finds its origin in the fact that (i) in the limit of zero exciton binding energy, the optical gap would equal the fundamental gap and (ii) the predicted exciton binding energies (~0.5-2 eV), while large compared to classical inorganic semiconductors, are small relative to the fundamental gap (~2-6 eV, see Fig. S5†). Fig. 3a-c shows projections of the 3D optoelectronic property space on 2D surfaces spanned by (i) −IP and −EA, (ii) −IP and optical gap, and (iii) −EA and optical gap, respectively, where we have drawn convex hulls enclosing all homopolymers in each case. Comparing these homopolymer convex hulls with the plotted points for the copolymers, it appears that only a very small number of copolymers (likely to be statistically insignificant for a dataset of this size) lie outside of the property space spanned by homopolymers. The homopolymers also appear to sample the property space proportionally to the density of copolymers within a given subspace. This suggests that copolymerisation, at least in the case of ordered binary copolymers, does not allow access to additional regions of the optoelectronic property space not already sampled by the homopolymers. The density of points in the case of the copolymers is much larger though, conceptually allowing for more fine-grained property control.
Further, we would like to emphasise that these observations may not hold for other properties (e.g. charge-transport properties) and more complex copolymer repeat units (e.g. ternary and quaternary copolymers). Finally, we note that, even if the vast majority of copolymers lie inside the homopolymer convex hulls, this does not necessarily mean that the properties of a specific copolymer lie in between those of the two corresponding homopolymers, as we will discuss later. Fig. 3d-f shows kernel density estimates of the distributions of −IP, −EA and optical gap values for both the homopolymers and copolymers. Here we see that the copolymer property space spans a broad range of values, with significant numbers of materials present over a range of more than 4 eV for each property. It is clear that in all cases the copolymer distributions are more symmetrical than those of their homopolymer counterparts.
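The convex hull comparison described above amounts to building a 2D hull from the homopolymer points and testing each copolymer point against it. A minimal, dependency-free sketch is given below; the algorithm choice (Andrew's monotone chain) and all coordinates are ours and purely illustrative.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(point, hull, tol=1e-9):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= -tol for i in range(n))

# Invented (-IP, -EA) pairs in eV.
homopolymers = [(-6.5, -3.5), (-6.0, -1.5), (-5.0, -3.0), (-4.5, -1.0), (-5.5, -2.2)]
copolymers = [(-5.6, -2.4), (-5.2, -2.0), (-4.2, -3.4)]

hull = convex_hull(homopolymers)
outside = [p for p in copolymers if not inside_hull(p, hull)]
print(outside)  # [(-4.2, -3.4)]
```

Counting the entries of `outside` relative to the total number of copolymers gives the fraction lying beyond the homopolymer hull for a given 2D projection.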

Correlations between copolymer properties
The 2D projections in Fig. 3a-c show that there are weak correlations between the different properties. In the case of −IP and −EA, binary copolymers and homopolymers with deep −IP values are likely to also have deep −EA values and vice versa. In the case of the optical gap, binary copolymers and homopolymers with small(er) optical gaps are more likely to have shallower −IP values. Similarly, the same polymers are more likely to have deeper −EA values. It is unclear if these correlations are evidence of some deeper relationship or merely result from the fact that the fundamental gap values of the polymers span a range of around 4 eV. Regardless, as we study a large range of monomers, and therefore copolymers, it is apparent that certain property combinations might be difficult to achieve (e.g. copolymers that have both a shallow −EA value and a small optical gap; copolymers with a shallow −IP value and a large optical gap) due to the absence of copolymers in these regions of property space. As these regions are also not sampled by the homopolymers, this is simply the result of practically all binary copolymers lying within the homopolymer convex hull.

Emergence of copolymer properties and the donor-acceptor model
As briey mentioned in the introduction, models that explain the copolymer optoelectronic properties in terms of the donor and acceptor properties of the monomeric building blocks have been proposed in the literature. In the same vein, we compare the optoelectronic properties of copolymers to their homopolymer counterparts formed from the same building blocks. The reason for comparing with homopolymers rather than monomers is two-fold. Firstly, we do not have direct access to the optoelectronic properties of the isolated building blocks via the neural network. Secondly, the direct comparison of optoelectronic monomer and copolymer properties is inherently fraught by the conation of effects due to the electronic coupling between the different monomers and their polymerisation.
In the absence of a clear rst principles model for this relationship, we employ two simple empirical models which explore two different regimes (i) a "max/min" model in which the ÀIP and ÀEA of the copolymer are predicted by the least negative (shallowest) ÀIP value and the most negative (deepest) ÀEA value of the relevant homopolymer pair, and (ii) an "averaging" model in which the ÀIP and ÀEA values are approximated by the arithmetic mean of the ÀIP and ÀEA values of the homopolymer pair (Fig. 4a). Fig. 4b shows the performance of these models in terms of the ÀIP and ÀEA value of the copolymers. We observe that the averaging model performs well in terms of predicting the ÀIP and ÀEA values of the copolymers, with an RMSE of 0.16 eV overall. The max/min model performs less well (RMSE ¼ 0.38 eV), while appearing to estimate a lower (upper) boundary to the ÀEA (ÀIP) value of a copolymer, reecting the convex hull analysis in Fig. 2a-c. Additionally, we observe that the average model shows the largest deviation for copolymers where the difference between the ÀIP (or ÀEA) values of the homopolymer pair is large (see Fig. S6 †), with a general over-and underestimation of ÀEA and ÀIP, respectively. This is also consistent with the qualitatively curved contour lines shown in Fig. 4c, where, when the difference between ÀIP/ÀEA homopolymer values is large, the more positive ÀIP/more negative ÀEA homopolymer skews the resulting copolymer property further from a perfect average value. Conversely, where the difference between homopolymer values is small, the resulting copolymer properties are closer to the simple average value. Finally, as can be seen in Fig. S8 † use of the averaging model can also qualitatively reproduce the convex hull picture shown in Fig. 2a.
Overall, expressing copolymer properties as a simple average of 'parent' homopolymers appears to be an effective model for most polymers. In the literature, the case for copolymerisation is often based on the 'donor-acceptor' strategy, 19,21 where combining monomers with 'donor' and 'acceptor' qualities allows one to obtain copolymers with small(er) optical gaps. Here, we can use the large volume of data at our disposal to explore this concept and how it relates to the two empirical models discussed above. Indeed, the predictions made by the neural network identify some copolymers for which the optical gap is lower than that of the two corresponding homopolymers (Fig. 5c). Specifically, we observe that ~17 000 out of ~350 000 copolymers studied have an optical gap that is at least 0.12 eV (the overall RMSE of the neural network) lower than that of the homopolymers. As can be seen from Fig. 5c, such copolymers generally correspond to cases where the related homopolymers have significantly different −IP and/or −EA values, and almost exclusively to cases where the −IP and −EA values of the two homopolymers are staggered with respect to one another (Fig. 5a). Conversely, when the −IP and −EA values of one homopolymer straddle the other (Fig. 5b), no reduction in optical gap upon copolymerisation is predicted. Furthermore, the likelihood of reducing optical gap through copolymerisation appears to increase with the extent to which the −IP and −EA values are staggered (Fig. 5d), which we rationalise through the concomitant decreasing likelihood of this effect being countered by differences in the exciton binding energy between homopolymers and copolymers. Accounting for the overall RMSE of the neural network, we find that in our dataset ~100 000 out of the ~350 000 copolymers are staggered by at least 0.12 eV, ~17 000 of which display an optical gap reduction of at least 0.12 eV.
In contrast, the −IP and −EA values of the copolymers strictly lie in between those of the two corresponding homopolymers when accounting for the RMSE of the neural network model.
One can explain the above observations by noting that, while the averaging model predicts that the fundamental gap of copolymers always strictly lies in between that of both corresponding homopolymers, and while it is very successful for most copolymers considered, there are copolymers that deviate considerably from its predictions. Such copolymers, as discussed above, tend to correspond to cases where the difference between the −IP (and/or −EA) values of the homopolymer pair is large (see Fig. 4c and S5†). In these cases the fundamental gap tends towards that predicted by the max/min model. A combination of this with a staggered arrangement of the −IP and −EA values of the two homopolymers then gives rise to a fundamental gap that is smaller than that of either homopolymer (see ΔE max/min in Fig. 5a). As can be seen from Fig. 1a, this explanation translates directly to the case of the optical gap, as long as the exciton binding energies in the copolymers and homopolymers are not too different. As such, the requirement for a staggered arrangement maps on to the intuitive donor-acceptor picture used in the experimental literature, but stresses that these labels are only really meaningful when considering pairs of monomers and their properties relative to one another.
Overall, these observations and their explanation lend both context and understanding to the donor-acceptor strategy proposed in the literature. With knowledge of the optoelectronic properties of homopolymers alone, we can provide a simple heuristic to predict promising combinations of monomers, which are likely to result in low optical gap materials. Specifically, for optical gap reduction to likely occur, not only should the −IP and −EA values of the two corresponding homopolymers be significantly different, but they should also be staggered, along the lines of Fig. 5a. This is strongly illustrated by Fig. 5d, which shows that for staggered cases with large −IP and −EA differences optical gap reduction is highly likely, while for straddled cases the odds of optical gap reduction are effectively zero. The same observation would also suggest that a likely side effect of reducing the optical gap is that the −IP and −EA values of the resulting copolymers will lie closer to those predicted by the max/min model than its averaging counterpart. As a result, such copolymers will likely combine relatively shallow −IP and deep −EA values, reducing their potential applicability in domains such as photocatalysis, where the alignment of the polymer potentials relative to those of other materials or solution half-reactions is crucial.
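The heuristic above can be written as a small classifier over homopolymer pairs. In this sketch each homopolymer is a (−IP, −EA) tuple in eV, the extent of staggering is taken as the smaller of the two absolute differences, and the 0.12 eV threshold is the overall RMSE of the neural network; the function names, the handling of exactly equal levels and the example values are our assumptions.

```python
def alignment(homo_a, homo_b):
    """Classify a homopolymer pair given (-IP, -EA) tuples in eV.

    'staggered': one polymer has both the shallower -IP and the shallower -EA.
    'straddled': the levels of one polymer enclose those of the other
                 (pairs with an identical level are also placed here).
    """
    d_ip = homo_a[0] - homo_b[0]
    d_ea = homo_a[1] - homo_b[1]
    return "staggered" if d_ip * d_ea > 0 else "straddled"

def stagger_extent(homo_a, homo_b):
    """Smaller of the absolute -IP and -EA differences (eV)."""
    return min(abs(homo_a[0] - homo_b[0]), abs(homo_a[1] - homo_b[1]))

def gap_reduction_candidate(homo_a, homo_b, threshold=0.12):
    """Heuristic flag: staggered levels offset by at least `threshold` eV."""
    return (alignment(homo_a, homo_b) == "staggered"
            and stagger_extent(homo_a, homo_b) >= threshold)

# Invented (-IP, -EA) values in eV.
donor = (-5.0, -2.0)
acceptor = (-6.0, -3.5)
wide_gap = (-6.5, -1.5)

print(gap_reduction_candidate(donor, acceptor))  # True
print(gap_reduction_candidate(donor, wide_gap))  # False
```

A positive flag indicates only that gap reduction is more likely, since in the dataset roughly one in six sufficiently staggered pairs shows a reduction of at least 0.12 eV.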

Monomer topography of the property space
Fig. 5 (caption, panels c-e): (c) Plot of whether a copolymer optical gap is less than (red) or greater than (blue) that of both related homopolymers, as a function of the difference between −IP and −EA homopolymer values. Quadrants related to 'staggered' and 'straddled' energy levels are highlighted. (d) Fraction of copolymers within the staggered (red) and straddled (black) arrangements for which the observed optical gap is at least 0.12 eV lower than that of both related homopolymers as a function of the smallest of the differences between the IP and EA values of the related homopolymers. (e) Cumulative histogram of copolymers for which the optical gap/fundamental gap is less than that of both related homopolymers. Dashed line indicates overall RMSE of neural network model.

Aside from the general exploration of copolymer property space and the testing of models able to describe it, high-throughput calculations have the potential to guide synthetic efforts towards promising materials with properties amenable to certain applications. In the context of copolymers, this could mean either the identification of specific copolymer compositions or (perhaps more interestingly from synthetic accessibility and material morphology standpoints) monomers (i.e. dibromo compounds or diboronic acids/acid esters) which target a particular region of property space. To illustrate this, we give examples of the most prevalent co-monomers in different regions of the property space (Fig. 6). From this analysis, we see the emergence of some common motifs found in, for example, the organic photovoltaics literature (namely, diketopyrrolopyrrole and benzothiadiazole), where smaller optical gaps are sought after to absorb more of the solar spectrum.
Similarly, monomers that give rise to materials with deep −IP and not too deep −EA values, which are potentially attractive for water splitting due to their large driving force for both proton reduction and water oxidation, contain electron-withdrawing substituents like -F and -NO2 (1,3-linked tetrafluorophenylene and 1,3-linked nitropyrazole). Additionally, these same monomers illustrate the idea that, due to the quasi-two-dimensional nature of the optoelectronic property space, choosing monomers that place −IP and −EA within a desired range also fixes the possible optical gap values to within the domain of possible exciton binding energy values. Finally, Fig. 6 also suggests that, for applications in which ohmic contacts between the polymer and an electrode are important for barrierless charge injection or collection, e.g. organic photovoltaics and organic light-emitting diodes, the properties of the copolymer relative to an electrode can be anchored to a particular value range by copolymerisation with suitably chosen monomers.

Conclusions
We have demonstrated that machine learning techniques (here, neural networks) can be used to resolve the optoelectronic property landscape of conjugated organic copolymers with very diverse monomer compositions. The neural network training is facilitated by the availability of large amounts of accurate, low-noise data derived from a tiered strategy based on calibrated density functional tight binding calculations, which display an accuracy on par with density functional theory. The property space generated by the neural network allows for the data-driven testing of simple models that link the properties of the constituent monomers of a copolymer to the properties of the copolymer itself. We observe that copolymerisation to make binary copolymers does not appear to allow access to regions of the optoelectronic property space not already sampled by the homopolymers, while allowing for more fine-grained property control. The large dataset at our disposal also facilitates the testing of common synthetic strategies such as using 'donor' and 'acceptor' monomers to construct low-optical-gap materials. Generally, despite the prevalence of this concept in the literature, we observe that a reduction of the optical gap below that of both related homopolymers is relatively rare. We predict that for a copolymer to have a significantly smaller optical gap than its related homopolymers, the potentials of these should be substantially offset and arranged in a staggered fashion. From here, one can imagine an application-specific, optimal balance between the absolute values of the homopolymer potentials themselves and the extent to which they are staggered relative to one another that achieves ideal copolymer light absorption and redox properties. Additionally, we demonstrate that high-throughput methods could be used to identify promising monomers which target specific regions of property space.

Conflicts of interest
There are no conicts to declare.