Using High-Throughput Virtual Screening to Explore the Optoelectronic Property Space of Organic Dyes; Finding Diketopyrrolopyrrole Dyes for Dye-Sensitized Watersplitting and Solar Cells

Organic dyes based on conjugated chromophores such as diketopyrrolopyrrole (DPP) have a large range of uses beyond providing colour to other materials, such as in dye-sensitized solar cells, dye-sensitized photoelectrochemical cells, dye-sensitized colloidal photocatalysts and organic photovoltaics. We perform a high-throughput virtual screening using the xTB family of density functional tight-binding methods to map the optoelectronic property space of ~45,000 DPP dyes. The large volume of data at our disposal allows us to probe the difference between symmetric and asymmetric dyes and to identify the apparent boundaries of the optoelectronic property space for these dyes, as well as which substituents give access to particular combinations of properties. Finally, we use our dataset to screen for DPP dyes that can drive the reduction of protons to molecular hydrogen when illuminated as part of dye-sensitized photoelectrochemical cells or dye-sensitized colloidal photocatalysts, or as dyes for TiO2-based dye-sensitized solar cells.


Introduction
The use of dyes and pigments is inherently linked to their ability to absorb light. For example, the di(p-chlorophenyl) derivative of diketopyrrolopyrrole (DPP) absorbs most of the visible spectrum, reflecting back only red light. As a result, this DPP derivative 1,2 is a red pigment, world famous as Ferrari red or pigment red 254. Similarly, the chemically related indigo molecule absorbs green to red light, reflecting back only blue light, and is the dye that colours jeans blue. 3 However, the use of organic dyes and pigments is not limited to simply providing colour; they also find application in dye-sensitized solar cells (DSSCs), 4-20 dye-sensitized photocatalysts 21-25 and in organic electronics, for example in organic field-effect transistors (OFETs), 2,26,27 organic photovoltaics (OPV) 28 and materials for singlet fission. 29-32 In DSSCs, the ability of a dye to absorb light of a particular range of wavelengths and inject photoexcited electrons or holes into a semiconductor that would otherwise be transparent to that light ultimately leads to the generation of an electrical current. Similarly, the dye sensitizer in a dye-sensitized photocatalyst or photoelectrochemical cell (DS-PEC) absorbs light and initiates an electron transfer process, ultimately resulting in the splitting of water into molecular hydrogen and oxygen.
Organic dyes and pigments, which we will refer to simply as "dyes" hereafter, generally consist of a conjugated core, the chromophore after which dyes are typically named, which is augmented with substituents. The function of these substituents can be to improve the adsorption of the dye on a substrate (e.g. carboxylic or phosphonic acid groups), to improve the solubility (e.g. long alkyl chains to improve the solubility in organic solvents), or to modify the optoelectronic properties of the chromophore, e.g. electron withdrawing/donating groups or aromatic substituents, which extend the conjugation.
Understanding the suitability of dyes for particular applications such as DS-PECs, DSSCs or organic photovoltaics requires knowledge of, among other things, the dye's optical gap (ΔO), ionization potential (IP) and electron affinity (EA) (see Fig. 1(a)). The optical gap corresponds to the maximum wavelength (or minimum energy) of light able to excite a molecule and generate an exciton, an excited electron-hole pair. The IP and EA are the energy required to extract an electron from a molecule and the energy released when an electron is added, respectively. Finally, the difference between IP and EA is the fundamental gap (Δf), the energy required to generate a non-interacting pair of an excited electron and hole, which is larger than the optical gap by an amount that equals the exciton binding energy.
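The relationships between these quantities can be summarised compactly. The expressions below simply restate the definitions given in the text; the symbol E_EBE for the exciton binding energy is our shorthand here and is an assumption of notation rather than something fixed by the definitions above.

```latex
\Delta_f = \mathrm{IP} - \mathrm{EA}, \qquad
E_{\mathrm{EBE}} = \Delta_f - \Delta_O \geq 0
```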

Figure 1: (a) Definition of -IP, -EA, ΔO, Δf and the exciton binding energy (EBE); (b) composition of the different dyes studied in terms of their building blocks, with an example ABCBA dye; and (c) dye building blocks.
The number of possible dyes to consider increases rapidly with the number of substituents and/or substituent sites. Recently, for example, Fallon and co-workers studied nearly 10,000 indigo dyes with (time-dependent) density functional theory, (TD-)DFT, for their application in singlet fission. 32 However, even with a relatively small library of substituents, allowing either oligomeric side-chains (e.g. ABCBA, where C is the core of the dye and A and B are substituents) or asymmetric substitution (e.g. ACA', with two different A units, as in D-π-A dyes) makes it possible to come up with even larger families of dyes. Screening 10,000+ dyes experimentally would be practically impossible. Studying such families of dyes computationally using (TD-)DFT would, at least currently, be an equally laborious task, even if this is now routinely done for smaller libraries of dyes. 15,17 Recently, however, we showed that workflows based on the computationally efficient density functional tight-binding method GFN-xTB, developed by Grimme and co-workers, 33 and its IPEA-xTB 34 and sTDA-xTB 35 extensions, allow one to screen the optoelectronic properties of much larger libraries of co-polymers 36 and small aromatic molecules than currently feasible with (TD-)DFT, and with comparable accuracy.
Here we use the same approach and apply it to different families of DPP-based dyes with the sequences ABCBA, ACA and ACA'. DPP and its derivatives are a well-known class of semiconducting molecules with a wide range of applications besides that of a pigment. DPP derivatives have been used as dyes for DSSCs, 18,8,11 DS-PEC 25 and OPV, 37 as well as in OFETs. 2,28 The attractive features of DPP include good thermal stability and photostability, as well as the fact that it can be easily functionalised to improve solubility or to provide an anchoring group to allow for adsorption of the dye on a substrate, such as a TiO2 photoanode in a DSSC. We use our screening to understand the general features of the optoelectronic property space of DPP-based dyes, as well as predict the common molecular elements for DPP-based dyes that have the right combination of properties for dye-sensitized photocatalytic proton reduction to molecular hydrogen.

Screening workflow
Three families of dyes with a diketopyrrolopyrrole core (C) and compositions ABCBA (32041 dyes), ACA' (15931 dyes) and ACA (179 dyes), in which the A and B units were taken from the same library of 179 common (heterocyclic) aromatic building blocks (see Figs. 1b-c and Fig. S1), were explored computationally. Using a Python pipeline, all possible combinations within these specific sequences were constructed using stk, 38,39 a Python library that builds on functionality from RDKit, 40 an open-source cheminformatics toolkit. For each structure, 30 conformers were generated using the Experimental-Torsion Distance Geometry with additional basic knowledge (ETKDG) algorithm. 41 The lowest energy conformer for each structure was then determined using the Merck Molecular Force Field (MMFF) 42 as implemented in RDKit.
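The sizes of the three families follow directly from the size of the building block library. The sketch below is not the stk-based pipeline used in this work; it only illustrates the combinatorics with the Python standard library. It assumes that A and B are chosen independently for ABCBA dyes and that A-C-A' and A'-C-A are the same molecule, so that ACA' dyes correspond to unordered pairs. The building block labels are hypothetical placeholders.

```python
from itertools import combinations, product


def enumerate_sequences(blocks):
    """Return the ABCBA, ACA' and ACA combinations for a building block library."""
    # ABCBA: A and B are chosen independently (A may equal B) and mirrored about the core C.
    abcba = list(product(blocks, repeat=2))
    # ACA': two different end groups; unordered pairs, assuming A-C-A' and A'-C-A coincide.
    aca_prime = list(combinations(blocks, 2))
    # ACA: the same building block on both sides of the core.
    aca = [(a, a) for a in blocks]
    return abcba, aca_prime, aca


# Hypothetical labels standing in for the 179 building blocks of the library.
blocks = [f"bb{i:03d}" for i in range(179)]
abcba, aca_prime, aca = enumerate_sequences(blocks)
print(len(abcba), len(aca_prime), len(aca))  # 32041 15931 179
```

Under these assumptions the counts reproduce the family sizes quoted above (179², 179·178/2 and 179).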
Subsequently, the optoelectronic properties of each lowest energy conformer were calculated. The ionisation potential (IP) and electron affinity (EA) of each molecule were predicted using the (GFN/IPEA)-xTB 35,34,43 family of semiempirical tight-binding methods using the xtb code. 44 The optical gap was approximated as the lowest energy singlet excitation and calculated both using the sTDA-xTB method as implemented in the sTDA code 45 and long-range corrected tight-binding DFT (LRC-TD-TB-DFT) as implemented in the DFTBaby code. 46 In the latter case, we limited ourselves to molecules containing only carbon, hydrogen, oxygen and nitrogen (CHON) atoms. A generalised Born solvation model is available within GFN-xTB and was therefore used for the optimisation and IP/EA calculations. As there is no solvation model implemented for sTDA or LRC-TD-TB-DFT, the optical gap calculations were performed in the absence of a solvation model.

Validation and calibration of (GFN/IPEA/sTDA)-xTB/LRC-TD-TB-DFT results to (TD-)DFT
For the validation of the xTB/LRC-TD-TB-DFT calculated properties of the different dyes, the properties of sub-sets of each dye family were also calculated using ΔDFT (for IP and EA) and TD-DFT (for the optical gap). Our functional/basis set choices were B3LYP/6-311G** 47-50 for the ground state geometry and ΔDFT calculations and ωB97X/6-311+G** 51 for the TD-DFT calculations. We use a range-separated functional for the TD-DFT calculations because the donor-acceptor character of some of the dyes means that those dyes are likely to have low-lying charge-transfer excited states. Such excited states are spuriously stabilised 52 by non-range-separated functionals, which would mean that the optical gap of dyes with low-lying charge-transfer states would be underestimated. Moreover, sTDA-xTB and LRC-TD-TB-DFT both include range-separated exchange. The balanced treatment of local and charge-transfer states by a range-separated functional, such as ωB97X, does come at the expense of a slight blue-shift 53 of all predicted optical gaps and means that we cannot reliably calculate exciton binding energy values. As part of the ΔDFT calculations we optimise the anionic and cationic versions of the dyes, and as such the ΔDFT IP and EA values correspond to adiabatic potentials. This is in contrast to their xTB counterparts, which are technically vertical potentials, though they have previously been argued to be good predictors of adiabatic values. 34 All (TD-)DFT calculations were run using Gaussian 16. 54 Finally, we used a subset of the results of these (TD-)DFT calculations to fit linear calibration models that translate the xTB/LRC-TD-TB-DFT values to the DFT scale, 36,55,56 and the remainder to validate the fitted models.

Validation and calibration of the high-throughput approach
We calibrated the performance of (sTDA/IPEA-)xTB and LRC-TD-TB-DFT relative to (TD-)DFT for dyes in water by performing ΔDFT and TD-DFT calculations on a sub-set of 105 dyes, chosen so as to sample the whole range of values that each property can take. We then fitted a linear model that maps the semiempirical results to an approximate DFT scale. The sub-set of dyes included ABCBA, ACA' and ACA dyes. We also used (TD-)DFT to calculate the -IP, -EA and optical gap of the building blocks and performed a similar calibration for the combination of the sub-set of dyes and the library of all the building blocks. Table 1 gives the coefficient of determination (r²) and mean absolute error (MAE) values for the different fits, and Table S1 in the supplementary information gives the fit coefficients.
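As an illustration of this kind of calibration, the sketch below fits a straight line by ordinary least squares and reports r² and MAE. It is a minimal sketch rather than the fitting code used in this work, and the numerical values are invented purely to exercise the functions; they are not data from this study.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r2_and_mae(x, y, slope, intercept):
    """Coefficient of determination and mean absolute error of the fitted line."""
    predicted = [slope * xi + intercept for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    mae = sum(abs(yi - pi) for yi, pi in zip(y, predicted)) / len(y)
    return 1.0 - ss_res / ss_tot, mae


# Invented example values in eV: semiempirical -IP (x) and reference -IP (y).
xtb_values = [-6.0, -5.5, -5.0, -4.5]
dft_values = [0.9 * v - 0.3 for v in xtb_values]  # exactly linear by construction

slope, intercept = fit_linear(xtb_values, dft_values)
r2, mae = r2_and_mae(xtb_values, dft_values, slope, intercept)
calibrated = slope * (-5.2) + intercept  # map a new semiempirical value to the reference scale
```

In practice the fit would be made on the calibration sub-set and the r² and MAE evaluated separately on dyes held out of the fit.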
From the data in Table 1, it is clear that for -IP and -EA there is generally a good correlation between the values predicted by IPEA-xTB and DFT. For the optical gap the correlation is poor, but it improves significantly when excluding dyes containing sulfur (see Figs. S4 and S6 in the supplementary information). Use of sTDA-xTB for some sulfur-containing dyes appears problematic and results in outliers, something we have not previously observed in the case of polymers 36,55 and small molecules. 56 Hence, in the remainder of the manuscript, in the case of optical gap value prediction, we only consider dyes that lack sulfur and only contain carbon, hydrogen, oxygen and/or nitrogen (CHON). The data in Table 1 also show that in most cases the correlation is better, with r² closer to 1, when considering dyes and building blocks in combination rather than the dyes alone. However, the combined calibration also results in generally larger MAE values. Both effects are probably the result of the much larger range of values sampled for each of the properties when including building blocks. This raises the question of which calibration is most useful for the purpose of this study. As we are most interested in making good quality predictions for the dyes, we considered a sub-set of 15 dyes not included in the calibration. This sub-set included ABCBA, ACA' and ACA dyes, as in the calibration stage. The results of this validation can be found in Fig. 3 and Table S2. It is clear that the best results for the dyes were obtained when using the calibration fitted to the dyes only, and we will use that calibration in the remainder of the paper. Finally, we also performed a similar calibration for the dyes in benzene, the results of which can be found in Tables S3 and S4 of the supplementary information.
Figs. S7-S10 in the supplementary information show the same information for DPP dyes dissolved in water and benzene before calibration, and Figs. S11 and S12 those for benzene after calibration.
For reasons discussed above, plots involving optical gap data only show points for the subset of dyes containing only CHON (14641 ABCBA, 7260 ACA' and 121 ACA dyes). The optoelectronic property spaces of the different families of DPP dyes dissolved in either water or benzene look qualitatively similar, though there are quantitative differences.
From Figs. 4 and 5 it is evident that the different dye families occupy the same region of property space, but that the ABCBA dyes allow for deeper -EA values, shallower -IP values and lower ΔO values than the ACA' and ACA dyes. The ACA' and ACA dyes are very similar in terms of the -IP and -EA values they allow for, but the minimum optical gap of ACA' dyes is ~0.5 eV smaller than for their symmetric ACA counterparts. It is also pertinent to point out the DPP core lies in a different part of property space than that spanned by the different dye families.
In terms of -IP and -EA, the -IP and -EA values of the DPP core each lie within the distribution of -IP and -EA values of the dyes, but the combination of the -IP and -EA values of DPP lies considerably outside the -IP/-EA convex hulls of the dyes. The optical gap of DPP even lies outside the distribution of optical gap values of the dyes. This is to be expected though, as extending the π-system by adding conjugated side-chains should always result in a lowering of the optical gap compared to the non-substituted core.
In terms of the correlation of the different dye properties, the 2D histograms shown in Fig. 4 reveal in general a weak correlation between the investigated properties. The -IP and -EA values of the dyes are most strongly correlated: in general, dyes with a deeper -IP also have a deeper -EA, and vice versa.
No particular trend is observed between -IP and ΔO. However, the plot of ΔO vs. -EA suggests that dyes with shallower -EA values are more likely to have larger optical gaps than dyes with deep -EA values.

Relationships between the properties of the dyes and those of their building blocks
Intuitively one would assume that the properties of a given dye would be correlated with those of its constituent A and B units. In the absence of first-principles insight into the form that such a relationship would take, we explore two empirical models based on our earlier work on co-polymers, 55 an averaging model and a max/min model.
The dyes thus clearly inherit their electronic character from their constituent components and it appears that the electronic properties of the A and B units dominate that of the DPP core. This dominance probably arises from the fact that the -IP and -EA values of the DPP core lie near the average of the distributions of -IP and -EA values of the library from which A and B are taken (see Fig. S13).
The correlation between the -IP and -EA values of A and B and those of the dye also means that one can make informed guesses about the electronic properties of DPP dyes without doing calculations on the dyes themselves, something which might be interesting in the case of A or B units that are not in our library. Finally, while we only consider DPP dyes here, we believe based on these results that similar correlations should exist for dyes with other cores.
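To make the two empirical models concrete, the sketch below gives one plausible reading of an averaging model and a max/min model. The functional forms are assumptions made for illustration: the averaging model is taken to be an unweighted mean, and the max/min model is taken to pick the shallowest (most positive) -IP and the deepest (most negative) -EA among the inputs. The input values are invented.

```python
from statistics import mean


def averaging_model(levels):
    """Unweighted mean of the (-IP, -EA) pairs (assumed form), in eV."""
    return mean(neg_ip for neg_ip, _ in levels), mean(neg_ea for _, neg_ea in levels)


def max_min_model(levels):
    """Shallowest -IP and deepest -EA among the inputs (assumed form), in eV."""
    return max(neg_ip for neg_ip, _ in levels), min(neg_ea for _, neg_ea in levels)


# Invented (-IP, -EA) values in eV for two hypothetical building blocks A and B.
block_a = (-5.0, -2.0)
block_b = (-6.0, -3.0)

avg_ip, avg_ea = averaging_model([block_a, block_b])
mm_ip, mm_ea = max_min_model([block_a, block_b])
```

Either model output would then serve as a descriptor to correlate against the calculated -IP and -EA of the dye, rather than as a direct prediction of those values.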

Relationships between the properties of ABCBA and ACA' dyes and those of their corresponding AACAA and ACA dyes
The two models discussed above can similarly be used to describe the relationship between the -IP and -EA of the ABCBA and ACA' dyes and those of the corresponding dyes with the same building block in the A, B and A' positions, i.e. the corresponding AACAA/BBCBB and ACA/A'CA' dyes. Indeed, that is exactly what we did in our previous work 36 on co-polymers, where we related the properties of binary co-polymers to those of their respective homopolymers. On the right of Fig. 6, the results of the averaging and max/min models for the ABCBA dyes are shown, and on the right of Fig. 7 the results of the same models for the ACA' set of dyes.
The improved correlations observed here when using the properties of the AACAA and ACA dyes as input, rather than those of the building blocks, most likely arise simply because the AACAA and ACA dyes are more similar electronically to the ABCBA and ACA' dyes than the building blocks are. Again, even though we only study DPP dyes here, we believe that similar correlations exist for dyes with other cores.

Effect of the asymmetric nature of ACA' dyes
We also explored, in the case of ACA' dyes, to what extent their asymmetric nature results in a reduction of the optical and/or fundamental gap with respect to their symmetric ACA and A'CA' counterparts. Fig. 8 shows a cumulative histogram of the optical and fundamental gap of the ACA' dyes as a function of the difference between the optical/fundamental gap of the ACA' dye and that of their ACA or A'CA' counterpart with the smallest optical/fundamental gap. As can be seen from Fig. 8, a small but significant number of ACA' dyes have a fundamental and/or optical gap that is smaller than that of their ACA and A'CA' counterparts. Inspection of such dyes shows that they generally combine electron-poor and electron-rich building blocks and hence are clear examples of D-π-A dyes (see Fig. S14).
One possible way to analyse whether ACA' dyes are likely to have D-π-A character is to consider the relative positions of the -IP and -EA values of their ACA and A'CA' counterparts, akin to what we previously considered in the case of binary co-polymers. 36 The hypothesis would be that when the -IP and -EA values of the ACA dye straddle the -IP and -EA values of the A'CA' dye, or the other way around, the corresponding ACA' dye would have D-π-D or A-π-A character, and that only when the -IP and -EA values of the ACA dye are staggered with respect to the -IP and -EA values of the A'CA' dye does the equivalent ACA' dye have D-π-A character. In line with this hypothesis, in the first scenario the max/min model using the ACA/A'CA' -IP and -EA values as input would predict that the ACA' dye inherits both its -IP and -EA values from the same ACA/A'CA' counterpart, the dye being straddled. In the second scenario, the max/min model would predict that the ACA' dye inherits its -IP value from one of its ACA/A'CA' counterparts and its -EA value from the other.
This alignment of the -IP and -EA values of the ACA and A'CA' dyes is an excellent predictor of whether the fundamental gap of an ACA' dye is smaller than that of its ACA/A'CA' counterparts. It is a less good predictor in the case of the optical gap, probably because there are other effects at play. One such effect could be that dyes differ in their exciton binding energy, the difference between the optical and fundamental gap and a measure of how strongly the excited electron and hole interact electrostatically in the lowest excited state. Unfortunately, because during calibration we had to use, for the reasons outlined in the methods section, a different DFT functional for the optical gap than for -IP and -EA, we cannot reliably estimate exciton binding energy values for these dyes using our approach.
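The straddled/staggered distinction can be expressed as a simple test on the -IP and -EA values of the two symmetric dyes. The sketch below is illustrative only: it assumes the max/min model takes the shallowest -IP and the deepest -EA of the two symmetric counterparts, treats ties as straddled, and uses invented values.

```python
def alignment(sym_a, sym_b):
    """Classify two symmetric dyes, each given as (-IP, -EA) in eV."""
    ip_a, ea_a = sym_a
    ip_b, ea_b = sym_b
    # One dye straddles the other when its -IP is deeper and its -EA is shallower.
    a_straddles_b = ip_a <= ip_b and ea_a >= ea_b
    b_straddles_a = ip_b <= ip_a and ea_b >= ea_a
    return "straddled" if (a_straddles_b or b_straddles_a) else "staggered"


def predicted_gap_reduction(sym_a, sym_b):
    """Reduction in fundamental gap of the mixed dye predicted by the assumed max/min model."""
    neg_ip = max(sym_a[0], sym_b[0])   # shallowest -IP
    neg_ea = min(sym_a[1], sym_b[1])   # deepest -EA
    mixed_gap = neg_ea - neg_ip        # fundamental gap = IP - EA = (-EA) - (-IP)
    smallest_symmetric_gap = min(sym_a[1] - sym_a[0], sym_b[1] - sym_b[0])
    return smallest_symmetric_gap - mixed_gap


# Invented (-IP, -EA) values in eV for hypothetical ACA and A'CA' dyes.
staggered_pair = ((-5.0, -2.5), (-6.0, -3.5))
straddled_pair = ((-6.0, -2.5), (-5.5, -3.0))

kind_1 = alignment(*staggered_pair)
kind_2 = alignment(*straddled_pair)
reduction_1 = predicted_gap_reduction(*staggered_pair)
reduction_2 = predicted_gap_reduction(*straddled_pair)
```

With these assumptions, a staggered pair yields a positive predicted gap reduction (D-π-A-like behaviour), whereas a straddled pair yields none because both levels come from the same symmetric dye.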

Dye candidates for dye-sensitized proton reduction and dye-sensitized solar cells
For the application of DPP dyes in combination with TiO2 nanoparticles in dye-sensitized photocatalysis, certain optical and electronic criteria must be fulfilled on top of the obvious requirement that the dye is stable in water under illumination: (a) the IP* of the dye, the potential associated with the ability of the excited dye to donate an electron, should be more negative than the conduction band minimum of TiO2 in order to inject electrons into the TiO2 nanoparticle; (b) the IP of the dye should be more positive than the oxidation potential of water or of the sacrificial electron donor (e.g. ascorbic acid or triethanolamine); and (c) the optical gap of the dye should be small enough to absorb as much visible light as possible, and preferably to absorb in the near-infrared (NIR) region. Conditions (a) and (b) assume that the injection of an electron into the TiO2 by the exciton is faster than the extraction of a hole from the dye by the sacrificial electron donor or water. If that is not true, IP* in condition (a) would be replaced by EA and IP in condition (b) by EA* (the potential associated with the ability of the excited dye to donate a hole). The difference between IP* and EA, and that between EA* and IP, is the exciton binding energy, which is likely to be small in the presence of a high dielectric permittivity solvent like water. Therefore, in the following we consider EA in condition (a) and IP in condition (b), which approximates IP* by EA and EA* by IP and allows us to remain agnostic about the relative kinetics of electron donation and electron acceptance.
The conduction band of TiO2 lies at ~ -0.40 V 57 with respect to SHE (and thus ~ -4 V with respect to vacuum) and the reduction potential associated with the one-hole oxidation of ascorbic acid at ~0.7 V at pH 2.5, 58,59 the likely pH of an ascorbic acid solution. It is also necessary for the dye to absorb visible light, and the optical gap must therefore be smaller than 2.95 eV. Oxidising water, rather than a sacrificial electron donor such as ascorbic acid or triethanolamine, is thermodynamically much harder, requiring an IP at pH 7 larger than 1.12 V when including an additional driving force of 0.30 V (0.82 V without), meaning that even fewer dyes can drive this half-reaction.
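The conversion between the electrochemical (SHE) scale and the vacuum scale used for -IP and -EA can be sketched as follows. The absolute potential of the SHE is taken here as 4.44 V, which is an assumption of this sketch; the exact offset used in this work is not stated in the text above, and the rounded values quoted elsewhere in the manuscript may differ slightly.

```python
SHE_ABSOLUTE_POTENTIAL_V = 4.44  # assumed absolute SHE potential; not taken from the text


def she_to_vacuum_ev(potential_vs_she):
    """Convert a potential in V vs. SHE to an energy level in eV relative to vacuum."""
    return -(potential_vs_she + SHE_ABSOLUTE_POTENTIAL_V)


tio2_band_edge = she_to_vacuum_ev(-0.40)      # approximately -4.04 eV
ascorbic_acid_level = she_to_vacuum_ev(0.70)  # approximately -5.14 eV
```

A more positive potential on the SHE scale therefore corresponds to a deeper (more negative) level on the vacuum scale.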
The ABCBA and ACA' dyes discussed above also make good candidates for DSSCs, for three reasons. First, the dielectric permittivity of solvents used in DSSCs, such as acetonitrile, is not that dissimilar to that of water when taking into account the inverse dependence of -IP and -EA on the dielectric permittivity of the solvent. 60 Second, the standard reduction potential of redox couples such as I3-/3I- (+0.55 V vs. SHE) is not that dissimilar to that of ascorbic acid at pH 2.5. Third, in TiO2-based DSSCs the dye also needs to inject an electron into TiO2.
It is worth mentioning that, in addition to the properties we discuss, a dye needs an anchoring group through which to adsorb onto the semiconductor surface. 61 While appropriate anchoring groups such as phosphonic acid or carboxylic acid have not been included in our screening, they are typically electronically benign with respect to the electronic and optical properties of the dyes themselves, and would not pose a problem in terms of our analysis. A more advanced screening, which would also consider the kinetics of electron injection into the semiconductor by the dye, would however probably require including building blocks with explicit anchoring groups.

Structure-property relationships
Identification of the dominant building blocks in particular regions of chemical space can give an idea of what combinations of chemical moieties lead to certain dye properties. Having gridded up the plot of the -IP/-EA slice of property space of the ABCBA dyes (see Fig. 9) and determined which building blocks occur most often on the A and B sites, we focus below on a few areas of interest: the extremities of the plot (the top right and bottom left corners), the middle sections, and the region that is relevant to the application of dyes in water splitting.
Top right corner
The top right section of the grid was taken to lie within the boundaries of -IP more positive than or equal to -4.8 eV and -EA more positive than or equal to -3.3 eV. This region contains 1529 dyes. The fused-pyrrole building block shown in Fig. 9a (see also Table 3) was the most common building block on both the A and B site in this region, occurring 73 times on the A site and 162 times on the B site. As there are 179 building blocks in the library, this means that 41% and 91%, respectively, of the dyes with this building block on the A or B site lie in this region. The implication is that the building block on the B site has greater control over the dye's overall properties. This observation is even more apparent upon inspection of the convex hulls in Fig. 10 and Figs. S15-S20 for building blocks on the A site (yellow) and on the B site (blue), in which the B convex hull is smaller, with the points more localised in the top right corner. This feature is even more pronounced in other regions of the -IP/-EA slice of property space. The very electron-rich nature of the building block itself is in line with the shallow -IP and -EA of the resulting dyes.
Bottom left corner
The limits for this region were -IP more negative than or equal to -6 eV and -EA more negative than or equal to -4.3 eV. In this region the most dominant building blocks on the A and B sites differ, as can be seen from Table 3. The most common A building block is shown in Fig. 9b and occurs 10 times on the A site and 33 times on the B site. The building block shown in Fig. 9c occurs most often on the B site in this region. Again, the latter occurred more often on the B site (34, 19%) than on the A site (8, 4%). The lower percentages of dyes with these building blocks occurring in this region, relative to the top right corner, are in part the result of the fact that the bottom left corner region contains only 129 dyes in total. The very electron-poor nature of the building blocks themselves is in line with the deep -IP and -EA of the resulting dyes.
Middle region
For this region, where -EA lies between -3.3 and -4.3 eV, two slices of the -IP range were taken, -5.3 to -4.8 eV and -6.0 to -5.3 eV (see Table 3). The most often occurring building blocks are shown in Fig. 9d-f. For the first slice, containing 9926 dyes, the most common A and B building blocks are considered: the former (Fig. 9d) occurs 140 times (84%) on the A site and 169 times (93%) on the B site, and the latter occurs 147 times (82%) on the A site and 170 times (95%) on the B site. Within the second -IP slice reside 6485 dyes, of which the most common building block on the A site is shown in Fig. 9e; it occurs 89 times (50%) on the A site and 124 times (69%) on the B site. The most common B building block, shown in Fig. 9f, occurs 138 times (77%) on the B site and 87 times (49%) on the A site. As can be seen from the percentages, most of the dyes containing these building blocks are found within this region. However, this is to be expected since such a large proportion of all the ABCBA dyes are contained in this region. The nature of the building blocks themselves, less electron rich than the most common building blocks for the top right corner and less electron poor than the most common building blocks in the bottom left corner, is in line with the properties of the resulting dyes.
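The occupancy statistics quoted for the different regions amount to counting, for a given building block on a given site, how many of the dyes carrying it fall inside the region. The sketch below illustrates this bookkeeping on a toy dataset; the dictionary keys, building block labels and property values are hypothetical and are not taken from our dataset. Only the top right region limits and the counts used in the percentage check come from the text.

```python
def in_top_right(dye):
    """Top right region: -IP >= -4.8 eV and -EA >= -3.3 eV."""
    return dye["neg_ip"] >= -4.8 and dye["neg_ea"] >= -3.3


def occupancy(dyes, site, block, in_region):
    """Count and fraction of dyes with `block` on `site` that lie inside a region."""
    matching = [d for d in dyes if d[site] == block]
    if not matching:
        return 0, 0.0
    inside = sum(1 for d in matching if in_region(d))
    return inside, inside / len(matching)


def percent(count, total=179):
    """Percentage of the 179 dyes that share a building block on one site."""
    return round(100 * count / total)


# Toy ABCBA entries with invented labels and values (eV).
toy_dyes = [
    {"A": "x", "B": "y", "neg_ip": -4.6, "neg_ea": -3.1},
    {"A": "x", "B": "z", "neg_ip": -5.2, "neg_ea": -3.6},
    {"A": "w", "B": "y", "neg_ip": -4.7, "neg_ea": -3.2},
    {"A": "w", "B": "z", "neg_ip": -5.5, "neg_ea": -3.9},
]

count_a, frac_a = occupancy(toy_dyes, "A", "x", in_top_right)
count_b, frac_b = occupancy(toy_dyes, "B", "y", in_top_right)
```

Because each building block appears on a given site in 179 ABCBA dyes, the counts in the text convert to percentages by dividing by 179, for example 73 and 162 occurrences correspond to 41% and 91%.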

Dye-sensitized proton-reduction (& DSSC) region
Finally, we focus in on the -IP and -EA region relevant to dye-sensitized proton reduction, of which the boundaries with respect to vacuum are -IP more negative than or equal to -5.1 eV, the redox potential associated with the one-electron oxidation of ascorbic acid, and -EA more positive than -4.05 eV, the band-edge of TiO2. As discussed above, dyes in this region also make good candidates for dyes for TiO2 DSSCs.
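The selection of dyes in this region can be written as a simple filter on the calibrated properties. The sketch below uses the boundaries quoted in the text (-IP ≤ -5.1 eV, -EA > -4.05 eV and, where an optical gap is available, a gap below 2.95 eV); the function and variable names are illustrative, and the example values are invented.

```python
NEG_IP_UPPER_EV = -5.1       # -IP must be more negative than or equal to this value
NEG_EA_LOWER_EV = -4.05      # -EA must be more positive than this value
OPTICAL_GAP_UPPER_EV = 2.95  # optical gap must be below this value


def in_proton_reduction_region(neg_ip, neg_ea, optical_gap=None):
    """Return True if a dye meets the thermodynamic (and, if given, optical) criteria."""
    meets_levels = neg_ip <= NEG_IP_UPPER_EV and neg_ea > NEG_EA_LOWER_EV
    if optical_gap is None:
        return meets_levels
    return meets_levels and optical_gap < OPTICAL_GAP_UPPER_EV


# Invented example dyes: (-IP, -EA, optical gap) in eV.
candidates = {
    "dye_1": (-5.4, -3.6, 2.1),   # passes all criteria
    "dye_2": (-4.9, -3.2, 2.0),   # -IP too shallow to oxidise ascorbic acid
    "dye_3": (-5.8, -4.3, 1.8),   # -EA too deep to inject into TiO2
    "dye_4": (-5.6, -3.0, 3.2),   # optical gap too large for visible light
}
selected = [name for name, props in candidates.items() if in_proton_reduction_region(*props)]
```

Passing `optical_gap=None` applies only the -IP and -EA criteria, which is useful for dyes outside the CHON subset for which no reliable optical gap is available.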
Within this dye-sensitized proton reduction region lie 10604 CHON dyes, with only 116 unique building blocks in the A position and 110 in the B position. There are 12 dominant A building blocks and 8 dominant B building blocks, occurring 102 and 115 times, respectively, out of the 121 unique CHON building blocks in this truncated dataset; three of these building blocks are dominant in both the A position and the B position. All of these building blocks are shown in Fig. 11. In line with previous observations, the most common building blocks for either the A or the B site occur a larger number of times on the B site; this is observed for every building block in Table 3. For example, building block (i) in Fig. 11a occurs 102 times on the A site and 104 times on the B site. Qualitatively, it can be seen that the convex hulls for the B site span a more compact region of the plot; this example is shown in Fig. 10. Structurally, the most common building blocks in this region overlap with the most common building blocks found for the middle region (see Table 3(d)), discussed above. This is perhaps not surprising as the regions also overlap.
In line with what is observed for the middle region, at least one pyridinic nitrogen is present in these building blocks in almost all cases, making these all electron-poor structures.
In the case of ACA' dyes, the water splitting region contains 6425 dyes. Of the 121 building blocks considered, only 6 are not found in the ACA' dyes in this region. The two most dominant building blocks for ACA' dyes in the water splitting region are benzene and building block (b)(iii) in Fig. 11. The convex hulls for ACA' dyes based on these two building blocks are also shown in Fig. 10 (labelled as A for benzene and A' for building block (b)(iii)) and have a similar spread of data points.
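In general, there appears to be a trend in which the B site of the ABCBA dyes has a greater influence on the overall properties of the dye. This can be inferred from the smaller convex hulls obtained for dyes that share a building block on the B site than for dyes that share a building block on the A site.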

Conclusions
In this work, we have demonstrated the transferability of our previously developed (sTDA/IPEA-)xTB-based high-throughput virtual screening approach to dye structures. Although the use of IPEA-xTB, followed by calibration to DFT results, was unproblematic for -IP and -EA, we encountered problems with the optical gap calculations using sTDA-xTB for a sub-set of sulfur-containing dyes, for which the optical gaps calculated with sTDA-xTB correlated relatively poorly with those calculated by TD-DFT.
The property results post-calibration showed that the ABCBA, ACA and ACA' families of dyes occupy the same region of property space, although the ABCBA dyes occupy a larger part of it. The properties of the dyes were found to be correlated with the properties of their constituent building blocks, as well as in the case of the ABCBA and ACA' dyes with those of the simpler AACAA and ACA dyes. While we only consider DPP dyes, we hypothesize that similar correlations exist for dyes with other cores. We further found that a fraction of ACA' dyes have an optical and/or fundamental gap that is smaller than their ACA/A'CA' counterparts and that those dyes tend to have D-π-A character. We also demonstrate that a model originally developed for co-polymers, and which takes the -IP and -EA values of the ACA/A'CA' dyes as input, successfully predicts which ACA' dyes are likely to have smaller gaps than their ACA/A'CA' counterparts.
Our analysis of the building blocks that most commonly occur in dyes in particular areas of property space revealed that the deepest -IPs and -EAs can be accessed by including electron-poor building blocks in dyes, with the opposite being true for shallow -IPs and -EAs, as expected. We also observe that ABCBA dyes that share a common building block on the B position lie closer together in property space than dyes that share a common building block on the A position, indicating that the B building block is more influential in terms of the overall dye properties than the A building block. We speculate that the closer proximity of B to the core is the cause of this. Finally, in the region of property space where proton reduction in the presence of ascorbic acid as a sacrificial electron donor is optically and thermodynamically possible, the same region where the dyes have the right -IP and -EA for dye-sensitized solar cells, slightly less than half of all ABCBA and ACA' dyes are found. The most common building blocks on the B site for ABCBA dyes in this region all possess at least one pyridinic nitrogen.