Open Access Article
Tamara
Husch
a,
Nusret Duygu
Yilmazer
a,
Andrea
Balducci
b and
Martin
Korth
*a
aInstitute for Theoretical Chemistry, Ulm University, Albert-Einstein-Allee 11, 89069 Ulm, Germany. E-mail: martin.korth@uni-ulm.de
bMEET, University of Münster, Corrensstrasse 46, 48149 Münster, Germany
First published on 10th December 2014
A volunteer computing approach is presented for the purpose of screening a large number of molecular structures with respect to their suitability as new battery electrolyte solvents. Collective properties like melting, boiling and flash points are evaluated using COSMOtherm and quantitative structure–property relationship (QSPR) based methods, while electronic structure theory methods are used for the computation of electrochemical stability window estimators. Two application examples are presented: first, the results of a previous large-scale screening test (PCCP, 2014, 16, 7919) are re-evaluated with respect to the mentioned collective properties. As a second application example, all reasonable nitrile solvents up to 12 heavy atoms are generated and used to illustrate a suitable filter protocol for picking Pareto-optimal candidates.
One thus needs the Gibbs free energies of oxidation and reduction:
| ΔGox = ΔG(X) − ΔG(X+) ΔGred = ΔG(X−) − ΔG(X) |
Individual free energies are usually taken from density functional theory (DFT) computations by taking zero-point and thermal enthalpic, entropic, as well as (implicit) solvation effects into account:
| ΔG = ΔH − TΔS = ΔE + ΔEZVPE + ΔHT − TΔS + ΔGsolvation |
As an estimate of the oxidation and reduction potentials one can look at the electronic energy differences (electron affinity (EA) and ioniziation potential (IP))
| ΔGox ≈ IP = ΔEox = E(X) − E(X+) ΔGred ≈ EA = ΔEred = E(X−) − E(X) |
| IP ≈ −EHOMO EA ≈ −ELUMO |
In our previous study we have evaluated several computational approaches and approximations for their impact on ranking compounds with respect to their EWS. We suggested a combination of semiempirical quantum mechanical (SQM) and wave function theory (WFT) methods for an efficient two-step screening procedure. All screening results presented below do include the electrochemical stability as a factor, partly based on SQM and WFT data as previously suggested (the database benchmark) and partly based on SQM data only (the nitrile set), as we have found that SQM estimates are usually good enough for ranking compounds within our extended screening procedure outlined below. In the following, we turn our attention to the approximate treatment of collective properties with lower level methods, as no higher-level methods are available for the fast prediction of these properties. At this point we still do not take solid-electrolyte-interface (SEI) formation into account, but schemes for using estimators for complex properties including SEI formation are in preparation. The results presented here are thus based on simplified model systems and approximate computational methods and should be taken with appropriate care (as simpler problems were shown to require much more advanced methods sometimes).34 Our main focus is the definition of a screening strategy, not the benchmarking of lower level methods against each other, though we are able to present some data for the comparison of COSMOtherm with ‘pure’ QSPR type models. We furthermore do not consider ionic liquids here, for which some details of the current screening setup are not optimal, though our scheme can easily be adjusted to work for ionic liquids also. Several collective properties are relevant for improving electrolyte systems. Here we investigate the possibilities of the COSMOtherm model35 for predicting boiling and flash points, viscosities (as estimators of ion conductivity), solubilities and free energies of solvation for several ionic species (as an estimator of solubility again) and of a pure quantitative structure property relationship (QSPR) model of Lang for computing melting points.36
COSMOtherm predictions are based on empirical models which make use of data from electronic structure theory calculations to allow for the description of hitherto experimentally unknown species also (unlike standard chemical engineering models, which usually require some compound-specific, experimentally determined parameters). For COSMOtherm we compare the performance of density functional theory (DFT) based estimates with semi-empirical (SQM) ones with respect to the ranking of candidate compounds. Semi-empirical PM6-DH+
37–40 calculations were done using MOPAC2012,41 making use of the COSMO35 solvation model to generate the input for COSMOtherm. BP8642,43 DFT calculations have been performed using TURBOMOLE 6.4,44,45 D2 dispersion corrections,46 the RI approximation for two-electron integrals,47,48 and again COSMO to generate the input for COSMOtherm. BP86 DFT calculations (again using D2 and RI) and local pair natural orbital (LPNO) coupled electron pair approximation (CEPA1)49 (CEPA in the following) calculations were done using a modified version of ORCA 2.8.50 TZVP, TZVPP and QZVP AO basis sets51 were employed for TURBOMOLE and ORCA calculations.
More information about the COSMOtherm model can be found, for instance, in a recent review by Klamt,35 but some details with direct relevance for the following need to be mentioned: in COSMOtherm, the liquid viscosity of a pure compound at room temperature is computed using a QSPR-type model:
| ln(ηi) = cAAi + CM2Mi2 + cNringNringi + cTSTSi + c0 |
It is based on the surface area Ai of the compound, the second σ-moment Mi, the number of ring atoms Nring and the pure entropy time temperature TSi, as well as on five parameters, which were derived from a set of 175 neutral organic compounds.
For boiling points at a given pressure, COSMOtherm varies the temperature of the system until the difference between the predicted vapor pressure and the given pressure is below 10−4 mbar. The vapor pressure itself is computed via the chemical potential of compound i in system S from the integration of the σ-potential over the surface of the compound
| μSi = μC,Si + ∫pi(σ)μs(σ)dσ |
| μgasi = Eigas − EiCOSMO − ωringNiring + ηgas |
| pSi/1bar = exp[(μgasi − μSi)/RT] |
Flash points are computed from the temperature dependent variation of the vapor pressure until the flash point pressure (FPP) is found,52 which in turn is computed from the molecular surface area A according to:
| ln(FPP) = 22.7–3 × ln(A) |
The prediction of melting points was not possible using COSMOtherm when we initially finished our study, though this feature has recently been added. We do not present COSMOtherm melting point predictions here, but instead use a QSPR model of A. Lang.
The model of Lang uses readily available molecular descriptors (with the number of hydrogen-bond donors and polar surface area as most important ones here) for the purely empirical estimation of melting points. Melting points are especially hard to predict as rather minor differences between molecular structures can result in large melting point differences due to packing effects. More details on QSPR methods and the available software packages can be found in recent reviews.53,54
To get an idea of how well COSMOtherm performs in comparison to ‘pure’ QSPR models we did some additional QSPR calculations using the T.E.S.T. software package.55 Although benchmarking such methods is not the focus of our work, this seemed interesting to us, as QSPR models were, for instance, used to estimate viscosities for the purpose of developing new ionic liquids.56 For our QSPR predictions, we relied on the consensus model (the average over all implemented models) implemented in the T.E.S.T. software package. The details of all included approaches can be found in the T.E.S.T. user guide. In these methods, the properties investigated here are predicted using overall 797 molecular descriptors and relying on experimental data sets for several thousand compounds.
Table 1 shows the predicted and measured8 results obtained for typical electrolyte solvents. Perusing this table, one finds that mean average deviations (MADs, about 0.2 cP and 18/23/23 degrees of viscosities and melting/flash/boiling points) are in the order of about 10 to 15 percent of the relevant property windows (here, 0.33 to 2.53 cP and −137 to 26/−17 to 160/41 to 270 degrees). The correct ranking of compounds can be investigated by looking at correlation coefficients, such as Pearson's R values for linear correlation and Kendall's τ values for non-linear (rank) correlation. Both correlation measures are very high, especially for the COSMOtherm-based estimates (with R values of 0.95 to 0.98 and τ values of 0.73–0.78), which implies that the ranking of compounds with respect to these properties is even better than the prediction of the actual values. This is a very promising result for integrated computational and experimental screening procedures, in which the computational part acts only as a filter for subsequent experimental high-throughput work.
| Viscosity [cP] | Melting point [°C] | Flash point [°C] | Boiling point [°C] | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Calculated | Measured | Deviation | Calculated | Measured | Deviation | Calculated | Measured | Deviation | Calculated | Measured | Deviation | |
| 1,3-DL | 0.74 | 0.59 | 0.15 | −75.7 | −95.0 | 19.3 | −24.0 | 1 | −25.0 | 96.8 | 78 | 18.8 |
| 2-Me-1,3-DL | 0.87 | 0.54 | 0.33 | −40.9 | — | — | −6.5 | — | — | 123.9 | — | — |
| 2-Me-THF | 0.62 | 0.47 | 0.15 | −97.2 | −137.0 | 39.8 | −16.6 | −11 | −5.6 | 112.1 | 80 | 32.1 |
| 4-Me-1,3-DL | 0.86 | 0.60 | 0.26 | −40.9 | −125.0 | 84.1 | −8.1 | −2 | −6.1 | 121.2 | 85 | 36.2 |
| BL | 1.10 | 1.73 | −0.63 | −36.7 | −43.5 | 6.8 | 63.1 | 97 | −33.9 | 237.9 | 204 | 33.9 |
| DEC | 0.76 | 0.75 | 0.01 | −37.0 | −74.3 | 37.3 | −1.5 | 31 | −32.5 | 124.8 | 126 | −1.2 |
| DEE | 0.90 | — | — | −69.7 | −74.0 | 4.3 | 12.1 | 20 | −7.9 | 144.5 | 121 | 23.5 |
| DMC | 0.61 | 0.59 | 0.02 | −15.0 | 4.6 | −19.6 | −30.7 | 18 | −48.7 | 78.9 | 91 | −12.1 |
| DME | 0.55 | 0.46 | 0.09 | −59.6 | −58.0 | −1.6 | −31.5 | 0 | −31.5 | 80.7 | 84 | −3.3 |
| DMM | 0.40 | 0.33 | 0.07 | −98.4 | −105.0 | 6.6 | −61.8 | −17 | −44.8 | 36.6 | 41 | −4.4 |
| EA | 0.50 | 0.45 | 0.05 | −83.8 | −84.0 | 0.2 | −30.0 | −3 | −27.0 | 84.2 | 77 | 7.2 |
| EB | 0.75 | 0.71 | 0.04 | −86.9 | −93.0 | 6.1 | 4.3 | 19 | −14.7 | 134.0 | 120 | 14.0 |
| EC | 1.81 | 1.90 | −0.09 | 22.6 | 36.4 | −13.8 | 97.5 | 160 | −62.5 | 284.3 | 248 | 36.3 |
| EMC | 0.69 | 0.65 | 0.04 | −38.0 | −53.0 | 15 | −15.3 | — | — | 103.2 | 110 | −6.8 |
| MB | 0.65 | 0.60 | 0.05 | −83.6 | −84.0 | 0.4 | −12.2 | 11 | −23.2 | 109.1 | 102 | 7.1 |
| NMO | 1.60 | 2.50 | −0.90 | 40.1 | 15.0 | 25.1 | 111.7 | 110 | 1.7 | 320.0 | 270 | 50.0 |
| PC | 1.79 | 2.53 | −0.74 | −15.2 | −48.8 | 33.6 | 102.8 | 132 | −29.2 | 299.0 | 242 | 57.0 |
| THF | 0.50 | 0.46 | 0.04 | −100.1 | −109.0 | 8.9 | −37.4 | −17 | −20.4 | 81.9 | 66 | 15.9 |
| VL | 1.41 | 2.00 | −0.59 | −17.3 | −31.0 | 13.7 | 85.4 | 81 | 4.4 | 278.4 | 208 | 70.4 |
| MAD | 0.22 | 17.69 | 22.86 | 22.64 | ||||||||
| R | 0.95 | 0.87 | 0.95 | 0.98 | ||||||||
| τ | 0.78 | 0.75 | 0.72 | 0.76 | ||||||||
Table 2 shows a comparison of the performance of the consensus QSPR method implemented in T.E.S.T. with COSMOtherm. Mean absolute deviations (MADs) are higher for the consensus model especially in the case of viscosities. R and τ values are lower for the consensus model, especially the R values of flash and boiling points. To be fair it should be mentioned that the consensus melting point prediction model performed much worse than the one of Lang which we use for our screenings, with an MAD of 37 K (30 K on the fit set) for the former, opposed to 18 K for the latter. This clearly illustrates that better QSPR models compared to the ones implemented in T.E.S.T. are available, which have a strong focus on toxicity prediction, not investigated here. Other available QSPR software packages unfortunately do not supply models for all properties of interest and none seems to be suitable for our high-throughput infrastructure.54
| Property | QSPR | COSMOtherm | ||||
|---|---|---|---|---|---|---|
| MAD | R | τ | MAD | R | τ | |
| Viscosity | 1.15 | 0.83 | 0.68 | 0.22 | 0.95 | 0.78 |
| Flash point | 26.82 | 0.77 | 0.63 | 22.86 | 0.95 | 0.72 |
| Boiling point | 37.08 | 0.63 | 0.65 | 22.64 | 0.98 | 0.76 |
000 molecules from public databases for their redox stability.10 Structures were automatically retrieved in the SMILES format and converted using OpenBabel57 into force field optimized input structures for DFT calculations. The highest occupied molecular orbital/lowest unoccupied molecular orbital (HOMO/LUMO) gaps, dipole moments and elemental composition were used as filters for identifying 83 (out of 100
000) candidate compounds, which were used for a systematic benchmarking of quantum chemical methods. When investigating the ‘hits’ of this screening study in more detail, later on, many turned out to have unfavorable collective properties, like high melting points, which nicely illustrate the need for a multi-level approach like it is presented here. As a first example application we thus re-evaluate the results of this earlier study using our improved approach. Using the previous CEPA ionization potential (IP) and electron affinity (EA) values as estimators of electrochemical stability, and after computing viscosities, melting/flash/boiling points, Li+/Mg2+/Al3+/LiPF6-solubilities and free energies of solvation for all compounds, we applied the following filtering scheme to identify the most promising candidates. The COSMOtherm model is not well suited to describe the properties of small, highly charged ions and thus these results are likely only meaningful for the ranking of rather similar compounds. Furthermore, computed solubilities are just indicated as high for all compounds with reasonable solubility using COSMOtherm, so that we turned to free energies of solvation for the ions as a rough estimator of ion solubility, again to be used only to rank rather similar compounds (which is actually not the case in this example application, but in the next one, see section B). Free energies of solvation are highly correlated for the small ions, so that using one value (we take the one for Li+) is sufficient for ranking purposes. Compounds with an IP below, an EA above, and free energies of solvation (which are negative) above the average were discarded, as well as compounds with melting/flash/boiling points above 273 K/below 323 K/below 373 K. Calculations at all levels were successful for 8772 candidates out of the subset of about 10
000 small organic molecules from the whole database of 100
000 structures. We take problems with any of the calculations as an indicator of the complicated electronic nature of the compound and thus discard it. Filtering left us with 72 structures and we then restricted our list to 53 Pareto-optimal ones, i.e. the candidates which are not equal to or beaten by another candidate with respect to all properties, as non-Pareto optimal candidates would offer no advantages over the remaining stock. To account for the inaccuracy of our approximate models we binned the computed values in 5 percent intervals before checking for Pareto-optimal cases. Further results obtained for these compounds are presented elsewhere, here we concentrate on the evaluation of the COSMOtherm model (but all data are made publicly available on our project web page32).
Table 3 shows the correlation between the different properties computed. Not unexpectedly, one finds a very good correlation between IP and HOMO values and much lower values for the correlation between EA and LUMO values. Melting points are correlated with flash and boiling points, which in turn are almost perfectly correlated with each other. Free energies of solvation for different small cations are also highly correlated, but not correlated to the corresponding values of the large anionic PF6− ions. These findings will be discussed in more detail below, together with the corresponding data for the second application case.
| R | τ | |
|---|---|---|
| IP/HOMO | −0.84 | −0.67 |
| EA/LUMO | −0.57 | −0.29 |
| Melting/flash point | 0.65 | 0.49 |
| Melting/boiling point | 0.63 | 0.48 |
| Boiling/flash point | 0.99 | 0.92 |
| ΔGsolv(Li+)/ΔGsolv(Mg2+) | 0.97 | 0.83 |
| ΔGsolv(Li+)/ΔGsolv(Al3+) | 0.98 | 0.87 |
| ΔGsolv(Mg2+)/ΔGsolv(Al3+) | 1.00 | 0.96 |
Table 4 again shows the correlation between different properties, now for DFT as well as SQM based estimates. First of all, values of SQM are very similar to those of DFT, implying that it is possible to obtain DFT-level ranking results with the much faster SQM method, see also the discussion of Table 5 below. For this set, viscosities are highly correlated with both flash and boiling points, which are in turn again perfectly correlated with each other. Also free energies of solvation for different small cations are again highly correlated, but still not correlated to the corresponding values of the large anionic PF6− ions. Viscosities, and flash and boiling points are inversely correlated with free energies of solvation for PF6−. This implies that for a given compound class high thermal stability and good ion solubility often go hand in hand, but usually come at the price of higher viscosities, i.e. very likely lower ion conductivities. The results obtained for the much more diverse database set presented above, on the other hand, did not show a high correlation between viscosities and boiling and flash points. This indicates that different compound classes show different relationships between viscosities and thermal stability. The best way of addressing the challenge of balancing thermal stability with ion conductivity thus seems to be a diversity oriented approach, which goes beyond the usual compound classes (carbonates, nitriles, etc.).
| DFT | SQM | |||
|---|---|---|---|---|
| R | τ | R | τ | |
| Viscosity/flash point | 0.58 | 0.83 | 0.59 | 0.81 |
| Viscosity/boiling point | 0.52 | 0.78 | 0.54 | 0.78 |
| Melting/boiling point | 0.55 | 0.41 | 0.51 | 0.36 |
| Flash/boiling point | 0.99 | 0.92 | 0.99 | 0.92 |
| Viscosity/ΔGsolv(PF6−) | −0.47 | −0.49 | −0.50 | −0.44 |
| Flash point/ΔGsolv(PF6−) | −0.74 | −0.44 | −0.70 | −0.41 |
| Boiling point/ΔGsolv(PF6−) | −0.70 | −0.42 | −0.66 | −0.39 |
| ΔGsolv(Li+)/ΔGsolv(Mg2+) | 0.93 | 0.76 | 0.95 | 0.79 |
| ΔGsolv(Li+)/ΔGsolv(Al3+) | 0.95 | 0.81 | 0.97 | 0.83 |
| ΔGsolv(Mg2+)/ΔGsolv(Al3+) | 1.00 | 0.94 | 1.00 | 0.95 |
| R | τ | MD | MAD | RMSD | MIMA | |
|---|---|---|---|---|---|---|
| HOMO | 0.96 | 0.86 | 3.70 | 3.70 | 3.71 | 4.36 |
| LUMO | 0.93 | 0.60 | −1.72 | 1.72 | 1.76 | 3.33 |
| Viscosity | 0.95 | 0.89 | 0.40 | 0.50 | 1.76 | 79.34 |
| Boiling point | 0.95 | 0.82 | 25.35 | 26.69 | 31.53 | 548.91 |
| Flash point | 0.95 | 0.83 | 14.07 | 14.85 | 17.81 | 322.57 |
| ΔGsolv(Li+) | 0.74 | 0.56 | 3.64 | 3.66 | 3.72 | 24.61 |
| ΔGsolv(Mg2+) | 0.72 | 0.50 | 9.15 | 9.19 | 9.31 | 64.21 |
| ΔGsolv(Al3+) | 0.73 | 0.50 | 13.25 | 13.30 | 13.49 | 92.31 |
| ΔGsolv(PF6−) | 0.97 | 0.84 | −0.30 | 0.34 | 0.41 | 6.33 |
The used COSMOtherm models are parametrized to work on top of B86/TZVP DFT calculations, but they are also possible to work on top of SQM computations, which are about 2 to 3 magnitudes faster. It is thus of high interest to investigate the effect of using SQM instead of DFT information on the ranking results in more detail. Table 5 shows correlation and error measures – mean deviations (MDs), mean absolute deviations (MADs), root mean square deviations (RMSDs) and error spans (MIMAs) all in kcal mol−1 – for the comparison of properties computed at the DFT level with those computed at the SQM level. Perusing this table, first of all one finds very high correlation values for all computed properties. MD and MAD values of similar magnitude indicate that systematic shifts are found for all properties, which are mostly within the accuracy found for the COSMOtherm approach in comparison to experimental values (Table 1). High error span (MIMA) values nevertheless suggest to re-screen preselections of compounds from SQM level computations at the DFT level again to exclude outliers, or to directly use a two-level approach as a consistency check. Correlation measures close to the ones found for comparison of COSMOtherm with experiment (Table 1) allow one to draw the conclusion that the theoretically less appealing SQM computations can also be very valuable for large-scale screening approaches based on the COSMOtherm model.
Finally, Table 6 shows a comparison of the consensus QSPR method implemented in T.E.S.T. with COSMOtherm for the nitrile set. Mean absolute deviation (MAD), Pearson R and Kendall's τ values between ‘pure’ QSPR and COSMOtherm values are given. Perusing this data one finds that the consensus QSPR model gives substantially different results than the COSMOtherm approach. In light of our evaluation of the consensus model for the systems in Table 1 (see above) and the unavailability of accurate QSPR alternatives that are suitable for our high-throughput approach, COSMOtherm seems to be a better choice for our task.
| Property | MAD | R | τ |
|---|---|---|---|
| Viscosity | 2.51 | 0.37 | 0.41 |
| Flash point | 15.26 | 0.76 | 0.62 |
| Boiling point | 72.56 | 0.71 | 0.59 |
| This journal is © the Owner Societies 2015 |