Open Access Article
Trevor N. Brown *a, Alessandro Sangion a, Li Li b and Jon A. Arnot acd
aARC Arnot Research & Consulting, Toronto, Ontario, Canada. E-mail: trevor.n.brown@gmail.com
bSchool of Public Health, University of Nevada, Reno, Nevada, USA
cDepartment of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
dDepartment of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
First published on 6th October 2025
Three Quantitative Structure Property Relationship (QSPR) software packages, IFSQSAR, OPERA, and EPI Suite are compared and assessed for prediction accuracy, applicability domain (AD) and uncertainty of the predictions. A database of experimental physical–chemical (PC) properties is compiled, merged, and filtered, and the QSPRs are assessed with datasets of octanol–water (KOW), octanol–air (KOA), and air–water (KAW) partition ratios. Upper and lower limits on PC property predictions are proposed based on theory, data, and applications of the properties in hazard screening and risk assessment. Validations of the uncertainty metrics of the QSPR packages are done for the PC properties using experimental data external to all training datasets. The IFSQSAR 95% prediction interval (PI95) calculated from root mean squared error of prediction (RMSEP) captures 90% of the external data, while OPERA and EPI Suite require a factor increase of at least 4 and 2 respectively for their PI95 to capture a similar 90% of the external experimental data. The assessment of QSPR consensus predictions identified future research and experimental testing to improve the predictive models for data-poor chemicals such as polyfluorinated or per-fluorinated alkyl substances (PFAS), ionizable chemicals, and chemicals with complex and multifunctional structures.
Environmental significance

The findings of this work provide decision-makers with better tools to recognize and evaluate the uncertainty associated with physical–chemical (PC) properties when conducting chemical assessments. Reasonable upper and lower bounds on predicted PC properties have been proposed, and three PC property prediction packages have had their prediction uncertainty evaluated and refined against novel datasets. In addition, three major classes of data-poor chemicals have been confirmed as requiring more experimental, theoretical and modelling research: polyfluorinated or per-fluorinated alkyl substances (PFAS); ionizable organic chemicals (IOCs), especially strong acids and bases; and large complex chemicals with multiple heteroatom functional groups.
It is not feasible to measure PC properties for the several thousand chemicals requiring evaluation, and predictive methods are necessary.20 Methods for predicting PC properties include in silico models such as Quantitative Structure–(Activity)Property Relationships (QS(A)PRs)21–30 and quantum chemistry/ab initio31 methods, and empirical models such as Poly-Parameter Linear Free Energy Relationships (PPLFER).32,33 QSPRs are specific to predicting chemical properties, whereas QSARs are more general and may include reactivity and toxicology end points. We use the term QSPR here but the guidance from various sources, which refer to QSARs, also applies. Organisation for Economic Co-operation and Development (OECD) guidance for QSAR development and validation for applications in regulatory decision-making34,35 includes consideration of the applicability domain (AD) for a predicted property.36 AD has been defined by experts as “the response and chemical structure space in which the model makes predictions with a given reliability”,37 and the OECD guidance document on validation of QSARs (“OECD QSAR principles”),34 and the QSAR Assessment Framework (QAF)36 have also adopted this definition. The AD and the reliability are intrinsically linked as implied by the quote, but in the five OECD QSAR principles they are listed as two different principles: "(3). A defined AD; and (4). Appropriate measures of goodness-of-fit, robustness, and predictivity". The QAF also acknowledges that AD and reliability are linked, stating “applicability domain informs the reliability of the prediction”, but again assesses them separately for convenience. In our previous work we have used the term uncertainty23,25 defined as the inverse of reliability, i.e., high reliability means low uncertainty, and low reliability means high uncertainty. In the QAF the term uncertainty has a broader meaning,36 but it is used here only as the inverse of the term reliability.
Previous related work examined the AD for some PC property QSPRs without specifically investigating the uncertainty,38 and provided guidance on selecting and harmonizing measured or predicted values for chemical properties.9 In this work we evaluate and compare both AD and the uncertainty of select QSPRs, especially in the context of data-poor chemicals.
QSPRs from different research groups frequently implement AD in different ways and many methods have been explored in the literature.38–41 Our previous method development work implemented AD in IFSQSAR using chemical similarity, leverage (a distance metric related to the linear regression), a check based on atoms and bonds not found in the training data, and the range of experimental values in the training data,25,42 and refined the uncertainty for partitioning properties.25 The current work compares IFSQSAR Ver. 1.1.221–26 to two other QSPR software packages that provide predictions for many of the same properties: Estimation Programs Interface (EPI) Suite™ Ver. 4.11,27,28 and OPEn (Quantitative) Structure–activity/property Relationship App (OPERA) Ver. 2.9.29 EPI Suite does not explicitly provide AD or uncertainty metrics in its outputs, but the documentation identifies chemical structures which are more prone to prediction uncertainty, and suggests simple AD checks by comparing the properties of chemicals to those in the training data. OPERA provides AD information with its output, based on methods similar to those of IFSQSAR,29 and provides an expected prediction range as an uncertainty metric.
The primary objective of this study is to better understand and communicate the prediction uncertainty and ADs of the selected QSPR software packages for KOW, KOA, and KAW of neutral organic chemicals and the neutral form of IOCs. This review provides guidance for selecting PC property data for chemical assessments and for integrated testing strategies to systematically address uncertainty in measured and predicted properties. There are chemical classes such as quaternary amines, surfactants, and chemicals with strong specific binding which are out of the AD of partitioning-based models, and these are out of the scope of this work. A general overview of the models selected is provided along with some methods for estimating the prediction uncertainty. Predictions from different KOW, KOA, and KAW models are then compared with a large set of chemicals undergoing regulatory evaluation. A method for choosing which model outputs to include in consensus predictions is described and the predicted values are also compared to measured data which are external to the training datasets of the models selected for this study. Chemical classes and structural features for which uncertainty in the property predictions are large are identified and general recommendations are provided to address these uncertainties.
000 discrete organic chemicals has been collected from various regulatory assessment databases on an ad hoc basis over the past 15 years. Chemical identities and structures were curated through a semi-automated process involving cross-referencing Chemical Abstract Service (CAS) Registration Number, chemical names, and molecular structures across multiple databases such as PubChem47 and US EPA's CompTox Chemistry Dashboard48 to identify and address inconsistencies and errors.49 Standardized representations for a chemical are stored using canonical Simplified Molecular Input Line Entry System (SMILES) notation,50,51 while InChIKeys are used for database indexing. We differentiate between isomeric structures, which preserve stereochemistry and counterions, and parent structures, which are derived by neutralizing the chemical, i.e., removing counterions and stripping isomeric details. We refer to this dataset as the chemical structure dataset and use it in this work to evaluate how the three QSPR software packages perform on a large dataset of relevant chemical structures which are mostly data-poor. Predictions for the properties assessed in this work have been made with each of the software packages using the parent structures, because all three QSPR software packages use only two-dimensional (atom connectivity) descriptors which neglect stereochemistry. This dataset and the predicted properties can be accessed in the Exposure And Safety Estimation (EAS-E) Suite online platform (https://www.eas-e-suite.com, database ver.1.0.1), and can be queried with name, CAS Registration Number, or SMILES.
Most of the experimental KOW, KAW, KOA, VP, SW, and MP data originate from the PHYSPROP database52 developed by the EPA Office of Pollution Prevention and Toxics (OPPT) with the Syracuse Research Corporation (SRC). The PHYSPROP data files were downloaded in 2016 and are no longer available on-line; the time stamps on the PHYSPROP files indicate they were last updated in 2008. To address more recent updates to the datasets, the EPI Suite internal experimental databases were searched in batch mode with CAS numbers from the original PHYSPROP datasets. Prior to the current curation efforts, the EPA Office of Research and Development (ORD) updated and curated the SRC datasets, seeking to ensure that chemical identity and chemical structure were correct.53 The ORD version of the datasets was used to develop OPERA.29 The OPERA ver.2.6 experimental datasets were downloaded from GitHub and merged with the updated PHYSPROP datasets, forming the preliminary experimental property dataset. OPERA ver.2.9 experimental datasets were investigated; however, problems were identified that are difficult to resolve using automated processing. For example, OPERA ver.2.9 and the CompTox dashboard report some PC data as “experimental” when they are actually QSPR predictions, some values reported as measurements are averages of multiple sources (including some predicted values), and citations to original literature that could be used to resolve these issues are sometimes missing. Further details on the merging of the PHYSPROP and OPERA 2.6 datasets are described in Section SI-3.
Three other high quality datasets were added to the preliminary experimental property dataset and merged with the chemical structure dataset. Any chemicals identified as salts, permanent ions, or inorganics were excluded. The Henry's Law Constant dataset of Sander2 was incorporated as log KAW after filtering for data that Sander flagged as reliable experimental values. When multiple values were available for a chemical, more recent measurements were selected over older measurements. The log KOA dataset of Baskaran et al.54 was filtered for experimental values measured for dry octanol between 20 and 30 °C, and any data they flagged as unreliable were removed. When more than one value was available the most reliable value, as ranked in the database, and the most recent value was selected. In both datasets when more than one experimental measurement was available for a chemical the arithmetic mean of the log-scale values was used. The Bradley MP dataset55 was also added to the EAS-E Suite experimental property dataset. The full experimental dataset can be accessed in the EAS-E Suite online platform.
Finally, external validation datasets were defined by filtering the experimental property dataset to remove chemicals in the training datasets of the QSPR software packages assessed in this work. The OPERA QSPR package returns experimental values instead of QSPR predictions if a chemical is in its experimental database, so all chemicals identified in the OPERA 2.6 and 2.9 experimental databases were removed from consideration for external testing. The original SRC PHYSPROP database files typically identify chemicals in the EPI Suite training datasets, and these were also removed from consideration. Chemicals are also matched by CAS number with chemicals in the solute descriptor database used to develop the IFSQSARs,24 and any chemicals in the training datasets of the QSPRs or PPLFERs were removed from consideration.
The EPI Suite software package (v4.11, Nov 2017)27,28 was used to predict the properties assessed in this work. EPI Suite QSPR predictions lack explicit AD information, and the software only provides general recommendations in the documentation for determining if a chemical is in the AD of the QSPRs. This suggestion is time-consuming for the EPI Suite user and requires some expertise on structural fragments. To address this limitation, we developed an in-house method to explicitly determine the AD of EPI Suite QSPR predictions, provided training set data and model fragment information are available. We apply this method in the EAS-E Suite database and on-line platform providing AD information for EPI Suite predictions discussed in the present study. EPI Suite provides the point estimate for each endpoint, and we additionally used the root mean squared error of prediction (RMSEP) for the validation datasets from the EPI Suite documentation as an estimated uncertainty metric (for details, see Section SI-5). The following values for standard deviation of prediction from external validation datasets shown in the EPI Suite documentation are used as RMSEPs: log KOW: 0.479, log KAW (bond method): 1.54, log KOA (root mean squared sum of log KOW and log KAW): 1.61, log SW (WATERNT): 1.045, log VP: 1.057.
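The log KOA value quoted above can be reproduced from the other two RMSEPs. The sketch below is only an illustration, assuming the log KOA estimate is derived from log KOW and log KAW and that their errors are independent, so that they combine as a root sum of squares; the function name `combine_rmsep` is hypothetical.

```python
import math

# RMSEP values (log units) quoted in the text from the EPI Suite documentation.
RMSEP_LOG_KOW = 0.479
RMSEP_LOG_KAW = 1.54


def combine_rmsep(*rmseps):
    """Root sum of squares of error terms assumed to be independent."""
    return math.sqrt(sum(r * r for r in rmseps))


# Assumed uncertainty of log KOA when it is derived from log KOW and log KAW.
rmsep_log_koa = combine_rmsep(RMSEP_LOG_KOW, RMSEP_LOG_KAW)
print(round(rmsep_log_koa, 2))  # 1.61
```

Because the log KAW term dominates, the combined value is only slightly larger than the log KAW RMSEP alone.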
The OPERA QSPRs29 were developed on the same PHYSPROP datasets as the EPI Suite QSPRs, but with further curation of the datasets and chemical structures,53 a different methodology, and external validation and AD definition adhering to OECD guidance.34,35 A k-nearest neighbours model was developed for each PC property, where the predicted values are the weighted average of the k = 5 nearest neighbours. OPERA applies two complementary approaches for defining the AD for OPERA model predictions and provides an uncertainty metric.
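A weighted nearest-neighbours prediction of this kind can be sketched as follows. This is not the OPERA implementation: the Euclidean distance, the inverse-distance weights, and the function name `knn_predict` are assumptions made for illustration only.

```python
import math


def knn_predict(query, training, k=5, eps=1e-9):
    """Distance-weighted mean of the k nearest training values.

    training: list of (descriptor_vector, log_property) pairs.
    Inverse-distance weighting is an assumption of this sketch.
    """
    # Rank training chemicals by Euclidean distance to the query descriptors.
    ranked = sorted((math.dist(query, x), y) for x, y in training)
    nearest = ranked[:k]
    # eps avoids division by zero when the query matches a training chemical.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical one-descriptor training set.
training = [((float(i),), float(i) + 1.0) for i in range(6)]
print(knn_predict((2.0,), training))    # close to 3.0, the matching neighbour
print(knn_predict((100.0,), training))  # stays between 2.0 and 6.0
```

The second call shows a general property of such models: a weighted average of neighbours cannot leave the range of the training values, however far the query lies from the training data.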
When multiple QSPR predictions are available for a single property the arithmetic mean of logarithmic values, referred to as the “consensus value”, is recommended as a reasonable estimate to combine the battery of QSPR predictions for chemical assessments.63–65 This approach assumes that QSPRs building on different algorithms would contain uncertainties or biases in different directions or aspects and that errant predictions can, therefore, be mitigated to a degree by predictions from other models.64,66
Consensus predictions have been calculated using the three PC property packages by taking the arithmetic mean of the partition coefficients or solubilities on the log scale.67 The IFSQSAR and EPI Suite QSPRs are additive models and can extrapolate outside of their training data, but OPERA QSPR predictions are limited to the range of experimental training set data. Therefore, including the OPERA predictions in every case will bias consensus predictions towards the center of the experimental range which may not be desirable. In all cases the results from IFSQSAR and EPI Suite are included in the consensus value. After testing several approaches, it was decided not to include the OPERA predictions in the consensus values if the OPERA predictions are flagged as out of the AD. See Section SI-5 for more details.
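The inclusion rule described above can be written compactly. The following is a minimal sketch with a hypothetical function name and argument layout; all inputs are log-scale values.

```python
def consensus_log_value(ifsqsar, epi_suite, opera, opera_in_ad):
    """Arithmetic mean of log-scale predictions.

    IFSQSAR and EPI Suite are always included; OPERA is included only
    when its prediction is flagged as inside its applicability domain.
    """
    values = [ifsqsar, epi_suite]
    if opera_in_ad:
        values.append(opera)
    return sum(values) / len(values)


print(consensus_log_value(4.0, 5.0, 3.0, opera_in_ad=True))   # 4.0
print(consensus_log_value(4.0, 5.0, 3.0, opera_in_ad=False))  # 4.5
```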
The quantitative uncertainty metric applied in this work is the root mean squared error of prediction (RMSEP) which is an estimate of the prediction uncertainty. The RMSEP can be converted to a prediction interval which is a probabilistic metric. In this work we calculate prediction intervals at the 95% confidence level (PI95), and while this is a common choice the calculations could be made at any other confidence level. Consensus predictions are assigned quantitative uncertainty metrics by summing the RMSEP of the QSPRs that go into them according to summation of error rules. Another uncertainty metric associated with consensus predictions is the root mean squared deviation (RMSD) which shows the spread of the predictions in relation to the consensus. Equations and more details of these metrics are found in Section SI-5.
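The exact equations are given in Section SI-5; the sketch below is only one common reading of these metrics. It assumes normally distributed errors for the prediction interval and independent errors when propagating RMSEP to the mean of several predictions. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def prediction_interval(log_value, rmsep, level=0.95):
    """Symmetric interval around a prediction, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return log_value - z * rmsep, log_value + z * rmsep


def consensus_rmsep(rmseps):
    """RMSEP of the mean of n predictions, assuming independent errors."""
    return math.sqrt(sum(r * r for r in rmseps)) / len(rmseps)


def consensus_rmsd(predictions):
    """Spread of the individual predictions around their mean."""
    mean = sum(predictions) / len(predictions)
    return math.sqrt(sum((p - mean) ** 2 for p in predictions) / len(predictions))


low, high = prediction_interval(2.0, 0.5)
print(round(low, 2), round(high, 2))
print(consensus_rmsep([0.3, 0.4]))
print(consensus_rmsd([1.0, 3.0]))
```

Passing a different `level` gives the interval at another confidence level, as noted in the text.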
| Property | Experiment n | QSPR lower limit | Experiment minimum | 2.5% | Median | 97.5% | Experiment maximum | QSPR upper limit |
|---|---|---|---|---|---|---|---|---|
| Log KOW | 14 005 | −6 | −5.08 | −1.3 | 2.03 | 6.36 | 11.29 | 19.3 |
| Log KOA | 855 | −3.2 | −0.95 | 1.76 | 5.56 | 11.47 | 12.59 | 22.3 |
| Log KAW | 2184 | −22.4 | −11.38 | −6.7 | −2.07 | 1.83 | 3.52 | 16.6 |
| Log VP | 2982 | −14.6 | −11.55 | −7.4 | 0.7 | 5.28 | 7.79 a | 5.0 a |
| Log SW | 5791 | −18 | −13.17 | −8.18 | −2.49 | 1 | 1.58 b | 1.4 b |

a The upper limit is set to atmospheric pressure, the experimental values that exceed this are for chemicals that are gases at standard conditions.
b The upper limit is an assumed mole fraction of 0.5 for miscible solutes in water, the few experimental values which exceed this are from a 1990s USEPA database which is no longer accessible, so the reason could not be verified but might involve a different way of treating miscible solutes.
Fig. 1 Predicted or calculated vs. experimental values of log KOW for the external validation dataset for (A) IFSQSAR, (B) EPI Suite, (C) OPERA, and (D) consensus values. Martel data56 (n = 700) span log KOW 1 to 7.5 and Tshepelevitsh data57 (n = 45) span log KOW −1 to 21. Root mean squared error of prediction (RMSEP) are shown for all data based on applicability domain (AD) and regression lines are shown separately for Martel (dashed) and Tshepelevitsh (dotted) data. Uncertainty Level (UL) corresponds to the AD checking of IFSQSAR with E, 0, 1, 2 considered in AD with increasing uncertainty, 3 is out of AD and 6 is a prediction limit violation. EPI Suite and OPERA AD groups are OK and Borderline in AD, “Warn” is out of AD, and Limit is a prediction limit violation.
The external log KOA data from Baskaran et al.54 are mostly organo-halogens that are frequently within the AD of all three QSPR packages (Table S3). The RMSEP of predicted vs. experimental values is lowest for OPERA (0.533) and the RMSEP of the consensus predictions is comparable (0.547). The good correlation with the external experimental data (R2 = 0.966) as shown in Fig. S11 is likely due to log KOA being an easily predicted property, and that the data are for well-studied chemical classes within the AD of the QSPRs. There is a tendency for larger scatter above log KOA of 6 because more of these are out of AD or only borderline within AD for IFSQSAR and EPI Suite. The external log KAW dataset comes from the review of Sander 2023 (ref. 2) and includes more diverse chemical classes, which cover a much larger range of values (Fig. S12). The data are still mostly within the AD of the three QSPR packages, e.g., IFSQSAR flagged only one chemical out of the AD. The consensus predictions for all chemicals in the external dataset are more accurate, with a lower RMSEP (1.403), than any of the individual QSPR package predictions, showing the benefit of using consensus predictions.
Table S3 also shows the external validation statistics broken down by chemical state, i.e., gases or liquids versus solids, for each of the three main partitioning properties. For log KOW and log KOA most of the chemicals are solids, but log KAW has a nearly equal split between solids and non-solids. The accuracy of predictions for solids is poorer, with higher RMSEP than for non-solids for all models and all PC properties. A likely explanation for this is that solids are more frequently out of the AD; for all three properties and for all three QSPR packages the solids always have a greater proportion of chemicals that are out of the AD than the non-solids. The AD information of IFSQSAR for the solids that are out of AD indicates egregious extrapolation from the training dataset. Solids tend to be larger and more complex than liquids and gases; they have more functional groups and more combinations of functional groups, which pushes them out of the AD of group contribution QSPRs such as those in IFSQSAR and EPI Suite. OPERA predictions are based on a nearest-neighbours approach, so in this case solids are more frequently out of AD because of a lack of similar chemicals in the training data. Because solids are larger and more complex, they will cover a larger chemical space than gases and liquids, and so proportionally more data for solids are needed in the training data to fill in the chemical space and provide adequate nearest neighbours for solid chemicals.
For the EPI Suite and OPERA QSPR packages the accuracy of the uncertainty metrics was validated as was done for IFSQSAR.25 In brief, the uncertainty metric is estimated as the RMSEP calculated on an external validation dataset, but this tends to underestimate the actual uncertainty, i.e., more chemicals than expected are outside the bounds of the PI95 calculated from the RMSEP. A second external validation dataset is used to fit a scaling factor applied to the calculated RMSEP so that closer to 95% of chemicals are within the PI95. The RMSEP uncertainty metrics were estimated from the original EPI Suite and OPERA validation data as described in Section 2.3.2, and the fraction of chemicals in the external validation datasets from this work within the PI95 are shown as percentages in Table 2. This was done separately for chemicals flagged as in AD and out of AD, because by definition the uncertainty metrics cannot be assumed to be accurate for chemicals that are out of AD. The percentages are first calculated with the uncertainty metrics “as given” as shown in Table 2. The RMSEPs of all IFSQSAR partitioning and solubility QSPRs were scaled by a factor of 1.25 in previous work to make the PI95s capture 95% of the experimental data used in that work, and in this work the fraction is 90% of chemicals in AD and 96% of chemicals out of AD; no further adjustments were made. Less than 95% of chemicals are captured in the IFSQSAR PI95 for log KOW (92%), but the fractions of log KOA and log KAW values captured are even lower. The fraction of log KOW values within the EPI Suite PI95 is much lower than for the other two partition ratios, but the RMSEP is only 0.479 compared to 1.54 and 1.61 for the other partition ratios. The fraction of chemicals in the OPERA PI95 varies from 36% to 49%, and the log KOW PI95 captures the lowest fraction. These results do not give strong evidence that the experimental log KOW values are more uncertain than the log KOA and log KAW values.
However, for log KOA and log KAW there are fewer data and the chemicals are not as diverse as the chemicals with measured log KOW, so the statistics should be treated with some caution. The RMSEPs of consensus predictions are calculated using propagation of uncertainty, using the simple assumption of no collinearity. Consensus predictions are only considered to be in AD if all three QSPR packages flag a chemical as in their AD. For EPI Suite and OPERA, scaling factors were fitted to make the respective PI95s capture the same percentage of all the experimental values where the predictions were flagged as in AD by IFSQSAR, i.e., 90%. Scaling to reach 90% instead of 95% ensures that the scaling factors are not unduly influenced by any uncertainty in the experimental data. The required scaling factors are shown at the bottom of Table 2. An additional scaling factor of 1.5 was applied only to the out of AD predictions to bring about 95% of chemicals with experimental data within the PI95s of EPI Suite and OPERA, though as previously stated uncertainty for out of AD predictions cannot be assumed to be accurate. The factor of 4 increase in RMSEP for OPERA is quite large; it may be that the PI of OPERA should be interpreted as ± RMSEP rather than as a PI95, in which case the factor increases for both EPI Suite and OPERA would be about 2.
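The idea of fitting a scaling factor to reach a target coverage can be illustrated with a short sketch. It is not the procedure used to produce Table 2: the normal-theory multiplier of 1.96, the rule of never shrinking the as-given interval, the function names, and the residuals are all assumptions for illustration.

```python
import math


def _ratios(residuals, rmsep, z):
    # Absolute residual expressed in units of the unscaled PI95 half-width.
    return [abs(r) / (z * rmsep) for r in residuals]


def coverage(residuals, rmsep, factor=1.0, z=1.96):
    """Fraction of residuals inside the PI95 after scaling RMSEP by factor."""
    ratios = _ratios(residuals, rmsep, z)
    return sum(ratio <= factor for ratio in ratios) / len(ratios)


def fit_scaling_factor(residuals, rmsep, target=0.90, z=1.96):
    """Smallest factor (not below 1) whose scaled PI95 reaches the target."""
    ratios = sorted(_ratios(residuals, rmsep, z))
    needed = math.ceil(target * len(ratios))  # residuals that must be inside
    return max(1.0, ratios[needed - 1])


# Hypothetical residuals (predicted minus experimental, log units).
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]
factor = fit_scaling_factor(residuals, rmsep=0.2)
print(coverage(residuals, 0.2), round(factor, 2), coverage(residuals, 0.2, factor))
```

With these made-up residuals the as-given interval captures well under 90% of the data, and the fitted factor is the smallest one that brings nine of the ten residuals inside the interval.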
| Model | Property | % In PI in AD as given | % In PI out AD as given | % In PI in AD adjusted | % In PI out AD adjusted c | % In PI out AD readjusted d |
|---|---|---|---|---|---|---|
| IFSQSAR | Log KOW | 92 | 96 | 92 | 96 | 96 |
| IFSQSAR | Log KOA | 88 | 0 | 88 | 0 | 0 |
| IFSQSAR | Log KAW | 81 | 100 | 81 | 100 | 100 |
| IFSQSAR | All | 90 a | 96 | 90 | 96 | 96 |
| EPI Suite | Log KOW | 57 | 47 | 85 | 78 | 90 |
| EPI Suite | Log KOA | 100 | 100 | 100 | 100 | 100 |
| EPI Suite | Log KAW | 96 | 78 | 100 | 100 | 100 |
| EPI Suite | All | 69 | 57 | 90 | 83 | 93 |
| OPERA | Log KOW | 36 | 4 | 91 | 68 | 88 |
| OPERA | Log KOA | 49 | 100 | 93 | 100 | 100 |
| OPERA | Log KAW | 44 | 55 | 84 | 95 | 100 |
| OPERA | All | 40 | 29 | 90 | 81 | 94 |
| Consensus | Log KOW | 59 | 53 | 85 | 78 | 82 |
| Consensus | Log KOA | 97 | 82 | 100 | 100 | 100 |
| Consensus | Log KAW | 78 | 70 | 95 | 84 | 90 |
| Consensus | All | 69 | 58 | 89 | 81 | 84 |
| IFSQSAR | Uncertainty factor increase b | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 |
| EPI Suite | Uncertainty factor increase | 1 | 1 | 2 | 2 | 3 |
| OPERA | Uncertainty factor increase | 1 | 1 | 4 | 4 | 6 |

a EPI Suite and OPERA RMSEP are scaled so that their % in PI matches this value, see bolded values.
b Scaling factor from previous work,25 no further adjustments in this work.
c % In PI for out of AD predictions when applying the same scaling factor as in AD predictions.
d % In PI for out of AD predictions when applying an additional 1.5 scaling factor.
000 chemicals in the chemical structure dataset, after removing all chemicals with log KOW values in the experimental property dataset. The three QSPR packages are plotted vs. each other, with the range of experimental values and the prediction limits set in Section SI-2, presented in Table 1, shown in black and red boxes. The chemicals plotted are all the neutralized, de-salted chemicals in the chemical structure dataset. The same types of plots for log KOA and log KAW are shown in Fig. S13 and S14 in the SI. The limits of the models are clear in these plots; OPERA predictions go outside of the range of experimental values in very few cases, likely because of specific data points in the OPERA internal database that were excluded from the current work for various reasons, e.g., suspect data points from the expanded OPERA 2.9 database. All the EPI Suite predictions outside of the range of experimental values are flagged as out of AD. IFSQSAR and EPI Suite are correlated over the whole range of values for each property, with the largest scatter in the range of experimental values where most predicted values lie, generally clustered around the 1:1 line. At the very upper ranges there is a bias, especially obvious for log KAW, as shown in Fig. S14 where the correlation deviates from the 1:1 line. When IFSQSAR and EPI Suite predictions are outside of the range of experimental values, the OPERA predictions tend to stay away from the upper limit of the experimental range. OPERA predictions may be in or out of the AD when the IFSQSAR and EPI Suite predictions are outside of the range of experimental values.
Fig. 2 Binary model comparison of log KOW predictions from IFSQSAR, EPI Suite, and OPERA for the chemical structure dataset.
The OPERA and IFSQSAR predictions are both within their respective AD for 60% or more of the chemicals for all three major PC properties, but for the pairs EPI Suite/IFSQSAR and EPI Suite/OPERA both predictions are in AD for less than half of the chemicals for every property except log KOW, where the agreement is close to 60%. Agreement between models is best for log KOW, with an average 55.2% and 75.8% of pairs of model predictions agreeing within 1 and 2 log units, respectively. The agreement for log KOA is an average 40.5% and 62.2%, and for log KAW is an average 34.6% and 55.4% within 1 and 2 log units. However, the deviations between model predictions are commonly very large. For example, comparing log KAW predictions between IFSQSAR and OPERA, 25.4% of predictions differ by greater than 5 log units. Much of this can be attributed to chemicals with log KAW values that are out of range of experimental data, but even when the comparison is restricted to cases when both IFSQSAR and OPERA are in their AD (12.6% of predictions), more than 7000 chemicals have predictions that differ by greater than 5 log units. Across all model and partitioning property comparisons 14.2% have a deviation greater than 5 log units, and 3.7% have a deviation greater than 5 log units when only considering cases where both models are in their respective AD.
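Agreement statistics of this kind reduce to counting absolute differences between paired predictions. The sketch below uses made-up values and hypothetical function names; it does not reproduce the percentages reported above.

```python
def fraction_within(pred_a, pred_b, threshold):
    """Fraction of paired log-scale predictions differing by <= threshold."""
    diffs = [abs(a - b) for a, b in zip(pred_a, pred_b)]
    return sum(d <= threshold for d in diffs) / len(diffs)


def fraction_beyond(pred_a, pred_b, threshold):
    """Fraction of paired log-scale predictions differing by > threshold."""
    return 1.0 - fraction_within(pred_a, pred_b, threshold)


# Hypothetical log KAW predictions from two models for four chemicals.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [1.5, 3.5, 3.0, 10.0]
print(fraction_within(model_a, model_b, 1.0))  # 0.5
print(fraction_within(model_a, model_b, 2.0))  # 0.75
print(fraction_beyond(model_a, model_b, 5.0))  # 0.25
```

Restricting the comparison to cases where both models are in AD only requires filtering the two lists with the AD flags before calling these functions.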
Few of the chemicals have been capped at the prediction limits set in Section SI-2; only about 0.3% of the log KOW predictions from IFSQSAR and EPI Suite were capped at the upper or lower prediction limits. More chemicals were capped at the upper or lower limit for log KOA (6.5%) and log KAW (4%), but the fractions were still small. In these cases, the QSPR predictions beyond a limit have been replaced with the value of the prediction limit. The chemicals that have been capped typically fall into one of the classes identified in Section SI-2 from the minimum and maximum values of log VP, log SW, and log SO, such as PFAS, waxy alkanes or fatty acid esters, and complex chemicals with multiple heteroatom functional groups.
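Capping at the prediction limits amounts to clipping each prediction to a fixed range. The sketch below uses the QSPR lower and upper limits for the three partition ratios listed in Table 1; the dictionary keys and function name are hypothetical.

```python
# QSPR lower and upper prediction limits (log units) from Table 1.
QSPR_LIMITS = {
    "log_kow": (-6.0, 19.3),
    "log_koa": (-3.2, 22.3),
    "log_kaw": (-22.4, 16.6),
}


def cap_prediction(prop, value):
    """Clip a prediction to the limits and report whether it was capped."""
    lower, upper = QSPR_LIMITS[prop]
    capped = min(max(value, lower), upper)
    return capped, capped != value


print(cap_prediction("log_kow", 25.0))   # (19.3, True)
print(cap_prediction("log_kaw", -2.07))  # (-2.07, False)
```

Returning the flag alongside the value makes it straightforward to count the fraction of capped predictions for each property.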
Two methods were used to investigate chemicals to identify poorly represented chemical classes and help prioritize future experimental work. First, chemicals that are out of the AD of all three QSPR packages were compared to the chemicals within all three AD. Second, chemicals with consensus RMSD values above the 75th percentile were compared to chemicals with RMSD below the 25th percentile. Most of the chemicals identified as being out of all three ADs or as having RMSD above the 75th percentile have consensus values outside of the experimental limits. These typically belong to one of the chemical classes identified in Section 3.1 as having the lowest VP, SW, and SO, and are considered experimentally inaccessible using current methods. Instead, this analysis was restricted to chemicals within the experimental limits to investigate which chemicals are poorly represented and have the greatest uncertainty, but which should still be experimentally accessible. The RMSD shows bias towards larger values near the experimental limits, so only chemicals where the IFSQSAR and EPI Suite predictions were at least ±0.674 times the RMSEP (corresponding to a 75% PI) from the upper or lower limit were included.
Next, we seek to better understand which types of chemicals are more likely to fall within or outside the ADs. For this we use solute descriptors (which correlate with molecular interactions) and molar mass to characterize the chemicals. The solute descriptors for chemicals that are in the AD of all three QSPR packages or out of the AD of all three QSPR packages are plotted vs. MW for each of the PC properties in Fig. S18–S20. The same plots for chemicals in the lowest and highest 25th percentiles are shown in Fig. S21–S23. An obvious feature in these plots is a group of chemicals that are out of AD or have high RMSD and have L and V, and to a lesser extent S, solute descriptors that follow a distinctly lower trend extending outside the space covered by chemicals that are in the AD or have low RMSD. These chemicals are PFAS which have unique molecular interactions compared to other chemical classes. Recent work has improved the AD of IFSQSAR with regards to this chemical class,26 but the amount of data available is still small compared to data for other chemical classes meaning many of these chemicals are still out of the AD and have higher uncertainty, especially those with MW greater than 600. This MW range also corresponds to the PFAS with anomalously low SO shown in Fig. S10, so that result may also be due to problems with the AD. The AD plot for log KOA in Fig. S19 shows that scarcely any chemicals in the chemical structure dataset are out of the AD of all three QSPR packages, and Fig. S22 shows that, other than PFAS, chemicals with higher consensus RMSD do not have very different molecular interactions than those with lower RMSD. Overall, despite its smaller training dataset, log KOA predictions are more within the AD of the QSPRs, and the QSPRs make more consistent predictions than for log KOW or log KAW.
The solute descriptors of chemicals that are out of AD or have high RMSD tend to be larger than those of chemicals that are in AD or have low RMSD; these chemicals also span a much higher MW range and, for non-PFAS, have higher L and V solute descriptors. The S and B solute descriptors, which correlate with polar interactions and hydrogen bond acceptor strength, show extrapolation to higher values, meaning that the chemicals are more complex and likely contain more heteroatom functional groups. The solute descriptor that correlates with hydrogen bond donor strength (A) shows a different trend from the other solute descriptors: the chemicals that are out of AD or have high RMSD do not tend to have higher A values than those that are in AD or have low RMSD. This may mean that the A solute descriptor is consistently being under-estimated by IFSQSAR for these chemicals. There are some hydrogen bond donor functional groups that are not represented in the training data, namely the neutral forms of strong acids, because the hydrogen bond donor strength of these chemicals in their neutral form is experimentally inaccessible. As stated in Section 2.2, this comparison for data-poor chemicals only uses the neutral form of chemicals, and the chemical structures in the chemical structure dataset were de-salted and neutralized.
The results from inspecting the solute descriptors were confirmed by inspecting the atoms and functional groups present in the chemicals that are out of AD of the QSPR packages or have RMSD above the 75th percentile. First, all atoms in the typical organic subset (C, N, O, Si, P, S, F, Cl, Br, I) were counted in all chemicals in the chemical structure dataset, and then the number of chemicals containing at least one of each atom type was counted for subsets defined by AD and RMSD groupings. This was done for the three partitioning properties, and trends in the occurrence of each atom type were inspected. Atom types with comparable trends were combined, and some more specific functional groups were also inspected to see if they could better explain the observed trends; the results are shown in Table 3. Chemicals containing fluorine are enriched in the subsets of chemicals that are out of AD or have high RMSD, whereas the other halogen atoms show either the opposite trend or no trend. Note that chemicals containing fluorine are not synonymous with PFAS, but most of the chemicals containing fluorine in the chemical structure dataset are PFAS. Chlorinated and brominated chemicals are well studied and are well represented in the training data of the different QSPR packages. Iodinated chemicals are less well represented, but in general their PC properties follow similar mechanisms to those of the chlorinated and brominated chemicals. The heteroatoms N, O, P and S are also enriched in chemicals that are out of AD or have high RMSD. Likewise, Zhang et al.38 found that chemicals with N, S, and P are more likely to fall outside of the ADs of the QSPRs investigated in their work. For log KOW and log KOA, more than half of the enrichment of heteroatoms can be explained by the presence of just three strong acid groups: carboxylic, sulfuric, and phosphoric acids.
| Property | Class | In AD of all 3 packages (%) | Out of AD of all 3 packages (%) | RMSD < 25th percentile (%) | RMSD > 75th percentile (%) |
|---|---|---|---|---|---|
| Log KOW | Fluorine | 11 | 15.7 | 13.4 | 21.7 |
| | Other halogens | 20.1 | 23.5 | 20.6 | 22.2 |
| | Heteroatoms | 93.1 | 99.2 | 88.3 | 97.0 |
| | Acids | 9.2 | 17.7 | 8.8 | 13.9 |
| Log KOA | Fluorine | 13.1 | 0ᵃ | 12.0 | 20.7 |
| | Other halogens | 20 | 0ᵃ | 25.7 | 14.7 |
| | Heteroatoms | 89.1 | 100ᵃ | 82.1 | 94.7 |
| | Acids | 6.4 | 16.7ᵃ | 3.1 | 11.1 |
| Log KAW | Fluorine | 5.4 | 60.8 | 7.9 | 15.6 |
| | Other halogens | 21.7 | 7.8 | 21.6 | 15.5 |
| | Heteroatoms | 90 | 98.7 | 85.3 | 99.0 |
| | Acids | 9.9 | 10.5 | 7.7 | 8.0 |

ᵃ There are 6 chemicals out of all 3 AD for log KOA so these numbers are likely not meaningful.
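The percentages tabulated above are of the form "share of chemicals in a subset that contain at least one atom of a given class". A minimal sketch of that calculation follows, assuming the element sets of each chemical have already been extracted from its structure (for example with a cheminformatics toolkit, which is not shown). The identifiers, the input layout, and the exact class definitions are illustrative assumptions; the "Acids" class is omitted because it requires functional group matching rather than element counting.

```python
# Atom classes used for grouping; the membership shown here is an assumption
# based on the groupings named in the text.
CLASSES = {
    "Fluorine": {"F"},
    "Other halogens": {"Cl", "Br", "I"},
    "Heteroatoms": {"N", "O", "P", "S"},
}


def percent_containing(elements_by_chem, subset, class_atoms):
    """Percentage of chemicals in subset with at least one atom in class_atoms.

    elements_by_chem maps a (hypothetical) chemical id to the set of element
    symbols present in its structure.
    """
    members = [c for c in subset if c in elements_by_chem]
    if not members:
        return float("nan")
    hits = sum(1 for c in members if elements_by_chem[c] & class_atoms)
    return 100.0 * hits / len(members)


def enrichment_row(elements_by_chem, subset):
    """Percentages for every atom class for one subset (one table column)."""
    return {
        name: percent_containing(elements_by_chem, subset, atoms)
        for name, atoms in CLASSES.items()
    }
```

Calling `enrichment_row` once per subset (in AD of all packages, out of AD of all packages, low RMSD, high RMSD) gives one column of such a table per call, and comparing columns shows which classes are enriched.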
Each of the three QSPR packages assessed in this work has merits, and the pre-calculated predictions and corresponding AD, as well as consensus values with uncertainty estimates, can be accessed in the EAS-E Suite online platform. Each of the packages also has limitations that should be kept in mind when interpreting its results. IFSQSAR PC properties are based on PPLFER equations, which have a mechanistic basis correlated to fundamental molecular interactions; this has been exploited in this work to identify chemical classes and functional groups related to extreme property values. IFSQSAR has shown good predictive power for data-poor chemical classes, e.g., PFAS,26 and has robust AD and uncertainty estimates.25 The main limitation of IFSQSAR is that the PPLFER basis means that the predictions for PC properties are an aggregate of four different QSPRs for the solute descriptors, and the AD and uncertainty are therefore also an aggregate. Even so, the IFSQSAR uncertainty metrics underestimated the prediction uncertainty by a factor of at least 1.25 when applied to external data. In contrast, the uncertainty metrics of EPI Suite and OPERA underestimated the prediction uncertainty by factors of at least 2 and 4, respectively.
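One simple way to express an underestimation factor of this kind is to ask how much a reported prediction interval must be widened before it captures a target fraction of external experimental values. The sketch below is a hedged illustration, not the validation procedure used in this work: it assumes a symmetric interval of half-width z × RMSEP with z = 1.96 for a PI95, and all function and argument names are hypothetical.

```python
from math import ceil


def scaled_errors(errors, rmsep, z=1.96):
    """Absolute prediction errors divided by the reported PI half-width."""
    return [abs(e) / (z * r) for e, r in zip(errors, rmsep)]


def coverage(errors, rmsep, factor=1.0, z=1.96):
    """Fraction of external data captured when the PI is widened by factor."""
    ratios = scaled_errors(errors, rmsep, z)
    return sum(1 for x in ratios if x <= factor) / len(ratios)


def factor_for_coverage(errors, rmsep, target=0.95, z=1.96):
    """Smallest widening factor whose interval captures the target fraction.

    A result above 1 indicates that the reported uncertainty is too narrow
    for the external data; a result below 1 indicates it is conservative.
    """
    ratios = sorted(scaled_errors(errors, rmsep, z))
    # Number of data points that must fall inside the interval.
    k = max(1, ceil(target * len(ratios) - 1e-9))
    return ratios[k - 1]
```

Here `errors` are predicted minus experimental values in log units and `rmsep` are the per-prediction uncertainty estimates reported by a package; both inputs are assumed to be aligned lists.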
The main merit of EPI Suite is that its QSPR for log KOW has the best predictive power for many of the cases investigated here, and in previous work.26 The EPI Suite QSPRs for other properties have significantly poorer predictive power, and the definitions of AD and uncertainty metrics have been added post hoc or are absent entirely. The OPERA QSPRs have good predictive power within their AD, their AD is well defined, and uncertainty metrics are also supplied. The main limitation of OPERA is that its predictive power decreases precipitously when applied to chemicals that are out of its AD. This review shows that the ADs provided by the OPERA QSPRs do a good job of identifying the cases where the predictions can be expected to have egregious errors due to extrapolation outside of the range of experimental values and structures in their training sets. Based on the current analysis, OPERA values were excluded from consensus predictions with IFSQSAR and EPI Suite only when the OPERA predictions were out of their AD. The resulting consensus values showed better predictive power than any of the individual models across the whole range of experimental values.
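The exclusion rule for consensus predictions can be illustrated with a short sketch. The rule itself (drop OPERA only when it is out of its AD) comes from the text; the use of an arithmetic mean as the consensus value and of the root mean squared deviation of the retained predictions about that mean as the consensus RMSD are assumptions made here for illustration, as are the function name and input layout.

```python
from math import sqrt


def consensus(preds, opera_in_ad):
    """Combine log-unit predictions from the three packages.

    preds maps package name ("IFSQSAR", "EPI Suite", "OPERA") to a prediction.
    opera_in_ad is True when the OPERA prediction is inside its AD.
    Returns (consensus value, RMSD about the consensus, packages used).
    """
    # OPERA is dropped only when it is out of its own AD.
    used = {p: v for p, v in preds.items() if p != "OPERA" or opera_in_ad}
    values = list(used.values())
    mean = sum(values) / len(values)  # assumption: unweighted mean
    rmsd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, rmsd, sorted(used)
```

For example, with predictions of 4.0 (IFSQSAR), 5.0 (EPI Suite) and 9.0 (OPERA, out of AD), the sketch returns a consensus of 4.5 from the two retained packages, whereas including the OPERA value would pull the consensus up to 6.0.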
By comparing the AD and uncertainty metrics of the three QSPR packages, three broad chemical classes have been identified as requiring more research. PFAS are a major class of chemicals that require more research, as is well known and has been identified in other work.26,69–72 IFSQSAR has improved predictive power after including more partitioning data for PFAS, but there are still too few data compared to other chemical classes. Part of the problem is that PFAS as a class are so diverse; for example, PFAS are identified among both the most and the least soluble chemicals in octanol. Based on the results in this work, heavy (>600 MW) non-polar PFAS may not be well modelled by partitioning-based models. Acids and bases, and the partitioning of ions in general, are also an obvious research need. The partitioning properties of strong acids and bases in their neutral form are, and will remain, experimentally inaccessible, so all QSPRs lack data to calibrate predictions for these chemicals. This limitation will likely only be resolved by studying ion partitioning in general and its relation to the partitioning of neutral chemicals. The final class of chemicals identified is large complex chemicals with many heteroatom functional groups. The strong acids and bases are a sub-category of these complex chemicals, and many of the heteroatom functional groups are weak acids or bases, so many chemicals in this group have the same research needs. Because of the abundance of polar and H-bonding functional groups and their large size, the chemicals in this class are virtually all solids. Predictions for the partitioning and solubility of solids were found to be more uncertain in previous work,25 but this is likely to be a simple case of interpolation being more accurate than extrapolation. Large complex structures are more likely to be out of the AD due to a lack of similar chemicals in the training dataset, and their predictions are therefore more uncertain.
Increasing the accuracy of predictions for this chemical class will be difficult, because the structures are very diverse. Making measurements for even more complex chemicals might pull this chemical class further within the AD, but this strategy is intractable because the measurements would be even more difficult. A systematic, representative sampling of the known chemical space may be the best approach available, similar to what was done by Martel et al.56 All three of these chemical classes require more experimental data, but theoretical research and model calculations are also required to advance the science, guide testing strategies, and interpret experimental results.
Supplementary information which provides more details on the data and methods is available. See DOI: https://doi.org/10.1039/d5em00357a.
This journal is © The Royal Society of Chemistry 2025