High-throughput screening of chemicals as functional substitutes using structure-based classification models†
Abstract
Identifying chemicals that provide a specific function within a product, yet have minimal impact on the human body or environment, is the goal of most formulation chemists and engineers practicing green chemistry. We present a methodology to identify potential chemical functional substitutes from large libraries of chemicals using machine learning based models. We collect and analyze publicly available information on the function of chemicals in consumer products or industrial processes to identify a suite of harmonized function categories suitable for modeling. We use structural and physicochemical descriptors for these chemicals to build 41 quantitative structure–use relationship (QSUR) models for harmonized function categories using random forest classification. We apply these models to screen a library of nearly 6400 chemicals with available structure information for potential functional substitutes. Using our Functional Use database (FUse), we could identify uses for 3121 chemicals; 4412 predicted functional uses had a probability of 80% or greater. We demonstrate the potential application of the models to high-throughput (HT) screening for “candidate alternatives” by merging the valid functional substitute classifications with hazard metrics developed from HT screening assays for bioactivity. A descriptor set could be obtained for 6356 Tox21 chemicals that have undergone a battery of HT in vitro bioactivity screening assays. By applying QSURs, we were able to identify over 1600 candidate chemical alternatives. These QSURs can be rapidly applied to thousands of additional chemicals to generate HT functional use information for combination with complementary HT toxicity information for screening for greener chemical alternatives.