Data-driven ligand field exploration of Fe(IV)–oxo sites for C–H activation†
Abstract
High-valent Fe(IV)–oxo intermediates, found in enzyme active sites, are excellent targets for the biomimetic design of molecular catalysts for C–H bond activation. The C–H bonds of inert aliphatic hydrocarbons, such as methane, are strong and resistant to chemical functionalization. Computational methods, such as density functional theory (DFT) and machine learning (ML), are valuable tools for screening potential C–H activation catalysts through high-throughput virtual searches of the vast chemical compound space. In this study, we have designed a database of 50 Fe(IV)–oxo species with varying coordination environments, which are further functionalized to give a total of approximately 181k structures. DFT calculations are then performed on a subset of the molecular database to determine spin states and C–H bond activation energies, and the collected data are curated based on a series of chemically informed criteria. To avoid performing DFT calculations on all 181k structures, we developed ML models that utilize a novel molecular representation based on persistent homology, called persistence images (PIs). In particular, we have developed a novel similarity search algorithm, followed by training a regression model to predict C–H activation energies and a classification model to predict the spin states. The priority is to provide high-fidelity predictions for C–H activation barriers. For this purpose, we divided the full database into low- and high-fidelity structures and introduced a metric (δΔG‡) which evaluates the effect of a specific ligand modification with respect to the parent, unsubstituted structure. A validation step that included additional DFT calculations on 15 structures demonstrated the credibility of the proposed methodology.
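As an illustration of the δΔG‡ metric, the following minimal sketch assumes it is the activation free energy of a functionalized structure minus that of its parent, unsubstituted structure, so that negative values indicate a lower barrier. This sign convention is an assumption, as are the function name `delta_delta_g`, the structure labels, and the barrier values, which are hypothetical and not taken from the study.

```python
def delta_delta_g(barriers, parent_of):
    """Return the shift in activation free energy relative to the parent.

    barriers  : dict mapping structure label -> activation free energy
                (any consistent unit, e.g. kcal/mol)
    parent_of : dict mapping functionalized structure -> parent label

    Assumed definition (not confirmed by the source):
        shift = barrier(functionalized) - barrier(parent)
    """
    shifts = {}
    for name, parent in parent_of.items():
        # Skip entries for which either barrier is unavailable.
        if name not in barriers or parent not in barriers:
            continue
        shifts[name] = barriers[name] - barriers[parent]
    return shifts


# Hypothetical illustrative values, not data from the study.
barriers = {"parent_A": 20.0, "A_F": 18.5, "A_OMe": 21.25}
parent_of = {"A_F": "parent_A", "A_OMe": "parent_A"}

shifts = delta_delta_g(barriers, parent_of)
for name, value in sorted(shifts.items()):
    print(f"{name}: {value:+.2f}")
```

Under this assumed convention, a structure with a negative shift would be one whose ligand modification lowers the barrier relative to its parent.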