Challenges in data-driven catalysis modelling: case study on palladium-NHC catalyzed Suzuki–Miyaura reactions
Abstract
In this study, we synthesized a set of 21 N-heterocyclic carbene (NHC)Pd complexes and evaluated them in a benchmark Suzuki–Miyaura coupling under 12 different conditions, resulting in a high-quality dataset tailored for machine learning applications. We present a detailed analysis of the data, enabling a thorough assessment of the parameters (ligand structure and reaction conditions) that influence reaction yield. We used a new descriptor-selection workflow to build linear regression models. The resulting models performed satisfactorily in interpolation across all reaction conditions. To verify that these results were not artifacts, we critically examined our models, assessing feature explainability, featurization strategies, the impact of train-test splits, and the influence of conformer sets. This work highlights key practical considerations for modeling catalytic activity using machine learning.
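The abstract does not spell out the descriptor-selection workflow, so the following is only a generic illustration, not the authors' method. It pairs greedy forward selection with ordinary least-squares regression and compares a random row split against a split that holds out whole ligands, one way the choice of train-test split can change apparent model quality. All function names (`fit_ols`, `forward_select`, `group_split`, `evaluate`) are hypothetical, and the data are synthetic random numbers shaped like a 21-ligand × 12-condition grid, not the study's measurements.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply a coefficient vector produced by fit_ols."""
    return coef[0] + X @ coef[1:]


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def forward_select(X, y, max_features):
    """Greedy forward selection (a generic stand-in, not the paper's workflow):
    at each step keep the descriptor column that gives the largest training R^2
    when added to the current set."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        best_score, best_j = -np.inf, None
        for j in remaining:
            cols = selected + [j]
            score = r2(y, predict(fit_ols(X[:, cols], y), X[:, cols]))
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


def group_split(groups, test_groups):
    """Return (train_idx, test_idx), holding out every row whose group
    label (e.g. a ligand id) is in test_groups."""
    test_mask = np.isin(groups, test_groups)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def evaluate(X, y, train_idx, test_idx, max_features=2):
    """Select descriptors on the training rows only, then score on test rows."""
    cols = forward_select(X[train_idx], y[train_idx], max_features)
    coef = fit_ols(X[train_idx][:, cols], y[train_idx])
    return cols, r2(y[test_idx], predict(coef, X[test_idx][:, cols]))


if __name__ == "__main__":
    # SYNTHETIC placeholder data, NOT the study's measurements:
    # 21 ligands x 12 conditions, 10 hypothetical ligand-level descriptors each.
    rng = np.random.default_rng(0)
    n_lig, n_cond, n_desc = 21, 12, 10
    ligand_ids = np.repeat(np.arange(n_lig), n_cond)
    X = rng.normal(size=(n_lig, n_desc))[ligand_ids]  # same descriptors in every condition
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=len(X))

    perm = rng.permutation(len(X))
    random_split = (perm[:200], perm[200:])
    ligand_split = group_split(ligand_ids, test_groups=[0, 5, 10, 15])
    for name, (tr, te) in [("random rows", random_split), ("held-out ligands", ligand_split)]:
        cols, score = evaluate(X, y, tr, te)
        print(f"{name}: descriptors {cols}, test R^2 = {score:.2f}")
```

Because the descriptors in this sketch are ligand-level, a random row split lets every test ligand also appear in training under other conditions (interpolation), whereas the ligand hold-out split asks the model to generalize to unseen ligands; the printed test R² values depend on the synthetic data and random seed.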
