Mechanism-informed machine learning for rational catalyst design: application to regioselectivity of allyl acetate hydroformylation
Abstract
Hydroformylation is a key strategy for C–C bond formation and the synthesis of value-added aldehydes, and its regioselectivity critically determines downstream efficiency and product applicability. However, the design of highly regioselective catalysts still relies heavily on empirical knowledge from existing experiments, and quantitative predictive methods remain underdeveloped for less-studied functionalized olefin substrates. This work presents, for the first time, a mechanism-informed machine learning model for predicting regioselectivity in hydroformylation. The model identifies steric hindrance, Rh-centered electronic symmetry, hydride charge distribution, and dispersion interactions as the key cooperative factors governing selectivity. Notably, Rh_Anisotropy is established as a critical descriptor that captures the geometric and electronic environment around the Rh center, providing a quantitative basis for understanding how ligand structure dictates selectivity. The trained model was used to identify low-cost commercial ligands with high predicted linear selectivity, which were then validated experimentally; the optimal ligand achieved approximately 98% linear aldehyde selectivity under mild conditions. The model also identified structurally novel candidate ligands, including modifications on the xanthene moiety of the xantphos scaffold, providing clear guidance for future ligand design and optimization. This work establishes a computation-data fusion framework that bridges mechanistic understanding and predictive modeling, offering a general paradigm for the rational design of highly selective catalysts for olefin hydroformylation.
