Discovery of Hydrogen Storage Molecules using Large Language Models and Machine Learning
Abstract
Accelerating the discovery of new molecules with targeted properties is a central challenge in molecular design. In this contribution, we present an AI-driven molecular discovery framework that integrates Large Language Models (LLMs) for generative molecular design with Machine Learning (ML)-based screening to identify novel Liquid Organic Hydrogen Carrier (LOHC) candidates. Using this framework, LOHC molecules were generated, evaluated, and refined iteratively by combining LLM-guided molecular generation with ML-predicted hydrogenation enthalpies (∆H), subject to physicochemical property constraints: a suitable melting point (MP), the desired hydrogen storage capacity (wt % H₂), and an acceptable synthetic accessibility (SA) score. This approach led to the discovery of 42 new LOHC candidates across two distinct campaigns, one seeded with experimentally known LOHCs and the other with LOHCs previously identified computationally. Although the two campaigns began with different numbers of seed molecules (31 vs. 7), both yielded a comparable number of viable candidates, suggesting that chemically intuitive selection of seed molecules, rather than their number, influences success. Selected LOHC molecules, such as 3-methylpyridine, 1-ethylnaphthalene, 1,1-diphenylethane, and benzofuran, were tested experimentally for hydrogenation over a series of commercial supported metal catalysts and compared with the benchmark LOHCs toluene and 9-ethylcarbazole. The order of conversion into fully hydrogenated products at 200 °C was 3-methylpyridine (100 %) > 9-ethylcarbazole (86.4 %) > 2,3-benzofuran (74 %) > 1,1-diphenylethane (66.9 %) > 1-ethylnaphthalene (66.7 %) > toluene (57 %), further validating the AI-guided molecular design. This study demonstrates the promise of LLM-driven molecular design, in conjunction with ML-based screening, for the accelerated discovery and design of molecules.
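One of the screening quantities named in the abstract, the gravimetric hydrogen storage capacity (wt % H₂), can be illustrated with a short calculation. The following is a minimal sketch, assuming the common definition of capacity as the mass of stored H₂ divided by the mass of the hydrogen-rich (fully hydrogenated) carrier. The function name, its parameters, and the molar masses used are illustrative assumptions and are not taken from the study.

```python
# Illustrative sketch only: gravimetric H2 capacity of an LOHC pair.
# Assumption: capacity = mass of H2 taken up / mass of the hydrogenated carrier.

H2_MOLAR_MASS = 2.016  # g/mol


def h2_capacity_wt_percent(lean_molar_mass: float, n_h2: int) -> float:
    """Return wt % H2 for a carrier that takes up n_h2 molecules of H2.

    lean_molar_mass: molar mass (g/mol) of the hydrogen-lean (aromatic) form.
    n_h2: number of H2 molecules added on full hydrogenation.
    """
    if lean_molar_mass <= 0 or n_h2 < 0:
        raise ValueError("molar mass must be positive and n_h2 non-negative")
    h2_mass = n_h2 * H2_MOLAR_MASS
    rich_molar_mass = lean_molar_mass + h2_mass  # hydrogen-rich form
    return 100.0 * h2_mass / rich_molar_mass


if __name__ == "__main__":
    # Toluene (about 92.14 g/mol) takes up 3 H2 to give methylcyclohexane.
    print(f"toluene: {h2_capacity_wt_percent(92.14, 3):.2f} wt % H2")
```

For the toluene benchmark this gives roughly 6.2 wt % H₂. In a screening loop, a value like this would be compared against a chosen capacity threshold alongside the MP, SA, and ∆H criteria.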