Discovery of Hydrogen Storage Molecules using Large Language Models and Machine Learning

Abstract

Accelerating the discovery of new molecules with targeted properties is a central challenge in molecular design. In this contribution, we present an AI-driven molecular discovery framework that integrates Large Language Models (LLMs) for generative molecular design with Machine Learning (ML)-based screening to identify novel Liquid Organic Hydrogen Carrier (LOHC) candidates. Using this framework, LOHC molecules were systematically generated, evaluated, and iteratively refined by combining LLM-guided molecular generation with ML-predicted hydrogenation enthalpies (∆H), under physicochemical property constraints such as optimal melting point (MP), desired hydrogen storage capacity (wt% H2), and synthetic accessibility (SA) score. This approach enabled the discovery of 42 new LOHC candidates in two distinct campaigns, one seeded with experimentally known LOHCs and the other with LOHCs previously identified computationally. Although the two campaigns began with different numbers of seed molecules (31 vs. 7), both yielded a comparable number of viable candidates, suggesting that chemically intuitive selection of seed molecules influences success. Selected LOHC molecules, such as 3-methylpyridine, 1-ethylnaphthalene, 1,1-diphenylethane, and benzofuran, were experimentally tested for hydrogenation over a series of commercial supported metal catalysts and compared with benchmark LOHCs (toluene and 9-ethylcarbazole). The order of conversion into fully hydrogenated products at 200 °C was 3-methylpyridine (100 %) > 9-ethylcarbazole (86.4 %) > 2,3-benzofuran (74 %) > 1,1-diphenylethane (66.9 %) > 1-ethylnaphthalene (66.7 %) > toluene (57 %), further validating the AI-guided molecular design. This study demonstrates the promise of LLM-driven molecular design in conjunction with ML-based screening for the accelerated discovery and design of molecules.
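One of the screening constraints named in the abstract, hydrogen storage capacity (wt% H2), can be illustrated with a short calculation. The sketch below is not taken from the paper. It assumes the usual gravimetric definition (mass of H2 released divided by the mass of the fully hydrogenated carrier) and applies it to the two benchmark LOHCs, taking 3 H2 per toluene and 6 H2 per 9-ethylcarbazole for full hydrogenation. The function names, the formula parser, and the `min_wt_percent` cutoff are hypothetical choices made for illustration only; the abstract does not state the thresholds the authors used.

```python
import re

# Standard atomic masses in g/mol (rounded); only elements common in LOHCs.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
H2_MASS = 2 * ATOMIC_MASS["H"]


def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C7H8' (no brackets or charges)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def h2_capacity_wt_percent(lean_formula: str, n_h2: int) -> float:
    """Gravimetric capacity: mass of H2 released / mass of hydrogenated carrier.

    lean_formula is the hydrogen-lean (dehydrogenated) form and n_h2 the
    number of H2 molecules it takes up on full hydrogenation.
    """
    released = n_h2 * H2_MASS
    return 100.0 * released / (molar_mass(lean_formula) + released)


def screen(candidates: dict, min_wt_percent: float) -> list:
    """Keep candidates meeting a capacity cutoff (cutoff value is hypothetical)."""
    return [
        name
        for name, (formula, n_h2) in candidates.items()
        if h2_capacity_wt_percent(formula, n_h2) >= min_wt_percent
    ]


# Benchmark LOHCs from the abstract: (hydrogen-lean formula, H2 per molecule).
benchmarks = {"toluene": ("C7H8", 3), "9-ethylcarbazole": ("C14H13N", 6)}
for name, (formula, n_h2) in benchmarks.items():
    print(f"{name}: {h2_capacity_wt_percent(formula, n_h2):.2f} wt% H2")
# toluene: 6.16 wt% H2
# 9-ethylcarbazole: 5.83 wt% H2
```

In a full pipeline a filter of this kind would sit alongside the other constraints mentioned in the abstract (predicted ∆H, melting point, SA score), each of which requires its own model or descriptor and is not reproduced here.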

Article information

Article type
Paper
Submitted
04 Mar 2026
Accepted
27 Apr 2026
First published
28 Apr 2026
This article is Open Access
Creative Commons BY-NC license

Digital Discovery, 2025, Accepted Manuscript

H. Harb, M. Ferrandon, T. A. Goetjen, S. Lee, O. K. Farha, M. Delferro and R. Surendran Assary, Digital Discovery, 2025, Accepted Manuscript , DOI: 10.1039/D6DD00102E

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

