MOFReasoner: think like a scientist—a reasoning large language model via knowledge distillation
Abstract
Large language models (LLMs) have the potential to transform chemical research, yet their general-purpose design limits scientific understanding and reasoning in specialized fields such as chemistry. In this study, we introduce MOFReasoner, a domain model designed to enhance scientific reasoning, using metal–organic framework (MOF) adsorption as a case study. We constructed a domain-specific chemical reasoning dataset by distilling knowledge from teacher models and extracting chain-of-thought (CoT) reasoning from a corpus of more than 8,242 research articles and 500 reviews. We then fine-tuned the LLMs on this dataset together with general chemistry and general reasoning datasets. The model's performance was evaluated across four tasks: experimental studies, chemical mechanisms, application scenarios, and industrialization challenges. MOFReasoner outperformed existing general-purpose models such as GPT-4.5 and DeepSeek-R1. Furthermore, it achieved prediction accuracy comparable to density functional theory (DFT) calculations, enabling material recommendations. This work underscores the potential of integrating domain-specific knowledge, CoT reasoning, and knowledge distillation to create LLMs that support scientific inquiry and decision-making in chemistry.