MOFReasoner: think like a scientist—a reasoning large language model via knowledge distillation
Abstract
Large language models (LLMs) have the potential to transform chemical research, yet their general-purpose design limits scientific understanding and reasoning in specialized fields such as chemistry. In this study, we introduce MOFReasoner, a domain model designed to enhance scientific reasoning, using metal–organic framework (MOF) adsorption as a case study. We constructed a domain-specific chemical reasoning dataset by distilling knowledge from teacher models and extracting chain-of-thought (CoT) reasoning from a corpus of more than 8,242 research articles and 500 reviews. We then fine-tuned the LLMs on this dataset together with general chemistry and general reasoning datasets. The model's performance was evaluated across four tasks: experimental studies, chemical mechanisms, application scenarios, and industrialization challenges. MOFReasoner outperformed existing general-purpose models such as GPT-4.5 and DeepSeek-R1. Furthermore, it achieved prediction accuracy comparable to density functional theory (DFT) calculations, enabling material recommendations. This work underscores the potential of integrating domain-specific knowledge, CoT reasoning, and knowledge distillation to create LLMs that support scientific inquiry and decision-making in chemistry.