MOFReasoner: Think Like a Scientist - A Reasoning Large Language Model via Knowledge Distillation
Abstract
Large Language Models (LLMs) have the potential to transform chemical research. Nevertheless, their general-purpose design constrains scientific understanding and reasoning in specialized fields such as chemistry. In this study, we introduce MOFReasoner, a domain model designed to enhance scientific reasoning, using adsorption in Metal-Organic Frameworks (MOFs) as a case study. By combining knowledge distillation from teacher models with Chain-of-Thought (CoT) reasoning extracted from a corpus of over 8,242 research articles and 500 reviews, we constructed a domain chemical reasoning dataset. The LLMs were then fine-tuned on this dataset together with general chemistry and general reasoning datasets. The model's performance was evaluated across four tasks: experimental studies, chemical mechanisms, application scenarios, and industrialization challenges. MOFReasoner outperformed existing general-purpose models such as GPT-4.5 and DeepSeek-R1. Furthermore, it achieves prediction accuracy comparable to density functional theory (DFT) calculations, enabling material recommendation. This work underscores the potential of integrating domain-specific knowledge, CoT reasoning, and knowledge distillation to create LLMs that support scientific inquiry and decision-making in chemistry.
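To make the distillation step concrete, the sketch below illustrates one plausible way to harvest CoT traces from a teacher model and store them for supervised fine-tuning. It is a minimal illustration only, not the authors' pipeline: the teacher model name, the prompt wording, the `distill_cot` helper, and the JSONL schema are all assumptions; only the use of an OpenAI-compatible chat API is standard.

```python
"""Minimal sketch of CoT distillation from a teacher model (illustrative).

Assumptions (not from the paper): the teacher is reached via an
OpenAI-compatible chat API, the corpus is a list of paper excerpts,
and the prompt wording and output schema are hypothetical.
"""
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt asking the teacher to emit a question, a step-by-step
# chain of thought, and a final answer grounded in a MOF-adsorption excerpt.
DISTILL_PROMPT = (
    "From the following excerpt about MOF adsorption, write one exam-style "
    "question, a step-by-step chain of thought, and a final answer.\n\n"
    "Excerpt:\n{excerpt}"
)

def distill_cot(excerpts, teacher="gpt-4o", out_path="mof_cot_sft.jsonl"):
    """Query the teacher once per excerpt and write prompt/completion pairs
    as JSONL, the format a standard SFT trainer can consume."""
    with open(out_path, "w", encoding="utf-8") as f:
        for excerpt in excerpts:
            resp = client.chat.completions.create(
                model=teacher,
                messages=[{"role": "user",
                           "content": DISTILL_PROMPT.format(excerpt=excerpt)}],
            )
            trace = resp.choices[0].message.content
            # One training example per line; quality filtering and answer
            # verification would happen downstream of this step.
            f.write(json.dumps({"prompt": excerpt, "completion": trace}) + "\n")

if __name__ == "__main__":
    distill_cot(["CO2 uptake in Mg-MOF-74 at 298 K is governed by open metal sites..."])
```

The resulting JSONL file can then be mixed with general chemistry and general reasoning datasets and fed to an off-the-shelf supervised fine-tuning loop (for example, TRL's SFTTrainer), mirroring the three-dataset fine-tuning mixture described in the abstract.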