Advanced scientific information mining using LLM-driven approaches in layered cathode materials for sodium-ion batteries†
Abstract
Materials informatics (MI) has emerged as a powerful paradigm for accelerating materials discovery and development through data-driven approaches. The scarcity of structured materials data, however, remains a critical bottleneck in minimizing the error between experimental and predicted values. Here, we present an advanced large language model (LLM) framework for building a comprehensive materials database of layered metal oxide (LMO) cathode materials in sodium-ion batteries (SIBs). By implementing optimized, advanced retrieval-augmented generation techniques, including the tree of clarity (ToC) methodology, our system achieved an accuracy of 0.8861 and an F1-score of 0.9371 in extracting structured materials data from open-source publications. The framework processed 312 publications at approximately 20 seconds per paper, extracting 945 data points related to material composition, crystallinity, operating voltage, and electrode composition. The automated approach to materials data acquisition demonstrated here is expected to significantly accelerate the development of comprehensive materials databases and enable rapid materials discovery through MI.
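For readers unfamiliar with the reported metrics, the following minimal Python sketch shows the standard definitions of accuracy and F1-score, together with the throughput arithmetic implied by the figures above. The confusion counts (`tp`, `fp`, `fn`, `tn`) are purely illustrative assumptions and are not taken from the paper; only the values 312, 945, and 20 seconds come from the abstract, and the function names are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all extraction decisions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumed, NOT from the paper).
tp, fp, fn, tn = 80, 10, 10, 0
example_accuracy = accuracy(tp, tn, fp, fn)
example_f1 = f1_score(tp, fp, fn)

# Throughput implied by the abstract's figures.
papers = 312               # publications processed (from the abstract)
data_points = 945          # extracted data points (from the abstract)
seconds_per_paper = 20     # approximate rate (from the abstract)

total_minutes = papers * seconds_per_paper / 60
points_per_paper = data_points / papers

print(f"example accuracy: {example_accuracy:.4f}")
print(f"example F1-score: {example_f1:.4f}")
print(f"estimated total processing time: {total_minutes:.0f} min")
print(f"average data points per paper: {points_per_paper:.2f}")
```

With the illustrative counts, the F1-score exceeds the accuracy because true negatives are scarce, which is one way an extraction task can report an F1-score higher than its accuracy. At the stated rate, processing all 312 publications corresponds to roughly 104 minutes of processing time, or about three data points per paper on average.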
- This article is part of the themed collection: Advances in Energy Generation and Conversion Technologies