Structured domain knowledge enables trustworthy materials science question-answering with large language models

Abstract

Large language models (LLMs) remain unreliable for materials science question answering because correct conclusions depend on detailed experimental conditions. Here, we show that a structured, domain-specific knowledge dataset is a critical prerequisite for trustworthy LLM-assisted question answering in materials science. Using water-splitting catalysis as a proof of concept, we curate the literature into a hierarchical, machine-queryable knowledge base encoding material synthesis, composition, and performance. This structured representation improves condition-aware retrieval and reduces context mismatches that commonly arise from superficial semantic similarity. Combined with query reformulation, it achieves 85.6% accuracy on 202 DOI-identification questions versus 21.3% for an unstructured baseline, while reducing operating cost by 39%. To assess broader free-form scientific question answering beyond exact-match retrieval, we further evaluate 202 descriptive questions using the RAGAS framework, which indicates more faithful, evidence-grounded answers. Together, these results show that structured domain knowledge can substantially improve the reliability of LLM-based materials science question answering.
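
As an illustration of why structured records help with condition-aware retrieval, the following minimal Python sketch filters a small set of catalyst entries on their experimental conditions rather than on textual similarity. It is not the authors' schema or pipeline: the `CatalystRecord` fields, the `condition_aware_filter` helper, and the DOIs and values in `RECORDS` are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalystRecord:
    """One hypothetical entry; field names are illustrative, not the authors' schema."""
    doi: str
    composition: str          # e.g. "NiFe LDH"
    synthesis: str            # e.g. "hydrothermal"
    reaction: str             # "OER" or "HER"
    electrolyte: str          # e.g. "1 M KOH"
    overpotential_mv: float   # overpotential at the stated current density
    current_density: float    # mA cm^-2


def condition_aware_filter(records, *, reaction, electrolyte, max_overpotential_mv):
    """Keep only records whose reaction and electrolyte match the query exactly
    and whose overpotential does not exceed the requested bound."""
    return [
        r for r in records
        if r.reaction == reaction
        and r.electrolyte == electrolyte
        and r.overpotential_mv <= max_overpotential_mv
    ]


# Placeholder data with made-up DOIs and values; not taken from the article.
RECORDS = [
    CatalystRecord("10.0000/example.1", "NiFe LDH", "hydrothermal",
                   "OER", "1 M KOH", 250.0, 10.0),
    CatalystRecord("10.0000/example.2", "NiFe LDH", "electrodeposition",
                   "OER", "0.1 M KOH", 240.0, 10.0),
    CatalystRecord("10.0000/example.3", "MoS2", "CVD",
                   "HER", "0.5 M H2SO4", 180.0, 10.0),
]

hits = condition_aware_filter(
    RECORDS, reaction="OER", electrolyte="1 M KOH", max_overpotential_mv=300.0
)
print([r.doi for r in hits])
```

The first two placeholder entries describe the same material and reaction and would look nearly identical to a purely semantic search, yet only the first was measured in the electrolyte the query asks about. Exact matching on a structured field is one simple way to avoid that kind of context mismatch.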

Article information

Article type: Paper
Submitted: 21 Jan 2026
Accepted: 22 Apr 2026
First published: 29 Apr 2026
This article is Open Access under a Creative Commons BY license.

Digital Discovery, 2026, Advance Article

D. Lee, J. Choi, G. H. Yi, S. S. Sohn, B. Lee and D. Kim, Digital Discovery, 2026, Advance Article, DOI: 10.1039/D6DD00028B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.