Structured domain knowledge enables trustworthy materials science question-answering with large language models
Abstract
Large language models (LLMs) remain unreliable for materials science question answering because correct conclusions depend on detailed experimental conditions. Here, we show that a structured, domain-specific knowledge dataset is a critical prerequisite for trustworthy LLM-assisted question answering in materials science. Using water-splitting catalysis as a proof of concept, we curate the literature into a hierarchical, machine-queryable knowledge base encoding material synthesis, composition, and performance. This structured representation improves condition-aware retrieval and reduces context mismatches that commonly arise from superficial semantic similarity. Combined with query reformulation, it achieves 85.6% accuracy on 202 DOI-identification questions versus 21.3% for an unstructured baseline, while reducing operating cost by 39%. To assess broader free-form scientific question answering beyond exact-match retrieval, we further evaluate 202 descriptive questions using the RAGAS framework, which indicates more faithful, evidence-grounded answers. Together, these results show that structured domain knowledge can substantially improve the reliability of LLM-based materials science question answering.