Large language models for knowledge graph extraction from tables in materials science

Abstract

Research in materials science increasingly harnesses machine learning (ML) models. These models are trained with experimental or theoretical data, and the quality of their output hinges on the quantity and quality of those data. Improving data quality and accessibility necessitates advanced data management solutions. Today, data are often stored in non-standardized table formats that lack interoperability, accessibility and reusability. To address this issue, we present a semi-automated data ingestion pipeline that transforms R&D tables into knowledge graphs. The pipeline combines large language models with rule-based feedback loops and consists of entity recognition and relationship extraction. By streamlining data integration from various sources, it improves data interoperability and accessibility. The pipeline is integrated into a platform that hosts a graph database and provides semantic search capabilities.
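To make the two stages named in the abstract concrete, the following is a minimal sketch of how a table could be turned into nodes and edges with a rule-based feedback loop around a model call. It is an illustration under stated assumptions, not the authors' implementation: the label set `ALLOWED_LABELS`, the `HAS_PROPERTY` relationship, the function names, and the `fake_llm` stand-in (which replaces a real large language model call) are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical node labels; the actual schema is not given in this section.
ALLOWED_LABELS = {"Material", "Property", "Parameter"}


def classify_with_feedback(
    header: str,
    classify: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> str:
    """Entity recognition for one column header.

    `classify` stands in for an LLM call. If its answer breaks the rule
    (label must be in ALLOWED_LABELS), it is called again with feedback.
    """
    feedback = None
    for _ in range(max_attempts):
        label = classify(header, feedback)
        if label in ALLOWED_LABELS:
            return label
        feedback = f"'{label}' is not one of {sorted(ALLOWED_LABELS)}"
    raise ValueError(f"no valid label for column '{header}'")


def table_to_graph(headers, rows, classify):
    """Build (nodes, edges) from a table, one node per cell."""
    labels = {h: classify_with_feedback(h, classify) for h in headers}
    nodes, edges = [], []
    for i, row in enumerate(rows):
        ids = {}
        for h, value in zip(headers, row):
            ids[h] = f"r{i}:{h}"
            nodes.append(
                {"id": ids[h], "label": labels[h], "name": h, "value": value}
            )
        # Relationship extraction, reduced here to one fixed rule:
        # link every Material node to every Property node in the same row.
        for src in headers:
            if labels[src] != "Material":
                continue
            for dst in headers:
                if labels[dst] == "Property":
                    edges.append((ids[src], "HAS_PROPERTY", ids[dst]))
    return nodes, edges


def fake_llm(header: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM; its first answer for any header
    other than 'catalyst' is invalid, so the feedback loop is exercised."""
    if header == "catalyst":
        return "Material"
    return "Property" if feedback else "Measurement"


nodes, edges = table_to_graph(
    ["catalyst", "conductivity"], [["Pt/C", 0.1]], fake_llm
)
```

In this sketch the rule check is deliberately simple. The point is only that a deterministic validator sits between the model output and the graph, and that rejected outputs are fed back to the model instead of being stored.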


Article information

Article type: Paper
Submitted: 09 Nov 2024
Accepted: 28 Mar 2025
First published: 07 Apr 2025
This article is Open Access (Creative Commons BY license)

Digital Discovery, 2025, Advance Article

M. Dreger, K. Malek and M. Eikerling, Digital Discovery, 2025, Advance Article, DOI: 10.1039/D4DD00362D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
