Large language models for knowledge graph extraction from tables in materials science

Max Dreger; Kourosh Malek; Michael Eikerling

doi:10.1039/D4DD00362D

Max Dreger,*^a Kourosh Malek^ab and Michael Eikerling^abc

Author affiliations

* Corresponding authors

^a Theory and Computation of Energy Materials (IET-3), Institute of Energy Technologies, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
E-mail: m.dreger@fz-juelich.de

^b Centre for Advanced Simulation and Analytics (CASA), Simulation and Data Lab for Energy Materials, Forschungszentrum Jülich, 52425 Jülich, Germany

^c Chair of Theory and Computation of Energy Materials, Faculty of Georesources and Materials Engineering, RWTH Aachen University, 52062 Aachen, Germany

Abstract

Research in materials science increasingly harnesses machine learning (ML) models. These models are trained on experimental or theoretical data, and the quality of their output hinges on the quantity and quality of those data. Improving data quality and accessibility requires advanced data management solutions. Today, data are often stored in non-standardized table formats that lack interoperability, accessibility and reusability. To address this issue, we present a semi-automated data ingestion pipeline that transforms R&D tables into knowledge graphs. Using large language models and rule-based feedback loops, the pipeline converts tabular data into graph structures through entity recognition and relationship extraction. By streamlining data integration from various sources, it improves data interoperability and accessibility. The pipeline is integrated into a platform that hosts a graph database and provides semantic search capabilities.
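To make the idea of entity recognition, relationship extraction and a rule-based feedback loop concrete, the following is a minimal sketch and not the pipeline described in the article. Everything in it is an assumption made for illustration: the label vocabulary `ALLOWED_LABELS`, the relation name `HAS_PROPERTY`, the function names, and above all `stub_llm_label`, which stands in for a real language model call with a hard-coded keyword rule.

```python
# Hypothetical label vocabulary; the article's actual schema is not given here.
ALLOWED_LABELS = {"Material", "Property", "Parameter"}


def stub_llm_label(header, feedback=None):
    """Stand-in for an LLM call that assigns a node label to a column header.

    Without feedback it sometimes answers outside the vocabulary, so that the
    rule-based check below has something to correct.
    """
    text = header.lower()
    if "catalyst" in text or "material" in text:
        return "Material"
    if feedback is None:
        return "Quantity"  # deliberately not in ALLOWED_LABELS
    return "Property"


def label_with_feedback(header, max_rounds=3):
    """Entity recognition with a rule-based feedback loop (illustrative)."""
    feedback = None
    for _ in range(max_rounds):
        label = stub_llm_label(header, feedback)
        if label in ALLOWED_LABELS:  # the rule: label must be in the vocabulary
            return label
        # Pass the violated rule back to the model and ask again.
        feedback = f"'{label}' is not one of {sorted(ALLOWED_LABELS)}"
    raise ValueError(f"no valid label found for column '{header}'")


def table_to_graph(headers, rows):
    """Turn table rows into (id, label, value) nodes and (source, relation, target) edges."""
    labels = {h: label_with_feedback(h) for h in headers}
    nodes, edges = [], []
    for i, row in enumerate(rows):
        materials, others = [], []
        for header, value in zip(headers, row):
            node_id = f"r{i}:{header}"
            nodes.append((node_id, labels[header], value))
            (materials if labels[header] == "Material" else others).append(node_id)
        # Relationship extraction, reduced to one fixed rule per row.
        for m in materials:
            for o in others:
                edges.append((m, "HAS_PROPERTY", o))
    return nodes, edges


headers = ["Catalyst", "Loading (mg/cm2)"]
rows = [["Pt/C", "0.4"]]
nodes, edges = table_to_graph(headers, rows)
```

In this sketch the first answer for the loading column is rejected by the vocabulary rule and corrected on the second round, which is the only sense in which it mirrors a feedback loop; a real system would send the table content and the rule violation to a language model instead.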

This article is part of the themed collection: 2023 and 2024 Accelerate Conferences

Article information

DOI: https://doi.org/10.1039/D4DD00362D

Article type: Paper

Submitted: 09 Nov 2024

Accepted: 28 Mar 2025

First published: 07 Apr 2025

This article is Open Access

Digital Discovery, 2025, 4, 1221-1231

M. Dreger, K. Malek and M. Eikerling, Digital Discovery, 2025, 4, 1221 DOI: 10.1039/D4DD00362D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
