Issue 1, 2025

A framework for reviewing the results of automated conversion of structured organic synthesis procedures from the literature

Abstract

Organic synthesis procedures in the scientific literature are typically shared as prose (i.e., as unstructured data), which is not suitable for data-driven research applications. A well-structured language, the chemical description language (χDL), exists to represent such procedures. Automated methods for converting text to χDL, using either a rule-based approach or a generative large language model (GLLM), have been proposed, but they sometimes produce errors. Human review after automated conversion is therefore essential to obtain an accurate χDL. The aim of this work is to support human reviewers' understanding by visualizing the information embedded in the original text in a structured format. In this paper, we propose a novel framework for editing χDLs automatically converted from the literature, using annotated text. We also introduce a rule-based conversion method. To improve the quality of automated conversion, we propose using two candidate χDLs with different characteristics: one generated by the proposed rule-based method and the other by an existing GLLM-based method. In an experiment involving six organic synthesis procedures, we confirmed that showing the outputs of both systems to the user improved recall compared with showing either output alone.


Article information

Article type: Paper
Submitted: 18 Oct 2024
Accepted: 25 Nov 2024
First published: 27 Nov 2024
This article is Open Access under a Creative Commons BY license.

Digital Discovery, 2025, 4, 172-180


K. Machi, S. Akiyama, Y. Nagata and M. Yoshioka, Digital Discovery, 2025, 4, 172 DOI: 10.1039/D4DD00335G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

