Issue 4, 2024

Comparing software tools for optical chemical structure recognition

Abstract

The extraction of chemical information from images, also known as Optical Chemical Structure Recognition (OCSR), has recently gained new attention. This interest has been ignited by the machine learning methods introduced in recent years and by the new possibilities to train image models for specific tasks such as OCSR. In the present paper, we compare eight open access OCSR methods (DECIMER, ReactionDataExtractor, MolScribe, RxnScribe, SwinOCSR, OCMR, MolVec, and OSRA) using an independent test set of images from patents and patent applications. This is an application area of general interest, since precision and recall are highly desired by those analysing the intellectual property of chemistry patents. The methods showed different strengths when predicting structures from images containing different modalities and chemistry categories. Overall, the existing methodologies for image extraction remain unsatisfactory, indicating a need for further advancements in the field. Further, we have created a machine learning image classifier that assigns each image to one of four image categories and applies the best performing OCSR method for that category. This classifier, the image comparator tools, and the datasets have been made available to the public as open access tools.
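The classify-then-dispatch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `route_image`, the category labels `category_a` and `category_b`, the toy classifier, and the stand-in OCSR callables are all hypothetical names chosen for illustration, and the real classifier uses four categories whose names are not given here.

```python
from typing import Callable, Dict

# Hypothetical type: an OCSR method maps an image path to a predicted SMILES string.
OcsrMethod = Callable[[str], str]


def route_image(
    image_path: str,
    classify: Callable[[str], str],
    best_method: Dict[str, OcsrMethod],
) -> str:
    """Classify the image, then run the OCSR method registered for its category."""
    category = classify(image_path)
    if category not in best_method:
        raise KeyError(f"no OCSR method registered for category {category!r}")
    return best_method[category](image_path)


# Stand-in classifier (assumption): a real one would inspect the image pixels,
# not the file name.
def toy_classifier(image_path: str) -> str:
    return "category_b" if "scheme" in image_path else "category_a"


# Stand-in OCSR methods (assumption): real entries would wrap tools such as
# those compared in the paper; these simply return fixed SMILES strings.
toy_methods: Dict[str, OcsrMethod] = {
    "category_a": lambda path: "CCO",
    "category_b": lambda path: "c1ccccc1",
}
```

Keeping the category-to-method mapping in a plain dictionary means the best performing method for a category can be swapped after a new benchmark without touching the routing logic.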

Article information

Article type: Paper
Submitted: 21 Nov 2023
Accepted: 16 Feb 2024
First published: 07 Mar 2024
This article is Open Access
Creative Commons BY license

Digital Discovery, 2024, 3, 681-693

A. Krasnov, S. J. Barnabas, T. Boehme, S. K. Boyer and L. Weber, Digital Discovery, 2024, 3, 681 DOI: 10.1039/D3DD00228D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
