MARCUS: molecular annotation and recognition for curating unravelled structures

Kohulan Rajan; Viktor Weißenborn; Laurin Lederer; Achim Zielesny; Christoph Steinbeck

doi:10.1039/D5DD00313J

Kohulan Rajan,^a Viktor Weißenborn,^a Laurin Lederer,^a Achim Zielesny^b and Christoph Steinbeck*^a

Author affiliations

* Corresponding authors

^a Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743 Jena, Germany
E-mail: christoph.steinbeck@uni-jena.de

^b Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665 Recklinghausen, Germany

Abstract

The exponential growth of chemical literature necessitates the development of automated tools for extracting and curating molecular information from unstructured scientific publications into open-access chemical databases. Current optical chemical structure recognition (OCSR) and named entity recognition solutions operate in isolation, which limits their scalability for comprehensive literature curation. Here we present MARCUS (Molecular Annotation and Recognition for Curating Unravelled Structures), a tool designed for natural product literature curation that integrates COCONUT-aware schema mapping, CIP-based stereochemical validation, and human-in-the-loop structure refinement. This integrated web-based platform combines automated text annotation, multi-engine OCSR, and direct submission capabilities to the COCONUT database. MARCUS employs a fine-tuned GPT-4 model to extract chemical entities and utilises a human-in-the-loop ensemble approach integrating DECIMER, MolNexTR, and MolScribe for structure recognition. The platform aims to streamline the data extraction workflow from PDF upload to database submission and thereby significantly reduce curation time. MARCUS bridges the gap between unstructured chemical literature and machine-actionable databases, supporting FAIR data principles and facilitating AI-driven chemical discovery. Through open-source code, accessible models, and comprehensive documentation, the web application enhances accessibility and promotes community-driven development. This approach facilitates unrestricted use and encourages the collaborative advancement of automated chemical literature curation tools.

This article is part of the themed collection: AI in Drug Discovery at ICANN2025

Digital Discovery
