A FAIR research data infrastructure for high-throughput digital chemistry

Alice Gauthier; Laure Vancauwenberghe; Jean-Charles Cousty; Cyril Matthey-Doret; Robin Franken; Sabine Maennel; Pascal Miéville; Oksana Riba Grognuz

doi:10.1039/D5DD00297D

Alice Gauthier†^a, Laure Vancauwenberghe†^b, Jean-Charles Cousty*^a, Cyril Matthey-Doret^b, Robin Franken^b, Sabine Maennel^b, Pascal Miéville^a and Oksana Riba Grognuz^b

Author affiliations

* Corresponding authors

^a Swiss Cat+ West Hub, Ecole Polytechnique Fédérale de Lausanne EPFL, 1015 Lausanne, Switzerland
E-mail: jean-charles.cousty@epfl.ch

^b Swiss Data Science Center – Open Research Data Engagement & Services, EPFL, INN Building, Station 14, 1015 Lausanne, Switzerland

Abstract

The growing demand for reproducible, high-throughput chemical experimentation calls for scalable digital infrastructures that support automation, traceability, and AI-readiness. A dedicated research data infrastructure (RDI) developed within Swiss Cat+ is presented, integrating automated synthesis, multi-stage analytics, and semantic modeling. It captures each experimental step in a structured, machine-interpretable format, forming a scalable and interoperable data backbone. By systematically recording both successful and failed experiments, the RDI ensures data completeness, strengthens traceability, and enables the creation of bias-resilient datasets essential for robust AI model development. Built on Kubernetes and Argo Workflows and aligned with FAIR principles, the RDI transforms experimental metadata into validated Resource Description Framework (RDF) graphs using an ontology-driven semantic model. These graphs are accessible through a web interface and SPARQL endpoint, facilitating integration with downstream AI and analysis pipelines. Key features include a modular RDF converter and ‘Matryoshka files’, which encapsulate complete experiments with raw data and metadata in a portable, standardized ZIP format. This approach supports scalable querying and sets the stage for standardized data sharing and autonomous experimentation.
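The abstract names two mechanisms, conversion of experimental metadata into RDF and packaging of a whole experiment as a ZIP-based ‘Matryoshka file’, without showing their form. As an illustration only, the Python sketch below serializes a flat metadata record to N-Triples and bundles it with the raw instrument files into a single archive. The namespace `https://example.org/swisscat#`, the functions `to_ntriples` and `build_archive`, the experiment identifier, and the folder layout are all hypothetical; the actual Swiss Cat+ ontology, validation step, and Matryoshka structure are not described in this section.

```python
import io
import json
import zipfile

# Hypothetical namespace for illustration; not the actual Swiss Cat+ ontology IRI.
EX = "https://example.org/swisscat#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def to_ntriples(experiment_id: str, metadata: dict) -> str:
    """Serialize a flat metadata record as N-Triples, one triple per field.

    Assumes experiment_id and metadata keys are already IRI-safe (no spaces or
    reserved characters); every value is stored as a plain string literal.
    """
    subject = f"<{EX}{experiment_id}>"
    lines = [f"{subject} {RDF_TYPE} <{EX}Experiment> ."]
    for key, value in sorted(metadata.items()):
        # json.dumps quotes the string and escapes quotes, backslashes and
        # control characters in a form that N-Triples string literals accept.
        literal = json.dumps(str(value), ensure_ascii=False)
        lines.append(f"{subject} <{EX}{key}> {literal} .")
    return "\n".join(lines) + "\n"


def build_archive(experiment_id: str, metadata: dict, raw_files: dict) -> bytes:
    """Bundle metadata (JSON + N-Triples) and raw instrument files into one ZIP.

    The folder layout below is a hypothetical example, not the published
    Matryoshka file specification.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        prefix = f"{experiment_id}/"
        zf.writestr(prefix + "metadata.json", json.dumps(metadata, indent=2, sort_keys=True))
        zf.writestr(prefix + "metadata.nt", to_ntriples(experiment_id, metadata))
        for name, data in sorted(raw_files.items()):
            zf.writestr(prefix + "raw/" + name, data)
    return buffer.getvalue()


# Usage: a failed run is archived exactly like a successful one.
archive = build_archive(
    "EXP-0001",
    {"status": "failed", "solvent": "MeCN"},
    {"hplc.csv": b"time_min,signal\n0.0,0.12\n"},
)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(sorted(zf.namelist()))
# ['EXP-0001/metadata.json', 'EXP-0001/metadata.nt', 'EXP-0001/raw/hplc.csv']
```

Keeping the `status` field for a failed run mirrors the point that failed experiments are recorded rather than discarded. A real converter would likely map each field to ontology terms and validate the resulting graph, as the abstract describes; this sketch does neither.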

This article is part of the themed collection: Celebrating International Women’s Day 2026: Women in Digital Discovery

Digital Discovery
