A FAIR research data infrastructure for high-throughput digital chemistry
Abstract
The growing demand for reproducible, high-throughput chemical experimentation calls for scalable digital infrastructures that support automation, traceability, and AI-readiness. A dedicated research data infrastructure (RDI) developed within Swiss Cat+ is presented, integrating automated synthesis, multi-stage analytics, and semantic modeling. It captures each experimental step in a structured, machine-interpretable format, forming a scalable and interoperable data backbone. By systematically recording both successful and failed experiments, the RDI ensures data completeness, strengthens traceability, and enables the creation of bias-resilient datasets essential for robust AI model development. Built on Kubernetes and Argo Workflows and aligned with the FAIR (Findable, Accessible, Interoperable, Reusable) principles, the RDI transforms experimental metadata into validated Resource Description Framework (RDF) graphs using an ontology-driven semantic model. These graphs are accessible through a web interface and a SPARQL endpoint, facilitating integration with downstream AI and analysis pipelines. Key features include a modular RDF converter and ‘Matryoshka files’, which encapsulate complete experiments with raw data and metadata in a portable, standardized ZIP format. This approach supports scalable querying and sets the stage for standardized data sharing and autonomous experimentation.
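To make the ‘Matryoshka file’ idea concrete, the following is a minimal sketch, not the RDI's actual converter or archive layout. It assumes a hypothetical internal structure (`metadata.json`, `metadata.nt`, and a `raw/` folder under one experiment directory) and a placeholder namespace `http://example.org/experiment/`. The RDF part is a naive flat key/value to N-Triples mapping that stands in for the ontology-driven, validated conversion described above. The example marks the experiment as failed to reflect that failed runs are recorded alongside successful ones.

```python
import io
import json
import zipfile

# Placeholder namespace (assumption): the real Swiss Cat+ ontology IRIs are not given here.
EX = "http://example.org/experiment/"


def metadata_to_ntriples(exp_id: str, metadata: dict) -> str:
    """Serialize flat key/value metadata as N-Triples with plain literal objects.

    A naive stand-in for an ontology-driven RDF converter: one triple per key.
    """
    subject = f"<{EX}{exp_id}>"
    lines = []
    for key, value in sorted(metadata.items()):
        # Escape backslashes, quotes, and newlines so each literal stays valid N-Triples.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'{subject} <{EX}{key}> "{literal}" .')
    return "\n".join(lines) + "\n"


def build_matryoshka(exp_id: str, metadata: dict, raw_files: dict) -> bytes:
    """Bundle metadata (JSON + RDF) and raw instrument files into one in-memory ZIP.

    The directory layout is hypothetical and only illustrates "one archive per experiment".
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{exp_id}/metadata.json", json.dumps(metadata, indent=2, sort_keys=True))
        zf.writestr(f"{exp_id}/metadata.nt", metadata_to_ntriples(exp_id, metadata))
        for name, content in raw_files.items():
            zf.writestr(f"{exp_id}/raw/{name}", content)
    return buffer.getvalue()


if __name__ == "__main__":
    # A failed run is packaged exactly like a successful one.
    archive = build_matryoshka(
        "exp-001",
        {"status": "failed", "step": "synthesis"},
        {"hplc_run.csv": b"time,signal\n0.0,0.1\n"},
    )
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        print(sorted(zf.namelist()))
```

Keeping both a JSON view and an RDF view inside the same archive is one possible design choice. It lets a consumer read the metadata without an RDF toolchain while still allowing the triples to be loaded into a graph store and queried via SPARQL.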
