Open Access Article
Kristjan Eimre†
ab,
Matthew L. Evans†‡
*cde,
Bud Macaulay
a,
Xing Wang
ab,
Jusong Yuab,
Nicola Marzari
abf,
Gian-Marco Rignanese
c and
Giovanni Pizzi
*ab
aPSI Center for Scientific Computing, Theory and Data, 5232 Villigen PSI, Switzerland. E-mail: giovanni.pizzi@psi.ch
bNational Centre for Computational Design and Discovery of Novel Materials (MARVEL), 5232 Villigen PSI, Switzerland
cUCLouvain, Institute of Condensed Matter and Nanosciences, Chemin des Ètoiles 8, Louvain-la-Neuve 1348, Belgium. E-mail: me388@cam.ac.uk
dMatgenix SRL, A6K Advanced Engineering Center, Charleroi, Belgium
eDatalab Industries Ltd, King's Lynn, UK
fTheory and Simulation of Materials (THEOS), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
First published on 15th May 2026
Atomistic structural data are central to materials science, condensed matter physics, and chemistry, and are increasingly digitised across diverse repositories and databases. Interoperable access to these heterogeneous data sources enables reusable clients and tools, and is essential for cross-database analyses and data-driven materials discovery. Toward this aim, the OPTIMADE (Open Databases Integration for Materials Design) specification defines a standard REST API for atomistic structures and related properties. However, deploying and maintaining compliant services remains technically demanding and poses a significant barrier for many data providers. Here, we present optimade-maker, a lightweight toolkit for the automated generation of OPTIMADE-compliant APIs directly from raw atomistic structure and property data. The toolkit supports a wide range of raw datasets, enables conversion to a standardised OPTIMADE data representation, and allows for rapid deployment of APIs in both local and production environments. We further demonstrate it through an automated service on the Materials Cloud Archive, which automatically creates and publishes OPTIMADE APIs for contributed datasets, enabling immediate discoverability and interoperability. In addition, we implement data transformation pipelines for the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD), enabling unified access to these curated resources through the OPTIMADE framework. By lowering the technical barriers to interoperable data publication, optimade-maker represents an important step toward a scalable, FAIR materials data ecosystem integrating both community-contributed and curated databases.
The primary APIs of atomistic structure databases, where present, are often custom-built and lack any interoperability between different providers. Consequently, client applications aiming to access multiple databases must implement support for a variety of incompatible API standards, significantly increasing complexity for the development of clients and of data-driven pipelines that explore the data. To address this issue, the Open Databases Integration for Materials Design (OPTIMADE) consortium, comprising representatives from many major materials databases worldwide, has developed a common specification for a REST (representational state transfer) API15 to serve atomistic structure data and related properties. The OPTIMADE specification is designed to accommodate the diverse requirements and constraints of materials databases, enabling uniform access to atomistic structure data across providers. In addition, the specification defines a standard mechanism for dataset discoverability on the web, allowing compliant APIs to align with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.16
As a result, an OPTIMADE ecosystem has emerged. As of March 2026, 20 database providers expose OPTIMADE-compliant APIs, collectively indexing over 25 million materials.17 A growing number of clients and applications leverage the specification to discover, analyse, and aggregate materials data across multiple sources. OPTIMADE APIs have already been successfully employed in several materials discovery and design projects.18–20
Despite these advances, deploying and maintaining a materials database together with a fully compliant OPTIMADE API typically requires dedicated infrastructure consisting of hardware, software, and personnel, which is costly and time-consuming to build and maintain. The associated technical overhead and maintenance effort present a substantial barrier for individual researchers or small research groups who wish to disseminate their data in an interoperable manner.
In this work, we present optimade-maker, a toolkit that addresses this challenge by enabling the automated generation of OPTIMADE APIs directly from raw materials data files, such as simulation outputs or structural assignments. Built on top of the existing optimade-python-tools 21 Python library, optimade-maker can be integrated into data pipelines to provide OPTIMADE-compliant services for production web platforms, while also allowing researchers to quickly deploy a local API for using OPTIMADE-compliant clients with their raw data. Fig. 1 illustrates the position of optimade-maker within the OPTIMADE ecosystem.
We further present representative use cases of optimade-maker developed as part of this work. One such use case is the Materials Cloud Archive OPTIMADE service, which enables researchers to automatically obtain an OPTIMADE API for their datasets upon upload to the Materials Cloud Archive,13 an open research data repository, thereby facilitating immediate interoperability and discoverability. The Materials Cloud Archive links these datasets directly to the newly redesigned Materials Cloud OPTIMADE client, where they can be interactively browsed. This service has already been used to serve several contributed datasets. Furthermore, we used optimade-maker to implement the OPTIMADE data transformation pipelines for the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD).
command line interface (CLI) tool. The primary features provided by the toolkit include: (1) specification of a simple YAML (YAML Ain't Markup Language) configuration file that describes the raw data and makes it parsable for the toolkit; (2) conversion of the raw data into the standard OPTIMADE JSON Lines file format, based on JavaScript Object Notation (JSON), which was developed as part of this work and is now part of the official specification since v1.3.0; (3) serving an OPTIMADE API directly from a raw data archive or from an OPTIMADE JSON Lines file using the reference server implementation from optimade-python-tools.21
In a typical optimade-maker workflow, the user provides a collection of raw files describing atomistic structures, possibly in an archive file (e.g., a ZIP file), and optionally, any associated properties for these. optimade-maker assigns each structure a unique identifier based on its path and file name (see Section 3.2), and the property files can reference either the identifier or the full path. To make this data parsable by optimade-maker, an
configuration file is provided, describing the locations of files, and defining the relevant property metadata. After this setup, the CLI can be used to convert the data into a standardised OPTIMADE representation and start the API. A step-by-step, beginner-friendly tutorial demonstrating this workflow is available in the project repository, providing a guided introduction for new users.
A schematic overview of a concrete use case is shown in Fig. 2. In this example, the structures are packaged in a ZIP archive (
) containing multiple Crystallographic Information Files (CIFs),22,23 together with a Comma-Separated Values (CSV) file (
) that includes, for each structure, an identifier and a property value – the floating-point total energy per atom. Using this configuration, the
CLI command can convert the raw data into the standard OPTIMADE JSON Lines format, e.g., for archival purposes. The
command directly serves an OPTIMADE API from the raw data. By default, the
command starts the API locally (on
), where it is immediately available for OPTIMADE-compliant queries by client applications allowing for search across any of the standardised OPTIMADE fields and any extra properties defined in the YAML configuration file.
optimade-maker supports raw data in a variety of formats. Atomistic structure files are parsed using the Atomic Simulation Environment (ASE),24 enabling support for most standard and non-standard formats, including CIF, XYZ, and XSF. Pymatgen25 JSON files containing structures and related properties are also supported. These files may be compressed or archived using common formats such as
,
, and
. For property data, CSV and JSON files are supported (see also Section 3.2 on how to map properties to structures).
In addition, optimade-maker can ingest data from AiiDA26–28 archive files or directly from an AiiDA profile, as described in detail in Section 3.3.
The
command is designed to support both rapid local deployment and more complex data pipeline setups, including production-grade APIs. By default, the command converts the raw data and populates a temporary in-memory MongoDB database using the
Python library, eliminating the need for an external database and enabling immediate access to the data through an OPTIMADE API.
Alternatively,
can be configured using a custom optimade-python-tools configuration file. This allows the population of an external MongoDB instance (e.g., for production deployments), customisation of provider metadata, or execution of the data population pipeline without starting the API itself. The latter mode is particularly useful in automated workflows where the API is launched through separate orchestration mechanisms, e.g., as discussed in the next section.
Using optimade-maker, we developed an automated service on the Materials Cloud platform that deploys OPTIMADE APIs for archive entries compatible with the toolkit, as illustrated in Fig. 3. Fig. 3a shows a schematic overview of the data pipeline underlying this service. A Python job, referred to as the Archive watcher, regularly monitors newly published entries in the Materials Cloud Archive, and checks their compatibility with optimade-maker based on the existence of an
file in the appropriate format. Compatible entries are processed using the
command, which converts the raw data into the required internal format and generates a corresponding optimade-python-tools configuration. The processed data are then injected into a MongoDB instance, which is monitored by the Multi-OPTIMADE server, a light wrapper around optimade-python-tools that allows the server to efficiently manage multiple OPTIMADE APIs. Upon detecting new data, the Multi-OPTIMADE server launches the corresponding APIs and mounts them under distinct subpaths of a single Python REST API.
![]() | ||
Fig. 3 Materials Cloud Archive OPTIMADE service. (a) Data pipeline utilizing the optimade-maker toolkit. (b) A webpage for a Materials Cloud Archive entry29 containing an file, where a link is displayed to directly explore the dataset with the new Materials Cloud OPTIMADE client. | ||
The resulting OPTIMADE API endpoints are published to the wider OPTIMADE ecosystem under the Materials Cloud Archive provider identifier
, and can be accessed by any OPTIMADE-compatible client. Each OPTIMADE API gets its own database identifier based on the Materials Cloud Archive DOI that represents all versions of the entry. If a new version of an Archive entry is published, a new OPTIMADE API is set up with the same identifier that replaces the old version of the API.
The Materials Cloud Archive webpage containing the OPTIMADE configuration (Fig. 3b) will contain a direct link to the Materials Cloud OPTIMADE client, fully redesigned as part of this work (see details in Section 3.4). Finally, the created OPTIMADE API endpoint with all relevant metadata and links is also published to the Materials Cloud OPTIMADE overview page at https://optimade.materialscloud.org.
000 hypothetical compounds and the trained ML models were used to rapidly predict their ELFmax values and thermodynamic stability, identifying high priority materials to investigate further with DFT calculations. Through this approach, 41 stable and 104 metastable A2BC2 potential electrides were predicted, with diverse anionic electron densities across the range of electron-deficient, neutral and electron-rich electrides. The three most promising materials were experimentally validated for synthesisability and catalytic activity for ammonia synthesis. The raw data from this work were deposited on the Materials Cloud Archive with a supplementary
configuration file that indicated which aspects of the dataset to index as an OPTIMADE API;30 the authors chose to index the 145 compounds that were computed with DFT and their associated stabilities (formation energies, hull distances versus known phases) and their ELFmax values, exposing them for further search.
In two other related projects, Trinquet et al. performed a combined ML and DFT active learning screening for materials that exhibit strong optical responses: high refractive indices19 and strong second-harmonic generation for a given band gap.20 The initial screening dataset was defined by an OPTIMADE query to curate all hypothetically stable structures that were non-centrosymmetric across contributing databases. Then, seeding the active learning with DFT calculations on the subset of known high performing materials and a random exploratory selection of unknown materials, models were trained to predict optical response. These models were then used to prioritise the next structures on which to perform the high-fidelity DFT calculations, with the overall search campaign comprising several such repeating loops. The resulting datasets from each study, comprising structures, their corresponding DFT-computed refractive indices31 and their second-harmonic generation coefficients,32 were collated and uploaded to Materials Cloud Archive. Both datasets provide annotations describing the additional computed fields, e.g.,
with description “The effective Kurtz–Perry powder coefficient from the conventional HSE scissor-corrected SHG tensor” and unit “pm per V”, which are reported in the
endpoint of the corresponding OPTIMADE API and are exposed for search.
Each of the three datasets described above return results when OPTIMADE users make cross-provider queries, making use of the OPTIMADE APIs served by Materials Cloud Archive. Custom fields specified by the user are prepended with the
provider prefix and the underlying properties are made queryable via OPTIMADE; for example, the query
will return all low-lying hypothetical structures from ref. 29.
000 entries, respectively. However, as commercial databases, programmatic access is limited to license holders and requires bespoke software. OPTIMADE makes no requirement that data conforming to it be open or freely available; this benefits users and database providers as tools written to target commercial (or otherwise closed) datasets that conform to OPTIMADE should also automatically work on open datasets, and vice versa, preventing further fragmentation of the ecosystem.
The UK's Physical Sciences Data Infrastructure (PSDI),34 following the former Physical Sciences Data Service, provides access to both the CSD and ICSD to UK academics through a combined license. PSDI identified the need for materials API standardisation to enable cross-search of data resources that they collect, aggregate and curate and decided upon OPTIMADE as the enabling technology.
To enable this, data pipelines were developed to map all entries in the CSD and ICSD into the OPTIMADE format, encompassing structural, bibliographic and chemical data pertaining to each entry, via the CSD Python API35 and the ICSD REST API, respectively. These mapped entries were then written to a combined OPTIMADE JSON Lines file which can be served as an OPTIMADE API using the tools provided by optimade-maker.
This approach is significantly simpler than the alternative of mapping database queries and outputs from the existing database-specific formats and returning them in an OPTIMADE compliant way, but comes at the cost of needing to run a secondary database and API that must be periodically updated from the live source. Each database required its own extension fields to the core OPTIMADE structure type; the CSD focusing on chemical identifiers (SMILES, InChI etc.) and molecular crystal properties (Z, Z′), whereas the ICSD made use of several tabulated CIF23 fields that describe diffraction experiments such as measurement conditions and goodness-of-fit.
Although not the focus of this work, software implementations for these pipelines can be found on GitHub at https://github.com/datalab-industries/csd-optimade and https://github.com/datalab-industries/icsd-optimade. The resulting OPTIMADE APIs are not publicly accessible, as only licence holders are allowed to query and retrieve the underlying data. However, these pipelines are deployed by the PSDI so that UK academic users can receive seamless access to structures from the CSD and ICSD in the PSDI Cross Data Search service,34 alongside many other resources, with unified querying and semantics to access property definitions powered by OPTIMADE.
, (2)
, (3)
, and (4)
.
defines the
configuration format, making use of Pydantic36 to provide typed, versioned schema definitions for each field. The configuration can be provided as JSON, YAML, or directly as a Python dictionary. As shown in Fig. 2, the configuration consists of a few top-level metadata fields about the database itself, and then subconfiguration objects per entry type that define the files to parse as entries and, optionally, properties. When user-defined properties are provided, they must be accompanied by extra metadata in the OPTIMADE property definition, including the field name, title, OPTIMADE data type (e.g., float or integer), unit, and a human-readable description.
implements the pipeline that takes the archived data and applies the scheme defined in the user-supplied
configuration file to create a single combined OPTIMADE JSON Lines file for the dataset. The basic process is as follows: first, decompress the archived data (typically provided as a ZIP or tar file), then loop through the OPTIMADE resource types (e.g., structures, references) that have processing rules provided in the configuration file. These rules include a list of patterns that match file paths to attempt to parse as the given entry type,
(e.g., the wildcard
), and an optional list of
corresponding to auxiliary files that contain property data pertaining to those entries. The properties themselves need to be defined in the
field of the configuration. The conversion process then constructs the appropriate entries for each entry type, decorates them with any provided properties, creates the corresponding
resource that describes the user-extended entry type in the OPTIMADE format, and finally saves each entry as an individual line in a combined JSON Lines file.
The
module is a registry of tools that map files of a given representation (standardised or otherwise) into intermediate objects that can be mapped to OPTIMADE entries. These tools tend to be implemented in other open source libraries, such as ASE24 or pymatgen.25 These libraries, combined, provide parsers for many atomistic simulation codes and otherwise standardised structural file formats (e.g., CIF, XYZ), with light wrappers to allow the parsers to fail fast. For structural data, parsers can return either ASE
or Pymatgen
objects, which are then mapped to OPTIMADE structures using the adapters implemented in optimade-python-tools.21 Rather than relying on file extensions, each parser is run in turn on each file until it can be successfully parsed and converted into an OPTIMADE structure. A similar process is followed for parsing property data, although here only CSV and JSON files are supported, which can both be unambiguously read using the
library or the Python standard library. Extra validation is performed against the user-provided property definitions, ensuring that data types can be appropriately coerced and IDs can be uniquely matched to the created entries (see Section 3.2). These parser components can be easily extended to accommodate new libraries, file formats and entry types.
Finally,
takes the constructed OPTIMADE JSON Lines file and serves it with an OPTIMADE API using the reference server implementation from optimade-python-tools (based on FastAPI and leveraging MongoDB as the database backend), which supports the majority of OPTIMADE 1.3 features and enables search over both standard and user-provided properties. This allows future updates (e.g., new features, performance improvements) to optimade-python-tools and the OPTIMADE specification itself to be readily accommodated in optimade-maker.
) based on its path relative to the
configuration file. The identifier is constructed using a simple deterministic rule: from the set of all file paths, the longest common prefix and suffix (including file extensions) shared by all paths are removed. For example, given the two structures:
and
, the corresponding identifiers are
and
.
The optimade-maker package can access either a live AiiDA database or an exported
archive file. The user should specify an AiiDA group containing the subset of the structures to be exposed through the OPTIMADE API. AiiDA UUIDs (universally unique identifiers) are used for the OPTIMADE structure identifiers. Additional properties can be associated with each structure by extracting them either directly from the AiiDA structure node (e.g., from the extras) or from other nodes, according to a query encoded in the property definition.
Fig. 4a shows a schematic example of an AiiDA provenance graph representing a density-functional theory (DFT) crystal-structure relaxation followed by an electronic band-structure calculation, using the Quantum ESPRESSO37,38 DFT code. The workflow internally performs the two steps, producing data nodes as outputs. The band-structure node does not directly store the band-gap value; instead, this value is computed in an additional postprocessing step, which returns a dictionary (
) node. To use optimade-maker to create an OPTIMADE database and API for the relaxed structures in this AiiDA database, and also associate the calculated band-gap values as properties of each structure, the user can prepare the configuration file shown in Fig. 4b. Structures are selected from the AiiDA group
. The property definition for the band gap includes an
section, which defines a query for the relevant property node relative to the structure node, using the standard filtering and projection keywords provided by the AiiDA
.
Finally, the functionality supporting AiiDA databases is lazily loaded and is not a mandatory dependency, so that optimade-maker does not require an AiiDA installation for its basic functionality.
optimade-maker and the automated Materials Cloud Archive OPTIMADE service mark an important step toward a fully FAIR (Findable, Accessible, Interoperable, and Reusable) ecosystem for materials science. By automatically exposing raw datasets through a standardized API and registering them in the OPTIMADE provider registry, datasets become immediately discoverable and interoperable across the OPTIMADE ecosystem. This enables seamless access through a wide range of clients and tools while placing no additional burden on data contributors, lowering the barrier to FAIR data publication and reuse.
Future developments in the OPTIMADE specification could further enhance this framework. Recent additions, including new entry types such as
and
and an updated property definition format with controlled vocabularies and richer array typing,18 could be incorporated into optimade-maker. Support for externally defined community properties may also enable richer queries over existing datasets; for example, the
namespace39 could enable molecular substructure searches via SMILES annotations.
More broadly, the approach could extend beyond structural data. By leveraging the federated registry of machine-actionable parsers developed in the datatractor initiative,40 similar API-based access could be provided for experimental datasets while reusing many of the components of optimade-maker.
Footnotes |
| † These authors contributed equally. |
| ‡ Present address: Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, CB2 1 EW, United Kingdom. |
| This journal is © The Royal Society of Chemistry 2026 |