Alex M. Ganose *a, Hrushikesh Sahasrabuddhe bc, Mark Asta bd, Kevin Beck e, Tathagata Biswas f, Alexander Bonkowski g, Joana Bustamante h, Xin Chen bc, Yuan Chiang bd, Daryl C. Chrzan bd, Jacob Clary i, Orion A. Cohen dq, Christina Ertural h, Max C. Gallant bd, Janine George hk, Sophie Gerits l, Rhys E. A. Goodall m, Rishabh D. Guha d, Geoffroy Hautier n, Matthew Horton o, T. J. Inizan dqaf, Aaron D. Kaplan d, Ryan S. Kingsbury p, Matthew C. Kuner bd, Bryant Li b, Xavier Linn q, Matthew J. McDermott d, Rohith Srinivaas Mohanakrishnan bd, Aakash N. Naik hk, Jeffrey B. Neaton drs, Shehan M. Parmar ae, Kristin A. Persson bd, Guido Petretto ft, Thomas A. R. Purcell uv, Francesco Ricci dft, Benjamin Rich w, Janosh Riebesell dmx, Gian-Marco Rignanese fty, Andrew S. Rosen z, Matthias Scheffler v, Jonathan Schmidt j, Jimmy-Xuan Shen aa, Andrei Sobolev vab, Ravishankar Sundararaman ac, Cooper Tezak l, Victor Trinquet f, Joel B. Varley aa, Derek Vigil-Fowler i, Duo Wang c, David Waroquiers ft, Mingjian Wen ad, Han Yang o, Hui Zheng d, Jiongzhi Zheng n, Zhuoying Zhu c and Anubhav Jain *c
aDepartment of Chemistry, Imperial College London, London W12 0BZ, UK. E-mail: a.ganose@imperial.ac.uk
bDepartment of Materials Science and Engineering, University of California, Berkeley, California 94720, USA
cEnergy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. E-mail: ajain@lbl.gov
dMaterials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
eProgram in Applied Mathematics, University of Arizona, 617 N. Santa Rita, Tucson, Arizona 85721, USA
fUCLouvain, Institut de la Matière Condensée et des Nanosciences, Université catholique de Louvain, Chemin des Étoiles 8, Louvain-la-Neuve 1348, Belgium
gInstitute of Physical Chemistry, RWTH Aachen University, 52074 Aachen, Germany
hDepartment of Materials Chemistry, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
iNational Renewable Energy Laboratory, Golden, Colorado 80401, USA
jDepartment of Materials, ETH Zürich, Zürich, CH-8093, Switzerland
kInstitute of Condensed Matter Theory and Solid-State Optics, Friedrich Schiller University Jena, Jena, Germany
lDepartment of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80302, USA
mRadical AI, Inc., New York City, NY, USA
nThayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA
oMicrosoft Research AI for Science, USA
pDepartment of Civil and Environmental Engineering and the Andlinger Center for Energy and the Environment, Princeton University, Princeton, NJ, USA
qCollege of Chemistry, University of California, Berkeley, CA 94720, USA
rDepartment of Physics, University of California, Berkeley, CA 94720, USA
sKavli Energy NanoSciences Institute at Berkeley, Berkeley, CA, USA
tMatgenix SRL, A6K Advanced Engineering Centre, Square des Martyrs 1, 6000 Charleroi, Belgium
uDepartment of Chemistry and Biochemistry, University of Arizona, 1306 East University Boulevard, Tucson, Arizona 85721, USA
vThe NOMAD Laboratory, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4–6, D-14195 Berlin, Germany
wDepartment of Chemistry, University of Colorado Boulder, Boulder, Colorado 80302, USA
xCavendish Laboratory, University of Cambridge, J. J. Thomson Ave, Cambridge, UK
yWEL Research Institute, Avenue Pasteur 6, 1300 Wavre, Belgium
zDepartment of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
aaLawrence Livermore National Laboratory, Livermore, CA 94551, USA
abMolecular Simulations from First Principles – MS1P e.V., Clayallee 167, 14195 Berlin, Germany
acDepartment of Materials Science and Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA
adInstitute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
aeSchool of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
afBakar Institute of Digital Materials for the Planet, University of California, Berkeley, CA 94720, USA
First published on 1st July 2025
High-throughput density functional theory (DFT) calculations have become a vital element of computational materials science, enabling materials screening, property database generation, and the training of “universal” machine learning models. While several software frameworks have emerged to support these computational efforts, new developments such as machine-learned force fields have increased the demand for more flexible and programmable workflow solutions. This manuscript introduces atomate2, a comprehensive evolution of our original atomate framework, designed to address existing limitations in computational materials research infrastructure. Key features include support for multiple electronic structure packages and interoperability between them, along with generalizable workflows that can be written in an abstract form irrespective of the DFT package or machine learning force field used within them. Our hope is that atomate2's improved usability and extensibility will reduce technical barriers to high-throughput research workflows and facilitate the rapid adoption of emerging methods in computational materials science.
Deploying HT calculations in a generic manner requires a robust software infrastructure. In response to this need, a variety of software frameworks have been developed, including AFLOW,1 AiiDA,2–5 the Atomic Simulation Environment (ASE6), pyiron,7 qmpy,8 and our previously developed atomate.9 Such frameworks have not only made it possible to run DFT calculations at an unprecedented scale, but have also, as a side effect, made such calculations much more accessible to a larger audience. This is because full automation necessitates the development of automatic parameter decisions, automatic error detection and recovery, and automated execution on heterogeneous computing resources. Such advancements have ultimately resulted in more user-friendly programming interfaces to complex materials calculation procedures.
In this manuscript, we introduce atomate2, an evolution of our earlier work with atomate. atomate2 is designed to enhance the programmability of computational workflows, offer greater flexibility with respect to different simulation models (including those based on MLIPs), support various workflow execution engines, and accommodate a broader spectrum of materials properties with less re-coding. atomate2 represents a comprehensive overhaul of atomate, building on its predecessor's successful application in numerous materials design projects and its integral role in the Materials Project (MP)10 database. In the following sections we detail the enhancements and capabilities of atomate2, emphasizing its improved usability and flexibility, which we anticipate will significantly benefit the next wave of HT DFT calculations.
There exists a wide range of DFT packages, each with its own strengths and unique features. The atomate package was centered around the use of VASP for periodic systems and Q-Chem15 for molecular systems. In atomate2, we have expanded support to a wider array of computational methods, including FHI-aims,16 ABINIT,17–20 and CP2K,21,22 in addition to many state-of-the-art machine learning interatomic potentials (MLIPs). Throughout this paper, these methods and codes are termed Calculators. A key challenge is to enable heterogeneous workflows in which different parts of a workflow are performed using different computational methods. Such workflows are necessary to take advantage of the range of features implemented in different DFT packages. For example, hybrid DFT calculations in CP2K can be significantly accelerated by the auxiliary density matrix method (ADMM), but this implementation is currently limited to the use of a single k-point in reciprocal space. atomate2 enables chaining an initial fast hybrid relaxation using CP2K with a slower second relaxation using VASP with denser k-point sampling for improved accuracy. Together, this simulation procedure can significantly accelerate the computation of complex structures and is a key feature of the heterogeneous defect calculation workflow in atomate2. Achieving interoperability between multiple DFT packages and MLIPs is facilitated by the standardization of workflow inputs and outputs through the use of a common application programming interface (API).
Together, standardization and interoperability enable composable workflows. This is a unique feature of atomate2 whereby the constituent parts of a workflow can be seamlessly substituted without impacting the overall workflow execution. This has been facilitated through the use of the jobflow23 workflow engine, which is explicitly designed to support “nested” workflows. One example of composability is given by generalizable workflows. For example, the calculation of elastic constants requires obtaining the energy and stress of a series of strained cells before the results are compiled and elastic properties extracted. In atomate2, the elastic constant workflow is defined in an abstract form, where the various parts of the workflow are linked together independent of the computational method used to obtain energies and stresses. Implementing the workflow for a specific Calculator is as simple as defining the method for a static calculation using that Calculator; the rest of the workflow remains unchanged. Another aspect of composability is the ability to modify workflows in non-trivial ways. For example, the default workflow for point defects in atomate2 is designed to use a single calculation to relax the defect geometry. This calculation can easily be replaced by a sub-workflow that first runs CP2K and then runs VASP, as described in the previous paragraph. Again, the workflow definition remains unchanged and is agnostic to the specific sequence of steps, provided the final calculation yields a relaxed structure and the associated energy.
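The idea of a generalizable workflow can be illustrated with a minimal, self-contained sketch. The function and variable names below are hypothetical stand-ins (the real atomate2 implementation uses jobflow Makers and pymatgen structures), but the pattern is the same: the workflow logic is written against an abstract "static calculation", and any Calculator that satisfies that interface can be swapped in.

```python
from typing import Callable

# Hypothetical stand-ins: here a "static calculation" is any function that
# maps a (strained) structure to an energy. The workflow logic below never
# needs to know which Calculator produced the numbers.
def vasp_static(structure: str) -> float:
    return -10.0  # placeholder for an energy computed by a DFT code

def mlip_static(structure: str) -> float:
    return -9.8  # placeholder for an energy computed by an MLIP

def elastic_like_workflow(structure: str,
                          static_fn: Callable[[str], float]) -> list[float]:
    """Generate strained cells, run the supplied static calculation on each,
    and collect the energies (fitting elastic constants from the collected
    results would follow in the real workflow)."""
    strained_cells = [f"{structure}@strain{i}" for i in range(3)]
    return [static_fn(cell) for cell in strained_cells]

# The same abstract workflow runs unchanged with either Calculator:
dft_energies = elastic_like_workflow("Si", vasp_static)
mlip_energies = elastic_like_workflow("Si", mlip_static)
```

The key design choice is that only the static-calculation step is Calculator-specific; the strain generation and result collection are shared by every implementation.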
Another aspect of composability is workflow optimization. For example, the FHI-aims Calculator facilitates the creation of automatic convergence workflows: atomate2 contains a code-agnostic job that performs a series of consecutive code runs with changing inputs until the absolute difference between the selected result values in two subsequent runs becomes smaller than a predefined threshold. This job has been used to achieve k-point convergence of the energy in static calculations, as well as convergence of the band gap value within the GW framework with respect to the number of frequency points, basis set size, and k-point grid used for the self-energy calculation.
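The logic of such a code-agnostic convergence job can be sketched as follows. This is a simplified illustration, not the atomate2 implementation: the calculation here is a mock band-gap function, and the names (`converge`, `update_input`) are hypothetical.

```python
def converge(run_calculation, inputs, update_input, tol=1e-3, max_steps=10):
    """Repeatedly run a calculation with progressively refined inputs until
    the tracked result changes by less than `tol` between consecutive runs."""
    previous = None
    for _ in range(max_steps):
        result = run_calculation(inputs)
        if previous is not None and abs(result - previous) < tol:
            return inputs, result  # converged settings and value
        previous = result
        inputs = update_input(inputs)  # refine, e.g. densify the k-grid
    raise RuntimeError("convergence not reached within max_steps")

# Mock "band gap vs. k-point density" that approaches 1.2 eV as the grid
# densifies; the refinement step bumps the density by 2 each iteration.
gap = lambda kpts: 1.2 + 1.0 / kpts["density"] ** 2
refine = lambda kpts: {"density": kpts["density"] + 2}
settings, value = converge(gap, {"density": 2}, refine)
```

In the real job the stopping criterion and the input-update rule are configurable, which is what lets the same machinery drive k-point, basis-set, or frequency-grid convergence studies.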
Beyond these considerations, atomate2 aims to address several challenges faced by users of the original atomate code. Workflows are written using the jobflow library rather than the FireWorks code, which provides a streamlined experience for complex workflows with modular components. atomate2 broadens the range of databases available for storing large files and objects, including MongoDB (primary document store), Amazon S3, and Azure Blobs, along with simple configuration options for selecting the storage location of specific objects. A high-level summary of the differences between atomate2 and the original atomate is provided in Table 1.
Code | Workflow engine | Workflow executor | Database | Dynamic workflows | Generalisable workflows | Calculators |
---|---|---|---|---|---|---|
atomate | FireWorks | FireWorks | MongoDB | ✓ | ✗ | VASP, Q-Chem |
atomate2 | jobflow | jobflow, FireWorks, jobflow-remote | MongoDB, Amazon S3, Azure Blob, File Storage | ✓ | ✓ | Many^a |

^a A full list of calculators supported by atomate2 is provided in Table 2.
The computational efficiency and scalability of atomate2 are comparable to those of the original atomate and other modern high-throughput frameworks. In atomate2, as in the original atomate, the heavy lifting is done by the underlying electronic structure codes (e.g., VASP), so the workflow layer adds minimal runtime overhead. Like other frameworks (e.g., AiiDA or pyiron), atomate2 can fully leverage large-scale computing resources through its workflow execution frameworks (e.g., FireWorks or jobflow-remote), and is capable of orchestrating thousands of calculations in parallel with similar efficiency and scalability.
It must be noted that atomate2 differs from ASE in many ways. The writing of input files and parsing of output files is handled primarily by pymatgen (not ASE) in atomate2, specifically for VASP, FHI-aims, ABINIT, CP2K, and LOBSTER. The “orchestration” of a single electronic structure task is executed using bespoke code in atomate2 and involves writing input files, running the external code, and then parsing the results of the calculation into a remote database. For VASP and Q-Chem, additional calls to the custodian python package allow for handling issues that may arise during the calculation. Thus, atomate2 is a framework that ties these various pieces together so they can be used to define complex computational workflows in high throughput.
By contrast, ASE is currently used as a backend only for tight-binding Hamiltonian and machine learning forcefield calculations in atomate2. However, its optimization classes are extensible to any ASE calculator, allowing for a high degree of user customization. Again, the actual task orchestration, which involves parsing calculation results into structured, JSON-format output and inserting them into a remote database, is handled by atomate2.
A clear example of this distinction is the MPMorph workflow, which can optionally use a machine learning forcefield to drive its adaptive equation of state fitting of a non-crystalline material. If such a forcefield is used, then ASE is called to perform NVT molecular dynamics at a given volume. However, the determination of which volumes to use, how the range of fitted volumes should be adjusted dynamically, and the parsing and fitting of molecular dynamics runs is all handled by atomate2.
A high-level flowchart that unifies the overall atomate2 architecture into a single diagram can be found in Fig. 1. This flowchart maps the end-to-end data flow. User structural inputs (via pymatgen/Materials Project) feed into the jobflow “Workflow” layer, which decomposes the task into individual jobs dispatched through FireWorks (locally or on supercomputers), with custodian handling error checking and recovery. Upon job completion, raw outputs are parsed by pymatgen and emmet and optionally post-processed by tools such as phonopy, Pheasy, hiPhive, or AMSET before being archived in the user-specified backend (MongoDB, Amazon S3, Azure Blob, JSON, etc.). By making the full process explicit, this diagram provides context for each of the detailed workflow diagrams presented later in the paper.
To take advantage of the rich library of atomistic simulations supported by ASE, atomate2 implements a generic AseMaker class that allows users to define ASE-dependent jobs via a Calculator attribute and a run_ase method. This Maker supports both periodic and non-periodic structures as input. The Calculator attribute can be any ASE-compliant Calculator. The run_ase method defines what operations are performed on the input atomic configuration, for example, structural or molecular relaxations via the AseRelaxMaker class, or MD via the AseMDMaker class. As these classes are easily adapted to a given use case, no dedicated workflows are currently implemented in atomate2's ASE module.
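The Maker pattern described above can be sketched in a few lines. The classes below (`ToyCalculator`, `ToyAseMaker`) are hypothetical simplifications, not atomate2 code: they show only the essential shape of the design, a swappable Calculator attribute plus a run method that defines the operation performed on the input configuration.

```python
from dataclasses import dataclass, field

# Toy stand-in for an ASE Calculator: any object exposing an energy
# evaluation method can play this role in the pattern.
class ToyCalculator:
    def get_potential_energy(self, atoms):
        return -1.0 * len(atoms)  # fake energy: -1 eV per atom

@dataclass
class ToyAseMaker:
    """Sketch of the AseMaker pattern: a Calculator attribute plus a
    run_ase method defining what is done with the input configuration."""
    calculator: ToyCalculator = field(default_factory=ToyCalculator)

    def run_ase(self, atoms):
        # A real run_ase might relax the structure or propagate MD; here we
        # only evaluate a single-point energy and build a result document.
        return {"energy": self.calculator.get_potential_energy(atoms)}

maker = ToyAseMaker()
doc = maker.run_ase(["Na", "Cl"])  # a two-"atom" toy configuration
```

Because the Calculator is an attribute rather than hard-coded, subclasses only need to override the run method (relaxation, MD, etc.) while inheriting the rest of the job machinery unchanged.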
Outputs from ASE are stored in structured documents: the AseStructureTaskDoc for periodic systems and the AseMoleculeTaskDoc for non-periodic systems. Both document classes inherit from existing document schemas in emmet-core. By default, trajectories (more generally, data for each ionic step) are stored in the user's “large-object” database (such as MongoDB's GridFS) if established.
Currently, the FHI-aims interface to atomate2 can perform both single-point and geometry optimization calculations, as well as more complicated workflows. It is also integrated into the phonon, elastic constant, equation of state, magnetic ordering (via Mulliken analysis), anharmonicity quantification, and MD workflows of atomate2. All of these provide a template for integrating FHI-aims into other common workflows, such as anharmonic and quasiharmonic phonon calculations. For applications where symmetry is important, symmetry handling can be activated using the rlsy_refine_structure keyword in FHI-aims; however, this should not be used when performing calculations on displaced geometries.
All keywords needed to run the calculations are passed to atomate2 through the user_parameters argument and kpt_settings, which are python dictionaries. The default relaxation method used is the trust radius method (TRM) with a maximum allowed force of 1 meV Å−1. When running multiple related (same material or molecule) calculations, the atomate2 interface allows for both parallel and serial execution of this job via the run_aims and run_aims_socket functions, respectively. The advantage of using the run_aims_socket function is that it uses the i-PI interface29 in FHI-aims and the ASE SocketIOCalculator to initialize the electron density to the converged value from the previous calculation when possible, reducing the total number of SCF iterations by a factor of two.
FHI-aims can run the GW calculation in a single shot, without having to restart the calculation after completing an SCF cycle. Such a run will consist of an SCF part, during which the ground-state electronic density is obtained, and a post-SCF part when the GW self-energy is evaluated. However, the two parts can also be separated using FHI-aims restart capabilities. The GW workflow for FHI-aims, implemented in the atomate2 package, dumps the resulting SCF eigenfunctions and reads them at the beginning of the GW run. It helps in several ways by: (1) making calculations more computationally efficient, (2) achieving consistency in the results, and (3) allowing more flexible exploration of the parameters space.
At present, standard DFT tasks, that is, structural relaxation (atoms and/or cells) and SCF and NSCF calculations (uniform or band structure), as well as many-body perturbation theory (MBPT) calculations such as quasiparticle energies within the GW approximation and dielectric function calculations obtained by solving the Bethe–Salpeter equation, are interfaced within atomate2. The plan is to port the previously developed abiflows package, which was used, among other applications, to calculate 1521 semiconductors in the harmonic approximation in collaboration with the MP,30 to atomate2. In this regard, the abiflows DFPT workflow for calculating the static second-harmonic generation tensor31 (and the static dielectric tensor) has been implemented in atomate2 and is under review.
The global machinery relies heavily on functionalities provided by abipy, such as automatic input generation and output processing.20 Following the philosophy of atomate2, each Maker or calculation type (inheriting from BaseAbinitMaker) has its own AbinitInputGenerator, which in turn calls a specific abipy factory function to generate the proper AbinitInput. Once a job is completed, the parsing capabilities of abipy are fully leveraged to retrieve relevant outputs. Indeed, abipy provides a specific parser class for each file, whether text or netcdf. The available methods of those parsers allow one to construct an AbinitTaskDoc following the same schema as for the other codes, with common basic fields such as output.energy, output.bandgap, or output.forces. In addition, it is possible to directly store relevant files, such as the DDB or netcdf files, in a FileStore partition of the connected MongoDB. They can then be retrieved at will for further manipulation with abipy, such as the automatic generation of band structure, density of states, or spectra plots. When possible, basic figures are already saved to allow a quick inspection. By default, the ABINIT workflows will look for pseudopotentials from the Pseudodojo32 in the default folder (/.abinit/pseudos). It is thus necessary to download them using the abipy abips.py command. The --help option lists the valid subcommands, such as avail, list, and install. Although difficult, it is possible to use custom pseudopotentials; active developments are focusing on improving this aspect.
JDFTx uses the GC-SCF and AuxH electronic algorithms, which outperform outer-loop grand canonical electronic algorithms found in other codes.35 Advanced solvation models are available including the non-linear non-local SaLSA implicit solvent model as well as CANDLE, a linear implicit solvent model with asymmetric charge response.36,37 JDFTx can also be integrated directly into excited state calculations in BerkeleyGW,38 although atomate2 support for GW workflows with JDFTx is not expected soon. JDFTx output and input files are parsed with code in pymatgen.io.jdftx. JDFTx log files, eigenvalue and bandProjection files are currently supported by the parsers.
Moreover, fully MLIP-based workflows have been implemented as well. Specifically, one can use any of the supported MLIPs to calculate the elastic constant tensor or harmonic phonons of a material, as recently demonstrated with MACE-MP-0. Other applications of the force field Calculators and workflows include ref. 49. Substituting DFT-based Calculators with MLIPs allows faster and cheaper runs, and makes atomate2 an ideal tool for easily reproducible benchmarking against DFT calculations. More details on the respective implementations can be found in the corresponding workflow sections below. Additionally, MLIP molecular dynamics (MLMD) calculations have been incorporated for the microcanonical, canonical, and grand-canonical ensembles, with more complex workflows using MLMD to, e.g., rapidly equilibrate amorphous structures. Virtually all workflows that do not require electronic properties, such as quasi-/harmonic phonon calculations and equation of state properties, can be adapted to MLIPs.
VASP is the main code used by the Materials Project to generate structural, electronic, and thermodynamic materials data, and thus has a wide breadth of workflow coverage. Within atomate2, VASP-based tasks and workflows include: geometry optimization, single-point calculations, AIMD, equation of state, band structure scans, phonon dispersion, amorphous solid equilibration, etc. Transition state workflows based on nudged elastic band (NEB)24 and ApproxNEB51 are currently being added for VASP.
The VASP Calculators in atomate2 rely on pymatgen52 to define input sets (minimally, the INCAR, POSCAR, and POTCAR files) which are defined in the pymatgen.io.vasp.sets library. The output of a VASP calculation is parsed by emmet53 into its TaskDoc schema. This schema is sufficiently flexible to incorporate key electronic structure information from non-dynamical DFT, AIMD, and MBPT calculations. By default, jobs are run with the custodian package54 to monitor for VASP and computational resource errors and possibly correct these on the fly.
In atomate2, VASP input files are represented as JSONable objects via the VaspInputGenerator class. This class lightly wraps pymatgen's VaspInputSet class with appropriate defaults set for high-throughput calculations.52 These sets essentially determine which kind of calculation is run, for example: geometry optimization, static single-point energy calculation, band structure calculation, or AIMD. A single VASP calculation is represented as a jobflow Maker object, which can then be chained together to form workflows (jobflow Flow objects). At present, nearly all legacy atomate VASP jobs and workflows have been ported to atomate2 and many new workflows have been added.
The atomate2 Q-Chem integration supports several fundamental tasks, including geometry optimization, single-point energy calculations, and frequency analysis. More sophisticated tasks, like potential energy surface (PES) scans and transition state optimizations, are also available. The interface with pymatgen facilitates these tasks through the InputGenerator and InputSet architecture. This design allows users to encapsulate all calculation settings into a QCInputGenerator class, which, when provided with a molecule from the pymatgen library, produces a complete set of Q-Chem inputs specific to that molecule.
The infrastructure is highly customizable, making it amenable for advanced users to implement new jobs and workflows. The Maker class in the jobflow library forms the backbone for constructing and managing these computational jobs. A key component, the BaseQCMaker, utilizes the QCInputGenerator to yield a QCInputSet from a given pymatgen molecule while supporting additional parameters for job execution, error-handling, and result documentation. Q-Chem calculators within atomate2 automatically archive inputs and outputs using a structured schema known as a Task Document, defined in the emmet.core.qc_tasks TaskDoc class. This schema ensures standardized data processing by storing specific results (e.g., final energy) in predefined attributes (TaskDoc.output.final_energy). The TaskDoc is easily serialized for integration into a results database (e.g. MongoDB) or storage as a local JSON file, ready for automated handling by tools such as the Builder classes in Emmet.
All supported workflows are listed in Table 3. Each workflow description covers the methodology, usage notes, and any existing use cases/papers using the workflow, and includes a workflow diagram covering all the steps. A template workflow diagram, providing a legend to enable their interpretation, is shown in Fig. 2.
Workflow | System | ASE | FHI-AIMS | ABINIT | CP2K | JDFTx | MLIPs | VASP | OpenMM | Q-Chem |
---|---|---|---|---|---|---|---|---|---|---|
Geometry optimization | Both | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Static | Both | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Dielectric | Periodic | — | — | — | — | — | — | ✓ | — | — |
Polarization | Periodic | — | — | — | — | — | — | ✓ | — | — |
Electronic band structure | Periodic | — | ✓ | ✓ | ✓ | — | — | ✓ | — | — |
Bond analysis workflow with LOBSTER | Periodic | — | — | △ | — | — | — | ✓ | — | — |
Excited states (GW-BSE) | Periodic | △ | ✓ | ✓ | ✓ | — | — | ✓ | — | — |
Ab initio molecular dynamics | Periodic | ✓ | △ | — | — | — | — | ✓ | — | — |
Force field molecular dynamics | Periodic | ✓ | ✓ | — | — | — | ✓ | ✓ | — | — |
Elastic constants | Periodic | △ | ✓ | △ | △ | — | ✓ | ✓ | — | — |
Harmonic phonons | Periodic | △ | ✓ | △ | △ | — | ✓ | ✓ | — | — |
Equation of state | Periodic | △ | ✓ | △ | △ | — | ✓ | ✓ | — | — |
Electron–phonon | Periodic | — | — | — | — | — | — | ✓ | — | — |
Grüneisen | Periodic | — | △ | — | — | — | ✓ | ✓ | — | — |
MatPES | Periodic | — | — | — | — | — | — | ✓ | — | — |
Quasiharmonic approximation for phonons | Periodic | — | △ | — | — | — | ✓ | ✓ | — | — |
Anharmonic phonons | Periodic | △ | ○ | △ | △ | — | ✓ | ✓ | — | — |
MPMorph | Periodic | △ | △ | △ | △ | — | ✓ | ✓ | — | — |
Magnetic ordering | Periodic | △ | ✓ | △ | △ | — | — | ✓ | — | — |
Adsorption | Periodic | △ | △ | △ | △ | ○ | △ | ✓ | — | — |
Point defect | Periodic | △ | ○ | — | — | — | — | ✓ | — | — |
Anharmonicity quantification | Periodic | △ | ✓ | △ | △ | — | △ | △ | — | — |
Electrode discovery | Periodic | △ | — | — | — | — | — | ✓ | — | — |
Ferroelectric | Periodic | △ | — | — | — | — | — | ✓ | — | — |
Materials Project | Periodic | — | — | — | — | — | — | ✓ | — | — |
AMSET | Periodic | — | — | — | — | — | — | ✓ | — | — |
Frequency flattening optimizer workflow | Molecular | — | — | — | — | — | — | — | — | ✓ |
Classical molecular dynamics workflow | Molecular | ✓ | — | — | — | — | — | — | ✓ | — |
The workflow (Fig. 5) involves the following steps: (1) structural optimization, (2) calculating the number of bands based on the available projection basis functions, (3) writing static calculation inputs with the number of bands set to the value evaluated in step 2, (4) performing a static self-consistent, plane-wave based DFT calculation with symmetry, (5) performing a static non-self-consistent, plane-wave based DFT run with symmetry switched off to compute the wave function, (6) generating a set of LOBSTER computations based on different combinations of available atomic orbital basis functions for projection of the wave functions, (7) running the LOBSTER computations and automatically analyzing the outputs with LobsterPy for each of the LOBSTER runs, and (8) deleting the wave function files from the static calculation and LOBSTER run directories.
This workflow originates from a previously implemented workflow in atomate.71 The latter was used to produce a database for about 1500 semiconductors and insulators.72 The key methodological difference between the previous implementation and the atomate2 workflow is that the wave function is now computed in a two-step procedure comprising a self-consistent DFT run with symmetry and a non-self-consistent DFT run without symmetry to speed up the computation. In addition, an analysis of the outputs via the LobsterPy71,73 package is now performed. Importantly for less experienced users of HT software, the atomate2 framework enables efficient workflow execution on one computing node with a simple submission script and only minimal setup: only the installation of atomate2 and the configuration of the run commands for VASP and LOBSTER are required. This significantly lowers the barrier to workflow usage in contrast to atomate. Structural optimization and static DFT runs are performed in this workflow using VASP. The automatic analysis via LobsterPy is performed for symmetrically inequivalent sites in the structure, both for cation–anion bonds and for all bonds. This analysis involves identifying the most relevant bonds along with coordination environments based on Integrated Crystal Orbital Hamilton Populations (ICOHPs), a numerical evaluation of bonding–antibonding contributions, the corresponding Crystal Orbital Hamilton Population (COHP) plots, a JSON summary, and a text description of the bonding analysis. More details about our automatic bonding analysis implementation can be found in the publications associated with LobsterPy71,73 and its tutorials.
The GW and BSE workflows implemented in atomate2 are built for automating these multistep and interdependent calculations. For example, the calculation of quasiparticle energies using Abinit involves a four-step calculation. First, one performs a standard DFT (SCF) calculation to obtain self-consistent charge density which is then used to perform an exact diagonalization calculation (NSCF) to generate a large number of unoccupied bands required for the actual GW calculation. Once these bands are generated the inverse dielectric matrix is computed (SCR) and used to obtain the quasiparticle corrections to the DFT eigenvalues (SIGMA). Similarly, the BSE calculation involves obtaining the inverse dielectric matrix and quasiparticle corrections to compute the frequency-dependent macroscopic dielectric function (ε(ω)). Workflows such as G0W0Maker and BSEFlowMaker perform such standard calculations with a given crystal structure and input parameter set. In addition, multiple workflows have been developed to check the convergence of desired output results with any particular input parameter. For example, GWConvergenceMaker implements the convergence test of the calculated quasiparticle gap with respect to parameters such as the number of unoccupied bands or the number of plane waves. The BSEConvergenceMaker checks the convergence of the frequency-dependent dielectric function calculated using BSE with the number of k-points. Due to the enormous computational cost of these calculations, a BSEmdfMaker has been developed that performs the BSE calculation with a model dielectric function. One can also use a scissor shift to simulate the quasiparticle correction and skip the SCR and SIGMA jobs mentioned earlier.
The implementation of the GW workflow for FHI-aims is simpler, as FHI-aims allows the user to run an SCF calculation and all post-SCF steps in a single run. However, to study the convergence of the quasiparticle energies with respect to the parameters defining the GW calculation, such as the number of frequency points used to expand the elements of the self-energy on the imaginary frequency axis, or the type of its analytical continuation on the real frequency axis,78 one can effectively reuse the results of the SCF part of the calculation by restarting from the converged charge density. This functionality is implemented in GWMaker for FHI-aims. A GWConvergenceMaker, allowing the study of the convergence of GW results with respect to calculation parameters, is also implemented analogously to the ABINIT workflows.
In addition to the GW workflow for FHI-aims, atomate2 also implements the workflow to compute G0W0 quasiparticles with VASP in the MVLGWBandStructureMaker class. It conducts G0W0 calculations compatible with the parameters defined by MVLGWSet in pymatgen. First, a static DFT calculation is performed using the MVLStaticMaker by default, which can be customized with a user-defined static maker. Next, a non-self-consistent calculation is carried out with the MVLNonSCFMaker, starting from the CHGCAR produced in the static calculation. Finally, the MVLGWMaker class builds the dielectric matrix, performs the G0W0 calculations, and obtains the quasiparticle energies.
AIMD workflows are available for VASP via MDMaker, which allows easy selection of common options like the temperature and the ensemble (NVE, NVT, NpT), with suitable default choices for the thermostat and/or barostat. It is also possible to run AIMD using the ASE calculator interfaces to electronic structure codes, such as Q-Chem, SIESTA, Quantum Espresso, VASP, and others. One could use the AseMaker class in atomate2 to communicate with electronic structure codes and use their native AIMD implementations. Alternatively, one could use the AseMDMaker to take energies, forces, and stresses from a single-point electronic structure calculation, and perform AIMD using ensembles defined internally in ASE.
One of the main challenges in MD simulations is determining the number of ionic steps required to extract reliable information from post-processing of the data. A total simulation time of a few ps or a few tens of ps is usually required, with a time step on the order of a fs. Considering that simulation boxes often include up to hundreds of atoms, this can be particularly challenging for AIMD, where the total simulation time can easily exceed the maximum time per job allowed by computing centers. To address this issue, a multi-step MD workflow (MultiMDMaker) has been implemented. This permits splitting the total simulation time into a customizable number of chunks so that each separate MD calculation can finish within the allotted time. A final job is added to summarize the output and provide references to the different output chunks so that the total trajectory can easily be reconstructed. In addition, the final document can be used as a starting point for a new MultiMDMaker workflow, enabling the user to concatenate multiple such workflows. This workflow is illustrated in Fig. 6.
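The chunking idea behind MultiMDMaker can be illustrated with a short sketch. This is a simplified stand-in, not the atomate2 implementation; `split_md_steps` is a hypothetical helper:

```python
def split_md_steps(total_steps: int, n_chunks: int) -> list[int]:
    """Split a total number of MD ionic steps into near-equal chunks,
    each small enough to finish within a single job's wall-time."""
    base, remainder = divmod(total_steps, n_chunks)
    # Distribute any remainder over the first few chunks.
    return [base + (1 if i < remainder else 0) for i in range(n_chunks)]

# A 100 ps run at a 1 fs time step, split into 4 restartable chunks:
chunks = split_md_steps(100_000, 4)
```

Each chunk would then map to one MD job, with a final job concatenating the per-chunk trajectories.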
The same MultiMDMaker workflow can also be used to concatenate trajectories with different thermo- and barostat profiles, a feature that is not currently implemented in VASP. For example, the workflow can define an initial chunk with a ramp-up of the temperature, followed by additional steps at constant temperature.
MLMD is made possible by the MD ensembles (NVE, NVT, and NpT), thermostats, and barostats defined in the ASE Python package. It is also possible to use the native MLIP functionality of VASP to perform MD with the previously mentioned atomate2.vasp.MDMaker class. To make the ASE interface forward-looking, an AseMDMaker template has been defined, which specifies the ensemble and various computational parameters for a generic ASE Calculator object. The AseMDMaker supports both periodic and non-periodic systems. To perform MLMD, users can access pre-defined CHGNet, GAP, M3GNet, MACE-MP-0, and NequIP MD workflows, or they can load any force-field ASE Calculator by specifying the package to import. Both temperature and pressure scheduling features have been added for force field MD Makers. If arrays of temperatures and pressures are input, they will be linearly interpolated across the simulation steps, allowing users to customize MD simulations with, e.g., a temperature ramp, annealing, or cyclic expansion–compression loading. The highly modular nature of the MLIP MD workflows makes them amenable to inclusion in complex workflows, such as the MPMorph workflows80,81 used to simulate quenched amorphous structures. This enables the rapid generation of amorphous structures at a much lower cost than DFT and is being actively explored as an application of MLMD.
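The scheduling feature described above amounts to a linear interpolation of set-points across simulation steps. A minimal sketch of the idea (an illustrative helper, not the atomate2 code):

```python
def interpolate_schedule(values: list[float], n_steps: int) -> list[float]:
    """Linearly interpolate a list of set-points (e.g. temperatures in K)
    across n_steps MD steps, mimicking a ramp or annealing schedule."""
    if len(values) == 1:
        return [values[0]] * n_steps
    out = []
    n_segments = len(values) - 1
    for step in range(n_steps):
        # Fractional position along the whole schedule, in [0, n_segments].
        x = step * n_segments / (n_steps - 1)
        i = min(int(x), n_segments - 1)
        frac = x - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out

# Ramp from 300 K to 1000 K, then anneal back down to 300 K:
schedule = interpolate_schedule([300.0, 1000.0, 300.0], n_steps=11)
```

Passing a single value yields a constant-temperature run, while longer set-point lists produce multi-segment profiles such as cyclic loading.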
Last, classical MD is a popular technique for investigating electrolytes, polymers, proteins, and a wide variety of other systems, particularly when bond breaking and formation are not of interest. This module, shown in Fig. 7, includes (1) an extensible MD engine-agnostic schema and setup tools built on the Open Force Field ecosystem and (2) MD workflows for energy minimization, NpT, NVT, and annealing. Together, these allow facile system construction, atom typing, execution, and analysis. Representing the intermediate state of a classical MD simulation is challenging. While the intermediate representation between stages of a periodic DFT simulation can include just the elements, Cartesian coordinates, and box vectors, classical MD systems must also include the force field. This is particularly challenging because all MD engines represent force fields differently. Rather than implement our own representation, the workflow uses the openff.interchange.Interchange object, which catalogs the necessary system properties and interoperates between several MD engines. Alongside this, the workflow tracks convenient metadata not critical to the simulation, like molecule names and partial charge methods. For system setup, a generate_interchange job in atomate2.classical_md.base has been implemented, which processes a simple input dictionary into a task document. Though the task document is designed to be easily used by multiple MD codes, the existing workflows use OpenMM. OpenMM workflows are built around the BaseOpenMMMaker, which includes shared logic to create an OpenMM.Simulation, attach OpenMM.Reporters, and output a task document. Jobs subclass BaseOpenMMMaker and implement a unique run_openmm method, which evolves the system as needed. Several Makers are implemented: NVTMaker, NPTMaker, TempChangeMaker, and EnergyMinimizationMaker.
A common challenge in HT MD workflows is the a priori selection of equilibration timescales.82,83 To this end, the DynamicOpenMMFlowMaker is implemented to enable continuous execution of any BaseOpenMMMaker subclass in discrete stages while monitoring physical observables (e.g., potential energy or density) until custom convergence criteria are met. Unlike other codes supported by atomate2, OpenMM is run through a Python API and has no notion of input files. Instead of building and writing input sets, the workflow implements simulation logic directly in Python.
The elastic workflow takes an atomic structure as input and produces the elastic tensor as output. This is accomplished through a series of steps detailed below. First, as two optional steps, the input structure can be further optimized and converted to a conventional cell.85 Using a conventional cell can help reduce numerical errors, particularly for crystals whose primitive cell can be highly skewed, such as monoclinic and triclinic systems. Next, the structure is strained along the six independent strain directions (xx, yy, zz, yz, xz, and xy), with multiple strain magnitudes applied in each direction to deform the structure. To further optimize the process, optionally, the set of strained structures can be reduced by symmetry, which involves checking if a strained structure is equivalent to another using the space group symmetry operations of the original structure. This can significantly reduce the number of structures to be calculated, particularly for high-symmetry structures like cubic systems. Next, a Calculator is employed to compute the stress tensor for each strained structure, and any atomate2 Calculator that supports stress tensors, as mentioned in the Calculators section, can be used for this computation. Finally, the sets of strains and stresses are used to fit the elastic tensor using the strain–stress relationship, as implemented in pymatgen.
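The strain-generation step above can be sketched as follows. This is a simplified illustration of building one symmetric strain tensor per Voigt direction and magnitude; in atomate2 this is handled via pymatgen's deformation utilities rather than this hypothetical helper:

```python
import itertools

# Index pairs for the six independent Voigt strain directions
# (xx, yy, zz, yz, xz, xy).
VOIGT = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def make_strain_tensors(magnitudes=(-0.01, -0.005, 0.005, 0.01)):
    """Build a symmetric 3x3 strain tensor for each of the six Voigt
    directions at each strain magnitude."""
    tensors = []
    for (i, j), eps in itertools.product(VOIGT, magnitudes):
        strain = [[0.0] * 3 for _ in range(3)]
        strain[i][j] = eps
        strain[j][i] = eps  # enforce symmetry for the shear components
        tensors.append(strain)
    return tensors

strains = make_strain_tensors()  # 6 directions x 4 magnitudes = 24 tensors
```

The symmetry-reduction step described above would then prune this list before any stress calculations are dispatched.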
The output is a fourth-rank elastic tensor corresponding to the input structure, with different crystal systems possessing different numbers of independent components according to the symmetry in the crystal. Many other isotropic and anisotropic elastic properties can be directly derived from the elastic tensor, such as Young's modulus, shear modulus, bulk modulus, Poisson's ratio, linear compressibility, and sound velocity. Numerical values of these derived properties can be obtained via, e.g. pymatgen, and visual exploration is made possible via packages like elate.86
The present workflow is a direct adaptation of the original elastic workflow in the atomate package. Aside from default settings such as DFT pseudopotentials and energy cutoffs, the main difference is that this workflow includes the option to further optimize the input structure. The original elastic workflow has been widely employed to calculate the elastic tensor of materials.84,87,88 Notably, the elastic tensor data provided in the MP database are computed using this workflow. These data are driving the development of modern machine-learning models for predicting the elastic properties of materials. For instance, derived mechanical properties such as bulk modulus and shear modulus serve as key benchmarking properties in the MatBench suite.61 Furthermore, the data have been utilized to develop equivariant graph neural networks such as MatTen88 for predicting the full elastic tensors of all crystal systems.
Using phonopy89,90 as the underlying framework, the atomate2 implementation of the harmonic phonon workflow requires forces that can be computed from DFT calculations (VASP or FHI-aims), but also from (universal) MLIPs (e.g., M3GNet, CHGNet, MACE-MP-0, NEP, NequIP). phonopy handles the creation of supercells with single displacements and the subsequent calculation of the dynamical matrix. Based on DFT calculations, the non-analytical term correction91 (NAC) can be included in the workflow to account for polarization effects on the force constants near Γ in non-metallic systems. This can be combined with both DFT and MLIP Calculators. Unfortunately, current MLIPs model atomic structure only and have no notion of electronic degrees of freedom. As such, they cannot at present be used to compute the non-analytical term correction itself. Additionally, the FHI-aims interface does not currently support these corrections, but will in the near future.
The workflow can be described as follows: in the first (optional) step, the structure is fully optimized with strict force convergence. This is essential to ensure that the forces from the displaced supercells do not contain any spurious noise from residual forces. Next, the supercell transformation is determined based on the minimum length of each lattice vector. The supercell is generated such that it is as cubic as possible, to ensure that the forces converge better with the supercell size. Supercells with displacements are created by phonopy based on the unit cell symmetry (the number of displacements is determined dynamically), and jobs for the computation of the forces are created. In the last step, these forces are used by phonopy to compute the force constants and the subsequent band structure and density of states plots. Fig. 8 shows a workflow diagram. In addition to the phonon dispersion, the workflow also outputs the phonon density of states and thermodynamic properties such as the heat capacity and free energy.
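The supercell-sizing step above can be sketched in its simplest form. This diagonal-only version is an assumption-laden illustration: the actual workflow aims for near-cubic supercells and may use non-diagonal transformations:

```python
import math

def diagonal_supercell(lattice_lengths, min_length=15.0):
    """Return diagonal supercell multipliers such that every lattice
    vector of the supercell is at least `min_length` (in angstrom)."""
    return [max(1, math.ceil(min_length / a)) for a in lattice_lengths]

# A 4 x 5 x 10 angstrom cell expanded so each axis exceeds 15 angstrom:
multipliers = diagonal_supercell([4.0, 5.0, 10.0])  # [4, 3, 2]
```

For skewed cells, searching over non-diagonal transformations gives supercells closer to cubic for the same atom count, which is why the real implementation does not stop at this diagonal rule.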
Similar workflows have been implemented in other frameworks. One noteworthy example is the implementation in AiiDA, which can utilise force calculations from a variety of DFT codes (VASP, Quantum ESPRESSO,92,93 FHI-aims, etc.). Other implementations are available in the pyiron framework and with the FHI-vibes package.94 All of these implementations rely on phonopy for the calculation of the dynamical matrix.
Our implementation of the harmonic phonon workflow has been used in recent studies on MLIPs.39,95 Instead of a DFT code, a universal MLIP was employed to calculate the forces for the displaced supercells. The studies serve as a benchmark for the accuracy of MLIPs and show that the workflow can create phonon dispersions from any force Calculator implemented in atomate2. The workflow has been extended to run at different cell volumes so that the thermal expansion and the Grüneisen parameter can be calculated (see below).
To generate an EOS, one performs a set of fixed-volume relaxations of a crystal at different volumes. By relaxing the atomic positions within a cell of fixed volume, one is typically better able to fit the resultant energies to a model EOS. The base implementation in atomate2 is abstract: the CommonEosMaker class defines only a workflow for computing a set of energies for an input crystal at different volumes (by default, six volumes within a range of ±5% of the input structure volume). Optionally, the user can relax the structure closer to its equilibrium volume before performing the EOS-specific calculations. The workflow is illustrated in Fig. 9.
Concrete implementations of the EOS workflow exist for VASP (including MP-compliant parameters), FHI-aims, and MLIPs. In the MLIP implementation ForceFieldEosMaker, a convenience method called from_force_field_name allows one to generate an EOS workflow solely from the name of an atomate2-supported MLIP and an input structure. By default, the EOS data are then fit to a handful of theoretical models from authors including Murnaghan,98 Birch,99 Poirier and Tarantola,100 and Vinet and coworkers.101 The extrapolated EOS parameters (minimum energy, equilibrium volume, bulk modulus, etc.) are stored in a dictionary for each model EOS alongside the original energy and volume data for later analysis.
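For reference, one of these models, the third-order Birch–Murnaghan equation of state, can be written as a short function. This is a standalone sketch of the model itself, not the fitting code used by the workflow:

```python
def birch_murnaghan(V, E0, V0, B0, B0p):
    """Third-order Birch-Murnaghan energy-volume relation.
    E0: minimum energy, V0: equilibrium volume, B0: bulk modulus,
    B0p: pressure derivative of the bulk modulus (units consistent
    with E0 and V0)."""
    eta = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * (
        (eta - 1.0) ** 3 * B0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )

# At the equilibrium volume the model reduces to E0:
e_min = birch_murnaghan(V=100.0, E0=-10.0, V0=100.0, B0=0.5, B0p=4.0)
```

Fitting (E0, V0, B0, B0p) to the computed energy-volume points by least squares then yields the extrapolated parameters stored in the output dictionary.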
To obtain the free energies, we add the vibrational part of the free energy Fvib(V; T) to the total DFT energy E0(V) at different volumes, V, according to
F(V, T) = E0(V) + Fvib(V; T). | (1) |
The current implementation does not consider contributions to the free energy beyond harmonic vibrations and therefore might not be suitable for metals or alloys where electronic or configurational entropy can be non-negligible. Our goal is to incorporate these impacts in future versions. The Gibbs free energy as a function of pressure p and temperature T is obtained as
G(p, T) = minV[F(V, T) + pV]. | (2) |
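In the quasi-harmonic approach, G(p, T) is obtained by minimizing F(V, T) + pV over volume. Numerically this can be carried out over a discrete volume grid; a minimal sketch with an illustrative helper (in practice F(V, T) is first fit to an EOS rather than minimized pointwise):

```python
def gibbs_energy(volumes, free_energies, pressure):
    """Evaluate G(p, T) = min over V of [F(V, T) + p*V] on a discrete
    volume grid, with free_energies tabulated at fixed temperature."""
    return min(f + pressure * v for v, f in zip(volumes, free_energies))

# Toy example: F(V) quadratic about V = 10, so at p = 0 the minimum
# is the value of F at V = 10:
vols = [8.0, 9.0, 10.0, 11.0, 12.0]
fes = [(v - 10.0) ** 2 - 5.0 for v in vols]
g = gibbs_energy(vols, fes, pressure=0.0)  # -5.0
```

A finite pressure shifts the minimizing volume to smaller V, reproducing the pressure dependence of the equilibrium volume.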
The MGP workflow begins with an initial structural relaxation, followed by two additional relaxations conducted at slightly expanded and slightly compressed volumes (Fig. 11). Subsequently, phonon computations are performed for all three structures using phonopy. With the phonon frequencies of the three structures at hand, the mode Grüneisen parameter γqν at the wave vector q and band ν is defined as
γqν = −(V/ωqν)(∂ωqν/∂V). | (3) |
In practice, the derivative is evaluated by a central finite difference using the phonon frequencies computed at the compressed and expanded volumes,
γqν ≈ −(V/ωqν(V))[ωqν(V + ΔV) − ωqν(V − ΔV)]/(2ΔV). | (4) |
The macroscopic Grüneisen parameter at temperature T is then obtained as an average over modes weighted by the modal heat capacities CV,qν,
γ(T) = Σqν γqν CV,qν(T)/Σqν CV,qν(T). | (5) |
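The finite-difference evaluation of the mode Grüneisen parameter from the three volumes can be sketched as a standalone function (an illustration, not the phonopy implementation):

```python
def mode_gruneisen(omega_minus, omega_0, omega_plus, v_minus, v_0, v_plus):
    """Central-difference estimate of gamma = -(V/omega) d(omega)/dV from
    phonon frequencies at compressed, equilibrium, and expanded volumes."""
    domega_dv = (omega_plus - omega_minus) / (v_plus - v_minus)
    return -v_0 / omega_0 * domega_dv

# Check against the analytic model omega(V) = c * V**(-g), for which
# the exact Gruneisen parameter is g:
c, g, v0, dv = 2.0, 1.5, 100.0, 0.1
omega = lambda v: c * v ** (-g)
gamma = mode_gruneisen(omega(v0 - dv), omega(v0), omega(v0 + dv),
                       v0 - dv, v0, v0 + dv)
# gamma is approximately 1.5
```

The analytic check illustrates why small volume changes (a few percent or less) suffice: the central difference converges quadratically in ΔV.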
The workflow begins with a structural relaxation using tight convergence criteria to eliminate the presence of imaginary phonon modes (Fig. 12). A large supercell (>15 Å) is constructed to provide sufficient convergence of the renormalized properties. Next, the phonon frequencies and eigenvectors are obtained using DFPT. Subsequently, the ZG special displacement approach is employed to construct displaced supercells that yield accurate thermal averages of the mean squared atomic displacements.104 We note that an alternative approach is to employ Monte Carlo sampling of displacements;105 however, the ZG method enables convergence of properties in a one-shot approach. The displaced supercells are constructed using the implementation available in VASP, with one supercell generated for each temperature of interest. For each structure, a uniform band structure calculation is conducted, comprising a static calculation followed by a uniform non-self-consistent field (NSCF) calculation. Meanwhile, a corresponding band structure calculation is performed on the equilibrium supercell to serve as a reference for calculating the renormalized band gap. Finally, the renormalized band gap at each temperature is obtained by comparing the band gap of the equilibrium structure with the averaged band gap calculated from the temperature-dependent displaced structures.
Second-order IFCs, which define phonons, can be calculated through DFPT, finite displacements, or the random-displacement method. These calculations allow for the derivation of macroscopic thermal properties at the harmonic level (see Section 4.1.8). Anharmonic IFCs, which are crucial for properties like thermal expansion and lattice thermal conductivity, are more difficult to compute due to the combinatorial explosion of terms. Even at the third-order level, high computational efficiency is necessary to achieve wide-scale deployment to both small and large systems. Recent advancements in sampling IFCs from high-information-density configurations have made the calculation of anharmonic IFCs more feasible. Tools such as CSLD,106,107 ALAMODE,108 hiPhive109,110 and Pheasy111 enable the fitting of IFCs to any desired order with few training samples and have paved the way for HT computing of thermal properties.
Our anharmonic phonon workflow112 automatically calculates interatomic force constants up to 4th order from perturbed training supercells, and uses them to obtain the lattice thermal conductivity, coefficient of thermal expansion, and vibrational free energy and entropy. The workflow starts with a primitive structure and adjustable parameters such as the force field Calculator, hiPhive/Pheasy fitting options, and temperatures of interest (Fig. 13). The optimum supercell size is obtained following the same process as the harmonic phonon workflow. A series of random perturbations are performed, and the energies and forces are obtained using static calculations. This dataset is subsequently used for fitting force constants using hiPhive or Pheasy. Harmonic phonon properties are calculated using phonopy, while the lattice thermal conductivity is obtained using FourPhonon113 and phono3py.89,114 There is also the option to renormalize the phonon band structure using techniques from thermodynamic integration. The workflow is dynamic: for example, if the fitting RMSE exceeds a certain threshold, the workflow will automatically add a new displacement calculation to increase the training set size, ensuring the accuracy and reliability of the results.
The atomate counterpart of the same workflow has been utilized for calculating lattice dynamical properties from first principles, as detailed in ref. 112. This paper demonstrates the application of the workflow in calculating interatomic force constants, lattice thermal conductivity, thermal expansion, and vibrational free energies. Deployment of either workflow at a large scale would facilitate materials discovery efforts towards functionalities including thermoelectrics, contact materials, ferroelectrics, aerospace components, as well as general phase diagram construction.
The MPMorph workflow80,81 circumvents NpT equilibration by recursively fitting a set of NVT-MD runs to an equation of state (EOS). After performing the minimum number of fixed-volume calculations needed to fit a standard EOS, the code fits an EOS and determines whether the extrapolated equilibrium volume V0 lies within the range of volumes already computed. If V0 lies outside this range, the workflow rescales the volume to V0 and repeats the fitting and analysis steps until V0 is in range. As a fail-safe, the workflow terminates if an in-range V0 cannot be determined after a set number of steps. A final NVT “production run” is then performed at volume V0 with a quench temperature schedule to move the atoms into their lower-temperature disordered configuration. Two options for the quench temperature schedule are available: a “slow” quench, which ramps down the temperature in NVT, and a “fast” quench, which performs a T = 0 DFT relaxation of the structure.
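The recursive volume search above can be sketched as a short loop. Everything here is illustrative: `fit_v0` stands in for an entire NVT-MD-plus-EOS-fitting stage, and the function name is hypothetical:

```python
def find_equilibrium_volume(v_initial, fit_v0, max_steps=5, spread=0.05):
    """Mimic the MPMorph loop: sample volumes around the current guess,
    'fit' an EOS (delegated to fit_v0), and rescale until the
    extrapolated equilibrium volume V0 falls inside the sampled range."""
    v_guess = v_initial
    for _ in range(max_steps):
        volumes = [v_guess * (1 + s) for s in (-spread, 0.0, spread)]
        v0 = fit_v0(volumes)
        if volumes[0] <= v0 <= volumes[-1]:
            return v0  # extrapolated V0 lies within the sampled range
        v_guess = v0   # rescale to V0 and repeat
    raise RuntimeError("V0 not bracketed within max_steps (fail-safe)")

# Toy EOS whose true equilibrium volume is 120 (fit_v0 ignores the
# sampled volumes and always reports it):
v0 = find_equilibrium_volume(100.0, fit_v0=lambda vols: 120.0)
```

Starting from V = 100, the first ±5% window [95, 105] misses 120, so the guess is rescaled; the second window [114, 126] brackets it and the loop terminates.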
The current MPMorph workflows, visualized in Fig. 14, have been generalized to a code-agnostic framework that only requires the user to define MD jobs for the various stages of the run (initial equilibration, and production). Code-specific implementations for VASP and MLIPs are currently available.
Collinear magnetic spin configurations (i.e., up and down spins) can be modeled with conventional DFT codes, such as VASP. Previously, Horton et al.117 established a scheme for enumerating the likely collinear magnetic orderings for a given input structure and identifying the ground-state ordering using relaxation and static energy calculations performed with VASP. This methodology was implemented as an atomate workflow and has been adapted for atomate2. Unlike its implementation in atomate, it is now written as a common workflow, allowing it to be easily adapted to other DFT codes.
Fig. 15 shows a schematic for the collinear magnetic ordering workflow, implemented in atomate2 as MagneticOrderingsMaker. The workflow consists of three overarching jobs: (1) magnetic ordering enumeration, (2) relaxation and energy calculations with DFT, and (3) post-processing to determine the ground-state ordering.
Montoya and Persson118 previously established a streamlined workflow for modeling surface adsorption in atomate. The workflow constructs distinct adsorbate configurations for arbitrary surface terminations to efficiently handle the extensive DFT calculations. The workflow in atomate2 retains the core structure of the original while integrating jobflow for enhanced automation. The revised workflow supports the automated generation of symmetrically distinct adsorption sites, calculates the energies of adsorbed surface configurations and the reference states, and returns the optimized structures and their adsorption energies.
The workflow starts with the relaxation of a bulk structure and target molecule (Fig. 16), followed by a static calculation to obtain a reference energy for the molecule. In the second step, the workflow performs the surface adsorption site search to generate surface–adsorbate configurations based on the Miller index. The workflow includes default parameters for the thickness of the slab and vacuum, the length and width of the surface, and the surface Miller index. For each potential adsorption site, the workflow performs a relaxation followed by a static calculation to obtain the energy. A slab without any adsorbate is generated for the reference state of the surface. Finally, the adsorption energy of each surface–adsorbate configuration is calculated by subtracting the reference energies of the molecule and the clean slab from its total energy. The outputs of the workflow include the relaxed surface–adsorbate configurations and energies, sorted by ascending adsorption energy. It is possible that during the relaxation calculation, a reaction occurs that decomposes the original molecule. At present, the workflow does not include additional analysis to determine whether this has occurred, and further validation of adsorption configurations is recommended, especially for complex molecules.
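The final energy bookkeeping reduces to a simple difference; a minimal sketch with an illustrative helper (the function name and values are not taken from the atomate2 API):

```python
def adsorption_energy(e_slab_adsorbate, e_slab, e_molecule):
    """Adsorption energy of a surface-adsorbate configuration relative
    to the clean slab and the isolated molecule (all energies in eV).
    Negative values indicate favorable adsorption."""
    return e_slab_adsorbate - e_slab - e_molecule

# Toy example: rank hypothetical sites by ascending adsorption energy,
# as the workflow output is sorted.
configs = {"ontop": -102.1, "bridge": -102.6, "hollow": -102.4}
e_slab, e_mol = -95.0, -6.8
ranked = sorted(configs,
                key=lambda k: adsorption_energy(configs[k], e_slab, e_mol))
# ranked[0] is the most favorable site in this toy example
```

In these toy numbers the bridge site is most favorable at -0.8 eV, followed by hollow and ontop.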
Addressing all of these challenges simultaneously is not feasible given current computational approaches and resources. As such, we have focused on developing a flexible workflow with two requirements in mind:
(1) The workflow must be modular and allow the user to use any combination of structure and electronic optimizer to obtain the ground energy of a charge defect supercell.
(2) Since we cannot guarantee that the ground-state electronic configuration is achieved, we must design some system to aggregate the results of multiple defect calculations.
This allows the user to perform defect simulations in an HT manner to obtain an initial database of defect properties, but also allows users to update the atomic and electronic optimization if lower-energy configurations are found. The composable nature of our workflows and the variety of DFT and MLIP Calculators supported by atomate2 allow us to thoroughly address (1). Addressing (2) is more challenging, since there is no one-to-one correspondence between the isolated defect one is trying to simulate and the defect supercell used to perform the calculation. There are multiple valid choices for the defect supercell, but they all represent the same isolated defect. To address (2), a structure-based defect object defined using only the unit cell of the host material has been developed, providing a supercell-independent representation of the defect.127 This defect object is used as the primary input of the workflow and is also stored alongside each charged-defect supercell calculation to facilitate aggregation of the results from multiple runs of the same defect charge state.
The defect workflow (Fig. 17) requires the user to first define a supercell relaxation workflow, which takes an automatically generated defect supercell and a charge state as input and returns the relaxed atomic structure and total energy. By default, these supercell relaxations are composed of a less expensive structure optimization step with the PBE functional, followed by a high-quality HSE06 static calculation. We note that care must be taken with this approach, as local minima for the more symmetric and charge-delocalized states favored by PBE may not be overcome by HSE06 calculations initialized with such configurations; this also motivates the need for accessible databasing in (2) to enable extensible potential energy surface sampling. The full defect workflow takes a defect object as input, automatically generates the initial defect supercell, and determines the possible charge states from the formal oxidation states of the species involved in creating the defect. The supercell relaxation and static workflow is performed for each charge state, and a post-processing step is called to apply the finite-size corrections and tabulate the energies and other metadata for constructing persistent defect databases.
The anharmonicity quantification workflow uses the output of the harmonic phonon workflow (Section 4.1.8) as a starting point to calculate the anharmonicity metric σA first introduced by Knoop and coworkers.128 σA estimates the anharmonicity of a material at a temperature, T, by taking the ratio between the root mean square error of the forces calculated by the harmonic model and the standard deviation of the actual forces in a thermodynamic ensemble average, assuming the mean force is zero. It is defined as
σA(T) = σ[F − Fha]T/σ[F]T = √(ΣI,α⟨(FI,α − FI,αha)2⟩T/ΣI,α⟨FI,α2⟩T), | (6) |
where FI,α is a Cartesian component α of the force on atom I, Fha is the corresponding harmonic-model force, and ⟨·⟩T denotes the thermodynamic ensemble average at temperature T.
The workflow to calculate σA is shown in Fig. 18. The first step is to calculate the harmonic force constants using phonopy and the workflow shown in Fig. 8. From here, a set of thermally displaced structures is generated, either by harmonic sampling or via a one-shot approximation.94,128 The one-shot approach approximates the complete thermodynamic ensemble as a single structure, where all atoms are displaced to the classical, harmonic turning points of each vibrational mode.105 All generated structures have the same supercell size as the harmonic phonon workflow for consistent results. Next, the forces for each structure are evaluated using the same methodology as the harmonic phonon maker. The sample generation and force evaluations could also be combined into a single molecular dynamics job, but this has not yet been implemented. The calculated forces and displacements are then used to calculate σA for the full structure using eqn (6). The force components can also be masked onto individual element types, lattice sites, or vibrational modes to generate an element-, site-, or mode-resolved σA vector, respectively.
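Given the true forces and the harmonic-model forces for the sampled structures, σA reduces to a ratio of second moments over all force components. A minimal sketch (an illustrative helper, not the atomate2 implementation):

```python
import math

def sigma_a(forces_true, forces_harmonic):
    """Anharmonicity metric sigma^A: RMSE of the harmonic-model forces
    divided by the standard deviation of the true forces (assuming zero
    mean force).  Inputs are flat sequences of force components over all
    atoms, Cartesian directions, and sampled structures."""
    num = sum((f - fh) ** 2 for f, fh in zip(forces_true, forces_harmonic))
    den = sum(f ** 2 for f in forces_true)
    return math.sqrt(num / den)

# A perfectly harmonic system gives sigma^A = 0:
s_zero = sigma_a([1.0, -2.0, 0.5], [1.0, -2.0, 0.5])  # 0.0
```

Masking the input sequences to the components of a single element, site, or mode yields the resolved σA values mentioned above.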
The electrode workflow (Fig. 19) is composed of a series of repeatable ion-insertion steps that produce the most energetically favorable new structure containing one additional WI. Each ion-insertion step begins with an atomic structure that is topotactically matched to the host lattice, or the host lattice itself. The workflow first performs a static DFT calculation to obtain the electronic charge density. From the electronic charge density, the workflow identifies symmetry-distinct ion insertion sites and ranks them based on the integrated charge density in a small sphere around the insertion site. For the subset of sites with the least integrated charge density, the workflow performs a DFT structure optimization to obtain the relaxed atomic structure and total energy. Finally, it aggregates these results and filters them to find the lowest-energy structure that is topotactically matched to the host lattice, which becomes the input structure for the next ion-insertion step.
The starting structure and the WI species are the only required inputs to the workflow; however, the behavior of the workflow is flexible and can be controlled by additional parameters. The maximum number of insertion steps and the maximum number of distinct sites to consider at each step can be specified by the user. If not specified, the workflow will continue to insert WIs until none of the new structures are topotactically matched to the host lattice. We note that charge balance could also be used to define the maximum insertion; however, this functionality has not yet been implemented in the workflow. The final output is a voltage profile of the material in question. When the workflow is applied to a large number of structures and chemical compositions, it serves as a systematic way to explore lower-symmetry configuration spaces. In these cases, the insertion workflow is used to populate a database with new structures, and the aggregation of topotactically matched structures and the computation of the voltage profile are relegated to a separate post-processing step.
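The voltage-profile step can be illustrated with the textbook average-voltage expression between two insertion states. This is a standalone sketch under stated assumptions (energies in eV per formula unit, a monovalent working ion, and a hypothetical helper name):

```python
def average_voltage(e1, x1, e2, x2, mu_metal):
    """Average insertion voltage (in V) between two topotactically
    matched structures containing x1 and x2 working ions per formula
    unit, referenced to the bulk-metal chemical potential mu_metal.
    With energies in eV and a single-electron working ion, the ratio
    is numerically the voltage in volts."""
    return -(e2 - e1 - (x2 - x1) * mu_metal) / (x2 - x1)

# Toy example: inserting one Li (mu_metal = -1.9 eV) lowers the host
# energy from -50.0 to -55.5 eV:
v = average_voltage(-50.0, 0, -55.5, 1, -1.9)  # approximately 3.6 V
```

Applying this pairwise along the sequence of insertion steps yields the stepwise voltage profile.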
The present workflow implements this procedure. The workflow inputs are the polar structure of interest and a nonpolar reference structure in the same low-symmetry setting as the polar one. In addition, the atoms in both structures must be in the same order so that the intermediate structures between the nonpolar and polar endpoints can be generated using a linear interpolation. The workflow begins by calculating the polarization of the polar and nonpolar structures (Fig. 20). This includes an optional relaxation followed by a static calculation, before the dipole moment is obtained using the Berry phase approach.132 In the next stage, the polarizations of a number of structures linearly interpolated between the polar and nonpolar structures are obtained following the same process. Finally, the output of all calculations is collected and the polarization is computed by finding the common branch. Other outputs include the electronic and ionic contributions to the polarization and the quantum of polarization. The workflow is in principle general and any ab initio DFT code can be used; however, at present only a VASP implementation exists. The workflow was originally developed in atomate by Smidt et al.133 and extensively used to build two databases of candidate ferroelectrics.133,134
We note that an appropriate nonpolar reference structure for a given polar structure can be obtained by using the PSEUDO tool135 in the Bilbao Crystallographic Server (BCS), as recently done to build the ferroelectrics database in ref. 134. Alternatively, if a nonpolar reference is already known, it can be transformed into the polar setting by using the Structure Relations tool in BCS, as was done to build the ferroelectrics database in ref. 133.
These workflows have been rewritten in atomate2 with minor modifications to improve their robustness. In both cases, the atomate2 workflows consist of two sequential relaxations followed by a static total energy (single-point) calculation. This is due to a quirk of VASP, wherein electronic properties such as the density of states are not physically meaningful after a relaxation, as they are averaged over previous ionic configurations and do not correspond to the final relaxed structure. To both speed up the workflow and potentially stabilize complex calculations, the wave function from a given step is used to initialize the subsequent calculation in the workflow. Note that neither atomate flow used this initialization scheme, and that the atomate r2SCAN flow did not perform a final static calculation.
In the PBE/+U workflow, two relaxations with PBE/+U are performed followed by a PBE static calculation. When a material containing Co, Cr, Fe, Mn, Mo, Ni, V, or W together with either O or F is studied, PBE+U is automatically used; otherwise, PBE without a +U correction is used.10,138 In the r2SCAN workflow, consistent with ref. 137, a coarser PBEsol relaxation (at a larger force convergence tolerance) is followed by a finer r2SCAN relaxation (at a smaller force convergence tolerance), followed by an r2SCAN static calculation at the self-consistent relaxed geometry. The outputs of the flows are thus a relaxed structure together with its energy, eigenvalue spectrum, and density of states. This information is then used to build material entries within MP, which also include the formation enthalpy (relative to a set of elemental reference configurations) and thus the convex hull distance.
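The automatic +U selection rule stated above amounts to a simple set-intersection check; a minimal sketch (element lists follow the MP convention cited in the text, but this is not the atomate2 code itself):

```python
# Sketch of the MP-style +U selection rule: apply PBE+U only when one of
# the listed transition metals co-occurs with O or F in the composition.
U_METALS = {"Co", "Cr", "Fe", "Mn", "Mo", "Ni", "V", "W"}
U_ANIONS = {"O", "F"}

def needs_hubbard_u(elements):
    """Return True if PBE+U should be applied, given an iterable of
    element symbols for the composition."""
    elems = set(elements)
    return bool(elems & U_METALS) and bool(elems & U_ANIONS)
```

For example, Fe2O3 triggers a +U correction while FeS2 and SiO2 do not.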
The MP input sets are also used to define workflows for determining equations of state (EOS), among other flows. Such a workflow also exists in atomate, and was used in ref. 97; however, its computational demands are quite high. For compatibility with ref. 97, a set of “legacy” EOS MP-compatible flows exists in atomate2, along with a set of PBE/+U and r2SCAN EOS flows that are more tractable in high-throughput (HT) settings and have recently been used to generate numerous equations of state to aid in benchmarking computational parameters used by the Materials Project.139
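The post-processing step of an EOS workflow reduces to fitting energies computed at several volumes and extracting the equilibrium volume and bulk modulus. As a purely illustrative sketch, the snippet below uses a local quadratic fit around the minimum (production flows fit a Birch–Murnaghan form instead):

```python
# Illustrative EOS post-processing: estimate the equilibrium volume V0
# and bulk modulus B = V * d2E/dV2 at the minimum from a quadratic fit
# through the lowest-energy point and its two neighbours.
def eos_minimum(volumes, energies):
    i = min(range(1, len(energies) - 1), key=lambda j: energies[j])
    v0, v1, v2 = volumes[i - 1], volumes[i], volumes[i + 1]
    e0, e1, e2 = energies[i - 1], energies[i], energies[i + 1]
    # Coefficients of E(V) = a*V^2 + b*V + c through the three points.
    denom = (v0 - v1) * (v0 - v2) * (v1 - v2)
    a = (v2 * (e1 - e0) + v1 * (e0 - e2) + v0 * (e2 - e1)) / denom
    b = (v2**2 * (e0 - e1) + v1**2 * (e2 - e0) + v0**2 * (e1 - e2)) / denom
    v_eq = -b / (2 * a)
    bulk_modulus = 2 * a * v_eq  # V * d2E/dV2 evaluated at the minimum
    return v_eq, bulk_modulus
```

A Birch–Murnaghan fit over all sampled volumes is more robust to noise and anharmonicity, but the quadratic version suffices to show where the EOS flows' outputs come from.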
To date, datasets of MatPES calculations have been generated and used to train and/or finetune foundational MLIPs including CHGNet and M3GNet. An initial release of a MatPES-compliant dataset through the Materials Project is forthcoming.
The AMSET workflow in atomate2 automates the calculation of all materials properties required to obtain electronic transport. The main inputs are a structure and the temperatures and doping concentrations of interest. The workflow begins with an initial structural relaxation with tight convergence settings to avoid the presence of imaginary modes. The elastic tensor, dielectric tensor, piezoelectric tensor, and band structure are obtained using the workflows described above. Deformation potentials are calculated by applying a series of strains to the unit cell, followed by static calculations and comparison of the band energies to those of an unperturbed reference calculation. The wavefunction coefficients are extracted from a dense band structure calculation, while an averaged polar optical phonon frequency is obtained from the Γ-point DFPT calculation used to obtain the ionic dielectric constant. As with all BTE implementations, the resulting transport properties are highly sensitive to the density of the k- and q-point sampling used to integrate the matrix elements. The workflow includes automated convergence checking to sequentially increase the interpolated mesh density until the transport properties converge. Two versions of the workflow are provided: one based on GGA inputs and another more accurate but more expensive version using HSE06. The main outputs of the workflow include the temperature-dependent electronic mobility, conductivity, Seebeck coefficient, and electronic contribution to the thermal conductivity. An early version of the workflow was used to obtain the transport properties of 23 000 materials using machine-learned materials inputs.143
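The automated convergence check follows a standard pattern: recompute the target property on successively denser meshes until the change between iterations falls below a tolerance. A generic sketch (hypothetical names; the real workflow converges the AMSET transport properties against the interpolated k-/q-mesh density):

```python
# Generic mesh-convergence loop. `compute_property` stands in for a full
# transport calculation at a given interpolated mesh density.
def converge_mesh(compute_property, start_density, factor=2.0,
                  rtol=0.05, max_iter=10):
    """Increase mesh density until the property changes by less than
    rtol (relative) between successive meshes."""
    density = start_density
    prev = compute_property(density)
    for _ in range(max_iter):
        density *= factor
        curr = compute_property(density)
        if abs(curr - prev) <= rtol * abs(prev):
            return density, curr  # converged density and property value
        prev = curr
    raise RuntimeError("property did not converge within max_iter steps")
```

In the actual workflow the "property" is a vector of transport quantities (mobility, conductivity, Seebeck coefficient), each of which must satisfy the tolerance before the loop terminates.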
In the initial stage of the MOF workflow (Fig. 22), a MOF structure is processed using Zeo++.145 The inputs include a crystal structure, a sorbate molecule, and the number of processors for parallelization. Zeo++ extracts various pore characteristics via Monte Carlo sampling (e.g., the pore limiting diameter and accessible pore volume), applicable more generally to crystalline porous materials (e.g., MOFs, zeolites). The workflow enables users to specify filtering criteria that determine whether a structure is considered porous or meets a given porosity threshold. For example, a candidate structure might be required to (i) exhibit a pore limiting diameter greater than a predetermined value for the given sorbate molecule and (ii) possess a probe-occupiable accessible volume fraction exceeding a set percentage threshold.
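A filter encoding criteria (i) and (ii) might look like the following sketch (the dictionary keys and thresholds are illustrative; real values come from the Zeo++ output):

```python
# Illustrative porosity filter combining the two example criteria:
# (i) pore limiting diameter larger than the sorbate's diameter, and
# (ii) probe-occupiable accessible volume fraction above a threshold.
def passes_porosity_filter(pore_data, sorbate_diameter,
                           min_volume_fraction=0.1):
    """Return True if the structure meets both porosity criteria.

    `pore_data` is a mapping of pore characteristics (here with
    hypothetical key names) extracted from a Zeo++ run.
    """
    return (
        pore_data["pore_limiting_diameter"] > sorbate_diameter
        and pore_data["accessible_volume_fraction"] > min_volume_fraction
    )
```

Structures failing either criterion are discarded before any expensive quantum mechanical refinement is attempted.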
Structures that pass the initial Zeo++ screening but are far from equilibrium are then subjected to coarse geometry optimization using methods such as MLIPs and semi-empirical DFT. Following this, the workflow permits the user to reapply Zeo++, using the same or new criteria, for further filtering. The MOF candidates emerging from these steps are then refined using higher-level quantum mechanical computations, analogous to those employed in the MatPES workflow. An initial structural relaxation is conducted with PBE-D4, and the converged wavefunctions obtained at this stage are used to initialize an r2SCAN-D4 relaxation. These wavefunctions are then further employed in an r2SCAN-D4 single-point calculation at the optimized geometry, yielding well-converged properties such as phonons and the bulk modulus.
Furthermore, an additional module can be plugged in using MOFid,146 which aims to extract the individual building blocks of the MOF (e.g., organic linkers and metal nodes). This functionality may be employed either for post-analysis or as an additional filtering step based on user-defined rules.
The fundamental improvements introduced in atomate2 have facilitated the expansion of the software's scope to encompass a broader array of calculators, including machine-learning-based calculators, and to include a larger set of workflows. The framework also allows for workflows that employ a mix of different calculators, which may become more commonplace in the future as MLIPs are used prior to accurate physics-based simulations. A combination of DFT and MLIP calculators within a single workflow has already been used to automatically train MLIPs through random structure searches.147 We expect that the capabilities of atomate2 will continue to improve over time. Such potential improvements could simply be the expansion of calculators and workflows, or may additionally include usability improvements such as calculation dashboards and materials design and submission frameworks. atomate2 is designed to support such community extensions through the Python namespace package mechanism. As the user community for atomistic and electronic structure calculations grows and calculation methods continue to evolve, software tools must also adapt to meet the changing needs of this community. The improvements implemented in atomate2 represent a path forward to adapting to and accommodating these changes.
By default, atomate2 retains all inputs and outputs of each calculation (unless the user chooses to delete them), thereby preserving complete data provenance. Additionally, because atomate2 currently uses the FireWorks engine to execute workflows, it automatically inherits FireWorks' comprehensive workflow tracking capabilities. All relevant metadata such as job states (waiting, running, completed), timestamps for each state change, and execution details (compute host, return codes, etc.), are recorded in a database. This approach ensures that results remain reproducible and that users can verify whether a given calculation has been run before. Moreover, the underlying workflow engine provides advanced control features (e.g., automatic error handling, reruns, and duplicate workflow detection) to prevent redundant computations. In summary, atomate2 offers full workflow provenance tracking and management functionality on par with other state-of-the-art frameworks.
To further support users at all levels, atomate2 provides a suite of educational materials ranging from interactive tutorials and comprehensive documentation to workshop resources and community forums that guide both newcomers and experienced researchers alike. The support resources include:
• Interactive tutorials: a growing collection of hands-on examples is available in the official tutorials repository. These walkthroughs cover fundamental workflows, advanced features, and practical tips, helping users grasp essential concepts step by step. https://github.com/materialsproject/atomate2/tree/main/tutorials.
• Comprehensive documentation: the primary documentation portal serves as a centralized resource for best practices, API references, configuration details, and troubleshooting guidance. Its modular structure allows users to dive into topics of interest at their own pace. https://materialsproject.github.io/atomate2/.
• Workshop resources: for those seeking more intensive training, materials from the recent CECAM workshop provide in-depth presentations, interactive demos, and hands-on exercises that illustrate real-world scenarios. The workshop page on the CECAM site offers a detailed overview of session topics and supplementary materials. https://www.cecam.org/workshop-details/automated-ab-initio-workflows-with-jobflow-and-atomate2-1276.
Accompanying workshop materials are also available via LHuMoS:
https://lhumos.org/collection/0/680bb4d7e4b0f0d2028027ce.
https://lhumos.org/collection/0/680bb4d3e4b0f0d2028027c9.
https://lhumos.org/collection/0/680bb4d0e4b0f0d2028027c5.
https://lhumos.org/collection/0/680bb4c7e4b0f0d2028027c1.
• Community support: new users and veteran practitioners alike can seek help through GitHub issues (https://github.com/materialsproject/atomate2) and the MatSci public forum (https://matsci.org/c/atomate/atomat2/55), where an active community of developers and collaborators regularly share insights, answer questions, and foster ongoing improvements to atomate2. These combined resources ensure that anyone from newcomers building their first automated ab initio workflow to advanced users exploring complex, customized jobs can efficiently get started and receive the support they need to succeed with atomate2.
This journal is © The Royal Society of Chemistry 2025