Evan Xie a, Xijun Wang *b, J. Ilja Siepmann c, Haoyuan Chen d and Randall Q. Snurr *b
aDeerfield Academy, 7 Boyden Lane, Deerfield, Massachusetts 01342, USA
bDepartment of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, USA. E-mail: snurr@northwestern.edu; wangxijun1016@gmail.com
cDepartment of Chemistry and Chemical Theory Center, University of Minnesota, 207 Pleasant Street SE, Minneapolis, Minnesota 55455, USA
dDepartment of Chemistry, Southern Methodist University, Dallas, Texas 75275, USA
First published on 17th July 2025
Generative artificial intelligence (AI) is emerging as a powerful tool for advancing the design of nanoporous materials such as metal–organic frameworks, covalent–organic frameworks, and zeolites. These materials have potential applications in important areas such as carbon capture, catalysis, gas storage, chemical separation, and drug delivery due to their modular, tunable structures, and their performance in these areas depends on precise control over their structure, chemical functionalities, and properties. Herein, we review the generative AI algorithms being applied to the design of nanoporous materials, namely generative adversarial networks, variational autoencoders, diffusion models, genetic algorithms, reinforcement learning, and large language models. Some models are particularly good at generating diverse and high-quality designs, while others excel at exploring large design spaces or optimizing materials with desired properties. Certain algorithms also allow for efficient transitions between different designs, and some offer versatility in generating materials based on textual input. We discuss the advantages, limitations, and applications of these algorithms in porous material design and emphasize the future potential of integrating AI with experimental workflows to accelerate the development and validation of AI-generated materials.
Nanoporous materials,9 such as activated carbons and zeolites, play a central role in a variety of important processes, including adsorptive separations and heterogeneous catalysis. Zeolites are crystalline framework materials made from interconnected rings of silicon (or other atoms in tetrahedral sites) and oxygen atoms. They are widely used in petroleum refining, air separation, and other separations.10 Activated carbons, by contrast, are amorphous materials with a high surface area and tunable porosity, commonly employed in gas purification, water treatment, and energy storage applications.11 In the past 25 years, several classes of new nanoporous materials have emerged in which the materials are synthesized from well-defined building blocks. For example, metal–organic frameworks (MOFs) are synthesized from metal nodes and organic “linkers” that connect the metal nodes. Covalent–organic frameworks (COFs) are constructed from organic molecules linked together by strong covalent bonds. Due to the building-block synthesis approach, a wide variety of MOFs and COFs can be synthesized, and it is possible to tune properties such as their porosity,12–14 surface area,15,16 and topology.17–19 These attributes make them ideal candidates for various applications contributing to clean energy solutions and environmental sustainability.
For example, nanoporous materials are being developed for storage of hydrogen20 and methane21 and for carbon dioxide capture22,23 and other molecular separations.24–26 In catalysis, metal atoms in MOF nodes or decorated in these frameworks can serve as active sites to catalyze various chemical reactions, including hydrogenation, oligomerization, and electron donor–acceptor reactions.27–30 Additionally, the porous structures of these frameworks allow for the loading of drugs into their cavities.31 By modifying the pore sizes, topology, and surface chemistry, the release rate of the encapsulated drugs can be finely tuned, ensuring sustained and controlled drug delivery over time.32–35
The immense application potential of nanoporous materials has motivated tremendous efforts to accelerate their discovery using ML. These efforts have successfully predicted gas adsorption,36–38 catalytic,39,40 thermal,41 and electronic properties42,43 for various families of nanoporous materials.44–47 However, ML in this field relies on large, labeled datasets for model training. Acquiring such datasets can be challenging and resource-intensive due to the inherent complexity of porous materials, especially when considering that performance metrics may require predictions at a range of temperatures, pressures, and adsorbate compositions. Additionally, traditional ML models struggle with generalizing beyond the data they are trained on, making it difficult to efficiently explore the vast chemical space or generate new materials with targeted properties. In contrast, generative models have shown great promise in mitigating these challenges, either by rapidly generating a large number of new materials beyond the training data for further screening or by purposefully designing new materials with desired properties. This enables more efficient exploration of the vast material space with reduced sampling requirements and thereby facilitates material design, where desired properties directly guide the generation of suitable material structures.48 This approach is particularly compelling for porous frameworks, given their modular nature, which allows for precise tuning of building blocks to achieve targeted properties.
The remainder of this review is organized as follows. First, we present six generative AI approaches that have shown potential in the design of porous materials. Next, we examine key practical considerations, including data requirements, user-friendliness, and the scalability of these AI approaches. Then, we discuss the challenges and opportunities in applying generative AI to porous material design. We conclude with a summary of key findings and a perspective on the future of generative AI in nanoporous materials design.
Ref. no. | Year | Generative AI method | System studied | Application | Training dataset size | Features used for AI | Performance metrics | Validation methods | Key findings | Limitations/remarks |
---|---|---|---|---|---|---|---|---|---|---|
Ref. 55 | 2020 | ZeoGAN (WGAN-GP variant) | Pure silica zeolites | Methane adsorption | 31 173 methane-accessible zeolites | Energy grids (methane potential energy), material grids (Si and O positions) | Methane heat of adsorption: 18–22 kJ mol−1 | Molecular simulations (classical); comparison with IZA/PCOD databases | Demonstrated design of zeolites with specific methane adsorption properties; generated 121 new crystalline materials | Limited to pure silica zeolites; requires significant computational resources and cleanup steps for connectivity |
Ref. 72 | 2021 | Supramolecular variational autoencoder (SmVAE) | MOFs | Separation of carbon dioxide from natural gas | 45 000 MOFs | Representation of MOFs in RFcode (composed of edges, vertices, topologies) | CO2 uptake, CH4 uptake, CO2/CH4 selectivity for natural gas separation; CO2 uptake, N2 uptake, CO2/N2 selectivity for flue gas separation | Comparison of top performing MOFs with well-known MOFs and zeolites reported in previous literature | Demonstrated effectiveness of automated design process of MOFs using SmVAE; identified top-performing MOF with CO2 capacity of 7.55 mol kg−1 and a CO2/CH4 selectivity of 16.0 | Hard to compare performance of top materials with literature, since experimental measurements are done at different conditions |
Ref. 69 | 2023 | Cage-VAE | Porous organic cages (POCs) | General application | 1.2 million structures (after data augmentation) | Tri-topic precursor (BB1) skeletons, di-topic precursor (BB2) skeletons, reaction type | Validity, novelty, uniqueness, precursor validity, number of reaction sites, symmetry | MD simulations for stability validation; PCA analysis of latent space; manual inspection for shape-persistency | Successfully generated novel shape-persistent POCs with the Tri4Di6 topology using latent space traversal | Limitations in predicting shape persistence accurately; limitations in exploring diverse reaction types without model adjustments |
Ref. 82 | 2024 | DiffLinker (diffusion model) | MOFs | CO2 capture | 78… | Molecular fragments from MOF linkers in the hMOF dataset | CO2 adsorption capacity threshold (high performing if > 2 mmol g−1 at 0.1 bar); validity, synthesizability (SAscore, SCscore), uniqueness, internal diversity for MOF linker evaluation | Interatomic distance check; pre-simulation check; structural validation using MD simulations; property validation using GCMC simulations | Identified six AI-generated MOFs with CO2 adsorption capacities >2 mmol g−1 at 0.1 bar, outperforming 96.9% of MOFs in the hMOF dataset; combines generative modeling, AI prediction, and molecular simulations to screen 120… | Generated linkers occasionally failed valency checks, requiring additional filtering; steps become more computationally intensive |
Ref. 80 | 2024 | ZeoDiff (based on DDPM workflow) | Pure silica zeolites | Methane adsorption | 63… | Three-dimensional grids composed of energy, silicon, and oxygen channels | Structural validity; geometric uniqueness (102 geometrically unique structures); chemical properties: void fraction (target values 0.05, 0.1, 0.15, 0.2, 0.25), Henry coefficient, heat of adsorption (15, 20, 25 kJ mol−1) | Post-processing for correct Si/O ratio and accurate connectivity; chemical property distribution analysis; comparison with ZeoGAN model; test of conditional generation with user-desired properties | Generates valid zeolite structures 2000 times more effectively than GAN; successfully generated novel zeolite structures, including those with user-desired properties | Model efficiency is limited by the slow sampling speed of diffusion models; challenges with generating Henry coefficient-optimized structures; applicability to other porous materials might require increased data dimensionality |
Ref. 107 | 2016 | Genetic algorithm | MOFs | Precombustion carbon capture | 51… | Chromosome representation of MOFs (using 6 integers for 6 features) | CO2 working capacity; CO2/H2 selectivity; adsorbent performance score (APS) | Test of GA robustness by identifying hMOFs with highest gravimetric and volumetric surface areas and methane working capacity; experimental synthesis and testing of top-performing MOFs; GCMC simulations; performance comparison of newly generated MOFs with previously identified MOFs | Identified a list of 50 top-performing MOFs; synthesized MOF NOTT-101/OEt, which achieved a CO2 working capacity of 3.8 mol kg−1 (highest ever under studied conditions at the time) and CO2/H2 selectivity of 60; GA reduced computational time by over 99% compared to brute-force search | Limited identification of high-performing MOFs through structure–property relationships from GA; future work could extend the GA to more complex applications and larger databases |
Ref. 81 | 2025 | MOFFUSION – composed of vector quantized-variational autoencoder (VQ-VAE), diffusion model, and a MOF constructor | MOFs | General application | Dataset of 247…; total database contained 605 different topologies, 432 metal nodes, 51 organic nodes, and 220 organic edges | Signed distance function (SDF) for MOF representation | Hydrogen working capacity (WC) – target values: 5, 15, 25, and 35 g L−1; largest cavity diameter (LCD) – target values: 5, 15, 25, and 35 Å; void fraction: 0.6; surface area: 5000 m2 g−1 | MOF generation comparison with previous models SmVAE and MOFDiff; validation of hydrogen WC through GCMC simulations | Demonstrated the effectiveness of using SDF for MOF representation. MOFFUSION showed a structural validity of 81.7%, outperforming SmVAE and MOFDiff models. Shows MOFFUSION's ability to process diverse input data formats | MOFFUSION faces challenges with extrapolation; struggles to generate structures with the desired target property when there is limited data |
Ref. 109 | 2021 | Multispecies genetic algorithm with fitness approximation (MSGA-FA), combined with artificial neural network for property prediction (MOF-NET) | MOFs | Methane storage | Over 100 trillion hypothetical MOFs | Topology, building block information (consists of edge building blocks and node building blocks) | Methane working capacity (high performing working capacity > 180 cm3/cm3) | GCMC simulations through in-house GPU code and RASPA software | Successfully identified 964 MOFs with methane working capacities exceeding 200 cm3/cm3, with 96 of them surpassing the existing world record of 208 cm3/cm3; demonstrated the ability of a systematic approach (evolutionary algorithm + ML + MOF constructor) for efficient screening of MOFs | Computational resources required for iterative GA cycles; further exploration needed to understand correlation between building blocks and MOF performance |
Ref. 110 | 2021 | Genetic algorithm (GA) combined with machine learning model (MOF-NET) and a flexible cost function | MOFs | Xenon/krypton (Xe/Kr) separation from used nuclear fuel | 245… | Used PORMAKE to generate hypothetical MOFs. Consists of node building block (NBB), edge building block (EBB), topology | Xe/Kr selectivity; xenon and krypton Henry coefficients | Molecular simulations to see the impact of framework flexibility; RASPA simulations; polymorphic simulations | Discovered two viable MOFs with record-breaking Xe/Kr selectivity; demonstrated their model can also incorporate fine-tuned targeting of user-desired properties | High computational cost due to iterative GA cycles; prediction capability of machine learning model decreases with higher selectivity values |
Ref. 111 | 2016 | MOF functionalization GA (MOFF-GA) | MOFs | Postcombustion CO2 capture | 1.64 trillion structures | Chromosome representation of MOFs using parent MOF and functional group code (FGC) | CO2 uptake capacity (>3 mmol g−1 at 0.15 atm and 298 K); surface area; parasitic energy | Validation of MOFF-GA on a set of 48 experimentally characterized MOFs; evaluation of CO2 uptake with GCMC simulations | Discovered an average of 3.7-fold increase in CO2 uptake for 141 optimized MOFs; demonstrated effectiveness in finding top-performing structures with minimal sampling | Some structures may be difficult or impossible to synthesize |
Ref. 122 | 2024 | Deep reinforcement learning | MOFs | Direct air capture (DAC) of CO2 | 646… | Combination of organic linkers (using SELFIES representation), metal clusters, and topologies | CO2 heat of adsorption (>30 kJ mol−1); CO2/H2O selectivity (>1); validity, scaffold, and uniqueness of generated MOFs | Molecular simulations for generated MOF validation; structural feasibility tests through synthetic accessibility score and topological RMSD | Successfully designed structures with high CO2 affinity (heat of adsorption > 40 kJ mol−1) and CO2/H2O selectivity (>1); revealed distinctive features in top-performing structures | Relies on large training dataset, which requires a tradeoff between computational cost and predictive accuracy; limited experimental validation of results. Note: integrated two predictive models, one optimizing CO2 heat of adsorption, the other CO2/H2O selectivity |
Ref. 142 | 2023 | ChatGPT-based workflow with ChemPrompt engineering | MOFs | Use of LLMs as chemical research assistant through text mining and data analysis | 228 MOF peer-reviewed papers (to extract 26… | 18… | Precision (>95%), recall (>90%), F1 scores (>92%) for text mining; accuracy (87%) and F1 score (92%) in determining MOF crystalline state based on synthesis conditions | Manual verification of results; use of training/test sets for model predictability; comparison of predicted crystalline states with experimental results | Introduces an AI-driven workflow using ChatGPT to efficiently mine, analyze, and present MOF synthesis data; successfully predicts MOF experimental crystallization outcomes; introduces a data-driven MOF chatbot | Difficulties in accurately determining volumes/concentration of chemicals; limited by factors such as token count and paragraph segmentation |
Ref. 139 | 2023 | GPT-4-based reticular chemist | MOFs | Guided discovery and synthesis of MOFs | Not applicable | Leverages features like MOF structures, synthesis parameters, properties, and literature data to guide prompt engineering and in-context learning for GPT-4 | Accuracy, validity, precision of the GPT-4 answer/suggestions | Experimental validation (NMR, XRD, etc.) | Demonstrates that iterative human-AI collaboration can accelerate material discovery and optimization. Successfully discovered and synthesized four new isoreticular MOFs (MOF-521 variants) | Performance is reliant on human feedback for learning; challenges with advanced analytical tasks, such as detailed topological analysis of MOF structures, are beyond GPT-4's capabilities |
Ref. 140 | 2024 | GPT-based ChatMOF system (GPT-4, GPT-3.5-turbo, and GPT-3.5-turbo-16k) | MOFs | Search, prediction, and generation of MOFs with user-desired properties | MOFs from CoRE MOF and QMOF databases | ChatMOF uses 4 categories of tools. Searcher: uses MOFs from CoRE MOF and QMOF. Predictor: Utilizes MOFTransformer with features like bonds, atoms, surface area, and topology. Generator: applies a genetic algorithm based on topology and building blocks. Utilities: Leverages LangChain for file search, internet search, calculations, etc. | Accuracy: 96.9% (search) 95.7% (prediction), 87.5% (generation tasks); RASPA simulations for generated structures | Computational simulations; manual verification of results; accuracy analysis | Demonstrated the versatility of LLMs in predicting, generating, and searching for MOF structures based on user input | Constrained by token and computational limits in LLMs; scarcity of specialized data; need for experimental validation of the generated MOFs |
Fig. 2 (a) Basic architecture of a GAN, featuring two neural networks: the generator and discriminator, which work adversarially to generate realistic data. (b) Overview of the ZeoGAN model. Energy (green) refers to the potential energy for methane adsorbate molecules, and material grids indicate silicon (yellow) and oxygen (red) atoms. Adapted with permission from ref. 55. Copyright 2020 American Association for the Advancement of Science (AAAS).
In the context of porous material design, GANs are known for their ability to produce highly realistic samples.50,56 The generator proposes new frameworks meeting specific criteria, such as optimal pore size,57 chemical stability,51 or surface area, while the discriminator ensures that these proposed designs resemble real frameworks. This adversarial setup allows GANs to explore expansive chemical spaces and generate novel porous frameworks that might be overlooked by human intuition. For example, Kim et al.55 developed a zeolite GAN, named ZeoGAN, to generate pure silica zeolite structures (Fig. 2b). The input features for training include material grids representing fixed silicon and oxygen atom distributions, and energy grids representing the methane–host interaction potential derived from classical force fields. The workflow of ZeoGAN involves feeding structured grids into the generator, which attempts to create realistic zeolites while the critic (the counterpart of the discriminator in a Wasserstein GAN) evaluates their plausibility. The model iteratively refines its outputs using adversarial training. In this work, the Earth mover's distance (EMD),58 which represents the minimum cost required to transform one probability distribution into another, is used to quantify the difference between the distribution of generated data and that of the training data. The goal of optimizing EMD is to make the generated data distribution increasingly similar to the training data distribution, ensuring that the generated samples are realistic and physically meaningful. Using this approach, trained on 31 173 methane-accessible zeolites, ZeoGAN generated 1 million potential structures. After screening for proper bond connectivity and maintaining the correct Si:O ratio, eight unique zeolites were identified that were not present in the training dataset, suggesting that ZeoGAN generated structures beyond the scope of its training data. ZeoGAN was further refined to generate structures with specific user-desired properties, by biasing its learning process to generate materials within a specific heat of adsorption range (18–22 kJ mol−1), resulting in 121 feasible zeolites with the desired adsorption properties.
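To make the EMD concrete, the following is a minimal sketch for the simplest case of two equal-size, one-dimensional samples, where the EMD reduces to the mean absolute difference between sorted values. It is an illustration only and not the ZeoGAN implementation, which operates on three-dimensional grids through a learned critic; the function name `emd_1d` and the heat-of-adsorption values are made up for this example.

```python
def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1D samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-weight samples in 1D, the cheapest transport plan pairs the
    # i-th smallest point of one sample with the i-th smallest of the other.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical heats of adsorption (kJ/mol), invented for illustration.
training = [18.0, 19.0, 20.0, 21.0, 22.0]
generated_far = [10.0, 11.0, 12.0, 13.0, 14.0]
generated_near = [18.5, 19.5, 20.5, 21.5, 22.5]

print(emd_1d(training, generated_far))   # larger: distributions differ
print(emd_1d(training, generated_near))  # smaller: distributions are close
```

A generator whose samples move from the "far" set toward the "near" set lowers the EMD, which is the sense in which minimizing it pulls the generated distribution toward the training distribution.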
GANs offer significant flexibility in porous material design because of their ability to learn and model complex data distributions. Unlike traditional methods such as descriptor-based regression models, which assume relatively simple structure–property relationships, GANs can adapt to a wide variety of data patterns. For instance, Mao et al.53 leveraged GANs to design 2D porous materials with optimized isotropic elastic properties by generating configurations based on crystallographic symmetries and porosity constraints. They constructed datasets representing different symmetry groups, each containing around one million configurations with varying pixel matrices, Young's modulus, and isotropy. By training GANs on these various datasets, they produced 400 configurations that achieved over 94% of the theoretical maximum Young's modulus across different porosities, demonstrating the ability of GANs to generate near-optimal designs without extensive trial-and-error.59
While GANs have been successfully used for designing materials with relatively simple compositions, such as zeolites (especially all-silica zeolites),53,55,60 their application to more complex materials like MOFs and COFs remains challenging. The primary difficulty stems from the significant structural diversity of these materials, as traditional GAN architectures struggle to capture the vast range of topologies, bonding patterns, and coordination environments present in MOFs and COFs.61 Unlike zeolites, these materials incorporate a wide variety of atom types, metal–ligand interactions and the complexity of organic molecules, which GANs find difficult to encode in a latent space and accurately reconstruct during generation. Another fundamental challenge lies in mode collapse, a well-known limitation of GANs, where the model tends to generate only a limited subset of structures rather than fully exploring the diverse chemical space. Given the complexity of MOFs and COFs, this issue is exacerbated as the model struggles to balance long-range periodicity with local coordination constraints, often leading to unrealistic or repetitive frameworks.
To mitigate these challenges, some studies have used advanced versions of GANs, such as deep convolutional GANs (DCGANs), to better manage these complexities. For example, Long et al.51 developed a constrained crystal DCGAN (CCDCGAN) whose deep convolutional layers progressively extract hierarchical features from the input data.62 Early layers focus on simple geometric details, such as edges or corners, while deeper layers learn more complex representations, such as the spatial arrangements and symmetries that define crystal lattices. This layered approach enables the model to capture both local bonding environments and global structural characteristics. The CCDCGAN further incorporates constraints directly into the generative process, ensuring that the generated structures meet thermodynamic stability and symmetry requirements. By embedding these constraints, the model not only adheres to physical and chemical principles but also explores a broader latent space to identify novel configurations. This combination of hierarchical feature learning and constraint integration allows CCDCGAN to overcome the limitations of traditional GANs in capturing the vast structural diversity and complex connectivity of porous materials.
We note that traditional GANs also face challenges with training instability, where the generator and discriminator fail to converge properly,63 or with mode collapse, where the generator fails to capture the full diversity of the target distribution and repeatedly produces only a limited subset of samples.64 These issues also hinder the discovery of new materials that may differ significantly from the training data, such as MOFs and COFs with similar building blocks yet different topologies. To mitigate these challenges, some studies52,55,65 have adopted Wasserstein GANs (WGANs),66 which replace the traditional GAN loss function with the EMD introduced earlier. This leads to more stable training and helps the model converge more effectively.
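As a rough illustration of how the EMD enters the training objective, the sketch below writes the two Wasserstein GAN losses as plain functions of critic scores. This is a simplified sketch under stated assumptions: the scores are invented numbers standing in for the outputs of a neural-network critic, the function names are hypothetical, and the Lipschitz constraint that a real WGAN enforces on the critic (by weight clipping or a gradient penalty) is omitted.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Loss minimized by the critic: mean score on fakes minus mean score on reals."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """Loss minimized by the generator, which tries to raise the critic's score on fakes."""
    return -fmean(fake_scores)


# Invented critic outputs for a batch of real and generated structures.
real_scores = [1.0, 2.0, 3.0]
fake_scores = [0.0, 0.5, 1.0]

# With a well-trained, Lipschitz-constrained critic, the negative critic loss
# serves as an estimate of the EMD between the real and generated distributions.
emd_estimate = -critic_loss(real_scores, fake_scores)
print(emd_estimate)
```

Because the critic outputs an unbounded score rather than a probability, its loss keeps providing a useful gradient even when generated samples are easy to tell apart from real ones, which is one reason this formulation tends to train more stably.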
The training objective of VAEs is to maximize the Evidence Lower Bound (ELBO):
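For reference, the ELBO is commonly written in the form below. The notation is an assumption made here, since the symbols are not defined in the surrounding text: x denotes an input structure, z its latent vector, q_φ(z|x) the encoder, p_θ(x|z) the decoder, and p(z) the prior over latent vectors (typically a standard normal distribution).

```latex
\mathcal{L}_{\mathrm{ELBO}}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\left[ \log p_\theta(x \mid z) \right]
  - D_{\mathrm{KL}}\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The first term rewards accurate reconstruction of the input from its latent vector, and the second term is a regularizer that keeps the encoder's distribution close to the prior.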
Fig. 3 (a) Basic architecture of a VAE with an encoder-decoder structure for molecular or material design. Adapted from ref. 71. Licensed under CC BY 4.0. (b) Automated porous framework discovery platform using the supramolecular variational autoencoder (SmVAE). Reprinted from ref. 72, with permission from Springer Nature Copyright 2021.
One of the major advantages of VAEs is their ability to create a smooth and continuous latent space, which makes it easier to explore new material structures and discover materials with specific properties. This latent space represents the complex, high-dimensional data of material structures in a simpler, lower-dimensional form. The continuous nature of this latent space is particularly beneficial for exploring and interpolating between different material designs. Additionally, optimization in the continuous latent space is more tractable than optimizing discrete structures, as it allows for the use of gradient-based methods. In contrast, discrete optimization is often challenging due to the combinatorial nature and non-differentiability of the structure space.
A notable example is the supramolecular variational autoencoder (SmVAE) developed by Yao et al.,72 which aimed to design new MOFs with enhanced properties for CO2/N2 and CO2/CH4 separation. The structural training data came from the CoRE MOF 2019-ASR database,73 which contains experimentally synthesized MOFs. The dataset was augmented to approximately two million MOF structures by applying random functionalization to known molecular fragments. The features extracted for input into the model included the MOF edges, vertices (both inorganic and organic), and topologies defining the reticular framework connectivity. Grand canonical Monte Carlo (GCMC) simulations were performed on 45 000 randomly selected MOFs to obtain the gas adsorption properties. Four textural properties (pore-limiting diameter (PLD), largest cavity diameter (LCD), density, and accessible gravimetric surface area (AGSA)) were computed geometrically for these 45 000 structures. The workflow of the SmVAE consists of an encoder that maps discrete framework representations (RFcodes) into a continuous latent vector space and a decoder that reconstructs MOFs from this space. RFcode is an extension of MOFid,74 which is a unique identifier string that encodes the metal node, organic linker, and topology information of a MOF. Similarly, RFcode72 represents the structure as a tuple of edges (represented by SMILES), vertices, and topology of the decomposed MOF. The model was trained in a semi-supervised manner using both structures with known properties (45 000 MOFs) and those without property data (the remaining dataset). A Gaussian Process (GP) model was then trained on the latent space to guide optimization towards structures with improved properties. The optimization was achieved by navigating the latent space and generating new MOFs predicted to have superior CO2 separation capabilities. Using this approach, the SmVAE successfully identified candidates with high CO2 capacity and selectivity, with the top-performing MOF achieving a CO2 capacity of 7.55 mol kg−1 and a selectivity of 16.0 for CO2/CH4 separation, making it strongly competitive against the best performing materials in the literature for this separation.
In a related study, Zhou et al.69 developed a VAE called Cage-VAE, specifically designed for generating porous organic cages (POCs). Cage-VAE encodes the structural features of existing POCs into a continuous latent space, effectively capturing their geometric and stability characteristics. By sampling different points in the latent space of the model, the authors found that Cage-VAE was highly effective at creating new POCs, particularly in biasing the generation process toward a specific desired property, such as shape persistence, which refers to the ability of a cage to retain its three-dimensional geometry without collapsing. Cage-VAE achieved a high success rate for producing valid, novel, and unique POC structures, with validity, novelty, and uniqueness scores all exceeding 0.900. Here, validity refers to the proportion of chemically valid molecules, as determined by whether the generated SMILES strings can be successfully parsed into molecular graphs. Novelty measures the fraction of valid molecules that do not appear in the training dataset. Uniqueness represents the proportion of valid molecules that are non-duplicated within the generated batch. Additionally, the study incorporated advanced techniques like Bayesian optimization and spherical linear interpolation to explore the latent space more efficiently, demonstrating how VAEs, when integrated with other ML methods, can enhance the targeted design of functional materials by guiding generative processes toward specific chemical and structural goals.
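The three scores defined above can be computed as in the following minimal sketch. It follows the definitions given in the text and is not taken from the Cage-VAE code. The helper `toy_is_valid` is a hypothetical stand-in that only checks for balanced parentheses; a real workflow would instead test whether a cheminformatics toolkit can parse each SMILES string. The example strings are likewise invented.

```python
def toy_is_valid(smiles):
    """Hypothetical validity check: non-empty string with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def generation_metrics(generated, training_set, is_valid):
    """Return (validity, novelty, uniqueness) for a batch of generated strings."""
    valid = [s for s in generated if is_valid(s)]
    if not generated or not valid:
        return 0.0, 0.0, 0.0
    validity = len(valid) / len(generated)  # valid / all generated
    novelty = sum(s not in training_set for s in valid) / len(valid)  # unseen / valid
    uniqueness = len(set(valid)) / len(valid)  # distinct / valid
    return validity, novelty, uniqueness


# Invented batch: one duplicate ("CCO") and one unparsable string ("C(C").
generated = ["CCO", "CCO", "C1CC1", "C(C", "CCN"]
training_set = {"CCO"}

validity, novelty, uniqueness = generation_metrics(generated, training_set, toy_is_valid)
print(validity, novelty, uniqueness)
```

Note that novelty and uniqueness are both normalized by the number of valid molecules, so a model that produces many invalid strings is penalized only through the validity score.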
Another advantage of VAEs is their stability during training. Unlike GANs, which often require extensive hyperparameter tuning, VAEs tend to converge consistently because of their well-defined loss function. This loss function balances how well the model reconstructs the original data with a regularization term that shapes the structure of the latent space. As a result, VAEs are less likely to experience issues like mode collapse, which is a common problem with GANs where the model fails to capture the full diversity of the training data. Furthermore, the latent space created by VAEs allows researchers to generate new structures with combined or intermediate properties.
In recent years, variants of VAEs have been increasingly applied to assist porous materials design. For instance, Sun et al.47 developed a VAE-like encoder-decoder architecture within a meta-learning framework to extract structural fingerprints of nanoporous materials and predict their hydrogen adsorption behavior. Their study leveraged high-throughput MC simulations to generate adsorption data for a diverse set of materials, including MOFs, hyper-cross-linked polymers (HCPs), and zeolites, across a broad range of temperatures and pressures. By encoding the adsorption loading surface into a latent fingerprint representation, their model enabled accurate prediction of hydrogen uptake while circumventing the limitations of traditional adsorption isotherm fitting approaches. Instead of training separate models for different materials, the authors developed a single meta-learning model that generalizes across material classes and effectively predicts their hydrogen adsorption performance, demonstrating improved accuracy and transferability compared to conventional methods.54
A common problem with VAEs is an insufficient disentangling effect. This issue arises when the VAE learns a latent space where multiple factors are entangled or overlapping in a single latent dimension, making it difficult to control or interpret specific features of the data. This happens because the VAE's decoding process is probabilistic, which can blend different features together and smooth out important details.75 In the context of materials design, this means that the VAE may not be able to differentiate between subtle variations in properties like chemical composition, pore structure, or topology required for practical applications.76,77 As a result, additional refinement steps, such as further computational or experimental validation,71,75,78 may be required to ensure that the generated materials meet the desired performance and exhibit the clearly defined and controllable structural and chemical features necessary for real-world synthesis and application.
Fig. 4 (a) Overview of the diffusion model, which begins with random noise and iteratively denoises the input through learned probabilistic transitions to generate outputs resembling the original data distribution. (b) Graphical representation of the diffusion process for zeolite generation using ZeoDiff. Adapted from ref. 80. Licensed under CC BY 3.0. (c) Model architecture of MOFFUSION. Within MOFFUSION, a denoising 3D U-Net is used for the diffusion process. Adapted from ref. 81. Licensed under CC BY-NC.
This gradual corruption encodes the data into a form that is easy to model statistically but retains traces of the original structure. Next, the reverse process learns to reverse the noise addition by iteratively denoising the data to recover the original distribution. Using a trained neural network, the model predicts the noise added at each step and refines the data accordingly. The reverse process can be approximated as:
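A common way to write this step, given here in standard DDPM notation (the symbols below are assumptions rather than the authors' own), is a Gaussian transition whose mean and covariance are predicted by the neural network with parameters θ:

```latex
p_{\theta}(x_{t-1} \mid x_{t}) = \mathcal{N}\!\left(x_{t-1};\ \mu_{\theta}(x_{t}, t),\ \Sigma_{\theta}(x_{t}, t)\right)
```

Here x_t denotes the noisy sample at step t, and sampling proceeds from t = T down to t = 1.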
In generative discovery, DMs have been shown to create high-performing, complex material structures, including MOFs. For example, Park et al.82 utilized a diffusion model named DiffLinker to generate chemically diverse MOF linkers for enhanced CO2 capture. The model was trained on the hMOF dataset,83 which contains 137652 hypothetical MOFs with geometric features and adsorption data for various gases. The training data included high-performing MOF linkers, which were extracted and decomposed into molecular fragments serving as input features. DiffLinker employed a generative diffusion process, where Gaussian noise was iteratively added to the molecular fragments and then removed through a denoising network, enabling the generation of chemically diverse and unique linkers. These linkers were subsequently assembled with pre-selected metal nodes (Cu paddlewheel, Zn paddlewheel, Zn4O nodes) into MOFs with a primitive cubic (pcu) topology. To evaluate these AI-generated MOFs, the study employed a comprehensive screening workflow that included MD and GCMC simulations. This process ensured that the MOFs not only met structural validity and stability requirements but also demonstrated high CO2 adsorption capacities. Among the generated candidates, six MOFs exhibited CO2 adsorption capacities exceeding 2 mmol g−1 at 0.1 bar pressure and room temperature, outperforming 96.9% of the MOFs in the reference dataset.
Researchers have also worked to enhance the robustness of DMs by combining them with other generative algorithms, such as VAEs. For example, the Crystal Diffusion Variational Autoencoder (CDVAE) was introduced by Xie et al.84 in 2021 to generate realistic 3D periodic structures of stable crystalline materials. They integrated a VAE with a diffusion model, specifically a noise conditional score network (NCSN), by encoding material structures into a latent space and using the NCSN in the decoder to refine noisy structures (a process that predicts adjustments needed to move towards a stable state) through Langevin dynamics. This integration embeds physical inductive biases, such as energy minimization and bonding preferences, ensuring that the generation process respects stability constraints and invariances, thus improving model robustness. Since then, it has been adapted for various applications. For example, Lyngby et al.85 adapted CDVAE to generate 2D materials, training it on 2615 known stable materials. Their model predicted 11630 new 2D materials, many of which were more complex than the training examples. Among these, over 8500 materials were found to be chemically stable, with formation energies within 0.3 eV per atom of the convex hull (reference energy), and over 2000 were potentially synthesizable, within 50 meV per atom of the convex hull. In another study, Pakornchote et al.86 employed a different approach called the denoising diffusion probabilistic model (DDPM) in the diffusion model component of the CDVAE. They found that this modified model generated structures that were closer to their true ground states, as predicted by DFT, with an improvement of around 68.1 meV per atom compared to the original CDVAE.
One reason that DMs are effective is that they can introduce diversity in the generated samples, which is crucial for discovering materials that might be overlooked by human intuition. Park et al.80 developed a diffusion model named ZeoDiff to generate all-silica zeolites. ZeoDiff significantly outperformed a previously developed GAN model, ZeoGAN,55 in terms of structural validity, achieving a 2000-fold increase in the ratio of valid to total generated structures. Specifically, after post-processing, only 0.0008% of the structures generated by ZeoGAN were valid, whereas ZeoDiff achieved a validity rate of 1.83%, highlighting its enhanced capability in producing physically realistic and synthesizable materials. ZeoDiff introduces diversity in the generated samples through its stochastic diffusion-denoising process. Its workflow begins with a representation of zeolite structures as three-dimensional grids composed of energy, silicon, and oxygen channels (Fig. 4b), akin to RGB channels in image processing. These grids are progressively noised and then denoised by the model to generate new, realistic zeolite frameworks. To ensure the validity of generated structures, a post-processing procedure corrects atomic connectivity and Si/O ratios, further refining the outputs. Using this approach, ZeoDiff successfully generated a variety of complex zeolite structures that were previously unknown. Among the 183 generated structures, 84 were entirely new and featured unique geometric properties (Fig. 4b).
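The progressive noising of a multi-channel 3D grid can be illustrated with a minimal sketch. It is not the ZeoDiff implementation: the linear noise schedule, the 1000 steps, the 32³ grid size, and the random stand-in data are all assumptions chosen for illustration. The sketch uses the standard closed-form expression for sampling a noised grid directly from the clean one.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per diffusion step (assumed schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def corr(a, b):
    """Pearson correlation between two grids, flattened."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

# Random stand-in for one sample: 3 channels (energy, Si, O) on a 32^3 grid.
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32, 32))

x_early, _ = forward_noise(x0, 10, alpha_bar, rng)    # still close to x0
x_late, _ = forward_noise(x0, 999, alpha_bar, rng)    # essentially pure noise
```

A denoising network would be trained to predict `eps` from `x_t` and `t`; generation then runs the learned reverse steps starting from pure noise.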
In another study, Alverson et al.52 compared the performance of Wasserstein GANs, Vanilla GANs, and DMs in generating crystal structures that are both synthesizable and chemically stable, as determined by predicted formation energy using a pre-trained ML model and stability analysis through iterative DFT relaxation calculations. They found that the diffusion model greatly outperformed the GAN models, creating symmetrical and realistic-looking structures that were validated through energy relaxation calculations. Importantly, the DMs did not suffer from mode collapse, a common problem with GAN models where diversity in generated samples is lost. Instead, the DMs produced a wide range of lattice parameters, lattice angles, and space groups. The ability of DMs to effectively process and accurately reconstruct complex data distributions ensures that the generated frameworks not only meet a variety of design requirements but also maintain structural stability.
One challenge for DMs is their high computational cost. Despite offering high fidelity and rich structure generation, training a DM can require several days on multiple high-performance GPUs, with reported carbon emissions reaching ∼9 kg of CO2 equivalent for training alone, and up to hundreds of kilograms for large-scale data generation depending on resolution and sample size.87 Although efficient sampling methods88–90 such as the DDPM88 employed by Pakornchote et al.86 can help reduce some of this cost by speeding up the inference process, the overall computational demands are still significant. For example, when comparing regular DMs, DDPMs, and GANs in image synthesis on the ImageNet 256 × 256 dataset, regular DMs and DDPMs have significantly higher computational demands compared to GANs. Regular DMs require the longest training time—7 million steps—and have the largest model size, with 675 million parameters.91 In contrast, GANs offer the fastest inference time at 0.07 seconds (ref. 92) and the smallest model size, with 166.3 million parameters.93 Although DDPMs are 3× faster than regular DMs, they still require substantial computational resources compared to GANs.93
This challenge has driven researchers to develop innovative approaches that balance computational efficiency and generative performance in DMs. A notable example is the work by Park et al.,81 who developed MOFFUSION, a denoising diffusion probabilistic model for MOF structure generation designed to efficiently explore the vast chemical space of MOFs while ensuring structural validity and tunable properties (Fig. 4c). A key innovation of MOFFUSION is its use of the signed distance function (SDF) representation for MOFs, a mathematical framework that encodes geometric shapes by measuring the shortest distance from any point in space to the nearest surface. SDF provides a highly effective way to describe the intricate pore structures of MOFs, but its high dimensionality and large data volume (32³ grid points) pose significant computational challenges, making it infeasible for conventional DMs to process efficiently. To address this issue, the authors incorporated a vector quantized-VAE (VQ-VAE), a discrete latent representation variant of VAE, for feature compression and latent space mapping. By reducing the input data dimensionality from 32³ to 8³ before feeding it into the diffusion model and subsequently scaling the generated data back up to 32³, this compression-decompression process significantly reduces the computational load. As a result, MOFFUSION enables the efficient processing of a high-dimensional feature space containing diverse modalities of data, including 3D structural, numeric, categorical, and text data, making large-scale MOF generation computationally affordable.
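To make the SDF idea and the 32³ to 8³ reduction concrete, the following sketch samples the signed distance to a single sphere on a 32³ grid and then shrinks it by block averaging. Both choices are simplifying assumptions: a real MOF SDF is computed from the atomic surface of a periodic structure, and MOFFUSION compresses with a learned VQ-VAE, for which average pooling is only a crude stand-in.

```python
import numpy as np

def sphere_sdf_grid(n=32, radius=0.3):
    """Signed distance to a sphere surface, sampled on an n^3 grid in the unit cube.

    Negative inside the sphere, positive outside, zero on the surface.
    """
    coords = (np.arange(n) + 0.5) / n  # voxel centers in [0, 1]
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    dist_to_center = np.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2)
    return dist_to_center - radius

def average_pool_3d(grid, factor):
    """Downsample a cubic grid by averaging non-overlapping factor^3 blocks.

    Assumes the grid edge length is divisible by `factor`.
    """
    m = grid.shape[0] // factor
    blocks = grid.reshape(m, factor, m, factor, m, factor)
    return blocks.mean(axis=(1, 3, 5))

sdf = sphere_sdf_grid(n=32)                   # 32^3 = 32768 values
compressed = average_pool_3d(sdf, factor=4)   # 8^3 = 512 values
```

The 64-fold reduction in the number of grid values is what makes the subsequent diffusion step cheaper; the learned encoder and decoder in a VQ-VAE aim to preserve far more structural detail than simple averaging does.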
DMs also require large amounts of high-quality training data to cover the diversity of materials, typically on the order of tens of thousands of examples.84,94 As introduced by Xie et al.,84 the Perov-5 dataset consists of 18,928 perovskite materials with 56 elements and 5 atoms per unit cell. The carbon-24 dataset95 contains 10,153 carbon-based materials with 6–24 atoms per unit cell, while the MP-20 dataset96 from the Materials Project includes 45,231 materials with up to 89 elements and 1–20 atoms per unit cell. These datasets highlight the scale and diversity needed for training DMs. Datasets for generative discovery of nanoporous materials are often quite limited,97 especially when targeting novel or difficult-to-compute properties. One solution to this challenge is to use data augmentation techniques to expand the training dataset98 or to apply transfer learning, leveraging existing data from related materials.99–101
Fig. 5 (a) Workflow of GA and (b) an example chromosome and the corresponding hMOF structure. Colors help illustrate the correspondence between the genes and the hMOF structural features. Adapted from ref. 107. Licensed under CC BY-NC.
In porous material design (e.g., for MOFs), new individuals are generated by applying genetic operators such as crossover and mutation. Crossover, or recombination, combines the structural building blocks (“genes”) of two parent configurations to create offspring. For instance, a typical crossover involves exchanging structural units between two selected MOFs, creating new combinations of inorganic nodes, organic linkers, and functional groups. Mutation introduces random changes to the offspring to create diversity and explore new regions of the design space.102,103 It occurs with a predefined probability (e.g., 5%) for each gene, in which case the gene (such as the type of metal node, organic linker, or functional group) is altered to a different valid option from the dataset. This introduces structural variations that help the algorithm explore novel MOF configurations and avoid premature convergence to suboptimal solutions. The iterative process of crossover and mutation continues for a fixed number of generations or until a material achieving a desired fitness is found. The inherent parallelism of GAs allows them to evaluate multiple solutions simultaneously,104,105 significantly speeding up the search process, especially when using computationally expensive molecular simulations and DFT calculations to evaluate the fitness of the generated candidates.106 GAs are particularly advantageous when the design space is vast and not easily navigable by traditional methods. In contrast to DMs, GAs rely on simulation-based fitness scoring and do not involve neural network training, which is the major contributor to the carbon footprint of DMs. However, since each GA evaluation involves simulations that may take hours, whether GAs have a lower carbon footprint than DMs ultimately depends on the specific application and computational setup.
The effectiveness of GAs in discovering superior porous frameworks has been demonstrated in various studies. For example, Chung et al.107 used a GA to identify high-performance MOFs for precombustion CO2 capture. As depicted in Fig. 5a, the search space consisted of 51163 unique structures from the hMOF database,108 where each MOF was represented by a chromosome of six integers (Fig. 5b), encoding key structural units such as inorganic nodes, organic linkers, and functional groups. The GA workflow began with an initial population of 100 MOFs, selected to ensure diversity. The algorithm then evolved these MOFs over multiple generations through tournament selection, crossover, and mutation. Crossover was applied with a 65% probability, where a single-point crossover mechanism was used to exchange structural units (e.g., inorganic nodes, organic linkers, and functional groups) between two selected parent MOFs. A random crossover point was chosen along the chromosome, and the genes beyond this point were swapped between the two parent MOFs. This process helped preserve beneficial traits while introducing new combinations. Following this, mutation was introduced with a 5% probability, where one or more structural units were randomly modified. This step enabled the algorithm to explore novel configurations and avoid premature convergence to local optima. In each generation, high-performing MOFs were identified based on CO2 working capacity and CO2/H2 selectivity, evaluated using GCMC simulations. These high-performing MOFs were then recombined and mutated to create new candidates, and the process was repeated for 10 generations. Using this approach, Chung et al. identified and experimentally validated NOTT-101/OEt, a MOF with a CO2 working capacity of 3.8 mol kg−1 and a CO2/H2 selectivity of 60, outperforming previously reported MOFs under the same conditions. 
Additionally, their GA model reduced computational effort by over 99% compared to a brute-force screening of the entire database, demonstrating the efficiency of AI-driven material discovery.
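The workflow described above (six-integer chromosomes, tournament selection, single-point crossover with 65% probability, 5% mutation, a population of 100, and 10 generations) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the number of options per gene, the `TARGET` chromosome, and the toy `fitness` function are hypothetical stand-ins for the GCMC-based evaluation, the mutation rate is applied per gene, and keeping the best individual each generation (elitism) is an added convenience.

```python
import random

GENE_CHOICES = [4, 10, 10, 6, 6, 3]  # hypothetical number of options per gene
TARGET = [3, 7, 2, 5, 0, 1]          # hypothetical "ideal" chromosome

def fitness(chrom):
    """Toy score: number of genes matching TARGET (stand-in for a GCMC result)."""
    return sum(g == t for g, t in zip(chrom, TARGET))

def random_chromosome(rng):
    return [rng.randrange(n) for n in GENE_CHOICES]

def tournament(pop, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng, p_cross=0.65):
    """Single-point crossover: swap the genes beyond a random cut point."""
    if rng.random() < p_cross:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chrom, rng, p_mut=0.05):
    """Replace each gene, with probability p_mut, by a different valid option."""
    out = chrom[:]
    for i, n in enumerate(GENE_CHOICES):
        if rng.random() < p_mut:
            out[i] = rng.choice([v for v in range(n) if v != out[i]])
    return out

def run_ga(pop_size=100, generations=10, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    best_per_gen = [max(map(fitness, pop))]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: carry over the best
        while len(children) < pop_size:
            c1, c2 = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = children[:pop_size]
        best_per_gen.append(max(map(fitness, pop)))
    return max(pop, key=fitness), best_per_gen

best, history = run_ga()
```

In a real study, each call to `fitness` would trigger a molecular simulation, so the number of distinct chromosomes evaluated (at most 1100 in this toy setting) is the quantity that determines the computational savings over exhaustive screening.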
In another instance, Lee et al.109 employed genetic algorithms to explore over 100 trillion potential MOFs for methane gas storage. By utilizing GCMC simulations and Artificial Neural Networks (ANN) to assess the working capacity of these MOFs, their algorithm successfully identified 964 MOFs with methane working capacities exceeding 200 cm3/cm3, with 96 of them surpassing the existing world record of 208 cm3 (gas at STP)/cm3 (MOF). Lim et al.110 used a similar approach, combining genetic algorithms with GCMC and ANN, to identify two MOFs that outperformed the current benchmark for xenon/krypton separation. Moreover, their research enhanced the genetic algorithm by considering additional properties such as the cost and selectivity of the frameworks, demonstrating its capability not only to identify optimal materials but also to ensure the practical applicability of MOFs.
Collins et al.111 developed a GA-based approach, named MOFF-GA, to optimize functional groups within MOFs for enhanced CO2 capture. Focusing on experimentally characterized MOFs, the algorithm employs tailored crossover and mutation schemes to efficiently explore the vast search space of possible functional group combinations. This approach was applied to 141 parent MOFs, resulting in 1035 functionalized derivatives with CO2 uptake capacities exceeding 3 mmol g−1 at 0.15 atm and 298 K evaluated using GCMC simulations, outperforming the original MOFs by an average of 3.7 times. Remarkably, the MOFF-GA was effective even when working with a small search space of fewer than 1000 structures.
GAs can be applied to a wide range of material design problems, which makes them versatile tools that can be combined with other ML algorithms for better results. For example, Jennings et al.103 combined an on-the-fly trained Gaussian Process (GP) regression model with a GA. The GP serves as a computationally inexpensive surrogate to predict the energy of candidate materials, significantly reducing the need for time-consuming energy calculations using DFT. This hybrid approach, termed ML-accelerated GA (MLaGA), incorporates two levels of evaluation: the ML-predicted energy for quick screening and DFT calculation for final verification. By allowing the GP model to rapidly eliminate less promising candidates, the MLaGA achieved a 50-fold reduction in the number of required energy evaluations compared to a traditional GA.
It should be noted that in several of the examples described above, the GA is not really generative; instead, the GA was used as an optimization tool on an existing set of structures. However, by combining MOF features in new combinations, it is possible to generate new structures that have not previously been considered. One drawback of GAs is their slow convergence in complex and high-dimensional search spaces.103 Also, since GAs are heuristic, they do not guarantee finding the global optimum. Instead, they rely on stochastic processes that may converge to local minima in the search space.112,113 This heuristic nature requires careful tuning of parameters, such as mutation rate, crossover rate, and population size, to find the right balance between exploring new solutions and refining existing ones.114,115 Poorly chosen parameters may lead to premature convergence, a loss of diversity, or an inefficient search process.116 Additionally, evaluating the fitness of each individual in a population can be computationally expensive, especially when dealing with large populations or many generations. To address this, many recent applications of GAs in materials design integrate surrogate models such as neural networks to predict the performance of generated materials.117–120 This combination reduces the need for costly computational simulations to evaluate material performance, thus lowering overall resource requirements and speeding up the optimization process.
Fig. 6 (a) In RL, an agent learns to make decisions by interacting with an environment, receiving rewards or penalties, and adjusting its strategy through trial and error to improve outcomes. (b) Schematic of the RL framework for generative design of MOFs for direct air capture of CO2. The agent (generator) generates a MOF structure, which the environment (predictor) evaluates to return a reward. The agent uses this feedback to iteratively generate improved MOF structures with desirable properties. Adapted from ref. 122. Licensed under CC BY 3.0. (c) Schematic of the collaborative deep RL system pipeline for optimal digital material discovery, using a 3 × 3 design space of 2D soft and stiff material components. Adapted with permission from ref. 123. Copyright 2021 American Chemical Society.
Mathematically, the agent's goal is to find the optimal policy π* that maximizes the expected cumulative reward:
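In the standard discounted-return formulation (the symbols here are assumptions, since notation varies between texts), this objective is commonly written as:

```latex
\pi^{*} = \arg\max_{\pi}\ \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_{t}\right]
```

where τ is a trajectory of states and actions generated by following policy π, r_t is the reward received at step t, γ ∈ [0, 1] is a discount factor, and T is the length of the episode.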
RL treats the discovery process as a series of interdependent decisions, where each step builds upon the previous one to optimize the overall outcomes. This makes RL well-suited for handling complex, multi-step synthesis or optimization tasks. A key challenge in RL for material design is balancing exploration and exploitation. Exploration seeks novel material configurations, while exploitation refines known high-performing structures. Too much exploration increases computational costs and inefficiency, while excessive exploitation risks missing superior materials. Striking this balance is crucial for optimizing both efficiency and discovery.
Park et al.122 used a deep RL model to design MOFs for direct air capture of CO2. Their RL model consists of two key components: a generator (agent) that proposes MOF structures and a predictor (environment) that evaluates these structures based on their estimated CO2 heat of adsorption and CO2/H2O selectivity. The training data was derived from computationally generated MOFs, constructed using PORMAKE,109 a tool developed by the authors to assemble MOF structures from predefined metal nodes, organic linkers, and topologies. The RL workflow begins with a pre-training phase, where the generator learns how to construct chemically valid MOFs by analyzing a large dataset of MOFs. The predictor is trained separately from GCMC-computed target properties. Once pre-trained, the RL process starts, with the generator sequentially selecting a topology, metal cluster, and organic linker to propose new MOF structures. These structures are then evaluated by the predictor, which estimates their adsorption properties and provides a reward signal to refine the generator's design strategy. To balance the trade-off between exploitation and exploration, the RL model employs a dual-generator system: one biased toward existing high-performance structures and another encouraging novel MOF exploration. The RL process iterates over multiple rounds, each time refining the generator's ability to propose MOFs that meet the dual objectives of strong CO2 adsorption and high CO2/H2O selectivity, a significant challenge due to the strong water affinity of many materials. Their study demonstrated that with each round of training, the generated MOFs increasingly met the desired property criteria. The RL-optimized MOFs exhibited some of the highest reported values for CO2 heat of adsorption (∼62 kJ mol−1) and CO2/H2O selectivity, indicating a strong affinity for CO2 under atmospheric conditions (400 ppm, 1 bar, 298.15 K) for direct air capture (DAC). Further chemical analysis of the generated MOFs revealed distinctive features in top-performing structures, such as Mn and Eu-based metal clusters in MOFs with high CO2 adsorption, and Cu and Zn-based clusters in MOFs with high CO2/H2O selectivity.
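The generator and predictor loop can be illustrated with a deliberately simplified policy-gradient (REINFORCE) sketch. It is not the model of Park et al.: the three slots are chosen independently rather than sequentially conditioned, there is a single generator instead of two, and the option counts, the `BEST` combination, and `toy_reward` are hypothetical stand-ins for the trained property predictor.

```python
import numpy as np

N_OPTIONS = (4, 5, 6)   # hypothetical counts: topologies, metal clusters, linkers
BEST = (2, 0, 3)        # hypothetical building blocks favored by the toy reward

def toy_reward(design):
    """Stand-in for the property predictor: fraction of slots matching BEST."""
    return sum(d == b for d, b in zip(design, BEST)) / len(BEST)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train(n_iters=300, batch=64, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    logits = [np.zeros(n) for n in N_OPTIONS]  # one categorical policy per slot
    for _ in range(n_iters):
        probs = [softmax(z) for z in logits]
        # The "agent" proposes a batch of designs by sampling each slot.
        designs = [tuple(int(rng.choice(len(p), p=p)) for p in probs)
                   for _ in range(batch)]
        rewards = np.array([toy_reward(d) for d in designs])
        advantages = rewards - rewards.mean()  # mean baseline reduces variance
        grads = [np.zeros_like(z) for z in logits]
        for design, adv in zip(designs, advantages):
            for slot, choice in enumerate(design):
                # Gradient of log-softmax w.r.t. logits: one_hot(choice) - probs
                g = -probs[slot].copy()
                g[choice] += 1.0
                grads[slot] += adv * g
        for z, g in zip(logits, grads):
            z += lr * g / batch  # gradient ascent on expected reward
    return [softmax(z) for z in logits]

final_probs = train()
greedy_design = tuple(int(p.argmax()) for p in final_probs)
```

As training proceeds, probability mass shifts toward the building blocks that earn higher rewards, which mirrors the observation that generated MOFs increasingly met the target criteria with each round.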
Zheng et al.124 applied a policy-gradient RL framework to iteratively distribute hydroxyl and epoxide groups on the basal plane of graphene to maximize material toughness. This approach successfully addressed the combinatorial complexity of the problem, achieving optimized designs within a vast solution space of up to 10¹⁶ possibilities. Additionally, RL can incorporate different objectives during its learning process, allowing it to optimize multiple properties simultaneously.125 For example, Sui et al.123 used a deep RL framework to optimize two mechanical properties of complex materials, specifically targeting both stiffness and toughness (Fig. 6c). The authors demonstrated how RL can balance conflicting design objectives and explore vast design spaces efficiently. These studies, although not directly focused on porous materials, demonstrate the efficiency and innovation of RL in multi-objective-driven design.
The process of learning through trial and error, which is central to RL, typically requires a large number of samples or simulations to find an optimal solution.126–129 This issue is further compounded in material design applications, where the state space (i.e., the possible configurations of materials) is extremely large123,130 and the relationship between actions (design decisions) and rewards (material properties) is highly non-linear.131 For instance, the deep RL framework developed by Park et al.122 required extensive computational resources due to the sheer scale of data and iterative training. The generator was trained on 1,540,889 MOFs, validated on 385,223, and tested on 10,000, running for 50 epochs with a batch size of 128. The predictor, trained separately over 100 epochs, relied on ∼33,000 MOFs for CO2 heat of adsorption and ∼24,000 for CO2/H2O selectivity, requiring costly GCMC simulations for data generation. Based on our group's recent benchmarks,132 such simulations take on average 3–4 hours per MOF using the CPU-based RASPA2 code. Even with our recently developed gRASPA code,132 which achieves a 20-fold speedup on a single A100 GPU node, generating these datasets still requires ∼2000 GPU-hours for CO2 heats of adsorption and ∼1500 GPU-hours for CO2/H2O selectivity. The RL phase further increased the burden, with each policy gradient training epoch selecting 8000 MOFs and running over 20 epochs. The repeated evaluations, training cycles, and dependence on high-fidelity simulation data made this RL approach computationally expensive.
Recently, LLMs have been applied to understand and predict material properties, generate new material compositions, and suggest synthesis pathways based on literature and databases. Their versatility, combined with their integration with other generative models, makes them a promising tool for advancing material design. Adapting LLMs for material design can involve fine-tuning them on specialized datasets containing information about relevant material features, such as chemical compositions and desired properties. Another key aspect of adapting LLMs is prompt engineering, where the researcher interacts with the LLM through carefully designed prompts to elicit specific and meaningful responses. By crafting prompts that guide the model's reasoning and knowledge retrieval, researchers can optimize LLM outputs for specific tasks, such as synthesis planning and material property prediction. Once adapted, LLMs can carry out several important tasks within the material design process (Fig. 7a).135 For instance, LLMs can search for known materials and provide detailed descriptions of their structures and properties.136 In this role, LLMs serve as highly sophisticated encyclopedias, offering researchers comprehensive and easily accessible information on existing materials.136–138
Fig. 7 (a) Overview of materials science (Mat. Sci.) LLM requirements for knowledge acquisition and science acceleration. Adapted from ref. 135. Licensed under CC BY 4.0. (b) Schematic of the GPT-4 Reticular Chemist, which includes three states: “ReticularChemScope,” “ReticularChemNavigator,” and the “ReticularChemExecutor.” Each state uses GPT-4 with distinct prompts, operating entirely through natural language, without coding. Adapted with permission from ref. 139. Copyright 2023 Wiley-VCH. (c) Schematic of ChatMOF featuring three core components: agent, toolkit, and evaluator. The agent formulates a plan based on a user query, selects an appropriate toolkit, and the evaluator provides the final response. Adapted from ref. 140. Licensed under CC BY 4.0. (d) Overview of the task-solving process in ChemCrow, which employs an automated, iterative chain-of-thought process to select tools, define inputs, and determine solution pathways. Toolsets in ChemCrow include modules for molecules, safety, reactions, and general-purpose tasks. Adapted from ref. 141. Licensed under CC BY 4.0.
A key challenge in human-AI collaborative materials design lies in enabling AI to effectively learn and utilize existing human knowledge. LLMs have shown significant potential in organizing and interpreting data extracted from the literature. Zheng et al.142 employed prompt engineering to guide GPT-3.5-turbo in automating the extraction of MOF synthesis conditions from scientific publications, addressing the common issue of information hallucination in LLMs. They developed a ChemPrompt Engineering strategy, which integrates principles such as minimizing hallucination through carefully designed queries, providing explicit and structured instructions, and ensuring standardized output formats for reliable data extraction. To achieve this, they constructed a multi-step workflow that enables ChatGPT to parse, filter, and summarize synthesis data with high accuracy. Their approach combined direct summarization of preselected experimental sections, automated classification of synthesis-related paragraphs, and embedding-based filtering to enhance processing efficiency. Applying this system, they extracted 26,257 synthesis parameters for approximately 800 MOFs with an accuracy of 90–99%. The extracted dataset was further used to train a machine learning model that achieved over 87% accuracy in predicting MOF crystallization. Further, they developed a data-driven MOF chatbot capable of answering chemistry-related queries based on literature-derived synthesis conditions and applied it to linker design for water harvesting applications.143 These studies demonstrate how LLMs can be effectively harnessed for automated knowledge extraction and predictive modeling in chemistry without requiring coding expertise, which makes them accessible to researchers who lack programming training.
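The three prompt-design principles named above (discouraging guesses, giving explicit structured instructions, and fixing the output format) can be sketched in a few lines. This is a hypothetical illustration, not the ChemPrompt workflow itself: the field names, the prompt wording, and the example paragraph and reply are invented for demonstration, and no LLM is actually called.

```python
import json

# Hypothetical output schema; the field names are assumptions for illustration.
FIELDS = ("metal_source", "linker", "solvent", "temperature_C", "time_h")

def build_prompt(paragraph):
    """Assemble an extraction prompt with explicit, structured instructions."""
    return (
        "Extract the MOF synthesis conditions from the text below.\n"
        "Answer ONLY with a JSON object with the keys: "
        + ", ".join(FIELDS) + ".\n"
        'If a value is not stated in the text, write "N/A"; do not guess.\n\n'
        "Text:\n" + paragraph
    )

def parse_reply(reply):
    """Validate a model reply; return a dict with exactly the expected keys.

    Returns None for malformed JSON or an unexpected set of keys, so that
    unreliable replies can be re-queried or discarded.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != set(FIELDS):
        return None
    return data

# Invented example paragraph and a hand-written reply in the requested format.
example_text = "The metal salt and linker were heated in DMF at 100 °C for 24 h."
prompt = build_prompt(example_text)
example_reply = json.dumps({
    "metal_source": "N/A", "linker": "N/A", "solvent": "DMF",
    "temperature_C": 100, "time_h": 24,
})
record = parse_reply(example_reply)
```

Rejecting any reply that does not match the schema is one simple way to keep free-form or hallucinated text out of the resulting dataset.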
ChatGPT has also been applied to assist in the design and synthesis of porous materials. For instance, Zheng et al.139 proposed a framework integrating GPT-4 into chemical experimentation to enhance the collaborative dynamic between humans and AI in the synthesis and characterization of MOFs. The system leverages GPT-4's natural language capabilities to streamline complex processes and make design guidance accessible to humans. This collaborative platform is designed to operate in iterative cycles where researchers execute tasks based on GPT-4's suggestions and provide feedback, enabling the model to refine its understanding and recommendations over time. The framework comprises three interconnected phases (Fig. 7b). The first phase, Reticular ChemScope, establishes a detailed research blueprint by breaking the project into manageable activities. The second phase, Reticular ChemNavigator, serves as the central hub, assessing progress and suggesting three possible actions for the researcher to undertake. These suggestions are developed using human feedback, ensuring they align with experimental results. Lastly, the Reticular ChemExecutor offers step-by-step procedural guidance tailored to the selected task, enabling precise execution. The iterative process enables GPT-4 to adapt and learn from both successes and failures, effectively acting as a virtual mentor.
Jablonka et al.144 demonstrated that GPT-3, originally trained on diverse text data, can be fine-tuned for material property prediction. Notable examples involved predicting Henry coefficients, heat capacities, and water stability of MOFs, using datasets as small as hundreds of samples. Notably, in these low-data scenarios, GPT-3 achieved lower prediction errors than conventional ML models.
Another advantage of LLMs in material design is their versatility. LLMs can be fine-tuned for a variety of tasks, ranging from generating textual descriptions of known material structures to predicting the properties of new materials.145 For example, Kang et al.140 developed ChatMOF, an LLM-based system specifically designed for predicting and generating MOFs. They employed the LLM as a central coordinator, facilitating appropriate responses to user requests through three main components: an agent, a toolkit, and an evaluator (Fig. 7c). The agent breaks down queries, decides on the best approach, and selects an appropriate tool from the toolkit. The evaluator then determines if the results are sufficient or if further refinement is needed. The toolkit consists of four categories: Searcher, retrieving information from existing MOF data; Predictor, using the MOFTransformer146 model to predict desired material properties; Generator, applying a genetic algorithm to create new MOFs; and Utilities, handling general tasks like internet queries and calculations. ChatMOF achieves high accuracy rates by leveraging specialized tools for specific tasks: 96.9% for search tasks, 95.7% for prediction tasks, and 87.5% for structure generation. This model represents a significant step toward greater AI autonomy in nanoporous materials design.
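The agent, toolkit, and evaluator pattern can be illustrated with a toy dispatcher. This sketch is not ChatMOF's code or API: the database entries are made up, the planner is a fixed rule rather than an LLM, and the predictor returns a dummy constant in place of a trained model.

```python
# Made-up entries; names and values are placeholders, not real MOF data.
TOY_DATABASE = {
    "MOF-A": {"surface_area": 1200.0},
    "MOF-B": {"surface_area": 3400.0},
}

def searcher(name, prop):
    """Look up a stored value; returns None if the entry is missing."""
    return TOY_DATABASE.get(name, {}).get(prop)

def predictor(name, prop):
    """Placeholder for an ML property model; returns a constant dummy value."""
    return 0.0

TOOLKIT = {"search": searcher, "predict": predictor}

def agent_plan(name, prop):
    """Order in which tools are tried: cheap lookup first, then prediction.

    In an LLM-driven system this plan would be produced by the language model.
    """
    return ["search", "predict"]

def evaluator(result):
    """Accept a result only if a tool actually produced a value."""
    return result is not None

def answer(name, prop):
    """Run the planned tools in order until the evaluator accepts a result."""
    for tool_name in agent_plan(name, prop):
        result = TOOLKIT[tool_name](name, prop)
        if evaluator(result):
            return tool_name, result
    return None, None

known = answer("MOF-A", "surface_area")     # found by the searcher
unknown = answer("MOF-C", "surface_area")   # falls back to the predictor
```

Separating planning, tool execution, and evaluation in this way lets specialized tools handle the numerical work while the language model only coordinates them.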
As a more general tool, Bran et al.141 introduced ChemCrow, a chemistry-focused LLM agent designed to tackle tasks in organic synthesis, drug discovery, and materials design. By integrating 18 expert-developed tools with GPT-4, ChemCrow enhances the LLM's chemistry capabilities (Fig. 7d). ChemCrow successfully planned and executed the synthesis of various compounds, including an insect repellent and organocatalysts, and aided in discovering a novel chromophore. Expert chemists found that ChemCrow outperformed GPT-4 in chemical accuracy, logical reasoning, and response completeness, especially when handling complex problems.
Inspired by these advancements, experimental chemists can begin integrating pre-trained LLM assistants into their lab workflows for tasks such as literature text mining and synthesis planning. For example, Zheng et al.142 used ChatGPT to extract MOF synthesis conditions including temperature, solvent, concentration, and time parameters from published papers without requiring coding expertise, achieving high accuracy through carefully designed prompts. In another study, Zheng et al.139 integrated GPT-4 into the experimental design process to propose actionable synthesis steps and provide step-by-step procedural guidance for MOF preparation. More advanced use cases may involve combining LLMs with lab management or automation tools to suggest experimental designs, plan sequential workflows, or automate documentation, where the LLM acts as an accessible interface translating textual instructions into structured experimental plans, as demonstrated by the ChemCrow framework.141
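As a rough illustration of prompt-based extraction of synthesis conditions, the following sketch builds a prompt that asks for a fixed JSON schema and validates the reply. The prompt wording, the field names, and the hand-written `mock_reply` are assumptions for illustration only; they are not the prompts of Zheng et al., and no LLM service is called.

```python
import json
from typing import Any, Dict

# Hypothetical schema for the four condition types mentioned in the text.
FIELDS = ("temperature_C", "solvent", "concentration_M", "time_h")


def build_prompt(paragraph: str) -> str:
    """Compose an extraction prompt that pins down the output format."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the MOF synthesis conditions from the text below. "
        f"Reply with one JSON object with exactly these keys: {keys}. "
        "Use null for any value the text does not state.\n\n"
        f"Text: {paragraph}"
    )


def parse_reply(reply: str) -> Dict[str, Any]:
    """Parse the model reply and reject it if the schema is not respected."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = [key for key in FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return {key: data[key] for key in FIELDS}


# Hand-written stand-in for what a model might return.
mock_reply = (
    '{"temperature_C": 120, "solvent": "DMF", '
    '"concentration_M": null, "time_h": 24}'
)
print(parse_reply(mock_reply))
```

Requesting null for unstated values, rather than letting the model guess, is one simple way to keep extracted records honest about what the source paper actually reports.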
Despite their powerful capabilities, LLMs pose challenges related to interpretability. The decision-making process within these models is often seen as a “black box,” making it difficult for researchers to understand why a particular material structure was suggested by the model.135 This lack of transparency can be a hurdle in scientific research, where understanding the rationale behind a prediction is often as important as the prediction itself. Furthermore, training and deploying LLMs from scratch is prohibitively expensive for most research groups. A common approach is therefore to leverage pre-trained models such as GPT-4. However, there are two key points to keep in mind. First, these models are typically trained on publicly available data rather than the full body of scientific literature, which often resides behind publisher paywalls. To adapt them for specific materials design tasks, researchers need to input relevant datasets and conduct meticulous prompt engineering. Second, some of these models are not free and operate on a token-based pricing system, meaning that for research topics requiring extensive materials data or involving multiple complex prompts, the associated costs can become substantial. It is also important to note that literature-based data mining with LLMs inherits a reporting imbalance: most published studies report positive results while omitting negative or less favorable ones. This imbalance introduces a “survivorship bias,” potentially skewing the model's understanding of structure–property relationships.147 As a result, the model may overestimate the effectiveness of certain design strategies while overlooking potentially valuable insights hidden in unreported or unpublished data.
Addressing this issue requires careful curation of training datasets, including efforts to incorporate negative results from supplementary materials in published articles, preprints, or experimental databases to improve model robustness and reliability.
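To make the token-based pricing concern raised above concrete, the short calculation below estimates the cost of a literature-mining run. All numbers (corpus size, tokens per paper, and prices per 1000 tokens) are hypothetical placeholders; actual prices vary by provider and model.

```python
def estimate_cost_usd(
    n_documents: int,
    input_tokens_per_doc: int,
    output_tokens_per_doc: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Total cost when input and output tokens are billed at separate rates."""
    input_cost = n_documents * input_tokens_per_doc / 1000 * usd_per_1k_input
    output_cost = n_documents * output_tokens_per_doc / 1000 * usd_per_1k_output
    return input_cost + output_cost


# Hypothetical scenario: 10 000 papers, 8000 prompt tokens and 500 reply
# tokens per paper, priced at 0.01 and 0.03 USD per 1000 tokens.
total = estimate_cost_usd(10_000, 8_000, 500, 0.01, 0.03)
print(f"Estimated cost: {total:.2f} USD")
```

Under these assumed numbers the run costs roughly 950 USD, and the figure scales linearly with corpus size and with the number of prompts issued per paper.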
Some approaches have been adopted to mitigate data limitations. Data augmentation, for example, involves generating new material samples by applying functionalization to existing samples72 or by permuting and combining structural building blocks and topologies to create a vast number of new structures. For instance, the number of possible MOF structures can reach up to 247 trillion.109 This enhances data diversity and improves the generative capability of the models. Similarly, transfer learning takes models pre-trained on large, general-purpose datasets and adjusts them for specific tasks, which can potentially reduce the need for extensive task-specific data. Accelerating the computation of material properties is another promising direction. This can be achieved by developing faster and more accurate force field-based methods (including machine-learned interatomic potentials) or leveraging machine learning models (surrogate models) for direct and rapid property prediction.149,150
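The combinatorial growth behind building-block augmentation can be sketched as follows. The building-block names are illustrative labels only, and the enumeration is deliberately naive: it ignores the connectivity and geometry checks that a real structure builder must apply before a node, linker, and topology can be combined.

```python
from itertools import product
from typing import Dict, Iterator, Sequence


def design_space_size(
    nodes: Sequence[str], linkers: Sequence[str], topologies: Sequence[str]
) -> int:
    """Number of naive node/linker/topology combinations."""
    return len(nodes) * len(linkers) * len(topologies)


def enumerate_candidates(
    nodes: Sequence[str], linkers: Sequence[str], topologies: Sequence[str]
) -> Iterator[Dict[str, str]]:
    """Lazily yield every combination so large spaces need not fit in memory."""
    for node, linker, topology in product(nodes, linkers, topologies):
        yield {"node": node, "linker": linker, "topology": topology}


# Illustrative labels, not a curated building-block library.
nodes = ["Zn4O", "Cu2 paddlewheel", "Zr6"]
linkers = ["BDC", "BTC", "BPDC", "NDC"]
topologies = ["pcu", "fcu"]

print(design_space_size(nodes, linkers, topologies))  # 3 * 4 * 2 = 24
print(next(enumerate_candidates(nodes, linkers, topologies)))
```

Because the count is a product of the library sizes, even modest libraries of a few hundred nodes and linkers across many topologies quickly reach very large numbers of hypothetical structures.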
Scalability is another critical factor in applying these algorithms effectively. Diffusion models, while able to generate chemically viable samples, can require significant computational resources when handling large datasets, with the training process taking multiple GPU days.93 GANs are also resource-intensive, particularly during training, although they become more efficient for generating samples once trained. For instance, Dan et al. introduced MatGAN, which was trained on more than 380 000 inorganic materials. Once trained, MatGAN reached a novelty of 92.5% and a validity of 84.5% when generating more than 2 million samples, demonstrating the model's efficiency in producing viable materials following extensive training.65 GAs are inherently scalable due to their parallel nature, allowing the evaluation of multiple candidate solutions simultaneously. However, their performance may decrease with very large populations or many generations, as the computational cost can become prohibitive. RL can optimize multiple objectives through iterative learning, but the complexity of environments often necessitates a large number of agent–environment interactions and advanced hardware.152 VAEs are somewhat more scalable than GANs, as they can generate new samples even with limited data, though they still benefit from larger datasets for improved performance. LLMs, while highly scalable and able to process large amounts of text data, demand substantial computational resources for training and deployment. As these models grow, the need for resources also increases, which can limit their use for many research groups.
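Metrics such as the novelty and validity quoted above for generated samples can be computed along the following lines. This is a simplified sketch under stated assumptions: novelty is taken as the fraction of generated samples absent from the training set, validity as the fraction passing a user-supplied check, and the definitions used in the MatGAN study may differ in detail. The formulas and the validity rule below are toy placeholders.

```python
from typing import Callable, Collection, Sequence


def validity(samples: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated samples that pass the validity check."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(1 for s in samples if is_valid(s)) / len(samples)


def novelty(samples: Sequence[str], training_set: Collection[str]) -> float:
    """Fraction of generated samples that do not appear in the training set."""
    if not samples:
        raise ValueError("samples must be non-empty")
    known = set(training_set)
    return sum(1 for s in samples if s not in known) / len(samples)


# Toy data: strings stand in for generated compositions.
generated = ["AB", "A2B", "C3", "XX"]
training = {"AB", "C3"}


def toy_is_valid(sample: str) -> bool:
    """Hypothetical rule: treat 'XX' as the only invalid sample."""
    return sample != "XX"


print(validity(generated, toy_is_valid))  # 0.75
print(novelty(generated, training))       # 0.5
```

In practice the validity check would encode chemical rules such as charge neutrality, and the membership test would use a canonical representation so that equivalent structures are not counted as novel.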
A comparative summary of the strengths and limitations of these six generative AI approaches is provided in Table 2 to guide their selection for different material design tasks.
| Generative AI method | Strengths | Limitations/challenges |
|---|---|---|
| Generative adversarial networks (GANs) | • Generates realistic, high-quality structures | • Training instability and potential mode collapse |
| | • Effective at modeling complex data distributions | • Difficulty capturing structural diversity in complex materials like MOFs and COFs |
| | • Conditional GANs can target specific properties | • Requires large datasets and careful hyperparameter tuning |
| Variational autoencoders (VAEs) | • Smooth and continuous latent space for interpolation and optimization | • May fail to generate valid or realistic structures |
| | • Stable and efficient training | • Limited disentanglement in latent representations |
| Diffusion models (DMs) | • Effective at learning complex distributions without mode collapse | • Computationally expensive due to iterative denoising |
| | • Generates diverse and complex structures like MOFs | • Requires large high-quality training datasets |
| Genetic algorithms (GAs) | • No requirement for gradient information | • Convergence can be slow, especially in high-dimensional spaces |
| | • Effective at exploring vast and discrete design spaces | • May converge to locally optimal material structures rather than the global optimum |
| | • Simple concept and relatively easy to implement | • Computationally expensive when combined with simulation-based fitness evaluations |
| Reinforcement learning (RL) | • Enables sequential decision-making for goal-directed design | • Typically requires a large number of samples and evaluations |
| | • Can optimize multiple objectives and incorporate feedback | • Designing effective reward functions can be challenging |
| | • Flexible for integration with experimental workflows | |
| Large language models (LLMs) | • Versatile in tasks such as literature mining, property prediction, and structure generation | • Limited interpretability (“black box” outputs) |
| | • User-friendly via natural language prompts | • Training from scratch is resource-intensive |
| | • Can integrate with other AI models as AI agent or assistant | • Prompt engineering and fine-tuning for specialized tasks can be challenging |
Building on this foundation, the choice of a suitable generative AI algorithm is critical and should align with the specific design task. For instance, DMs are effective for generating high-resolution structures with complex pore architectures, such as MOFs designed for CO2 capture. GAs are well-suited for early-stage exploration of vast design spaces. RL is particularly advantageous for sequential design tasks, as it iteratively refines designs based on feedback. LLMs can streamline literature review, propose initial material structures, and guide synthesis planning based on textual inputs.
As described in the corresponding sections above, different generative models exhibit varying strengths in generating materials with defined target properties (this is sometimes referred to as conditional design or inverse design; in this review, we have simply referred to it as design or material design). A short summary is provided in Table 2. VAEs are well suited for conditional generation due to their continuous latent space, enabling property optimization through latent space navigation.72 GANs can incorporate property conditions through conditional GAN architectures, although training stability remains a challenge.55 DMs can implement conditioning to guide generation toward desired properties but often require large datasets and significant computational resources.80,82 Reinforcement learning inherently supports conditional design by optimizing reward functions defined by target properties, while genetic algorithms impose conditions through fitness functions, acting more as optimization than as true generative conditioning. Large language models can provide conditional outputs via prompt engineering,142 but their application in directly generating material structures conditioned on quantitative properties is still emerging. Improving conditional generation capabilities across these models will accelerate the effective design of materials with tailored functionalities.
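The way a genetic algorithm imposes a property target through its fitness function can be shown with a small, self-contained sketch. Everything here is a toy assumption: a genome is a list of building-block indices, the "property" is simply the sum of made-up block contributions, and the fitness is the negative distance to the target. A real workflow would replace `toy_property` with a simulation or a surrogate model.

```python
import random
from typing import List, Tuple

# Hypothetical contribution of each building block to the property.
BLOCK_VALUES = [0.5, 1.0, 1.5, 2.0, 3.0]
GENOME_LENGTH = 4


def toy_property(genome: List[int]) -> float:
    """Stand-in for an expensive property evaluation."""
    return sum(BLOCK_VALUES[g] for g in genome)


def fitness(genome: List[int], target: float) -> float:
    """Higher is better; zero means the target property is matched exactly."""
    return -abs(toy_property(genome) - target)


def mutate(genome: List[int], rng: random.Random, rate: float = 0.2) -> List[int]:
    """Replace each gene with a random block with probability `rate`."""
    return [
        rng.randrange(len(BLOCK_VALUES)) if rng.random() < rate else g
        for g in genome
    ]


def crossover(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Single-point crossover between two parents."""
    cut = rng.randrange(1, GENOME_LENGTH)
    return a[:cut] + b[cut:]


def run_ga(
    target: float, pop_size: int = 20, generations: int = 30, seed: int = 0
) -> Tuple[List[int], List[float]]:
    """Return the best genome and the best fitness recorded per generation."""
    rng = random.Random(seed)
    population = [
        [rng.randrange(len(BLOCK_VALUES)) for _ in range(GENOME_LENGTH)]
        for _ in range(pop_size)
    ]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        history.append(fitness(population[0], target))
        parents = population[: pop_size // 2]  # elitism: top half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda g: fitness(g, target))
    history.append(fitness(best, target))
    return best, history


best_genome, best_history = run_ga(target=7.0)
print(best_genome, toy_property(best_genome))
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases. The target enters only through `fitness`, which illustrates why this is closer to optimization than to conditioning a generative model.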
To further enhance the material design process, hybrid and ensemble approaches can be adopted. For example, the MOFFUSION model,81 introduced in Section 2.3, combines the generative power of DMs with the dimensionality reduction and reconstruction capabilities of VQ-VAE, making it computationally feasible for DMs to process high-dimensional data. Likewise, LLMs have recently been explored as powerful tools for the early stages of material design, where they can generate initial material concepts by drawing on patterns from large scientific literature and databases.153 Studies have demonstrated that these models can suggest candidate compositions and synthesis routes,154 as well as assist in property prediction.155 Building on this emerging capability, such initial outputs may be further refined using downstream algorithms like genetic algorithms or diffusion models. This combination can leverage the unique strengths of each algorithm to enable innovative solutions. Additionally, LLMs can be trained as AI assistants capable of making decisions, automating the selection of suitable models, and mining datasets tailored to specific applications.140 These hybrid strategies allow researchers to address complex design challenges more effectively.
Another important limitation of current generative AI models for MOFs and COFs is their restricted ability to generate new topologies. Most existing approaches use topologies from the training dataset, focusing primarily on varying building blocks or functional groups. While this strategy enables the generation of chemically valid and potentially synthesizable structures, it limits the discovery of frameworks with novel topologies, which may become a bottleneck in advancing reticular material design. Future improvements could focus on developing models that integrate topology generation as part of the design process. However, given that mathematicians have identified thousands of topologies, a simpler strategy might be to draw on those that are known mathematically but have not yet been realized in MOFs.
In addition, a critical task for generative AI methods is careful selection of appropriate descriptors to distinguish one material from another. Defining relevant evaluation metrics for specific applications to ensure accurate and meaningful results is also critical. For example, in adsorption separations, there is often a tradeoff between selectivity, working capacity, and other properties that should be considered. Finally, establishing an iterative feedback loop between AI predictions and experimental or computational validations is essential for refining models and ensuring reliability. Outputs from generative models can be validated using computational methods such as DFT, MD, or GCMC simulations. In addition, integrating experimental workflows allows researchers to verify the performance of AI-generated materials, enabling continuous improvement of the models over time based on real-world data. This iterative refinement process bridges the gap between computational predictions and practical implementation. Currently, experimental validation rates for AI-generated materials remain low, due to synthesis challenges and stability issues. However, there are successful cases, such as the synthesis of MOF NOTT-101/OEt reported by Chung et al.,107 that demonstrate the promising future of AI-enabled materials discovery and its potential to accelerate the design-to-synthesis process. Improving the translation of generative AI outputs into experimentally accessible synthesis procedures and validated nanoporous materials remains a critical task, and it presents an exciting opportunity to integrate AI design with automated synthesis and high-throughput experimental workflows in the future.
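The tradeoff between selectivity and working capacity mentioned above is often handled by keeping only the non-dominated (Pareto-optimal) candidates instead of ranking on a single metric. The sketch below shows one simple way to do this; the candidate names and values are invented for illustration, and both metrics are assumed to be maximized.

```python
from typing import List, Tuple

# (name, selectivity, working capacity); values are made up.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[str]:
    """Return names of candidates that no other candidate dominates.

    A candidate is dominated if another one is at least as good in both
    metrics and strictly better in at least one.
    """
    front = []
    for name, sel, cap in candidates:
        dominated = any(
            (s2 >= sel and c2 >= cap) and (s2 > sel or c2 > cap)
            for _, s2, c2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


toy_candidates: List[Candidate] = [
    ("A", 10.0, 1.0),
    ("B", 8.0, 2.0),
    ("C", 7.0, 1.5),  # worse than B in both metrics
    ("D", 12.0, 0.5),
]
print(pareto_front(toy_candidates))  # ['A', 'B', 'D']
```

This pairwise comparison scales quadratically with the number of candidates, which is acceptable for shortlists but would need a more efficient algorithm for very large generated libraries.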
Generative AI is changing how nanoporous materials such as zeolites and MOFs are designed and discovered. Looking forward, several promising research directions could significantly advance the field of generative AI in material design. One important focus is to improve the interpretability of generative AI models, particularly for LLMs and deep-learning-based methods. Developing frameworks to explain the reasoning behind generated suggestions will enhance user experience and increase trust in automated design processes. Another exciting direction is integrating generative AI models with experimental workflows in real time, enabling rapid feedback between computational predictions and laboratory results to accelerate material discovery. As these methods become more powerful and user-friendly, they are poised to become a transformative tool to accelerate the discovery and optimization of the next generation of nanoporous materials.
This journal is © The Royal Society of Chemistry 2025