Open Access Article
Diego A. Gómez-Gualdrón *ab, Tatiane Gercina de Vilas a, Katherine Ardila a, Fernando Fajardo-Rojas ab and Alexander J. Pak *abc
aDepartment of Chemical and Biological Engineering, Colorado School of Mines, 1601 Illinois St, Golden, CO 80401, USA. E-mail: dgomezgualdron@mines.edu; apak@mines.edu
bMaterials Science Program, Colorado School of Mines, 1601 Illinois St, Golden, CO 80401, USA
cQuantitative Biosciences and Engineering Program, Colorado School of Mines, 1601 Illinois Street, Golden, Colorado 80401, USA
First published on 4th December 2025
This review critically examines work at the intersection of machine learning (ML) and metal–organic frameworks (MOFs). The modular nature of MOFs enables immense design flexibility and applicability to a wide range of applications. However, the combinatorially large design space also stresses the resource-intensive nature of traditional high-throughput screening approaches. Due to the increasing availability of data in the form of experimental and hypothetical MOF structures and their properties, ML methods have emerged as a promising solution to accelerate MOF discovery, yet successful application of these methods will require strategies that maximize data and resource efficiency. This work surveys approaches to reduce data and resource burdens for MOF property prediction and design through feature engineering, model architecture choices, transfer learning, active learning, and generative models. We also discuss challenges related to data quality and scalability, as well as future opportunities for ML-empowered methods that, up to this point, have primarily focused on MOF adsorption properties. By focusing on efficiency at every stage (from data generation to model inference), we identify future pathways for making ML-aided MOF design more robust and accessible to both theorists and experimentalists alike.
Wider impact: Metal–organic frameworks (MOFs) are materials with the potential to revolutionize numerous areas of research and technology. MOFs are modular materials combining inorganic and organic building blocks. The premise in MOF research is that there are specific building block combinations that can yield breakthrough-enabling properties. The challenge is thus to identify these combinations out of a vast “design space” spanning trillions of possibilities. Since the early days of high throughput computational screening, artificial intelligence and machine learning (AI/ML) have helped explore this vast MOF design space. However, with the recent explosion of all things AI/ML, there is a lot of excitement about the prospect of AI/ML touching nearly all aspects of MOF design and development, but there are also important questions about where or how AI/ML can make the biggest impact. Aiming to help provide such perspective, this review discusses how AI/ML involvement in MOF research has evolved, but with data efficiency as the guiding underlying theme. Data efficiency is an aspect of ML research in MOFs that has not received much attention and has only been implicitly discussed in the past, but that is now coming to the forefront due to the increasingly complex AI/ML models/methods at one's disposal, more ambitious tasks for AI/ML, and the desire to explore new aspects/properties of MOFs.
For more than a decade, computation has sought to help experimentalists navigate the design space of MOFs by predicting relevant properties for as many “prototypes” as possible, so that lab efforts and resources are only directed towards the most promising ones.11 This paradigm is now pervasive in materials science and is known as high throughput computational screening (HTCS). The early vision for HTCS was to exploit the “boom” in computational power to simply automate the prediction of MOF properties using “standard” prediction methods underpinned by classical, quantum, and statistical mechanics (e.g., molecular simulation). However, with so many prototypes, candidate applications, and operating conditions for each application to consider, it became clear that inherently faster prediction methods were needed. Not surprisingly, efforts to predict MOF properties via machine learning (ML) started to emerge soon after the first prominent efforts in MOF HTCS came to light.12–14
Along with other developments in artificial intelligence (AI), the success of AI/ML tools such as ChatGPT is arguably reshaping society, increasing awareness about AI/ML among the broader public, and creating a sense that maybe “anything” is possible with AI/ML. This “hope” surrounding AI/ML has also extended to the field of computational development of materials in general, and MOFs in particular. However, it is important to recognize the “special” circumstances around the development of ChatGPT. For instance, GPT-3 and GPT-4, i.e., the large language models (LLMs) under ChatGPT's “hood,” are believed to have been trained on (at least) 300B tokens (i.e., text-based “data points”) using a large cluster of GPUs and at a cost of millions of US dollars. This is a scale of data and resources that academic research labs do not routinely have access to. For instance, the most ambitious property prediction efforts in MOFs have usually hit a “wall” at around one million structures, even for relatively inexpensive properties to predict, such as methane adsorption or void fraction.15 In other words, while the development of AI/ML is “hungry” for data and resources, academic research labs in the MOF field (and across materials science in general) must adapt to circumstances of data and resource “scarcity.”
With the above in mind, let us note that this review does not aim for an exhaustive listing of the numerous ML efforts that have been reported to date in the MOF field. Rather, this critical review aims to highlight ML efforts in a way that showcases the lead up to current strategies to maximize data and resource utilization efficiency for MOF development. Broadly speaking, these strategies tend to impact one or more of three phases of the ML-based discovery pipeline: (i) the data processing phase, which pertains to the acquisition and preparation of data to be fed to the ML model, (ii) the model training phase, which pertains to the selection of model architecture and the training approach, and (iii) the materials discovery phase, which pertains to the utilization of the ML model to explore the MOF design space. Accordingly, Fig. 1 provides an overview of how the topics discussed in different sections in this review relate to these phases.
y = a₁x₁² = f(x)    (1)
transformed by a set of basis functions, e.g.:
[eqn (2), rendered as an image in the original: f(x) expressed in terms of the basis functions]    (2)
In the case where f(x) is given by eqn (1), knowledge that f(x) only depends on x1 – and what x1 looks like – is an example of feature engineering. On the other hand, knowledge that the functional form of f(x) is quadratic in x1 is analogous to model architecture engineering. Either case is an example of inductive bias that narrows the range of possible ML model solutions, requiring domain knowledge to impose useful assumptions.
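To make the contrast concrete, consider the minimal sketch below (a hypothetical Python illustration; the data, coefficient, and feature count are placeholders rather than anything MOF-specific). Restricting the input to x1 plays the role of feature engineering, while restricting the fit to a quadratic basis plays the role of model architecture engineering; either assumption lets a simple linear regression recover the underlying relationship from modest data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y depends only on x1, and quadratically so (cf. eqn (1))
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))               # five candidate features x1..x5
y = 3.0 * X[:, 0] ** 2 + 0.05 * rng.normal(size=200)

X_feat = X[:, [0]]       # feature engineering: domain knowledge says only x1 matters
X_basis = X_feat ** 2    # architecture/basis engineering: the response is quadratic in x1

model = LinearRegression().fit(X_basis, y)
print(f"learned a1 = {model.coef_[0]:.2f}")          # recovers a1 close to 3.0
```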
Over the last two decades, researchers have investigated how to design ML models for efficient screening of MOFs mostly based on property predictions related to gas storage (e.g., CH4, CO2, H2) and separations (e.g., CO2/H2, CO2/N2); interested readers can refer to in-depth reviews16,17 on this topic. As our focus is on the data efficiency gained through feature and model architecture engineering, we will limit our discussion to the context of CO2 adsorption predictions for MOFs, which is one of the prediction tasks that has remained active since the early MOF days until now. Since features and model architectures are inherently coupled, we distinguish ML models that focus on global statistics from those that focus on local (usually microscopic) statistics.
Fig. 2 (a) Examples of local and global features. The depicted local feature is the revised autocorrelation (RAC, denoted start-scope-Z-depth) function that quantifies the discrete correlation of Z (electronegativity, nuclear charge, topology, covalent atomic radius, and identity for χ, Z, T, S, and I, respectively) between atoms separated by up to depth l. The start index (ligand-centered, metal-centered, and full for lc, mc, and f, respectively) refers to the reference of the RAC summation and the scope index (axial, equatorial, and all for ax, eq, and all, respectively) refers to which neighboring ligand atoms are included in the summation. Reprinted with permission from ref. 24. Copyright 2017, American Chemical Society. The depicted global features show commonly used geometric descriptors. Adapted with permission from ref. 25. Copyright 2020, American Chemical Society. (b) Comparison of machine learning model performance using different architectures and features for CO2 adsorption using the (left) CoREMOF and (right) hMOF datasets. In general, increasingly complex models (from MLPs trained on geometric features (GEO_MLP) to transformers trained on chemical and positional encodings (Matformer)) that learn “long-range” spatial correlations between local features tend to improve model performance. Adapted from ref. 26.
Across many studies,20,21,23 one common theme has been that the relationship between adsorption properties and global statistics is expectedly nonlinear, as evidenced by improved predictions using support vector regressors, decision trees (and related methods), and artificial neural networks (ANNs) compared to linear regression. However, while the fidelity of CO2 adsorption predictions tends to be high at the upper end of tested pressures (0.86 < R2 < 0.96 at pressures greater than 2 bar),18,19 predictions at lower pressures have had room for improvement (0.69 < R2 < 0.84 at pressures below 0.1 bar).19,22 One direction to improve ML performance is to focus on descriptors that characterize higher-resolution information about MOFs, which we describe next. Nonetheless, we emphasize that one benefit of global statistics features is their interpretability, which ultimately informs design principles (e.g., defining structure–property relationships) for MOFs.19,22
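As a minimal illustration of this class of models, the sketch below (hypothetical Python code; the descriptors and target are synthetic stand-ins for geometric features and GCMC-computed CO2 uptake, not data from the cited studies) compares a linear baseline against a nonlinear ensemble regressor trained on global-statistics features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Placeholder global features per MOF (e.g., void fraction, surface area, pore diameters, density)
rng = np.random.default_rng(1)
X = rng.random((1000, 5))
y = 2.0 * X[:, 0] + np.sin(4 * X[:, 1]) + 0.1 * rng.normal(size=1000)   # stand-in for CO2 uptake

for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=300, random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: R2 = {r2.mean():.2f}")   # the nonlinear model typically scores noticeably higher
```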
Along these lines, several researchers have aimed to leverage the molecular structure of MOFs in other ways to improve model performance, leading to the adoption of higher-dimensional features and more varied model architectures. The three-dimensional structure (and atomic properties) of MOFs can be voxelized onto a discrete three-dimensional (3D) grid, with local nonlinear correlations learned through a 3D convolutional neural network (3D-CNN). Froudakis and coworkers30 demonstrated this approach using a voxelized potential energy surface describing the Lennard-Jones interaction between a probe atom and the framework, also showing that the 3D-CNN required two orders of magnitude less data compared to an RF model with geometric features to achieve comparable performance. Relatedly, Lin and coworkers31 showed that 3D-CNNs trained using voxelized features containing Lennard-Jones parameters and partial charges are useful for CO2 adsorption screening.
Alternate model architectures have been proposed that still aim to leverage the structure of MOFs with reduced memory requirements. One approach32 is to featurize the MOF as an unstructured point cloud described by Cartesian coordinates and any atomic properties of interest (e.g., atomic number, electronegativity, van der Waals radius, etc.). Predictions are trained through the permutation-invariant PointNet architecture,33 which extracts point-wise features through MLPs before applying global pooling, and this approach has been shown to improve CO2 uptake predictions at low pressure compared to conventional geometric features. Others have opted to directly enforce local structural correlations by representing MOFs as graphs with atoms as nodes and bonds as edges. Reported graph neural networks, such as the crystal graph convolutional neural network (CGCNN)26 and the atomistic line graph neural network (ALIGNN),34 use atomic properties (e.g., electronegativity, valence electrons, covalent radius, etc.) as node features and bond distances as edge features, then learn how to predict properties via message passing along the graph topology. However, as shown by Cui et al.,26 GNN model learning is biased toward local structural characteristics and CO2 adsorption predictions can be enhanced through learning from global structural awareness (Fig. 2b), e.g., using the attention mechanism popularized by the transformer model35 (discussed further later). In summary, while increasing the input space dimensionality through local statistics is a promising strategy, it is important to also identify the proper model architectures that bias learning towards the types of feature relationships (e.g., spatial correlations) one believes are most relevant for the prediction task of interest.
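For readers less familiar with these architectures, the following is a minimal PyTorch sketch of a 3D-CNN acting on a voxelized MOF input (e.g., an interaction-energy grid plus a partial-charge channel); the grid size, channel count, and layer widths are illustrative assumptions, not the published architectures.

```python
import torch
import torch.nn as nn

class Voxel3DCNN(nn.Module):
    """Minimal 3D-CNN over a voxelized MOF region; sizes are illustrative assumptions."""
    def __init__(self, in_channels: int = 2, grid: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        flat = 32 * (grid // 4) ** 3
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(flat, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                 # x: (batch, channels, L, L, L)
        return self.head(self.features(x)).squeeze(-1)

model = Voxel3DCNN()
uptake = model(torch.randn(4, 2, 32, 32, 32))   # stand-in batch of voxelized MOFs
print(uptake.shape)                             # torch.Size([4])
```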
In 2017, the first exploration of transfer learning for MOFs was reported by Ma et al.40 These authors studied to what extent transfer learning was possible with the prediction of H2 adsorption loadings at high pressure/temperature as the source task, and prediction of H2 adsorption loadings at high pressure/low temperature, CH4 adsorption loadings, and Xe/Kr selectivity as the target tasks. Five simple MOF textural traits were used as model inputs, and target task datasets were about ten times smaller than the source task dataset. All models shared the same MLP architecture, which consisted of two hidden layers. Transfer learning was formally done by keeping the parameters up to the first hidden layer of the target task model the same as in the source task model and optimizing the parameters of the second hidden layer and output layer (Fig. 3a). Indicative of the importance that the source and target prediction tasks are governed by similar MOF traits, the computer-engineered features emerging from the source task proved useful for the H2 and CH4 adsorption prediction target tasks, which averaged R2 values of 0.991 and 0.980, respectively, but not so for Xe/Kr selectivity prediction, for which R2 values averaged around −0.092.
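A minimal sketch of this layer-freezing strategy is shown below (hypothetical PyTorch code; the layer widths, learning rate, and feature count are assumptions, and the training loops are omitted). The first hidden layer carries over the computer-engineered features learned on the source task, while only the later layers are re-optimized on the smaller target dataset.

```python
import copy
import torch
import torch.nn as nn

# Two-hidden-layer MLP over five textural features (widths are assumptions)
source_model = nn.Sequential(
    nn.Linear(5, 32), nn.ReLU(),    # hidden layer 1: transferred and frozen
    nn.Linear(32, 32), nn.ReLU(),   # hidden layer 2: retrained on the target task
    nn.Linear(32, 1),
)
# ... assume source_model has been trained on the source task (e.g., high-T/high-P H2 uptake) ...

target_model = copy.deepcopy(source_model)
for param in target_model[0].parameters():
    param.requires_grad = False     # keep the first-hidden-layer weights fixed

optimizer = torch.optim.Adam(
    (p for p in target_model.parameters() if p.requires_grad), lr=1e-3
)
# training loop on the (roughly ten times smaller) target dataset would go here
```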
Fig. 3 (a) Schematic representation of transfer learning. First (left), an artificial neural network is trained on a source task with a source dataset. Then (right), the parameters of the hidden layers are frozen except for the final hidden layer, which is trained using a target task and target dataset. Adapted with permission from ref. 40. Copyright 2020, American Chemical Society. (b) The pipeline of the self-supervised MOFormer model for representation learning. The tokenized MOFid representation is embedded and augmented with a positional encoding before entering the transformer encoder layers (see right for a schematic of these layers). The learned embedding of the first token is to be used in downstream prediction tasks. Adapted from ref. 42. (c) Pretrained models (here, MOF transformer and ChemBERT) are used as inputs to downstream prediction tasks. For the prediction of proton conductivity using neural networks, input pretrained representations are augmented with embeddings for temperature and relative humidity and only the neural network and embeddings are trained (the pretrained models are frozen). Adapted with permission from ref. 43. Copyright 2024, American Chemical Society.
In 2023, Cooper and Colón41 further examined the efficacy of transfer learning between H2 and CH4 adsorption prediction tasks from the perspective of the similarity (based on either textural properties or topologies) between the MOFs in the source and target task datasets. Not surprisingly, transfer learning worked better (i.e., higher accuracy, smaller dataset size requirements) when the MOF datasets used for the source and target tasks were more similar, e.g., as measured by distance in principal component space. But more interestingly, these authors found CH4 adsorption (and some MOF datasets) to work better as the source task (and as the source MOF dataset) compared to that of H2 adsorption. Thus, their work underlines the importance of choosing source tasks and MOF datasets that are informative for the target tasks, although guidelines to accomplish this goal are not well-established.
Although in the previous examples, the “transfer of knowledge” was done sequentially and explicitly, this transfer can also occur simultaneously and implicitly through multitask learning (MTL). In MTL, which is usually done with neural networks, a single model is trained on various tasks. The first n layers of the model are shared by all the tasks, resulting in internally generated “shared” features that feed into subsequent independent layers, which take each prediction task to completion. In one recent example, Zhang et al.44 showed MTL to result in a more accurate CGCNN to predict various MOF stability metrics (e.g., water and thermal stability, among others) compared to any CGCNN (or any other model) trained on a single stability metric.
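Schematically, MTL amounts to a shared trunk feeding independent task-specific heads trained under a joint loss, as in the hypothetical sketch below (layer sizes, task names, and data are placeholders, not the published CGCNN).

```python
import torch
import torch.nn as nn

class MultiTaskMLP(nn.Module):
    """Shared layers feeding one head per task (illustrative sizes and task names)."""
    def __init__(self, n_features: int = 64, tasks=("thermal", "water", "activation")):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                                   nn.Linear(128, 64), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(64, 1) for t in tasks})

    def forward(self, x):
        shared = self.trunk(x)        # internally generated "shared" features
        return {t: head(shared).squeeze(-1) for t, head in self.heads.items()}

model = MultiTaskMLP()
out = model(torch.randn(8, 64))       # stand-in batch of MOF feature vectors
loss = sum(nn.functional.mse_loss(out[t], torch.randn(8)) for t in out)   # joint loss over all tasks
loss.backward()
```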
MOF atoms and bonds were originally presented to transformers based on graph-like representations of the whole MOF (MOF transformer, Uni-MOF) or a representative MOF unit (MOFnet). Atom identities and bond topologies are also present in string representations, such as the SMILES of MOF secondary building units used in MOFid (MOFormer). Additionally, complementary global features meant to summarize pore structure have been added to the transformer either directly (e.g., MOFnet with void fraction, surface area, largest pore diameter, and other textural properties) or indirectly (e.g., MOF transformer with flattened representations of adsorption energy grids created via molecular mechanics calculations within the MOF unit cell).
Transformers are well-suited to create foundation models because they easily allow the creation of “data-abundant” self-supervised learning tasks that allow each MOF atom and/or bond feature, through the trainable attention operations, to focus on understanding the “context” in which it exists within the MOF. For example, the attention mechanism in the transformer could be trained to predict the identity and/or properties of a masked (i.e., hidden) atom given the identity and/or properties of other atoms in the MOF (as in MOFormer, see Fig. 3b). Nevertheless, supervised learning tasks can also be added to further influence what aspects of their environment atoms and bonds pay more attention to. For instance, looking for the influence of MOF global structural aspects, MOF transformer used predictions of topology and void fraction as part of the transformer training, where the prediction of multiple properties by the model indicates the exploitation of the MTL approach discussed at the end of Section 3.1.
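The masked-token objective can be sketched in a few lines (hypothetical PyTorch code; the vocabulary, tokenization, masking rate, and model sizes are placeholders and do not reproduce MOFormer itself): tokens of a string-based MOF representation are randomly hidden, and the transformer is trained to recover them from their context.

```python
import torch
import torch.nn as nn

vocab_size, d_model, mask_id = 1000, 128, 0       # placeholder vocabulary and [MASK] token id

embed = nn.Embedding(vocab_size, d_model)         # positional encodings omitted for brevity
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
to_vocab = nn.Linear(d_model, vocab_size)

tokens = torch.randint(1, vocab_size, (16, 64))   # stand-in for tokenized MOF strings (e.g., MOFid)
mask = torch.rand(tokens.shape) < 0.15            # hide ~15% of tokens
corrupted = tokens.masked_fill(mask, mask_id)

logits = to_vocab(encoder(embed(corrupted)))      # attention learns each token's context
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])   # predict only the masked tokens
loss.backward()
```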
All the above transformers have shown promise as a starting point for new tasks. In 2024, Han et al.43 kept MOF transformer as-is in a new model (i.e., transfer learning), enabling predictions of proton conductivity about 8% more accurate than training standard ML models from scratch (Fig. 3c). This work additionally suggests a transfer learning strategy with considerable potential that remains little explored to date: namely, the transfer of knowledge from models trained on simulation data to those trained on experimental data. The premise here is that models trained on experimental data are much more appealing, but that generating a data point from experiments is generally more costly and time-consuming than generating one from simulation.
Transformer parameters can all undergo optimization (initialized with the original parameters) for a new task in what is referred to as fine-tuning. In 2024, the Uni-MOF transformer was used as part of a ML model to predict adsorption of multiple molecules in MOFs. The authors showed that fine-tuning the Uni-MOF part (as opposed to training the whole ML model from scratch) led to an approximately 18% increase in accuracy. Nonetheless, fine-tuning of the current MOF transformers can still be outperformed by training of standard ML models using wisely chosen input features, as recently shown by Mao et al.48 for predicting free energy in a set of polymorphic sulfur-based MOFs. This suggests there is still room for developing MOF transformers that generalize better upon fine-tuning. Additionally, some of the current MOF transformers require significant work/expertise/preprocessing to generate their inputs, which hinders their widespread use as a foundation model.
Thus, in 2022, Mukherjee et al.51 reported the first exploration of AL in MOFs, focusing on predicting CO2 and CH4 adsorption in Cu-BTC. Then, in 2023, these authors expanded their efforts to the prediction of CO2/CH4, Xe/Kr and H2S/CO2 mixture adsorption in the above MOF.52 One of the points made in these works was the influence of the initial dataset on final data savings. Interestingly, these authors reported boundary-informed sampling as the best way to choose the initial data points, which is a strategy where heuristics and human expertise can have a significant impact.
The potential data savings AL can achieve are apparent in the 2024 work by Osaro et al.,49 which was directed at the prediction of adsorption isotherms for multiple molecules in MOFs using a single ML model (Fig. 4a). These authors examined the data requirements to train a ML model that uses pressure along with MOF and molecule features to make the relevant adsorption predictions. They reduced the training dataset size by a factor of about 2 when using AL to select the most informative (pressure, adsorbate) combinations for each MOF. A further reduction by a factor of about 500 was reported when AL was used to select the most informative (pressure, adsorbate, MOF) combinations, albeit with some loss in prediction accuracy.
Fig. 4 (a) Schematic of an active learning workflow for alchemical adsorbates. A Gaussian process (GP) regression model is trained on an initial dataset to predict adsorption loading from five input features. The data point from the test set with the largest predicted GPR uncertainty is selected for adsorption calculation then added to the training dataset. The model is retrained and the loop continues until the uncertainty is below 0.05 mol kg−1. Adapted from ref. 49. Copyright Royal Society of Chemistry. (b) Schematic of the regression tree active learning (RT-AL) workflow. During each cycle of training, new samples are selected via regression tree leaves with high uncertainty (based on variance) and the ratio of unexplored data points. A separate random forest is trained using the tailored training set for MOF property prediction. Reprinted with permission from ref. 50. Copyright 2024, American Chemical Society.
In all the above-mentioned works, the AL cycle (i.e., selection of training points) was driven by Gaussian processes (GPs), even if in some cases the final trained ML model was not itself a GP. Because GP predictions are inherently accompanied by a measure of uncertainty, GPs are a natural choice for AL in many fields. However, GP training becomes computationally intractable after a few thousand data points, which probably means that widespread application of AL in MOF research will require the exploration of GP alternatives that scale better with the number of training points.
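The GP-driven loop of Fig. 4a can be summarized in a short sketch (hypothetical Python code; the labeling function, kernel, candidate pool, and stopping threshold are placeholders for a GCMC calculation and its settings): train a GP, label the candidate with the largest predictive uncertainty, and repeat.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_label(x):                  # stand-in for a GCMC adsorption calculation
    return np.sin(3 * x[0]) + 0.5 * x[1]

rng = np.random.default_rng(0)
pool = rng.random((500, 2))              # unlabeled candidate feature vectors
X, y = pool[:5], np.array([expensive_label(p) for p in pool[:5]])   # small initial dataset
pool = pool[5:]

for _ in range(30):                      # AL cycle: label the most uncertain candidate
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(X, y)
    _, std = gp.predict(pool, return_std=True)
    if std.max() < 0.05:                 # stop once the largest predictive uncertainty is small
        break
    pick = std.argmax()
    X = np.vstack([X, pool[pick]])
    y = np.append(y, expensive_label(pool[pick]))
    pool = np.delete(pool, pick, axis=0)
```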
The quantification of uncertainty by repeating predictions with the same input can be extended beyond NNs. Thus, Leverant et al.55 used AL towards the prediction of MD-calculated diffusion coefficients using RFs as the core ML model, and the variance of the predictions from the different trees as the measure of uncertainty. These authors observed the usual improvement in accuracy as training points were added. However, as a reminder that training datasets can be too small even for an AL framework, these authors ran out of training data before desirable accuracies were reached.
An alternative method coined regression tree AL (RT-AL) uses a regression tree (as the core model) that divides the putative feature space into regions, each one associated with a tree leaf (Fig. 4b). The prediction uncertainty for (potential and extant) training points in a given region corresponds to the variance associated with the corresponding leaf. The acquisition function selects a region based on its associated uncertainty and proportion of unexplored points and then randomly draws points from it. As shown by Jose et al.,50 an advantage of RT-AL is that one can use the regression tree to select training points, but then train a more powerful ML model (RFs for these authors) for the actual MOF property prediction task. Working on the prediction of band gaps and CO2 and H2 adsorption, these authors found RT-AL to usually outperform GPs and other AL methods at efficiently constructing the training set. An interesting byproduct of this work was a clear demonstration that the most efficient features for AL (and ML model training) can depend not only on the property to be predicted but also on the size of the training set. This work serves as an important reminder that AL selects training points by navigating a feature space with an efficacy that (at this point) is still contingent on the chosen input features.
In 2015, Bao et al.59 introduced an EA to MOFs by evolving MOF linkers toward high CH4 adsorption, using reaction-mimicking genetic operations. In 2016, Collins et al.60 evolved MOF functionalization towards high CO2 adsorption, while Chung et al.61 evolved MOFs toward high CO2/H2 separation. The latter authors experimentally validated the high predicted performance of an EA-identified MOF, which was found by exploring less than 1% of the target search space. In all of the above studies, the fitness function was assessed using grand canonical Monte Carlo (GCMC) simulations, which can be a rate-limiting step that restricts the total number of generations explored. However, easily computed surrogate models that approximate fitness can dramatically improve throughput. To this end, in 2021, Lee et al.10 combined EA and ANN predictions (as a surrogate for fitness) to explore a presumed search space of 247 trillion MOFs towards high CH4 adsorption. Note that while EAs have hyperparameters, the above studies did not focus on their optimization, but rather on finding incrementally better MOFs than those reported at the time for the application of interest. Thus, there is significant room to improve the efficacy of EAs for MOFs.
Recently, exploring EA efficacy, Pham and Snurr56 studied hyperparameter effects on the search of MOFs for CO2/N2 separation (Fig. 5a). Indicative of the importance of balancing exploitation and exploration in EAs, these authors found the probability of mutation to drastically impact search efficiency. Additionally, supported by a 25-fold reduction in computational cost, these authors proposed the execution of parallel EA runs, each with different initial MOF populations, as a way to improve EA efficiency. Nevertheless, an unsolved issue in EAs for MOF search is the restrictive rules needed to avoid attempting to make nonsensical structures, which hinders pairing EA with on-the-fly MOF construction. A common source of “nonsense” is the incompatibility of EA-proposed building blocks and topology combinations. Thus, a common solution is to restrict EA runs to a particular topology10,56 or base structure.62 This creates inefficiency as MOF topology (a critical MOF trait) is not optimized by the EA, and also precludes the discovery of new MOF topologies.
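For orientation, a bare-bones EA over building-block choices within a fixed topology might look as follows (hypothetical Python code; the genome encoding, surrogate fitness, and hyperparameter values are placeholders, not those of the cited studies). The mutation probability P_MUT is the kind of hyperparameter whose value controls the exploration–exploitation balance discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BLOCKS, GENOME_LEN, POP, P_MUT = 50, 3, 32, 0.2   # P_MUT tunes exploration vs exploitation

def surrogate_fitness(genome):           # stand-in for an ANN surrogate of GCMC-computed performance
    return -np.sum((genome - 17) ** 2)

# Each row encodes building-block choices within a fixed topology
population = rng.integers(0, N_BLOCKS, size=(POP, GENOME_LEN))

for generation in range(50):
    fitness = np.array([surrogate_fitness(g) for g in population])
    # Tournament selection: keep the better of randomly paired individuals
    i, j = rng.integers(0, POP, (2, POP))
    parents = np.where((fitness[i] > fitness[j])[:, None], population[i], population[j])
    # One-point crossover between consecutive parents
    cut = rng.integers(1, GENOME_LEN, POP)
    children = np.array([np.concatenate([parents[k][:c], parents[(k + 1) % POP][c:]])
                         for k, c in enumerate(cut)])
    # Mutation: replace genes with random building blocks at probability P_MUT
    mutate = rng.random(children.shape) < P_MUT
    children[mutate] = rng.integers(0, N_BLOCKS, mutate.sum())
    population = children
```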
Fig. 5 (a) Schematic of a genetic algorithm (a form of evolutionary algorithm) workflow. A series of candidate MOFs are constructed, each represented as a chromosome with labels for topology, edges, and inorganic/organic nodes. After each generation, new candidates are proposed using evolutionary rules (i.e., mutation, crossover, and tournament selection). The process is repeated until the specified objective (e.g., MOF performance) is achieved. Reprinted with permission from ref. 56. Copyright 2025, American Chemical Society. (b) Overview of Bayesian optimization strategy to maximize adsorption property f(x) of nanoporous materials. After evaluation of f(x) for the current candidate, a surrogate model with uncertainty (e.g., Gaussian process) is updated and the next candidate material is selected via an acquisition function, e.g., the candidate that maximizes the upper confidence bound of the surrogate model. Used with permission of Royal Society of Chemistry, from ref. 57. (c) Reinforcement learning framework for property-guided MOF generation using MOFGPT and MOFormer. The reward function assesses the quality of the generated MOF via validity, novelty, diversity, and proximity to the target property. The reward is also used to update the policy model, which then selects the next MOF candidate. Reprinted from ref. 58.
In 2022, Taw and Neaton64 presented the first BO example in MOFs, showing that BO would have found the best MOFs for CH4 adsorption while evaluating fewer than 1% of the target search space. However, standard BO (and standard EAs for that matter) may not account for all aspects relevant to MOF development. For instance, various (potentially conflicting) MOF properties may be important for a MOF application. Thus, in 2023 Comlek et al.65 presented a multiobjective BO framework for MOFs, looking to improve the Pareto front that highlights the tradeoff between CO2 uptake and selectivity. Their key modification was to use the expected maximin improvement (EMMI) acquisition function, which chose MOFs for evaluation seeking to improve whichever of the two objectives was doing worse at decision time. Another consideration is that experimental testing of a presumed best MOF design can fail due to unaccounted-for factors, e.g., the structure may not be stable or the prediction may be wrong. Thus, in 2024, Liu et al.66 developed Vendi BO, aiming to find MOFs with similar (presumed) optimal performance but with different structure and chemistry. Their key modification was to use the Vendi score (a measure of diversity)66 to iteratively eliminate parts of the search space that were too similar to the set of MOFs already evaluated.
As with EAs, there is significant room to improve the efficiency of BO in MOFs through hyperparameter choices53,67,68 (e.g., which acquisition function is used) or making the predictive ML framework (i.e., surrogate model) more accurate. Due to its robustness, a common acquisition function in MOF search is the upper confidence bound (UCB), which balances uncertainty with the improvement of the predicted property by adding some of the (positive) uncertainty to the property prediction for a given MOF. But as the impact of acquisition function choice is underexplored, other functions may be more efficient. For instance, Aqib et al.53 showed expected improvement (EI) to outperform UCB. EI is a function that focuses more directly on improving the property as fast as possible, with the caveat of needing a highly reliable surrogate model. On the other hand, functions such as the previously mentioned EMMI and expected hypervolume improvement (EHVI)—which consider the Pareto front of MOF properties—may facilitate multi-objective MOF optimization despite their higher computational cost.
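The two acquisition functions can be written compactly from the surrogate's predicted mean and standard deviation, as in the sketch below (hypothetical Python code; the predictions, trade-off parameter kappa, and best-so-far value are placeholders). UCB inflates the prediction by a multiple of the uncertainty, whereas EI weighs the expected gain over the current best under a Gaussian assumption.

```python
import numpy as np
from scipy.stats import norm

def ucb(mu, sigma, kappa=2.0):
    """Upper confidence bound: predicted property plus a multiple of its uncertainty."""
    return mu + kappa * sigma

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    """EI: expected gain over the current best, assuming a (reliable) Gaussian surrogate."""
    z = (mu - best_so_far - xi) / np.maximum(sigma, 1e-12)
    return (mu - best_so_far - xi) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([1.0, 1.2, 0.9])        # surrogate mean predictions for three candidate MOFs
sigma = np.array([0.05, 0.30, 0.50])  # surrogate uncertainties
print("UCB picks candidate", ucb(mu, sigma).argmax())                                  # favors high uncertainty
print("EI picks candidate", expected_improvement(mu, sigma, best_so_far=1.1).argmax())
```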
As for the accuracy of the surrogate model, it is inherently tied to MOF feature choices but can also be improved by exploiting ideas similar to hierarchical screening, allowing the ML model to see more data. For instance, Gantzler et al.69 applied multifidelity BO to the search of MOFs for Xe/Kr separation, training the ML framework with many cheaply acquired selectivities based on Henry's constants and fewer expensive selectivities based on adsorption loadings.
Leaning more into the ML side, reward-based methods such as Monte Carlo tree search (MCTS) and reinforcement learning (RL) are also emerging as alternatives to search MOFs. These methods seek sequences of MOF modifications that lead to optimal MOFs, with modifications that result in higher “rewards” tending to be favored (some randomness is allowed to balance exploitation with exploration). Zhang et al.71 used MCTS to find hydrophobic MOFs for CO2 capture. In MCTS, each path through a tree represents a sequence of MOF modification decisions. To construct the trees (i.e., sequence of decisions), these authors predicted the reward (essentially a MOF performance metric) using a recurrent neural network (RNN) to process the SMILES strings used to define MOF linkers.
Kim et al.72 used RL to search MOFs for CO2 capture from air. In their RL framework, the MOFs were represented as a sequence of categorical variables (metal node and topology) and linker SMILES. Candidate MOF representations were generated by a transformer model, which along with a policy-gradient algorithm, acted as the decision-making agent. During the process, the agent decided on the strings to add to the MOF representation to maximize the corresponding predicted reward (either proportional to CO2 heat of adsorption or to CO2/H2O selectivity). During RL (Fig. 5c), through policy updates, the agent learns to make “good decisions.” The rewards were predicted by corresponding neural networks, each using as input the embedding of the MOF representation learned by the transformer. Promising MOF representations found to be “valid” were turned into actual MOF computational prototypes for which properties were calculated by molecular simulation. RL was clearly shown to propose increasingly better MOFs, with the caveat that the requirement of simulated property data for ∼30k MOFs (stated by the authors as necessary to have the predictor ready to initialize RL) may pose challenges for some properties.
As for experimental MOFs, a popular source is the CoRE MOF database, first reported by Chung et al. in 2014,80 and featuring ∼40k structures in a recent update.81 CoRE MOFs are processed versions of MOFs extracted from the more general CSD database82 (which includes its own subset of ∼10k computation-ready MOFs).83 CoRE MOFs are relatively diverse in topologies and inorganic nodes but are not systematically modified (e.g., in functionalization), which may result in “lots of classes but few examples per class,” hindering ML model generalizability. Still, experimental MOFs are appealing because of the (presumed) barrierless transition between computational screening and experimental testing, despite practical concerns such as their general instability (e.g., some authors estimate only 384 of the original CoRE MOFs are stable).84 Perhaps “best-of-both-worlds” efforts aggregating hypothetical and experimental MOFs, such as in the ARC-MOF database (Fig. 6a), are a wise strategy going forward.
Fig. 6 (a) The diversity of MOFs in databases is varied. The top panel shows the probability density of accessible volume fraction and gravimetric surface area across MOFs in each of the listed databases; the numbers indicate the size of each database. Adapted from ref. 89. The bottom panel shows the diversity in organic ligands (green bars) and in metal-centered substructures (pink bars) present in the ARC-MOF dataset. Adapted with permission from ref. 87. Copyright 2023, American Chemical Society. (b) Mining experimental data on solvent removal and thermal stability of MOFs from the literature, as implemented in the MOFSimplify framework. MOF structures from the literature are sanitized and filtered for featurizability, and their associated manuscripts are retrieved and prepared for natural language processing. Text mining is then used to extract mentions of solvent removal stability and thermogravimetric analysis (TGA) data, including digitization of TGA traces from documents containing relevant keywords. Reprinted from ref. 93.
Although property data for experimental MOFs can come from computation or experiment, most of it is also computational, consisting mainly of adsorption data and DFT-calculated partial charges.88,89 Some efforts breaking with this common trend are the ∼20k QMOFs by Rosen et al.,90 which include DFT-calculated band gaps (among other electronic properties), and the NIST/ARPA-E database for experimental adsorption data.91 As typical with experimental data, the latter is less systematically varied but covers a much wider diversity of “classes” than is usual in simulation, such as more adsorbates, pressures, and temperatures.92 The NIST/ARPA-E effort is, however, an example of imminent low-cost opportunities to create repositories for other experimentally measured MOF properties by mining reported data from the MOF literature.
Earlier NLP efforts in MOFs were primarily rule- and pattern-based. For instance, in 2017, Park et al.96 developed a rule-based text mining algorithm that identified surface area and pore volume values by scanning for associated units like “m2 g−1” and “cm3 g−1.” Despite the simplicity of the rule, the method achieved ∼88% accuracy, with most errors stemming from inconsistent formatting or ambiguous naming conventions.
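The essence of such unit-anchored rules fits in a few lines of Python (an illustrative sketch with a made-up sentence and simplified regular expressions, not the published algorithm): numeric values are captured only when immediately followed by the expected unit string.

```python
import re

text = ("The activated sample exhibited a BET surface area of 1820 m2 g-1 "
        "and a pore volume of 0.71 cm3 g-1.")

# Capture numbers immediately followed by the target units
surface_area = re.findall(r"([\d.]+)\s*m2\s*g-1", text)
pore_volume = re.findall(r"([\d.]+)\s*cm3\s*g-1", text)
print(surface_area, pore_volume)   # ['1820'] ['0.71']
```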
A recurrent, primarily rule-based, NLP tool is ChemDataExtractor,97 which was used recently to extract MOF synthesis data for the DigiMOF database,98 water stability information for the WS24 dataset,99 and synthesis procedures for ZIF-8.100 However, Glasby et al.98 only managed to extract synthesis data for 9705 MOFs out of ∼15 000 MOF candidates, whereas Manning and Sarkisov100 extracted data from only ∼20% of the reports, despite their narrow focus on ZIF-8. Relatedly, Terrones et al.99 used the tool to identify candidate sentences in articles for 1092 MOFs out of 5489 articles tied to the CoRE MOF 2019 database, but had to perform manual review to assign water stability classifications to these 1092 MOFs. These cases collectively reflect the Achilles’ heel of rule-based NLP methods: the lack of standardized language in synthesis reporting.
Most recently, ML has been brought into NLP of MOF literature, recognizing the large variability in reporting language. The ML model tends to be in the form of RNNs or transformers, whose sequence-awareness and self-attention mechanisms, respectively, allow them to create context-aware representations of words/tokens. For instance, Nandy et al.93 used Stanza,101 an NLP toolkit based on RNNs, to help analyze nuances in sentences previously processed with ChemDataExtractor as containing information on stability to solvent removal, which was used to generate training data for an ML model predicting this MOF quality (Fig. 6b). In another instance, Park et al.102 complemented rule-based tools with the training of a named entity recognition (NER) model based on SciBERT,103 a transformer-based language model pretrained on scientific text, to mine data for some MOF synthesis aspects from 28 565 publications. However, this effort required extensive manual labeling of hundreds of literature paragraphs.
Literature extraction has heavily focused on MOF synthesis data. The appeal is the quantity of data (after all, every experimental MOF paper should report a synthesis procedure) and the potential use of the data to train ML models to anticipate synthesis outcomes,104 which is crucial to bridge the gap between computation and experiment. But as language variability is exacerbated in synthesis reporting, LLMs such as GPT-4 are emerging as powerful literature extraction tools. Thanks to their immense pre-training, LLMs are better positioned to recognize synthesis procedures with little or no fine-tuning.
To this end, Zheng et al.105 focused on prompt engineering, finding the “right way” to ask GPT-4, so that the LLM would accurately extract and organize synthesis data. Although the approach achieved high accuracy in extracting specific synthesis parameters (with F1 scores of 90–99%), it was intentionally limited to a fixed set of details (such as solvents, temperatures, and precursor amounts) formatted into tables, which constrained its ability to capture more nuanced or varied synthesis descriptions. Building on this prompt-driven approach, the L2M3 (large language model MOF miner) framework106 used a series of GPT-based models to extract a broad range of synthesis conditions and material properties from over 40 000 MOF articles. While it primarily relied on updating prompts to adapt to new tasks, L2M3 also incorporates light fine-tuning for specific tasks within its pipeline to improve performance. This combination improves consistency and task-specific accuracy across large-scale, multi-step extraction workflows, addressing limitations in robustness that pure prompting can face.
Despite persistent challenges with inaccurate or inconsistent reporting, NLP extraction has shown promise by producing unified, large-scale MOF datasets that have been actively used to train ML models predicting synthesis outcomes and material properties.
Other concerns with respect to computed data stem from the calculation method, whose choice is primarily driven by the goal of facilitating large-scale data generation. For instance, for adsorption data, the predominant use of generic force fields (e.g., UFF for MOF atoms) to describe adsorption interactions raises concerns, especially when the key adsorption interactions involve chemisorption. It has been possible to derive DFT-parameterized force fields to properly model particular MOF-adsorbate combinations,110–112 but approaches to correctly describe chemisorption interactions during HTCS are needed. For electronic MOF properties, the concern is tied to the use of DFT as the workhorse to generate data, because strictly speaking, DFT is not adequate to model MOF metals. Still, DFT may be acceptable for certain properties such as partial charges and adsorbate binding energies, but more worrisome for properties such as band gap, which DFT is well-known to underestimate (although somewhat systematically across similar materials).90 The case of MOF electronic properties truly underscores the data scarcity issue in MOFs. Accurate electronic structure calculations via quantum mechanical methods are so expensive in MOFs that alternatives such as ML models are truly desired. Yet such ML models are not easily trainable because the training data is so expensive to obtain.
Relevant to literature extraction, experimental data is not free from concerns, which primarily arise from the variability in quality of both MOF samples and property measurement methods across labs. For example, the variability in reported Brunauer–Emmett–Teller (BET) surface areas for the same MOF may be reflective of material quality variations.113 But regardless of the reasons, variability in measured properties is apparent, for instance, when examining differences across experimentally measured isotherms for the same MOF.114 The obvious question is then: “what is the correct measurement to use for ML?” Empirical correction factors based on perceived MOF quality (as those used by some authors to fairly compare measured and simulated isotherms115,116) might be a first step towards unifying experimental data for a given MOF. But given the importance of mining experimental data from the literature to create low-cost datasets, efforts to standardize reported experimental measurements should be beneficial for the ML endeavor in MOFs.
Chemistry-agnostic models are also compatible with synthetic data generation, which can be used to bypass the data generation bottleneck incurred when one must first find the simulation model parameters that accurately describe a specific chemistry. For the adsorption case, instead of running expensive DFT calculations to fit a force field, one may focus on producing large simulation datasets with a variety of simulation parameters. Moreover, the decoupling from specific chemistry also allows one to choose parameters that are most informative to let the ML model learn more efficiently. Anderson et al.25 used this strategy to create “alchemical” molecules to train a ML model capable of predicting single adsorption isotherms for a variety of real molecules. Fanourgakis et al.122 extended this idea to the creation of artificial MOFs to train a ML model to predict CH4 adsorption. The accuracy achieved in the above works can be partly explained by the synthetic data boosting the interpolation capabilities of the ML models. Nevertheless, the generation of synthetic data beyond MOF adsorption properties is yet to be explored.
For instance, for models that use voxelization of a cubic MOF region as input to 3D-CNNs, this footprint increases as O(L³), where L is the length of the cube. In practice,31,120 the spatial resolution (i.e., voxel size) limits L up to around 3 nm as atomic information intuitively requires ∼1 Å resolution; larger unit cells (e.g., MOFs with lattice dimensions that can go up to ∼170 Å)123 will likely require loss in resolution. For models based on GNNs, this footprint increases as O(n·d²), where n is the number of nodes and d is the size of the feature vector embedded in each node. As nodes usually correspond to MOF atoms, unit cells can reach up to tens of thousands of atoms while embeddings usually have dimensionality on the order of tens to thousands. A look into reducing training costs of GNNs was given by Korolev and Mitrofanov,124 who trained coarse-grained GNNs to predict various MOF properties with promising accuracy. These authors coarse-grained the model by basing the MOF graph on the corresponding topological template and using the pre-established mol2vec embeddings of molecules to indicate which MOF building block was occupying a given graph node or edge.
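A back-of-envelope comparison of the two scalings, using illustrative (assumed) sizes, shows why these footprints matter:

```python
# Rough float32 memory estimates for the two input types discussed above;
# the resolution, channel count, atom count, and embedding size are illustrative assumptions.
bytes_per_float = 4

# Voxel grid: O(L^3) entries for an L = 3 nm cube at ~1 A resolution, with 2 channels
voxels = (30 ** 3) * 2
print(f"voxel input: {voxels * bytes_per_float / 1e6:.2f} MB per MOF")           # ~0.2 MB

# Graph input: node embeddings alone scale with n_atoms x d
n_atoms, d = 10_000, 256
print(f"node embeddings: {n_atoms * d * bytes_per_float / 1e6:.2f} MB per MOF")  # ~10 MB
```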
An example of a generative model is the variational autoencoder (VAE), which is made of an encoder and a decoder. VAEs usually use a neural network as the encoder to learn a continuous representation of, say, MOFs as vectors in a so-called latent space while the decoder (also usually a neural network) is trained to reconstruct, in this case MOFs, from their representation in latent space. A ML property predictor can then be coupled with the encoder to learn the relationship between the latent vectors and the property of interest. With these elements in place, one can simply optimize the latent vector based on the property of interest and use the decoder to recover the corresponding optimal MOF. Yao et al.125 demonstrated the use of VAEs to optimize MOFs to separate CO2/N2 and CO2/CH4 mixtures. As input to the VAE, these authors used a MOF representation based on categorical variables to constrain topology and modular building blocks and self-referenced embedded strings (SELFIES) to represent connecting building blocks.
Denoising diffusion probabilistic models (DDPMs) are also generative models that learn to predict valid MOFs out of “noise.” To train DDPMs, one first iteratively adds (usually Gaussian) noise to valid MOFs until the original representation is reduced to pure noise. Then a neural network (SE(3)-equivariant) learns to reverse the process (i.e., denoise) to recover the valid MOFs. The trained DDPM can thus learn to generate valid MOFs out of randomly sampled noise. Although not used for inverse design, Park et al.126 used a DDPM to generate new linkers but constrained to the isoreticular MOF series, whereas Duan et al.127 expanded this approach to also generate nodes (although constrained to four topologies) and showed the validity of generated structures by synthesizing one of them.
Crucially, Fu et al.128 and Park et al.129 demonstrated the amenability of DDPMs for inverse design by conditioning the learning of the denoising process on a property of interest, which was leveraged to have DDPM generate MOFs with optimal values for said property. Fu et al.128 focused on CO2 adsorption (a numerical property), while Park et al.,129 by jointly training the diffusion model on conditional and unconditional tasks, showed that conditioning can also be done on categorical properties or text-input without training an external classifier (Fig. 7a). The inverse design of Park et al. was focused on “pore surfaces,” which were matched (and thus constrained) to pre-existing building blocks and topologies. On the other hand, Fu et al.128 used a coarse-grained representation of the MOF, in principle generating positions for building block centers unconstrained by topological templates, but still indirectly constrained by the validity of the structure once actual building blocks are denoised and mapped onto the building block center positions.
Fig. 7 (a) Inverse design of MOF via conditional diffusion model. Schematic of a general diffusion architecture that uses single and/or multi-modal conditioning to guide structure generation. The encoder–decoder pair learns to denoise MOF structures guided by the conditioning criteria while an external model transforms the MOF representation into a material structure. Adapted from ref. 129. (b) Machine learning force fields (MLFFs) bridge the accuracy of ab initio methods with the efficiency provided by classical force fields. This combination enables fast and reliable approximations of potential energy surfaces (PESs) to unlock the study of multiple phenomena (e.g., electronic effects, thermodynamics, and reactions). Adapted from ref. 131. (c) The prediction of MOF heat capacities using machine learning models. The left panel compares DFT-computed heat capacities (circles) to those from the classical universal force field (UFF, lines); the dashed lines are results from the metal-linker force constants in UFF scaled by the listed factors. The inconsistent heat capacities computed from the classical model motivate a machine learned model (middle panel) based on the contribution of each atom to the total heat capacity. The correlation between the machine learned model predictions and DFT calculations is shown in the right panel. Adapted from ref. 132. Copyright 2022, Springer Nature.
Seeking higher data-efficiency by eliminating the large-scale pre-training phase of VAEs and DDPMs, Cleeton and Sarkisov,130 in what seems to be an evolution of the naïve approach of optimizing inputs in a property-predictive ML model, proposed deep dreaming (DD). Thus, these authors first trained transformer-inspired ML models to predict target properties. Then DD was applied by freezing the model parameters, reversing the propagation direction, and using gradient ascent to modify the input vector to maximize the target property, with their use of SELFIES-based inputs helping maintain the validity of the proposed input. Nonetheless, this effort was restricted to linker generation.
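The deep-dreaming step itself is compact, as in the hypothetical sketch below (the predictor, input dimensionality, learning rate, and step count are placeholders; decoding the optimized vector back to a valid linker, e.g., via SELFIES, is omitted): the trained model is frozen and gradient ascent is performed on the input rather than on the parameters.

```python
import torch
import torch.nn as nn

# Stand-in for a trained property predictor over a continuous (e.g., embedded SELFIES) input
predictor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
for p in predictor.parameters():
    p.requires_grad = False                      # freeze the model; only the input will change

x = torch.randn(1, 128, requires_grad=True)      # candidate input vector to be "dreamed"
optimizer = torch.optim.Adam([x], lr=0.05)

for step in range(200):                          # gradient ascent on the predicted property
    optimizer.zero_grad()
    loss = -predictor(x).sum()                   # maximize the prediction by minimizing its negative
    loss.backward()
    optimizer.step()
# x now encodes an input the frozen model scores highly; it still must be decoded and validated
```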
The key to MLFF training is to assign a force and a contribution to total energy to each atom in the system based on its atomic environment. To do so, most MLFFs decompose total energy into atom-centered contributions based on descriptors of each atom's surrounding environment, ensuring additivity and size extensivity, as initially demonstrated by Behler et al.133 in 2007 and later refined by DeepMD. Early models like DeepMD use local descriptors that are invariant to permutation, rotation, and translation (local symmetry functions or descriptor-based encodings) as shown in Zhang et al.,137 while more recent architectures such as NequIP and MACE employ message passing and equivariant neural networks to capture directional interactions and preserve these physical symmetries, as shown by Vandenhaute et al.138 and Elena et al.139 These architectures contribute to data efficiency, having demonstrated strong learning performance from relatively small training sets.
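The shared idea can be sketched as a per-atom energy network whose outputs are summed, with forces obtained by differentiating the total energy with respect to positions (hypothetical PyTorch code; the distance-based descriptor and network sizes are crude placeholders for symmetry functions or learned equivariant features).

```python
import torch
import torch.nn as nn

class AtomCenteredPotential(nn.Module):
    """E_total = sum_i E_theta(d_i): one network maps each atom's environment descriptor
    to a per-atom energy; summation makes the model additive and size-extensive."""
    def __init__(self, descriptor_dim: int = 32):
        super().__init__()
        self.atomic_energy = nn.Sequential(nn.Linear(descriptor_dim, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, descriptors):               # (n_atoms, descriptor_dim)
        return self.atomic_energy(descriptors).sum()

model = AtomCenteredPotential()
positions = torch.randn(100, 3, requires_grad=True)
# Crude invariant descriptor: each atom's 32 nearest-neighbor distances (placeholder only)
descriptors = torch.cdist(positions, positions).sort(dim=1).values[:, 1:33]
energy = model(descriptors)
forces = -torch.autograd.grad(energy, positions)[0]   # forces as negative energy gradients
print(energy.item(), forces.shape)
```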
Indeed, while MLFFs can address data generation challenges for MOF properties, MLFF development can face data challenges itself. For instance, the large unit cells, structural flexibility, and hybrid metal–organic bonding in MOFs introduce challenges for both data generation and model transferability, as noted by Eckhoff et al.140 In particular, flexible MOFs with rotating linkers, as discussed by Dürholt et al.141 and Zhao et al.,142 or guest-induced transitions, as discussed by Bucior et al.,117 demand potentials that respect rotational symmetries and long-range interactions. Even with recent advances, applying MLFFs to MOFs still requires domain-specific strategies, such as training on nodes and linkers separately,140 using temperature-driven active learning to reduce DFT sampling, as demonstrated by Sharma et al.,143 and hybrid force fields that integrate classical physics with ML components, as presented by Wieser et al.144 These techniques aim to make MLFFs more than just DFT replacements, enabling them to simulate MOF flexibility, guest diffusion, and even decomposition under real-world conditions, as recently discussed by Castel et al.145
To overcome the scalability bottleneck of training MLFFs directly on full MOF unit cells, fragment-based strategies have emerged that treat chemically meaningful substructures, such as linkers and nodes, as independent learning units.145 This approach allows the development of transferable potentials with reduced data requirements while maintaining fidelity to periodic properties, as shown by Tayfuroglu et al.146 Recent efforts further integrate active learning with fragment selection to prioritize diverse and data-efficient training sets, as demonstrated by Shi et al.147 Although fragment-based models may underrepresent long-range coupling effects, they offer a practical route to generalizable and scalable MLFFs for large and flexible MOFs.
Hybrid ML/classical approaches also enhance data efficiency by embedding ML corrections, like learned charges or dispersion terms, into existing classical force fields to refine interactions without retraining entire potentials, as demonstrated by Thürlemann et al.148 This has been demonstrated in MOFs, where ML models correct non-bonded terms to achieve better electrostatics and van der Waals behavior, as shown by Korolev et al.149 Additionally, hybrid MLFFs that combine neural short-range potentials with classical electrostatics have been shown to achieve near-DFT accuracy for MOF relaxations and phonons.144
Inspired by advances in foundation models for molecules and materials, emerging efforts are exploring pretrained machine learning potentials for MOFs.138 These models are trained on diverse atomic environments to produce generalizable force fields that can be fine-tuned with minimal new data. For example, MACE MP MOF0, which combines pretrained MACE with targeted MOF fine-tuning, enables rapid adaptation to new MOF fragments and accurate phonon and thermomechanical predictions with very little data.139 Though no universal MOF MLFF yet exists, the strategy of pretraining on building blocks, such as nodes, linkers, or secondary building units (SBUs), followed by system-specific tuning has been demonstrated for both porous and flexible frameworks.
The impact of ML potentials extends beyond accurate potential energy predictions, as they can serve as core engines for simulations of dynamic MOF behavior. For example, MLFF-driven MD simulations have been used to explore guest diffusion in MOFs: a NequIP-like neural potential accurately modeled H2 binding and diffusion in open-metal-site frameworks, predicting kinetics and isotherms previously inaccessible via DFT, as recently reported by Liu et al.150 In flexible MOFs, where linker motion and node distortion critically influence framework behavior, MLFFs have been shown to reproduce temperature-driven structural and vibrational changes that classical force fields struggle to capture.138,143 As these models mature and benchmark data improve, MLFF-powered simulations could become indispensable for capturing the full complexity of MOF behavior in real-world scenarios.
To develop MOFs for applications beyond adsorption, prediction of properties beyond adsorption is obviously needed. For instance, diffusion properties are relevant for drug delivery, low thermal conductivity for thermoelectrics,161 high electrical conductivity for electrocatalysis and energy storage,162 high proton conductivity for fuel cells,163 and so on. One significant barrier to generating data for properties beyond adsorption is their usually higher simulation cost. This is obvious if quantum mechanical methods are needed (e.g., electronic structure, bond breaking/formation events), but it can also be the case with classical simulations. For instance, computing diffusion coefficients may require MD coupled with enhanced sampling,164 free energy may require coupling with thermodynamic integration,160 or thermal conductivity may require large supercells to mitigate finite size effects or extended simulations for convergence (e.g., via the Green–Kubo method159). For some MOF aspects, the adequate simulation method may not even be clear (e.g., MOF decomposition or formation) or has not been fully developed.165,166
Based on the above, a combination of simulation advances, literature extraction, and approaches for data-efficient training is likely needed to develop reliable ML models beyond adsorption. Encouragingly, where enough data has been generated by pushing simulation resources or through literature extraction, ML predictions beyond adsorption have emerged with promising results. From simulation data, models to predict mechanical stability,156 heat capacity,132 and diffusion coefficients167–169 have emerged (Fig. 7c). Similarly, the publication of the QMOF database has spurred a number of ML models trained to predict band gaps.42,90,170–172 Although in this case the training data is not fully accurate (i.e., based on DFT), these models could offer a starting point for transfer learning or fine-tuning once more accurate, but probably scarcer, band gap data emerges. Related to this strategy, Rubungo et al. showed that fine-tuning an ML model originally trained on low-cost strain energy was key to achieving accurate ML predictions for high-cost free energy.76 On the other hand, literature extraction has provided data to train ML models to predict thermal93 and water stability,99,173 whereas a combination of literature extraction and fine-tuning enabled ML predictions for proton conductivity.43,174
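To illustrate the fine-tuning strategy in its simplest form (not the actual architecture of ref. 76), the sketch below freezes an encoder pretrained on an abundant, low-cost property and retrains only a small head on scarce, high-cost labels; all dimensions and data are placeholders.

```python
# Minimal transfer-learning sketch: reuse a pretrained encoder for a
# scarce, expensive target by freezing it and retraining only a new head.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
cheap_head = nn.Linear(32, 1)       # e.g., a low-cost strain-energy-like target
# ... assume `encoder` + `cheap_head` were already trained on abundant data ...

for p in encoder.parameters():      # freeze the pretrained representation
    p.requires_grad = False

expensive_head = nn.Linear(32, 1)   # e.g., a high-cost free-energy-like target
optimizer = torch.optim.Adam(expensive_head.parameters(), lr=1e-3)

x_small = torch.randn(50, 64)       # placeholder features for ~50 MOFs
y_small = torch.randn(50, 1)        # placeholder expensive-property labels
for _ in range(200):
    loss = nn.functional.mse_loss(expensive_head(encoder(x_small)), y_small)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(loss))
```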
On a final note, while MOF data has been dominated by adsorption, it has more specifically been dominated by physisorption, even though chemisorption is likely relevant to numerous target applications.175–177 Thus, efforts to extend data generation to chemisorption are needed. Since the cost of adsorption simulations is usually not prohibitive for training data generation, the challenge here lies in accurately describing the interactions. Although accurate force fields have been parameterized for specific adsorbate–MOF cases,178–180 HTCS-compatible (i.e., transferable) force fields that accurately describe chemisorption interactions are still necessary. Force fields aside, an adsorption case for which simulation data generation is notoriously challenging is water,181–185 which will require significant simulation advances or reliance on experimental data as an alternative. Nonetheless, water merits special attention due to its ubiquitous presence in many settings and its direct relevance to applications such as water harvesting.186–188
Given the existence of SMILES/SELFIES, string-based representations used as input to LLMs are also particularly amenable to MOF linker generation. For instance, by fine-tuning GPT-3, Zheng et al.191 generated new candidate linkers for water harvesting. Other representations can facilitate other tasks. For instance, by conveying MOF information in textual document form, Zhang et al.192 used the unsupervised Doc2Vec model to create a MOF representation that was then used to develop a MOF recommendation system. This recommendation strategy, introduced earlier by Sturluson et al.193 and followed by Zhang et al.,194 was inspired by the Netflix movie recommendation system and suggests promising (extant) MOFs for applications of interest by analyzing similarities to user-endorsed MOFs.
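A common post-processing step in such string-based linker generation, sketched below with RDKit, is to discard chemically invalid candidates and canonicalize the rest before any downstream MOF assembly or screening; the example strings are illustrative and not taken from ref. 191.

```python
# Validate and canonicalize generated linker SMILES strings with RDKit.
from rdkit import Chem

generated = [
    "O=C(O)c1ccc(C(=O)O)cc1",                  # terephthalic acid (valid)
    "c1ccccc1C(=O",                            # malformed generator output
    "O=C(O)c1ccc(-c2ccc(C(=O)O)cc2)cc1",       # biphenyl dicarboxylic acid
]

valid = []
for smi in generated:
    mol = Chem.MolFromSmiles(smi)              # returns None for invalid SMILES
    if mol is not None:
        valid.append(Chem.MolToSmiles(mol))    # canonical form for deduplication
print(valid)
```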
LLMs have also been shown to work as assistants coordinating and streamlining computational work. For instance, ChatMOF,195 which integrates GPT-3 and GPT-4 with more specialized ML models (e.g., for property predictions, MOF generation, etc.), has been shown capable of recommending MOF structures for properties of interest. Beyond predictive modeling, and more on the synthesis side, LLMs are increasingly being deployed as "interactive research assistants," capable of orchestrating and accelerating complex experimental workflows. For instance, the GPT-4 Reticular Chemist196 integrates GPT-4 into a cooperative loop with human researchers, where the model proposes synthesis steps, receives outcome feedback, and adapts its guidance through prompt-based in-context learning. This iterative process allows GPT-4 to refine its recommendations much like an experienced chemist. Similarly, the ChatGPT Chemistry Assistant105 employed prompt engineering to automate text mining of MOF synthesis conditions across diverse literature formats, eventually leading to an ML model predicting crystallization outcomes with 87% accuracy. Expanding on these capabilities, the ChatGPT Research Group197 introduces a multi-agent framework comprising seven specialized LLMs responsible for tasks ranging from literature review and synthesis design to robotic control and data interpretation. By combining these agents with BO, the system rapidly identified optimal synthesis conditions, significantly accelerating materials development. These assistant-type applications demonstrate how LLMs can bridge diverse aspects of the scientific process, functioning not just as tools for analysis, but as collaborators in experimental strategy and execution.
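To convey the flavor of prompt-based synthesis mining (without reproducing the actual prompts of ref. 105), the toy sketch below formats a synthesis paragraph into a structured-extraction prompt and parses a JSON response; call_llm is a hypothetical stand-in for a real LLM API call and simply returns a canned answer here.

```python
# Toy prompt-based extraction of synthesis conditions into structured records.
import json

PROMPT = ("Extract the metal salt, linker, solvent, temperature (C), and time (h) "
          "from the following synthesis paragraph. Reply with JSON only.\n\n{text}")

def call_llm(prompt):
    # Hypothetical stand-in: a real implementation would send `prompt` to an
    # LLM API and return the model's text response.
    return ('{"metal_salt": "Zn(NO3)2·6H2O", "linker": "H2BDC", '
            '"solvent": "DMF", "temperature_C": 120, "time_h": 24}')

paragraph = ("Zinc nitrate and terephthalic acid were dissolved in DMF "
             "and heated at 120 C for 24 h.")
record = json.loads(call_llm(PROMPT.format(text=paragraph)))
print(record["solvent"], record["temperature_C"])
```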
A central theme observed throughout this review is the need to match model sophistication with the quality and diversity of available data. Models that incorporate inductive biases grounded in chemistry and physics often outperform black-box approaches in data-limited settings. For properties that are expensive or difficult to compute, hybrid workflows leveraging either literature mining or active learning combined with ML offer a promising path forward. At the same time, large foundation models and generative models are beginning to offer a pathway for generalized representation learning and de novo MOF prototype design. However, both of these approaches will be constrained by the scope of training data and care should be taken to expand the diversity of node/linker chemistries and topologies within these datasets.
Despite these advances, several key challenges remain. Standardized benchmark datasets, similar to those established in the small-molecule development community, do not yet exist for MOFs, which makes it difficult to compare ML methodologies and critically assess progress over time. In addition, MOF property data and prediction tasks are dominated by gas adsorption, whereas the promise of MOFs extends to far more application areas. Therefore, datasets containing transport properties (e.g., diffusion, thermal conductivity), stabilities, and free energies, to name a few, as well as methods to compute these properties accurately and efficiently, are still needed. Moreover, for many MOF properties, the quality (accuracy) of existing datasets needs to be improved, creating an opportunity for ML force fields to make a significant impact. Finally, as interest in MOFs expands to those with increasing complexity (i.e., larger unit cells or flexible topologies), new strategies will be needed to address computational challenges related to data representation and scaling.
Looking forward, the integration of ML models with synthesis-feasibility awareness, simulation-informed priors, or human-in-the-loop design will transform ML pipelines from simply predictive tools into generative, decision-making partners, especially for inverse design. While the true potential of ML-aided MOF design has yet to be realized, the hope is that future ML-mediated workflows will enable the creation of MOFs that defy conventional human intuition, including those with previously unseen topologies, properties, and functions. As a point of reference, to our knowledge, only one MOF topology (nun) not already present in the RCSR database has been discovered in the past 20 years.123 Nonetheless, the foundation is now in place for ML to become a critical driver of innovation in MOF materials science.
While all the described methods broaden the options for exploring the MOF design space in an unbiased manner, a latent challenge remains: these methods are not aware of the synthetic accessibility of the proposed structures and/or building blocks. Therefore, efforts that steer generation toward synthesizable structures will be key to unlocking the widespread adoption of AI/ML-based inverse design approaches in MOFs.