Inverse design of metal-organic frameworks for direct air capture of CO₂ via deep reinforcement learning

The combination of several interesting characteristics makes metal-organic frameworks (MOFs) a highly sought-after class of nanomaterials for a broad range of applications such as gas storage and separation, catalysis, and drug delivery. However, the ever-expanding and nearly infinite chemical space of MOFs makes it extremely challenging to identify the optimal materials for a given application. In this work, we present a novel approach using deep reinforcement learning for the inverse design of MOFs, motivated by the need to design promising materials for the important environmental application of direct air capture of CO₂ (DAC). We demonstrate that the reinforcement learning framework can successfully design MOFs with critical characteristics important for DAC. Our top-performing structures populate two separate subspaces of the MOF chemical space: the subspace with high CO₂ heat of adsorption and the subspace with preferential adsorption of CO₂ from humid air, with a few structures having both characteristics. Our model can thus serve as an essential tool for the rational design and discovery of materials for different target properties and applications.


Introduction
Metal-organic frameworks (MOFs) are a class of crystalline nanomaterials known for their high internal surface areas, tunable chemistries, and wide range of pore sizes. 1 This variety of features makes MOFs promising materials for a wide range of applications like carbon capture, 2 methane storage, 3 hydrogen storage, 4 photocatalysis, 5 drug delivery, 6 conductivity, 7 and so on. [13][14][15][16][17][18] From the perspective of material chemists, the holy grail is to design the most promising materials for a given application. 19 Currently, the most commonly used strategy is to find the best materials through brute-force screening of these material libraries. However, as we are interested in a small subset of top-performing materials, most of the effort in these brute-force methods is spent on computing properties of materials that are not interesting. As the number of structures in these databases has grown significantly, several groups have started taking a different approach to search more efficiently within this infinite chemical space of MOFs, using methods such as diversity-driven searches 16,17,20 and active-learning-based searches. 19,21
An alternative way to explore this enormous chemical design space is through the inverse design of materials possessing desired properties. 22,23 The chemical design space of MOFs involves combining metal clusters and organic linkers with topologies, which gives a highly complex design space. This complexity has led to a limited number of studies focusing on the inverse design of MOFs. One approach to tackle this challenge is a top-down strategy that employs an evolutionary algorithm, 15,[24][25][26][27] utilizing predefined molecular building blocks and topologies derived from existing MOF databases, such as the CoRE MOF database. 9 Alternatively, a joint top-down/bottom-up approach can be adopted to construct MOFs with novel molecular building blocks. Yao et al. 28 developed a deep generative model with variational autoencoders for the inverse design of MOFs with desired properties by optimizing the latent space. The latent space optimization involved a reticular framework representation that incorporated a combination of SELFIES (for organic linkers) and categorical variables (for metal clusters and topologies). However, optimizing the materials with non-convex objective functions in such a high-dimensional latent space can be challenging, 29 which may introduce the risk of generating invalid organic linkers.
In this context, deep reinforcement learning offers a promising solution to address the limitations associated with the joint top-down/bottom-up approach and capitalize on its advantages in generating novel MOFs with desired properties. The introduction of novel topologies and metal clusters is restricted in practical and chemical aspects when generating MOFs due to constraints such as the complex chemistry associated with metal clusters. Hence, the de novo design of organic linkers is crucial for effectively navigating the intricate and high-dimensional landscape of the MOF design space. Deep reinforcement learning is particularly adept at optimizing string representations like SMILES and SELFIES for organic linkers by leveraging existing knowledge. This is a distinct advantage of reinforcement learning over methods like Bayesian optimization and genetic algorithms, which might struggle with the nuanced optimization of string-based chemical structures. Furthermore, the inherent capacity of reinforcement learning algorithms to navigate the trade-off between exploration and exploitation within chemical domains positions them as particularly advantageous for addressing the inverse design challenges associated with complex crystalline architectures such as MOFs. This capability provides a distinct advantage over alternative methods such as VAEs. In the field of de novo drug design, [30][31][32] this approach allows learning how to design molecules that maximize a reward function using deep learning models such as recurrent neural networks and generative adversarial networks.
Apart from molecule generation, Pan et al. 33 demonstrated that the reinforcement learning approach could be extended to inorganic materials such as metal oxides. However, the complex crystal structure of MOFs, which typically contain more than 100 atoms per unit cell, presents a significant challenge for this approach. In this work, we propose a deep reinforcement learning framework designed for large crystalline materials such as MOFs, which can tackle the complexity of crystalline systems through reinforcement learning.
We apply this reinforcement learning framework to design MOFs for direct air capture (DAC). DAC has been developed to reduce the CO₂ concentration in the atmosphere. [37][38] Currently, a class of materials known as chemisorbents is used industrially to tackle this challenge. Chemisorbents can bind CO₂ very strongly, which is reflected in their high heat of adsorption, which can reach 100 kJ mol⁻¹. 39,40 However, a very high heat of adsorption could also lead to increased difficulty and costs in regenerating the materials. Using physisorbents for DAC could have the advantage of lower regeneration costs. Yet, one still needs them to bind CO₂ strongly enough to compensate for its low concentration in the air.
Although there is no agreed value of the CO₂ heat of adsorption in the literature to demarcate physisorbents from chemisorbents, the former usually have a CO₂ heat of adsorption of <40 kJ mol⁻¹. There have been reports, though, of a few physisorbent structures having a CO₂ heat of adsorption higher than 50 kJ mol⁻¹. 41 Based on the heat of adsorption, Findley et al. 37 computationally screened the CoRE2014 MOF database and several classes of zeolites to search for the best physisorbent MOF for DAC. However, from their analysis, they concluded that the materials they considered were not viable for DAC. This overall scenario motivated us to explore an important scientific question: can we design a library of physisorbent MOFs specifically for DAC?
The main objectives of this work are thus: (1) to illustrate the use of a reinforcement learning framework to inverse-design physisorbent MOFs with desired properties; (2) to highlight the use of these inverse-designed MOFs for an extremely important application like DAC. It is important to note that when we refer to physisorption, we refer to systems in which the binding does not involve charge transfer. Charge transfer is not something that we can model with our classical force fields, and, therefore, we do not include the amines interacting with CO₂ through chemisorption in our screening study.
We rst show the working of our reinforcement learning framework by inverse designing MOFs with CO 2 heat of adsorption higher than 40 kJ mol −1 .To make this design even more challenging, we must consider the effect of H 2 O. Depending on the climate, there are different levels of water vapor in the atmosphere, and most materials that strongly bind CO 2 have a higher affinity for H 2 O. Hence, we must design MOFs that perform well in humid conditions. 36And for this reason, our second design criterion is that the material prefers CO 2 above H 2 O (i.e., CO 2 /H 2 O selectivity to be higher than 1).The successful inverse design of MOFs for a challenging application like DAC could signicantly advance the rational design and discovery of materials for a wide range of applications.

Data representation of MOFs for inverse design
The data representation of MOFs for inverse design is crucial for successfully developing our reinforcement learning framework, as illustrated in Fig. 1a. A unique combination of metal clusters, organic linkers, and topologies represents each MOF structure in our database. Given that metal clusters in MOFs usually have constraints due to complex chemistry (such as oxidation states), we featurized the metal clusters and topologies (represented in RCSR using three-letter codes) with categorical variables. For the representation of organic linkers, we adopted a string representation known as SELFIES, 42 which, unlike SMILES, guarantees that every string obeys basic chemical rules such as valence. The SELFIES representation was adopted because it has been shown to outperform SMILES within generative models such as VAEs 43 and GANs 44 by generating more diverse and valid molecules. We used the building blocks and topologies of PORMAKE, 15 which provides libraries of building blocks (from the CoRE MOF database 9) and topologies (from the Reticular Chemistry Structure Resource (RCSR) Database 45). Additional details of the MOF structure representation are provided in the Methods section. Moreover, PORMAKE was employed to construct MOF structures from the representation, enabling the properties of the generated MOF representations to be computed with simulation tools.
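As a minimal sketch of the representation described above, a MOF can be held as two categorical labels plus a SELFIES string and flattened into one token sequence. The class name, field names, and the example pairing of "pcu" with "N131" are illustrative assumptions (the pairing is not checked for chemical compatibility), and the benzene SELFIES string is only a placeholder for a real linker with connection points.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MOFRepresentation:
    """Hypothetical container; field names are illustrative, not from the paper."""
    topology: str        # RCSR three-letter code, e.g. "pcu"
    metal_cluster: str   # building-block label, e.g. "N131"
    linker_selfies: str  # SELFIES string of the organic linker


def split_selfies(selfies: str) -> List[str]:
    """Split a SELFIES string into its bracketed symbols."""
    return re.findall(r"\[[^\]]*\]", selfies)


def to_tokens(mof: MOFRepresentation) -> List[str]:
    """Flatten a MOF into one sequence: topology, metal cluster, linker symbols."""
    return [mof.topology, mof.metal_cluster, *split_selfies(mof.linker_selfies)]


# Placeholder example: benzene written in SELFIES stands in for a real linker.
example = MOFRepresentation("pcu", "N131", "[C][=C][C][=C][C][=C][Ring1][=Branch1]")
print(to_tokens(example))
```

Keeping the two categorical labels in front of the linker symbols mirrors the ordering used by the generator, which chooses the topology and metal cluster before writing the linker.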

Reinforcement learning framework
The overall schematic of the reinforcement learning framework is depicted in Fig. 1b, which comprises two key components: an agent and an environment. The agent, which serves as the generator, takes an action by creating a data representation of a new MOF structure. The predictor, acting as the environment, evaluates this action by predicting the property of the new MOF representation. As a surrogate model, the predictor enables faster estimation of crucial properties, aiding in the efficient screening and iteration of structures during the reinforcement learning phase. This approach is key in high-throughput studies for the manageable and timely evaluation of a large number of generated structures. Based on the prediction, a reward is returned to update the agent so that it generates MOF structures that maximize the reward. The objective of reinforcement learning is to find the weights of the agent that maximize the expected return obtained from the environment. The details of the framework are provided in the Methods section.
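The objective stated above can be written compactly. The notation here is an assumption introduced for illustration: m denotes one generated MOF representation, π_θ the generator with weights θ, and R(m) the reward derived from the predictor. The second line is the general score-function identity for such an objective; this section does not state which update rule was actually used, so it should be read as one common option rather than the authors' algorithm.

```latex
% m: one generated MOF representation; \pi_\theta: generator with weights \theta
% R(m): reward computed from the predictor's estimate for m
J(\theta) = \mathbb{E}_{m \sim \pi_\theta}\left[ R(m) \right],
\qquad
\theta^{*} = \arg\max_{\theta} J(\theta)

% Score-function (log-derivative) identity for the gradient of J:
\nabla_\theta J(\theta)
  = \mathbb{E}_{m \sim \pi_\theta}\left[ R(m)\, \nabla_\theta \log \pi_\theta(m) \right]
```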
2.2.1 Generator. Acting as an agent, the generator requires a pre-training stage to learn the essential underlying chemistry of MOFs. This knowledge includes which combinations of topology, metal clusters, and organic linkers can be constructed for MOFs and how to generate organic linkers of MOFs. The generator architecture is based on the transformer 46 architecture, which consists of an encoder and a decoder, as shown in Fig. 2a. It is important to note that the number of connection points of metal clusters and organic linkers must be appropriately matched within a given topology while constructing MOFs. For instance, pcu, the topology of IRMOF-1, requires a vertex comprising six connection points, whereas tbo, the topology of HKUST-1, necessitates two types of vertices comprising 3 and 4 connection points. To address this, the encoder of the generator receives a metal cluster along with the number of connection points of an organic linker as inputs, and the decoder identifies the suitable topologies based on their connection points. [49] More concretely, the decoder of the generator selects topologies and metal clusters sequentially and creates the SELFIES of organic linkers while retaining a batch of scaffolds employed as inputs for the encoder. The details of building scaffolds are explained in the Methods section.
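The connection-point constraint can be illustrated with a small lookup. This is a sketch, not the generator's mechanism: the table holds only the two examples quoted above (pcu and tbo), it models vertex connectivities only and ignores two-connected edges, and the names `REQUIRED_VERTICES` and `is_compatible` are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Vertex connectivities quoted in the text; a full table would come from the
# topology library and is not reproduced here.
REQUIRED_VERTICES: Dict[str, Tuple[int, ...]] = {
    "pcu": (6,),    # one vertex type with six connection points
    "tbo": (3, 4),  # two vertex types with 3 and 4 connection points
}


def is_compatible(topology: str, block_connectivities: Sequence[int]) -> bool:
    """Return True if the vertex building blocks match what the topology needs."""
    required = REQUIRED_VERTICES.get(topology)
    if required is None:
        return False  # unknown topology in this toy table
    # Order does not matter, but every required vertex type must be supplied once.
    return Counter(required) == Counter(block_connectivities)


print(is_compatible("pcu", [6]))     # six-connected block on pcu
print(is_compatible("tbo", [4, 3]))  # 4- and 3-connected blocks on tbo
print(is_compatible("tbo", [6]))     # mismatch
```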
2.2.2 Predictor. The predictor provides the estimated target properties, allowing for the calculation of the rewards. As illustrated in Fig. 2c, the architecture based on the Transformer encoder takes different embedding layers of topologies, metal clusters, and the vocabulary of SELFIES. By adding a simple dense layer to the class token at the first position, which is a learnable embedding layer, the predictor can predict the desired target properties. The performance of the predictor was evaluated by measuring the mean absolute errors (MAE) of the CO₂ heat of adsorption and CO₂/H₂O selectivity, which were found to be 2.87 kJ mol⁻¹ and 0.64, respectively (see ESI Fig. S1†). It should be emphasized that the reward functions of the reinforcement learning framework were assigned based on the predicted targets of the predictor.

Exploration strategy.
In reinforcement learning, it is essential to consider the trade-off between exploration and exploitation to achieve optimal performance. Exploitation involves selecting actions that maximize immediate rewards and generate MOFs similar to those created by the agent in the pre-training stage. On the other hand, exploration involves selecting actions that explore the action space for long-term benefits, generating novel MOF structures that have not been previously seen. Achieving this balance is critical to the success of the reinforcement learning algorithm. To address this challenge, we introduced two generators: one for exploration and one biased towards exploitation (see Fig. 2b). The generator was updated to maximize rewards through the reinforcement learning algorithm, while the weights of the biased generator were frozen to enable continuous generation of MOFs by the pre-training generator. During the reinforcement learning training, each token of the MOF representations was selected from either the biased generator or the generator, with the choice determined by a threshold l. A higher value of l favored exploitation over exploration. We conducted experiments across various values of l and determined that a value of 0.5 yields the best results for generating structures. This value effectively strikes a balance between exploitation and exploration. Otherwise, within the reinforcement learning framework, the generation process tends either to replicate structures similar to those used in pre-training (too much exploitation) or to generate few well-performing structures (too much exploration). Moreover, to improve the diversity of topologies and metal clusters of the generated MOF structures, 50% of the metal clusters in the inputs were masked during the training stage of reinforcement learning, allowing the decoder of the generator to select topologies and metal clusters without considering the metal clusters used as inputs.
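The token-level mixing of the two generators can be sketched as follows. How the threshold is applied is an assumption here: a uniform random number is drawn per token, and the frozen (biased) generator is used when it falls below the threshold, so a larger threshold means more exploitation. The function names and the toy generators are hypothetical stand-ins for the transformer decoders.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A "generator" in this sketch is any function mapping the tokens produced so
# far to the next token. Real generators would sample from a learned model.
NextToken = Callable[[Sequence[str]], str]


def mixed_rollout(trainable: NextToken, frozen: NextToken, threshold: float,
                  length: int, rng: random.Random) -> Tuple[List[str], List[bool]]:
    """Build a sequence token by token, choosing the source generator per token."""
    tokens: List[str] = []
    from_frozen: List[bool] = []
    for _ in range(length):
        # Assumed rule: draw u in [0, 1); use the frozen generator if u < threshold.
        use_frozen = rng.random() < threshold
        source = frozen if use_frozen else trainable
        tokens.append(source(tokens))
        from_frozen.append(use_frozen)
    return tokens, from_frozen


def toy_frozen(prefix: Sequence[str]) -> str:
    return "P"  # stands for a token from the pre-trained, frozen generator


def toy_trainable(prefix: Sequence[str]) -> str:
    return "R"  # stands for a token from the generator being optimized


tokens, flags = mixed_rollout(toy_trainable, toy_frozen, 0.5, 10, random.Random(0))
print("".join(tokens))
```

With a threshold of 1.0 every token comes from the frozen generator, and with 0.0 every token comes from the trainable one, which matches the two failure modes described above.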

Results of reinforcement learning
We trained the reinforcement learning framework using the pre-training generator and predictors, with the aim of generating novel MOFs that exhibit high CO₂ heat of adsorption and CO₂/H₂O selectivity. The performance of the generator optimized by the reinforcement learning algorithm was evaluated using three metrics: validity, scaffold, and uniqueness. The validity metric evaluates whether the generator can generate MOFs which (1) match the connection points of metal clusters and organic linkers for a given topology and (2) contain chemically valid organic linkers. To evaluate (2), we converted the generated SELFIES to canonical SMILES using the RDKit sanitizer. The scaffold metric measures whether the generated organic linkers contain the batch of scaffolds used as encoder inputs. Finally, the uniqueness metric assesses the proportion of distinct organic linkers generated. The performance was tested on a test set of 10 000 data points used during the pre-training stage, and was evaluated using metrics and rewards, as summarized in Table 1.
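A sketch of how the three metrics could be tallied is given below. It assumes the per-sample checks (sanitization, connection-point matching, scaffold containment) have already been run elsewhere and are supplied as flags. The denominators are assumptions, since the text does not spell them out: validity and scaffold are taken over all samples, and uniqueness over the linkers that passed sanitization.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GeneratedSample:
    """Hypothetical record of one generated MOF after the upstream checks."""
    canonical_smiles: Optional[str]  # None if the linker failed sanitization
    connections_match: bool          # connection points fit the topology
    has_scaffold: bool               # linker contains the input scaffold


def evaluate(samples: List[GeneratedSample]) -> Dict[str, float]:
    """Compute validity, scaffold, and uniqueness as fractions in [0, 1]."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    sanitized = [s.canonical_smiles for s in samples if s.canonical_smiles is not None]
    n_valid = sum(1 for s in samples
                  if s.canonical_smiles is not None and s.connections_match)
    return {
        "validity": n_valid / n,
        "scaffold": sum(1 for s in samples if s.has_scaffold) / n,
        # Assumed definition: distinct linkers among those that sanitized.
        "uniqueness": len(set(sanitized)) / len(sanitized) if sanitized else 0.0,
    }


demo = [
    GeneratedSample("c1ccccc1", True, True),
    GeneratedSample("c1ccccc1", True, False),
    GeneratedSample(None, True, False),
    GeneratedSample("CCO", False, True),
]
print(evaluate(demo))
```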
As we have a relatively small number of structures with the desired properties, we carried out the reinforcement learning for three rounds. After each round, we used the top-performing MOFs to retrain the predictors for the next round. More details on the training of the reinforcement learning framework are provided in the Methods section. ESI Fig. S1† shows the parity plots of the predictors for each round. In Fig. 3, the property distributions of MOFs generated by the pre-training generator (i.e., scratch) and generators optimized by reinforcement learning for three rounds are illustrated, where the estimated property values are based on the predictors. Notably, we observe that the average target values increase as the rounds progress, and the overall distribution of our target properties shifts towards the desired high values. This also means that with each passing round, we get more structures that meet our target requirements. To the best of our knowledge, some of our reinforced structures have the highest calculated CO₂ heat of adsorption and CO₂/H₂O selectivity values ever reported for physisorbent MOFs (see Table 2). This indicates the success of the reinforcement learning approach in optimizing the generator to produce MOFs with improved target properties. ESI Fig. S3† reveals that the property distribution improves significantly as training progresses within a particular round. Table 1 shows that the overall scaffold metric decreased after optimization compared to scratch because the optimized generator generated organic linkers without the scaffolds used as input to maximize rewards. After optimization, the metric for uniqueness also exhibits a decrease, suggesting that the generator favors generating organic linkers with desirable properties. Based on the predictors' estimations, the most frequently observed topologies, metal clusters, and organic linkers for rounds 1, 2, and 3 are presented in ESI Fig. S5, S6, and S7†, respectively. Also, ESI Fig. S4† shows the attention scores of topology, metal cluster, and organic linker for the two targets. These attention scores have been computed using the attention rollout method. 50 We find that the metal cluster has

Chemical space analysis of the MOFs
We performed a chemical space analysis of the training and generated structures for the two targets, with t-distributed stochastic neighbor embedding (t-SNE). 51 To visualize the evolution of the generated structures in the same reduced dimension space, we used the round 3 predictor of the corresponding target to obtain the latent vectors for training structures and generated structures of all three rounds in Fig. 3c and d. Generated structures from the three rounds with a heat of adsorption larger than 30 kJ mol⁻¹ are highlighted in Fig. 3c.
The ratio of structures with high heat of adsorption values (represented by darker orange points) among the generated structures increases with each round, which follows the same trend as the rightward shift of the peak in Fig. 3a, even though some structures are discarded during the molecular simulations. A similar plot for CO₂/H₂O selectivity (for structures having values greater than 1) is shown in Fig. 3d. Similarly, the evolution in Fig. 3d is consistent with Fig. 3b. In Fig. 3e, we projected the structures for both targets in the same chemical space to investigate if there are structures that satisfy both criteria. We gathered the overlapping training structures and all generated structures for the two targets and obtained their latent vectors by concatenating representations from the two third-round predictors. Fig. 3e further highlights the two separate subspaces occupied by these generated structures for the two targets. This corroborates the fact that structures having high CO₂ heat of adsorption have characteristics quite different from structures having high CO₂/H₂O selectivity. Thus, they occupy relatively different subspaces in the MOF chemical space, with a small overlap. This fact is explained in more detail in the Discussion section. It is important to note that we still managed to generate a few structures simultaneously satisfying both targets (represented by the overlapping orange and green points).

Feasibility tests for the generated top-performing MOFs
To ensure that the structures generated from our reinforcement learning framework are reasonable, we employed different structure feasibility tests to narrow down the ideal MOF candidates for DAC (the workflows for the same are summarized in ESI Fig. S2†). For each of the three rounds, first, from the 10 000 test set, the valid MOFs (the ones satisfying the validity metric) generated by the optimized generators with CO₂ heat of adsorption higher than 30 kJ mol⁻¹ and CO₂/H₂O selectivity greater than 1 were selected. Since few structures have a predicted CO₂ heat of adsorption higher than 40 kJ mol⁻¹, we went ahead with 30 kJ mol⁻¹ as a temporary threshold so as to obtain a higher number of structures for the training of the subsequent rounds of reinforcement learning. Then, to estimate the synthesizability of the organic linkers, we computed the synthetic accessibility (SA) score. 52 The SA score is based on molecular complexity; molecules with a low SA score are less complex and are expected to have an easier synthesis route compared to more complex molecules with a high SA score. The MOFs with an SA score of organic linkers higher than 6 were dismissed. 53 In addition, the number of generated structures was restricted through the topological root mean squared deviation (RMSD) of the atomic positions between the building block node vectors and the target topology node vectors. 15 Lower topological RMSD values indicate that the strain between the two vectors is low, and the resultant MOF structure is more stable. Given that the MOFs created by PORMAKE are typically feasible when the topological RMSD is lower than 0.3, 15 the same constraint was adopted. Apart from these, the generated MOFs with more than 3000 atoms and cell lengths higher than 60 Å were also dismissed to avoid calculations on very large structures. Structure optimization and charge generation were then carried out to obtain further reasonable structures. The generated structures that passed all the above tests went through molecular simulations to estimate the respective "true" target values. The details of the molecular simulations are provided in the Methods section. Finally, after all the tests, we obtained 409/497/999 structures with CO₂ heat of adsorption higher than 40 kJ mol⁻¹ and 2215/2304/2426 structures with CO₂/H₂O selectivity higher than 1 based on RASPA simulations, in round 1/round 2/round 3, respectively.
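The structural filters above can be sketched as a single predicate. The thresholds are the ones quoted in the text; the record type and its field names are hypothetical, and the size rule is read as "dismiss if either the atom count or the cell length exceeds its limit", which is an interpretation of the wording rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one generated MOF before simulation."""
    sa_score: float         # synthetic accessibility score of the linker
    topo_rmsd: float        # topological RMSD from structure construction
    n_atoms: int            # atoms in the unit cell
    max_cell_length: float  # longest cell length in angstrom


def passes_feasibility(c: Candidate, max_sa: float = 6.0, max_rmsd: float = 0.3,
                       max_atoms: int = 3000, max_cell: float = 60.0) -> bool:
    """Apply the SA, RMSD, and size filters quoted in the text."""
    if c.sa_score > max_sa:       # linkers with SA score above 6 are dismissed
        return False
    if c.topo_rmsd >= max_rmsd:   # feasible only when RMSD is lower than 0.3
        return False
    # Assumed reading: either size limit alone is enough to dismiss a structure.
    if c.n_atoms > max_atoms or c.max_cell_length > max_cell:
        return False
    return True


print(passes_feasibility(Candidate(3.2, 0.12, 850, 35.0)))
print(passes_feasibility(Candidate(6.5, 0.12, 850, 35.0)))
```

Running the cheap filters before structure optimization and simulation keeps the expensive steps for candidates that are likely to be buildable.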

Discussion
From a material chemist's perspective, it is interesting to know what makes a material good for DAC. In other words, what are the features or the genes of the top-performing materials for DAC? Our top-performing candidates' topologies, metal clusters, and organic linkers are shown in Fig. 4. When it comes to CO₂ heat of adsorption, we find that the top-performing MOFs mostly contain the metal clusters Mn-based N131 and Eu-based N520 (see Fig. 4; the naming of all the metal clusters can be found in the PORMAKE paper 15). The Mn-based N131 cluster has an open metal site that strongly attracts the CO₂ molecules towards the metal atom. 54 Many structures built from this cluster have a CO₂ heat of adsorption value higher than 40 kJ mol⁻¹. One example of such a structure is shown in Fig. 5a. Other metal clusters within the high-performing MOFs include lanthanide metals (i.e., Nd, Sm, Dy) and transition metals (i.e., Ni, Co, and Zr). Apart from the metal cluster, the organic linkers of the top-performing MOFs have more abundant branches (i.e., functional groups) such as F, Cl, Br, and NH₂. The top-performing candidates' topologies, metal clusters, and organic linkers for rounds 1 and 2 are summarized in ESI Fig. S8 and S9†, respectively.
The top-performing MOFs with respect to high CO₂/H₂O selectivity, however, have different characteristics from the ones having high CO₂ heat of adsorption, as shown in Fig. 4b. The metal clusters include metals frequently used for synthesizing MOFs such as Cu, Zn, and Cd. In particular, metal clusters Cu-based N262, 55 ... a challenge to find a MOF having a good balance between CO₂ heat of adsorption and CO₂/H₂O selectivity. 37
In our model, the topology of a MOF is described by the three-letter codes as defined in RCSR. 45 The topology guides the underlying network connectivity of the metal nodes and organic linkers in a MOF. From Fig. 4a, we find that certain topologies such as dmp and tfz-d primarily dominate in structures with high CO₂ heat of adsorption. If we then look into the features of the MOFs for high CO₂/H₂O selectivity, we find the presence of a different set of dominant topologies such as bcg and reo. Furthermore, from the attention score plots (see ESI Fig. S4†) we find that our model also considers topology while making its decision on property prediction. However, it is to be noted that the surface characteristics of a MOF (like pore diameters, surface areas, and other pore geometry characteristics) are determined by the combination of the topology along with the node and linker used to construct the MOF. At very low concentrations of CO₂ such as in DAC, the chemistry of the MOF (type of metal nodes, organic linkers) usually plays a much stronger role than pore geometry characteristics in determining the CO₂ adsorption. 16,20,57
Fig. 5 highlights the structures of some of the top-performing MOFs for the DAC application (inverse) designed in this work. Structure (a) has a high CO₂ heat of adsorption (∼62 kJ mol⁻¹) but a low CO₂/H₂O selectivity (<1), whereas structure (b) has a high CO₂/H₂O selectivity (>1) but a low CO₂ heat of adsorption (∼29 kJ mol⁻¹). Structure (c) has both a high CO₂/H₂O selectivity (>1) and a moderately high CO₂ heat of adsorption (∼40 kJ mol⁻¹).
In addition, we have listed some experimental MOFs studied for DAC from the reported literature 37,41 and some top-performing ones from our work in Table 2. The experimental cif files were obtained from the CoRE2019 MOF database 9 as treated in our previous work. 20 The structures in this table have been selected with the aim of covering a range of values for the different metrics. We find that for each of the considered metrics, we have managed to generate structures that compete strongly with the reported ones, with a few of our structures performing well across all the metrics. It is also important to note that structures having high CO₂ heat of adsorption also tend to have high CO₂ uptake at 400 ppm.
In a broader context, it is also interesting to note that all three categories of structures from Fig. 5 and Table 2 can be promising candidates for DAC, depending on the environment in which they are used. For example, if a particular industrial process configuration can have a dehumidifier unit before the adsorption step, then structure 5(a) can be potentially interesting despite its low CO₂/H₂O selectivity. This structure can also be promising in regions with very low water vapor content in the atmosphere. The stronger binding of CO₂ to the MOF would then become the dominating factor in choosing this material. On the other hand, if there is no dehumidifier unit, and in regions with very high water vapor content in the atmosphere, structure 5(b) can be an interesting choice because of its preferential adsorption of CO₂ in humid conditions. Structure 5(c), satisfying both requirements, is likely to be an interesting candidate in both scenarios: with and without humidity. Thus, it will be interesting to evaluate these materials one step further, on a process level, to get even more insights into these materials' performance, and we are looking into it as future work.
It is important to note here that our reinforcement learning framework needs training data for at least ∼30 000 materials, which requires us to trade accuracy for computational expense (i.e., we use the Universal Force Field (UFF) and the extended charge equilibration (EQeq) method). The main aim of this article is to demonstrate that reinforcement learning can give us a library of structures with the desired properties. This illustration is, of course, independent of the details of the force field. However, it is important to ensure the accuracy of these predictions before following up on this work. As this only needs to be done for the top-performing structures, one can use, for example, DFT-derived charges.

Conclusion
In this work, we have developed a deep reinforcement learning framework to inverse design MOFs. We illustrate this approach by designing MOFs with important characteristics for direct air capture (DAC) of CO₂. We successfully (inverse) designed a set of materials that have a high affinity towards CO₂ (CO₂ heat of adsorption >40 kJ mol⁻¹). In addition, we generated a set of materials that preferentially adsorb CO₂ from humid air. Subsequent analysis of the chemical design space shows that the top-performing structures populate two separate subspaces corresponding to the two target properties. Yet, a few of our structures satisfy both requirements. We show that the top-performing structures generated in this work compete strongly against the top-performing MOFs reported in the literature for DAC, thereby providing the research community with more potential options for further investigation.
The heat of adsorption is an important proxy for performance in a DAC process; it allows us to eliminate really poor structures and identify some of the most promising ones, thereby narrowing down the enormous MOF chemical search space effectively.
To gain further insights into the materials' performance in industrial setups, the next step of this work would be to evaluate the top-performing materials in a more detailed DAC process engineering design. 58

Details of MOF structure representation
The building blocks from PORMAKE are composed of vertices (i.e., abstractions with more than two connection points) and edges (i.e., abstractions with two connection points). 15 In this work, however, they were modified more intuitively into metal clusters and organic linkers depending on the presence of metal atoms in the building blocks. Thereby, 486 vertices containing metal atoms were used for the metal clusters. For organic linkers, there are 103 vertices and 175 edges that do not contain metal atoms. Apart from building blocks, we used 97 topologies extracted from the CoRE MOF database by MOFkey, 59 which are summarized in ESI Note S1†. It is important to note that data augmentation of the organic linkers is required to train deep generative models, given the insufficient number of organic linkers. Hence, augmentation was implemented by merging the organic linkers, where the connection points of organic linkers were replaced with other organic linkers. To this end, 30 642 merged organic linkers were generated, and their validity was examined by the RDKit sanitizer. 60 It is to be noted that these data representations can be used to reconstruct the MOF structures using PORMAKE.

Computational details for molecular simulations
All structures for pre-training were optimized using the Universal Force Field (UFF) 61 with LAMMPS, 62 which can run on multi-core processors, thereby facilitating calculations with a large number of structures. The input files for the same were generated using lammps_interface. 63,64 The number of MOF structures generated via reinforcement learning was much smaller than the number used for pre-training, and these were optimized using the Universal Force Field (UFF) 61 as implemented in the Forcite Module of Materials Studio 2019. 65
The EQeq (extended charge equilibration) method 66,67 was used to generate the partial charges of the framework atoms of the MOFs. The lowest common oxidation states of the elements were chosen as their charge centers. 16,20,68 The cif files of the experimental structures used for comparison in Table 2 were obtained from the work of Moosavi et al., 20 where the partial charges were also generated using the EQeq method.
The Henry coefficient and heat of adsorption at infinite dilution, for CO 2 and H 2 O, were computed using Widom's test particle insertion algorithm. 69 All of the calculations were performed at 298.15 K using RASPA. 70 We computed the CO 2 /H 2 O Henry's selectivity as the ratio of the respective Henry coefficients:

$$S_{\mathrm{CO_2/H_2O}} = \frac{K_{\mathrm{H,CO_2}}}{K_{\mathrm{H,H_2O}}}$$

The CO 2 uptake at DAC conditions (400 ppm, 1 bar, 298.15 K) was derived from the CO 2 Henry coefficient:

$$q_{\mathrm{CO_2}} = K_{\mathrm{H,CO_2}}\, p$$

where p is the partial pressure of CO 2 at DAC conditions, i.e. 0.0004 bar.
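As a worked illustration of these two derived quantities, the sketch below computes the Henry's selectivity and the low-pressure CO 2 uptake from Henry coefficients. It assumes the coefficients are given in mol kg⁻¹ Pa⁻¹; the function names and the numerical inputs in the usage note are hypothetical and are not results from this work.

```python
BAR_TO_PA = 1.0e5
P_CO2_DAC_BAR = 0.0004  # 400 ppm of CO2 at 1 bar total pressure


def henry_selectivity(kh_co2: float, kh_h2o: float) -> float:
    """CO2/H2O selectivity as the ratio of the two Henry coefficients."""
    if kh_h2o <= 0.0:
        raise ValueError("the H2O Henry coefficient must be positive")
    return kh_co2 / kh_h2o


def dac_uptake(kh_co2: float, p_bar: float = P_CO2_DAC_BAR) -> float:
    """CO2 uptake in mol/kg in the Henry regime.

    kh_co2 is assumed to be in mol/kg/Pa, so the partial pressure is
    converted from bar to Pa before multiplying.
    """
    return kh_co2 * p_bar * BAR_TO_PA
```

For example, a hypothetical CO 2 Henry coefficient of 1e-3 mol kg⁻¹ Pa⁻¹ gives an uptake of about 0.04 mol kg⁻¹ at 0.0004 bar (40 Pa). The linear relation holds only while adsorption stays in the Henry regime.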
To prevent the insertion of gas molecules in inaccessible pores of the MOF during Monte Carlo simulations, artificial blocking is necessary to avoid overestimation of adsorption values. 71,72 For this purpose, blocking spheres were calculated using the Zeo++ software. 73 The framework atoms of the MOF structures were described by UFF. 61 CO 2 and H 2 O molecules were described by the TraPPE force field 74 and the TIP4P/2005 force field, 75 respectively. The gas-framework interactions were modeled using the Lennard-Jones potential, truncated at 12 Å, with tail corrections. 76 The Lennard-Jones interactions between dissimilar atoms were approximated using Lorentz-Berthelot rules. 77 The coulombic electrostatic interactions were computed using Ewald summation. The DAC properties for the experimental MOF structures were obtained using the same simulation protocol as mentioned above.
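The Lorentz-Berthelot combination and the truncated Lennard-Jones form can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the parameter values in the usage note are placeholders rather than UFF or TraPPE values, and the tail correction mentioned above is not included.

```python
import math


def lorentz_berthelot(sigma_i: float, eps_i: float, sigma_j: float, eps_j: float):
    """Cross-interaction parameters for two dissimilar atom types.

    Lorentz rule: arithmetic mean of the sigmas.
    Berthelot rule: geometric mean of the well depths.
    """
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    eps_ij = math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij


def lj_truncated(r: float, sigma: float, eps: float, r_cut: float = 12.0) -> float:
    """12-6 Lennard-Jones energy, set to zero at and beyond the cutoff (in Å)."""
    if r >= r_cut:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)
```

With placeholder inputs σ = 3.0 and 3.4 Å and ε = 50 and 100 K, the rules give σ = 3.2 Å and ε ≈ 70.7 K for the mixed pair.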
The RASPA simulations were employed to compute the properties of the initial training dataset for the predictors and to evaluate the top-performing MOFs predicted by the predictors.

Details of reinforcement learning framework
In our reinforcement learning framework, the generator serves as the agent with the task of generating novel MOF representations. Each of these representations is defined as a sequence of states S with a maximum length T = 128, encapsulating the MOF structure. Specifically, a state S in our context is defined as S(topology, metal cluster, organic linker), where the topology and metal cluster are categorical variables, and the organic linker is represented by a SELFIES string, limited to 126 characters to ensure the total length does not exceed T. At each step t (0 < t < T), the generator takes the states s 1:t from the previous steps as inputs and then determines the probability distribution p of state s t . Therefore, the state s t is determined by sampling from the probability distribution of the previous step t − 1. At t = 0 and t = 1, the generator determines the topology and the metal cluster, respectively, which are categorical variables. Then, it generates the SELFIES sequence for the organic linker during 1 < t ≤ T.
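The sequence layout described above (topology first, metal cluster second, linker symbols afterwards) can be sketched as below. This is a hedged illustration, not the authors' tokenizer: the helper names are hypothetical, the 126-character limit is interpreted here as 126 SELFIES symbols, and the regular expression assumes a SELFIES string made only of bracketed symbols.

```python
import re

MAX_LEN = 128            # T in the text
MAX_LINKER_TOKENS = 126  # leaves two positions for topology and metal cluster


def tokenize_selfies(selfies: str) -> list:
    """Split a SELFIES string into its bracketed symbols, e.g. '[C]', '[=C]'."""
    return re.findall(r"\[[^\]]*\]", selfies)


def build_state_sequence(topology: str, metal_cluster: str, linker_selfies: str) -> list:
    """Assemble [topology, metal cluster, linker symbols...] for one MOF."""
    tokens = tokenize_selfies(linker_selfies)
    if len(tokens) > MAX_LINKER_TOKENS:
        raise ValueError("organic linker exceeds 126 SELFIES symbols")
    sequence = [topology, metal_cluster, *tokens]
    assert len(sequence) <= MAX_LEN
    return sequence
```

For instance, `build_state_sequence("hex", "N131", "[C][=C][C][=C][C][=C][Ring1][=Branch1]")` places the topology at position 0, the metal cluster at position 1, and the eight linker symbols at positions 2 to 9.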
The predictor takes the data representations created by the generator as inputs and estimates the expected reward r(s t ). To refine the generator's performance in generating MOF representations that maximize the expected reward, the weights of the generator θ are updated. We employ a policy gradient method (REINFORCE algorithm) 78 aimed at maximizing the expected cumulative reward (see Section 5.5). The policy's objective function, parameterized by θ, is this expected cumulative reward.

Regarding the initial state and the exploration strategy, at the commencement of training, θ is initialized using the weights from a pre-trained model of the generator to leverage prior knowledge. As mentioned in the 'Exploration strategy' section, the strategy we employed diverges from the epsilon-greedy method by incorporating a dual-generator approach. One generator remains static (biased) to exploit known good strategies, while the other is trainable and tasked with exploring the state space. The exploration is governed by a stochastic process that decides whether to exploit or explore at each step, thereby balancing the two for efficient learning. This approach allows for more directed exploration and potentially faster convergence by leveraging the strengths of both exploitation and exploration.
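The two ingredients of this paragraph, the stochastic choice between the two generators and the REINFORCE update signal, can be sketched as follows. This is a minimal sketch under assumptions, not the authors' implementation: the function names are hypothetical, the direction of the threshold comparison is assumed (the text only states that a threshold sets the exploitation-to-exploration ratio), and the loss is the textbook reward-weighted negative log-likelihood for a single sampled sequence.

```python
import random


def choose_generator(threshold: float, rng: random.Random) -> str:
    """Pick which generator emits the next token.

    Assumption: a uniform draw below the threshold selects the frozen
    ("biased") generator, otherwise the trainable one.
    """
    return "biased" if rng.random() < threshold else "trainable"


def reinforce_loss(log_probs: list, reward: float) -> float:
    """Loss whose gradient is the REINFORCE estimate for one sequence.

    log_probs holds the log-probabilities the trainable generator assigned
    to the sampled tokens; minimizing this value raises the likelihood of
    sequences with a high reward.
    """
    return -reward * sum(log_probs)
```

In a real training loop the log-probabilities would be differentiable tensors and the loss would be averaged over a batch; plain floats are used here only to keep the sketch self-contained.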

Training details for pre-training
In the pre-training step, a dataset comprising 646 907 MOFs, generated using building blocks from the PORMAKE database, was utilized for the generator. Since these building blocks originate from the CoRE MOF database, this approach enables the generator to learn the patterns and chemistry inherent in existing MOFs. For organic linkers, we initially created a set of 30 642 merged organic linkers, each validated using the RDKit sanitizer. This initial step is represented in Fig. 1a, where we show examples of small linker fragments used as building blocks. These fragments were sourced from the PORMAKE database and then merged to form the comprehensive set of linkers. In the second step, the 30 642 merged organic linkers were decomposed into scaffold fragments with the BRICS algorithm, which splits molecules into chemically meaningful fragments. Batches of these scaffolds were employed to train the generator in the pre-training stage. For each organic linker, any fragment produced by the BRICS algorithm that was a subset of another fragment was discarded. The maximum number of scaffold fragments was set to four; if a molecule was decomposed into more than four scaffold fragments, the smaller fragments were omitted. The scaffold fragments were randomly combined and joined in SELFIES to build a batch of scaffolds. Finally, a dataset of 1 540 889/385 223/10 000 (train/validation/test) was created for the generator.

For the predictor, an additional set of ∼35 000 MOFs was generated using the PORMAKE database, following the same methodology as for the generator. After conducting the Widom insertion simulations with RASPA, a total of ∼33 000 MOF structures were utilized for training the predictor for CO 2 heat of adsorption, and ∼24 000 structures were used for the CO 2 /H 2 O selectivity predictor. Subsequently, these datasets were split in an 8 : 1 : 1 ratio into training, validation, and test sets, respectively. Comprehensive information regarding the dataset used for training the predictor, including specific quantities and their detailed distribution, is summarized in ESI Table S1. †

The architectures of the generator and predictor were derived from the original transformer paper. 46 For the generator, the transformer encoder and decoder consist of 3 layers, 4 heads, and a hidden size of 256. The maximum length of the data representations of MOFs is 128. The model was trained with a batch size of 128 for 50 epochs using the AdamW 79 optimizer with a learning rate of 10⁻⁴. The predictor, which is similar to the encoder of the generator, consists of 4 layers of the transformer encoder. It was trained with a batch size of 128 for 100 epochs. The AdamW optimizer with a learning rate of 10⁻⁴ and weight decay of 10⁻² was used for the predictor. The learning rate was warmed up during the first 5% of the total epochs and then linearly decayed to zero over the remaining epochs.
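The warm-up and linear-decay schedule described above can be written as a small function. This is a sketch under assumptions: the schedule is applied per optimizer step rather than per epoch, the warm-up is taken to be linear from zero, and the function name is hypothetical.

```python
def learning_rate(step: int, total_steps: int,
                  base_lr: float = 1e-4, warmup_frac: float = 0.05) -> float:
    """Linear warm-up to base_lr, then linear decay to zero.

    step counts optimizer updates from 0; the first warmup_frac of the
    run ramps the rate up, and the remainder ramps it down.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = round(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)
```

With 1000 total steps, the rate reaches 10⁻⁴ at step 50, falls to half of that at step 525, and reaches zero at step 1000.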

Training details for reinforcement learning
The reward functions used in reinforcement learning were assigned by the estimated values provided by the predictor. The reward functions of CO 2 heat of adsorption and CO 2 /H 2 O selectivity are defined as eqn (4) and (5), respectively. It is important to note that these reward functions were calculated using two separate predictors, each independently trained to estimate these two different properties.
The policy gradient algorithm was trained with a batch size of 16 for 20 epochs, where each epoch was constructed by randomly selecting 8000 data points from the training dataset of the generator. The optimizer and scheduler were the same as those used in the training process of the generator.
The reinforcement learning was carried out for three rounds. After each round, the top-performing structures of that round were added to retrain the predictor for the next round. This updated predictor was then used in the reinforcement learning of the next round.
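The round-based procedure can be outlined as a short driver loop. This is a schematic sketch, not the authors' pipeline: `train_rl`, `simulate`, and `retrain_predictor` are hypothetical callables standing in for the policy-gradient training, the molecular simulation, and the predictor update, and the ranking of candidates by the predictor with a fixed `top_k` is an assumption.

```python
def run_rounds(train_rl, simulate, retrain_predictor, predictor,
               n_rounds: int = 3, top_k: int = 100):
    """Alternate reinforcement learning and predictor retraining.

    Each round: train the generator against the current predictor, rank
    the generated candidates by predicted value, evaluate the best ones
    by simulation, and use those labeled structures to update the
    predictor for the following round.
    """
    history = []
    for round_index in range(n_rounds):
        candidates = train_rl(predictor)
        ranked = sorted(candidates, key=predictor, reverse=True)
        labeled = [(mof, simulate(mof)) for mof in ranked[:top_k]]
        history.append(labeled)
        if round_index < n_rounds - 1:  # no retraining after the last round
            predictor = retrain_predictor(predictor, labeled)
    return predictor, history
```

Passing the three stages in as callables keeps the loop independent of any particular model or simulation package.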

Fig. 1
(a) A schematic of the data representation for MOFs used in this work. The MOF structures in the database are represented by their topologies, metal clusters, and organic linkers. To create a sufficiently large pool of linkers for training the deep generative model, the organic linkers were merged amongst themselves to create an augmented set of linkers. (b) Overall schematic of the reinforcement learning framework for the inverse design of MOFs. The agent (in this case, the generator) generates a MOF structure as an action. The predictor, as the environment, evaluates the action by predicting the property of the new MOF structure and returns a reward as an update to the agent. The agent then generates the next round of MOFs based on the received reward. This process is repeated iteratively until the agent generates MOFs with desirable properties.

Fig. 2
(a) The architecture of the generator consists of a transformer encoder and decoder. The encoder takes metal clusters, the number of connection points of organic linkers, and a batch of scaffolds of organic linkers represented by SELFIES as inputs. Scaffolds refer to the core structures of the molecular framework. The decoder selects topologies and metal clusters, which are both categorical variables. The organic linkers are generated based on the scaffolds used as inputs. (b) The schematic illustrates the process of generating MOF representations by the generator and the biased generator (with frozen weights) to balance the trade-off between exploitation and exploration in reinforcement learning. The exploitation-to-exploration ratio is determined by the threshold parameter λ. (c) The architecture of the predictor is based on the transformer encoder. The predictor takes the MOF representations from the generator as inputs. A simple dense layer is added to the token at the first position (i.e., class token) to predict the target properties of interest.

Fig. 3
Comparison of the distributions of (a) CO 2 heat of adsorption and (b) CO 2 /H 2 O selectivity for scratch and optimized MOFs based on the predictor's estimated values. The scratch distributions show the property distributions of the MOFs generated by the pre-training generator before being optimized by the reinforcement learning algorithm. The optimized distributions show the property distributions of the MOFs generated by the generator after being optimized by the reinforcement learning algorithm. After each round, the final top-performing MOFs of that round were selected and added to the next round for retraining the predictor. This updated predictor was then used in the training of the reinforcement learning for the next round. The evolution of the generated structures for (c) CO 2 heat of adsorption and (d) CO 2 /H 2 O selectivity in the chemical space of MOFs. The gray points in (c) and (d) represent the training structures for CO 2 heat of adsorption (∼26 000) and CO 2 /H 2 O selectivity (∼19 000), respectively. Orange points in (c) represent the generated structures for CO 2 heat of adsorption (having values >30 kJ mol −1 ); green points in (d) represent the generated structures for CO 2 /H 2 O selectivity (having values >1). The color code represents the respective target properties as obtained from molecular simulations. Further details on the number of generated structures in each round are provided in ESI Fig. S2. † (e) The mapping of the overlapping training (∼15 000) and generated structures from all three rounds for the two targets.

Fig. 4
Frequently occurring topologies, metal clusters, and organic linkers of the top-performing MOFs based on molecular simulations for (a) CO 2 heat of adsorption and (b) CO 2 /H 2 O selectivity in round 3.

Fig. 5
Structures of some of the top-performing MOFs designed for DAC. The corresponding metal node, organic linker, topology, and selected properties are shown alongside. The MOF names, as used in this study, are: (a) v0 + hex + N131 + 225, (b) v1 + hex + N262 + 1184, and (c) v1 + tfz-d + N692 + 2546. (Note: the MOF naming convention followed in this work is of the form version + topology + metalNode + organicLinker. Version 0 refers to structures designed for high CO 2 heat of adsorption, and version 1 refers to structures designed for high CO 2 /H 2 O selectivity.) C = light brown, H = white, O = red, N = light blue, Cl = green, Mn = dark violet, Cd = pink, Cu = dark blue; the black atoms in the metal clusters mark the points where the metal clusters and organic linkers connect to each other.
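The naming convention in this caption (version + topology + metalNode + organicLinker) lends itself to a small parser. The sketch below is illustrative only: `MOFName`, `parse_mof_name`, and the target labels in `TARGET_BY_VERSION` are hypothetical names introduced here, not part of any released code.

```python
from typing import NamedTuple

# Design target associated with each version number in the caption
# (labels are illustrative).
TARGET_BY_VERSION = {0: "co2_heat_of_adsorption", 1: "selectivity"}


class MOFName(NamedTuple):
    version: int
    topology: str
    metal_node: str
    organic_linker: str


def parse_mof_name(name: str) -> MOFName:
    """Split a name such as 'v1 + tfz-d + N692 + 2546' into its four parts."""
    parts = [part.strip() for part in name.split("+")]
    if len(parts) != 4 or not parts[0].startswith("v"):
        raise ValueError(f"unexpected MOF name format: {name!r}")
    return MOFName(int(parts[0][1:]), parts[1], parts[2], parts[3])
```

Splitting on "+" rather than on "-" matters here because topology codes such as tfz-d contain a hyphen.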

Table 1
Performance metrics of the pre-training generator and the reinforcement learning for CO 2 heat of adsorption and CO 2 /H 2 O selectivity

... highest attention score for both the targets, and this result agrees with previous findings in the literature regarding the role of metal clusters in low-pressure CO 2 capture applications. 16,20

... and Zn-based N328 (ref. 56) appear most frequently. Most metal nodes having open metal sites (like metal node N131) attract H 2 O more than CO 2 , and therefore the MOFs consisting of those metal nodes have low CO 2 /H 2 O selectivity. Also, the generated organic linkers of the top-performing MOFs, in this case, rarely include functional groups such as F, Cl, Br, NH 2 . This shows that it is indeed ...

Table 2
Comparison of DAC performance metrics (metrics for all structures computed using the same methodology: see Section 5.2) between some of the top-performing MOFs reported in the literature and ones generated in this work