Open Access Article
Hossein Mashhadimoslem *ab, Mohammad Ali Abdol a, Kourosh Zanganeh c, Ahmed Shafeen c, Encheng Liu d, Sohrab Zendehboudi e, Ali Elkamel afg and Aiping Yu *ab
a Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. E-mail: hmashhadimoslem@uwaterloo.ca; aelkamel@uwaterloo.ca; aipingyu@uwaterloo.ca
b Waterloo Institute for Nanotechnology, Department of Chemical Engineering, University of Waterloo, 200 University Ave. W., Waterloo, ON N2L 3G1, Canada
c Natural Resources Canada (NRCan), CanmetENERGY-Ottawa (CE-O), 1 Haanel Dr, Ottawa, ON K1A 1M1, Canada. E-mail: kourosh.zanganeh@nrcan-rncan.gc.ca; ahmed.shafeen@nrcan-rncan.gc.ca
d David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
e Faculty of Engineering and Applied Science, Memorial University, St. John's, NL A1B 3X5, Canada
f Center for Catalysis and Separations, Khalifa University, P. O. Box 127788, Abu Dhabi, United Arab Emirates
g Department of Chemical and Petroleum Engineering, Khalifa University, 1274 Abu Dhabi, United Arab Emirates
First published on 11th December 2025
This research focuses on efficiently collecting CO2 adsorption data using experimental metal–organic framework (MOF) porous materials from the scientific literature, addressing the challenges related to data classification and access to MOF synthesis methods. The aim is to organize, classify, and facilitate easy access to materials science information using artificial intelligence (AI). Using advanced large language models (LLMs), we developed a systematic approach to extract and sort MOF synthesis data for CO2 adsorption in a structured format. Using this method, we collected data from over 433 published experimental research papers and created a specific dataset to analyze the effects of metals, ligands, and carbon adsorption conditions on CO2 uptake performance. The correlations between the material structure, such as metal types, ligands, specific surface area, pore size, pore volume, synthesis conditions, and CO2 adsorption, under various process conditions were examined using the final database. We applied ChatGPT 4o mini as an AI assistant to text-mine all MOF information from different PDF file references. In addition to revealing the impact of each parameter on CO2 uptake and MOF structure before synthesis, the AI analysis findings indicated which ligand and metal groups should be altered to customize the MOF structure for improved CO2 capture.
Solid adsorbent-based gas adsorption and separation methods have been extensively studied because they offer the potential to be more adaptable and energy-efficient for a range of carbon capture applications.4 Metal oxides,5 polymers,6 zeolites,7 cerium oxide,8 porous carbon, and activated carbon9 are common types of adsorbents. Metal–organic frameworks (MOFs) have been extensively investigated as potent and promising CO2 capture adsorbents in recent years due to their porous structure, large specific surface area (SBET), high capacity, excellent selectivity, and structural tunability.10 As a result, an exceptionally wide range of MOF materials can be synthesized through broad design strategies. Engineering the selection of framework components should enable precise control over the internal pore surface's affinity for CO2 adsorption.11 This will allow MOF material properties to be tailored to specific CO2 capture and separation processes as well as to various operating conditions. Significant progress has been made recently in enhancing the carbon capture and separation performance of MOFs.12 Furthermore, studies that evaluate the potential of these materials for use in CO2 capture systems in the industrial chemical and energy sectors are emerging.10
For assessing and choosing adsorption methods, scientists identify the best combination of adsorbent structures for carbon capture processes. Discovering materials that have already been synthesized is one strategy, while another is to use simulation approaches to assess MOFs before synthesis.11 The concept of developing an artificial intelligence (AI)-based chemical assistant has opened up previously unheard-of possibilities to revolutionize materials research, particularly for MOFs used in carbon capture. Today, time-consuming and difficult operations, such as data analysis, chemical screening, and library searches, can be processed quickly by utilizing AI expertise across multiple disciplines.13 Employing a fixed vacuum swing adsorption (VSA) process, Burns et al.14 evaluated over 1600 MOFs for post-combustion CO2 capture by combining atomic/molecular simulation and process modeling. Only 500 of the above MOFs fulfilled the U.S. Department of Energy's standards for 90% purity and 95% recovery of CO2. To facilitate this search, they created a machine learning (ML) algorithm that detects promising materials with an accuracy of over 90% and rapidly determines which materials should satisfy the purity and recovery requirements. Therefore, it is essential to employ AI/ML algorithms to accelerate the selection and optimization of the desired MOF structure for customization in CO2 adsorption processes. One of the most significant challenges in chemistry and materials science research has been determining chemical compound information and material compositions, including optimal synthesis conditions as well as physical and chemical properties. A fundamental and crucial stage in the materials discovery process is to obtain a thorough summary of chemical information taken from literature sources, including articles and patents, and then store it in an orderly database.15
In general, scholars are particularly interested in effectively extracting vast volumes of material structure information from published scientific articles and existing literature. Natural language processing (NLP) models, which can quickly read and understand the words and information in published papers, are currently one of the most widely used methods in this field.16,17 Large language models (LLMs), especially the GPT series, are emerging nowadays, and the fields of materials science and chemistry are undergoing a significant revolution because of these language models.18
One of the primary objectives is to use published data and screen them to extract required information and features for the design of MOF structures for various CO2 adsorption processes. This valuable information can provide insights and promising opportunities for materials design researchers, seeking to synthesize and design a new generation of MOF compounds for CO2 capture. Understanding the impact of each of the fundamental MOF properties, such as modifications to ligands, metals, and component functional groups, as well as the adsorption process conditions (temperature and pressure), is essential for the innovative design of MOFs for CO2 separation from gases. The selection of the component structure or the synthesis method can be significantly influenced by the MOF synthesis techniques and the impact of each of the previously listed criteria.19
In the present study, we performed text data mining of scientific experimental data published in reputable journals on the synthesis of MOFs, specifically for CO2 capture, using the LLM (ChatGPT) model. Compared to other generative AI platforms or other open-source models, we selected the ChatGPT 4o mini20 because of its universal accessibility, computational correctness, task-based performance, and token limitations. First, one of the easiest generative AI systems to use is OpenAI's ChatGPT family. Unlike alternatives that require an application programming interface (API), interacting with a GPT model using a web user interface (UI) does not require any hardware or installation. Second, the GPT-4 series also demonstrates stronger performance in chemical science and engineering, including improved accuracy in tasks such as structure elucidation and reaction prediction.21–23 These features create a new opportunity to use data analysis for accelerating research in this field. The findings from text mining on the synthesis methods of different MOFs for CO2 capture were linked to process adsorption conditions and MOF performance. Finally, an analytical investigation was conducted to examine the relationships among synthesis techniques, structural components of MOFs, functional groups, and CO2 capture performance. It provides a useful tool for decision-making and synthesis strategies, demonstrating that linking various MOF components with their corresponding synthesis techniques based on published papers is feasible. Furthermore, we used details collected from the synthesis conditions and types of MOFs for CO2 adsorption to create a recommendation system for synthesis conditions. This method offers an asset for various MOF methods of synthesis by recommending customized MOF synthesis conditions based on specific metal and ligand types and a direct correlation with the amount of CO2 adsorption. 
This research study demonstrates how LLMs can aid in chemistry and accelerate MOF customization to build high-efficiency CO2 adsorbents.
Fig. 1 is a schematic of the data-mining workflow that uses ChatGPT to text-mine MOF synthesis conditions and key CO2 capture parameters from a number of published research articles; it includes the instructions for extracting the information and naming the MOF groups. The goal is to extract all data on MOF synthesis, including the compound name, metal source, ligand, solvent, and reaction time. The CO2 uptake at different temperatures and pressures, along with the pore size, pore volume, and specific surface area (SSA) of the MOFs, are further keywords. The details of the dataset used for chemical text mining are compiled in the SI file. In the first step, the articles' PDF files are taken as input data; during the initial evaluation, we consider all the expected data shown in the blue box of Fig. 1 that have an impact on CO2 adsorption. The white or black OpenAI logo in Fig. 1 marks the use of the ChatGPT platform, and the entire article review process is performed with ChatGPT 4o-mini.20 We created prompts for ChatGPT to guide its reading of the articles. By specifying keywords, the language model concentrates solely on the titles and terms we identified at the outset, which swiftly eliminates all parts of an article unrelated to the keywords and increases the processing and text-mining speed. In this approach, ChatGPT reads all selected keywords along with the associated numerical values and organizes them in a table as reference data (see Table S1). The texts of the articles are scanned thoroughly, and a table containing the MOF synthesis information, the CO2 adsorption data, and the MOF specifications mentioned in the articles is created.
The text-mining procedure that employs GPT-4o mini to collect information on MOF synthesis conditions and important CO2 capture parameters from the published research articles is presented as a flowchart in Fig. S1 of the SI file. Processing numerous research papers through individual chats in the web-based ChatGPT is time-consuming. OpenAI's GPT-4o mini, the same model that powers the ChatGPT product, is also available through an API, which allows text from a large number of papers to be processed in batches.
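The batch idea can be illustrated with a short Python sketch. It is not the authors' code: `build_prompt`, `extract_batch`, the keyword list, and the `ask_llm` callable are hypothetical names introduced here. In practice `ask_llm` would wrap a request to the model API; an offline stub is used so that the sketch runs without network access.

```python
from typing import Callable, Dict, List

# Illustrative subset of the extraction targets named in the text.
KEYWORDS = ["metal source", "ligand", "solvent", "reaction time",
            "CO2 adsorption", "temperature", "pressure",
            "pore size", "pore volume", "SSA"]


def build_prompt(paper_text: str, keywords: List[str] = KEYWORDS) -> str:
    """Compose one extraction prompt restricted to the chosen keywords."""
    fields = ", ".join(keywords)
    return (
        "Extract the following MOF fields from the article text and "
        f"return one table row per MOF: {fields}.\n"
        "Write 'not reported' for any field that is absent.\n\n"
        f"ARTICLE:\n{paper_text}"
    )


def extract_batch(papers: Dict[str, str],
                  ask_llm: Callable[[str], str]) -> Dict[str, str]:
    """Send every paper through the same prompt and collect the raw replies."""
    results = {}
    for paper_id, text in papers.items():
        results[paper_id] = ask_llm(build_prompt(text))
    return results


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model call (an assumption of this sketch)."""
    return f"received {len(prompt)} characters"


if __name__ == "__main__":
    demo = {"paper_1": "Cu-BTC was synthesized in DMF ...",
            "paper_2": "UiO-66 was prepared solvothermally ..."}
    for paper_id, reply in extract_batch(demo, fake_llm).items():
        print(paper_id, "->", reply)
```

Passing the model call in as a function keeps the batching logic independent of any particular client library and makes it easy to test with a stub.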
Fig. 2 Pseudo-code of the GPT-4o mini prompts with the LLM algorithm interaction for the MOF parameters.
Fig. 2 shows the pseudo-code of the LLM-based model developed to establish meaningful relationships between these parameters and to assess their effects. Given the limited availability of CO2 selectivity data, it was not possible to find a robust relationship between this parameter and the other parameters. The statistical data for adsorption pressure, adsorption temperature, SSA, reaction time, and oxidation number, taken from the studies that reported these quantities, are shown in Fig. S8 to S10 in the SI. The data cover oxidation numbers from 0 to 4, temperatures up to 325 K, and adsorption pressures up to 15 bar. Fig. S7 in the SI document shows the scatter of the data across the top ten ligands, metals, and MOF groups.
As shown in Fig. 1, the textual data extracted for the different MOFs as input for the LLM workflow comprise the MOF type, ligand, metal, synthesis method, solvent, physical state, oxidation number, reaction time, MOF structure characteristics, SSA (m2 g−1), pore size diameter (nm), pore volume (cm3 g−1), CO2 adsorption capacity (mmol g−1), CO2 selectivity, adsorption temperature (K), adsorption pressure (bar), and MOF group name; all of these are available in the SI. Comprehensive details about MOF group characteristics, synthesis methods, and CO2 adsorption capacities are provided in Table S1 of the SI. After careful extraction of the data, one-hot encoding was applied to the MOF group names and related categorical features so that they could be weighted appropriately. Pearson correlations between all the considered parameters for the top five metals, along with the top six MOFs, structural variables, and CO2 adsorption characteristics, are plotted in Fig. S4 of the SI.
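The encoding and correlation steps can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical and the six rows are made-up values, not entries from Table S1. With pandas, the same steps would typically use `pandas.get_dummies` and `DataFrame.corr`.

```python
from math import sqrt
from typing import Dict, List, Sequence


def one_hot(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Return one 0/1 indicator column per distinct label."""
    return {name: [1 if label == name else 0 for label in labels]
            for name in sorted(set(labels))}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up rows for illustration only; they are not values from Table S1.
metals = ["Cu", "Zn", "Cu", "Mg", "Zn", "Cu"]
ssa = [1500.0, 900.0, 1700.0, 1200.0, 800.0, 1600.0]   # m2 g-1
uptake = [4.1, 2.0, 4.6, 3.2, 1.8, 4.3]                # mmol g-1

columns = one_hot(metals)
print("SSA vs uptake:", round(pearson(ssa, uptake), 3))
for metal, indicator in columns.items():
    print(f"{metal} indicator vs uptake:", round(pearson(indicator, uptake), 3))
```

Correlating a 0/1 indicator column with a numeric column is what allows a categorical feature such as the metal type to appear in the same correlation matrix as SSA or pore volume.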
All MOF structural characteristics obtained from the text-mining approach, including the SSA (m2 g−1), pore size diameter (nm), pore volume (cm3 g−1), and CO2 adsorption capacity (mmol g−1), are shown for the top five metals and ligands in Fig. 3–6. The top six ligands (H3BTC, H2BDC, H4DOBPDC, 2-methylimidazole, H3BTB, and NH2-BDC), metals (Cu, Zn, Mg, Cr, Zr, and Co), and MOF groups (UiO-66, MIL-101, MIL-100, HKUST-1, MOF-5, and MOF-74), and the top five supergroups (MOF, MIL, ZIF, UiO, and metal/dobpdc) are shown in Fig. 4, 6, 8 and 9. These top parameters are the ones reported most often, so their statistics rest on more data and are more repeatable. The execution method and instructions for the prompts are presented in Fig. 2; the pseudo-code in Fig. 2 and the SI document shows how the GPT-4o mini20 prompt interacts with the LLM algorithm. The data extraction process began by separating the articles into independent PDF files and then analysing the individual paragraphs of each article. To coordinate interactions between the different LLMs, we used the OpenAI API to perform data mining on the documents. PyPDF2 version 3.0.1 was used to read the PDF documents and convert them into text. All data manipulation, analysis, and processing were performed with the pandas library, version 1.4.3. Diagrams and figures were plotted with the Plotly library, version 5.13.1. The workflow enables automated data extraction in three main steps: classification, feature consideration and inclusion, and extraction. The automated run was supervised manually, with adjustments such as monitoring token length. Table S2 reports the accuracy achieved for the classification and inclusion of features. Fig. S1 in the SI document depicts the data mining path classification and data cleaning procedure involved in this process.
In Table S2, where the evaluation results are specified, the column “Description by human reviewer” lists the observed discrepancies, and the results of the review of the articles that had discrepancies are listed in the column “Status”.
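Monitoring token length, as mentioned for the extraction workflow, can be approximated by grouping paragraphs under a token budget before they are sent to the model. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the rule of roughly four characters per token is only a heuristic; an exact count would require the model's own tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (assumption)."""
    return max(1, len(text) // 4)


def chunk_paragraphs(paragraphs: List[str], max_tokens: int) -> List[List[str]]:
    """Group consecutive paragraphs so each group stays within max_tokens.

    A single paragraph larger than the budget is kept as its own group
    rather than being split mid-sentence.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        # Close the current group if adding this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    paragraphs = ["Synthesis section " * 10, "Adsorption section " * 10,
                  "Characterization section " * 10]
    for i, group in enumerate(chunk_paragraphs(paragraphs, max_tokens=100)):
        print(i, [estimate_tokens(p) for p in group])
```

Keeping paragraphs whole preserves the link between a reported value and the sentence that describes its conditions, which matters when temperature and pressure accompany an uptake figure.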
Fig. 3 (a) CO2 adsorption capacity (mmol g−1) data range for the top five ligands, along with the SSA (m2 g−1), (b) pore size (nm), and (c) pore volume (cm3 g−1), obtained for selected MOF.
Fig. 4 Relationship between the CO2 adsorption capacity data range in Fig. 3 for the top ten ligand types introduced for (a) ten physical states, (b) ten solvents, and (c) ten MOF synthesis methods.
Fig. 5 CO2 adsorption capacity (mmol g−1) data range for the top five metals versus (a) SSA (m2 g−1), (b) pore size (nm), and (c) pore volume (cm3 g−1), for considered MOF.
Fig. 6 Relationship between the CO2 adsorption capacity data ranges in Fig. 5 for the top ten metal types introduced for (a) ten solvents, (b) ten ligands, and (c) ten MOF synthesis methods.
After extracting the identified data and establishing the appropriate data format, a validation process was performed to minimize errors: 10% of the articles were randomly selected (using the random function), and a human reviewer manually compared the extracted data with the original articles. The results of this review are given in Table S2 (SI Excel file). The text and keywords were correctly detected in over 86% of the ChatGPT readings. We removed CO2 selectivity from the results in the table because it did not show a significant relationship with the other parameters. Furthermore, as indicated in Table S2, inconsistencies in how selectivity was reported across studies, combined with the limited amount of available data, made meaningful analysis difficult. Thus, this parameter was excluded from further consideration.
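The sampling and scoring behind such a validation can be sketched in a few lines. The function names, the fixed seed, and the exact-match scoring rule are assumptions made for illustration; the source states only that a random function selected 10% of the articles for human review.

```python
import random
from typing import Dict, List


def pick_validation_sample(paper_ids: List[str], fraction: float = 0.10,
                           seed: int = 0) -> List[str]:
    """Draw a reproducible random subset of papers for manual review."""
    rng = random.Random(seed)  # fixed seed so the same papers can be re-checked
    k = max(1, round(len(paper_ids) * fraction))
    return sorted(rng.sample(paper_ids, k))


def field_accuracy(extracted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Share of reviewer-confirmed fields that the extraction reproduced exactly."""
    if not reference:
        raise ValueError("reference must contain at least one field")
    hits = sum(1 for key, value in reference.items()
               if extracted.get(key) == value)
    return hits / len(reference)


if __name__ == "__main__":
    ids = [f"paper_{i}" for i in range(433)]
    print(len(pick_validation_sample(ids)), "papers selected for review")
    # Hypothetical field values, used only to exercise the scoring function.
    print(field_accuracy({"metal": "Cu", "solvent": "DMF"},
                         {"metal": "Cu", "solvent": "H2O"}))
```

Exact string matching is a strict criterion; a real review would likely normalize units and number formats before comparing.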
Fig. S4(a) highlights the key correlations of the CO2 adsorption parameters, metals, pore volume, specific surface area (SSA), and adsorption pressure with the other parameters. Fig. S4(b) shows that SSA and pore volume strongly influence the adsorption capacity for the metals Cr and Cu, while adsorption temperature exhibits the strongest correlation with Cr and Mg. Fig. S4(c) displays the significant structure–property relationships, where MIL-101 shows the strongest SSA–adsorption correlation, MOF-74 demonstrates the highest adsorption capacity correlation, and MIL-100 shows the strongest temperature correlation.
Attention must be paid to hallucination, that is, fabricated or misleading content that ChatGPT returns in place of data actually present in an article. Examples of what is or is not in the text (Table S1) and of how ChatGPT presents it in a fabricated form can be seen in the SI Excel file (Table S2). Hallucination is therefore an important concern, and the obtained data need to be analyzed and evaluated so that data retrieval can be controlled when designing the ChatGPT prompts. The CO2 adsorption capacities (mmol g−1) and the related MOF structure parameters, namely SSA (m2 g−1), pore size (nm), and pore volume (cm3 g−1), for the top five ligands listed in the literature are illustrated in Fig. 3(a–c). After investigating the data, ref. 24–31 were consulted to verify the accuracy of these items. The findings indicate that the H2BDC ligand has the highest porosity, and the H3BTC ligand has the greatest CO2 adsorption capability. Fig. 4(a–c) displays the relationship diagrams of the top ten ligands in terms of physical state, solvent, and synthesis method, respectively. The connections between each ligand and the synthesis parameters are evident in these figures. The findings offer researchers a broad framework for synthesis and can serve as useful guidelines when using different MOF synthesis techniques for CO2 adsorption applications. The CO2 uptake, along with the corresponding SSA, pore size, and pore volume for particular MOF ligands (frequently reported in the literature) and their synthesis methods, serves as evidence of the reproducibility of the information obtained. This indicates that data gathered from the literature can be used for synthesizing MOFs targeted for CO2 capture.
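One simple screen for fabricated values is to check whether every number in an extracted row actually occurs in the source text. The sketch below is only an illustration of that idea and is not described in the source: the function names are hypothetical, and the check is deliberately crude (it does not reconcile `2,800` with `2800` or `3.40` with `3.4`, and it cannot detect a real number attached to the wrong field).

```python
import re
from typing import Dict, List, Set


def numbers_in(text: str) -> Set[str]:
    """All numeric literals in the text, as strings (e.g. '3.4', '2800')."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def ungrounded_fields(row: Dict[str, str], source_text: str) -> List[str]:
    """Names of fields containing a number that never occurs in the source."""
    available = numbers_in(source_text)
    flagged = []
    for field, value in row.items():
        values = numbers_in(value)
        # Flag the field if any of its numbers is missing from the article text.
        if values and not values <= available:
            flagged.append(field)
    return flagged


if __name__ == "__main__":
    # Hypothetical sentence and row, written for this example only.
    source = ("The BET surface area was 2800 m2 g-1 and the uptake "
              "reached 3.4 mmol g-1 at 298 K.")
    row = {"SSA": "2800", "CO2 uptake": "3.4", "pore volume": "1.25"}
    print(ungrounded_fields(row, source))
```

Fields flagged this way are candidates for manual review rather than confirmed errors.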
The hexanuclear [Zr6O4(OH)4] units that make up the hydroxylated form of UiO-66 have µ3-O and µ3-OH groups alternately capped on the triangular faces of the Zr6 octahedron. To create a cubic 3-D framework, the Zr6 polyhedra are joined along their edges by carboxylate groups from twelve 1,4-benzenedicarboxylate (BDC) linkers.32 Experimental and simulation studies have demonstrated that the addition of functional groups like –COOH, –SO3H, and –NH2 to the BDC ligand significantly enhances UiO-66's capacity to adsorb and separate CO2.33 However, because UiO-66 structures often have small pore diameters and surface areas, adding functional groups generally reduces the adsorbent's SSA, which may harm the CO2 adsorption capacity of porous adsorbents.34
The same strategy is repeated in Fig. 5–9 for the top five metals and MOF groups identified, along with their relationship to physical conditions, solvents, metals, ligands, and synthesis methods. The relationships between the top five metals and synthesis parameters, along with the CO2 adsorption capacity and its relationship to SSA, pore size, and pore volume, are evident in these figures. These findings show how the top five metals relate to the CO2 adsorption capacity (mmol g−1) and to the MOF structure in terms of porosity, SSA, and pore size. By combining this information with the various methods of MOF synthesis, researchers can make more informed decisions regarding MOF selection, targeted adsorption performance, and desired porosity characteristics. Organizing and categorizing these data together provides a clearer research roadmap and makes potential objectives more achievable. For example, Cu-based MOFs showed the highest SSA (m2 g−1) and CO2 adsorption capacity (mmol g−1) among all metals. Fig. 4, 6, and 8 illustrate another aspect of the top five metal interactions with ligands and MOF groups, as well as synthesis details, including the solvent type and synthesis methods.
Fig. 7 CO2 adsorption capacity (mmol g−1) data range for the top five MOF groups, in terms of (a) SSA (m2 g−1), (b) pore size (nm), and (c) pore volume (cm3 g−1), obtained for considered MOF in Fig. 8.
Fig. 8 Relationship between the CO2 adsorption capacity data ranges in Fig. 7 for the top ten MOF group types introduced for (a) nine solvents, (b) ten metals, (c) six ligands, (d) nine synthesis methods, and (e) seven physical states.
The sources involving UiO-66 and MIL-101 were investigated further, and the text mining findings were in agreement with every result reported in the relevant papers.35,36
Fig. 3–6, concerning the top five ligands and metals, confirm the results of Fig. S11 (in the SI file) and the relationship between the aforementioned parameters. This approach allows for flexibility in selecting different synthesis conditions among the top ten ligands and metals. Furthermore, an overview of the synthesis methods and the rationale behind solvent selection better equips researchers to anticipate and understand the study approach. The first step is to establish a relationship between all synthesis factors, such as the choice of solvent, physical state, ligand, metal, and synthesis technique, and the CO2 adsorption capacity and porosity of the final MOF type. The next step is to observe and classify the MOF types and their grouping. An overview of these relationships is illustrated in Fig. 3–8. Based on the statistical data from published articles, Fig. 7 and 8 further identify MOF-5, MIL-101, UiO-66, and HKUST-1 as the most effective MOF groups for CO2 adsorption. The selection of a specific MOF group provides a framework for researchers to investigate the effects of key variables, such as the metal type, ligand, synthesis method, and solvent used, on overall CO2 capture performance. An evaluation of room-temperature adsorption diffraction data implies that a secondary adsorption site contributes to the adsorption behavior of many of these materials. Fig. 8 and 9 show the relationship between ligands, metals, solvents, synthesis methods, and MOF type. Indeed, the collected data can be linked readily to the experimental results, serving as a useful reference for future study. Fig. 4, 6, 8, 9a, and 9b all show direct relationships between the top metal groups, ligands, solvents, synthesis methods, and MOF groups that had the greatest effect on CO2 adsorption.
It is worth noting that the role of each metal and ligand in relation to the synthesis methods can be observed. For example, Cu demonstrates a progressively stronger influence in MOFs as the pore volume varies. The synthesis routes, extracted from text mining and illustrated in Fig. 6–9, provide further insight into these relationships. In addition, Fig. S8 and S9 depict statistical data distributions of other key parameters influencing CO2 adsorption capacity, including the top five ligands, metals, and MOF groups. The results confirm the impact of the top five ligands as well as the top five metals on CO2 adsorption. Cu and Co metals, along with H2BDC and NH2-BDC ligands from MOF-74 and MIL-101, lead to a high specific surface area and enhanced CO2 adsorption (see Fig. S11). Therefore, based on data obtained using solvothermal, hydrothermal, spray drying, and post-synthetic modification methods with DMF, MeOH, and H2O solvents in the presence of H2BDC/NH2-BDC ligands and Cu, it is possible to achieve MOF composites with greater CO2 adsorption capacity. The results reveal that text mining facilitates data-driven decision-making to optimize the desired MOFs for CO2 uptake or synthesis method selection. Helping researchers to choose synthesis methods, solvents, ligand types, and metals, and to understand how these factors interact to shape the MOF structure and performance, moves us closer to a smart assistant. Supported by text mining, such an assistant will enhance researchers' ability to select the combination of these features that yields the optimum MOF structure with high performance.
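A recommendation of synthesis conditions for a given metal and ligand can be approximated by grouping the extracted records and ranking the groups by CO2 uptake. The sketch below is a hedged illustration of that grouping idea, not the authors' recommendation system: the record layout, the function name, the use of the median, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Record = Dict[str, object]


def recommend(records: List[Record], metal: str, ligand: str,
              top_n: int = 3) -> List[Tuple[str, str, float, int]]:
    """Rank (method, solvent) pairs by median CO2 uptake for one metal/ligand.

    Each result is (method, solvent, median uptake, number of records).
    """
    groups = defaultdict(list)
    for rec in records:
        if rec["metal"] == metal and rec["ligand"] == ligand:
            groups[(rec["method"], rec["solvent"])].append(rec["uptake"])
    ranked = [(method, solvent, median(values), len(values))
              for (method, solvent), values in groups.items()]
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked[:top_n]


# Made-up records for illustration only; uptake in mmol g-1.
RECORDS = [
    {"metal": "Cu", "ligand": "H3BTC", "method": "solvothermal",
     "solvent": "DMF", "uptake": 4.0},
    {"metal": "Cu", "ligand": "H3BTC", "method": "solvothermal",
     "solvent": "DMF", "uptake": 5.0},
    {"metal": "Cu", "ligand": "H3BTC", "method": "hydrothermal",
     "solvent": "H2O", "uptake": 3.0},
    {"metal": "Zn", "ligand": "H2BDC", "method": "solvothermal",
     "solvent": "DMF", "uptake": 2.0},
]

if __name__ == "__main__":
    for row in recommend(RECORDS, "Cu", "H3BTC"):
        print(row)
```

Returning the record count alongside the median lets a reader discount a recommendation that rests on a single paper, and uptake values measured at different temperatures and pressures would need to be filtered to comparable conditions before ranking.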
As shown in Fig. 3–9, the textual data analysis highlights the influence of different metals (e.g., Cu and Zn) and ligands (e.g., BDC) on adsorption efficiency. Variations in metal centers and ligand types significantly affect the porosity of the adsorbent and the resulting MOF structure. Recent studies indicate that Zn-based MOFs, particularly those incorporating Cu2BDC2 frameworks, exhibit strong and rapid CO2 uptake.37 Furthermore, the effect of pressure on MOF-based adsorbents, as extracted from the textual data, provides valuable guidance for process optimization before experimental synthesis. As an example of validating our results against published studies, we assessed the MOF-74 sample and compared the data obtained through text mining with atomic simulation data and experimental data for transition-metal MOF-74 variants. The text-mining results, which are based on experimental data, are consistent with these published findings (see Fig. 10). Queen et al.38 found that the Cu analogue has an axial strain that causes the ligand O2 atom to be positioned in such a way that CO2 cannot approach the open metal site. Since the CO2 site's occupancy is similar, it is clear that Cu2(dobdc) possesses two adsorption sites with identical binding strengths. The results obtained from text mining are in strong agreement with the experimental results39 on the role of Cr and the ligand in tuning the pore size and SSA for the synthesis of amine-functionalized MIL-101(Cr)-NH2 with a particle size less than 20 nm, SSA above 2800 m2 g−1, and CO2 adsorption up to 3.4 mmol g−1 (see Fig. 10(f) and (g)).
Fig. 10 Density distribution plots of the metals of the MOF-74: (a) Mn-MOF-74, (b) Co-MOF-74, (c) Ni-MOF-74, and (d) Cu-MOF-74 model clusters interacting with a CO2 molecule, and (e) CO2 adsorption isotherms for Mn, Co, Ni, and Cu-MOF-74, utilizing universal force field (UFF) in conjunction with the localization of properties and density-derived electrostatic and chemical (cyan circles) point charges, calculated at a temperature of 298 K. Reprinted with permission from ref. 40 Copyright 2015, the American Chemical Society. Experimental isotherms at 298 K are displayed for comparison (black curves, data extracted from ref. 38 Copyright 2014 with permission from the Royal Society of Chemistry). (f) Schematic of the MIL-101(Cr)-NH2 structure39,41 and (g) TEM images of MIL-101(Cr)-NH2 nanoparticles. Reprinted with permission from ref. 39 Copyright 2020, the American Chemical Society.
Our strategy, with the help of the ChatGPT-4o-mini LLM assistant, aims to quickly review a variety of synthesis methods, including green methods, to facilitate routine experimental work. Using the extensive knowledge previously published by researchers, we enable rapid searching and evaluation of various synthesis parameters by utilizing AI algorithms. Gas adsorption by porous materials depends on several effects: (i) variations in size and/or shape (molecular sieve effect); (ii) variations in the interactions between the adsorbate molecule and the adsorbent surface (thermodynamic equilibrium effect); (iii) variations in diffusion intensities (kinetic effect or partial molecular sieve action); and (iv) quantum effects.42 The interaction between the gas and the MOF surface becomes a key parameter in determining the quantity of adsorption of each component when the MOF adsorbent's pore size is large enough to allow all gas components to pass through. Additionally, the characteristics of the adsorbate, such as polarizability, permanent dipole moment, and quadrupole moment, as well as the features of the adsorbent surface, influence the interaction intensity.43 When carboxylate ligands (1,4-benzenedicarboxylates (BDC)) are combined with high-valence metal cations (Cr3+), the MOF stability is improved in the presence of water. However, the adsorption of CO2, which has a quadrupole moment, is significantly affected by strongly polarizing groups such as carboxylic acid.44 Furthermore, increasing metal valence causes an increase in the electrical difference between the CO2 molecule and the adsorbent surface, which improves CO2 adsorption. For CO2 adsorption, the adsorbent's pore diameter must be greater than the kinetic diameter of the CO2 molecule (3.3 Å).
Because of the different interactions between the adsorbate and the adsorbent surface, CO2 gas is adsorbed, and the adsorption process will be at thermodynamic equilibrium.12 Consequently, it is feasible to anticipate increased CO2 adsorption and to tune the pore size of MOFs by carefully selecting the appropriate ligands and metals.
Fig. 11 provides an overview of the creation of keywords and classification of the MOF group for CO2 capture. Based on the classification generated by the LLM, MOFs can be organized into groups and supergroups, and the relationships among the MOF types used for CO2 capture are shown in Fig. 9. In fact, the relationship between the ligand and metal types is one of the most important foundations of the MOF structure. By observing the type of MOF in the separated subgroups, the results of the mentioned diagrams can be used directly. Using the current ChatGPT-based assistant, researchers can gain an overview and a roadmap of how different MOF or composite types are synthesized and perform. This allows them to analyze the data effectively and make informed decisions when selecting a MOF type or synthesis method. New data can be integrated into the existing dataset to update the model and results using the attached GitHub code. The approach is easily expandable by incorporating additional data and refining the outputs accordingly. One of the main motivations behind this work was to promote the green synthesis of new MOFs by avoiding the use of toxic solvents such as DMF and reducing synthesis costs.45 The resulting cleaned MOF database (Table S3, Excel file in the SI), including DOI links to the original articles, can be used as data for training ML models and for evaluating various features.
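Integrating new records into an existing dataset usually requires a rule for avoiding duplicates. The sketch below shows one possible rule and is independent of the authors' GitHub code: it assumes each row carries a DOI and a MOF name (the field names `doi` and `mof` are hypothetical) and treats that pair as the row's identity. The DOIs in the example are placeholders.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def record_key(row: Row) -> Tuple[str, str]:
    """Identify a row by its DOI and MOF name, ignoring case and outer spaces."""
    return (row["doi"].strip().lower(), row["mof"].strip().lower())


def merge_rows(existing: List[Row], incoming: List[Row]) -> List[Row]:
    """Append incoming rows whose (DOI, MOF) key is not already present."""
    merged = list(existing)
    seen = {record_key(row) for row in existing}
    for row in incoming:
        key = record_key(row)
        if key not in seen:
            merged.append(row)
            seen.add(key)
    return merged


if __name__ == "__main__":
    # Placeholder DOIs, not real articles.
    old = [{"doi": "10.0000/example.1", "mof": "UiO-66"}]
    new = [{"doi": "10.0000/EXAMPLE.1", "mof": "uio-66"},   # duplicate of old
           {"doi": "10.0000/example.2", "mof": "MIL-101"}]  # genuinely new
    for row in merge_rows(old, new):
        print(row)
```

A single article can report several MOFs, which is why the key combines the DOI with the MOF name instead of using the DOI alone.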
Fig. 11 Schematic of the classified types of supergroups, MOF groups, and MOF types obtained from text mining.
Human validation showed that ChatGPT identified and extracted a significant percentage of the correct data from the articles, with an acceptable true-positive ratio. A major problem is the large number of false negatives, that is, data present in an article that ChatGPT fails to identify. This is particularly common in review articles, where it extracts only a small number of the relevant data items from a vast amount of information. Because this issue also affected CO2 selectivity, we did not add this parameter to the database. Another issue is ChatGPT's limited ability to identify data in large tables, from which it extracts only a limited number of items. The results therefore show that research articles are the more practical source, as ChatGPT reads their text correctly more easily. Incorporating more research articles to obtain new data and creating a larger database will facilitate broader and more accurate research in materials science, especially on MOFs for specific applications. To improve the quality of LLM-based models, several key challenges need to be addressed. First, the literature is biased towards successful results, while unsuccessful experiments are rarely reported, which introduces uncertainty in the results. Second, text mining with LLMs is still in its early stages, and as the scientific literature continues to evolve, LLM results will change, potentially leading to inconsistencies. It is therefore imperative to address this concern and to develop more language models dedicated to specific topics, such as the synthesis of porous materials, with further validation. Accurate results should not be expected from LLMs immediately. Instead, LLMs should be used with caution as an assistant for observation and data mining, material design, and laboratory synthesis, to simplify and accelerate the review of past research.
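The validation described above can be made concrete with a short Python sketch that compares the items extracted by an LLM against a human-curated reference for one article. This is an illustrative calculation under stated assumptions, not the scoring procedure used in this study; the function name `extraction_scores` and the example items are hypothetical.

```python
def extraction_scores(extracted, reference):
    """Count true positives, false positives, and false negatives, then derive ratios."""
    extracted, reference = set(extracted), set(reference)
    tp = len(extracted & reference)   # items found by the LLM and confirmed by a human
    fp = len(extracted - reference)   # items reported by the LLM but not in the article
    fn = len(reference - extracted)   # items in the article that the LLM missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical review-article case: every extracted item is correct, but half are missed.
reference_items = {"MOF-A uptake", "MOF-B uptake", "MOF-C uptake", "MOF-D uptake"}
extracted_items = {"MOF-A uptake", "MOF-B uptake"}

scores = extraction_scores(extracted_items, reference_items)
print(scores)
```

In this hypothetical case the precision is high while the recall is low, which mirrors the pattern reported for review articles and large tables: what is extracted tends to be correct, but many relevant items are never extracted.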
To improve validation, we recommend that researchers include the obtained data in a separate table in note format when publishing research results, to allow for careful review and secondary validation. This research study not only demonstrates how LLMs can revolutionize the development of porous MOF materials for CO2 capture, but also provides a useful guide for designing high-performance MOF materials and, more generally, for designing and fabricating adsorbents and catalysts with efficient synthesis schemes. This practical tool can play a fundamental role in helping researchers plan synthetic strategies and roadmaps. Expanding this research method beyond adsorbent and catalyst studies would be encouraging. Crystal-structured adsorbents, such as zeolites and metal oxides, have high potential for future research.
| N | Number of datasets for training [-] |
| R2 | Correlation coefficient [%] |
| SBET | Specific surface area [m2 g−1] |
| AI | Artificial intelligence |
| ANN | Artificial neural network |
| AARD | Average absolute relative deviation [%] |
| AAD | Average absolute deviation [%] |
| API | Application programming interface |
| CCS | Carbon capture and sequestration |
| CBM | Carbon-based materials |
| GHG | Greenhouse gas |
| IPCC | Intergovernmental Panel on Climate Change |
| MSE | Mean square error |
| MOF | Metal–organic framework |
| ML | Machine learning |
| MLP | Multi-layer perceptron |
| POP | Porous organic polymers |
| RMSE | Root mean square error |
| SSA | Specific surface area [m2 g−1] |
| TEM | Transmission electron microscopy |
| VSA | Vacuum swing adsorption |
https://github.com/ai4mat-lab/GPT_MOF_Project
Further details on usage can be found in the repository's documentation. A supplementary information (SI) data repository is available on Zenodo (https://doi.org/10.5281/zenodo.17619285), providing the datasets and codes used in this research study.
Supplementary information: detailed information about the dataset is summarized in Tables S2 (evaluated GPT results) and S3 (cleaned MOF database, Excel files in the SI). Table S1 displays all data extracted from the PDF files. The results of validating the ChatGPT text mining against the experimental data in the original articles are presented in Table S2. The dataset, which includes the exact DOI of each paper and the parameter values, can be found in Table S3. See DOI: https://doi.org/10.1039/d5dd00446b.
| This journal is © The Royal Society of Chemistry 2026 |