Jingyuan Zhu† a, Yizhou Wang† a, Imanuel Rava† a, Shihong Chen a, Zhiyan Zou a, Yong Huang* a, Zhenyang Lin* a and Haibin Su* ab
aDepartment of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China. E-mail: yonghuang@ust.hk; chzlin@ust.hk; haibinsu@ust.hk
bIAS Center for AI for Scientific Discoveries, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China
First published on 14th October 2025
Since the 1970s, nickel has proven to be an exceptionally efficient catalyst for cross-coupling reactions, particularly in the activation of C–O bonds, which serves as an environmentally friendly alternative to organic halides. The relentless exploration by chemists of the synthetic methodologies and mechanisms of this field has progressively fostered the emergence of an increasingly mature yet intricate discipline. Despite its apparent complexity, the core patterns remain hidden within some significant works. The development of large language models (LLMs) has provided unprecedented opportunities to navigate this complex landscape and uncover hidden patterns. Here, we introduce GPT-NiCOBot, a modular platform that integrates LLMs with chemistry-specific tools to autonomously extract reactions and identify key patterns in reagents and catalysts from peer-reviewed papers. Moreover, by combining the core citation network with in-depth chemical knowledge, this platform constructs a more effective and comprehensive research assistance framework. This system demonstrates the potential of LLMs to accelerate research in nickel catalysis and suggests broader applications in other chemical subfields.
Traditionally, cross-coupling reactions involving halide electrophiles are catalyzed by palladium, which commonly operates in the 0 and +2 oxidation states. As an Earth-abundant and cost-effective alternative to palladium, nickel offers diverse catalytic pathways and unique open-shell reactivity, thanks to its wide range of accessible oxidation states from Ni(0) to Ni(IV).6 The smaller atomic size of nickel enhances its nucleophilicity, enabling efficient activation of sulfonates, esters and less reactive electrophiles.7 Research exploring nickel as a catalyst in C–O bond activation has intensified in recent decades, and nickel-catalyzed C–O activation has consequently become a general platform for constructing C–C bonds. Studies have explored a wide array of carbon–oxygen electrophiles, ranging from active sulfonates8 to ether insertion using nickel catalysis.9,10 With the initial proof-of-concept achieved in 1979,11 progress in this field has significantly accelerated throughout the 21st century. The surge in publications within this domain has resulted in a wealth of data, enhancing our comprehension of reaction mechanisms, ligand effects, and substrate scopes (Fig. 1a).
The emergence of data-driven modeling has revolutionized the analysis of chemical data, aiding in reaction classification,12 product prediction,13 reactivity/yield estimation,14,15 and synthesis planning.16,17 A general workflow of data-driven methods used in organic chemistry is illustrated in Fig. 1b, and includes several key steps: dataset, representation, model and output. Traditionally, strategies relied on linear free energy relationships that connect a single parameter with the chemical reactivity of interest.18,19 Over time, multi-parameter methods have been introduced to correlate specific chemical inquiries with customized molecular descriptors. For example, the high relevance of the cone angle, buried volume, and electrostatic potential of ligands to predict the yield of nickel-catalyzed Suzuki reactions using a regression model was demonstrated.20 The emergence of advanced artificial intelligence (AI) and progress in high-throughput experiments has enabled the use of increasingly large datasets and more advanced algorithms to describe and predict the outcome of more complex catalyst systems. A neural network model utilized a comprehensive library of computed atomic, molecular, and vibrational descriptors as inputs to predict yields of Buchwald–Hartwig amination reactions.14 Structure-based fingerprints were also developed to predict the yield using a random forest model trained on the same dataset for the Buchwald–Hartwig reaction.21
Recently, OpenAI has demonstrated significant breakthroughs in the field of large language models (LLMs) through extreme scaling,22 successfully applying these models to chemical research23,24 and paving the way for data-driven techniques in organic chemistry. To enhance the cross-domain performance of LLMs, a viable approach is to build AI agents that utilize specialized external tools or plugins to improve their overall performance and applicability.25 The conceptual framework for an LLM-based agent includes three components: the brain, perception, and action.26 Serving as the controller, the brain module undertakes basic tasks such as memorizing, thinking, and decision-making. The perception module perceives and processes multimodal information from the external environment, while the action module uses tools to execute operations.
Inspired by the successful application of LLM agents in multimodal reasoning and action,27,28 we here present GPT-NiCOBot, an integrated platform that iteratively reflects on chemical tasks and refines responses (Fig. 1c). GPT-NiCOBot receives and processes various forms of input data (such as text, PDF files, and images). The brain module utilizes built-in databases, citation networks, and chemical patterns for in-depth data analysis/reasoning, and supports decision-making and planning. In the action phase, based on the prior analysis, it strategically employs appropriate chemical tools, a data retrieval unit (DRU), or research modes to acquire the necessary information to complete specific tasks. GPT-NiCOBot highlights the transformative potential of LLMs in chemistry research, serving both as an assistant to experts and as a gateway for non-experts through a user-friendly interface for accessing chemical knowledge.
For the DRU, the critical steps involved in the pipeline include content mining from images, tables, and text. Leveraging the vision-language multimodal capabilities of GPT-4, we streamlined the content mining process for chemical data. First, an initial prompt is given to the DRU to determine whether the current page contains data of interest, such as tables or scheme names, based on the user-provided information. This step effectively filters out irrelevant pages, thereby saving tokens needed for further processing. Next, considering that GPT-4-Vision29 accepts input in image format, the DRU will digitize the relevant pages into images for subsequent image mining. At the same time, the original text format for GPT-4o30 will be retained to conduct table and text mining. For image mining, the DRU extracts detailed information on relevant chemical reactions, including reactants, products, nickel pre-catalysts, nickel pre-catalyst loading, ligands, ligand loading, bases, solvents, temperature, time, and yield, ensuring that this information is returned in a structured format. Given the limitations of GPT-4-Vision in recognizing molecular structures in images, the DRU only returns the labels of the molecules in the literature, without converting them into compound names or SMILES. Considering that tables or text might lack essential reaction information, data extracted from images will serve as prompts to assist in the processing of the tables and text. Then, the DRU integrates the extraction results from tables, text, and images and outputs a preliminary dataset. A name-to-SMILES converter tool is used to transform some of the molecules into SMILES representations, while returning the labels of molecules that cannot be converted to the user. During the user interaction phase, users can review the preliminary results and make suggestions for improvements based on their needs, such as supplementing data for specific entries or pointing out overlooked footnotes. 
Users can also upload screenshots of molecules that cannot be converted to SMILES, and the DRU will invoke the MolGrapher31 model for conversion. Through this interactive process and iterative data updates, the DRU continuously refines the information until the user is satisfied, and ultimately outputs structured data.
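The integration step described above, in which image-derived data fill gaps left by tables and text, can be sketched as follows. This is a minimal illustration and not the DRU's actual implementation: the snake_case field names, the `merge_extractions` helper, the source-priority order, and the example inputs are all assumptions made for this sketch.

```python
# Reaction fields named in the text; the snake_case keys are our own choice.
FIELDS = [
    "reactants", "products", "ni_precatalyst", "ni_precatalyst_loading",
    "ligand", "ligand_loading", "base", "solvent",
    "temperature", "time", "yield",
]


def merge_extractions(table: dict, text: dict, image: dict) -> dict:
    """Combine per-source extraction results into one structured record.

    Assumed priority: table first, then text, then image. A field that no
    source provides is filled with the placeholder "N/A".
    """
    record = {}
    for field in FIELDS:
        value = None
        for source in (table, text, image):
            candidate = source.get(field)
            if candidate not in (None, ""):
                value = candidate
                break
        record[field] = value if value is not None else "N/A"
    return record


# Hypothetical inputs; molecule labels such as "1a" are kept as reported,
# mirroring the DRU's choice not to convert image labels to names or SMILES.
table = {"reactants": "1a + 2a", "yield": "85%"}
text = {"solvent": "toluene", "temperature": "80 °C"}
image = {"reactants": "1a + 2b", "products": "3a", "ligand": "PCy3"}
record = merge_extractions(table, text, image)
```

Keeping the merged record as a flat dictionary with a fixed field list makes it straightforward for a user to spot and correct individual entries during the interactive review phase.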
To evaluate the accuracy of content mining in the DRU, we conducted a comprehensive analysis of the entire result dataset. Specifically, we manually recorded the ground truth values for all the 13 reaction parameters across 1051 reactions, utilizing these data to assess the accuracy of the DRU outputs. Each reaction parameter was assigned to one of three labels: true positive (TP, where the DRU correctly identified the reaction parameter), false positive (FP, where the compound was incorrectly assigned to the wrong reaction parameter or irrelevant information was extracted), or false negative (FN, where the DRU failed to extract certain reaction parameters). Considering that the DRU is semi-automated, specific classification and processing are applied to evaluate its performance. Any results that required manual intervention are automatically categorized as FP or FN. Since the accuracy of converting molecular names or images into SMILES largely depends on the chosen tools or models, the conversion step is not included in the performance evaluation. Such an assessment method ensures a focused and accurate evaluation of the DRU's capabilities in content mining and reaction condition extraction, eliminating any potential interference due to the accuracy of external tools or models. The distribution of the TP labels for the 13 reaction parameters extracted across the 1051 reactions from all the 148 papers is shown in Fig. S2. It should be noted that not all reaction conditions require the reporting of all 13 reaction parameters; for example, most Kumada reactions and Negishi reactions did not involve an external base, and some reactions did not use additives. In such cases, the DRU assigns N/A to the corresponding reaction parameters. The precision, recall, and F1 scores (the harmonic mean of precision and recall) for each reaction parameter, as shown in Fig. S3, demonstrate high precision (>94%), recall (>86%), and F1 scores (>90%). 
Notably, the entire workflow is characterized by minimal coding and an efficient prompt engineering system, ensuring exceptional performance. This approach relieves researchers from the arduous task of sifting through numerous papers within their field, enabling them to swiftly assimilate the latest domain knowledge.
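The per-parameter metrics used in this evaluation follow the standard definitions of precision, recall, and F1. A minimal sketch is given below; the function name and the example counts are hypothetical and are not taken from the reported dataset.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from TP/FP/FN label counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical label counts for a single reaction parameter.
p, r, f1 = precision_recall_f1(tp=90, fp=5, fn=10)
```

Under the scheme described above, any entry that needed manual intervention would be counted in `fp` or `fn` before these ratios are computed.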
Prior to 2004, research largely focused on pairings of highly reactive partners, such as aryl ethers coupled with Grignard reagents or organoboron nucleophiles with sulfonate electrophiles (Fig. 3b(1) and (2)). After 2008, several research groups expanded the scope to less activated C–O electrophiles—including carbamates, carbonates, and carboxylates—and milder nucleophiles such as organoboron and organozinc reagents, as well as pro-nucleophiles accessed via heterolytic C–H cleavage.32,33 In 2008, Chatani and co-workers reported Ni-catalyzed Suzuki–Miyaura couplings of aryl ethers (Fig. 3b(4)),9 while the Shi34 and Garg35 groups independently demonstrated couplings of aryl boron reagents with aryl carboxylates; Garg and co-workers also extended the scope to aryl carbamates and carbonates.36 Shi and co-workers further pioneered Ni-catalyzed Negishi couplings of aryl/alkenyl pivalates that year (Fig. 3b(5)).37 Subsequently, in 2012, Itami and co-workers disclosed C–H/C–O couplings of azoles with arenol derivatives, including carboxylates and triflates (Fig. 3b(6)).38
Among aryl/alkenyl electrophiles, intra-class similarity increases after 2008, consistent with methodological generalization beyond a specific substrate; this consolidation is reflected in clusters of related electrophiles in Fig. 3a. Additionally, a further trend after 2008 is the emergence of C(sp3) electrophiles, notably benzylic substrates. For example, in 2011, the Jarvo group made a significant breakthrough with a stereospecific nickel-catalyzed C(sp3)–C(sp3) cross-coupling employing alkyl Grignard reagents (Fig. 3b(7)).39 Furthermore, the Jarvo group showcased the transformative impact of Suzuki–Miyaura cross-coupling reactions with benzylic esters, carbonates, and carbamates alongside aryl-boronic esters.40 Concurrently, the Watson group introduced a related strategy, employing benzylic pivalates and aryl boroxines to produce diarylalkenes and triarylmethanes.41
This citation network plays a crucial role in understanding the background of the field. Compared to directly summarizing and analyzing all the collected literature using LLMs, providing structured data that explicitly clarifies the citation relationships between papers enables the model to more accurately identify milestone work in the field. This approach avoids redundancy and highlights key advancements. The background module of GPT-NiCOBot is designed based on the concept of the citation network (Fig. S12). It includes the citation relationship between papers and constructs a dataset with concise citation content to describe the core contributions of the cited works, leveraging the strengths of language models in text comprehension and generation. To provide high-quality background information, we extracted key information from 148 papers, including titles, abstracts, and keywords related to electrophiles and nucleophiles. Additionally, with the assistance of GPT-4o, we conducted preliminary summaries of each paper, evaluating aspects such as the introduction of new reaction types, the range of applicable substrates, the mildness of reaction conditions, and the potential for application and practicality. This structured, citation-based approach optimizes resource utilization and significantly enhances the effectiveness and accuracy of background information in the research process.
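One simple way to make citation relationships explicit is to store them as directed edges and rank papers by how often they are cited within the collection. The sketch below illustrates this idea only; the paper identifiers are hypothetical, and using the in-network citation count as a proxy for milestone work is an assumption of this example rather than the stated method of the background module.

```python
from collections import Counter

# Hypothetical citation edges: (citing paper, cited paper).
edges = [
    ("paper_C", "paper_A"),
    ("paper_D", "paper_A"),
    ("paper_D", "paper_B"),
    ("paper_E", "paper_A"),
    ("paper_E", "paper_C"),
]


def rank_by_citations(edges):
    """Return (paper, count) pairs sorted by in-network citation count."""
    counts = Counter(cited for _, cited in edges)
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_citations(edges)
```

A ranked list of this kind could be passed to the language model alongside the concise citation content, so that the model sees which works the rest of the collection builds on.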
We initially utilized Tree MAP (TMAP)42 to correlate electrophiles with various nucleophiles in experimentally validated combinations (Fig. 4a). Within the TMAP framework, sub-trees categorize the electrophiles into two distinct regions: unactivated C–O bonds (such as ethers and esters) and activated C–O bonds (including phosphonates and sulfonates). The patterns exhibited by different types of electrophiles show a range of clustering behaviors. Csp2 electrophiles, primarily aryls and alkenyls, are the most frequently studied. In contrast, Csp3 electrophiles are less common and are mostly limited to benzyl and allyl derivatives. This can be attributed to the propensity of these benzyls and allyls to readily undergo oxidative addition by forming η3-complexes.43 Furthermore, the carbon hybridization of electrophiles exhibits preferences for certain leaving groups. Allyl and benzyl electrophiles are typically paired with unactivated leaving groups, such as ether and ester groups, whereas aryl electrophiles are more commonly associated with better leaving groups such as sulfonates and phosphates.
Ligand selection is a critical aspect of transition metal catalysis, particularly during the oxidative addition and transmetallation steps. When reaction substrates are altered, it is essential to reoptimize ligands by tuning their steric and electronic properties. In this study, density functional theory (DFT) calculations were also employed to generate molecular descriptors for electrophiles, nucleophiles, and ligands, enabling subsequent structure–property relationship investigations. Notably, some ligands are underrepresented due to the scarcity of low/negative outcomes and the bespoke nature of certain scaffolds (for example, N-ligands or some structurally complicated and rare ligands). We compiled statistics on various types of ligands based on their donor atoms. Fig. 4b shows that phosphine ligands, particularly monodentate and bidentate phosphines, are the most commonly employed. With their strong binding affinity to metals and highly tunable sterics and electronics, phosphine ligands dominate in metal-catalyzed reactions.44 N-Heterocyclic carbenes (NHCs) are also observed, albeit less frequently, showcasing their distinct properties, such as strong σ-donating ability and steric bulk.45 Conversely, nitrogen ligands, commonly employed in Ni-catalyzed radical reactions,45 are seldom encountered in our database. This scarcity may be attributed to their relatively weak σ-donation and potential π-accepting ability, which can lead to electron-deficient nickel complexes, potentially limiting their efficacy as ligands in the polar process of oxidative addition to C–O bonds.
To explore the relationship between the ligand and electrophile, clustering analysis was conducted using t-distributed stochastic neighbor embedding (t-SNE; Fig. S5). The results demonstrate that Csp2 sulfonates and phosphates can accommodate a wider range of ligands, whereas less reactive Csp2 esters and ethers exhibit a preference for electron-rich ligands (with Tolman's electronic parameters,46 TEP, of 2064 cm−1 or greater). For Csp3 electrophiles, such as benzyl and allyl groups, electron-poor ligands (with TEP less than 2064 cm−1) are commonly employed. To better understand the ligand choice for ester/ether substrates, Fig. 4c shows a quantitative analysis correlating the calculated bond dissociation energy (BDE) of the electrophile C–O bond with the TEP of the ligands. The correlation plot demonstrates a staircase-like distribution of data points. Relatively weak and medium C–O bonds can tolerate phosphine ligands with a wide range of TEP values (Fig. 4c, region I). In contrast, C–O bonds with higher BDEs are more associated with ligands with low TEP values (Fig. 4c, region II), which underscores the importance of C–O bond cleavage in the catalytic cycle. For instance, where Grignard reagents serve as the nucleophile, ligands with high TEPs are infrequently matched with less reactive electrophiles11,32 (Fig. 4c, region II). This may be attributed to the role of Mg as a Lewis acid, aiding in the activation of C–O bonds.3,47 The hybridization of the electrophile carbon also exhibits distinct preferences for TEPs. Allyl/benzyl electrophiles tend to cluster towards the left-hand side, while aryl/alkenyl electrophiles are primarily located on the right-hand side. This positioning implies that the π-electrons of aryl/alkenyl substrates play a role in affecting the oxidative addition, requiring more electron-rich ligands for efficient C–O bond activation.
Another consideration for ligand selection is the relationship between the ligand denticity requirement and reactivity of the nucleophiles. This is illustrated in Fig. 4d using t-SNE with DFT-based descriptors. Mono- and bidentate ligands showed different distributions across different nucleophile classes. In the category of highly reactive nucleophiles such as Grignard and organozinc reagents on the left, mono- and bidentate ligands appear at similar frequencies. This phenomenon can be attributed to the high reactivity of these nucleophiles, requiring little help from the ligand in the rapid transmetallation process. Conversely, less reactive nucleophiles exhibit a propensity for a specific ligand denticity to facilitate smoother transmetallation. For RNu–[B] nucleophiles, the cluster of monodentate ligands is considerably bigger than that of bidentate ligands. This preference likely arises from indirect transmetallation,48 where Lewis base activation of the organoboron reagents is needed. Monodentate phosphines are better suited to this process due to their ease of dissociation. A statistical analysis was performed to further investigate the choice of ligands for organoboron nucleophiles. Fig. 4e shows that monodentate ligands are the most prevalent choice overall, while bidentate ligands are only occasionally used for boronic acids or borate anions. Interestingly, bidentate phosphine ligands are predominantly used in the reactions involving pro-nucleophiles (RNu–[H]). For unactivated C–H bonds, the nickelation step is typically slow and is sometimes the rate-determining step.49 Bidentate ligands are favored in these cases by stabilizing the C–Ni intermediates.
The role of an external base in promoting transmetallation has been widely accepted.50,51 To further understand the synergy between the type of base and nucleophiles, we performed a quantitative analysis correlating pKa values in water with the strength of the nucleophile, as shown in Fig. 4f. In this plot, Δq represents the relative reactivity of the nucleophile. These data are determined by comparing the charge of the nucleophilic carbon atom in the original nucleophile (RNu–[M]) with the charge of the carbon atom when bonded to a hydrogen atom (RNu–[H]). A more negative Δq reflects a stronger nucleophile. A staircase-like distribution is observed. For strong nucleophiles, such as RNu–[Mg] and RNu–[Zn] reagents, an external base is typically not needed, except in scenarios where the electrophiles contain reactive hydrogen like –OH.52,53 In such instances, a strong base may be introduced to convert the alcohol into its corresponding salt.53 For RNu–[B] reagents, weak or moderate non-nucleophilic bases are often used (CO32−, PO43−, and tBuO−). Reactions involving RNu–[H] reagents typically require the addition of external base, except when nucleophiles with highly acidic C–H bonds are utilized.54 The selection of the base for RNu–[H] reagents is influenced by the mechanism of the deprotonation. For the direct deprotonation of a high pKa C–H, strong organic bases can be employed.55 Instead, for C–H activation processes, inorganic bases, such as CO32−, are often used to facilitate the concerted metalation-deprotonation step.51 Furthermore, in reactions involving olefin reagents (Heck couplings), amines are sometimes also employed to deprotonate Ni-hydrides in the catalytic cycle.56 The correlation of the base with different substrates underscores the nuanced and multifaceted nature of the base.
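Read literally, the definition of Δq given above corresponds to a charge difference of the following form. The symbol q_C (the partial charge on the nucleophilic carbon) and the sign convention are assumptions, chosen so that a more negative Δq indicates a stronger nucleophile as stated in the text; the charge scheme used to obtain q_C is not specified in this section.

```latex
\Delta q = q_{\mathrm{C}}\bigl(\mathrm{R_{Nu}{-}[M]}\bigr) - q_{\mathrm{C}}\bigl(\mathrm{R_{Nu}{-}[H]}\bigr)
```

Under this convention, Δq is zero by construction for the pro-nucleophile RNu–[H] itself, and it becomes more negative as the carbon bound to the metal or metalloid carries more negative charge than the same carbon bound to hydrogen.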
In the interactive research mode, the background and reaction recommendation modules are the two core components (Fig. 5), both operating on the retrieval augmented generation framework. This framework expands the LLM's data access by incorporating external data sources. Preliminary summaries of individual papers are compiled with the assistance of GPT-4o. Additional descriptors from our database are also imported. Meanwhile, to enable descriptor-based analysis for unknown substrates or ligands, this mode utilizes RDKit59 for initial calculations. Two transformer-based models are employed to predict the TEP and Boltzmann-weighted buried volume for monodentate phosphine ligands. For a certain number of target substrates or ligands, it is recommended to organize their descriptors into a structured table and upload it to GPT-NiCOBot. After automatic embedding processing, GPT-NiCOBot can analyze the data and provide responses based on the external database. In addition to the citation networks discussed above, a dataset containing concise citation content is also constructed. Using GraphRAG60 for data retrieval, the background module can effectively provide accurate answers supported by experimental data. The research module allows input in SMILES, IUPAC names, and even images of structures. It can qualitatively recommend reaction conditions (ligand, solvent, base, temperature, etc.) from a set of substrates.
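The retrieval step of a retrieval-augmented generation workflow can be illustrated with a toy example. The sketch below is not GraphRAG and is not the platform's implementation: it substitutes a bag-of-words vector for a learned embedding, and the `embed`, `cosine`, and `retrieve` helpers and the example entries are all hypothetical placeholders.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector, standing in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Placeholder entries, not records from the paper's database.
documents = [
    "aryl pivalate with arylboronic acid, monodentate phosphine, carbonate base",
    "benzylic ether with alkyl Grignard reagent, bidentate phosphine, no base",
    "aryl carbamate with organozinc reagent, monodentate phosphine",
]
context = retrieve("conditions for aryl pivalate and boronic acid", documents, k=1)
# The retrieved context is prepended to the question before it is sent to the LLM.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The design point is that the language model answers from retrieved records rather than from its parametric memory, which is what allows responses to be tied back to experimental data.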
The reaction recommendation module assesses substrate compatibility and proposes optimal reaction conditions. Based on subtle functional differences, this module is further divided into the human to machine mode and machine to human mode. The human to machine mode is designed for users who have clear needs and provides assistance through Q&A functionality. When a user wants to inquire about suitable reaction conditions for a specific electrophile, it gradually recommends nucleophiles and the corresponding conditions. This step-by-step interaction enables users to swiftly identify the optimal reaction and its parameters. By using reaction conditions as search prompts, the module retrieves the most pertinent reactions from the embedded dataset for the user. If the backend deems the inputs infeasible, it offers a list of suggested reactions, together with a rationale and references to relevant literature. The machine to human mode is led by GPT-NiCOBot and is primarily used to test a user's understanding of the specific field. Unlike the human to machine mode, this mode focuses more on evaluation. GPT-NiCOBot generates questions based on the reaction database and evaluates the user's answers by integrating the background module, chemical tools, and established chemical patterns. The system can determine whether a user's response is reasonable and whether it adheres to chemical principles. It then provides corrections and feedback to help users deepen their understanding of the field.
Supplementary information (SI): data analysis and GPT-NiCOBot (.docx); collected data and calculated descriptors (.xlsx). See DOI: https://doi.org/10.1039/d5qo00947b.
† These authors made equal contributions to this article.
This journal is © the Partner Organisations 2026