An automated evaluation agent for Q&A pairs and reticular synthesis conditions

Nakul Rampal; Dongrong Joe Fu; Chengbin Zhao; Hanan S. Murayshid; Albatool A. Abaalkhail; Nahla E. Alhazmi; Majed O. Alawad; Christian Borgs; Jennifer T. Chayes; Omar M. Yaghi

doi:10.1039/D5DD00413F


Nakul Rampal,†^abc Dongrong Joe Fu,†^c Chengbin Zhao,^abc Hanan S. Murayshid,^d Albatool A. Abaalkhail,^e Nahla E. Alhazmi,^f Majed O. Alawad,^g Christian Borgs,*^ch Jennifer T. Chayes*^chijk and Omar M. Yaghi*^abcg

Author affiliations

* Corresponding authors

^a Department of Chemistry, University of California, Berkeley, California 94720, USA
E-mail: yaghi@berkeley.edu

^b Kavli Energy Nanoscience Institute, University of California, Berkeley, California 94720, USA

^c Bakar Institute of Digital Materials for the Planet, College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
E-mail: jchayes@berkeley.edu, borgs@berkeley.edu

^d Artificial Intelligence & Robotics Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia

^e Center of Excellence for Advanced Materials and Manufacturing, King Abdulaziz City for Science and Technology (KACST), Saudi Arabia

^f Hydrogen Technologies Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia

^g KACST-UC Berkeley Center of Excellence for Nanomaterials for Clean Energy Applications, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia

^h Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA

^i Department of Mathematics, University of California, Berkeley, California 94720, USA

^j Department of Statistics, University of California, Berkeley, California 94720, USA

^k School of Information, University of California, Berkeley, California, USA

Abstract

We report an automated evaluation agent that can reliably assign classification labels to single-hop and multi-hop Q&A pairs, as well as to synthesis-condition datasets. The agent is built around a suite of large language models (LLMs) and is designed to eliminate human involvement from the evaluation process. Although we believe this approach is broadly applicable, for concreteness we apply it here to reticular chemistry. Through extensive testing of various approaches, including DSPy and fine-tuning, we found that the performance of a given LLM on these Q&A and synthesis-condition classification tasks is determined primarily by the architecture of the agent: how the inputs are parsed and processed and how the LLMs are called make a significant difference. We also found that the quality of the prompt remains paramount, irrespective of the sophistication of the underlying model. Even models considered state-of-the-art, such as GPT-o1, perform poorly when the prompt lacks sufficient detail and structure. To overcome these challenges, we performed systematic prompt optimization, iteratively refining the prompt to significantly improve classification accuracy and reach human-level evaluation benchmarks. We show that while LLMs have made remarkable progress, they still fall short of human reasoning without substantial prompt engineering. The agent presented here provides a robust, reproducible and scalable tool for evaluating Q&A pairs and synthesis conditions, and it can serve as a foundation for future work on automated evaluation of LLM inputs and outputs and, more generally, on building foundation models in chemistry.
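The abstract attributes the accuracy gains to iterative prompt refinement scored against human labels. As a minimal sketch of that idea, assuming a classifier callable that stands in for an LLM call, a greedy loop can score each candidate prompt refinement by its agreement with human-assigned labels and keep the refinement only when accuracy improves. The names `accuracy`, `refine_prompt` and `toy_classify`, the pass/fail labels and the placeholder examples are all hypothetical; this is not the authors' agent or their prompt-optimization procedure.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical data shape: (Q&A text, human-assigned label).
Example = Tuple[str, str]
# Hypothetical classifier signature: (prompt, text) -> predicted label.
Classifier = Callable[[str, str], str]


def accuracy(classify: Classifier, prompt: str, data: Sequence[Example]) -> float:
    """Fraction of examples where the predicted label matches the human label."""
    if not data:
        raise ValueError("dataset is empty")
    correct = sum(1 for text, gold in data if classify(prompt, text) == gold)
    return correct / len(data)


def refine_prompt(
    classify: Classifier,
    base_prompt: str,
    refinements: Sequence[str],
    data: Sequence[Example],
) -> Tuple[str, float, List[float]]:
    """Greedy refinement: append each candidate instruction, keep it only if accuracy rises."""
    best_prompt = base_prompt
    best_acc = accuracy(classify, best_prompt, data)
    history = [best_acc]  # best accuracy after each step
    for extra in refinements:
        candidate = best_prompt + "\n" + extra
        acc = accuracy(classify, candidate, data)
        if acc > best_acc:
            best_prompt, best_acc = candidate, acc
        history.append(best_acc)
    return best_prompt, best_acc, history


def toy_classify(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM: flags an empty answer only if the prompt asks it to."""
    if "Reject pairs with an empty answer." in prompt and text.endswith("A:"):
        return "fail"
    return "pass"


# Placeholder examples, not data from the paper.
DATA: List[Example] = [
    ("Q: example question 1? A: example answer 1", "pass"),
    ("Q: example question 2? A:", "fail"),
    ("Q: example question 3? A: example answer 3", "pass"),
]

if __name__ == "__main__":
    prompt, acc, hist = refine_prompt(
        toy_classify,
        "Classify the Q&A pair as pass or fail.",
        ["Be concise.", "Reject pairs with an empty answer."],
        DATA,
    )
    print(acc, hist)
```

In this toy run the vague refinement ("Be concise.") is rejected because it does not change accuracy, while the explicit criterion is kept; a real agent would replace `toy_classify` with an LLM call and the placeholder list with a human-labeled evaluation set.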

Digital Discovery
