Distilling and exploiting quantitative insights from Large Language Models for enhanced Bayesian optimization of chemical reactions

Roshan A Patel; Mingxuan Li; Chin-Fei Chang; Louis de Lescure; Saeed Moayedpour; Paul Chauvin; Alan Cherney; Sven Jager; Yasser Jangjou

doi:10.1039/D6DD00052E


Abstract

Machine learning and Bayesian optimization (BO) algorithms can significantly accelerate the optimization of chemical reactions. Transfer learning can bolster the effectiveness of BO algorithms in low-data regimes by leveraging pre-existing chemical information or data outside the direct optimization task (i.e., source data). Large Language Models (LLMs) have demonstrated that chemical information present in foundation training data can give them utility for processing chemical data. Furthermore, they can be augmented with and help synthesize potentially multiple modalities of source chemical data germane to the optimization task. In this work, we examine how chemical information from LLMs can be elicited and used for transfer learning to accelerate the BO of reaction conditions to maximize yield. Specifically, we show that a survey-like prompting scheme and preference learning can be used to infer a utility function which models prior chemical information embedded in LLMs over a chemical parameter space; we find that the utility function shows modest correlation to true experimental measurements (yield) over the parameter space despite operating in a zero-shot setting. Furthermore, we show that the utility function can be leveraged to focus BO efforts in promising regions of the parameter space, improving the yield of the initial BO query and enhancing optimization in a majority of the datasets studied. Overall, we view this work as a step towards bridging the gap between the chemistry knowledge embedded in LLMs and the capabilities of principled BO methods to accelerate reaction optimization.
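The idea of turning survey-style preferences into a utility function can be illustrated with a small sketch. The abstract does not state which preference model the authors use, so the example below assumes a standard Bradley-Terry model fitted by gradient ascent; the function name `fit_bradley_terry`, the pairwise answers, and the four candidate conditions are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_bradley_terry(n_items, comparisons, lr=0.1, n_steps=2000, l2=1e-2):
    """Fit latent utilities u with P(i preferred over j) = sigmoid(u[i] - u[j]).

    comparisons: list of (preferred_index, rejected_index) pairs.
    This is an assumed, generic preference model, not the paper's method.
    """
    u = np.zeros(n_items)
    winners = np.array([w for w, _ in comparisons])
    losers = np.array([l for _, l in comparisons])
    for _ in range(n_steps):
        # Probability the model assigns to each observed preference.
        p_win = 1.0 / (1.0 + np.exp(-(u[winners] - u[losers])))
        # Gradient of the log-likelihood, accumulated per item.
        grad = np.zeros(n_items)
        np.add.at(grad, winners, 1.0 - p_win)
        np.add.at(grad, losers, -(1.0 - p_win))
        # Small L2 penalty keeps utilities finite for never-beaten items.
        grad -= l2 * u
        u += lr * grad
    # Utilities are only defined up to a constant shift; center them.
    return u - u.mean()

# Hypothetical answers: each pair is (preferred condition, rejected condition),
# as might be collected by asking an LLM to compare two reaction conditions.
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 3), (1, 3), (0, 3)]
utilities = fit_bradley_terry(4, comparisons)

# Rank candidate conditions from most to least preferred.
ranking = np.argsort(-utilities)
print(utilities, ranking)
```

In a setting like the one the abstract describes, the highest-utility condition could serve as a candidate for the initial BO query, and the fitted utilities could be compared against measured yields with a rank correlation to check how informative the elicited preferences are.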

Article information

DOI: https://doi.org/10.1039/D6DD00052E
Article type: Paper
Submitted: 30 Jan 2026
Accepted: 29 Mar 2026
First published: 06 Apr 2026
This article is Open Access

Digital Discovery, 2025, Accepted Manuscript

R. A. Patel, M. Li, C. Chang, L. de Lescure, S. Moayedpour, P. Chauvin, A. Cherney, S. Jager and Y. Jangjou, Digital Discovery, 2025, Accepted Manuscript, DOI: 10.1039/D6DD00052E

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
