Intermediate knowledge enhanced the performance of the amide coupling yield prediction model

Abstract

Amide coupling is an important reaction widely applied in medicinal chemistry. However, recommending suitable conditions remains challenging because the condition space is broad. Machine learning has recently emerged as a novel and efficient way to identify conditions that achieve a desired transformation. Nonetheless, yields remain difficult to predict accurately because of the complex relationships among substrates, conditions, and outcomes. Herein, we present a strategy to address this problem. Two steps were taken to ensure the quality of the dataset. First, we used an unbiased, machine-based sampling approach to select a diverse and representative set of substrates that captures a broad spectrum of substrate structures and reaction conditions. Second, experiments were conducted on our in-house high-throughput experimentation (HTE) platform to minimize the influence of human factors. Additionally, we proposed an intermediate knowledge-embedded strategy to enhance the model's robustness. The model was first evaluated at three levels: random split, partial substrate novelty, and full substrate novelty. With intermediate knowledge embedded, all model metrics improved dramatically in these cases, reaching an R² of 0.89, an MAE of 6.1%, and an RMSE of 8.0% on the full substrate novelty test dataset. Moreover, the generalization of our strategy was assessed on external datasets from the literature, giving an R² of 0.71, an MAE of 7%, and an RMSE of 10%. The model could also recommend conditions that raised the yields of some reactions. In addition, for reaction pairs exhibiting a reactivity cliff, the model could identify which reaction gave the higher yield. In summary, our research demonstrates that accurate yield prediction is feasible when HTE is combined with embedding intermediate knowledge into the model. This approach also has the potential to facilitate other related machine learning tasks.
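The abstract evaluates the model at three levels of substrate novelty and reports R², MAE, and RMSE. The sketch below shows one plausible way to build such test sets and compute those metrics, assuming each reaction is a single carboxylic acid/amine pair with a measured yield in percent. The `Reaction` record, the held-out substrate sets, and the function names `substrate_novelty_split` and `regression_metrics` are hypothetical illustrations; the authors' actual splitting procedure and evaluation code may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Reaction:
    # Hypothetical record: one carboxylic acid, one amine, and a measured yield (%).
    acid: str
    amine: str
    yield_pct: float


def substrate_novelty_split(
    reactions: Sequence[Reaction], held_acids: Set[str], held_amines: Set[str]
) -> Tuple[List[Reaction], List[Reaction], List[Reaction]]:
    """Partition reactions by how many of their substrates are held out.

    train:   neither substrate is held out
    partial: exactly one substrate is held out, so it never appears in training
    full:    both substrates are held out (full substrate novelty)
    """
    buckets: Tuple[List[Reaction], List[Reaction], List[Reaction]] = ([], [], [])
    for rxn in reactions:
        # Count held-out substrates (0, 1, or 2) and use it as the bucket index.
        n_held = int(rxn.acid in held_acids) + int(rxn.amine in held_amines)
        buckets[n_held].append(rxn)
    return buckets


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (R^2, MAE, RMSE); MAE and RMSE share the units of the yields."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse
```

Holding out substrates, rather than individual reactions, is what makes the partial and full novelty sets harder than a random split: every held-out acid or amine is guaranteed to be absent from training. In a 3 × 3 grid with acid `A3` and amine `N3` held out, for example, 4 reactions go to training, 4 to partial novelty, and 1 (`A3` with `N3`) to full novelty.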

Article information

Article type: Edge Article
Submitted: 09 May 2025
Accepted: 27 May 2025
First published: 05 Jun 2025
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry
Creative Commons BY-NC license

Chem. Sci., 2025, Advance Article

C. Zhang, Q. Lin, C. Yang, Y. Kong, Z. Yu and K. Liao, Chem. Sci., 2025, Advance Article, DOI: 10.1039/D5SC03364K

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

To request permission to reproduce material from this article in a commercial publication, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party commercial publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.
