Chemical language models for natural product discovery

Abstract

Covering: up to 2026

Natural products are an important source of medicines, yet their discovery can be a slow and laborious process. The recent development of chemical language models (CLMs), which process string-based molecular representations, is reshaping the field of natural product science. This review provides an overview of the role of CLMs in natural product drug discovery, tracing their evolution from early neural networks to modern large-scale Transformers. We describe how these models accelerate discovery timelines by predicting bioactivity, biosynthetic pathways, and spectral data. Furthermore, we cover their use in proposing novel, natural-product-like scaffolds that expand the computationally explored chemical space. The review also addresses persistent challenges, including the limited availability of natural product data and the need for model interpretability. Finally, we discuss future directions, outlining the current status and prospects for CLM-enabled natural product science.

Article information

Article type: Review Article
Submitted: 12 Jan 2026
First published: 12 May 2026
This article is Open Access
Creative Commons BY license

Nat. Prod. Rep., 2026, Advance Article

K. Sakano, K. Furui, A. Kengkanna, Y. Kikuchi and M. Ohue, Nat. Prod. Rep., 2026, Advance Article, DOI: 10.1039/D6NP00002A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
