Chemical language models for natural product discovery
Abstract
Covering: up to 2026
Natural products are an important source of medicines, yet their discovery can be a slow and laborious process. The recent development of chemical language models (CLMs), which process string-based molecular representations, is reshaping the field of natural product science. This review provides an overview of the role of CLMs in natural product drug discovery, tracing their evolution from early neural networks to modern large-scale Transformers. We describe how these models accelerate discovery timelines by predicting bioactivity, biosynthetic pathways, and spectral data. Furthermore, we cover their use in proposing novel, natural-product-like scaffolds that expand the computationally explored chemical space. The review also addresses persistent challenges, including the limited availability of natural product data and the need for model interpretability. Finally, we discuss future directions, outlining the current status and prospects for CLM-enabled natural product science.
