Issue 7, 2024

Extension of multi-site analogue series with potent compounds using a bidirectional transformer-based chemical language model

Abstract

Generating potent compounds for evolving analogue series (AS) is a key challenge in medicinal chemistry. The versatility of chemical language models (CLMs) makes it possible to formulate this challenge as an off-the-beaten-path prediction task. In this work, we have devised a coding and tokenization scheme for evolving AS with multiple substitution sites (multi-site AS) and implemented a bidirectional transformer to predict new potent analogues for such series. Scientific foundations of this approach are discussed and, as a benchmark, the transformer model is compared to a recurrent neural network (RNN) for the prediction of analogues of AS with single substitution sites. Furthermore, the transformer is shown to successfully predict potent analogues with varying R-group combinations for multi-site AS having activity against many different targets. Prediction of R-group combinations for extending AS with potent compounds represents a novel approach for compound optimization.
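The abstract does not specify how a multi-site AS is encoded for the transformer. The sketch below is for illustration only and assumes a hypothetical encoding. Each analogue is written as its core SMILES with numbered attachment points ([*:1], [*:2], ...), followed by a '>' separator and the R-groups joined by '.' in site order. The sequence is then split with a regex-based SMILES tokenizer of the kind widely used for chemical language models. The separator and the helper names `encode_multisite_analogue` and `tokenize` are assumptions, not the authors' scheme.

```python
import re

# Regex commonly used to split SMILES into chemically meaningful tokens:
# bracket atoms (incl. mapped attachment points like [*:1]), Br/Cl,
# aromatic atoms, bonds, branches, ring closures and '>' separators.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def encode_multisite_analogue(core: str, r_groups: list[str]) -> str:
    """Hypothetical encoding: core SMILES, '>', then R-groups in site order.

    The '>' separator and '.'-joined R-groups are illustrative assumptions.
    """
    return core + ">" + ".".join(r_groups)


def tokenize(sequence: str) -> list[str]:
    """Split an encoded sequence into tokens for a language model."""
    tokens = SMILES_TOKEN_PATTERN.findall(sequence)
    # findall silently skips unmatched characters, so verify that the
    # tokens reassemble the full input before trusting them.
    if "".join(tokens) != sequence:
        raise ValueError(f"untokenizable characters in {sequence!r}")
    return tokens


if __name__ == "__main__":
    # Hypothetical two-site analogue: chloro at site 1, methoxy at site 2.
    seq = encode_multisite_analogue("c1ccc([*:1])cc1[*:2]", ["[*:1]Cl", "[*:2]OC"])
    print(seq)
    print(tokenize(seq))
```

Keeping each mapped attachment point as a single token lets a model associate every R-group with its substitution site. The reassembly check guards against characters the regex does not cover.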

Article information

Article type: Research Article
Submitted: 10 Jun 2024
Accepted: 15 Jun 2024
First published: 17 Jun 2024

RSC Med. Chem., 2024, 15, 2527-2537

H. Chen, A. Yoshimori and J. Bajorath, RSC Med. Chem., 2024, 15, 2527 DOI: 10.1039/D4MD00423J
