Issue 3, 2020, Issue in Progress

Retrosynthesis with attention-based NMT model and chemical analysis of “wrong” predictions

Abstract

We treat retrosynthesis as a machine translation problem and apply Tensor2Tensor, an attention-based and completely data-driven model, to a data set of approximately 50 000 diverse reactions extracted from the United States patent literature. The model reaches a top-1 accuracy of 54.1%, significantly outperforming the seq2seq model (37.4%). We also offer a novel insight into the causes of grammatically invalid SMILES, and conduct a test in which experienced chemists select and analyze the “wrong” predictions that may be chemically plausible but differ from the ground truth. This analysis shows that the effectiveness of our model is underestimated: the “true” top-1 accuracy reaches as high as 64.6%.
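To make the top-1 metric concrete, the following is a minimal sketch of how top-k accuracy could be computed when predictions and ground truth are compared as SMILES strings. It assumes exact string matching on already-canonicalized SMILES; the function name `top_k_accuracy` and the toy data are hypothetical and are not taken from the article. Under such a metric, a chemically plausible prediction that differs from the recorded reactants counts as a miss, which is the kind of underestimation the abstract describes.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of examples whose ground-truth string is among the first k candidates.

    Assumption: candidates and targets are canonicalized SMILES, so plain
    string equality is a meaningful comparison.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("evaluation set is empty")
    hits = sum(
        1
        for candidates, truth in zip(predictions, targets)
        if truth in list(candidates)[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: ranked candidate reactant strings per target product.
preds = [
    ["CC(=O)O.CCO", "CC(=O)Cl.CCO"],
    ["c1ccccc1Br", "c1ccccc1I"],
    ["CCN", "CCO"],
    ["CC=O", "CCO"],
]
truth = ["CC(=O)O.CCO", "c1ccccc1I", "CCO", "CCC"]

top1 = top_k_accuracy(preds, truth, k=1)
top2 = top_k_accuracy(preds, truth, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

In this toy set only the first example is matched by its highest-ranked candidate, while the second and third are matched by their second candidate, so top-2 accuracy is higher than top-1.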

Article information

Article type: Paper
Submitted: 18 Oct 2019
Accepted: 25 Dec 2019
First published: 08 Jan 2020
This article is Open Access (Creative Commons BY license)

RSC Adv., 2020, 10, 1371-1378

H. Duan, L. Wang, C. Zhang, L. Guo and J. Li, RSC Adv., 2020, 10, 1371, DOI: 10.1039/C9RA08535A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
