Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain

Amol Thakkar; Thierry Kogej; Jean-Louis Reymond; Ola Engkvist; Esben Jannik Bjerrum

doi:10.1039/C9SC04944D

Amol Thakkar*^ab, Thierry Kogej^a, Jean-Louis Reymond^b, Ola Engkvist^a and Esben Jannik Bjerrum*^a

Author affiliations

* Corresponding authors

^a Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
E-mail: esben.bjerrum@astrazeneca.com

^b Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
E-mail: amol.thakkar@dcb.unibe.ch

Abstract

Computer Assisted Synthesis Planning (CASP) has recently gained considerable interest. Herein we investigate a template-based retrosynthetic planning tool trained on a variety of datasets consisting of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for predicting full synthetic routes to compounds of interest in medicinal chemistry. To this end, we assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessing the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes, be examined instead. We found that template specificity comes at the cost of generalizability and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.
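Why top-k accuracy can mislead when assessing a template policy network can be illustrated with a toy sketch. It assumes each target molecule comes with a ranked list of template identifiers predicted by a policy network, plus the single template recorded in the dataset. The `applies(template, target)` callable is a hypothetical stand-in for actually executing a reaction template on the target (for example with a cheminformatics toolkit) and checking that it yields reactants. The function names and toy data are illustrative only and are not the authors' implementation.

```python
from typing import Callable, List, Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], recorded: Sequence[str], k: int) -> float:
    """Fraction of targets whose recorded template appears among the top-k predictions."""
    if not recorded:
        return 0.0
    hits = sum(1 for preds, true in zip(ranked, recorded) if true in preds[:k])
    return hits / len(recorded)


def applied_templates(
    ranked: Sequence[Sequence[str]],
    targets: Sequence[str],
    applies: Callable[[str, str], bool],  # hypothetical: does this template produce reactants for this target?
    k: int,
) -> List[int]:
    """Per target, count how many of the top-k predicted templates apply successfully."""
    return [sum(1 for t in preds[:k] if applies(t, target)) for preds, target in zip(ranked, targets)]


if __name__ == "__main__":
    # Toy data: two targets, three ranked template predictions each.
    targets = ["mol1", "mol2"]
    ranked = [["t1", "t2", "t3"], ["t4", "t5", "t6"]]
    recorded = ["t9", "t5"]  # template actually recorded in the (toy) dataset
    successful = {("t1", "mol1"), ("t2", "mol1"), ("t5", "mol2")}

    def applies(template: str, target: str) -> bool:
        return (template, target) in successful

    print(top_k_accuracy(ranked, recorded, k=1))              # 0.0
    print(top_k_accuracy(ranked, recorded, k=3))              # 0.5
    print(applied_templates(ranked, targets, applies, k=3))   # [2, 1]
```

In this toy case the top-1 accuracy is zero, yet every target still has at least one applicable template in its top three, any of which could lead to a valid route. Counting successfully applied templates therefore captures usefulness that exact-match accuracy misses, which is the intuition behind the metric proposed in the abstract.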

This article is part of the themed collections: Most popular 2019-2020 physical and theoretical chemistry articles and Accelerating Chemistry Symposium Collection

Article information

DOI: https://doi.org/10.1039/C9SC04944D
Article type: Edge Article
Submitted: 01 Oct 2019
Accepted: 05 Nov 2019
First published: 05 Nov 2019
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2020,11, 154-168

A. Thakkar, T. Kogej, J.-L. Reymond, O. Engkvist and E. J. Bjerrum, Chem. Sci., 2020, 11, 154 DOI: 10.1039/C9SC04944D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

Chemical Science
