Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain

Abstract

Computer Assisted Synthesis Planning (CASP) has recently attracted considerable interest. Herein we investigate a template-based retrosynthetic planning tool trained on a variety of datasets containing up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for predicting full synthetic routes to compounds of interest in medicinal chemistry. To this end, we assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessing the policy network, and propose that the number of successfully applied templates, together with the overall ability to generate full synthetic routes, be examined instead. We also found that template specificity comes at the cost of generalizability and overall model performance. These findings are supplemented by a comparison of the underlying datasets and their corresponding models.
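
The contrast between accuracy and the number of successfully applied templates can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy template labels, and the `APPLICABLE` lookup table are all hypothetical stand-ins. In a real pipeline the applicability check would come from a cheminformatics toolkit applying each template to the product.

```python
from typing import Callable, Dict, Sequence, Set


def top_k_accuracy(ranked: Sequence[Sequence[str]],
                   recorded: Sequence[str],
                   k: int) -> float:
    """Fraction of products whose recorded template appears in the top-k predictions."""
    hits = sum(1 for preds, true in zip(ranked, recorded) if true in preds[:k])
    return hits / len(recorded)


def mean_applicable_in_top_k(products: Sequence[str],
                             ranked: Sequence[Sequence[str]],
                             is_applicable: Callable[[str, str], bool],
                             k: int) -> float:
    """Average number of top-k templates that can actually be applied to each product."""
    counts = [sum(1 for t in preds[:k] if is_applicable(p, t))
              for p, preds in zip(products, ranked)]
    return sum(counts) / len(counts)


# Toy data (hypothetical): two products, the policy's ranked template
# predictions for each, and the template recorded in the dataset.
PRODUCTS = ["product_A", "product_B"]
RANKED = [["template_1", "template_2", "template_3"],
          ["template_3", "template_4", "template_2"]]
RECORDED = ["template_1", "template_2"]

# Hypothetical lookup standing in for a real template-application check.
APPLICABLE: Dict[str, Set[str]] = {
    "product_A": {"template_1"},
    "product_B": {"template_2", "template_3"},
}


def toy_is_applicable(product: str, template: str) -> bool:
    return template in APPLICABLE[product]


top1_acc = top_k_accuracy(RANKED, RECORDED, k=1)
top1_applicable = mean_applicable_in_top_k(PRODUCTS, RANKED, toy_is_applicable, k=1)
top3_applicable = mean_applicable_in_top_k(PRODUCTS, RANKED, toy_is_applicable, k=3)
print(top1_acc, top1_applicable, top3_applicable)
```

In this toy case the top-1 prediction for `product_B` differs from the recorded template, so top-1 accuracy is penalised, yet the predicted template still applies to the product and could seed a valid route. That is the kind of discrepancy the proposed metric is meant to expose.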

Publication details

The article was received on 01 Oct 2019, accepted on 05 Nov 2019 and first published on 05 Nov 2019


Article type: Edge Article
DOI: 10.1039/C9SC04944D
Chem. Sci., 2020, Advance Article
  • Open access: Creative Commons BY license
    All publication charges for this article have been paid for by the Royal Society of Chemistry

    Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain

    A. Thakkar, T. Kogej, J. Reymond, O. Engkvist and E. J. Bjerrum, Chem. Sci., 2020, Advance Article, DOI: 10.1039/C9SC04944D

    This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Material from this article can be used in other publications provided that the correct acknowledgement is given with the reproduced material.

    Reproduced material should be attributed as follows:

    • For reproduction of material from NJC:
      [Original citation] - Published by The Royal Society of Chemistry (RSC) on behalf of the Centre National de la Recherche Scientifique (CNRS) and the RSC.
    • For reproduction of material from PCCP:
      [Original citation] - Published by the PCCP Owner Societies.
    • For reproduction of material from PPS:
      [Original citation] - Published by The Royal Society of Chemistry (RSC) on behalf of the European Society for Photobiology, the European Photochemistry Association, and RSC.
    • For reproduction of material from all other RSC journals:
      [Original citation] - Published by The Royal Society of Chemistry.

    Information about reproducing material from RSC articles with different licences is available on our Permission Requests page.
