“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models

Philippe Schwaller; Théophile Gaudin; Dávid Lányi; Costas Bekas; Teodoro Laino

doi:10.1039/C8SC02339E

Philippe Schwaller,‡*^a Théophile Gaudin,‡^a Dávid Lányi,^a Costas Bekas^a and Teodoro Laino^a

Author affiliations

* Corresponding authors

^a IBM Research, Zurich, Switzerland
E-mail: {phs,tga,dla,bek,teo}@zurich.ibm.com

Abstract

There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts of linguistic analysis to the world of organic chemistry and to analyze their potential impact. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model that is trained end-to-end and is fully data-driven. We propose a tokenization that is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve on the state of the art in reaction prediction, achieving a top-1 accuracy of 80.3% without relying on auxiliary knowledge such as reaction templates or explicit atomic features. On a larger and noisier dataset, the model reaches a top-1 accuracy of 65.4%.
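To make the translation framing concrete, the sketch below shows one way a reaction string could be split into tokens for a sequence-to-sequence model. It assumes that molecules are written as SMILES strings and that tokens are split atom-wise with a regular expression; the abstract states neither, so the pattern, the function name `tokenize`, and the example reaction are illustrative assumptions rather than the authors' implementation.

```python
import re

# Assumption: an atom-wise SMILES pattern chosen for illustration only.
# Bracket atoms such as [N+] stay whole, two-letter halogens are matched
# before single letters, and ">" is kept as its own token so that reaction
# parts can be separated inside one sequence.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|%[0-9]{2}|[0-9]|[()=#\-+\\/:~@?>*$.])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES or reaction SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject inputs containing characters the pattern does not cover,
    # instead of silently dropping them.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Hypothetical source sequence: ethanol and acetyl chloride.
    source = "CCO.CC(=O)Cl"
    # Space-separated tokens are a common input format for translation models.
    print(" ".join(tokenize(source)))
```

Because a token vocabulary of this kind is just a set of strings, extra reaction information could in principle be added as further tokens without changing the model, which is one reading of "arbitrarily extensible" in the abstract.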

This article is part of the themed collection: Most popular 2018-2019 physical and theoretical chemistry articles

Supplementary files

Supplementary information PDF (989K)

Article information

DOI: https://doi.org/10.1039/C8SC02339E
Article type: Edge Article
Submitted: 28 May 2018
Accepted: 20 Jun 2018
First published: 22 Jun 2018
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2018, 9, 6091-6098

Permissions

P. Schwaller, T. Gaudin, D. Lányi, C. Bekas and T. Laino, Chem. Sci., 2018, 9, 6091 DOI: 10.1039/C8SC02339E

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

To request permission to reproduce material from this article in a commercial publication, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party commercial publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.

Chemical Science
