Issue 6, 2022

Molecular set transformer: attending to the co-crystals in the Cambridge structural database


In this paper we introduce molecular set transformer, a PyTorch-based deep learning architecture designed to solve the molecular pair scoring task whilst tackling the class imbalance problem observed in datasets extracted from databases that report only successful synthetic attempts. Our models are trained on all the existing molecular pairs that form co-crystals and are deposited in the Cambridge Structural Database (CSD). Given any new molecular combination, the primary objective of the tool is to select the most effective way to represent the pair and then assign a score coupled with an uncertainty estimate. Molecular set transformer is an attention-based framework that learns the important interactions in the various molecular combinations by trying to reconstruct its input through minimizing its bidirectional loss. Several methods of representing the input were tested, both fixed and learnt; the Graph Neural Network (GNN) and Extended-Connectivity Fingerprint (ECFP4) molecular representations performed best, showing an overall accuracy higher than 75% on previously unseen data. The trustworthiness of the models is enhanced by adding uncertainty estimates, which aim to help chemists prioritize, at the early materials design stage, both the pairs with high scores and low uncertainty and the pairs with low scores and high uncertainty. Our results indicate that the method can achieve comparable or better performance on specific active pharmaceutical ingredients (APIs) for which the accuracy of other computational chemistry and machine learning tools is reported in the literature. To help visualize and gain further insight into all the co-crystals deposited in the CSD, we developed an interactive browser-based explorer. An online Graphical User Interface (GUI) has also been designed to enable wider use of our models for rapid in silico co-crystal screening, reporting the scores and uncertainty of any user-given molecular pair.
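The prioritization rule described in the abstract (high score with low uncertainty, or low score with high uncertainty) can be illustrated with a small Python sketch. This is not the authors' code: the `PairPrediction` container, the `triage` and `prioritize` helpers, the 0 to 1 score scale, and both cut-off values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairPrediction:
    """Model output for one molecular pair (hypothetical container)."""
    pair: tuple          # identifiers of the two molecules
    score: float         # assumed scale: higher means more co-crystal-like
    uncertainty: float   # assumed: spread of the score, e.g. over an ensemble


def triage(pred, score_cut=0.7, unc_cut=0.1):
    """Label one prediction; both cut-offs are illustrative, not from the paper."""
    high_score = pred.score >= score_cut
    high_unc = pred.uncertainty >= unc_cut
    if high_score and not high_unc:
        return "confident hit"
    if not high_score and high_unc:
        return "uncertain, worth testing"
    return "other"


def prioritize(preds, **cuts):
    """Keep the two groups named in the abstract, confident hits first."""
    order = {"confident hit": 0, "uncertain, worth testing": 1}
    labelled = [(triage(p, **cuts), p) for p in preds]
    kept = [(label, p) for label, p in labelled if label in order]
    # Within a group, list higher scores first.
    return sorted(kept, key=lambda lp: (order[lp[0]], -lp[1].score))


if __name__ == "__main__":
    # Made-up numbers for placeholder molecules, only to show the call pattern.
    demo = [
        PairPrediction(("mol_A", "mol_B"), score=0.90, uncertainty=0.02),
        PairPrediction(("mol_A", "mol_C"), score=0.20, uncertainty=0.30),
        PairPrediction(("mol_B", "mol_C"), score=0.20, uncertainty=0.02),
    ]
    for label, p in prioritize(demo):
        print(label, p.pair, p.score, p.uncertainty)
```

In practice the cut-offs would have to be chosen from the distribution of scores and uncertainties the models actually produce; the fixed values here only make the example concrete.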



Article information

Article type
Submitted: 28 Jun 2022
Accepted: 29 Sep 2022
First published: 29 Sep 2022
This article is Open Access
Creative Commons BY license

Digital Discovery, 2022, 1, 834-850


A. Vriza, I. Sovago, D. Widdowson, V. Kurlin, P. A. Wood and M. S. Dyer, Digital Discovery, 2022, 1, 834 DOI: 10.1039/D2DD00068G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

