Synthesis and Machine Learning Techniques to Enable Data-Driven Investigation of Supramolecular Host-Guest Interactions
Abstract
The availability of large datasets such as the Protein Data Bank and ChEMBL has allowed for rapid progress in developing machine learning tools for predicting the biological activity of organic small molecules. The binding between supramolecular hosts and their desired guests is governed by the same forces that drive protein-small molecule interactions, and yet this field has seen dramatically less application of machine learning. In this contribution, we demonstrate that the production of easily diversified building blocks can allow a single laboratory to generate a dataset that is sufficient to engage with modern machine learning approaches. A range of methods were evaluated against our single-laboratory dataset, with a graph neural network featuring an attention mechanism providing meaningful performance in this data-sparse arena.