The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot

David M. Andrews; Laura M. Broad; Paul J. Edwards; David N. A. Fox; Timothy Gallagher; Stephen L. Garland; Richard Kidd; Joseph B. Sweeney

doi:10.1039/C6SC00264A

David M. Andrews,§*^a Laura M. Broad,^b Paul J. Edwards,^c David N. A. Fox,^a Timothy Gallagher,^b Stephen L. Garland,^d Richard Kidd^a and Joseph B. Sweeney

Author affiliations

* Corresponding authors

^a Royal Society of Chemistry, Thomas Graham House, Science Park, Milton Road, Cambridge, UK
E-mail: david.andrews@astrazeneca.com

^b School of Chemistry, University of Bristol, Bristol, UK

^c Scicate Limited, Mendip Court, Bath Road, Wells, Somerset, UK

^d NQuiX Ltd, Causeway House, Dane Street, Bishops Stortford, Hertfordshire, UK

^e Department of Chemical Sciences, University of Huddersfield, Huddersfield HD1 3DH, UK

Abstract

We present a summary of the National Compound Collection (NCC) pilot, which harvested chemical structure data from 746 publicly available PhD theses to create an enhanced database of diverse and interesting (largely organic) molecular entities. The database comprised ∼75 000 structure entries, of which 70% were new to ChemSpider at the time of upload. The dataset was evaluated for structural uniqueness by twelve external drug discovery groups from the pharmaceutical, biotech, academic and not-for-profit sectors. These partners generated the data reported here, comparing the NCC pilot with their in-house compound collections. The proportion of NCC structures considered useful for drug discovery ranged from 5% to 80%, depending on the strictness of the filters used. Most interestingly from a drug discovery standpoint, ∼13k NCC compounds (18% of the NCC) passed the filters and were of good diversity. These compounds are quite different from those already present in the screening collections, but not so different that they are no longer considered drug-like. In general, the drug discovery teams would consider these compounds to be high-value molecules for inclusion in their screening collections. This pilot addressed the potential value of unpublished data and explored the practicalities of large-scale data extraction, to inform both retrospective and prospective extraction of chemical data from theses.

Article information

DOI: https://doi.org/10.1039/C6SC00264A
Article type: Edge Article
Submitted: 19 Jan 2016
Accepted: 22 Feb 2016
First published: 23 Feb 2016
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Citation: Chem. Sci., 2016, 7, 3869–3878

D. M. Andrews, L. M. Broad, P. J. Edwards, D. N. A. Fox, T. Gallagher, S. L. Garland, R. Kidd and J. B. Sweeney, Chem. Sci., 2016, 7, 3869 DOI: 10.1039/C6SC00264A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

Chemical Science
