Tracing compound pathways using chemical space networks
Abstract
Similarity-based compound networks are used as coordinate-free representations of chemical space. In so-called chemical space networks (CSNs), nodes represent compounds and edges represent pairwise similarity relationships. Nodes can be annotated with activity information, which enables the visualization of structure–activity relationship (SAR) patterns. A major determinant of CSN structure and topology is the way in which similarity relationships are determined, and a number of CSN variants have previously been generated using different similarity measures. Herein, we report a new type of CSN with an asymmetric similarity metric based upon the maximum common substructure of compound pairs. While CSNs have thus far mostly been used for SAR visualization, the new CSN variant was designed for another medicinal chemistry application, i.e. the identification of compound pathways in data sets. In this CSN, pathways consisting of structurally related compounds of increasing size can be systematically traced; these pathways represent models of compound optimization paths. Compound series forming such paths can be extracted from the CSN. The network-based identification of hit-to-lead or lead optimization series in compound data sets is intuitive and is thought to provide valuable information for medicinal chemistry.
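To make the idea of a directed, size-ordered CSN concrete, the following toy sketch may help. It is not the method reported here: the abstract does not give the exact form of the asymmetric metric, so the sketch assumes a simple one (the fraction of a compound covered by what it shares with its partner). Compounds are modelled as sets of hypothetical fragment labels, and set intersection stands in for a true maximum common substructure calculation. All function names, the example compounds, and the threshold are illustrative assumptions.

```python
# Toy sketch under stated assumptions: compounds are non-empty sets of
# hypothetical fragment labels, and set intersection is a stand-in for the
# maximum common substructure (MCS) of a compound pair.

def asymmetric_similarity(a, b):
    """Fraction of compound a covered by what it shares with b (assumed form)."""
    return len(a & b) / len(a)

def build_csn(compounds, threshold=1.0):
    """Draw a directed edge from each compound to every larger, related one."""
    edges = {name: [] for name in compounds}
    for src, a in compounds.items():
        for dst, b in compounds.items():
            larger = len(b) > len(a)
            if src != dst and larger and asymmetric_similarity(a, b) >= threshold:
                edges[src].append(dst)
    return edges

def trace_pathways(edges):
    """Enumerate paths from nodes without incoming edges to nodes without outgoing ones."""
    has_incoming = {dst for targets in edges.values() for dst in targets}
    starts = [n for n in edges if n not in has_incoming and edges[n]]
    paths = []

    def extend(path):
        successors = edges[path[-1]]
        if not successors:          # reached a terminal compound
            paths.append(path)
            return
        for nxt in successors:      # size strictly grows, so no cycles occur
            extend(path + [nxt])

    for start in starts:
        extend([start])
    return paths

# Hypothetical data set: a core scaffold and three decorated analogs.
compounds = {
    "c1": {"core"},
    "c2": {"core", "F"},
    "c3": {"core", "F", "OMe"},
    "c4": {"core", "Cl"},
}
edges = build_csn(compounds)
pathways = trace_pathways(edges)
for path in pathways:
    print(" -> ".join(path))
```

Because an edge is drawn only toward a strictly larger compound, the toy network is acyclic and every traced path is a series of compounds of increasing size. A real implementation would replace the set intersection with an MCS calculation on molecular graphs and would likely prune shortcut paths such as a direct step from the core to the largest analog.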