Defining and navigating macrocycle chemical space†

Macrocyclic compounds (MCs) are of growing interest for inhibition of challenging drug targets. We consider afresh what structural and physicochemical features could be relevant to the bioactivity of this compound class. Using these features, we performed Principal Component Analysis to map oral and non-oral macrocycle drugs and clinical candidates, and also commercially available synthetic MCs, in structure–property space. We find that oral MC drugs occupy defined regions that are distinct from those of the non-oral MC drugs. None of the oral MC regions are effectively sampled by the synthetic MCs. We identify 13 properties that can be used to design synthetic MCs that sample regions overlapping with oral MC drugs. The results advance our understanding of what molecular features are associated with bioactive and orally bioavailable MCs, and illustrate an approach by which synthetic chemists can better evaluate MC designs. We also identify underexplored regions of macrocycle chemical space.

that N corresponds to the number of atoms that provide the shortest path around the ring that returns to the atom at which the count was started. In compounds containing multiple rings, the main MC ring was defined as the largest ring, as determined using this shortest-path method. Thus, for example, the non-oral MC drug oritavancin, shown below, was defined as a system of three fused macrocycle rings, of sizes 16, 16 and 12, rather than as a single 34-membered macrocyclic ring with internal bridging atoms. Consequently, for the purposes of this study, oritavancin is considered to be a 16-membered macrocycle.
In cases where a compound contained two or more "largest rings" of equal size, examination of the assignments made by ChemAxon for specific structures suggests that the more central ring, i.e. the one with the greatest number of other large rings attached, was considered the main MC ring (see oritavancin, above). The other equally sized or smaller rings were considered to be substituents attached to the main MC ring. While this distinction does not affect the assigned value of the MC ring size, N, it is important because it can affect the values of other descriptors that capture the structure and composition of the ring and its substituents and peripheral groups.
A small number of compounds in the study contained four large rings that included two equally central rings of the same size (e.g. dalbavancin, shown below left, which by our definition comprises a 14-16-16-12 system). In these cases, examination of the assignments made by ChemAxon for specific multi-ringed structures suggests that the ring with the largest rings attached was considered to be the main MC ring. Thus, for dalbavancin, the 16-membered ring indicated with an asterisk, which is directly fused to a 16- and a 14-membered ring, is considered paramount over the other 16-membered ring, which is fused to a 16- and a 12-membered ring. In cases of this kind, the entire fused ring system containing the 16- and 12-membered rings is considered as a single large substituent off the main

Molecular Descriptors
Descriptor numbers cited in the main text refer to the numbers used in the following list.  For the purpose of this descriptor, a "bridge" is defined as a feature of fused rings, wherein the two connection points to the MC are separated by one or more atoms (see figure). Atoms in the main MC ring that lie between the connection points of the bridge are designated "bridging atoms". For clarity, the example of a bridged structure shown below right has a single bridging atom.
Pseudocode for identifying bridges:
- Cycle through unique branches

Ratio of MC Ring Nitrogens to MC Ring Oxygens ((RingN+1)/(RingO+1)):
Ratio of nitrogen to oxygen atoms in the main macrocycle ring. Note: 1 is added to both numerator and denominator to avoid division by zero errors.
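As a minimal illustration of the +1 offset described above, the ratio can be sketched in Python. The function and argument names are hypothetical and are not part of the original descriptor code:

```python
def ring_n_to_o_ratio(ring_n: int, ring_o: int) -> float:
    """(RingN + 1) / (RingO + 1) for the main macrocycle ring."""
    if ring_n < 0 or ring_o < 0:
        raise ValueError("atom counts must be non-negative")
    # Adding 1 to both counts keeps the ratio defined when RingO == 0.
    return (ring_n + 1) / (ring_o + 1)


print(ring_n_to_o_ratio(0, 0))  # ring with neither N nor O -> 1.0
print(ring_n_to_o_ratio(3, 1))  # three ring N, one ring O -> 2.0
print(ring_n_to_o_ratio(0, 3))  # no ring N, three ring O -> 0.25
```

A ring containing neither nitrogen nor oxygen therefore scores 1.0, the same value as any ring with equal numbers of the two elements.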

Molecule Total Degree of Unsaturation (DOU):
Count of the number of atomic valences on the molecule's heavy-atom framework that are not occupied by hydrogen or halogen. Primary, secondary, tertiary and quaternary carbon atoms are considered to have, respectively, 3, 2, 1 or 0 available positions for attachment to hydrogen/halogen; primary, secondary and tertiary nitrogen atoms are respectively considered to have 2, 1, and 0 positions; and primary and secondary oxygen atoms are considered to have 1 or zero positions. Sulfur atoms, which are rare among the chemotypes included in the analysis, are ignored. DOU is thus given by the following formula:

$$\mathrm{DOU} = \frac{2C + 2 + N - H - \mathrm{Hal}}{2}$$

Where C = carbons; H = hydrogens; Hal = halogens; N = nitrogens.
For example, using this formula, an alkane will have DOU = 0, while a cycloalkane will have DOU = 1 because the ring closure represents a degree of unsaturation. A value of DOU = 1 is therefore the lowest value possible for a macrocycle, and would indicate that the compound has an exclusively sigma-bonded framework on which all remaining valences are saturated through bonding to H or a halogen. The maximum possible value is DOU = C + N/2 + 1, corresponding to a situation in which all valences are engaged in sigma or pi bonds connecting the heavy atom framework of the compound, with no hydrogen or halogen atoms present.
Total halogens (numHalogens) in the molecule are calculated as the sum of total fluorines, chlorines, bromines and iodines as counted using methods described above.
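The limiting cases quoted above can be checked with a short Python sketch. It assumes the standard degree-of-unsaturation expression DOU = (2C + 2 + N - H - Hal)/2, which reproduces both the alkane and cycloalkane values and the stated maximum of C + N/2 + 1. The function name and signature are illustrative only:

```python
def degree_of_unsaturation(c: int, h: int, n: int = 0, hal: int = 0) -> float:
    """Assumed formula: DOU = (2C + 2 + N - H - Hal) / 2.

    Oxygen does not change the count, and sulfur is ignored,
    as stated in the descriptor definition.
    """
    return (2 * c + 2 + n - h - hal) / 2


print(degree_of_unsaturation(c=6, h=14))        # hexane, an alkane -> 0.0
print(degree_of_unsaturation(c=6, h=12))        # cyclohexane -> 1.0
print(degree_of_unsaturation(c=6, h=6))         # benzene -> 4.0
print(degree_of_unsaturation(c=6, h=0, n=2))    # no H or halogen: C + N/2 + 1 -> 8.0
```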

Standard Deviation of Gap Sizes between Substituents (Gap Size st dev):
The population standard deviation of gap sizes, as described in MolD_72. A large standard deviation means the substituents' connection points are distributed irregularly around the macrocycle ring, whereas a zero standard deviation means the substituents are evenly spaced (see the Figure below for examples).
The standard deviation of all gap sizes is calculated during the gap analysis algorithm described above (72).
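The following Python sketch illustrates the idea. The gap definition from MolD_72 is not reproduced in this section, so the sketch assumes that a gap is the number of unsubstituted ring atoms between two consecutive substituent attachment points, and that ring atoms are indexed 0 to N-1. All names are hypothetical:

```python
from statistics import pstdev


def gap_sizes(ring_size: int, substituted_positions: list[int]) -> list[int]:
    """Unsubstituted atoms between consecutive attachment points (assumed gap definition)."""
    pos = sorted(set(substituted_positions))
    if not pos:
        return [ring_size]  # no substituents: treated here as one gap spanning the ring
    gaps = []
    for i, p in enumerate(pos):
        nxt = pos[(i + 1) % len(pos)]  # wrap around to close the ring
        gaps.append((nxt - p - 1) % ring_size)
    return gaps


def gap_size_st_dev(ring_size: int, substituted_positions: list[int]) -> float:
    """Population standard deviation of the gap sizes."""
    return pstdev(gap_sizes(ring_size, substituted_positions))


def max_gap_fraction(ring_size: int, substituted_positions: list[int]) -> float:
    """Largest gap divided by ring size N, as used by the normalized descriptor."""
    return max(gap_sizes(ring_size, substituted_positions)) / ring_size


print(gap_sizes(12, [0, 6]), gap_size_st_dev(12, [0, 6]))        # evenly spaced
print(gap_sizes(12, [0, 1, 2]), gap_size_st_dev(12, [0, 1, 2]))  # clustered
print(max_gap_fraction(12, [0, 1, 2]))
```

In this toy 12-membered ring, two opposite substituents give gaps of 5 and 5 and a standard deviation of zero, whereas three adjacent substituents give gaps of 0, 0 and 9 and a much larger spread.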

Maximum Gap Size between Substituents, Normalized (Max Gap Size/N):
The maximum gap size, from MolD_72, normalized to the size of the macrocycle ring (see the Figure below MolD_79 for examples). The resulting value represents the largest fraction of the main macrocycle ring circumference that is free of substituents.

Bridging atoms: Count of how many atoms in the main macrocycle ring separate the attachment points of a connecting ring. The attachment points themselves do not count. Note that the shorter path between attachment points is considered the "bridge". Thus, in the figure below, the upper structure is considered to be a 14-membered macrocycle (black) with two fused rings attached (blue) and a total of three bridging atoms (red). The lower structure is considered to be a 16-membered macrocycle, also with two fused rings attached, and with five bridging atoms.

The total number of bridging atoms (numBridgeAtoms) is calculated using an algorithm that cycles through each fused ring substituent (as identified above), calculating the longest and shortest path (in MC atoms) between the two MC atoms that serve as connection points for the fused substituent. If the shortest path between these connection points is zero atoms, there are no bridging atoms. If the shortest path between these connection points is one or more atoms, each of these intervening atoms is flagged and counted as a bridging atom.
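The bridging-atom count described above can be sketched in Python as follows. This is not the original ChemAxon-based implementation: it assumes that main-ring atoms are indexed 0 to N-1 around the ring and that each fused ring substituent is supplied as a pair of distinct connection-point indices. All names are hypothetical:

```python
def bridging_atoms(ring_size: int, fused_connections: list[tuple[int, int]]) -> set[int]:
    """Return the indices of main-ring atoms flagged as bridging atoms."""
    flagged = set()
    for a, b in fused_connections:
        # Number of ring atoms strictly between the connection points, in each direction.
        forward = (b - a - 1) % ring_size
        backward = (a - b - 1) % ring_size
        # The shorter path is taken as the bridge (ties resolved arbitrarily here).
        if forward <= backward:
            start, length = a, forward
        else:
            start, length = b, backward
        for step in range(1, length + 1):
            flagged.add((start + step) % ring_size)
    return flagged


# Toy 14-membered ring: one fused ring spanning atoms 0 and 2, another spanning 5 and 8.
atoms = bridging_atoms(14, [(0, 2), (5, 8)])
print(sorted(atoms), len(atoms))     # [1, 6, 7] 3
print(bridging_atoms(14, [(0, 1)]))  # adjacent connection points: set()
```

Collecting the flagged atoms in a set means that an atom lying under two different bridges is counted once. Whether the original algorithm behaves the same way is not stated in the text.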
normalized to the size of the macrocycle ring. Restricted bonds include:
- Bonds that are part of a ring fusion
- Amide bonds
- Double, triple and aromatic bonds
Bonds having two or more of the above qualities are only counted once.
The restricted fraction (restrictedFraction) is identified using an algorithm that cycles through the MC bonds, checking bond order using the getBondType() method from the class TopologyAnalyserPlugin, together with the amide flags set by the algorithm described below (84) and the fusion flags set by the algorithms described above (36).
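A minimal Python sketch of the counting rule follows. It does not use the ChemAxon API: each main-ring bond is represented by a hypothetical `RingBond` record whose fusion and amide flags are assumed to have been set beforehand. Because a ring has as many bonds as atoms, dividing by the number of ring bonds is equivalent to normalizing to ring size:

```python
from dataclasses import dataclass


@dataclass
class RingBond:
    order: str               # "single", "double", "triple" or "aromatic"
    in_fusion: bool = False  # bond is shared with a fused ring
    is_amide: bool = False   # bond is the C-N bond of an amide


def restricted_fraction(ring_bonds: list[RingBond]) -> float:
    """Fraction of main-ring bonds that are restricted; each bond counts at most once."""
    if not ring_bonds:
        raise ValueError("ring must contain at least one bond")
    restricted = sum(
        1
        for bond in ring_bonds
        if bond.in_fusion or bond.is_amide or bond.order in {"double", "triple", "aromatic"}
    )
    return restricted / len(ring_bonds)


bonds = [
    RingBond("single", is_amide=True),
    RingBond("double", in_fusion=True),  # two restricting features, counted once
    RingBond("single"),
    RingBond("single"),
]
print(restricted_fraction(bonds))  # 0.5
```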
See the Figure below for  amide. The sum is reported here, and the "amide" flag is further used in restricted/unrestricted fraction algorithms described above.

Ring Complexity without heteroatoms (rComplx):
Fraction of valences of carbon atoms in the main macrocycle ring that are bonded to a substituent or a peripheral group, or are involved in a π-bond with an adjacent MC ring atom, as a measure of how densely the main macrocycle ring is decorated. Each carbon atom in the MC ring can engage in two bonds in addition to the minimum of two sigma bonds that connect it to its neighboring MC ring atoms. The complexity of decoration can therefore be quantified as the number of MC ring carbon valences that are not bonded to hydrogen atoms, divided by the total number of ring carbon valences (= 2x the number of ring carbons), as per the following equation:

$$\mathrm{rComplx} = \frac{2\,rC - rCH}{2\,rC}$$

Where rC is the number of carbon atoms in the main macrocycle ring, and rCH is the number of hydrogen atoms directly bonded to ring carbons (identified using standard connectivity methods).
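As a worked illustration of this ratio, the short Python sketch below computes (2 x rC - rCH) / (2 x rC) directly from the two counts defined above. The function and argument names are hypothetical:

```python
def ring_complexity(ring_carbons: int, ring_carbon_hydrogens: int) -> float:
    """rComplx: non-hydrogen valences on ring carbons / total available ring carbon valences."""
    total_valences = 2 * ring_carbons  # two non-backbone valences per ring carbon
    if total_valences == 0:
        raise ValueError("ring contains no carbon atoms")
    if not 0 <= ring_carbon_hydrogens <= total_valences:
        raise ValueError("hydrogen count is inconsistent with the number of ring carbons")
    return (total_valences - ring_carbon_hydrogens) / total_valences


print(ring_complexity(12, 24))  # all-CH2 ring, undecorated -> 0.0
print(ring_complexity(14, 21))  # 7 of 28 valences decorated -> 0.25
print(ring_complexity(12, 0))   # no ring C-H bonds at all -> 1.0
```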

Ring Complexity with heteroatoms (rComplxHet):
Fraction of valences of all atoms in the main macrocycle ring that are bonded to a substituent or a peripheral group, or are involved in a π-bond with an adjacent ring atom, as a measure of how densely the main macrocycle ring is decorated. Differs from MolD_85 in including ring nitrogen and sulfur atoms as potential attachment points for substituents or peripheral groups, or as potential partners in π-bonds with other ring atoms. Each carbon atom in the ring can form two bonds in addition to the minimum of two sigma bonds that connect it to its neighboring ring atoms. Each nitrogen atom in the ring can form up to one such additional bond. We considered that each sulfur atom in the ring might form up to two additional bonds (with oxygen, e.g. in a sulfoxide or a sulfone). The complexity of decoration, taking into account all of these possible attachment points on all atoms of the main macrocycle ring, can therefore be quantified as the number of ring carbon and nitrogen valences that are not bonded to hydrogen atoms, plus the number of S=O bonds involving ring sulfur atoms, all divided by the total number of ring atom valences (= 2x the number of ring carbons + 1x the number of ring nitrogens, plus 2x the number of ring sulfurs), as per the following equation:
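A minimal Python sketch of this ratio, written directly from the prose definition above, is given here. The argument names (`ring_n`, `ring_nh`, `ring_s`, `ring_s_oxo`) are hypothetical labels chosen for illustration, not symbols from the original descriptor list:

```python
def ring_complexity_het(
    ring_c: int,      # carbon atoms in the main ring
    ring_ch: int,     # hydrogens bonded to ring carbons
    ring_n: int = 0,  # nitrogen atoms in the main ring
    ring_nh: int = 0, # hydrogens bonded to ring nitrogens
    ring_s: int = 0,  # sulfur atoms in the main ring
    ring_s_oxo: int = 0,  # S=O bonds on ring sulfurs (0, 1 or 2 per sulfur)
) -> float:
    """rComplxHet as described in the text: decorated valences / available valences."""
    total = 2 * ring_c + ring_n + 2 * ring_s
    if total == 0:
        raise ValueError("ring has no atoms that contribute valences")
    decorated = (2 * ring_c - ring_ch) + (ring_n - ring_nh) + ring_s_oxo
    return decorated / total


# 10 ring C carrying 14 H, 2 ring NH, 1 ring thioether S: 6 of 24 valences decorated.
print(ring_complexity_het(10, 14, ring_n=2, ring_nh=2, ring_s=1, ring_s_oxo=0))  # 0.25
```

With no ring nitrogen or sulfur, the expression reduces to the carbon-only ring complexity defined in the preceding descriptor.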