C–H functionalisation tolerant to polar groups could transform fragment-based drug discovery (FBDD)

We have analysed 131 fragment-to-lead (F2L) examples targeting a wide variety of protein families, published by academic and industrial laboratories between 2015 and 2019. Our assessment of X-ray structural data identifies the most common polar functional groups involved in fragment-protein binding as N–H (hydrogen bond donors on aromatic and aliphatic N–H, amides and anilines; totalling 35%), aromatic nitrogen atoms (hydrogen bond acceptors; totalling 23%), and carbonyl oxygen atoms (hydrogen bond acceptors on amides, ureas and ketones; totalling 22%). Furthermore, the elaboration of each fragment into its corresponding lead is analysed to identify the nominal synthetic growth vectors. In ∼80% of cases, growth originates from an aromatic or aliphatic carbon on the fragment, and more than 50% of the total bonds formed are carbon–carbon bonds. This analysis reveals that growth from carbocentric vectors is key, and therefore robust C–H functionalisation methods that tolerate the innate polar functionality on fragments could transform fragment-based drug discovery (FBDD). As a further resource to the community, we have provided the full data of our analysis as well as an online overlay page of the X-ray structures of the fragment hits and leads: https://astx.com/interactive/F2L-2021/


Defining growth vectors
We recognise that defining nominal growth vectors is somewhat subjective, so we created a set of guidelines to try to ensure consistency (Supplementary Information, Figure S1).


• Nominal growth vectors are highlighted as red bonds; when it is not synthetically sensible to highlight the observed change as nominal growth, a synthetically viable bond is instead highlighted in cyan (e.g. Figure S2, 2015-17).
• A growth vector is defined as being where a new group has been added to the fragment, even if this group is small, e.g. ArC-H → ArC-Me (Figure S1, 2015-2).
• If a pre-existing group is modified only slightly (e.g. homologation/dehomologation) and does not engage any additional protein interactions, this is not counted as a growth vector, e.g. nPr → Et (Figure S1, 2015-6).
• If a ring or heterocycle has been changed or expanded without changing the pharmacophore, this is not defined as a growth vector, e.g. pyridine → pyrazole (Figure S1, 2015-7) or 6- → 7-membered ring expansion (Figure S1, 2015-4).
• Groups removed from a fragment are not highlighted, e.g. ArC-Cl → ArC-H (Figure S1, 2015-2).
• In some cases, a fragment atom was changed to enable a growth vector; this has been highlighted, e.g. pyridyl-N → phenyl-CH (Figure S1, 2015-2).
• If a heteroatom has been added to the initial fragment scaffold, this is highlighted in red even if it is not a growth vector (Figure S1, 2019-1); we have done this to highlight the breadth of different heterocycles encountered in FBDD.
• The type of bond being formed when growing from the fragment is defined irrespective of the starting fragment atom, e.g. the C(sp²)-N segment includes cases where a nitrogen is added to a fragment C(sp²) atom and cases where a C(sp²) atom (e.g. arene or alkene) is added to a nitrogen atom located on the fragment.

For the majority of the cases in Table S1, defining nominal growth vectors under the constraints listed above was relatively straightforward; however, some cases were more challenging, and Figure S2 shows an example where the approximate designation of growth clearly conflicted with what was synthetically viable.
Here, nominal growth is observed to be double alkylation of the amide N-H (shown with red arrows); however, amide bond formation is synthetically straightforward and would give access to a greater scope of analogues in SAR exploration. In instances like this, the synthetically viable, rather than the strictly nominal, growth vector has been defined (Table S1 and Figure S2, cyan bonds).
In our analysis, we also found examples requiring the designation of both a strictly nominal growth vector (red bond) and a more synthetically viable one (cyan bond). This is highlighted in the case of 2017-14 (Figure S2), where ArC-F → ArC-OAr growth is nominal (red bond), whereas the nominal growth vector of the sulfonamide is observed to be from the C-H of the methyl group. Considering the robustness of sulfonamide chemistry and the challenge of methyl C-H activation, we have defined the synthetically viable bond between the aniline and the sulfur as the growth vector for this case (Figure S2, cyan bond).
We have also encountered more complex examples when defining growth vectors in this dataset, such as 2019-16 (Figure S2). In this example, although the change of an aromatic ethyl to a phenyl can be defined as a simple nominal growth vector, designating the other vectors proved more difficult because of the inverted stereochemistry between the fragment and the lead, in addition to the change in linking atom within the fragment scaffold. In this case, we have defined the ArC-N → ArC-O change as a synthetically viable growth vector but have also highlighted the methyl → benzyl switch at the stereogenic centre, as this comprises both the nominal growth and a change in stereochemistry from the initial fragment (Figure S2).
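The guideline that the bond type is defined irrespective of which atom sits on the fragment can be illustrated with a short sketch. This is not the procedure used to build Table S1; the function names `bond_type` and `tally_bond_types`, the plain-text atom labels such as `C(sp2)`, and the example vectors are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def bond_type(fragment_atom: str, added_atom: str) -> str:
    """Return an order-independent bond label, e.g. 'C(sp2)-N'.

    Assumption: atoms are given as plain-text labels such as 'C(sp2)',
    'C(sp3)', 'N' or 'O'. Carbon labels (those starting with 'C(') are
    placed first, and ties are broken alphabetically, so the label is the
    same whichever atom was on the fragment.
    """
    first, second = sorted(
        (fragment_atom, added_atom),
        key=lambda label: (not label.startswith("C("), label),
    )
    return f"{first}-{second}"

def tally_bond_types(vectors):
    """Return the percentage of growth vectors falling in each bond type."""
    counts = Counter(bond_type(frag, added) for frag, added in vectors)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical growth vectors (fragment atom, added atom); not real data.
example_vectors = [
    ("N", "C(sp2)"),       # arene added to a fragment nitrogen
    ("C(sp2)", "N"),       # nitrogen added to a fragment C(sp2)
    ("C(sp2)", "C(sp3)"),
    ("C(sp3)", "C(sp2)"),
]
print(tally_bond_types(example_vectors))
```

In this toy input, the first two vectors fall in the same C(sp2)-N segment even though the fragment atom differs, which is the behaviour the guideline describes.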

Figure S1
Guidelines used to define nominal growth vectors. Each fragment and its corresponding lead are shown with the fragment polar binding groups (blue circles) and the nominal fragment growth vectors (red arrows). The bonds to new groups added onto the lead during fragment elaboration are shown as hypothetical synthetic bonds (red or cyan bonds). Guidelines for defining growth vectors are summarised in the final column.

[Figure S1 column headings: Fragment, Lead, Table Entry, Guidelines]

The overlay pages provide a curated view of a series of protein-ligand structures. The structures can be explored and displayed through the hierarchical menus in the right-hand panel.
Structures have some basic top-level controls: checkboxes and colour pickers to control the protein, ligand, waters and simple molecular surfaces.
Expanding a structure displays further controls for different display styles and controls to turn on electron density maps (where available). The maps are often clipped to the immediate vicinity of the ligand to minimize file sizes.

Keyboard Shortcuts
The following keyboard shortcuts are available when the NGL Viewer has focus (i.e. after you click on the viewer area).