Structural variation of protein-ligand complexes of the first bromodomain of BRD4

The bromodomain-containing protein 4 (BRD4), a member of the bromodomain and extra-terminal domain(BET) family, plays a key role in several diseases, especially cancers. With increased interest in BRD4 as atherapeutic target, over 300 X-ray crystal structures of the protein in complex with small molecule inhibitorshave become publicly available over the recent decade. In this study, we use this structural information toinvestigate the conformations of the first bromodomain (BD1) of BRD4. Structural alignment shows a highlevel of similarity between the structures of BRD4-BD1, regardless of the bound ligand. We employ WONKA,a tool for detailed analyses of protein binding sites, to compare the active site of over 100 crystal structures,with a focus on the highly conserved network of water molecules, which line the binding pocket of BRD4-BD1.The analysis presented in this work helps guide the selection of the best structure of BRD4-BD1 to use instructure-based drug design, an important approach for faster and more cost-efficient lead discovery. The bromodomain-containing protein 4 (BRD4), a member of the bromodomain and extra-terminal domain (BET) family, plays a key role in several diseases, especially cancers. With increased interest in BRD4 as a therapeutic target, many X-ray crystal structures of the protein in complex with small molecule inhibitors are publicly available over the recent decade. In this study, we use this structural information to investigate the conformations of the ﬁrst bromodomain (BD1) of BRD4. Structural alignment of 297 BRD4-BD1 complexes shows a high level of similarity between the structures of BRD4-BD1, regardless of the bound ligand. We employ WONKA, a tool for detailed analyses of protein binding sites, to compare the active site of over 100 of these crystal structures. The positions of key binding site residues show a high level of conformational similarity, with the exception of Trp81. A focused analysis on the highly conserved water network in the binding site of BRD4-BD1 is performed to identify the positions of these water molecules across the crystal structures. The importance of the water network is illustrated using molecular docking and absolute free energy perturbation simulations. 82% of the ligand poses were better predicted when including water molecules as part of the receptor. Our analysis provides guidance for the design of new BRD4-BD1 inhibitors and the selection of the best structure of BRD4-BD1 to use in structure-based drug design, an important approach for faster and more cost-eﬃcient lead discovery.

domain(BET) family, plays a key role in several diseases, especially cancers. With increased interest in BRD4 as atherapeutic target, over 300 X-ray crystal structures of the protein in complex with small molecule inhibitorshave become publicly available over the recent decade. In this study, we use this structural information toinvestigate the conformations of the first bromodomain (BD1) of BRD4. Structural alignment shows a highlevel of similarity between the structures of BRD4-BD1, regardless of the bound ligand. We employ WONKA,a tool for detailed analyses of protein binding sites, to compare the active site of over 100 crystal structures,with a focus on the highly conserved network of water molecules, which line the binding pocket of BRD4-BD1.The analysis presented in this work helps guide the selection of the best structure of BRD4-BD1 to use instructure-based drug design, an important approach for faster and more cost-efficient lead discovery.

Introduction
The bromodomains and extra-terminal domain (BET) family of proteins recognize acetylated N-terminal tails of histones through interactions of their bromodomains (BDs) and acetylated lysine residues. 1,2 BET proteins play a crucial role in regulating gene expression and are an important target to develop inhibitors to regulate protein expression by inhibiting these epigenetic interactions. The most extensively studied member of the BET family is the bromodomain-containing protein 4 (BRD4), because of its promise as a therapeutic target for diseases such as cancer, neurodegenerative disorders, inflammation and obesity. [3][4][5][6][7] Increased interest in BRD4 and its potential inhibitors has resulted in a wealth of X-ray crystal structures, providing atomistic insight into its binding site interactions. However, this also means that it is increasingly difficult to choose which crystal structure is preferred as the starting point of any in silico study. Within  Supplementary Information (ESI) available: A CSV file has been provided with a full list of PDB codes of X-ray crystal structures of BRD4-BD1 used in this study. The details of active site crystallographic water molecules in each PDB is also included as a CSV file and PDF. Input scripts for the molecular docking experiments and absolute free energy calculations are also provided. RMSD distributions from the docking analysis are provided. See DOI: 00.0000/00000000. this work, we use the structural information, which is publicly available, to help understand the flexibility of BRD4 and examine the effects of small molecule inhibitors, with a focus on the BD1 binding pocket. This analysis provides guidance in selecting crystal structures and other features, such as crystallographic water molecules.
Like its family members, BRD2, BRD3 and BRDT, BRD4 consists of two N-terminal BDs (BD1 and BD2) and an extra C-terminal domain (ET). 8 Each BD is composed of four helices (αZ, αA, αB and αC), which are connected by the ZA loop and BC loop, creating the acetyl-lysine binding site (Fig. 1a). BRD4-BD1 recognises histone H4, which is anchored by a hydrogen bonding interaction between the carbonyl oxygen atom on the acetylated lysine and an asparagine residue Asn140 on the BC loop of the receptor (Fig. 1b). 9 A second interaction is formed through a water mediated hydrogen bond between the acetyl lysine and tyrosine residue Tyr97 on the ZA loop. Additional binding site residues create a deep hydrophobic cavity, with Trp81 and Met149 also considered key residues in H4 and small molecule inhibitor binding. 10 Over the last decade, there have been many small molecule inhibitors published, with some of them reaching human clinical trials. [11][12][13][14][15][16] Contained within these inhibitors is a wide chemical diversity with core motifs encompassing azepines, 3,5-dimethyl isoxazoles, pyridones, triazolopyridines, tetrahydroquinolines, 4acyl pyrroles and 2-thiazolidinones. 17,18 Each of these structural classes contains a unique warhead, which competes with H4 to replicate the interactions with Asn140 and Tyr97. An additional common feature of small drug-like compounds bound to BRD4-BD1 is a lipophilic group, which can extend into the binding pocket and interact with the hydrophobic WPF shelf (Trp81-Pro82-Phe83).
The consideration of water molecules is important for ligand binding, as they can stabilize a complex by acting as a bridge for hydrogen bonds and their displacement by a ligand can result in a decrease in binding affinity of that ligand. 19 However, it is also possible for displacement to be associated with a favorable gain in entropy with the release of a well-ordered molecule into the bulk solvent, resulting in an increase in ligand binding affinity. It is therefore crucial in drug design to understand which water molecules mediate protein-ligand interactions and which can be targeted for displacement. At the base of the BRD4-BD1 binding pocket, there is a network of highly conserved water molecules, which has been found to be important in ligand binding and stabilising the protein structure. 10,20,21 Zhong et al. 22 investigated the structural and thermodynamic properties of the crystallographic water molecules. They found that it is energetically unfavourable to displace a water molecule with a small drug-like compound, as this would require a large amount of energy to compensate breaking the hydrogen bonding network. Furthermore, molecular dynamics simulations demonstrated a high occupancy for several of the water sites. Using these findings, Zhong et al. developed a docking based virtual screening protocol to identify novel inhibitors towards BRD4. This study highlighted the importance of the water network in structure-based drug design.
The pool of BRD4 inhibitors continues to grow, with most identified through fragment or structure-based drug design based on properties of known BRD4 inhibitors. [23][24][25][26][27] Additionally, a recent review 28 identified three novel strategies in targeting BRD4, including bivalent BRD4 inhibitors, proteolytic targeting chimeric molecules and re-purposing of kinase inhibitors. The selectivity of inhibitors is also being targeted. [29][30][31] Initially many compounds were developed as BRD4 inhibitors. However, due to the structural similarity across the BDs of BET proteins, many of these were pan-inhibitors which could cause adverse side effects such as dizziness and nausea. 32,33 Gilan et al. 29 used structure-based design to discover compounds that interact specifically with either the BD1 or BD2 of BET proteins, providing new insights into improving therapeutic strategies with fewer toxic side effect. Speck-Planche et al. 31 have also developed the first multi-target quantitative structure activity relationship (mt-QSAR) model, which can predict BET inhibitor potency against BRD2, BRD3 and BRD4.
As in many drug discovery campaigns, computational methods remain an important tool in finding BRD4 inhibitors. 23,25,34,35 Structure based virtual screening methods, such as docking and 3D-QSAR, can facilitate high-throughput approximations of binding affinities. Molecular dynamics simulations expand on static representations of protein-ligand complexes by providing a more dynamic view and develop our understanding of structural patterns and interactions, which lead to high potency and selectivity. 36,37 As a crucial goal of drug development is designing a compound that binds competitively and strongly, alchemical free energy calculations are becoming increasingly important as, when done well, they can provide accurate estimations of binding free energies and present a way to minimize the number of compounds that are made in the laboratory. Pan et al. 23 discovered a BRD4-histone deactylase (HDAC) inhibitor, a promising therapeutic strategy for colorectal carcinoma, through high-throughput rigid molecular docking of fragments of known BRD4 inhibitors, embedded into the fragment-like library of ZINC. 38 A flexible docking method was then implemented on the top 200 scoring fragments. Following this docking, 24 compounds were synthesised and tested based on the 10 top scoring fragments, resulting in a promising lead compound. The extensive research already conducted on BRD4 and its inhibitors also makes it an excellent test case for the development of novel computational workflows. For example, Fusani et al. 39 merged active learning with the comparative binding energy (COMBINE) method and demonstrated its performance using a BRD4 dataset. Active learning was used to introduce an uncertainty estimation component to the COM-BINE method, which is a powerful tool in studying the structural information of protein-ligand complexes and deriving QSAR for structurally similar series of compounds.
A fundamental component of structure based in silico methods is the use of X-ray crystal structures. It is therefore important to be able to rely on the starting conformations of proteins in order to make accurate predictions. For example, different active site conformations may lead to different binding poses, which can severely impact predicted binding affinities. 40,41 While molecular dynamics simulations, combined with enhanced sampling methods, can be used to find the most stable binding pose, these methods come with a computational expense and it is still sensible to choose the starting structure with care. In a study on the T4 lysozyme L99A, Lim et al. 42 demonstrated that predicted relative free energy values are sensitive to initial protein conformation, even when using enhanced sampling. In our study, we examine the X-ray crystal structures of BRD4-BD1 complexes in the PDB and perform structural clustering, to identify the variation of conformations and the best static representative structures of the receptor. To compare the binding site of multiple complexes and identify ligands which cause structural variation, we use WONKA. 43,44 WONKA is a tool for ligand-based, residue-based and water-based analyses of protein-ligand structural ensembles. The advantage of using WONKA over other visualisation and analysis tools, such as PyMOL 45 , is its ability to identify trends within a data set. It can identify patterns between structure and individual ligand complexes, and these observations are displayed on a web based graphical user interface. WONKA also analyses water displacements and relates which ligands displace conserved water molecules. Therefore, we are able to explore the extent of the conservation of crystallographic water molecules in the BRD4-BD1 binding pocket and highlight functional groups present in the ligands, which displace the usually highly conserved water network. Molecular docking is also used to assess the accuracy of predicted binding poses, with and without the water network present.

Structural Data
A survey of the PDB reveals, at the time of this study, 323 Xray crystal structures of BRD4 in complex with a variety of ligands. To identify the common sequence, multiple sequence alignment was performed using the Clustal Omega 46 alignment tool in Chimera 47 . Sequence alignment shows that 26 of the ligands are co-crystallised with BD2 and are therefore discounted. In total, 297 BRD4-BD1 complexes are taken forward for further analysis.

Structural Clustering
To prepare the receptor structures for clustering, the ligands were first removed, multiple sequence alignment was performed and the common sequence across all structures (Fig. 1c) was retained, with the remaining tails removed. Protein structural clustering was performed using ClusCo 48 , a software tool for the clustering and comparison of protein models. This utilizes an open source K-means 49 code for Hierarchical Agglomerative Clustering. To identify a sensible number of groups to cluster the structures into, all-vs-all pairwise root-mean-square deviation (RMSD) values were calculated. RMSD was based on Cα atoms. The centroid of the whole ensemble was established by clustering with the cluster number set to one.
Although there is no specified upper limit to the number of crystal structures that WONKA can analyse, we found that a cutoff of 100 structures is preferable for the software to perform smoothly. To ensure that effects of a wide range of ligand activity and structural diversity were captured, the crystal structures studied us-ing WONKA were chosen based on structural clustering of the co-crystallised compounds. Out of the 297 crystal structures of BRD4-BD1, there are 266 unique ligands in complex with the receptor, which have accessible experimental data. Compounds that show little or no activity (pIC 50 < 5) were discounted, leaving 175 compounds to be clustered. Ligand structural clustering was performed in DataWarrior 50 , using Fragp descriptors and Tanimoto similarity ( ). Two structures were considered to be similar if > 0.8. Representative compounds from each cluster, and their respective crystal structures, were chosen for analysis using WONKA.

Molecular Docking
The representative compounds from ligand-based clustering were selected for molecular docking. Ligand coordinates were extracted from their original crystal structures and docked against the centroid crystal structure of BRD4-BD1 (PDB 4BJX), with and without the water network included as part of the receptor. All other crystallographic water molecules were removed in both cases. To prepare the crystal structure for docking, the receptor generation software as part of the OpenEye docking toolkit 51,52 was used. The co-crystallised compound with ID 73B was assigned as the ligand and therefore did not interact with docked molecules. A box centred around the original ligand with sides of length 15.7 Å x 20.7 Å x 19.0 Å was situated to cover the BRD4-BD1 binding cavity, giving a total receptor volume of 6151 Å 3 . Constraints were applied to ensure a heavy atom contact with Asn140. Compounds to be docked were protonated according to physiological pH and prepared using OpenEye OMEGA. 52 Con-formers were generated using a truncated form of the MMFF94s force field 53 with a maximum energy difference of 20 kcal mol −1 set from the lowest energy conformer. A maximum of 1000 conformers was allowed and those within 0.5 Å of any others were considered as duplicates and removed. Docking was performed using OpenEye FRED. 51 Compounds were docked using the high resolution setting with rotational and translational step sizes of 1 Å. To measure the accuracy of the docking with and without the water network present, heavy atom RMSD was calculated between the docked poses and the crystallographic poses of each of the ligands.

Absolute Free Energy Calculations
Absolute free energy perturbation (FEP) simulations were performed to calculate binding free energies. To illustrate the impact of starting structure, two calculations were performed, each starting from a different binding pose of the same ligand. Initial coordinates were obtained from the molecular docking study, where the ligand (PDB 83T) was docked to the centroid structure of BRD4-BD1 (PDB 4BJX), with and without active site crystallographic water molecules. This binding site water network was retained for the FEP calculation in the case where it had been involved in the docking. System preparation and the simulations were performed using GROMACS 2020.3 54 and the CHARMM force field. 55 Ligand parameters were generated using CGenFF. 56 The complexes were solvated in a dodecahedral box with an edge distance of 3 nm, to construct an explicitly modelled solvent consisting of around 32,000 TIP3P water molecules 57 and two Cl − ions to give a net neutral charge. In absolute FEP simulations, ligand binding affinity is estimated by calculating the difference between the free energy change of decoupling the ligand from solution and decoupling from the receptor. Therefore, a system was also prepared with the ligand free in solution. Upon setup, systems were minimized for 200 ps using a steepest descent algorithm. Equilibration was performed for 200 ps in the NVT ensemble with harmonic position restraints applied to the heavy atoms with a force constant of 1000 kJ mol −1 nm −2 . Temperature coupling was achieved by using velocity rescaling with a stochastic term and a reference temperature of 298 K. A further equilibration was performed for 4 ns in the NPT ensemble with a Berendsen pressure and temperature coupling scheme. The decoupling of the ligands from the receptor was split into 30 lambda windows, where ligand restraints were applied and van der Waals and Coulomb interactions were gradually turned off. The relative positions of the ligands with respect to the receptor were restrained by one bond, two angles and three dihedral harmonic potentials. To account for these restraints, a correction is applied to the free energy of binding. For the free ligand in solvent, van der Waals and Coulomb interactions were decoupled over 20 lambda windows. Each lambda window consisted of 1 ns of equilibration in the NPT ensemble, followed by 2 ns of data collection. Free energy changes were evaluated with the Bennett Acceptance Ratio (BAR) method 58 as implemented in GROMACS. Fig. 2 shows the distribution of the resolution of the crystal structures identified in the PDB; they have an average resolution of 1.60 Å. Overall, we can consider the structures to be highresolution and have confidence in the quality of the crystal structure data. Structural differences within the ensemble of crystal structures were measured using pairwise RMSD. The median and mean pairwise RMSDs are 0.56 Å and 0.58±0.22 Å respectively. These RMSD values are small and suggest high similarity between most protein structures within the ensemble. The structural similarity between the superimposed structures can be observed in Fig.  3a. The maximum pairwise RMSD is 1.70 Å between PDB entries 5KU3 and 6V1L. The overlay of these structures (Fig. 3b) suggests the largest structural variance occurs in the tail leading to the N-terminus. There is a small amount of deviation in the ZA loop. Given the narrow distribution of RMSD values, it is sensible to group the structures into five clusters. Fig. 4 shows an overlay of the representation crystal structures of the five structural clusters. The position of the binding site residues further demonstrates the similarity between different structures of the receptor. The centroid of the whole ensemble is the crystal structure with PDB code 4BJX, which has a resolution of 1.59 Å. With such a high number of crystal structures available for BRD4, these results can aid the selection of the most representative structures to use for the computational study of BRD4-BD1.

Ligand Based Structural Clustering
Clustering the co-crystallised ligands, which have experimental pIC 50 values of > 5, based on structural similarity resulted in 101 groups. The distribution of ligand activity for the whole data set and the 101 representative compounds from each cluster is shown in Fig. 5. The representative compounds have a pIC 50 range of 5.0 to 8.8 and cover a relatively wide span of activity.  Therefore, no further filtering of the compounds was performed and the complexes containing these ligands were taken forward for analysis using WONKA.

Structural Diversity of the Binding Site
WONKA enables the identification of trends in the position of active site residues for multiple crystal structures of the same protein. Fig. 6 shows the superimposition of the key binding site residues in BRD4-BD1 for each of the co-crystal structures, identified by ligand based structural clustering. WONKA clusters a particular residue's conformations into different groups based on an all-vs-all heavy atom RMSD of 2.5 Å between like-residues in a structural ensemble. Asn140, Tyr97, Met149, Trp81, Pro82 and Phe83 show 5, 3, 6, 9, 4 and 2 clusters respectively.
Visual inspection reveals that all key binding site residues, with the exception of Trp81, have a high level of conformational similarity across all of the crystal structures, regardless of the structure or activity of the bound ligand. There are three structures of Trp81 that show dissimilar conformations, highlighted in blue, red and green in Fig. 6. There are multiple factors that could play a role in the observed disorder of Trp81. It is important to recognise that these crystal structures provide information on only one static conformation. A protein is flexible in solution, and it is possible that the positions of these Trp81 residues are more ordered  As these values were measured using different biological assays, we should be cautious about directly comparing them with each other and the remaining dataset. 59 However, for a potent compound, we would expect a pIC 50 value upwards of 7.5, regardless of the assay conditions. A possible reason for these Trp81 positions could be that any hydrophobic interactions formed by the ligand with Trp81 do not outweigh the stability gained by polar ligand atoms forming solvent interactions. No additional binding site residue interactions are observed, in place of an interaction with Trp81. Therefore, compounds that result in this disorder of Trp81 are not desirable and do not correlate with higher binding Fig. 7 The three crystal structures and corresponding ligands, which contain Trp81 conformations most dissimilar from the whole PDB ensemble.
affinity. Furthermore, it would be sensible to use structures with a 'regular' Trp81 position for the basis of computational studies.

Active Site Water Molecules
It is well documented that a network of conserved water molecules plays an important role in BRD4-BD1 ligand binding [20][21][22]29,60,61 . Including crystallographic water molecules is an important consideration for computational studies, and there has been many tools developed to locate water molecules in protein binding sites. [62][63][64][65] Crystallographic water molecules can drastically change the binding mode in protein-ligand docking 22 and also provide stability to the system in more advanced methods such as binding free energy calculations. 66 As part of our exploration of X-ray crystal structures, we used WONKA to analyse the occupancy of the water network, which lines the BRD4-BD1 binding pocket, as shown in Fig. 8. From the positions of the water molecules, we can identify that the water molecule at site 2 mediates a hydrogen bond between Tyr97 and bound ligands. The size of the red spheres, which represent crystallographic water molecules, reflect how conserved they are across the ensemble of crystal structures. For example, the sphere at site 1 is the largest as there is only one crystal structure, which does not include a water molecule at this position. All water clusters within 8 Å of the ligand are displayed and there is a maximum distance of 1.5 Å between a point in a cluster and the cluster centre.
Using WONKA, we can easily identify the ligands which displace the water molecules in the sites where they are not present. The only crystal structure with no water molecules at site 1 or 2 is PDB 6MH1 (Fig. 9). Water molecules at sites 3 and 4 are also displaced. Divakaran et al. 67 acknowledge the reorganisation of the usually conserved water network and attribute it to the fluorophenyl group of the ligand. A moderate pIC 50 value of 5.77 was measured for this compound. However, increased selectivity over other BET receptors was observed and it was hypothesised that this was in part due to the displacement of the water molecules. In our ensemble of crystal structures, which were analysed using WONKA, there are no other fluorine containing groups which occupy the same region of the binding site, which perhaps explains why sites 1, 2, 3 and 4 are occupied by water molecules for the majority of remaining complexes.
Beside the structure previously discussed, there is one other crystal structure, PDB 5I88, which does not show water molecules at sites 3 and 4. The butenyl group on the ligand displaces the water network and induces a rearrangement. The addition of a butenyl group to the compound corresponds with a reduction in activity as the butenyl containing compound has a pIC 50 of 6.43, while its equivalent without the butenyl group has a pIC 50 of 7.04. Crawford et al. 21 suggest that, while there may be multiple parameters that contribute to the decreased potency, the position of the active site water molecules may play a role.
Site 5 is occupied by a water molecule in all crystal structures, with the exception of PDB 4GPJ and 5DLZ. The displacement of this water molecule does not correlate with high activity ligands. Furthermore, there are ten crystal structures in our data set, which do not contain a water molecule at site 6. The corresponding ligands for these crystal structures have a pIC 50 range of 5.30 to 7.30. The remaining water molecules depicted in Fig.  8 are present in ≤ 80% of the crystal structures (see supporting information for a full list of PDB codes and the presence of active site crystallographic water molecules in each). While we have not Fig. 9 The structure of the compound that displaces crystallographic water molecules at sites 1-4 is shown on the left. The structure on the right displaces water molecules at sites 3 and 4.
found a correlation between the displacement of specific water molecules in the network and the activity of the co-crystallised compounds, we have demonstrated the extent of their conservation. We expect this analysis to be a useful tool in selecting the best crystal structure and number of crystallographic water molecules to retain in computational studies of BRD4-BD1.

Molecular Docking
To demonstrate the importance of the conserved binding site water network in modelling an experimentally accurate system, molecular docking was performed with and without the presence of the water network. Ligands from 101 crystal structures of BRD4-BD1 complexes were docked against structure PDB 4BJX. To compare the two setups, the RMSD values between the docked poses and the crystallographic poses were calculated. On average, the improvement in RMSD when including the water molecules was 1.52 Å (distributions of each data set can be found in the Supporting Information). Furthermore, 82% of the ligand poses were better predicted when including water molecules as part of the receptor. For example, Fig. 10 shows a large difference in bound conformation of one of the compounds. The docked pose has a RMSD of 0.21 Å when water molecules are included and 1.54 Å when docked without water molecules. Different functional groups occupying different regions of the active site, such as in this example, can lead to large differences in predicted activity when using more involved, but more accurate, methods such as free energy calculations. The accuracy of the binding poses when docking with the conserved water molecules indicates that these water molecules should be retained in computational studies of BRD4-BD1. Furthermore, this aids the design of new inhibitors. Compounds should be designed with interactions with these solvent interactions in mind, while all other crystallographic water molecules are likely to be able to be displaced.
The receptor used in this study was the centroid of the ensemble of 297 PDB crystal structures of BRD4-BD1. Regardless of the inclusion of crystallographic water molecules as part of the receptor, all docked compounds show a good similarity to their original crystallographic conformations. This indicates that crystal structure 4BJX is a suitable starting conformation for the in silico study of BRD4-BD1.

Binding Free Energies
To further illustrate the impact of binding site water molecules, the free energy of binding was calculated for the two docked poses shown in Fig. 10. The orange structure is the co-crystallised binding conformation of the compound. The structure in light blue is the docked pose when including the active site water network and the green pose is the docked pose obtained without including the water molecules. The experimental binding free energy for the compound is -8.9 kcal mol −1 . 68 Absolute FEP resulted in a predicted binding free energy of -11.2 ± 0.6 kcal mol −1 for the light blue binding pose and a value of -1.5 ± 0.5 kcal mol −1 for the green pose. There are a number of factors that can affect the accuracy of FEP calculations, such as the quality of the small molecule force field parameters and the number of lambda windows used. However, the only difference in procedure between our two calculations was the inclusion of the active site water molecules in the generation and setup of the simulation starting conformations. The large difference between our predicted free energy values demonstrates the importance of starting structure in these types of calculations. Furthermore, the inclusion of water molecules resulted in a free energy of binding which is closer to experiment. This supports our conclusion that these water molecules are crucial for accurately modelling a BRD4 system. ligand. Clustering algorithms identify PDB 4BJX as the centroid of the ensemble and clustering into five groups gave structures 5WA5, 4PS5, 5NNE, 5D0C and 5U28 as the representative structures of each cluster.
To achieve a more detailed view of the binding site, we used WONKA to compare the conformations of individual residues that are important for histone and small molecule binding. In this analysis the positions of Asn140, Tyr97, Met149, Trp81, Pro82 and Phe83 in 101 X-ray crystal structures were compared. With the exception of a handful of Trp81 conformations, the positions of these residues were extremely similar. This shows the size and shape of the BD1 cavity remains unchanged with different ligands bound, highlighting the importance of the chemical features needed in a potential inhibitor. A polar group at the head of the ligand is necessary to form both the interaction with Asn140 and a water mediated interaction with Tyr97. Simultaneously, a lipophilic group is needed to extend into the hydrophobic cavity of BD1 and strengthen ligand binding.
Water molecules also play an important role in BRD4 ligand binding. Therefore, we examined the conservation of crystallographic water molecules in the binding site. Analysis in WONKA showed that the majority of crystal structures contain the four or five water molecules generally considered important for ligand binding. In total, there are up to 11 water molecules within 8 Å of the bound ligands, which are largely conserved across the ensemble. While there have been previous studies on this highly conserved water network 21,22 , ours is the first to consider such a high number of experimental structures. Our work demonstrates the extent of the conservation and, through molecular docking and absolute FEP calculations, highlights the importance of retaining binding site water molecules in computational studies. Through this review of BRD4-BD1 crystal structures, we have provided a quantitative basis to facilitate the selection of structures in future computational studies.

Conflicts of interest
There are no conflicts to declare.