Open Access Article
Sara Puglioli a, Sebastian Oehler a, Luca Prati a, Jörg Scheuermann b, Gabriele Bassi a, Samuele Cazzamalli a, Dario Neri *ab and Nicholas Favalli *a
a Philochem AG, R&D Department, 8112 Otelfingen, Switzerland. E-mail: nicholas.favalli@philochem.ch; dario.neri@pharma.ethz.ch
b Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH Zürich), Zürich, Switzerland
First published on 17th October 2023
DNA-encoded chemical libraries (DELs) are powerful drug discovery tools, enabling the parallel screening of millions of DNA-barcoded compounds. We investigated how the DEL input affects the hit discovery rate in DEL screenings. Evaluation of selection fingerprints revealed that approximately 10⁵ copies of each library member are required for the confident identification of nanomolar hits, using generally applicable methodologies.
The applicability of DEL technology as a powerful drug discovery tool has been demonstrated by recent examples of lead candidates that progressed into clinical trials for different pharmaceutical indications.12 The most advanced drug prototypes discovered with DEL technology and currently studied in phase I or II clinical trials include ligands of Receptor Interacting Protein Kinase 1 (RIPK1; discovered by GSK),13 of soluble Epoxide Hydrolase (sEH; discovered by GSK),14 and of autotaxin (ENPP2; discovered by X-Chem).15
Moreover, a variety of preclinical lead compounds isolated from DELs are being developed, such as PAR2 (AstraZeneca/X-Chem),6 Wip1 (GSK),16 BCATm (GSK),17 and DDR1 (Roche)18 binders, and OncoFAP-11 (Philochem).7
The success rate of a DEL-based drug discovery screening campaign is influenced by multiple factors, including the chemical purity of the library, the encoding fidelity and the quality of the protein targets.19–21 Furthermore, affinity capture protocols should be optimized to guarantee reliable selection outcomes. The number of copies per library member used during selections (input) directly affects the success rate of DEL screening campaigns.22–24 The minimum number of input copies for each library member (threshold) is an important experimental parameter, especially when very large libraries (e.g., those containing billions of compounds) are used. The threshold for efficient selections is likely to be library-dependent, but only a few studies have addressed this aspect of DEL technology.22–24
We had previously reported that an input threshold of 10⁵ copies per library member was required for the efficient identification of Carbonic Anhydrase IX-binding fragments (sulfonamide derivatives) in one specific DEL.22 Here we present a methodology to define input thresholds on two different well-characterized DELs: NF-DEL (iodo-phenylalanine based library) and SO-DEL (4-aminopyrrolidine-2-carboxylic acid library).25,26 Both yielded novel nanomolar hits for Carbonic Anhydrase IX (CAIX), Human Serum Albumin (HSA) and Non-Structural Protein-14 (NSP14).25,26 NF-DEL and SO-DEL were screened against the targets at different inputs, ranging from 10 million copies to 100 copies per library member per selection.25,27 A threshold of approximately 10⁵ copies per library member was required in order to successfully identify binding fragments (lines in the fingerprint) and unique building block combinations (singletons in the selection fingerprint) against all screened proteins. This finding has an impact on the experimental design of DEL selection campaigns and may influence the screening procedures for very large encoded-compound collections.
735 936 and 670 752 compounds, respectively. Previous screening campaigns performed with both libraries had resulted in the identification of hit compounds for a variety of pharmaceutically relevant target proteins (Table 1).25,26
The two DELs were diluted to a concentration of 10 million copies per compound dissolved in 10 μL (the “selection input”, see also the ESI, Section 3†), and serially diluted to a final concentration of approximately 100 copies of each compound per selection (Fig. 1B).
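The dilution scheme described above can be sketched numerically. The following is a minimal illustration, not the authors' protocol: it assumes tenfold dilution steps between the highest and lowest inputs, and it uses a hypothetical round library size of 700 000 compounds purely to show how copies per compound translate into a total DNA concentration in 10 μL. All function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def dilution_series(start_copies=10**7, end_copies=10**2, factor=10):
    """Copies per compound at each step of a serial dilution.

    The tenfold factor is an assumption made for this sketch.
    """
    series = []
    copies = start_copies
    while copies >= end_copies:
        series.append(copies)
        copies //= factor
    return series


def total_concentration_molar(copies_per_compound, library_size, volume_litres):
    """Total DNA concentration (mol/L) for a given selection input."""
    molecules = copies_per_compound * library_size
    return molecules / AVOGADRO / volume_litres


LIBRARY_SIZE = 700_000  # hypothetical round number, not a value from the article
VOLUME_L = 10e-6        # 10 microlitres, as stated for the selection input

for copies in dilution_series():
    conc = total_concentration_molar(copies, LIBRARY_SIZE, VOLUME_L)
    print(f"{copies:>10,d} copies per compound -> {conc:.2e} M total DNA")
```

Under these assumptions the highest input corresponds to a total DNA concentration in the low micromolar range, and each dilution step lowers it tenfold.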
Subsequently, libraries were screened at different selection inputs (from 10⁷ to 10² copies per compound) in duplicate experiments (ESI, Section 5†) performed against a panel of immobilized target proteins: Carbonic Anhydrase IX (CAIX), Human Serum Albumin (HSA) and the SARS-CoV-2 Non-Structural Protein 14 (NSP14).27 The high-throughput sequencing results are presented as three-dimensional matrices, referred to as “fingerprints” (Fig. 1C). In such fingerprints, two dimensions represent the pairs of building blocks that unambiguously determine the chemical structure of each library member, while the third dimension indicates the number of counts for each compound at the end of the DNA-sequencing procedure.
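The fingerprint representation can be illustrated with a small sketch. It assumes that decoded sequencing reads are available as (building block A, building block B) pairs, applies a count cut-off, and then separates enriched combinations into "lines" (a building block enriched with several partners) and "singletons" (an isolated enriched pair). The classification rule, the function names and the toy reads are simplifying assumptions made for illustration; they are not the authors' analysis pipeline, and the building block codes below are made up.

```python
from collections import Counter, defaultdict


def build_fingerprint(reads):
    """Count how often each (A, B) building-block pair was sequenced."""
    return Counter(reads)


def enriched_pairs(fingerprint, cutoff=30):
    """Keep pairs whose counts reach the cut-off (>= is an assumption)."""
    return {pair: n for pair, n in fingerprint.items() if n >= cutoff}


def classify(enriched):
    """Split enriched pairs into lines and singletons.

    A pair belongs to a line if either of its building blocks is also
    enriched with at least one other partner; otherwise it is a singleton.
    """
    partners_of_a = defaultdict(set)
    partners_of_b = defaultdict(set)
    for a, b in enriched:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)
    lines, singletons = [], []
    for a, b in enriched:
        if len(partners_of_a[a]) > 1 or len(partners_of_b[b]) > 1:
            lines.append((a, b))
        else:
            singletons.append((a, b))
    return sorted(lines), sorted(singletons)


# Toy reads with hypothetical building-block codes.
reads = (
    [("A5", "B9")] * 50    # isolated enriched pair
    + [("A1", "B3")] * 40  # B3 enriched with two different partners
    + [("A2", "B3")] * 35
    + [("A1", "B1")] * 3   # background, below the cut-off
)

fingerprint = build_fingerprint(reads)
lines, singletons = classify(enriched_pairs(fingerprint, cutoff=30))
print("lines:", lines)
print("singletons:", singletons)
```

In this toy data set the two pairs sharing B3 form a line, whereas A5/B9 is a singleton; the background pair is discarded by the cut-off.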
Fig. 2 High-throughput sequencing results of affinity selections performed with different inputs (from 10⁷ copies to 100 copies) of SO-DEL against (A) Carbonic Anhydrase IX (CAIX), (B) Human Serum Albumin (HSA) and (C) Non-Structural Protein 14 (NSP14). The data are presented as three-dimensional matrices (fingerprints) as described in Fig. 1. Enriched combinations which have been validated are highlighted with an arrow. Average counts for each selection as well as the number of counts for each enriched combination are reported in the ESI (Section 5†), cut-off = 30 counts.
In selection campaigns against HSA (Fig. 2B), the most highly enriched combination was A676/B642. This compound displayed a dissociation constant of 3 ± 1 nM against the target (Table 1 and ESI Table S4†). HSA fingerprints were characterized by the enrichment of singletons (e.g., A676/B642), indicating that both building blocks must be present in the molecule to yield a high-affinity interaction with the cognate target. Also in this case, a selection input of at least 10⁵ copies of SO-DEL members was required in order to obtain high-quality fingerprints.
The SO-DEL screening on NSP14 (Fig. 2C) yielded one singleton (building block combination A206/B811) at selection inputs of 10⁵ copies per compound or higher (Table 1 and ESI Table S4†). The compound had been previously validated for its high-affinity binding to NSP14 (KD = 25 ± 3 nM).26 When DEL selections were performed with inputs of 10 000 copies per compound or lower, no distinct enrichment patterns could be detected over the background signal.
For all three targets (CAIX, HSA and NSP14), additional building block combinations were visible when higher selection inputs were used (i.e., 10⁷ or 10⁶ copies of each SO-DEL member).
000 copies per selection or lower did not produce informative fingerprints (i.e., no hit detected with a sufficiently high enrichment over the background).
Fig. 3 High-throughput sequencing results of affinity selections performed with different inputs of NF-DEL (ranging from 10⁷ to 100 copies) against (A) CAIX and (B) HSA. The data are presented as three-dimensional matrices (fingerprints) as described in Fig. 1. The top enriched combinations are indicated with an arrow. Average counts of each selection and counts for enriched combinations can be found in the ESI (Section 5†), cut-off = 30 counts.
In HSA selections with NF-DEL (Fig. 3B), the A502/B323 singleton was highly enriched at selection inputs of 10 million and 1 million copies per library member (Table 1 and ESI Table S4†). When screening experiments were performed at 10⁵ copies, the singleton was still detectable, but with lower counts.
Thus, in various selections performed using two libraries (NF-DEL and SO-DEL) against different targets, a minimum threshold of approximately 10⁵ copies of each library member appeared to be required for the reliable detection of hits in affinity capture experiments. Below this threshold, singletons (dots) and binding fragments (lines) become indistinguishable from the background noise.
Fig. 4 High-throughput sequencing results of affinity selections performed with different inputs of SO-DEL-long (ranging from 10⁷ to 100 copies of library per selection) against (A) CAIX and (B) NSP14. The data are presented as three-dimensional matrices (fingerprints) as described in Fig. 1. The top enriched combinations are indicated with an arrow. Average counts of each selection and counts for enriched combinations can be found in the ESI (Section 5†), cut-off = 30 counts.
Even in well-established technologies such as phage display, achieving 100% recovery efficiency remains a challenge.28 The success rate of hit discovery in DEL screenings is strongly influenced by the affinity capture, PCR and next-generation sequencing (NGS) procedures. We have recently demonstrated that high-affinity binders (e.g., acetazolamide against Carbonic Anhydrase IX) can be efficiently captured with a yield close to 30%, unlike micromolar binders (e.g., m-SABA).22,24 The efficiency of the first PCR amplification step after affinity capture is crucial, especially for selections involving library inputs lower than 10⁵ copies.29 Indeed, model PCR experiments performed with increasing library inputs (using NF-DEL, SO-DEL and SO-DEL-long) show that approximately 10⁶–10⁷ DNA molecules are needed to achieve successful barcode amplification (see the ESI, Section 8†). The hit discovery rate deteriorates further during the final NGS step. Modern NGS procedures still suffer from a “count loss” effect, which, however, affects all library members equally because of the PCR normalization normally performed prior to sequencing.
We investigated the impact of library input on the success rate of DEL selections using two high-quality libraries (SO-DEL and NF-DEL) previously described by our laboratory, which had yielded high-affinity hits against various proteins of pharmaceutical interest.25,26 Parallel selection experiments were performed with serially diluted libraries, ranging from 10 million copies to 100 copies per library member. The resulting fingerprints revealed that an input threshold of approximately 10⁵ copies of each compound was needed in order to confidently identify potent ligands. Higher input values (e.g., more than 10⁶ copies per library member) may allow the identification of additional binders and further improve the signal-to-noise ratio, but this choice depends on the amount of library available for selection experiments. The length of the oligonucleotides used for PCR amplification did not appear to have a strong impact on selection performance. In theory, large amounts of DELs can be synthesized, but at some stage the costs of oligonucleotides and building blocks become prohibitive. This limitation underlines the need for a quantitative characterization of the input threshold for efficient selections. The correlation between discovery rates at distinct DEL inputs is further discussed in the ESI, Section 7.†
The findings of this study may be relevant not only for the correct execution of DEL selections, but also for the design and use of very large libraries (i.e., those containing billions of compounds). An effective screening (using 10⁶ copies of each library member as selection input) of a library containing 10 billion compounds would require 16.6 nmol of total DEL per selection.22,24
In practical terms, this implies that micromoles of final library DNA are needed for realistic screening campaigns, which comprise at least 100 experiments. Considering a total yield in the range of 1–4% for two-building-block DELs25,26 and 0.1–0.2% for three-building-block DELs30,31 relative to the starting DNA material for DEL construction, millimoles of total DNA would be needed, leading to costs which are excessive for most laboratories. In addition, the cost of expensive building blocks would also have to be considered.
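The arithmetic behind these estimates can be checked with a short sketch. It converts copies per library member into moles using Avogadro's number, scales to a campaign of 100 selections, and divides by the overall synthesis yield to estimate the starting DNA required. The function names are illustrative, and the yields used are the 0.1–0.2% range quoted above for three-building-block libraries.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def moles_per_selection(library_size, copies_per_member):
    """Moles of library DNA consumed in a single selection."""
    return library_size * copies_per_member / AVOGADRO


def starting_dna_moles(library_size, copies_per_member, n_selections, overall_yield):
    """Moles of starting DNA needed, given the overall synthesis yield."""
    campaign = moles_per_selection(library_size, copies_per_member) * n_selections
    return campaign / overall_yield


per_selection = moles_per_selection(library_size=10**10, copies_per_member=10**6)
campaign = per_selection * 100
best_case = starting_dna_moles(10**10, 10**6, 100, overall_yield=0.002)
worst_case = starting_dna_moles(10**10, 10**6, 100, overall_yield=0.001)

print(f"per selection: {per_selection * 1e9:.1f} nmol")
print(f"100 selections: {campaign * 1e6:.2f} umol")
print(f"starting DNA: {best_case * 1e3:.2f} to {worst_case * 1e3:.2f} mmol")
```

With these inputs the sketch reproduces the 16.6 nmol per selection stated in the text, gives roughly 1.7 μmol for 100 selections, and puts the starting DNA at around a millimole.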
As an illustrative example, considering a budget of approximately 25 000 euros equally distributed between the purchase of DNA codes (oligonucleotides) and small organic building blocks, we can secure approximately 16 nmol of a three-building-block library (factoring in a 20% yield for each coupling step, with a final yield of ∼0.2%;30,31 see also the ESI, Section 9†). In the context of 100 selections, each performed with 10⁵ copies of each library member (considered as the limit of detection based on the findings reported in this article), our calculations suggest that the library could theoretically comprise up to 10⁹ compounds.
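The upper bound on library size follows from dividing the available molecules by the number of selections and the copies needed per member. The sketch below is a minimal check of that arithmetic using the figures quoted above (16 nmol of library, 100 selections, 10⁵ copies per member); the function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def max_library_size(library_moles, n_selections, copies_per_member):
    """Largest library that a given amount of DNA can support."""
    total_molecules = library_moles * AVOGADRO
    return total_molecules / (n_selections * copies_per_member)


size = max_library_size(library_moles=16e-9, n_selections=100, copies_per_member=10**5)
print(f"maximum library size: {size:.2e} compounds")
```

The result is slightly below 10⁹ compounds, consistent with the estimate in the text.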
Since library size has been shown to substantially affect the probability that high-affinity ligands are discovered in selection campaigns,32–35 the findings described in this article highlight the importance of accurately documenting experimental parameters in DEL publications and of continuing research in this area. Discoveries leading to the use of lower copy numbers for individual library members will not only facilitate the efficient use of laboratory resources, but also enable the productive screening of very large DELs.
Our findings indicate that careful optimization of affinity-capture conditions22 and of decoding methodologies is essential if DEL technology is to be productively applied with libraries containing billions of compounds. This aspect is particularly important for the confident detection of singletons (i.e., library members that are found to be enriched only when all building blocks simultaneously contribute to a productive binding interaction). The use of very large libraries that yield only lines or planes in the selection fingerprints (corresponding to fragments of a molecular structure) is de facto equivalent to the screening of much smaller compound collections and may not exploit the full potential of DEL technology.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3sc03688j
This journal is © The Royal Society of Chemistry 2023