Amirhossein Taghavi†ᵃ, Noah A. Springer†ᵃᵇ, Patrick R. A. Zanonᵃ, Yanjun Liᶜᵈ, Chenglong Liᶜ, Jessica L. Childs-Disneyᵃ and Matthew D. Disney*ᵃᵇ
ᵃ Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA. E-mail: mdisney@ufl.edu
ᵇ Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
ᶜ Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, The University of Florida, Gainesville, FL 32610, USA
ᵈ Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA
First published on 13th February 2025
RNA structure plays a role in nearly every disease. Therefore, approaches that identify tractable small molecule chemical matter that targets RNA and affects its function would transform drug discovery. Despite this potential, discovery of RNA-targeted small molecule chemical probes and medicines remains in its infancy. Advances in RNA-focused libraries are key to enable more successful primary screens and to define structure–activity relationships amongst hit molecules. In this review, we describe how RNA-focused small molecule libraries have been used and evolved over time and provide underlying principles for their application to develop bioactive small molecules. We also describe areas that need further investigation to advance the field, including generation of larger data sets to inform machine learning approaches.
Different approaches have been adopted to target disease-causing RNAs. Antisense oligonucleotides (ASOs) were among the first modalities19 used for RNA-targeting and have been used in numerous cases.19,20 As ASOs bind to complementary sequences in the target RNA, the most potent oligonucleotides are often those that bind to unstructured regions.21–23 ASOs function via a variety of mechanisms, including RNase H-mediated degradation and steric blockage of RNA-binding proteins (RBPs),21 to modulate alternative pre-mRNA splicing, for example. Fomivirsen was the first US Food and Drug Administration (FDA)-approved ASO,24 and since then multiple ASO-based therapeutics have been clinically approved.25–27 Although numerous backbone and base modifications have been introduced that improve their stability and chemical properties,28,29 challenges associated with ASO delivery,30,31 stability and trafficking,32 and hepatotoxicity33,34 still hinder their use in the clinic.35
Small molecules provide an alternative for targeting RNA. Although their development towards RNA targets is not as advanced as ASOs, small molecules can have favorable drug-like and physicochemical properties, and these properties can be fine-tuned through conventional medicinal chemistry approaches.36 Folded RNAs provide binding pockets for small molecules including RNA-protein complexes,37–40 and also RNAs with internal loops, bulges, stems, pseudoknots, and junctions.41–45 Designing small molecules to target RNA, however, is fundamentally different and perhaps even more complex than targeting proteins, partly due to the unique features of RNA such as its structural flexibility and dynamics, surface electrostatics, and lower diversity of building blocks (four nucleotides vs. 20 amino acids). Despite these challenges, small molecule–RNA interactions hold the potential to affect biology by various mechanisms, including directing pre-mRNA splicing events, inhibiting precursor miRNA processing, repressing translation, inhibiting RNA–protein complexes, and targeting RNA for degradation (Fig. 1). Additionally, drugging the mRNAs of disease-relevant proteins may provide an alternative method to target “undruggable” proteins,46,47 which comprise a substantial portion of the proteome.
In this review, the evolution of approaches that have been used to develop RNA-focused libraries will be discussed, including substructure-based libraries, chemical similarity searching, and physicochemical property filtering. Noteworthy developments in the application of machine learning in hit identification and lead optimization are finding their way into the RNA world with promising results. These new approaches combined with more traditional methods such as fragment-based screening can be used to generate more focused libraries.48
Fig. 2 History of RNA-targeting drugs. RNA has been a small molecule drug target for nearly 80 years. In 1946, the first class of ribosome-targeting small molecule antibiotics, aminoglycosides, were introduced clinically. All early small molecules were natural products or derived from natural product scaffolds.50 In 2000, the first fully synthetic RNA-targeting small molecule drug, linezolid,52 was approved, which also targets the ribosome. In 2020, risdiplam became the first non-ribosomal RNA-targeting small molecule drug, with the SMN2/spliceosome complex as the primary target.53 Dates above represent the first clinical use of each drug class, with a representative structure for each.
The discovery and subsequent approval of linezolid in 2000 marked the first fully synthetic (i.e., not derived from a natural product pharmacophore) RNA-targeting antibiotic.54 Linezolid demonstrated that RNA-targeting small molecules could be successfully identified and developed from sources other than natural products. Importantly, the diversity of compound structures, ranging from large, highly charged aminoglycosides to the traditionally drug-like linezolid, demonstrated that a wide range of chemical scaffolds can recognize and target RNA with clinical success. For other RNA targets, the development of primary screening hits into drugs has proved more challenging, in some cases due to lower expression levels, the lack of robust or dynamic structures, or inaccessibility due to protein binding.
The approval of risdiplam as an oral medication for the treatment of spinal muscular atrophy (SMA) showed for the first time that small molecule targeting of non-ribosomal RNAs could be therapeutically successful in humans. The SMN-C series of compounds were identified from a phenotypic screen searching for SMN2 pre-mRNA splicing enhancers to compensate for loss-of-function of the highly homologous SMN1.55 This initial hit was optimized via medicinal chemistry to afford the drug risdiplam,53 which gained full FDA approval in 2020. Detailed biophysical and structural studies showed that risdiplam stabilizes an RNA–protein manifold between the SMN2 splice site and a component of the spliceosome, U1 snRNP (small nuclear ribonucleoprotein), to direct splicing.56,57 Risdiplam makes direct contacts to the SMN2 pre-mRNA, inducing stabilization of the RNA by pulling the bulged adenosine within the helical stack and eliminating the clashes with the protein component (U1-C) of the U1 snRNP. A detailed nuclear magnetic resonance (NMR) study revealed that the carbonyl group of risdiplam forms a direct hydrogen bond with the amino group of the unpaired adenine linking the U1 snRNA and the 5′-splice site of SMN2 exon 7, serving as the minimal trans-splicing factor.57 Overall, it was suggested that risdiplam acts through the mechanism of 5′-splice site bulge repair. The extensive work to elucidate risdiplam's mechanism of action provides undeniable evidence that targeting RNA structures with small molecules can transform conventional drug discovery.58–60
Fig. 3 Challenges for RNA-targeted small molecule drugs. (A) A comparison of RNA-only and protein-only structures in the RCSB PDB reveals a large (∼100-fold) discrepancy in the number of three-dimensional structures. (B) Structures containing RNA–small molecule interactions lack diversity, with approximately 80% of all structures belonging to either rRNA, riboswitches, or in vitro selected aptamers. RNA–small molecule structures were collected from the HARIBOSS database.62 (C) A comparison of protein (EGFR Kinase, PDB: 2ITZ) and RNA (HIV TAR, PDB: 1UUI) binding pockets is presented, where red and blue represent negative and positively charged surfaces, respectively, as calculated by UCSF ChimeraX.63
Small molecules identified from screening efforts are often non-specific or promiscuous binders, related to both RNA's negatively charged backbone and perhaps lack of diversity in its building blocks (Fig. 3(C)). Kelly et al. discuss that electrostatic interactions between the anionic RNA backbone phosphate and cationic functional groups on the small molecule can enhance binding affinity; however, they also contribute to non-specific binding.64 Thus, incorporation of positively charged functional groups likely needs to be counterbalanced with other target-specific interactions. Perhaps the most notable example of cationic ligands that target RNA is the aminoglycoside class of antibiotics, which bind and inhibit the bacterial ribosome. Their observed side effects, such as ototoxicity, are tied to their promiscuity, hindering clinical application.65,66 However, charge can also be used to improve selectivity. For example, flexible scaffolds with charged centers were specific for an RNA duplex as compared to a DNA duplex where the small molecule could not orient itself in the DNA minor groove.67 Addition of positive charge to diphenylfuran ligands changed an intercalative binding mode to an ionic one, improving specificity.68
RNA contains planar aromatic nucleotide bases that can engage in π–π stacking interactions with aromatic rings in small molecules. While stacking is often specific in defined binding pockets (e.g. in aptamers or riboswitches), it can also occur non-specifically along helical regions or loops of RNA, in DNA, or in proteins. However, various studies have shown that stacking interactions can be used to drive selectivity for RNA.69 One example is an acridine-based ligand discovered to stabilize a G-quadruplex structure found in the long noncoding (lnc)RNA Telomeric Repeat-Containing RNA (TERRA).70 For intercalating small molecules, selectivity can also be improved by designing threading intercalators.71
Another way to increase the binding specificity is to use multivalent compounds to target adjacent binding sites simultaneously, particularly for RNAs with less complex binding pockets.72,73 This multivalency strategy has been used to target expanded repeating RNAs and miRNAs, which are primarily comprised of relatively simple stem-loop structures with internal loops or bulges. Collectively, the design strategy employed for small molecules should be based on the geometrical properties of the target RNA.
Non-specific binding can have broad implications, as exemplified by pentamidine, an FDA-approved antimicrobial drug.74,75 In addition to its antimicrobial function, this compound can bind to the triplet repeat expansion r(CUG)exp that causes myotonic dystrophy type 1 (DM1) and inhibit the binding of the alternative pre-mRNA splicing regulator muscleblind-like 1 (MBNL1), thus improving the splicing defects associated with MBNL1 sequestration.75 However, further investigations showed that the compound is non-specific for r(CUG)exp and also interacts with DNA and proteins.76
Finally, RNA conformational flexibility is both a challenge and an opportunity for selective binding.77 In solution, both RNAs and proteins exist in multiple different conformations called conformational ensembles.78 While proteins exist in one or a limited number of well-folded structures, RNA exhibits a relatively large number of conformations that have similar stability.79 RNA molecules also have a greater degree of local structural fluctuations compared to proteins,80 posing significant challenges for structural determination.
Transient RNA conformations observed along the conformational flexibility pathway have important biological functions78 and are also involved in diseases, making them a potential therapeutic target.81,82 These transient conformations provide an opportunity to increase the compound specificity as exemplified in r(G4C2)exp, the hexanucleotide repeat expansion that causes genetically defined amyotrophic lateral sclerosis and frontotemporal dementia, C9orf72 (c9) ALS/FTD.83 The r(G4C2)exp forms two distinct structures: a G-quadruplex or a hairpin structure with a periodic array of 1 × 1 nucleotide GG internal loops.84–87 These two alternative conformations can be targeted with different small molecules that selectively bind either conformation. Interestingly, a small molecule that conformationally selects a hidden, minor conformation, a hairpin that forms 2 × 2 nucleotide GG internal loops, was also discovered.83 This broad conformational flexibility is one of the unique features that distinguishes RNA from protein targets and can be exploited for RNA drug discovery.88
Overall, the adopted design strategy to create RNA-focused libraries, either general or target-specific, with the goal of minimizing non-specific binding, should be initiated with careful examination of the geometrical properties of the binding pocket and then selection of compounds that can form specific interactions with the binding pockets. Thus, compounds are selected based on shape complementarity,89–91 electrostatic complementarity,92 and conformational flexibility.93,94
Fig. 4 Substructure-based approaches to developing RNA-focused libraries. Early RNA-focused libraries72,96–98 utilized prior knowledge of individual RNA binders such as aminoglycosides (left), 4′,6-diamidino-2-phenylindole (DAPI, middle), and a 2-aminobenzimidazole-based Hepatitis C viral RNA binder. Important substructures from these known RNA binders were extracted and incorporated into new scaffolds, generating early RNA-focused libraries.
This knowledge, the importance of substructures, was applied to design the first RNA-focused library with diverse chemical features, which was synthesized on a peptoid backbone using building blocks that are likely to bind RNA.97 The library used substructures extracted from molecules known to bind RNA (linezolid,52 xanthinol,99 and pentamidine,74 for example) or building blocks hypothesized to facilitate hydrogen bonding or stacking interactions with RNA, such as benzene or benzenesulfonamide (Fig. 4).97 This library comprising 109 compounds was synthesized via a solid-phase approach where each molecule contained an azide handle that was used for site-specific conjugation to alkyne-functionalized agarose microarrays. The microarrays were then screened for binding to the Candida albicans group I intron, a catalytically active RNA molecule (ribozyme) that folds into a tertiary structure with well-defined binding pockets.100 The hit molecules from the binding screen were then tested for their ability to inhibit group I intron self-splicing, resulting in IC50 values ranging from 150 to >5000 μM. The data obtained from this first round of screening were used to identify a set of features that drive binding, incorporating both building block identity and position within the peptoid. These features aided design of a second generation of compounds with overall improved IC50 values for in vitro inhibition of splicing (31–110 μM). This ligand-based approach showed how embedded features in moieties that confer binding to RNA can be harnessed to design an RNA-focused library with improved features and emphasized the importance of the availability of such information. The combinatorial approach adopted in this study showed the importance of feature extraction in disposing the ligands for RNA recognition.
Benzimidazoles are another privileged substructure/scaffold for RNA targets. An early NMR-based approach by Abbott Laboratories identified that simple 2-aminobenzimidazoles bind the bacterial A-site rRNA with Kd values of ∼200 μM.101 Shortly thereafter an “SAR by MS” approach by Ibis Therapeutics identified a 2-aminobenzimidazole with ∼100 μM affinity to the Hepatitis C Virus (HCV) IRES RNA. Medicinal chemistry approaches were able to optimize this compound to afford a restricted, cyclic benzimidazole derivative with <1 μM affinity to the target and cellular activity at single digit micromolar concentrations in an HCV replicon assay.102
Because of the success of 2-aminobenzimidazoles in targeting several RNA structures, a library of 79 compounds containing the substructure (Fig. 4) was synthesized and evaluated in a microarray-based selection named 2-dimensional combinatorial screening (2DCS).103 In 2DCS, a microarray of small molecules is incubated with a library of radiolabeled RNAs – in this case, a library of 4096 RNA hairpin structures containing a randomized region in a 3 × 3 nucleotide internal loop pattern. After washing, the bound RNAs are manually excised from the microarray surface, amplified by RT-PCR, then identified by sequencing. Screening of this library revealed that functionalization of the 2-aminobenzimidazole scaffold determined both whether the compound could bind RNA (only 19 of 79 compounds bound the 4096-member RNA library) and which RNA structures the small molecule preferred.103
In a similar approach, a panel of 43 RNA-focused compounds harboring privileged scaffolds known to bind RNA such as benzimidazoles,104 pentamidine,105 and 4′,6-diamidino-2-phenylindole (DAPI)106 was synthesized and screened for binding to various RNA libraries (Fig. 4).98 Three different RNA motif libraries (containing randomized 6-nucleotide hairpin loops, 3 × 2 nucleotide internal loops, and 4 × 4 nucleotide internal loops) were screened for binding to these RNA-focused small molecules using a fluorescent dye displacement assay. Of the 43 compounds, eight bound one or more of the RNA libraries, affording a hit rate of ∼19%. In contrast, the hit rate for the library of pharmacologically active compounds (LOPAC; designed for protein targets) was only ∼1%.98 Scaffold analysis also showed the importance of indole, 2-phenyl indole, 2-phenyl benzimidazole and pyridinium groups providing invaluable information for building RNA-focused libraries.98 Like the previous study described above, the preferred RNA motifs for hit compounds were determined by 2DCS.
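The hit-rate comparison above is simple arithmetic, sketched here for concreteness. The function name `hit_rate` is illustrative only; the LOPAC value is entered directly as the reported ∼1% because the underlying counts are not given in the text.

```python
def hit_rate(n_hits: int, n_screened: int) -> float:
    """Return the hit rate as a percentage of the compounds screened."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    return 100.0 * n_hits / n_screened


# 8 of the 43 RNA-focused compounds bound at least one RNA library.
focused = hit_rate(8, 43)

# Reported LOPAC hit rate (~1%); hit and compound counts are not given in the text.
lopac = 1.0

# Fold enrichment of the focused panel over the protein-directed library.
fold = focused / lopac
print(f"focused panel: {focused:.1f}%  enrichment over LOPAC: ~{fold:.0f}-fold")
```

The same function applied to the 2-aminobenzimidazole library (19 binders of 79 compounds) gives a hit rate of roughly 24%.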
These early attempts to generate RNA-focused libraries demonstrate the value of prior knowledge of RNA-binding chemotypes in their design. However, due to their strict reliance on incorporation of specific substructures, often in a specified arrangement, these early libraries contained minimal diversity and only a handful of novel chemotypes capable of binding RNA were identified. As more information on the types of small molecules that bind RNA is discovered, new and improved RNA-focused libraries will emerge.
In the absence of structural data, as is the case for nearly all RNA targets, ligand-based approaches can be used to create focused libraries. In this approach, the physicochemical properties of known active molecules can be used for a molecular similarity115–117 search campaign against screening libraries (commercial/non-commercial). The ligand-based approaches can leverage either one-dimensional or two-dimensional molecular descriptors,118 which encompass the chemical nature of the small molecules, or three-dimensional descriptors such as pharmacophore properties, shape, or volume.119–121
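A two-dimensional similarity search of this kind can be sketched in a few lines. The example below is a minimal illustration that assumes each molecule has already been reduced to a set of "on" fingerprint bits; the Tanimoto coefficient, the 0.6 threshold, the compound names, and the bit values are all assumptions chosen for illustration and are not taken from the studies discussed here.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_search(query: frozenset, library: dict, threshold: float = 0.6) -> list:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: integers stand for substructure bits.
query_fp = frozenset({1, 4, 7, 9, 12})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 13}),   # shares 4 of 6 distinct bits
    "cmpd_B": frozenset({2, 3, 5}),          # shares no bits with the query
    "cmpd_C": frozenset({1, 4, 7, 9, 12}),   # identical to the query
}

for name, score in similarity_search(query_fp, library):
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit, and three-dimensional descriptors such as shape or pharmacophore overlap would require a different scoring function.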
The accumulated knowledge from other studies122 was carried over to design the first RNA target-specific libraries generated by computational chemical similarity searching. Expanded RNA repeats are causative of microsatellite disorders,123 and expanded r(CUG) repeats [r(CUG)exp] in particular are the toxic agent in myotonic dystrophy type 1 (DM1).124 Small molecule targeting of this disease-relevant RNA has therapeutic potential.125 The 3D shapes of previously identified small molecules targeting this particular RNA105,126–128 were used for screening the National Cancer Institute's (NCI; 250 000 compounds) and eMolecules databases (8 000 000).129 This virtual screening resulted in identifying a bis-benzimidazole scaffold when a Hoechst derivative was used as the query molecule, as it binds the RNA and displaces MBNL1 in vitro.129 The most potent derivative identified, H1, rescued DM1-associated splicing defects and foci formation in a DM1 cell culture model and rescued splicing defects in a DM1 mouse model, albeit with modest potency. This study was the first attempt to apply ligand-based virtual screening to the creation of a target-specific RNA-focused library, resulting in the identification of lead molecules with improved potency compared to the query molecules.
The identification of bis-benzimidazole as a privileged scaffold for RNA129 initiated the creation of the second target-specific library using the concept of chemical similarity searching.130 A library of structurally diverse yet chemically similar compounds (n = 320) was generated by performing a chemical similarity search of The Scripps Research Institute's (TSRI) drug discovery collection, using H1 (above) as the query molecule. The library was screened for binding to r(CUG) repeats using a time-resolved fluorescence resonance energy transfer (TR-FRET) assay,131 yielding a hit rate of ∼9%. A subsequent substructure analysis showed the abundance of pyridyl, benzimidazole, or imidazole ring systems. Downstream analysis of the bioactive compounds (compounds that improved DM1-associated pre-mRNA alternative splicing defects) and extraction of the physicochemical properties showed the differences between these compounds and the starting library. Bioactive compounds had a larger topological polar surface area (TPSA) compared to the starting library (101 ± 33 Å² vs. 75 ± 25 Å²), as well as more hydrogen bond donors (3 ± 1 vs. 2 ± 1) and acceptors (4 ± 1 vs. 3 ± 1). The majority of these compounds were benzimidazoles with a para-substituted phenyl ring at the 2-position.
With the advent of combinatorial chemistry, the size of the screening libraries started to grow, and concepts such as drug-likeness or lead-like properties were later applied to improve the quality of such screening libraries.142 Aided by the development of more accurate and sophisticated cheminformatic tools, higher quality libraries were created by incorporating physicochemical property information such as TPSA and the octanol/water partition coefficient (logP).143 TPSA is a property that affects the ligand's ability to interact with the hydrophilic regions of RNA, for example its backbone or phosphate groups. RNA is often highly hydrated and polar, so ligands with a high enough TPSA may exhibit better water solubility, making them more bioavailable and effective in aqueous environments like the cytoplasm or nuclei. However, too much polar surface area could limit their ability to cross biological membranes, thus it is essential to balance TPSA with logP. While proteins often benefit from high hydrophobicity for membrane binding or intracellular trafficking, RNA-targeting molecules often require a balance of lipophilicity. RNA's negatively charged backbone makes it more hydrophilic, therefore RNA-binding ligands with moderate logP values are more likely to be both water-soluble and able to interact effectively with the RNA target (Table 1). Also, while proteins are highly structured, dynamic molecules with complex three-dimensional shapes, RNA is also structurally diverse but exhibits secondary and tertiary structures that are more flexible and can undergo dynamic conformational changes. This flexibility significantly impacts the chemical motifs used for binding.
Physicochemical property | Description | Effect on RNA binding |
---|---|---|
Topological polar surface area (TPSA) | The surface area of polar atoms | Higher TPSA typically suggests improved aqueous solubility and interaction with the polar RNA backbone. A balance is needed, as a large TPSA may reduce membrane permeability. |
Octanol–water partition coefficient (logP) | A measure of lipophilicity | A moderate logP is desirable for RNA-targeting ligands. Highly lipophilic compounds may struggle to interact with the negatively charged, hydrophilic RNA backbone, while very hydrophilic compounds may not penetrate cells effectively. |
Molecular weight | The sum of the atomic weights of all the atoms in the molecule | Larger molecules might have better binding potential due to multiple interaction sites, but they might also face difficulty in membrane penetration. |
Hydrogen bond donors (HBD) and acceptors (HBA) | Functional groups capable of forming hydrogen bonds | Hydrogen bond donors and acceptors are crucial for achieving selectivity and stability in RNA–ligand complexes, and they are essential for optimizing the efficacy of RNA-targeting small molecules. |
Conformational flexibility | The ability of a ligand to adopt different conformations | RNA-binding ligands that are flexible may be able to adapt their structure to fit into different RNA target sites. However, flexibility needs to be balanced, as too much flexibility can reduce binding affinity. |
Planarity | Aromatic rings present in the ligand | Aromatic rings can participate in π–π stacking interactions with RNA bases, a key interaction for small molecules targeting RNA. |
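Balancing properties such as TPSA and logP amounts to keeping compounds whose values fall inside chosen windows. The sketch below illustrates that idea only; the `Compound` record, the `property_filter` helper, the compound names, and every numerical window are hypothetical placeholders, not recommended cut-offs from the literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    tpsa: float  # topological polar surface area, in Å^2
    logp: float  # octanol-water partition coefficient
    mw: float    # molecular weight, in Da


def property_filter(compounds: list, windows: dict) -> list:
    """Keep compounds whose listed properties all fall within [low, high]."""
    kept = []
    for compound in compounds:
        if all(low <= getattr(compound, prop) <= high
               for prop, (low, high) in windows.items()):
            kept.append(compound)
    return kept


# Illustrative windows only (assumed values, not validated thresholds).
WINDOWS = {"tpsa": (60.0, 140.0), "logp": (1.0, 4.0), "mw": (200.0, 500.0)}

candidates = [
    Compound("balanced", tpsa=100.0, logp=2.5, mw=350.0),
    Compound("too_lipophilic", tpsa=40.0, logp=5.5, mw=420.0),
    Compound("too_polar", tpsa=180.0, logp=-1.0, mw=300.0),
]

print([c.name for c in property_filter(candidates, WINDOWS)])
```

Keeping the windows in a dictionary makes it straightforward to tune them per target class or to add further properties from Table 1, such as hydrogen bond donor and acceptor counts.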
Despite the improvements in the quality and size of screening libraries, the currently available libraries cover only a very small part of drug-like chemical space, which comprises an estimated 10³⁰ compounds.144 Because no library can cover this immense druggable space, design of specialized libraries that incorporate both chemical diversity and drug-like physicochemical properties can improve the success of screening campaigns without the need for exceedingly large library sizes.
While there have been many HTS campaigns against RNA targets over the past ∼30 years, they primarily focused on single targets and drew few conclusions about general principles of RNA-targeting molecules. In recent years, there has been a more concerted effort to develop principles of RNA-targeting small molecules by screening diverse libraries against multiple RNA targets.
To address this need, Haniff et al. created an RNA-focused library by exploiting the chemical features of RNA-binding small molecules from the literature, compiled in a repository named Inforna.145 The physicochemical properties of these RNA binders were compared with those of commercially available compounds, affording an RNA-focused library with 3271 compounds. At least 20% of library members were chemically dissimilar from known RNA-binding small molecules. The resulting library was enriched with nitrogen-containing heterocyclic molecules such as phenyl-substituted thiazoles, benzimidazoles, indoles, and quinazolines. Despite the enrichment in these RNA-binding chemotypes, the physicochemical properties of the library more closely resembled those of drugs in DrugBank146 than compounds in the Inforna database. The library was screened against four different RNA targets composed of A–U or G–C base pairs in different arrangements using a fluorescent dye displacement assay. Six structurally distinct classes of small molecules were identified as base pair binders, ranging in affinity from high nanomolar to low micromolar. This study demonstrated that a diverse, drug-like library could successfully identify RNA-binding small molecules.
In a similar but expanded approach, another ∼2000 compound RNA-focused library was created by comparing features of the Inforna library to AstraZeneca's corporate collection (Fig. 5(A)).147 This library was screened against three RNA libraries comprising a total of 21504 unique RNA structures using 2DCS. In all, 27 compounds (1.4% hit rate) bound RNA and contained five key scaffolds: phenyl-bis-benzimidazoles, phenyl-benzimidazoles, 2-aminoquinazolines, 4,6-diaminopyrimidines, and 2-guanidino-3-methylthiazoles. By comparing the physicochemical properties of binders to non-binders, computational analysis revealed that the RNA binders were more lipophilic, had fewer rotatable bonds, more hydrogen bond donors, greater polar surface area, and fewer sp3 carbons. Though structurally distinct from them, the hit molecules had similar physicochemical properties as compounds in two other repositories of known small molecules that bind RNA.148,149
Fig. 5 Examples of the applicability of HTS to identify RNA-binding chemotypes against diverse RNA targets. (A) Haniff et al.145 utilized molecular descriptors of Inforna RNA binders to generate a ∼2000 compound library which was screened against >20
In an alternative approach, an RNA-focused library was created by first screening two diverse, protein-targeting libraries (∼55000 compounds) against 42 different disease-relevant RNA targets with the affinity mass spectrometry method dubbed Automated Ligand Identification System (ALIS).151,152 ALIS employs sequential size exclusion and reverse-phase chromatography followed by mass spectrometry to identify RNA binders. Perhaps unsurprisingly, this initial screen had low hit rates for the RNA targets, 0.04% and 0.01% for the two libraries, much lower than the 1.5% and 0.05% hit rates observed for proteins screened against the same libraries (Fig. 5(B)).
By generating a machine learning (ML) model based upon calculated molecular fingerprints, chemical features that discriminated between RNA binders and non-binders were identified. The model was then used to select molecules from Merck's compound collection to create an RNA-focused library with 3700 compounds.150 This new RNA-focused library was re-screened against 32 of the 42 RNA targets and showed a markedly increased hit rate of 0.32%. Additionally, this increased hit rate was seen across most RNA targets (24 out of the 32 tested RNAs had a higher hit rate in the RNA-focused screen than the initial screen), suggesting that the new library contains features generally associated with RNA binding. Importantly, the screening hits identified were largely specific, with 66% of identified hits only binding one of the 42 RNA targets. Further, of the compounds that specifically recognized one RNA structure, nearly 58% did not bind proteins from Merck's prior in-house ALIS screens, suggesting these compounds preferentially bind specific RNA targets. PCA (principal component analysis) of physicochemical properties of RNA binders vs. protein binders revealed that although the chemical space of RNA-binding and protein-binding small molecules overlap largely within the drug-like space, the chemical motifs that contribute to RNA and protein binding are distinct. Examples of these discriminatory characteristics include aromatic amine-containing heterocycles and amidine-like motifs.
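The idea of finding features that discriminate binders from non-binders can be illustrated with a much simpler stand-in than the fingerprint-based ML model used in the study: a frequency ratio computed per feature. The sketch below is that simplified substitute, not the published method; the function name, the pseudocount smoothing, and the toy compounds are all assumptions, with only the motif names "amidine" and "aromatic_amine" echoing the chemotypes mentioned above.

```python
from collections import Counter


def feature_enrichment(binders: list, non_binders: list, pseudocount: float = 1.0) -> dict:
    """Ratio of smoothed feature frequencies in binders vs. non-binders.

    Each compound is a set of feature labels. Values above 1 indicate a
    feature seen more often among binders.
    """
    binder_counts = Counter(f for feats in binders for f in set(feats))
    other_counts = Counter(f for feats in non_binders for f in set(feats))
    scores = {}
    for feature in set(binder_counts) | set(other_counts):
        # Add-one style smoothing avoids division by zero for unseen features.
        freq_b = (binder_counts[feature] + pseudocount) / (len(binders) + 2 * pseudocount)
        freq_n = (other_counts[feature] + pseudocount) / (len(non_binders) + 2 * pseudocount)
        scores[feature] = freq_b / freq_n
    return scores


# Toy data for illustration only; not taken from any screen.
binders = [{"amidine", "aromatic_amine"}, {"amidine"}, {"aromatic_amine", "ester"}]
non_binders = [{"ester"}, {"ester", "alkyl"}, {"alkyl"}, {"amidine"}]

scores = feature_enrichment(binders, non_binders)
for feature, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: {score:.2f}")
```

A trained classifier additionally captures combinations of features and can be validated on held-out compounds, which a per-feature ratio cannot do.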
In another campaign against diverse RNA targets, a small molecule microarray library containing nearly 25000 compounds was screened for binding 27 RNAs (including G-quadruplexes, hairpins, pseudoknots, three-way junctions, and triple helices) and nine DNA targets, with several individual screens published previously.41,153 The RNA targets had hit rates between 0.48% (hairpins) and 0.85% (pseudoknots); when only considering selective small molecules, the hit rate ranged from 0.16% to 0.33%.154 Perhaps it is not surprising that pseudoknots, the most structurally complex of the targets, had the highest hit rate. The 2188 small molecules that bound at least one of these nucleic acid targets, of which 2003 bound one of the RNAs, were compiled in the Repository Of BInders to Nucleic acids (ROBIN) library.154 As will be discussed in more detail later, several machine learning models were developed and assessed to identify discriminatory features between RNA binders and non-binders, revealing that nitrogen content and aromaticity were important features.
Overall, primary screening of general-purpose screening libraries against RNA targets or target classes can be adopted as a strategy to identify the properties that discriminate RNA binders from non-binders or protein binders. These features can then be applied to create RNA-focused libraries that are enriched in properties that facilitate binding to RNA. In the following sections, alternative methods to identify RNA-binding small molecules, such as fragment screening, DNA-encoded libraries (DELs), and virtual screening, are discussed.
While prior studies had utilized fragment screening methods against RNA, the first attempt to generate a library of RNA-focused fragments was in 2009. Bodoor et al. chose fragments by first collecting known RNA binders from literature, prioritizing compounds with Kd values less than 50 μM.158 In total, 120 molecules were identified, and analysis of the physicochemical properties showed the similarities between these molecules and kinase and protease binders. The overlap of the chemical space between known RNA binders and proteins such as kinases provides guidelines to discriminate RNA from protein binders.159 These 120 molecules were fragmented in silico to generate 250 fragments. These fragments were then clustered using their molecular fingerprints and chemical descriptors, and 102 commercially available fragments were chosen to represent each cluster. This library was then screened against the ribosomal decoding site (“A-Site”) RNA using different NMR experiments, which resulted in five hits, including two hit fragments chemically dissimilar from known A-site RNA binders. This study also showed that a detailed analysis of already known RNA binders is a logical starting point to generate a library of RNA-focused fragments.
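The clustering step described above, in which fragments are grouped by fingerprint similarity and one representative is chosen per cluster, can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of "on" bit indices, uses Tanimoto similarity, and applies a simple greedy leader clustering; the original study's exact clustering method, descriptors, and threshold are not stated here, so the algorithm, the 0.5 threshold, and the toy fingerprints are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(
    fps: Dict[str, FrozenSet[int]], threshold: float = 0.5
) -> Dict[str, List[str]]:
    """Greedy leader clustering.

    Each fragment joins the first existing leader it resembles at or above
    `threshold`; otherwise it founds a new cluster and becomes its leader.
    """
    clusters: Dict[str, List[str]] = {}
    for name, fp in fps.items():
        for leader in clusters:
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical toy fingerprints (bit indices), not real fragments.
toy_fps = {
    "frag_A": frozenset({1, 2, 3, 4}),
    "frag_B": frozenset({1, 2, 3, 5}),
    "frag_C": frozenset({10, 11, 12}),
    "frag_D": frozenset({10, 11, 12, 13}),
    "frag_E": frozenset({20, 21}),
}

clusters = leader_cluster(toy_fps, threshold=0.5)
representatives = list(clusters)  # the leader of each cluster
print(representatives)
```

In practice the representative of each cluster would be chosen with additional criteria, such as commercial availability, as in the study described above.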
The exploitation of the chemical features of known drugs to find similar molecules, named analogues, has been a successful approach.160–162 The chemical features of known RNA binders collected in Inforna were used to identify enriched scaffolds that afford RNA binding.163 Scaffold extraction showed high enrichment of triazole, thiazole, furan, and quinoline, along with other scaffolds. These scaffolds were then used to virtually screen a library of 11788 small molecules, building an RNA-focused fragment library with 2500 compounds. The Rule of Three was generally followed in building the library, wherein fragments had a molecular weight (MW) below 300 Da and three or fewer hydrogen bond donors and acceptors.164 The library was also drug-like, with an average quantitative estimate of drug-likeness (QED)165 score of 0.75 ± 0.10. The RNA binding propensity of this library was assessed using 2DCS, where one fragment was identified to prefer an A-bulge found in the Dicer processing site of pre-miR-372 (Kd = 300 ± 140 nM). Although this approach did not identify new scaffolds for RNA binding, it represents a successful workflow for the design of RNA-focused fragment libraries.
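As an illustration of the Rule of Three criteria as stated above (MW below 300 Da, three or fewer hydrogen bond donors, three or fewer hydrogen bond acceptors), the following minimal sketch filters a list of fragments. It assumes the descriptors have already been computed by a cheminformatics toolkit; the `Fragment` record, its field names, and the example values are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record holding precomputed descriptors for one fragment."""
    name: str
    mol_weight: float  # Da
    h_donors: int
    h_acceptors: int


def passes_rule_of_three(
    frag: Fragment, max_mw: float = 300.0, max_hbd: int = 3, max_hba: int = 3
) -> bool:
    """True if MW is below max_mw and donor/acceptor counts are within limits."""
    return (
        frag.mol_weight < max_mw
        and frag.h_donors <= max_hbd
        and frag.h_acceptors <= max_hba
    )


# Toy values for illustration only.
candidates: List[Fragment] = [
    Fragment("frag_1", 187.2, 1, 3),
    Fragment("frag_2", 312.4, 2, 3),  # too heavy
    Fragment("frag_3", 250.3, 4, 2),  # too many donors
    Fragment("frag_4", 299.9, 3, 3),
]

library = [f.name for f in candidates if passes_rule_of_three(f)]
print(library)
```

Because the text notes that the rule was "generally" followed, a real workflow might flag borderline fragments for manual review rather than discard them.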
Fragment-based hit discovery or lead optimization has several advantages regarding the ease of synthesis, characterization, and high success rate.162–167 This study showed that despite the differences between targeting proteins and RNA, many established approaches can be adapted for RNA targets.
Two different approaches have been developed recently to address the concerns of the RNA target of interest interacting with DNA tags in DEL screens. In the first example, Benhamou et al.169 utilized a one-bead one-compound (OBOC) solid-phase DEL library, rather than the more traditional solution-phase DEL. This enabled sub-stoichiometric loading of DNA onto the bead, where the compound was in ∼250-fold excess relative to the DNA barcode, reducing the potential effect of RNA–DNA hybridization. The authors integrated 2DCS with this solid-phase DEL in a massively parallel screening pipeline to probe affinity landscapes between RNA folds and small molecules. Fluorescence-activated cell sorting (FACS) was used to identify the DEL beads which preferentially bind to a library of RNAs containing 3 × 3 nucleotide internal loops relative to a fully base-paired control RNA. Hit molecules from the DEL screen were then resynthesized and screened using 2DCS to identify which of the 4096 3 × 3 internal loops preferentially bind the small molecules. Though the library was synthesized using a racemic mixture of proline derivatives, resynthesis of hits as pure diastereomers revealed that some, but not all, hits preferred different RNA structures dependent upon stereochemical identity. These results suggested that future design of RNA-targeting DELs may benefit from the incorporation of compounds with defined stereochemistry to improve or alter specificity.
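The figure of 4096 internal loops quoted above is consistent with randomizing all six loop positions of a 3 × 3 internal loop (three nucleotides on each strand) over the four RNA nucleotides, since 4^6 = 4096. The short sketch below enumerates such a library under that assumption; the function name and the representation of a loop as a pair of strings are illustrative only.

```python
from itertools import product
from typing import Iterator, Tuple

NUCLEOTIDES = "ACGU"


def enumerate_internal_loops(n5: int = 3, n3: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield every (5' strand, 3' strand) combination for an n5 x n3 loop."""
    for combo in product(NUCLEOTIDES, repeat=n5 + n3):
        yield "".join(combo[:n5]), "".join(combo[n5:])


loops = list(enumerate_internal_loops())
print(len(loops))  # 4 ** 6 members
```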
The Inforna platform was then utilized to identify disease-relevant pre-miRNAs that overlapped with the DEL selection, affording a nanomolar binder for the primary transcript of oncogenic miR-27a (pri-miR-27a). This study demonstrated the power of DEL screening in combination with selection-based methods to target RNA.
Shortly thereafter, Chen et al. utilized an alternative, solution-phase DEL approach that incorporated patches to reduce interactions between the RNA targets and the DNA tags.170 The authors noted a significant number of false positives when screening their ∼10 billion member DEL library against HIV TAR RNA using standard techniques. In their optimized method, non-specific RNA–DNA hybridization was essentially ablated by combining pre-incubation with RNA “patches” containing the same sequences as the RNA target and competitive elution using known ligands that bind HIV TAR. The authors then used their approach to identify two new binders of the FMN riboswitch with affinity of <20 nM towards the target.170
While these initial RNA-targeting DELs demonstrated the utility of the approach towards RNAs, the libraries were not enriched in known RNA-binding substructures. Such a library was developed by enriching the DEL building blocks with RNA-focused motifs, including benzimidazoles, azaindoles, pyrazoles, and others.171 The 12672-member DEL library was then screened for binding specifically to r(CUG) repeats by co-incubation with a fully base-paired RNA. As expected, hit compounds shared commonalities with previously known RNA binders, for example the number of hydrogen bond acceptors and the number of tertiary amines, but also showed discriminatory properties such as a lack of positive charge. Subsequent studies for target engagement and bioactivity showed that one DEL compound improved aberrant alternative splicing associated with DM1. This study demonstrated that utilization of prior knowledge in the design of an RNA-focused DEL can identify bioactive RNA-targeting ligands from relatively small, focused libraries.
Overall, the examples discussed above delineate a general strategy in building an RNA-focused library: (i) screening of general-purpose libraries against a variety of RNA structures (or a specific class of RNAs) and then leveraging the obtained knowledge of RNA binders to create an RNA-focused library; or (ii) exploiting the chemical features of known RNA binders to create RNA-biased libraries to narrow down the very specific features that can discriminate an RNA binder from protein binders.
To overcome the lack of available 3D structures for RNA targets, methods for molecular docking for hit identification through virtual screening have been developed.176 For example, Shi et al. identified a potent inhibitor of miR-21 through virtual screening of 1990 compounds from the National Cancer Institute's (NCI) diversity dataset against a computationally predicted structure of miR-21's Dicer processing site (using MC-Fold/MC-Sym177).178 In another study, selective molecules targeting RNA tetraloops (arginine–RNA aptamer complex, a biotin–RNA pseudoknot complex, and a theophylline–RNA complex) were identified.179 After establishing that docking can recapitulate the experimentally determined pose of an aminoacridine derivative (AD1) bound to a hairpin, virtual screening identified AD2 with a binding specificity for tetraloops (Kd = ∼1 μM), as compared to double stranded RNA (Kd = ∼25 μM).
The same group applied this approach to identify small molecules targeting the GGAG tetraloop, a highly conserved stem-loop (SL-3) in the HIV-1 genome.180 Molecular docking of 1367 compounds identified two compounds (compounds 5 and 9) as selective binders of SL-3 RNA. The compounds showed noticeable specificity for tetraloops over double-stranded RNAs (∼3.5- and ∼6-fold, respectively) and single-stranded RNA (∼50- and ∼25-fold, respectively).180 It should be noted that docking was combined with a short (5 ns) molecular dynamics simulation to account for the flexibility of RNA, and the final hits were identified among the compounds that formed a stable complex with SL-3. Such an approach has also been successfully implemented for other viral targets including HIV-1 TAR,181 a pseudoknot present in the SARS-CoV genome,43 the HCV IRES subdomain IIa,182 and a cis-acting regulatory stem-loop RNA of hepatitis B virus (HBV).183 Nine of the molecules identified in this study bound RNA with micromolar affinity, most of which had not been shown previously to bind RNA, demonstrating that the docking methods can identify new RNA-binding ligands.
In a particularly interesting example, ∼100000 compounds were screened for binding HIV-1 TAR RNA using a Tat peptide displacement assay, affording seven hit molecules and, importantly, over 100000 non-hit molecules or “decoys”, both of which were employed in a subsequent virtual screening campaign.184 The training set for the virtual screen comprised these seven hits plus an additional 78 experimentally validated small molecules and ∼100000 decoys. Prior to virtual screening, this library of 100085 compounds was filtered to provide two distinct libraries: (i) in one library, the molecules with outlier physicochemical properties were removed; and (ii) in the other, the DUD-E protocol was used to select a subset of both the hits and property-matched non-hits. Virtual screening of these libraries against an ensemble of 20 molecular dynamics-generated HIV-1 TAR RNA structures demonstrated enrichment of the true, experimentally validated hits among the virtual screening hits. Importantly, screening against the full ensemble of 20 RNA structures was key, as screening against fewer structures resulted in the identification of fewer hits. This study showed that including experimental data to refine virtual screening efforts can significantly improve the identification of RNA-binding chemical matter.
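A minimal sketch of how ensemble-based virtual screening can be evaluated is given below. It assumes that each compound has one docking score per RNA conformer (lower is better), that the best score across the ensemble is used for ranking, and that performance is summarized as an enrichment factor for known binders in the top-ranked fraction. The scoring convention, the function names, and all numbers are hypothetical toy values chosen for illustration; they are not data from the study described above.

```python
from typing import Dict, List, Set


def rank_by_best_score(scores: Dict[str, List[float]], conformers: List[int]) -> List[str]:
    """Rank compounds by their best (lowest) score over the chosen conformers."""
    best = {cpd: min(vals[i] for i in conformers) for cpd, vals in scores.items()}
    return sorted(best, key=best.get)


def enrichment_factor(ranked: List[str], actives: Set[str], top_fraction: float) -> float:
    """Fraction of actives in the top slice divided by the fraction overall."""
    n_top = max(1, round(len(ranked) * top_fraction))
    hits_in_top = sum(1 for cpd in ranked[:n_top] if cpd in actives)
    return (hits_in_top / n_top) / (len(actives) / len(ranked))


# Toy docking scores against three hypothetical conformers.
toy_scores = {
    "cpd_01": [-9.1, -6.0, -5.5],
    "cpd_02": [-4.0, -8.7, -5.0],
    "cpd_03": [-5.0, -5.2, -4.9],
    "cpd_04": [-6.1, -6.3, -8.2],
    "cpd_05": [-4.4, -4.1, -4.8],
    "cpd_06": [-5.9, -5.7, -6.0],
    "cpd_07": [-3.9, -4.2, -4.0],
    "cpd_08": [-5.5, -5.1, -5.6],
    "cpd_09": [-4.6, -4.9, -4.7],
    "cpd_10": [-5.0, -5.3, -5.4],
}
toy_actives = {"cpd_01", "cpd_02", "cpd_04"}  # hypothetical validated binders

ef_single = enrichment_factor(rank_by_best_score(toy_scores, [0]), toy_actives, 0.3)
ef_ensemble = enrichment_factor(rank_by_best_score(toy_scores, [0, 1, 2]), toy_actives, 0.3)
print(f"single conformer EF: {ef_single:.2f}, ensemble EF: {ef_ensemble:.2f}")
```

In this toy set each active scores well against a different conformer, so the single-structure ranking misses one of them; this is only meant to show why an ensemble can recover binders that prefer conformations absent from any one structure.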
Overall, these attempts showcased the applicability and usefulness of virtual screening in hit identification targeting different RNAs. Of particular promise are the ensemble-based approaches, which may better account for the inherent flexibility of RNA targets which is otherwise difficult to account for in high-throughput virtual screening campaigns. Although no follow-up analyses were performed to extract the chemical features of virtual screening hits compared to non-hits, the lead small molecules could be used to design target-specific libraries in future studies. Molecular docking, therefore, may provide a means to generate libraries enriched in RNA-binding compounds, although subsequent experimental validation is always required.
One of the challenges in using molecular docking is the inaccuracy of docking poses.192,193 ML-based methods can be trained even on the limited available data that describe RNA–small molecule complexes and can help to separate accurate from inaccurate poses. In one of the first examples, it was shown that the random forest classifier in RNAPosers could separate accurate RNA–small molecule poses from decoys.194 AnnapuRNA is another example that uses supervised ML models, namely K Nearest Neighbors (a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks) and a multi-layer feedforward artificial neural network, achieving high accuracy in prediction of bound poses of small molecules.195 These augmentation approaches can be combined with conventional molecular docking to increase the efficiency of hit identification.
As one of the only attempts to use primitive machine learning models in the evaluation of RNA-focused libraries, Yazdani et al. determined features that differentiate the RNA binders incorporated in the ROBIN library from non-binders by using a class-weighted logistic least absolute shrinkage and selection operator (LASSO) regression model.154 To generate a model that can separate RNA binders from non-binders, 1664 molecular descriptors were calculated using Mordred196 (an open-source software tool designed to calculate molecular descriptors), and these features were used to train the LASSO regression model. The analysis found that features related to nitrogen content and aromaticity favor RNA binding. The performance of the LASSO model was not ideal (AUPRC score of 0.37, where AUPRC stands for Area Under the Precision–Recall Curve, a performance metric often used in evaluating binary classification models, especially for imbalanced datasets where the focus is on the minority class), as expected given its several disadvantages, particularly when dealing with large datasets, high-dimensional data, or outliers. Its computational cost, sensitivity to feature scaling, and difficulty with imbalanced data also make it less suitable for complex or large-scale problems without careful tuning and pre-processing. However, the authors showed that application of more advanced techniques such as feedforward neural networks can significantly improve the model performance (AUPRC score of 0.78).154
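To make the AUPRC metric concrete, the sketch below computes average precision, a common estimator of the area under the precision–recall curve, for a ranked list of predictions. It is a plain-Python illustration that ignores tied scores; the labels and scores are hypothetical toy values and are unrelated to the ROBIN models.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each true positive.

    `labels` are 1 (binder) or 0 (non-binder); higher `scores` mean the
    model is more confident that the compound is a binder.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Toy imbalanced set: 2 binders among 10 compounds.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(toy_labels, toy_scores)
baseline = sum(toy_labels) / len(toy_labels)  # expected AP of a random ranking
print(f"average precision = {ap:.3f}, random baseline = {baseline:.2f}")
```

Because a random ranking is expected to score roughly the fraction of positives in the data set, AUPRC values are best interpreted relative to the class balance of the library being modeled.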
The main power of ML methods is their predictive capability, and they can be trained on almost any class of data, not solely structural data, which is particularly important given the scarcity of available RNA–small molecule complexes. RNA 3D structure information, however, has also been employed in building predictive models to identify new small molecules.197,198 ML-based methods have also been used for the de novo generation of small molecules by exploiting the chemical features that drive specificity toward RNA. Such approaches can significantly narrow the search of chemical space to identify target-specific hit molecules.199 Several ML-based models have been developed around miRNAs to identify novel small molecules.200–202
Overall, ML approaches can make important contributions in the RNA therapeutics field, from predicting 3D structures of RNA molecules to small molecule design and lead optimization. Although none of these methods has been used to create an RNA-focused library with experimental validation, we anticipate such a library will be forthcoming. As each method depends on large amounts of data for training purposes, the quality of predictions will improve as more data becomes available.
Despite the tremendous progress made in the RNA-targeted small molecule field, several factors have hampered its advancement. Assay development for small molecule screening has been mainly focused on proteins and extensively optimized over the years. For example, SPR (surface plasmon resonance), which is now a standard technique for screening small molecules and extracting kinetic data for proteins, still faces several challenges when applied to RNA. The different kinetic behavior of RNA, the significant conformational changes that in some cases follow small molecule binding, and the lower affinity of the interactions pose several challenges, ranging from immobilization and mass transport203 to data analysis.203,204
Thermodynamic data has been essential for lead optimization during the drug discovery process.205 It has been shown that, as for proteins, enthalpically driven binding is more favorable for RNA-targeting small molecules than entropically driven binding.206–210 Moreover, idiosyncrasies have been observed within an RNA-binding chemotype. For example, electrostatic interactions play an important role in the binding of aminoglycosides to RNA, contributing >50% of the total free energy of binding.211 The same analysis for deoxystreptamine dimers binding to RNA hairpin loops revealed that only ∼20% of the total free energy of binding is due to electrostatic interactions, and in contrast to aminoglycosides (enthalpic), binding of dimers is an entropically driven interaction.212 There is a dearth of these types of thermodynamic measurements, which are especially challenging when characterizing RNA–small molecule interactions. These difficulties arise from the weak interactions of initial hits, which necessitate the use of high concentrations of small molecules to detect binding, where aggregation may also occur; likewise, achieving such concentrations may require higher amounts of DMSO, which affects RNA stability/structure.213 Nonetheless, elucidation of these forces can help in the design of RNA-focused libraries and to discriminate RNA-binding small molecules from protein binders.
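The thermodynamic quantities discussed above are related by the standard expressions ΔG° = RT ln(Kd), with Kd expressed relative to a 1 M standard state, and ΔG° = ΔH° − TΔS°. The sketch below converts a dissociation constant into a binding free energy and splits it into enthalpic and entropic terms. The 300 nM value echoes the fragment Kd quoted earlier in this review, whereas the temperature (298.15 K) and the ΔH° value are hypothetical placeholders of the kind that a calorimetry experiment would supply.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from Kd in mol/L (1 M standard state)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def entropic_term(delta_g: float, delta_h: float) -> float:
    """Return -T*dS (kcal/mol) from dG = dH - T*dS."""
    return delta_g - delta_h


def dominant_driver(delta_h: float, minus_t_delta_s: float) -> str:
    """Label binding by whichever term contributes more favorably to dG."""
    return "enthalpy-driven" if delta_h < minus_t_delta_s else "entropy-driven"


dg = delta_g_from_kd(300e-9)   # Kd = 300 nM
dh = -6.0                      # hypothetical measured enthalpy, kcal/mol
minus_tds = entropic_term(dg, dh)
print(f"dG = {dg:.2f}, dH = {dh:.2f}, -TdS = {minus_tds:.2f} kcal/mol")
print(dominant_driver(dh, minus_tds))
```

With these illustrative inputs the enthalpic term dominates; a measured ΔH° close to zero for the same Kd would instead indicate entropically driven binding.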
Despite these challenges, significant progress has been made to identify small molecules that target different classes of RNA, as manifested in publicly available databases such as SM2miR214 (a database of the experimentally validated small molecules that affect microRNA expression), R-SIM215 (a database of binding affinities for RNA–small molecule interactions), R-BIND216 (a database of bioactive RNA-targeting small molecules and associated RNA secondary structures), Inforna217 (a database of experimentally determined RNA–small molecule interactions that enables sequence-based design), NoncoRNA218 (a database of experimentally supported non-coding RNAs and drug targets in cancer), ROBIN,154 and RNAmigos219 (a combination of machine learning and molecular docking to identify RNA-targeting small molecules).
The future of RNA-targeted small molecule libraries is rapidly evolving, fueled by advances in drug discovery, high-throughput screening, AI-driven design, and structural biology. Current developments in high-resolution RNA 3D structure determination are increasingly promising and may help overcome hurdles in structure-based drug design toward RNA targets.220
As researchers continue to recognize the therapeutic potential of RNA molecules in treating diseases, including cancer, viral infections, and neurodegenerative disorders, RNA-targeted small molecule libraries are becoming crucial resources for identifying effective modulators. Developing libraries with RNA-biased scaffolds and functional groups that specifically favor interactions with RNA structures could improve hit rates in RNA-targeted screening, and AI can optimize RNA-targeted libraries by predicting binding affinity, selectivity, and even RNA-binding motifs, enabling the design of novel small molecules with high RNA specificity. For example, by using generative models like Variational Autoencoders (VAEs)221 and Generative Adversarial Networks (GANs),222 new small molecules tailored to bind specific RNA motifs or structures can be designed. This approach allows the exploration of novel chemistries outside conventional chemical spaces.
When considering RNA structure and dynamics for therapeutic applications, the conformation adopted by an RNA in vivo may be very different from the folds adopted in vitro. RNA molecules in the cell are typically involved in dynamic processes such as transcription, splicing, translation, and interaction with RNA-binding proteins (RBPs). These interactions cause the RNA to adopt a variety of conformational states throughout its life cycle. In vitro experiments, however, often employ isolated RNAs or fragments of a transcript that are possibly in a more static, simplified state. The conformations and dynamics in vivo may be far more complex, with RNA undergoing multistate folding, binding events, or conformational transitions triggered by specific cellular factors. Therefore, for therapeutic applications, it is crucial that small molecules designed to interact with RNA can bind and modulate RNA conformations that are relevant to its function in the natural cellular environment, not just in its static in vitro state.223,224
The future of RNA-targeted small molecule libraries will be defined by greater specificity, structural diversity, and an enhanced ability to interact with complex RNA structures and RNA–protein complexes. Leveraging AI-driven molecular design, 3D structural data, and high-throughput RNA-specific screening technologies will be key to accelerating discovery. These advancements could open doors to novel RNA-targeted therapeutics for a wide array of diseases, including cancers, viral infections, and rare genetic disorders.
Footnote
† These authors contributed equally.
This journal is © The Royal Society of Chemistry 2025