Identification and characterisation of G-quadruplex DNA-forming sequences in the Pseudomonas aeruginosa genome

A number of Gram-negative bacteria such as Pseudomonas aeruginosa are becoming resistant to front-line antibiotics. Consequently, there is a pressing need to find alternative bio-molecular targets for the development of new drugs. Since non-canonical DNA structures such as guanine-quadruplexes (G4s) have been implicated in regulating transcription, we were interested in determining whether there are putative quadruplex-forming sequences (PQS) in the genome of Pseudomonas aeruginosa. Using bioinformatic tools, we screened 36 genes potentially relevant to drug resistance for the presence of PQS and 10 of these were selected for biophysical characterisation (i.e. circular dichroism and thermal difference UV/Vis spectroscopy). These studies showed that three of these G-rich sequences (linked to murE, ftsB and mexC genes) form stable guanine-quadruplexes which were studied by NMR spectroscopy; detailed analysis of one of the sequences (mexC) confirmed that it adopts a two-quartet antiparallel quadruplex structure in the presence of K+ ions. We also show by FRET melting assays that small molecules can stabilise these three new G4 DNA structures under physiological conditions. These initial results could be of future interest in the development of new antibiotics with alternative bio-molecular targets which in turn would help tackle antimicrobial resistance.


Introduction
Recent research has shown that G-quadruplex DNA (G4 DNA) structures form in bacteria and could potentially be new targets for antibiotics. [1][2][3][4][5][6][7] These tetra-stranded structures can readily form under physiological conditions from guanine-rich sequences due to guanines' ability to display Hoogsteen hydrogen bonding interactions. Over the past few years, mounting experimental evidence has gathered indicating that G4 DNA structures form in vivo [8][9][10][11][12][13] and play essential biological roles including regulation of gene expression, replication and telomere maintenance. [14][15][16] Due to their proposed biological relevance, G-quadruplexes are attractive targets for the development of drugs. 9,17,18 Therefore, there has been increasing interest in the development of small molecules able to bind to G4s with high affinity and selectivity. Most studies on G4s (and molecules developed to target them) thus far, have focused on eukaryotic cells and have been mainly aimed at developing new anti-cancer therapies. 9,17,18 Bioinformatic studies have shown that G4 DNA structures are also prevalent in bacteria, particularly in gene promoters. In 2006, it was reported that the Escherichia coli genome contains ca. 3000 putative quadruplex-forming sequences (PQS). 19 Subsequently a database was published outlining putative G4forming sequences in 146 different microbial genomes. 20 Some of these sequences in Escherichia coli, 21 Deinococcus radiodurans 22,23 and Mycobacterium tuberculosis, 24 have been found associated to promoter regions and hence implicated in regulation of gene expression. More recently, a G4 sequencing method was applied to identify in vitro 131 different G-quadruplexes in Escherichia coli and 1990 in Rhodobacter. 25 Another recent study analysed PQSs in the genome of seven Gram-negative and five Gram-positive bacteria, and it showed that Pseudomonas aeruginosa has a particularly dense PQS prevalence. 26 In this same study, the authors investigated the biological activity of a library of G4 DNA binders against both Gram-positive and Gram-negative bacteria. For their lead compound, they reported distinct mechanisms of action against Gram-positive and Gram-negative bacteria which they hypothesise could be attributed to the differences in prevalence of putative G4 sequences in each of them.
There is significant interest in finding alternative biomolecular targets for the development of antibiotics, with particular emphasis in combating multidrug resistance. With this in mind, we carried out bioinformatics studies to identify guanine-rich sequences in the Pseudomonas aeruginosa genome associated to genes relevant to antimicrobial resistance (see below). P. aeruginosa is a Gram-negative bacterial pathogen associated with severe acute and chronic human diseases and is the leading cause of death in cystic fibrosis patients. This bacterium also affects immune-compromised patients in hospitals and infections of the blood, following surgery or pneumonia, resulting in high morbidity and mortality. The emergence of a number of Gram-negative pathogens that are resistant to the front-line antibiotic therapies is a global concern and the pipeline of antibiotics is essentially empty. In this respect, P. aeruginosa has been ranked by WHO in the top three of organisms that are critical and needing immediate attention. Therefore, it is urgent to identify and validate alterative biomolecular targets to develop new antibacterial agents capable of either killing these multi-drug resistant (MRD) bacteria or make them susceptible again against current antibiotics.
Herein we report studies on a series of G-rich sequences in P. aeruginosa genome with the potential to form G4 structures in regulatory/coding regions of genes relevant to antimicrobial resistance. Following a bioinformatics analysis, ten sequences were selected, and studied by circular dichroism (CD) and thermal difference UV/Vis spectroscopy (TDS) to determine their ability to fold into G-quadruplexes under physiological conditions. This allowed us to identify three sequences that readily fold into stable G4s under physiological conditions. These new G4s were further studied by NMR spectroscopy. The interactions of some previously reported binders with the new G4 structures were studied by FRET melting assays which confirm they can be stabilised by small molecules.

Results and discussion
Bioinformatic analysis for G4 in the P. aeruginosa genome.
To explore the role of G-quadruplexes on gene expression in natural genetic contexts, we investigated their occurrence in a total of 36 genes that have been proposed to play critical roles in P. aeruginosa antibiotic resistance and lifestyle regulation (Table S1, ESI †). The name and function of the genes studied are given according to the National Centre for Biotechnology Information's (NCBI) BLAST database. The nucleotide sequences of the genes were obtained from UniProt, EMBL-EBI or NCBI in FASTA format. As the position of a G4 sequence within the promoter, 5 0 -UTR, and 3 0 -UTR regions has been shown to influence gene expression, 21 we included in our analysis the upstream regulatory regions (500 nucleotides upstream from the start codon) and 3 0 -UTR regions (250 nucleotides downstream from the stop codon) (SI) from our selected genes using the KEGG database. 27 These sequences in FASTA format were then analysed using the QGRS mapping web server 28 (https://bioinformatics. ramapo.edu/QGRS/analyze.php) with a minimum G-group size 2 to identify putative G4-forming sequences. The top ten sequences showing a minimum G-score of 19 (Table 1) were selected for biophysical and spectroscopic studies to establish whether G4 structures could be formed. The majority of genes selected for the biophysical studies (e.g., murE, 29 amprR, 30 czcA, 31 mexC, 32 nfxB, 33 oprF 34 and pvrR 35 ) have been associated with antibiotic resistance in P. aeruginosa and other ESKAPE pathogens (i.e. Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii and Enterobacter Paper RSC Chemical Biology species, which with P. aeruginosa are the leading cause of nosocomial infections throughout the world).
Biophysical characterisation of G-rich sequences identified by bioinformatics analysis.
The ten putative quadruplex-forming sequences (PQS) identified in the P. aeruginosa genome (see Table 1 for the sequences as well as their position -i.e. promoter, coding sequence or UTRs), were studied by circular dichroism (CD) spectroscopy and thermal difference UV/Vis spectroscopy (TDS). When combined, the results from these two techniques give a very good indication of whether a given sequence folds into a G4 structure or not. Three of the sequences -named murE, ftsB and mexC according to the corresponding gene (see Table 1) -were shown to form G4 structures. For these sequences, the CD spectra recorded at room temperature showed the characteristic positive ellipticity band at ca. 295 nm and a negative one at ca. 260 nm associated to an antiparallel G4 DNA conformation (see Fig. 1a). 36,37 The formation of the three G4 DNA structures was confirmed by the characteristic pattern in the TDS: positive bands at ca. 243 and 273 nm, and a negative one at ca. 295 nm (see Fig. 1b). 38 In contrast, the combined analysis of the CD and TDS data (see Fig. S1, ESI †) for the other seven sequences under study, suggest they do not form stable G4 DNA structures. It should be noted that four of the sequences (czcA, nfxB, oprF and pilE) that did not show clear formation of G4 structures by CD and TDS, contain five rather than four G-tracks. This could potentially lead to more than one G4 structure forming depending which G-tracks are involved; this in turn could lead to ambiguous spectroscopic data as has been previously highlighted. 38 While it would be possible to study these sequences using a truncated version of the PQS (i.e. with four G-tracks), we decided to focus our further studies on the sequences associated to genes murE, ftsB and mexC which clearly form stable G4s. In full agreement, the 1 H NMR spectra of murE, ftsB and mexC oligonucleotides revealed signals in the region between d 11.0 and 12.5 ppm, which are characteristic for imino protons of guanine residues involved in G-quartets (Fig. 1c). Additionally, the signals between d 12.5 and 13.3 ppm in the 1 H NMR spectra indicate formation of Watson-Crick GC base pairs. In the case of mexC oligonucleotide sequence, three signals between d 10.5 and 10.7 ppm were detected suggesting the presence of hydrogen bonded imino protons in non-Watson-Crick geometries. Monitoring the conformational rearrangement of the three oligonucleotides, in the presence of 0 to

RSC Chemical Biology
Paper 100 mM concentrations of KCl, revealed that murE and ftsB sequences adopted defined pre-folded structures in the absence of K + ions (Fig. S2, ESI †). After addition of KCl the pattern of imino signals changed in the 1 H NMR spectra of murE and ftsB oligonucleotides showing only the presence of G4 structures, which remained constant at KCl concentrations ranging from 10 to 100 mM KCl. In the case of mexC, the intensity of imino signals increased as KCl was titrated into a solution of mexC with no other changes observed in the imino region that would be connected to structural changes.
mexC oligonucleotide adopts a two-quartet G4 The number of signals in the 1 H NMR spectra of murE, ftsB and mexC oligonucleotides in the region between d 11.0 and 12.5 ppm suggests the formation of more than one structure in solution. Wellresolved NMR spectra of mexC sequence allowed us to study its structural features, which was not possible in the case of murE and ftsB sequences due to the severe overlapping of the signals. Residuespecific, partial 13 C, 15 N-labeling of guanine (10%) and thymine (8%) residues in the mexC oligonucleotide enabled the unambiguous assignment of H1 and H3 proton resonances of guanines and thymines, respectively. We identified eight guanines involved in Gquartets, while no signals were observed in imino region of the 1D 15 N-edited HSQC spectra for G6, G7 and G11 residues suggesting that they are located in loops (Fig. 2a). It is interesting, that for G1, G9, G14, G21 and T12 residues additional imino signals with lower intensity were observed that belong to the minor structure of mexC (B30%). Herein, we are focused on the structure of the major G4. The characteristic H1-H8 connectivities observed in the NOESY spectra allowed us to establish a topology for mexC G4, which involves two G-quartets with the following hydrogen-bond directionalities: G1 -G14 -G21 -G9 and G2 -G8 -G20 -G15 (Fig. 2b). The guanine residues of G1-G14-G21-G9 quartet exhibit a clockwise donor-acceptor hydrogen-bonding directionality, while those of G2-G8-G20-G15 quartet display an anti-clockwise directionality (Fig. S3, ESI †).

Small molecules stabilise murE, ftsB and mexC G4 DNA structures
Having established that sequences associated to the murE, ftsB and mexC genes fold into stable G4 structures, it was of interest

Paper
RSC Chemical Biology to determine whether previously reported G4 DNA binders would stabilise these structures. Therefore, binding of BRACO19, PDS, TMPyP4 and Ni-salphen (see Fig. 3 for molecular structures) towards the three G4 structures was studied via FRET melting assays. The data shows that these four molecules stabilise the G4 structures with particularly high DT m in the case of TMPyP4 (see Fig. 3 and Fig. S6, ESI †).
Interestingly, Ni-salphen shows some selectivity towards the G4-forming sequences associated to ftsB and mexC over murE.
In the case of the interaction of PDS with ftsB, the G4-DNA melting process did not follow the same pattern than for all other compounds and sequences (see Fig. S6, ESI †). Instead, a two-step melting process was observed, indicating a more complex interplay between the binding of PDS to different sites and/or topologies of ftsB-G4.

Conclusion
There is significant interest in finding new targets for the development of antibacterial drugs, with particular emphasis in combating multidrug resistance. Previous studies on bacterial G4s are relatively scarce as compared to those on eukaryotes.
With this in mind, we carried out bioinformatics studies to identify PQSs in the P. aeruginosa genome associated to genes relevant to antimicrobial resistance. CD spectroscopy, thermal difference UV/Vis spectroscopy and NMR spectroscopy showed that three of the ten sequences investigated -i.e. murE, ftsB and mexC -clearly fold into stable G-quadruplexes. It is worth highlighting that these sequences form two-tetrad G-quadruplexes, unlike most G4s reported to date. Detailed structural NMR studies for mexC confirmed the folding of this sequence into an antiparallel G4 structure with a diagonal loop. FRET melting assays for the three sequences showed that murE is the most thermally stable followed by ftsB and then mexC. Furthermore, with four known G4 binders, we show that the three structures can be further stabilized upon interaction with small molecules. Future studies should establish if formation of these G4s play regulatory roles in the expression of genes involved in crucial events such as drug efflux, cell division or cell wall biogenesis. If so, they could be new targets for antibacterial agents.

Experimental details
Circular dichroism (CD) measurements DNA oligonucleotides (100 mM in milliQ water) were diluted to 5 mM in buffer containing 10 mM Tris-HCl pH 7.4 and 100 mM KCl to give a 700 mL final volume. The sample was annealed by heating to 95 1C for 5 minutes and then cooled to room temperature overnight. CD spectra were recorded on a JASCO-714 spectropolarimeter equipped with a MPTC-490S/15 multicell temperature unit using a 1 cm optical path and a reaction volume of 700 mL. Scans were performed at 20 1C over a wavelength range of 220-400 nm (3 accumulations) with a scanning speed of 100 nm min À1 , 4 s response time, 1 nm data pitch and 1 nm bandwidth. The buffer spectrum was subtracted and plotted using GraphPad Prism 7.

RSC Chemical Biology Paper
Thermal difference spectra DNA oligonucleotides (100 mM in milliQ water) were diluted to 5 mM in buffer containing 10 mM LiCacodylate pH 7.4 and 100 mM KCl to give a 700 mL final volume. The sample was annealed by heating to 95 1C for 5 minutes and then cooled to room temperature overnight. Thermal denaturation was performed with a Cary 100 Bio UV-Visible Spectrophotometer and 1 cm path-length quartz cuvette. The UV spectra were measured over a 220-335 nm spectral range at a 1 nm data interval. Spectra were recorded at 20 1C, the sample was then heated to 90 1C and measurements repeated. The thermal difference spectra were obtained by subtracting the UV spectra at 20 1C from the corresponding UV spectra at 90 1C.

Conflicts of interest
There are no conflicts to declare.