Aberystwyth University A survey of non-coding RNAs in the social and predatory myxobacterium Myxococcus xanthus DK1622

Prokaryotic ncRNAs are important regulators of gene expression, and can be involved in complex signalling networks. The myxobacteria are model organisms for studies into multicellular development and microbial predation, being particularly renowned for their large genomes and exceptionally sophisticated signalling networks. However, apart from two specific examples, little is known about their regulatory ncRNAs. Here, we integrate bioinformatic predictions and transcriptome sequence data to provide a comprehensive survey of the ncRNAs made by the exemplar myxobacterium M. xanthus DK1622. M. xanthus RNA-seq data from four experimental conditions were interrogated to identify transcripts mapping outside coding sequences and to known ncRNAs. The resulting 37 ncRNAs were clustered on the genome and most (30/37) were conserved across the myxobacteria. A majority of ncRNAs (22/37) were intergenic, while 13 were at least partially antisense to protein-coding genes. Predicted promoter and terminator sequences explained the start/stop sites of 18 ncRNAs. mRNA targets for the ncRNAs were predicted, including plausible candidates for a known regulatory ncRNA. 22 ncRNAs were differentially expressed by nutrient availability and expression of 25 predicted targets was found to correlate strongly with that of their regulatory ncRNAs. Sharing of predicted mRNA targets by multiple ncRNAs suggests that some ncRNAs might regulate each other within signalling networks. This genomic survey of M. xanthus ncRNA biology provides a starting point for further studies of myxobacterial ncRNAs, which are likely to have important functions in these industrially important and sophisticated organisms.


Introduction
The RNA molecules within a cell have diverse functions. 1-3 The central role of rRNA, tRNA and mRNA in the translation of proteins was elucidated in the latter half of the previous century. Further studies of translation then led to the identification of catalytic RNAs, such as self-splicing introns and the RNA component of RNase P (RnpB), which processes tRNA gene transcripts to give mature tRNAs. 4,5 Other RNAs involved with translation were subsequently found, including Ffs, which is the RNA component of SRP for co-translational translocation of nascent proteins across membranes, and SsrA, which is a transfer-messenger RNA (tmRNA) for unstalling stalled ribosomes. 6,7 In prokaryotes, the term small RNAs (sRNA) has been widely used to describe regulatory RNAs such as Ffs, SsrA and RnpB, as they are typically small (50-500 nt), and they are distinct from tRNAs, rRNAs and mRNAs. 1 However, cells have also been found to contain large RNAs which are not tRNA, rRNA or mRNA, and these long transcripts became known as noncoding RNAs (ncRNA), a term which includes rRNAs, tRNAs and sRNAs.
At the beginning of this century, a role for the cryptic 6S ncRNA of Escherichia coli (SsrS) was finally elucidated. Rather than being involved with translation, SsrS was found to be a transcriptional regulator, repressing transcription of Sigma70-dependent promoters during stationary phase. 8 Since that first discovery, many ncRNAs have been shown to regulate the expression of specific subsets of genes, and ncRNAs are being discovered in increasing numbers due to bioinformatics predictions and whole-transcriptome sequencing. 9,10
This journal is © The Royal Society of Chemistry 2020 Mol. Omics, 2020, 16, 492--502 | 493
ncRNAs can participate in functional associations with other biomolecules through two mechanisms: by binding to target proteins (e.g., within nucleoprotein complexes), and/or by base-pairing with mRNA. Regulatory ncRNAs bind to the mRNA transcripts of their target genes, and in doing so stimulate or inhibit their transcription or translation. For instance, ncRNAs can bind to the ribosome-binding site (RBS) of their target mRNA, blocking translation. Conversely, some ncRNAs bind to their target mRNAs close to the RBS, preventing formation of an inhibitory secondary structure involving the RBS, and thus stimulating translation. 11 In a similar fashion, binding of an ncRNA to its target mRNA can regulate transcription of the mRNA by occluding or exposing rut sites for rho-dependent transcription termination. 2 ncRNAs can also bind to ncRNAs that themselves bind to gene regulatory elements, competing with one another for binding, and forming complex gene regulatory networks. 12 In addition, matchmaking between ncRNAs and their mRNA targets is often stimulated by RNA chaperone proteins such as Hfq, ProQ and CsrA. 13
The myxobacteria are a family of proteobacteria that are particularly noted for their complex gene regulatory networks. 14 Nevertheless, with two notable exceptions (Pxr and MsDNA), their ncRNAs have received little research attention. [...] 15-17 When prey becomes scarce, a population of myxobacterial cells embarks co-operatively on a developmental programme culminating in the formation of a macroscopic fruiting body, within which a subset of cells differentiate into spores. 18,19 This social phenomenon requires complex regulatory systems for responding to the nutritional status within each cell, the cell's context within the population, and its progression through the developmental programme. 20
The best understood regulatory ncRNA in myxobacteria is Pxr, a gatekeeping ncRNA that is expressed when nutrients are abundant, and which prevents cells from commencing multicellular development. 21 Expression of the pxr gene is under the control of the two-component system PxrR/PxrK (also known as SpdS/SpdR and MXAN_1077/MXAN_1078), and the secondary structure of Pxr includes a stem-loop which is essential for function. 22,23 How Pxr inhibits fruiting is currently unclear.
Another peculiar ncRNA was first described in M. xanthus, but has since been shown to occur widely across bacteria. It is known as msDNA (multicopy single-stranded DNA). The msDNA gene is transcribed and the nascent MsDNA RNA folds into a structure which allows priming by reverse transcriptase from a 2′-OH group of the RNA. DNA synthesis proceeds using the MsDNA RNA strand as a template, which is then degraded by RNase H. After reverse transcription has completed, the end product is a mature molecule comprising single-stranded DNA covalently linked to a portion of the original RNA transcript. 3,24 Despite being discovered more than 30 years ago and being conserved across bacteria, the function of MsDNA, and msDNA more generally, remains unknown.
In order to provide a genome-wide view of the production of ncRNAs by M. xanthus, RNA-seq data from four experimental conditions were interrogated, identifying distributions of transcripts that mapped between protein coding sequences (CDSs). In this fashion, 37 ncRNAs were identified and further characterised. We also integrated knowledge on previously identified ncRNAs and bioinformatics predictions to give a comprehensive overview of ncRNAs in M. xanthus DK1622, providing a starting point for further studies of these enigmatic entities.

Experimental procedures
Identification of ncRNAs in experimental datasets
ncRNAs were identified from the RNA-seq transcriptome datasets described by Livingstone et al. (2018b) using the ToRNAdo script, which has previously been applied to characterise staphylococcal ncRNAs. 25,26 ToRNAdo maps sequencing coverage to distributions with peaks above a threshold, and calls each distribution an RNA. Annotated RNA and protein-coding genes are then excluded, and each remaining ncRNA is outputted with start/stop positions, the genomic strand transcribed and maximum peak height. ToRNAdo also identifies ncRNAs as intragenic (found within and antisense to a gene), intergenic (found between genes), and mixed (partially antisense to a gene).
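The three genomic contexts described above can be illustrated with a minimal sketch. This is not ToRNAdo's own code: the function name `classify_ncrna` and the `(start, end, strand)` tuple layout are hypothetical, coordinates are assumed to be 1-based and inclusive, and transcripts overlapping same-strand genes are assumed to have been excluded already.

```python
from typing import List, Tuple

# Hypothetical gene record: (start, end, strand), 1-based inclusive coordinates.
Gene = Tuple[int, int, str]


def classify_ncrna(start: int, end: int, strand: str, genes: List[Gene]) -> str:
    """Classify a transcript by its overlap with opposite-strand genes."""
    # Genes on the other strand that overlap the transcript at all.
    antisense_hits = [
        g for g in genes if g[2] != strand and g[0] <= end and g[1] >= start
    ]
    if not antisense_hits:
        return "intergenic"  # no antisense overlap with any gene
    # Entirely contained within one opposite-strand gene.
    if any(g[0] <= start and end <= g[1] for g in antisense_hits):
        return "intragenic"
    return "mixed"  # partly antisense, partly outside the gene


if __name__ == "__main__":
    genes = [(50, 300, "-")]
    print(classify_ncrna(100, 200, "+", genes))  # transcript inside the gene
    print(classify_ncrna(250, 400, "+", genes))  # transcript extends past the gene
```

Containment is tested before partial overlap so that a transcript lying wholly inside one gene is never reported as mixed.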
Transcriptome sequence datasets were available for M. xanthus DK1622 incubated under four experimental conditions: nutrient-free buffer (STARVED, n = 4), nutrient broth (FED, n = 3), buffer with live E. coli prey (LIVE, n = 4) and buffer with pre-killed E. coli prey (DEAD, n = 4). See Livingstone et al. for full technical details. 27 In short, liquid cultures of predator and prey were grown separately in a medium which sustained growth of both organisms (LBCY). 27 Mid-exponential phase cells were harvested and resuspended in either buffer (TM) or LBCY, either in pure culture, or mixed with the other organism. After 4 hours of unshaken incubation at 30 °C, cells were harvested for RNA extraction, ribodepletion, cDNA synthesis, library preparation and sequencing, as described previously. 27 If ncRNAs identified by ToRNAdo in one replicate overlapped ncRNAs in another replicate, the records were integrated such that the ncRNA was ascribed the first start site, the last stop site, and the maximum peak height of any constituent ncRNA. For each experimental condition, if an ncRNA was absent from one or more replicates, it was excluded from further analysis. sRNA genes already in the DK1622 genome annotation were not detected by ToRNAdo, and were therefore manually incorporated into the ncRNA dataset. Their relative expression levels in different experimental conditions were taken from Livingstone et al. 27 ncRNAs were also discarded from the dataset if they were on the same strand and within 10 nt of the 5′ end of a CDS, as they likely represented 5′-untranslated regions (5′-UTRs) of mRNAs. Also excluded were the non-coding and leader sequences of polycistronic rRNA gene transcripts.
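The integration of overlapping calls across replicates, followed by the requirement that an ncRNA be present in every replicate, might be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the `Call` record and `merge_calls` function are hypothetical names, coordinates are genomic with start <= stop, and calls are clustered per strand by simple interval overlap.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    """One ncRNA call from one replicate (hypothetical record layout)."""
    start: int
    stop: int
    strand: str
    peak: float
    replicate: int


def merge_calls(calls: List[Call], n_replicates: int) -> List[Dict]:
    """Merge overlapping calls and keep clusters seen in every replicate."""
    clusters: List[List[Call]] = []
    for strand in ("+", "-"):
        ordered = sorted((c for c in calls if c.strand == strand),
                         key=lambda c: c.start)
        current: List[Call] = []
        current_stop = 0
        for c in ordered:
            if current and c.start <= current_stop:
                # Overlaps the growing cluster: extend it.
                current.append(c)
                current_stop = max(current_stop, c.stop)
            else:
                if current:
                    clusters.append(current)
                current, current_stop = [c], c.stop
        if current:
            clusters.append(current)

    records = []
    for cluster in clusters:
        # Discard clusters missing from one or more replicates.
        if len({c.replicate for c in cluster}) < n_replicates:
            continue
        records.append({
            "start": min(c.start for c in cluster),  # first start site
            "stop": max(c.stop for c in cluster),    # last stop site
            "strand": cluster[0].strand,
            "peak": max(c.peak for c in cluster),    # maximum peak height
        })
    return records
```

Because clustering is transitive, two short calls in one replicate that are bridged by a longer call in another replicate end up in a single record.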

Identification of homologues of ncRNAs in myxobacteria was achieved by querying against the RefSeq myxobacterial genomes in the 'Myxococcales' taxon within the NCBI nr database, using BLASTn with an e-value cut-off of 0.1 to maximise the sensitivity of homologue identification. Locations of ncRNAs and other genes around the genome were visualised using the Circos platform. 28 Putative promoters and terminators were predicted using PePPER, queried with regions of genomic DNA extending at least 1 kb beyond the start and stop sites of each ncRNA. 29 Terminators were often identified as nested sequences centred on the same genome position and these were manually curated to remove redundancy.
Regulatory ncRNA target gene predictions
mRNA targets were predicted for intergenic ncRNAs and the intergenic portions of mixed ncRNAs using CopraRNA. 30 Firstly, BLASTn was used to identify homologues of ncRNAs in myxobacterial genomes as described above. For every ncRNA, five homologues were included in CopraRNA queries: M. xanthus DK1622 (NC_008095), Myxococcus macrosporus HW-1 (NC_015711), Myxococcus stipitatus DSM14675 (NC_020126), Myxococcus fulvus 124B02 (NZ_CP006003), and Corallococcus coralloides DSM2259 (NC_017030). The latter four were chosen as the phylogenetically closest organisms to M. xanthus for which there was a RefSeq genome available. For all ncRNAs except Mxs009, Mxs015, Mxs021, Mxs030 and Mxs037, at least 3 homologues could be found from amongst the five organisms, allowing CopraRNA prediction of targets. Mxs028 and Mxs035 were not found in Corallococcus coralloides, therefore their queries comprised four Myxococcus spp. homologues. All other queries contained homologues from all five organisms.

Differential expression analysis
For the four experimental conditions employed, the sum of the maximum peak heights of the full ToRNAdo output was used for normalisation, allowing comparison of relative abundance of individual ncRNAs between conditions. The relative abundances of normalised maximum peak heights for each ncRNA under all four experimental conditions were compared to the corresponding normalised expression values for their putative target mRNAs (or normalised peak heights if their predicted target was an ncRNA), and correlation coefficients calculated. Differential expression of ncRNAs was defined as log2(maximum peak height under condition 1 / maximum peak height under condition 2). A cut-off for differential expression was applied to the ncRNAs, such that only values with a greater than two-fold change were considered. Heatmaps of normalised expression values were generated using Heatmapper with complete linkage clustering of Spearman rank correlated distances. 31
Prediction of ncRNAs
ncRNAs were identified in Myxococcus spp. genomes using RNAz 2.1, which searches for conserved DNA sequences in non-coding parts of genomes, predicts the secondary structure, and calculates its thermodynamic stability. 32 The input for RNAz was a multiple whole genome alignment performed using progressiveMauve, with M. xanthus DK1622 as the reference genome. (The resulting XMFA alignment is not a file format supported by RNAz. It was therefore converted to the ClustalW file format, which is supported by RNAz, using the script provided in the Mauve package.) 33 The alignment was then processed using the windowing approach described in the RNAz manual. Genomes included in the alignment were Myxococcus fulvus 124B02 (NZ_CP006003), Myxococcus hansupus mixupus (NZ_CP012109), Myxococcus macrosporus HW-1 (NC_015711), M. macrosporus DSM 14697 (NZ_CP022203), Myxococcus stipitatus DSM 14675 (NC_020126), and M. xanthus DK1622 (NC_008095). RNAz output was filtered with an RNA class probability cut-off of 0.9, as suggested in the RNAz manual. To visualise the ncRNAs predicted by RNAz in a genome browser, the unfiltered RNAz output was processed using custom scripts to extract ncRNA sequences in fasta format. BLASTn was used to locate these within the M. xanthus DK1622 genome and a GFF format file was generated, including Z-scores and probabilities for each prediction.
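The custom scripts used to build the GFF file are not reproduced here. As a rough sketch of the final step only, the hypothetical function below writes predictions that have already been located on the genome as GFF3 lines carrying the Z-score and probability as attributes. The dictionary keys, the `rnaz` ID prefix and the attribute names `zscore` and `prob` are all assumptions made for illustration.

```python
from typing import Dict, List


def to_gff3(predictions: List[Dict], seqid: str = "NC_008095",
            min_prob: float = 0.0) -> str:
    """Format located RNAz predictions as GFF3 text.

    Each prediction is a dict with keys 'start', 'end', 'strand',
    'z' (Z-score) and 'prob' (RNA class probability); this layout is
    hypothetical. Predictions below min_prob are skipped.
    """
    lines = ["##gff-version 3"]
    for i, p in enumerate(predictions, start=1):
        if p["prob"] < min_prob:
            continue
        attributes = f"ID=rnaz{i:04d};zscore={p['z']:.2f};prob={p['prob']:.3f}"
        lines.append("\t".join([
            seqid, "RNAz", "ncRNA",
            str(p["start"]), str(p["end"]),
            ".",            # score column left empty
            p["strand"],
            ".",            # phase is not applicable to ncRNAs
            attributes,
        ]))
    return "\n".join(lines) + "\n"
```

Calling `to_gff3(predictions)` keeps every prediction for browsing, whereas `to_gff3(predictions, min_prob=0.9)` mirrors the probability cut-off applied to the reported set.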

Compiling the ncRNAs of M. xanthus DK1622
Annotated ncRNAs. There are four rRNA gene clusters in the 9.14 Mbp Myxococcus xanthus DK1622 genome, at 0.38, 3.29, 5.89 and 8.96 Mbp from the origin of replication. Each cluster contains genes for the 16S, 23S, 5S rRNAs (from 5′ to 3′), with 1-2 tRNA genes between each 16S and 23S gene. The current annotation of the DK1622 genome contains a total of 65 tRNAs and an additional four ncRNA genes annotated as sRNAs (SsrA, RnpB, Ffs and SsrS). There are also CRISPR systems annotated in the DK1622 genome. Three sets of CRISPRs with associated cas genes are found in close proximity to one another, near the chromosomal origin of replication, at 8.58, 8.86 and 8.89 Mbp. CRISPRs are expressed as long ncRNAs, which are then cleaved post-transcriptionally to generate unit-length small RNAs. Details of ncRNAs in the DK1622 genome annotation are provided in Supplemental File 1 (ESI †). The Pxr and MsDNA ncRNAs are not described in the DK1622 genome annotation.
To investigate the possibility of further ncRNAs in M. xanthus, we interrogated DK1622 transcriptomic data for evidence of transcription of non-coding regions of the genome.
Extraction of ncRNAs from transcriptomics data. For each experimental condition, with M. xanthus DK1622 incubated in nutrient medium (FED) or starvation buffer (STARVED), and in starvation buffer with either viable E. coli prey or pre-killed E. coli (LIVE and DEAD), ncRNAs identified in multiple replicates were integrated and any overlapping ncRNAs from different replicates combined into single records, giving 231 potential ncRNAs. A screen was then applied to only consider putative ncRNAs which were found in every replicate of an experimental condition, resulting in a set of 57 putative ncRNAs. Although 174/231 (75%) candidate ncRNAs were filtered out at this stage, those 174 ncRNAs accounted for just <7% of the total signal (the sum of maximum peak heights), indicating that they were relatively minor components of the transcriptome compared to the remaining 57 ncRNAs, which together made up >93% of the total signal. [...]
To avoid potential confusion, hereafter the term 'RNA-seq ncRNAs' is used to refer to these 37 ncRNAs and not rRNA, tRNAs, or CRISPRs. This term includes 33 ncRNAs identified by ToRNAdo and the four ncRNAs annotated as sRNAs in the DK1622 genome, expression of which was observed in the transcriptome data published by Livingstone et al. 27
Prediction of ncRNAs. The RNAz algorithm assesses sequence conservation in non-coding regions of genomes in order to identify putative ncRNAs. 32 RNAz was queried using an alignment of all available complete Myxococcus spp. genomes, identifying 1582 putative ncRNAs conserved in each genome with a probability greater than 0.9 (Supplemental File 2, ESI †). (The second sheet in Supplemental File 2, ESI † also provides all 4147 predictions with a probability greater than 0.5.) Seven of the RNA-seq ncRNAs were found amongst the RNAz-predicted ncRNAs: Mxs002, MsDNA, Pxr, Ffs, Mxs019, RnpB and Mxs035 (all of which had a probability of >0.98).

Characteristics of M. xanthus DK1622 ncRNAs
Conservation and genomic organisation of ncRNAs. The characteristics of the ncRNAs identified from transcriptome sequencing are provided in Table 1 and further details (e.g., their sequences) are provided as Supplemental File 3 (ESI †).
BLASTn searches found that more than 80% (30) of the 37 ncRNAs had homologues across the myxobacteria. Mxs028 and Mxs035 were only present in Myxococcus spp. genomes, Mxs009 and Mxs037 were only present in M. xanthus and M. macrosporus, while Mxs015, Mxs021 and Mxs030 were only present in M. xanthus. Conservation across the myxobacteria suggests a biological role for those ncRNAs, while the less-conserved ncRNAs are presumably either clade-specific regulators, or are the result of spurious transcription. 34 Only Ffs, SsrA, SsrS and RnpB were found to have homologues in Rfam. 35 For genes involved in fruiting body formation, different categories of regulators are distributed unevenly around the chromosome. 36 Therefore, the location of ncRNA genes in the genome was determined. The RNA-seq ncRNAs were distributed around the DK1622 genome in a non-uniform fashion (unlike that seen for tRNAs and rRNA genes), with ncRNA genes falling into clusters separated by large stretches of genome entirely lacking ncRNA genes (Fig. 2). 18 of the 37 ncRNAs (49%) were found in just 19% of the genome (from 8.80 to 1.42 Mbp, i.e., 1.76 Mbp spanning the chromosomal origin). Other 'hotspots' were apparent between 2.11 and 2.52 Mbp (6 ncRNAs in 4% of the genome) and between 3.11 and 3.71 Mbp (7 ncRNAs in 7% of the genome). Conversely, the region from 3.8 to 8.80 Mbp (55% of the genome) had just 6 ncRNA genes (16%). There was also a slight bias towards ncRNAs being expressed from the negative strand (25 of 37).
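The hotspot percentages quoted above follow from simple arithmetic on a circular chromosome, where a region such as 8.80 to 1.42 Mbp wraps through the origin. The sketch below reproduces that arithmetic from the coordinates and counts given in the text; the function names are hypothetical and are introduced only for illustration.

```python
GENOME_MBP = 9.14   # DK1622 genome size, as stated in the text
TOTAL_NCRNAS = 37   # number of RNA-seq ncRNAs


def arc_length(start: float, end: float, genome: float = GENOME_MBP) -> float:
    """Length of the region from start to end on a circular genome.

    If end < start the region is taken to wrap through the origin.
    """
    return end - start if end >= start else (genome - start) + end


def hotspot_summary(start: float, end: float, n_ncrnas: int):
    """Return (percent of genome, percent of ncRNAs) for one region."""
    genome_pct = 100 * arc_length(start, end) / GENOME_MBP
    ncrna_pct = 100 * n_ncrnas / TOTAL_NCRNAS
    return genome_pct, ncrna_pct


if __name__ == "__main__":
    regions = [(8.80, 1.42, 18), (2.11, 2.52, 6), (3.11, 3.71, 7), (3.8, 8.80, 6)]
    for start, end, n in regions:
        g, r = hotspot_summary(start, end, n)
        print(f"{start}-{end} Mbp: {g:.0f}% of genome, {r:.0f}% of ncRNAs")
```

The four regions listed account for all 37 ncRNAs (18 + 6 + 7 + 6).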
Genetic contexts of ncRNAs. For all 37 RNA-seq ncRNAs, the genomic context of the encoding gene was established. The ncRNAs were found at just 30 loci, as some were antisense to one another, or were present close to each other (<1 kbp between them). Genome contexts were defined as either antisense (13), intergenic (14), or mixed (10). Ffs and RnpB both have antisense ncRNA genes which presumably regulate them, and Ffs and RnpB were therefore manually defined as being intergenic rather than antisense.
Antisense RNAs were located entirely within another gene. Eight of the 13 were encoded antisense to other RNA genes (16S and 23S rRNA genes, ffs and rnpB), but five were antisense to CDSs (IF-3, pdhC and three hypothetical proteins: MXAN_4364, MXAN_RS05615 and MXAN_RS30530). Intergenic ncRNA genes had no overlaps with CDSs. Mixed ncRNAs partially overlapped at least one CDS, having at least one antisense and at least one intergenic region. Four were 3′-convergent with their antisense CDS, one (Mxs012) was 5′-divergent from its antisense CDS, three (Mxs009, Mxs014 and Mxs034) were antisense with extensions (<25 bp) at one or both ends of their antisense genes, and the remainder had complex relationships with other genes, being antisense to two or more genes. Gene organisation for all RNA-seq ncRNAs is shown schematically in Fig. 3.
Lengths of ncRNAs. The mean length of the RNA-seq ncRNAs was 947 nt; however, intergenic ncRNAs averaged just 211 nt, antisense ncRNAs had a mean length of 847 nt, and mixed ncRNAs averaged 2107 nt. Trans-acting intergenic ncRNAs are usually small (50-250 nt) and antisense ncRNAs can be much larger, fitting with the observed size of M. xanthus DK1622 intergenic ncRNAs. 37
For most ncRNAs there was remarkable consistency in the start/stop positions of the same ncRNAs identified under different experimental conditions. Of the 66 start or stop sites, 42 had standard deviations of less than 10 nt. However, some start/stop sites were very variable (five 5′-ends, five 3′-ends), with standard deviations in their positions of more than 100 nt (Table 1). For the 5′-ends this large deviation could represent alternative promoters, or spurious transcriptional initiation. Intriguingly, all five ncRNAs with variable 5′-ends are large antisense ncRNAs for rRNA genes, making spurious initiation a more likely explanation than alternative promoters.
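The consistency of an end position across conditions can be summarised by its standard deviation, using the 10 nt and 100 nt thresholds given above. The sketch below is illustrative only: the text does not state whether a sample or population standard deviation was used (the sample form is assumed here), the function name and the "intermediate" label are hypothetical, and the example coordinates are invented.

```python
from statistics import stdev
from typing import Sequence


def end_variability(positions: Sequence[int]) -> str:
    """Label one start or stop site by its spread across conditions.

    positions holds the coordinate of the same end observed in each
    condition; at least two observations are required.
    """
    sd = stdev(positions)  # sample standard deviation (an assumption)
    if sd < 10:
        return "consistent"
    if sd > 100:
        return "variable"
    return "intermediate"


if __name__ == "__main__":
    # Invented coordinates for one end seen in four conditions.
    print(end_variability([1000, 1003, 998, 1001]))
    print(end_variability([1000, 1400, 1100, 1250]))
```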
Promoters and terminators. Promoter and terminator sequences were predicted for the genomic regions that included ncRNAs, using the PePPER webserver. In several cases promoters and terminators were predicted to lie at the 5′ and 3′ ends of ncRNAs respectively (predicted transcriptional start sites and location of predicted terminators are included in Supplemental File 3, ESI †).
Of the five ncRNAs with variable 3′-ends, three lacked predicted terminators. However, the mxs031 gene had a terminator coincident with its furthermost 3′-end, implying that a large proportion of its transcription complexes halt transcription before reaching the terminator. Conversely, the mxs010 gene has a terminator coincident with its shortest 3′-ends, suggesting that the predicted terminator is weak and may not efficiently halt transcription complexes.
The small number of predicted promoters is probably because most ncRNAs are expressed using other sigma factors, rather than the σ70-dependent class on which PePPER is based. This supposition is supported by the large number of sigma factors (55) identified in the DK1622 genome by the P2TF database. 38

mRNA targets of regulatory ncRNAs
Target prediction. Candidate targets were predicted for the ncRNAs using two approaches. Mixed and antisense ncRNA genes were presumed to act in cis, therefore their antisense genes were predicted to be their targets (Supplemental File 4, ESI †). Target genes of the 13 antisense ncRNAs included 6 ncRNAs (including 16S, 23S rRNA genes, Ffs and RnpB) and five CDSs (IF-3, pdhC and genes for three hypothetical proteins). cis-Target genes of mixed ncRNAs included 4 rRNAs (16S and 23S), [...]
Table 1 RNA-seq identified regulatory ncRNAs. ncRNAs are defined by their position and strand of the genome. Their genetic organisation is presented, and whether the ncRNAs' start/stop sites can be explained by predicted promoters and terminators. Evidence supporting the identification of ncRNAs is provided as a code: 'A' represents annotated ncRNAs in the DK1622 genome, ncRNAs whose differential expression correlates (|R| > 0.5) with that of a predicted target are indicated with a 'C', 'E' denotes experimentally observed expression, the presence of homologues in other myxobacteria is denoted 'H', those described in the literature are indicated 'L', 'P' denotes a predicted promoter upstream of the ncRNA, 'T' denotes a predicted terminator at the 3′-end of the ncRNA, and 'Z' denotes an ncRNA predicted by RNAz. Start/stop sites marked with an * were variable between experimental conditions (standard deviation >100 nt).
The two approaches combined gave 67 predictions for 32 of the 37 ncRNAs (Supplemental File 4, ESI †). Predicted targets were defined as cis or trans depending on whether they were in the same genomic location as their regulatory ncRNA. Predictions based on antisense complementarity identified 32 cis targets. CopraRNA predictions included 7 cis predictions (presumably due to overlap between an ncRNA and the region upstream of the translational start site of its neighbouring genes investigated by CopraRNA), but also provided 30 trans predictions. Looking at COG categories and annotations, there seemed to be no enrichment of particular functional classes, as might be expected given the diversity of ncRNAs and their likely roles.
Interestingly, one of the predicted trans-targets of Pxr is actA mRNA (MXAN_3213). ActA is needed for fruiting body formation upon starvation: deletion of the actA gene blocks developmental progression. 39 In the presence of nutrients Pxr prevents development, and a plausible mechanism by which it might do so is through inhibiting the expression of actA. Pxr is known to exert its regulatory effects through a (terminator-like) stem-loop denoted SL3. 22 The CopraRNA prediction of interaction between actA mRNA and Pxr is via SL3, consistent with Pxr regulating expression of actA (Fig. 4).
Some predicted trans targets were shared by more than one ncRNA. MXAN_0702 (encoding a hypothetical protein) is targeted by Mxs013 and Mxs028, and MXAN_1951 (also encoding a hypothetical protein) is targeted by Mxs016 and Mxs028, suggesting that they may form a regulatory network in vivo.
Differential expression of ncRNAs. Normalised expression data for the 37 RNA-seq ncRNAs are presented in Supplemental File 6 (ESI †) and as a heatmap in Fig. 5. Across all four conditions tested, nine of the RNA-seq ncRNAs accounted for more than 75% of the total ncRNA transcripts mapped by ToRNAdo: Mxs002, MsDNA, Mxs016, and six ncRNAs antisense to rRNA genes (Mxs006, Mxs007, Mxs025, Mxs026, Mxs031 and Mxs037).
Where expression data were available from more than one condition, the relative expression of ncRNAs was compared between experimental conditions. In conditions DEAD and FED, DK1622 cells are actively feeding, and comparing those conditions with condition STARVED, most ncRNAs (28 of the 31 with non-zero expression in condition STARVED) were found to be differentially expressed more than two-fold (Supplemental File 6, ESI †).
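The normalisation and two-fold cut-off described in the Experimental procedures can be written out as a short worked example. This is a sketch under stated assumptions, not the analysis code: the function names are hypothetical, peak heights are normalised by the sum of all maximum peak heights in a condition, and comparisons involving zero expression are treated as undefined and reported as not differential.

```python
import math
from typing import Dict


def normalise(peaks: Dict[str, float]) -> Dict[str, float]:
    """Divide each maximum peak height by the condition's total."""
    total = sum(peaks.values())
    return {name: height / total for name, height in peaks.items()}


def log2_fold_change(cond1: float, cond2: float) -> float:
    """log2(peak height under condition 1 / peak height under condition 2)."""
    return math.log2(cond1 / cond2)


def is_differential(cond1: float, cond2: float, min_fold: float = 2.0) -> bool:
    """True if the change is greater than min_fold in either direction."""
    if cond1 <= 0 or cond2 <= 0:
        return False  # ratio undefined without expression in both conditions
    return abs(log2_fold_change(cond1, cond2)) > math.log2(min_fold)
```

On this scale a 16-fold induction corresponds to a log2 fold change of 4, and an exactly two-fold change does not pass the cut-off because the threshold is strict.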
The relative expression of ncRNAs under the four experimental conditions was compared with the relative expression of predicted targets under the same conditions (Supplemental File 6, ESI †). In many cases (33 of 65) the expression of an ncRNA was found to significantly correlate with that of a predicted target, supporting the target prediction. Most correlations were positive (30 of 33); 13 involved trans targets and 20 involved cis targets.
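A correlation between an ncRNA and a predicted target across the four conditions can be computed as below. The text does not state which coefficient was used, so Pearson's r is assumed here purely for illustration; the function names are hypothetical and the |R| > 0.5 threshold is the one given in the Table 1 legend.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined if either series is constant
    return sxy / math.sqrt(sxx * syy)


def supports_target(ncrna: Sequence[float], target: Sequence[float],
                    threshold: float = 0.5) -> bool:
    """True if |R| exceeds the threshold, for either sign of correlation."""
    r = pearson(ncrna, target)
    return not math.isnan(r) and abs(r) > threshold
```

Each series would hold one normalised expression value per condition (STARVED, FED, LIVE, DEAD), so every coefficient rests on only four points.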
Pxr expression was induced more than 16-fold when fed compared to starved, as expected.Of the six predicted trans targets of Pxr, five were repressed by nutrients (four by more than 2-fold), while one (MXAN_0569) was induced more than 8-fold.

Discussion
The number and nature of DK1622 ncRNAs
Interrogation of transcriptome datasets identified 37 ncRNAs, including 13 antisense ncRNAs, which were expressed in cultures of M. xanthus (4 of which were previously annotated in the DK1622 genome). The number of ncRNAs identified in other bacteria is dependent on whether an experimental or bioinformatics approach is taken, and what selection/filtering criteria are applied. 37 Nevertheless, it would seem that most bacteria encode in the order of 100 ncRNAs, with between 1/3 and 2/3 of those being antisense ncRNAs. 37 The proportion of RNA-seq ncRNAs in DK1622 which are antisense (35%) mirrors the situation in other bacteria and, as usual, they are longer than intergenic ncRNAs (averaging 847 nt and 211 nt respectively).
The relatively small number of ncRNAs, 37, is likely due to the stringent filtering we applied. We deliberately excluded 5′-UTRs, ultimately because RNAs with 5′-UTRs have 3′-regions which are translated, and thus by definition are not ncRNAs. Our merger of small ncRNAs into larger ncRNAs when the two were fused in other experimental conditions/replicates also reduced the number of apparent ncRNAs. Perhaps more importantly, we also applied a screen to retain only ncRNAs which were present in every replicate of an experimental condition. Applying this filter resulted in the loss of 174 candidate ncRNAs, albeit representing just <7% of the total ncRNA signal. Many of the candidate ncRNAs lost at this stage were present in n − 1 replicates of an experimental condition, and often in more than one experimental condition, suggesting that they may represent real but relatively low abundance ncRNAs.
The identification of ncRNAs is also highly dependent on genome annotation of CDSs. Comparing the original DK1622 annotation from 2012 with a reannotation in 2017, around one fifth of candidate ncRNAs were discarded, as they could be mapped onto new CDSs (generally encoding hypothetical proteins averaging just 118 amino acyl residues in length), or they were found to be 5′-UTRs of CDSs whose predicted translational start site had been reannotated further upstream. It may well transpire in coming years that our approach has falsely identified transcripts as ncRNAs which turn out to encode proteins, but we may also have excluded true ncRNAs because they coincide with mis-annotated CDSs. It is also quite possible that there may be DK1622 ncRNAs which are conditionally expressed (as indeed we have shown many are), and which have remained undetected as they are not expressed in any of our four experimental conditions.
By assessing the conservation of predicted RNA secondary structures, RNAz identified 1582 putative ncRNAs in the M. xanthus DK1622 genome. Seven of those putative ncRNA predictions mapped onto RNA-seq ncRNAs. Presumably the remaining 1575 putative ncRNAs predicted by RNAz are either false positives or conditionally expressed. As they were not observed experimentally, and because of their large number, a detailed analysis of the 1575 remaining ncRNA predictions was not undertaken.

Start, stops, promoters and terminators
Comparing between experimental conditions (and between replicates), the start and stop sites of the majority of ncRNAs were remarkably consistent.In some cases putative promoter and terminator sequences could be identified at the ends of ncRNAs, agreeing with the experimentally inferred start/stop sites (Table 1).However, for the majority of ncRNAs, no candidate promoters or terminators could be identified.Promoter prediction algorithms rely on experimentally derived promoter sequences, and usually focus on promoters that are recognised by RNA polymerase holoenzyme containing the housekeeping Sigma factor s70. 40 According to the P2TF database, Myxococcus spp.genomes encode an average of 57 Sigma factors, with DK1622 encoding 55, suggesting that a large proportion of the DK1622 ncRNAs might be transcribed from condition-dependent non-s70 promoters. 38The ncRNAs with the most variable 5-transcriptional start sites were all antisense to highly-expressed rRNA genes, suggesting that transcription initiation of these ncRNAs occurs non-specifically.
The location of terminators can be important as regulatory ncRNAs often exert effects on their target mRNAs in a mechanism mediated by the terminator-binding RNA chaperone Hfq. 2 However, M. xanthus does not have an Hfq homologue, or homologues of the other major RNA chaperones ProQ and CsrA.Nevertheless, the M. xanthus DK1622 genome does encode six homologues of CspA, which has RNA chaperone activity. 13edicted mRNA targets Prediction of the regulatory targets of antisense ncRNAs is trivial.However, currently available algorithms for the identification of the regulatory targets of trans-acting ncRNAs are prone to false positives. 10The performance of CopraRNA is enhanced compared to other algorithms as it looks for complementarity between ncRNAs and potential targets, and then factors in patterns of sequence conservation in both the ncRNA and its potential targets. 32praRNA outputs the most probable 200 candidate mRNA targets, and provides p-values and FDRs as a measure of confidence in the predicted targets.We again applied a very conservative set of criteria to CopraRNA output to minimise false-positive predictions.Nevertheless, some of the predictions will be false positives.For instance querying CopraRNA with Ffs, SsrA and SsrS generated predicted targets, despite these ncRNAs not having mRNA targets.
Encouragingly, expression of many of the CopraRNA-predicted target mRNAs correlated with that of their regulatory ncRNAs. The function of only one DK1622 regulatory ncRNA (Pxr) has been elucidated; it prevents fruiting body formation when nutrients are present. Intriguingly, a high-confidence predicted target mRNA for Pxr (actA) is known to be a positive regulator of fruiting, and the predicted ActA mRNA-Pxr interaction involves the region of Pxr shown to be required for its function. 23,39 It is therefore tempting to speculate that Pxr expression under nutrient-rich conditions prevents ActA production, impeding fruiting, and that upon starvation, loss of Pxr allows ActA production to stimulate fruiting.
Several predicted ncRNA targets are likely to be involved in predation and/or the response to changes in nutrient availability, being differentially expressed during predation/starvation. Some have annotated functions which suggest plausible mechanisms by which they might play roles in nutrition/predation. For instance, predicted targets for Mxs010 include an M10 peptidase, those for Mxs013 and Mxs016 include sugar kinases/phosphatases, predicted targets of Mxs010 and Mxs016 include transporters, while Mxs017, Mxs022 and Mxs023 likely target metabolic processes, including pyruvate dehydrogenase. It would be interesting to determine whether different ncRNAs and targets are expressed if M. xanthus feeds upon different prey organisms, or whether the generalist nature of M. xanthus predation is mirrored by a common set of ncRNAs being expressed during predation, regardless of prey species.

Are the RNA-seq ncRNAs functional?
Table 1 presents the RNA-seq ncRNAs of DK1622. They are proposed on the basis of experimental observation of non-coding transcripts, but in most cases several other lines of evidence support the hypothesis that each ncRNA has a biological function, including sequence conservation, predicted terminator and promoter sequences, and correlated expression of putative target mRNAs. For some ncRNAs, there is little supporting evidence that the ncRNA has a biological function, beyond the experimental observation of its existence (e.g. Mxs021). In other cases we can be very confident that the ncRNA has a role in some as yet undetermined biological process (e.g. Mxs008 and Mxs017).
Other ncRNAs give cause to doubt their biological role, despite having some supporting evidence. For instance, eight ncRNAs are antisense to the 16S and 23S rRNA genes, which are found in four gene clusters in the DK1622 genome. In two cases (Mxs031 and Mxs036) an ncRNA spans the entire gene cluster. For the other gene clusters there is an ncRNA antisense to the 23S rRNA gene (Mxs007 and Mxs026) and two tandem pairs of ncRNAs antisense to the 16S rRNA gene (Mxs005/Mxs006 and Mxs024/Mxs025). All eight ncRNAs show sequence conservation across the myxobacteria, but this is to be expected, as they are complementary to highly conserved rRNA genes. The ncRNAs have no identifiable promoters (despite being expressed under vegetative conditions and thus likely to be σ70-dependent), but have highly variable start sites, suggesting non-specific initiation of their transcription.
It has been proposed that antisense rRNA transcripts might have biological functions; however, it is also clear that antisense transcription is a pervasive by-product of transcriptional activity. 41,42 Are the observed anti-rRNA ncRNAs a nonfunctional by-product of the particularly copious transcription of their antisense rRNA genes? If this were the case, it might explain why expression of the anti-rRNA ncRNAs correlates positively rather than negatively with the expression of their antisense rRNA genes. This might also be the case for ncRNAs encoded antisense to highly expressed genes (e.g. Mxs019 and Mxs033, which are antisense to Ffs and RnpB respectively).
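The sign of such a relationship can be checked with a Pearson correlation coefficient computed across experimental conditions. The following is a minimal sketch only: the function name `pearson` and the expression values are illustrative assumptions, not the analysis or data used in this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant expression")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values across four conditions (not data from this study).
antisense_ncrna = [10.0, 20.0, 30.0, 40.0]
sense_gene = [100.0, 210.0, 290.0, 405.0]

r = pearson(antisense_ncrna, sense_gene)
print("positive" if r > 0 else "negative or zero")
```

A positive coefficient would be consistent with the by-product hypothesis, whereas a repressive antisense regulator might be expected to give a negative one. With only four conditions, any such coefficient should be interpreted cautiously.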

The DK1622 ncRNAome
Here we provide evidence for the existence of 37 ncRNAs in M. xanthus DK1622. Given the likely importance of ncRNAs in regulating diverse aspects of myxobacterial biology, this characterisation provides an evidence-based framework for further investigations, including molecular genetics experiments as well as genome-wide association studies. 43,44

This journal is © The Royal Society of Chemistry 2020 Mol. Omics, 2020, 16, 492–502 | 495

ncRNA complement. A final screen removed likely leader sequences and 5′-UTRs of mRNAs, and the leader and spacer regions of polycistronic rRNA operons, leaving a conservative set of 37 ncRNAs. Each of the 37 ncRNAs was found in at least two of the four experimental conditions. ncRNAs were assigned a unique identifier (Mxs001 to Mxs037 for 'Myxococcus xanthus sRNA') according to their ordering in the DK1622 genome. Six previously identified ncRNAs were found amongst the set of 37: MsDNA = Mxs003, Pxr = Mxs011, Ffs = Mxs018, SsrA = Mxs020, SsrS = Mxs027, and RnpB = Mxs032. Illustrative examples of newly-identified ncRNAs are provided in Fig. 1.
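The identifier scheme described above (Mxs001 to Mxs037, numbered by position in the genome) can be sketched as follows. This is an illustration under stated assumptions: the function `assign_identifiers`, the record fields `name` and `start`, and the coordinates are all hypothetical, and ordering by start coordinate is one plausible reading of "ordering in the genome".

```python
def assign_identifiers(ncrnas, prefix="Mxs"):
    """Sort ncRNA records by genomic start and label them prefix001, prefix002, ..."""
    ordered = sorted(ncrnas, key=lambda record: record["start"])
    return {f"{prefix}{i:03d}": record for i, record in enumerate(ordered, start=1)}


# Hypothetical records; the names and coordinates are not taken from this study.
example = [
    {"name": "c", "start": 5000},
    {"name": "a", "start": 120},
    {"name": "b", "start": 2300},
]

ids = assign_identifiers(example)
for identifier, record in ids.items():
    print(identifier, record["name"], record["start"])
```

Zero-padding the number to three digits keeps the identifiers in genome order when they are sorted as text.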

Fig. 1 Mapped transcripts for two example ncRNAs. ncRNAs are shown as black arrows, and surrounding (protein-coding) genes as white arrows, pointing in the direction of transcription. Grey areas above each arrow show the sequencing coverage for two replicates of the experimental condition STARVED.

3
tRNAs, and 15 CDSs, of which six encoded hypothetical proteins. For target CDSs with annotated functions, there was no clear commonality of function, but IF-3, Hsp20, and a helicase might have a common role in protein synthesis. CopraRNA was used to predict mRNA targets for intergenic ncRNAs and the intergenic portions of mixed ncRNAs. Output from CopraRNA is provided as a .zip file (Supplemental File 5, ESI †), including mRNA region plots, ncRNA region plots and lists of the top 200 predicted targets. Lists of CopraRNA predicted targets were then filtered to retain only those predictions with p values <0.00001 and false discovery rates of <0.1.
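The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the script used in the study: the column names `target`, `p_value` and `fdr` and the example rows are assumptions, and real CopraRNA output files may use different headers.

```python
import csv
import io

P_VALUE_CUTOFF = 1e-5  # retain predictions with p < 0.00001
FDR_CUTOFF = 0.1       # retain predictions with FDR < 0.1


def filter_predictions(rows, p_cutoff=P_VALUE_CUTOFF, fdr_cutoff=FDR_CUTOFF):
    """Keep rows whose p-value and FDR both fall strictly below the cutoffs."""
    kept = []
    for row in rows:
        if float(row["p_value"]) < p_cutoff and float(row["fdr"]) < fdr_cutoff:
            kept.append(row)
    return kept


# Hypothetical example table (assumed column names, made-up values).
EXAMPLE = """target,p_value,fdr
geneA,0.000001,0.01
geneB,0.00002,0.05
geneC,0.000005,0.2
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE)))
kept = filter_predictions(rows)
print([row["target"] for row in kept])
```

In this made-up table only geneA passes: geneB fails the p-value cutoff and geneC fails the FDR cutoff. Requiring both thresholds is what makes the filter conservative.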

Fig. 2 Circos diagram of ncRNAs mapped onto the 9.14 Mbp M. xanthus DK1622 genome (outer ring). The red, green and blue rings indicate genes for regulatory ncRNAs, rRNAs/tRNAs, and CRISPRs, respectively. For each coloured ring, the outer leaflet indicates genes on the forward (+) strand while the inner leaflet shows those on the reverse (−) strand. A–D highlight the four rRNA gene clusters.