Bacterial RNAs

From Wiki CEINGE

Revision as of 13:45, 18 June 2007 by Luca

Most of a bacterial genome codes for proteins, but a number of sequences, mostly located within intergenic regions, have been shown to play a role in the control of gene expression at both the DNA and the RNA level. These sequences are often able to fold into stem-loop structures (SLSs), and this feature is indispensable to their biological functions. We performed a first systematic analysis of the distribution of SLSs in 40 wholly sequenced bacterial genomes and demonstrated that SLSs found in natural genomes are consistently more numerous and more stable than those expected to form at random in sequences of comparable size and base composition. We also detected an enrichment of specific, non-random SLS sub-populations of higher stability within the intergenic regions of several species. In low-GC Firmicutes, most higher-stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature at the 5'-end an additional A-rich stretch complementary to the 3' uridines. In all evaluated species, a clearly biased SLS distribution was observed within the intergenic space, with most SLSs concentrated at the 3'-end side of flanking CDSs. A second analysis revealed that 29 of the 40 analyzed genomes contain a number of SLSs that can be grouped by sequence similarity. Such SLSs correspond to about 1% of the whole population and have a substantially higher propensity to fold into a stable secondary structure than the initial set.
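The comparison between natural sequences and random sequences of the same size and base composition can be illustrated with a minimal sketch. It is not the pipeline used in the analysis: it assumes a DNA alphabet, counts only perfect stems (it does not compute folding stability), and uses a simple shuffle that preserves mononucleotide composition. The function names and the stem and loop sizes are hypothetical choices made for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Count start positions where a perfect stem of `stem` bases is followed,
    after a loop of min_loop..max_loop bases, by its reverse complement.
    Assumption: perfect pairing only, no mismatches, bulges or G-U pairs."""
    count = 0
    n = len(seq)
    for i in range(n - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > n:
                break
            if seq[j:j + stem] == target:
                count += 1
                break  # count each start position at most once
    return count


def shuffled_counts(seq, n_shuffles=100, seed=0, **kwargs):
    """Stem-loop counts in shuffled copies of `seq`.
    Shuffling keeps the length and base composition of the input."""
    rng = random.Random(seed)
    bases = list(seq)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        counts.append(count_stem_loops("".join(bases), **kwargs))
    return counts


# Toy input: one 6-base stem, a 4-base loop, and the matching 3' arm.
example = "GCGCAT" + "TTTT" + "ATGCGC"
observed = count_stem_loops(example)
expected = shuffled_counts(example, n_shuffles=5)
```

On a real genome, the observed count would be compared with the distribution of counts from the shuffled controls. A stability-based comparison would additionally require a folding-energy estimate, which this sketch leaves out.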

Regrouping the SLSs selected in this way by sequence similarity, strand reciprocity and genomic location allowed redundancies to be removed. HMM analysis was used to define a final set of 92 families. Twenty-five of them include all well-known SLS-containing repeats and some families reported in the literature but not analyzed in detail. The remaining 67 families have not been previously described. Two thirds of the families share a common predicted secondary structure and are located within intergenic regions.
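One way to picture the redundancy-removal step is the sketch below. It is an illustration under stated assumptions, not the procedure actually used: hits are hypothetical `(start, end, sequence)` tuples, strand reciprocity is handled by mapping each sequence and its reverse complement to the same key, and genomic location is handled by merging overlapping intervals. Similarity between non-identical sequences and the HMM step are not covered.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Strand-independent key: the lexicographically smaller of the
    sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


def group_by_strand_reciprocity(hits):
    """Group (start, end, seq) hits so that a sequence and its
    reverse complement fall into the same group."""
    groups = defaultdict(list)
    for start, end, seq in hits:
        groups[canonical(seq)].append((start, end))
    return dict(groups)


def merge_overlaps(intervals):
    """Merge overlapping (start, end) intervals into non-redundant ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical hits: two copies on one strand (overlapping) and one on the other.
hits = [(10, 14, "AAAC"), (50, 54, "GTTT"), (12, 16, "AAAC")]
groups = group_by_strand_reciprocity(hits)
non_redundant = {key: merge_overlaps(locs) for key, locs in groups.items()}
```

Here the three hits collapse into a single group with two non-overlapping locations.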

Conclusions: Systematic analysis of 40 bacterial genomes revealed a large number of repeated sequence families, including both known and novel ones. Their predicted structure and genomic location suggest that, even in compact bacterial genomes, a relatively large fraction of the genome consists of non-protein-coding sequences that possibly function at the RNA level.

