Bacterial RNAs
From Wiki CEINGE
Revision as of 18:02, 20 June 2007 by Luca
[Figure: PAE-1 bacterial family secondary structure]
Bacterial genomes are generally compact, and most of their sequence is devoted to protein coding, but a growing number of sequences, mostly located within intergenic regions, have been shown to play a role in the control of gene expression. Many of these sequences act as RNA and often contain simple stem-loop structures (SLSs) that are essential to their function. Moreover, families of repeated sequences sharing a common SLS have been described in many bacterial genomes, although a clear biological function has been established in only a few cases. To quantify this phenomenon, we performed a systematic analysis of the distribution of SLSs in 40 completely sequenced bacterial genomes representative of the whole bacterial world. This analysis showed that SLSs found in natural genomes are consistently more numerous and more stable than those expected to form by chance in sequences of comparable size and base composition.
A large collection of families of repeated stem-loop-containing sequences has been identified by clustering the stem-loop structures of the analyzed species according to sequence similarity. Secondary structure analysis reveals a large number of sequences for which a conserved secondary structure can be demonstrated within the family.
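The comparison against random sequences of comparable size and base composition can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: `find_stem_loops` is a hypothetical scanner that reports only perfectly paired hairpins (no bulges, no mismatches, no free-energy calculation), its thresholds are arbitrary, and the null model is a simple mononucleotide shuffle, which preserves length and base composition exactly.

```python
import random

# Pairs allowed in an RNA stem: Watson-Crick pairs plus the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=6, min_loop=3, max_loop=8):
    """Return (start, stem_len, loop_len) for every perfectly paired hairpin.

    Hypothetical helper: a hairpin is a loop of min_loop..max_loop nt closed
    by at least min_stem contiguous base pairs. Thresholds are assumptions.
    """
    rna = seq.upper().replace("T", "U")  # scan DNA input as RNA
    n = len(rna)
    hits = []
    for loop_start in range(n):
        for loop_len in range(min_loop, max_loop + 1):
            right0 = loop_start + loop_len  # first base of the 3' arm
            if right0 >= n:
                break
            # If the loop's first and last bases can pair, the stem extends
            # inward and the same hairpin is reported with a shorter loop.
            if loop_len - 2 >= min_loop and (rna[loop_start], rna[right0 - 1]) in PAIRS:
                continue
            stem = 0
            while (loop_start - 1 - stem >= 0 and right0 + stem < n
                   and (rna[loop_start - 1 - stem], rna[right0 + stem]) in PAIRS):
                stem += 1
            if stem >= min_stem:
                hits.append((loop_start - stem, stem, loop_len))
    return hits


def shuffle_preserving_composition(seq, rng):
    """Mononucleotide shuffle: same length and base composition as seq."""
    return "".join(rng.sample(seq, len(seq)))


def enrichment(seq, n_shuffles=100, seed=0, **scan_kwargs):
    """Compare the observed hairpin count with counts in shuffled copies.

    Returns (observed, mean_null, empirical_p); empirical_p is the fraction
    of shuffles with at least as many hairpins, with +1 smoothing.
    """
    rng = random.Random(seed)
    observed = len(find_stem_loops(seq, **scan_kwargs))
    null = [len(find_stem_loops(shuffle_preserving_composition(seq, rng), **scan_kwargs))
            for _ in range(n_shuffles)]
    mean_null = sum(null) / n_shuffles
    p = (1 + sum(c >= observed for c in null)) / (1 + n_shuffles)
    return observed, mean_null, p
```

A dinucleotide-preserving shuffle would be a stricter null model; the mononucleotide version is used here only because it matches the "same size and base composition" criterion most directly.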
The first systematic analysis, covering 40 completely sequenced bacterial genomes, showed that SLSs found in natural genomes are consistently more numerous and more stable than those expected to form by chance in sequences of comparable size and base composition. We also detected an enrichment of specific, non-random subpopulations of higher-stability SLSs within the intergenic regions of several species. In low-GC Firmicutes, most of the higher-stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently carry an additional A-rich stretch at the 5' end that is complementary to the 3' uridines. In all evaluated species, the distribution of SLSs within intergenic space was clearly biased, with most SLSs concentrated on the 3'-end side of the flanking CDSs.
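The terminator-like features and the positional bias described above can be checked with a few small helpers. This is a hedged sketch, not the published method: the window size and counts (`window`, `min_u`, `min_a`) are hypothetical, hairpin coordinates are assumed to come from any stem-loop search, and `relative_positions` assumes the upstream CDS lies on the plus strand and ends at `igr_start`.

```python
def terminator_features(seq, start, stem, loop, window=8, min_u=4, min_a=4):
    """Flag rho-independent-terminator-like features around one hairpin.

    start, stem, loop: 0-based start of the 5' arm, stem length, loop length.
    Returns (has_u_tract, has_a_rich_5p): a U-rich 3' tail, as in canonical
    terminators, and an A-rich 5' stretch that could pair with those uridines.
    Thresholds are hypothetical, not published values.
    """
    rna = seq.upper().replace("T", "U")
    end = start + 2 * stem + loop            # first base after the 3' arm
    tail3 = rna[end:end + window]            # downstream window
    tail5 = rna[max(0, start - window):start]  # upstream window
    return tail3.count("U") >= min_u, tail5.count("A") >= min_a


def relative_positions(midpoints, igr_start, igr_end):
    """Scale SLS midpoints to 0..1 inside an intergenic region.

    Assumes the upstream CDS ends at igr_start on the plus strand, so values
    near 0 lie on the 3'-end side of that CDS.
    """
    length = igr_end - igr_start
    return [(m - igr_start) / length for m in midpoints]


def fraction_near_3p_end(midpoints, igr_start, igr_end, cutoff=0.5):
    """Fraction of SLSs lying in the part of the region nearest the CDS end."""
    rel = relative_positions(midpoints, igr_start, igr_end)
    return sum(r < cutoff for r in rel) / len(rel)
```

Pooled over many intergenic regions, a fraction well above `cutoff` would indicate the kind of 3'-side bias described in the text.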
A second analysis based on clustering procedures revealed that in 29 of the 40 analyzed genomes, SLSs can be grouped by sequence similarity. These SLSs correspond to about 1% of the whole population and have a substantially higher propensity to fold into a stable secondary structure than the initial set. Further refinement led to the identification of 92 families of repeated sequences, most of which share a common SLS. Twenty-five of these include all the well-known SLS-containing repeats, as well as some families reported in the literature but not analyzed in detail. The remaining 67 families had not been previously described. Two thirds of the families share a common predicted secondary structure and are located within intergenic regions.
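The source does not specify the clustering procedure, so the following is only a stand-in: a minimal single-linkage clustering in which two SLS-containing sequences are linked when their k-mer sets have a Jaccard similarity at or above a threshold. `cluster_families`, `k`, `threshold` and `min_size` are hypothetical names and values; alignment-based similarity scores are a common alternative.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_families(seqs, k=4, threshold=0.5, min_size=2):
    """Single-linkage clustering via union-find; returns index lists.

    Groups smaller than min_size (e.g. singletons) are discarded.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s.upper(), k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    families = [sorted(g) for g in groups.values() if len(g) >= min_size]
    return sorted(families, key=lambda g: g[0])


def clustered_fraction(families, n_total):
    """Share of all input sequences assigned to any family."""
    return sum(len(f) for f in families) / n_total
```

`clustered_fraction` corresponds to the share of SLSs that can be grouped by similarity (about 1% in the analysis above). All-against-all comparison is quadratic in the number of sequences, so genome-scale data would need an indexed search.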
References
- Petrillo M., Silvestro G., Di Nocera P.P., Boccia A. and Paolella G. Stem-loop structures in prokaryotic genomes. BMC Genomics 2006, 7:170.
- Cozzuto L., Petrillo M., Silvestro G., Di Nocera P.P. and Paolella G. Systematic identification of stem-loop-containing sequence families in bacterial genomes. Submitted.