Eukaryotic CSTs

Comparative genomics makes it possible to identify sequences that are strongly conserved between species and are therefore likely to have a functional role. Such conservation mainly reflects the stronger selective pressure acting on sequences that carry functional elements, compared with more 'neutral' sequences, which can tolerate a higher number of random mutation events.

Along this line, a large number of 'conserved sequence tags' (CSTs), characterized by a variable degree of conservation in other species, were identified in the human genome and collected in two databases: DG-CST, originally produced in a collaborative effort with a small group of Italian research institutes and currently maintained at CEINGE, and KINWEB, produced in collaboration with ITB, Milano.

Apart from coding sequences, the identified CSTs tend to include a relatively large number of functional elements, such as structural and regulatory non-coding RNAs and DNA sequence elements involved in the control of gene expression. In fact, a large number of CSTs from intronic and intergenic regions have no features typical of coding sequences, and their functional roles should therefore be sought outside the context of protein coding: transcriptional control, scaffold attachment, splicing control and RNA stability are a few candidate areas worth exploring in these cases.

The identified CSTs are currently being analyzed in a number of projects aimed at their functional evaluation through further computational characterization and, in collaboration with other groups, at assessing their relevance for transcriptional regulation and their role in genetically transmitted diseases.

For example, CSTs can provide valuable information for selecting transcriptional regulatory elements. Sequence alone is usually not considered sufficient to identify true transcription factor binding sites, which are typically very short and degenerate. However, transcriptional 'enhancers' and 'silencers' are often organized as clusters of binding sites for transcriptional regulatory molecules. As a consequence, as shown by other groups, variations in the local density of binding sites are useful in many cases to distinguish specific from random findings. In our approach we identify transcriptionally active modules by searching within CSTs for clusters of sites recognized by more than one factor, followed by experimental validation in close collaboration with other groups.
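The cluster search described above can be illustrated with a minimal sketch. It is not the pipeline used for DG-CST or KINWEB: the function name `find_site_clusters`, the thresholds (`window`, `min_sites`, `min_factors`) and the example positions are all assumptions chosen for illustration. The input is assumed to be a list of predicted binding sites, each given as a position within a CST and the name of the factor predicted to bind there.

```python
from typing import List, Set, Tuple

Site = Tuple[int, str]                 # (position within the CST, factor name)
Cluster = Tuple[int, int, Set[str]]    # (start, end, factors found in the cluster)


def find_site_clusters(
    sites: List[Site],
    window: int = 200,      # hypothetical window length, in bases
    min_sites: int = 3,     # hypothetical minimum number of sites per cluster
    min_factors: int = 2,   # "more than one factor", as in the text
) -> List[Cluster]:
    """Report regions where predicted sites are dense and involve several factors."""
    ordered = sorted(sites)
    clusters: List[Cluster] = []
    for i, (start, _) in enumerate(ordered):
        # All sites that fall within `window` bases downstream of this site.
        members = [s for s in ordered[i:] if s[0] - start <= window]
        factors = {name for _, name in members}
        if len(members) < min_sites or len(factors) < min_factors:
            continue
        end = members[-1][0]
        if clusters and start <= clusters[-1][1]:
            # Overlapping windows are merged, so a reported cluster
            # can span more than `window` bases.
            prev_start, prev_end, prev_factors = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end), prev_factors | factors)
        else:
            clusters.append((start, end, factors))
    return clusters


if __name__ == "__main__":
    # Hypothetical predictions: positions and factor names are illustrative only.
    predicted = [(10, "SP1"), (40, "AP1"), (90, "SP1"),
                 (800, "SP1"), (820, "SP1"), (830, "SP1"), (2000, "NFKB")]
    for start, end, factors in find_site_clusters(predicted):
        print(start, end, sorted(factors))
```

In this sketch the three sites around position 800 are dense but belong to a single factor, so they are not reported; only regions combining density with more than one factor pass the filter. Candidates obtained this way would still require experimental validation.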

The study of CSTs in genes related to genetically transmitted diseases may also lead to the identification of novel elements involved in the molecular mechanisms underlying the disease. For example, the molecular diagnosis of many monogenic diseases is based on the identification of mutations affecting the coding sequence in both alleles. In some cases, however, no mutation is identified in the exonic sequence. This subgroup of patients is a good candidate for CST analysis.

Sequences collected in the DG-CST and KINWEB databases are extensively annotated for features concerning chromosomal localization, nucleotide composition, degree of conservation in other species, potential to be transcribed, presence of putative transcription factor binding sites and so on. These annotations highlight sequence features that, although weak as a signal, may be associated with a functional role. Taken together, these data contain information that may be useful, if adequately recognized, to identify patterns common to groups of sequence elements. Annotations can be fed to clustering algorithms to search for similarities among sequence annotations and reveal homogeneous groups.
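As a rough illustration of how annotations might be fed to a clustering algorithm, the sketch below groups CSTs by single-linkage clustering at a fixed distance cutoff. The source does not say which algorithm is used, so the choice of method, the function names, the three annotation columns (GC content, percent conservation, number of predicted binding sites) and the cutoff are all assumptions. Each column is standardized first so that features measured on different scales contribute comparably to the distance.

```python
import math
from statistics import mean, pstdev
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every annotation column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 to leave it at zero.
    spreads = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, spreads)] for row in rows]


def single_linkage_groups(rows: List[List[float]], cutoff: float) -> List[List[int]]:
    """Group row indices whose standardized annotations are chained by
    Euclidean distances of at most `cutoff` (single linkage at a fixed cutoff)."""
    scaled = standardize(rows)
    parent = list(range(len(scaled)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(scaled)):
        for j in range(i + 1, len(scaled)):
            if math.dist(scaled[i], scaled[j]) <= cutoff:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(scaled)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical annotation table, one row per CST:
    # [GC content, percent conservation, number of predicted binding sites]
    annotations = [
        [0.62, 95.0, 8],
        [0.60, 93.0, 7],
        [0.38, 72.0, 1],
        [0.40, 70.0, 2],
    ]
    print(single_linkage_groups(annotations, cutoff=1.0))
```

Single linkage is used here only because it is short and deterministic; other clustering methods could be substituted, and categorical annotations such as chromosomal localization would need a suitable encoding before being used this way.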

References

  • BOCCIA A., PETRILLO M., DI BERNARDO D., GUFFANTI A., MIGNONE F., CONFALONIERI S., LUZI L., PESOLE G., PAOLELLA G., BALLABIO A., BANFI S. (2005). DG-CST (Disease Gene Conserved Sequence Tags), a database of human-mouse conserved elements associated to disease genes. Nucleic Acids Research, vol. 33, pp. D505-D510. ISSN: 0305-1048.
  • MILANESI L., PETRILLO M., SEPE L., BOCCIA A., D'AGOSTINO N., PASSAMANO M., DINARDO S., CASADIO R., PAOLELLA G. (2005). Systematic analysis of human kinase genes: a large number of genes and alternative splicing events result in functional and structural diversity. BMC Bioinformatics, vol. 6 (Suppl 4), S20. ISSN: 1471-2105.
