DG-CST

From Wiki CEINGE

Revision as of 16:54, 20 June 2007 by Mauro

The DG-CST (Disease Gene Conserved Sequence Tags) database is a collection of conserved sequence elements identified by a systematic genomic sequence comparison between a set of more than 1000 human genes involved in the pathogenesis of genetic disorders and their murine counterparts. About 8% of the analyzed genomic regions are conserved and, surprisingly, most of the conserved elements are located in introns or intergenic regions. Each CST has also been extensively annotated, with the aim of identifying any biological role that may explain its conservation (see below).

How to reach and use DG-CST

The database is available at http://dgcst.ceinge.unina.it, where CSTs can be queried and selected by both sequence and annotations. A graphic browser allows direct visualization of the CSTs and related annotations within the context of the corresponding gene and its transcripts.

DG-CST Graphic browser

Available annotations of CSTs

Human and mouse genomic sequences were compared with BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences. Aligned sequences longer than 100 bp with identity above 70% were selected as CSTs and imported into the database. CST counterparts in other species were identified by scanning the genomes of those species with BLAST and selecting matches on the basis of homology and colinearity. In the database, CSTs are stored together with a large number of annotations, including:

  • genomic location, i.e. chromosome, position, relationship with the closest gene and with the selected disease gene (often coincident);
  • sequence content, i.e. sequence, length, GC percentage;
  • identity between human and mouse sequences, number of gaps, polarity;
  • BLAST matches with other CSTs, as well as with other human genomic sequences;
  • BLAST matches versus non-redundant nucleotide databases;
  • conservation in other species, as assessed by BLAST analysis versus the draft fugu, chicken, rat and zebrafish genome sequences;
  • classification of CSTs as ‘intronic’, ‘intergenic’ or ‘exonic’, based on Ensembl gene annotations;
  • presence of single nucleotide polymorphisms (SNPs), as reported in Ensembl;
  • presence of palindromes and tandem repeats;
  • predictions of putative RNA secondary structures;
  • presence of putative transcription factor (TF) binding sites.
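
The selection thresholds and a few of the simpler annotations listed above (identity, number of gaps, GC percentage) can be illustrated with a minimal Python sketch. It is not the code used to build DG-CST: the `Alignment` container and all function names are hypothetical, the alignment is assumed to be given as two equal-length gapped strings, and measuring length in bases on the ungapped human sequence is an assumption, since the text does not say how length was counted.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; length is assumed to be counted in bases.
MIN_LENGTH = 100
MIN_IDENTITY = 70.0


@dataclass
class Alignment:
    """Hypothetical container: two aligned strings of equal length, '-' marks a gap."""
    human: str
    mouse: str


def percent_identity(aln: Alignment) -> float:
    """Percentage of alignment columns in which both sequences carry the same base."""
    columns = list(zip(aln.human.upper(), aln.mouse.upper()))
    if not columns:
        return 0.0
    matches = sum(1 for h, m in columns if h == m and h != "-")
    return 100.0 * matches / len(columns)


def gap_count(aln: Alignment) -> int:
    """Total number of gap characters in the two aligned sequences."""
    return aln.human.count("-") + aln.mouse.count("-")


def gc_percentage(seq: str) -> float:
    """Percentage of G and C bases in a sequence, ignoring gap characters."""
    seq = seq.upper().replace("-", "")
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def is_cst(aln: Alignment) -> bool:
    """Apply the 'longer than 100, identity better than 70%' rule from the text."""
    human_length = len(aln.human.replace("-", ""))  # assumption: ungapped human length
    return human_length > MIN_LENGTH and percent_identity(aln) > MIN_IDENTITY
```

For example, `is_cst(Alignment("ACGT" * 30, "ACGT" * 30))` accepts a 120-base perfect match, while the same pair truncated to exactly 100 bases is rejected because the comparison is strict.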

In order to identify CSTs potentially representing novel transcribed/coding elements, different tests were performed, such as:

  • determination of maximum ORF size;
  • presence of putative splice sites and exonic splicing enhancers;
  • exon predictions based on GENSCAN;
  • BLAST matches with expressed sequence tags (ESTs) and non-redundant protein databases.
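
As an illustration of the first test in the list, the following Python sketch computes a maximum ORF size for a sequence. The page does not say how ORFs were defined for DG-CST, so the definition used here is an assumption: an ORF runs from the first ATG in a reading frame to the next in-frame stop codon, its length includes the stop codon, all six frames are scanned, and an ATG without a downstream stop is not counted. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _max_orf_one_strand(seq: str) -> int:
    """Longest ATG-to-stop ORF (in bases) over the three frames of one strand."""
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # length includes the stop codon
                start = None
    return best


def max_orf_length(seq: str) -> int:
    """Longest ORF (in bases) over all six reading frames of the sequence."""
    seq = seq.upper()
    return max(_max_orf_one_strand(seq),
               _max_orf_one_strand(reverse_complement(seq)))
```

With this definition, `max_orf_length("ATGAAATAG")` is 9, and the same value is obtained for its reverse complement `"CTATTTCAT"`, because the ORF is then found on the opposite strand.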


References

Boccia A, Petrillo M, di Bernardo D, Guffanti A, Mignone F, Confalonieri S, Luzi L, Pesole G, Paolella G, Ballabio A, Banfi S. DG-CST (Disease Gene Conserved Sequence Tags), a database of human-mouse conserved elements associated to disease genes. Nucleic Acids Res. 2005, 33(Database issue):D505-10.

BLASTZ is available at http://bio.cse.psu.edu.


