DG-CST

From Wiki CEINGE

Difference between revisions: 18:07, 8 June 2007 (Angelo) and 15:32, 20 June 2007 (Mauro)

Revision as of 15:32, 20 June 2007

The DG-CST database is a collection of conserved sequence elements, identified by a systematic genomic sequence comparison between a set of more than 1000 human genes involved in the pathogenesis of genetic disorders and their murine counterparts. Human and mouse genomic sequences were compared with BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences. Aligned sequences longer than 100 bp and with identity greater than 70% were selected as CSTs and imported into the database. CST counterparts in other species were identified by using BLAST to scan the genomes of those species and selecting matches on the basis of homology and colinearity.
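The selection step described above can be illustrated with a minimal sketch. It is not the pipeline actually used to build DG-CST: the Alignment record, the function names, and the choice to measure both length and identity over alignment columns (gaps included) are assumptions made for illustration only.

```python
from dataclasses import dataclass

MIN_LENGTH = 100      # a CST must be longer than this (threshold from the text)
MIN_IDENTITY = 70.0   # a CST must have identity above this percentage


@dataclass
class Alignment:
    """One human/mouse alignment block (hypothetical record layout)."""
    human_seq: str  # aligned human sequence, '-' marks a gap
    mouse_seq: str  # aligned mouse sequence, same length as human_seq


def percent_identity(aln):
    """Identical columns as a percentage of all alignment columns.

    Assumption: gap columns count in the denominator but never as matches.
    """
    if len(aln.human_seq) != len(aln.mouse_seq):
        raise ValueError("aligned sequences must have equal length")
    if not aln.human_seq:
        return 0.0
    pairs = zip(aln.human_seq.upper(), aln.mouse_seq.upper())
    matches = sum(1 for h, m in pairs if h == m and h != "-")
    return 100.0 * matches / len(aln.human_seq)


def is_cst(aln):
    """Apply the two thresholds stated in the text."""
    return len(aln.human_seq) > MIN_LENGTH and percent_identity(aln) > MIN_IDENTITY
```

A block of exactly 100 columns is rejected under this reading, because the text asks for sequences "longer than 100".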

Annotation of CSTs

In the database, CSTs are collected with a large number of annotations, including:

  • genomic location, i.e. chromosome, position, relationship with the closest gene and with the selected disease gene (often coincident);
  • sequence content, i.e. sequence, length, GC percentage;
  • identity between human and mouse sequences, number of gaps, polarity;
  • BLAST matches with other CSTs, as well as with other human genomic sequences;
  • BLAST matches versus non-redundant nucleotide databases;
  • conservation in other species, as assessed by BLAST analysis versus the draft genome sequences of fugu, chicken, rat and zebrafish;
  • classification of CSTs as ‘intronic’, ‘intergenic’ or ‘exonic’, based on Ensembl gene annotations;
  • presence of single nucleotide polymorphisms (SNPs), as reported in Ensembl;
  • predicted putative RNA secondary structures;
  • presence of putative transcription factor (TF) binding sites;
  • presence of palindromes and tandem repeats.
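Two of the sequence-content annotations listed above, GC percentage and palindromes, are simple enough to sketch in a few lines. The following is only an illustration under stated assumptions: the function names are hypothetical, a palindrome is taken to be a window equal to its own reverse complement, and the window length k is an arbitrary choice, none of which is specified by the database description.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percentage(seq):
    """Percentage of G and C bases over the full sequence length.

    Assumption: ambiguous bases such as N count in the denominator.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only are mapped)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_palindromes(seq, k=6):
    """Return (start, window) for every length-k window that equals its
    own reverse complement. Windows with non-ACGT characters are skipped."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT") and window == reverse_complement(window):
            hits.append((i, window))
    return hits
```

For example, find_palindromes("TTGAATTCAA") reports the GAATTC site at offset 2.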

Several tests were performed on the CSTs to identify those that potentially represent transcribed or coding elements, including:

  • determination of maximum ORF size;
  • presence of putative splice sites, exonic splicing enhancers;
  • exon predictions based on GENSCAN;
  • BLAST matches with expressed sequence tags (ESTs) and non-redundant protein databases.
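As an illustration of the first test in the list, the sketch below computes a maximum ORF size for a sequence. The text does not say how an ORF was defined for DG-CST, so this assumes one common convention for conserved fragments: the longest run of consecutive non-stop codons in any of the six reading frames, without requiring a start codon. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only are mapped)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_open_run(seq, frame):
    """Longest run of consecutive non-stop codons in one reading frame,
    returned in nucleotides. Trailing partial codons are ignored."""
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0          # a stop codon closes the current run
        else:
            current += 3
            best = max(best, current)
    return best


def max_orf_size(seq):
    """Maximum open run over the three forward and three reverse frames."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return max(longest_open_run(s, f) for s in strands for f in range(3))
```

Because both strands are scanned, a sequence whose forward frames are interrupted by stop codons can still score highly on the reverse strand. A definition that requires an ATG start would give smaller values.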




References

BLASTZ is available at http://bio.cse.psu.edu.
