SRS databases

From Wiki CEINGE

Revision as of 22:58, 14 June 2007, by Mauro (previous revision: 20:40, 12 June 2007)


At CEINGE, SRS is used as the main web interface for accessing biological databases. A purpose-built automatic procedure keeps the databases up to date: every night the public servers are checked for new releases, and when a new release is found its data files are downloaded and automatically indexed.
Through SRS, more than 60 public databases are available at CEINGE. They are stored as flat files on a dedicated file server and occupy about 1 TB (1,000 GB) of disk space.
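The nightly update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual CEINGE procedure (which this page does not describe): the release identifiers published upstream are assumed to be already collected into a dictionary, the locally indexed releases are assumed to be tracked in a hypothetical JSON state file, and `download` and `index` are hypothetical callbacks standing in for the real file transfer and SRS indexing steps.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def update_databases(
    remote_releases: Dict[str, str],
    state_file: Path,
    download: Callable[[str, str], None],
    index: Callable[[str, str], None],
) -> List[str]:
    """Download and re-index each database whose upstream release is new.

    remote_releases maps a database name (e.g. "EMBL") to the release
    identifier currently published on the public server (assumed input).
    state_file is a hypothetical JSON file recording the release last
    indexed locally for each database.
    Returns the names of the databases updated in this run.
    """
    local = json.loads(state_file.read_text()) if state_file.exists() else {}
    updated = []
    for name, release in sorted(remote_releases.items()):
        if local.get(name) == release:
            continue  # already current: nothing to download tonight
        download(name, release)  # hypothetical step: fetch the new flat files
        index(name, release)     # hypothetical step: rebuild the SRS indices
        local[name] = release    # record the release only after indexing succeeded
        updated.append(name)
        # persist after each database so a later failure does not lose progress
        state_file.write_text(json.dumps(local, indent=2, sort_keys=True))
    return updated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "releases.json"
        state.write_text(json.dumps({"EMBL": "90", "UNIPROT": "12.0"}))
        log = []
        remote = {"EMBL": "91", "UNIPROT": "12.0"}
        first = update_databases(
            remote,
            state,
            download=lambda n, r: log.append(("download", n, r)),
            index=lambda n, r: log.append(("index", n, r)),
        )
        print(first)  # ['EMBL']
        print(log)    # [('download', 'EMBL', '91'), ('index', 'EMBL', '91')]
        # a second run finds nothing new
        print(update_databases(remote, state, lambda n, r: None, lambda n, r: None))  # []
```

Because a release is recorded only after its indexing step returns, a download or indexing failure leaves the old release in the state file, so that database is simply retried on the following night.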

Available databases

Practically all the most used public databases are available, such as:

  • DNA databases:
    • EMBL: The EMBL nucleotide sequence database including updates
    • REFSEQ: Database providing non-redundant curated data representing knowledge of known genes
    • FANTOMn: Database of the mouse transcriptome
    • UTRnr: 5' and 3' Untranslated Regions Database
    • IMGT: ImMunoGeneTics database, containing nucleotide sequences of immune-system-related genes
    • EMBLWGS: The EMBL nucleotide sequence database: whole-genome shotgun (WGS) sequences
    • REFSEQNEW: RefSeq updates. Database providing non-redundant curated data representing knowledge of known genes
  • PROTEIN databases:
    • REFSEQP: Database of protein information from NCBI
    • UNIPROT: The UniProt Knowledgebase is the central database of protein sequences with accurate, consistent, and rich sequence and functional annotation
    • REMTREMBL: REM-TrEMBL (REMaining TrEMBL) contains translations of EMBL nucleotide sequences that will not be included in TrEMBL
    • UNIREF100: Non-redundant sequence database that combines identical sequences and sub-fragments from the same organism into a single UniRef entry
    • UNIREF90: A non-redundant sequence set based on UniRef100, with each sequence representing a cluster of sequences with at least 90% sequence identity
    • UNIREF50: A non-redundant sequence set based on UniRef100, with each sequence representing a cluster of sequences with at least 50% sequence identity
    • FANTOMp: Database of translations of the mouse transcriptome
    • IMGTHLA: The IMGT/HLA Database is part of the international ImMunoGeneTics (IMGT) project
    • IPI: International Protein Index, a top-level guide to the main proteome databases
    • REFSEQPNEW: RefSeq protein updates. Database of protein information from RefSeq
    • UNIPROT_SWISSPROT: The UniProt Knowledgebase is the central database of protein sequences with accurate, consistent, and rich sequence and functional annotation. UniProt/Swiss-Prot contains manually annotated records with information extracted from the literature and curator-evaluated computational analysis
    • UNIPROT_TREMBL: The UniProt Knowledgebase is the central database of protein sequences with accurate, consistent, and rich sequence and functional annotation. UniProt/TrEMBL consists of computationally analyzed records that await full manual annotation
  • GENE-related databases:
    • ENTREZGENE: NCBI's database for gene-specific information.
    • EPD: Eukaryotic Promoter Database, Philipp Bucher (1996)
    • UNIGENE: Unique gene cluster database from NCBI
    • UNISEQ: Sub-component of the UniGene db. Contains the sequence information from UniGene.
    • UTRSITE: Sub-component of the UTRnr
    • HGBASE: Human Genic Bi-Allelic Sequences Database
    • RHDB: The RHDB Radiation Hybrid Mapping Submissions database
    • RHEXP: The RHDB Radiation Hybrid Mapping Experimental Conditions database
    • RHMAP: The RHDB Radiation Hybrid Map Information database
    • RHPANEL: The radiation hybrid (RH) mapping panels database
  • PROTEIN-related databases:
    • INTERPRO: Integrated Resource of Protein Domains and Functional Sites
    • IPRMATCHES: All matches of InterPro signatures found in Swiss-Prot and TrEMBL entries
    • PROSITE: A Dictionary of Protein Sites and Patterns - A. Bairoch
    • BLOCKS: The Blocks database of multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
    • PRINTS: Protein Motif Fingerprint Database
    • PFAMA: The A division (human curated) of the Pfam database. Alignments of protein domains and conserved regions.
    • PFAMB: The B division (automatically clustered) of the Pfam database. Alignments of protein domains and conserved regions.
    • SWISSPFAM: An annotated description of how Pfam domains map to (possibly multidomain) Swiss-Prot entries.
    • PFAMHMM: PfamHmm database. The Hidden Markov Models (HMMs) derived from the Pfam seed alignments.
    • PFAMSEED: PfamSeed database. Hand-edited seed alignments representing each domain.
    • PRODOM: A comprehensive collection of protein domain families
  • ONTOLOGY databases:
    • GOA: Gene Ontology Annotation of UniProtKB
    • GO: Gene Ontology database
  • 3D structure databases:
    • NRL3D: PIR-NRL3D Sequence-Structure Database.
    • PDB: Protein Data Bank (PDB) - repository for the processing and distribution of 3-D biological macromolecular structure data
    • PDBFINDER: Directory for the Brookhaven Protein Data Bank. Constructed from the PDB, DSSP and HSSP databases
  • Metabolic pathway databases:
    • PATHWAY: Kyoto Encyclopedia of Genes and Genomes (KEGG)
    • LENZYME: Ligand Chemical Database for Enzyme Reactions (enzyme section)
    • LCOMPOUND: Ligand Chemical Database for Enzyme Reactions (compound section)
    • ENZYME: Database of enzyme nomenclature
  • Reference databases:
    • TAXONOMY: Contains names of all organisms represented in sequence databases by at least one nucleotide or protein sequence
    • GENETICCODE: NCBI database of genetic codes
    • OMIM: Online Mendelian Inheritance in Man database.
    • REBASE: Restriction Enzyme database


You can browse and query all the available databases on SRS@CEINGE (http://bioinfo.ceinge.unina.it/srs7131/), where the Information section lists all of them.

Old list

  • BLOCKS: Protein domains
  • CD40LBASE: CD40 mutations
  • EMBL: DNA sequence database
  • EMBLCONTIGS: Contigs only
  • EMBLTPA: Third Party Annotation sequences
  • EMBLWGS: WGS DNA sequences
  • ENSEMBL: Human chromosomes
  • ENSEMBL_HUM_CDNA: Human cDNA
  • ENSEMBL_HUM_PEP: Human proteins
  • ENTREZGENE: Gene classification database
  • ENZYME: Enzyme classification
  • EPD: Eukaryotic promoters
  • FANTOM: Mouse cDNA
  • G6PD: G6PD mutations
  • GENETICCODE: Genetic codes
  • GO: Ontologies
  • GOA: Ontologies
  • HGBASE: SNP database
  • HSAGENES: Human gene classification
  • IMGT: Immunoglobulin sequences only
  • IMGTHLA: HLA sequences only
  • INTERPRO: Protein domains
  • IPI: Protein sequence database
  • LCOMPOUND: Metabolic pathways
  • LDLR: LDLR mutations
  • LENZYME: Metabolic pathways
  • LOCUSLINK: Loci
  • NRL3D: Proteins from PDB
  • OMIM: Mendelian diseases
  • OMIMALLELE: Mendelian disease alleles
  • P53: P53 mutations
  • PATHWAY: Metabolic pathways
  • PDB: Protein structure database
  • PDBFINDER: Protein structure database
  • PFAMA: Protein families
  • PFAMB: Protein families
  • PFAMHMM: Protein families
  • PFAMSEED: Protein families
  • PIR: Protein sequence database
  • PRINTS: Protein domains
  • PRODOM: Protein families
  • PROSITE: Protein domains
  • PROSITEDOC: Protein domains
  • REBASE: Restriction enzyme database
  • REMTREMBL: Protein sequence database
  • RHDB: Radiation hybrid maps
  • RHEXP: Radiation hybrid maps
  • RHMAP: Radiation hybrid maps
  • RHPANEL: Radiation hybrid maps
  • SPTREMBL: Protein sequence database
  • SWALL: Protein sequence database
  • SWISSPFAM: Protein families
  • SWISSPROT: Protein sequence database
  • TAXONOMY: Organism taxonomy
  • TFCELL: Transcription factors
  • TFCLASS: Transcription factors
  • TFFACTOR: Transcription factors
  • TFGENE: Transcription factors
  • TFMATRIX: Transcription factors
  • TFSITE: Transcription factors
  • TREMBL: Translations of EMBL coding sequences
  • TREMBLNEW: Protein sequence database
  • UNIGENE: Gene database
  • UNIPROT: Protein sequence database
  • UNIREF: Protein sequence database
  • UNISEQ: DNA sequence database