6. 260,000 named species
About 1,700 species are added per month
12% are of human origin and 8% are human ESTs
The top species
Homo sapiens ------- 12.7 billion
Mus musculus ------- 8.3 billion
Rattus norvegicus ------- 5.8 billion
Bos taurus ------- 3.8 billion
Zea mays ------- 3.6 billion
9. WGS Accession numbers are issued to these sequences
e.g., AAAA01072744
AAAA Project ID
01 Version number
072744 Contig number
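The breakdown above can be applied mechanically. What follows is a minimal Python sketch, assuming only the layout shown in the example (4 letters, 2 digits, 6 digits). Accessions of other lengths exist and are rejected by this sketch. The names `WgsAccession` and `parse_wgs_accession` are illustrative and not part of any NCBI tool.

```python
import re
from typing import NamedTuple


class WgsAccession(NamedTuple):
    project_id: str   # e.g. "AAAA"
    version: int      # e.g. 1 (written "01")
    contig: int       # e.g. 72744 (written "072744")


# Assumption: only the 4-letter + 2-digit + 6-digit layout from the slide.
_WGS_RE = re.compile(r"^([A-Z]{4})(\d{2})(\d{6})$")


def parse_wgs_accession(accession: str) -> WgsAccession:
    """Split a WGS accession into project ID, version and contig number."""
    match = _WGS_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a 4+2+6 WGS accession: {accession!r}")
    project, version, contig = match.groups()
    return WgsAccession(project, int(version), int(contig))


if __name__ == "__main__":
    print(parse_wgs_accession("AAAA01072744"))
```

Converting the version and contig fields to integers drops the leading zeros, so keep the original string if the accession has to be written out again.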
TPA: Third Party Annotation
1. Experimental
2. Inferential
11. BankIt
Use BankIt if:
you have one or a few sequence submissions
you prefer to use a WWW-based submission tool
your sequence annotation is not complicated
you do not require sequence analysis tools to submit your sequence(s)
12. Sequin
Use Sequin if:
you are submitting long or complex submissions
you are submitting mutation, phylogenetic, population, environmental, or segmented sets
you would like graphical viewing and editing options, including the alignment editor
you would like network access to related analytical tools
21. STACK (Sequence Tag Alignment and Consensus Knowledgebase)
Ribosomal database
HIV sequence database
EPD (Eukaryotic Promoter Database)
REBASE
22. SwissProt
Curated protein sequence database
High level of annotation
Description of the function
Domain structure
Post-translational modifications (PTMs)
Variants
TrEMBL
Consists of entries in SWISS-PROT-like format
23. PIR-PSD
Protein Information Resource - Protein Sequence Database
World's first database of classified and functionally annotated protein sequences
Grew out of The Atlas of Protein Sequence and Structure
26. Consensus Sequence Databases
Multiple Alignment
↓
A single sequence in which each residue is the most common or
consensus for the sequence family
↓
Consensus Sequence Database
27. Consensus Sequence Databases
Disadvantage:
Much information from the sequences that do not
contain the consensus residue is ignored, even though these
hold information about allowed substitutions.
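To make the consensus idea and its disadvantage concrete, here is a minimal Python sketch. It assumes an ungapped alignment of equal-length sequences and breaks ties alphabetically. Both are simplifications chosen for illustration, and the four sequences are made up. The function names `consensus` and `discarded_residues` are hypothetical.

```python
from collections import Counter


def consensus(alignment):
    """Return the most common residue in each column of the alignment."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    residues = []
    for column in zip(*alignment):
        counts = Counter(column)
        top = max(counts.values())
        # Assumption: ties are broken alphabetically to keep output stable.
        residues.append(min(r for r, c in counts.items() if c == top))
    return "".join(residues)


def discarded_residues(alignment):
    """For each column, list residues that occur but are not the consensus."""
    cons = consensus(alignment)
    return [sorted(set(col) - {c}) for col, c in zip(zip(*alignment), cons)]


if __name__ == "__main__":
    # Made-up toy alignment, not taken from any database.
    toy = ["ACDEF", "ACDKF", "GCDEF", "ACEEF"]
    print(consensus(toy))           # ACDEF
    print(discarded_residues(toy))  # [['G'], [], ['E'], ['K'], []]
```

The second function lists exactly what the consensus throws away: the G, E and K in the toy alignment are observed substitutions that no longer appear in the single consensus string.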
28. PROSITE
Database of sequence patterns
Associated with protein family membership.
Developed using patterns that best fit particular protein
families and functions.
30. PROSITE
Features:
1. Much shorter than the total sequence length.
2. Provide information on acceptable substitutions.
3. Provide information on shared biological functions.
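As an illustration of how a short pattern encodes acceptable substitutions, the Python sketch below translates a subset of the PROSITE pattern syntax into a regular expression and scans a sequence with it. It assumes only the basic elements: single residues, `x`, `[...]`, `{...}`, repeat counts `(n)` or `(n,m)`, and the `<` and `>` anchors. Less common forms are not handled. The function names and the test sequence are made up. The pattern used is the N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden}, optional count.
_ELEMENT_RE = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    at_start = pattern.startswith("<")
    at_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT_RE.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            regex = core                      # one residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return ("^" if at_start else "") + "".join(parts) + ("$" if at_end else "")


def find_matches(pattern: str, sequence: str):
    """Return (start, matched_text) for every match, including overlaps."""
    scanner = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))             # N[^P][ST][^P]
    print(find_matches("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))  # [(2, 'NGSA')]
```

The four-residue pattern is far shorter than any full sequence, and `[ST]` and `{P}` state directly which substitutions are tolerated at each position.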
32. PRINTS and BLOCKS
Contain multiply aligned ungapped segments.
BLOCKS - blocks
PRINTS - motifs
33. PRINTS and BLOCKS
Advantages
1. Potentially more sensitive (more distant relationships can be found)
2. More specific (fewer false positives occur)
35. OMIM
Online Mendelian Inheritance in Man
Comprehensive database of human genes and genetic
disorders.
Has numerous links to databases such as SWISS-PROT, PubMed, mutation databases, and Map Viewer.
41. Structural Databases
PDBe of EBI
MMDB
Structures derived from the PDB, with value-added features such as:
Explicit chemical graphs,
Links to literature,
Similar sequences,
Related 3D structures,
Information about chemicals