Department of Zoology, GACW (2018-2019) Page 1
Major Biological Databases
Introduction:
A database is a convenient system for storing, searching and retrieving any type of data.
It makes it easy to handle and share large amounts of data. Biological databases are libraries of life sciences
information, collected from scientific experiments, published literature, high-throughput experimental
technology and computational analysis. They contain information from genomics, proteomics,
microarray gene expression, etc.
Types of Biological Databases
1. Primary database
2. Secondary database
3. Composite database
Primary databases:
• Contain original data submitted by researchers.
• Mostly public or open access.
E.g. NCBI GenBank, EMBL, DDBJ
[Figure: Biological Databases - transcriptome databases; structure databases; genome databases; sequence databases (nucleotide, protein); model organism databases (PlasmoDB, TAIR, etc.)]
Secondary databases:
A secondary database contains additional information derived from the analysis of data
available in primary databases. Its entries may be manually created or automatically generated.
E.g. TrEMBL, Pfam, Profiles, SCOP, CATH
GenBank (Genetic Sequence Databank)
Introduction:
• GenBank® is the genetic sequence database at the National Center for Biotechnology Information
(NCBI).
• It was established in 1982 and is now maintained by NCBI.
• DNA sequences can be submitted to GenBank using several different methods.
• It contains publicly available nucleotide sequences for more than 240,000 named organisms,
obtained primarily through submissions from individual laboratories and batch submissions from
large-scale sequencing projects.
• It has a flat file structure: each record is an ASCII text file that can be read and downloaded by both
humans and computers.
Sequence Submission:
• GenBank is built from direct submissions by individual laboratories and from bulk
submissions by large-scale sequencing centers.
• Only original sequences can be submitted to GenBank.
• Direct submissions are made using BankIt, a Web-based form, or Sequin, a stand-alone
submission program.
• Upon receipt of a sequence submission, the GenBank staff checks the originality of the data,
assigns an accession number to the sequence, and performs quality assurance checks.
• The submissions are then released to the public database, where the entries can be retrieved
through Entrez or downloaded by FTP.
• Bulk submissions of Expressed Sequence Tag (EST), Sequence-Tagged Site (STS), Genome
Survey Sequence (GSS), and High-Throughput Genome Sequence (HTGS) data most often
come from large-scale sequencing centers.
• The GenBank direct submissions group also processes complete microbial genome sequences.
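As an illustration of retrieval through Entrez, the sketch below builds a request URL for NCBI's E-utilities efetch service that asks for one record as a GenBank flat file. The helper name genbank_efetch_url is hypothetical, and the accession Z92910 is simply the one quoted in the sample record described in the next section. The sketch only constructs the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_efetch_url(accession):
    """Return an efetch URL for one nucleotide record in GenBank flat file format.

    Hypothetical helper for illustration: it builds the URL but does not fetch it.
    """
    params = {
        "db": "nuccore",    # Entrez nucleotide database
        "id": accession,    # accession or accession.version
        "rettype": "gb",    # GenBank flat file
        "retmode": "text",  # plain ASCII text
    }
    return EFETCH_BASE + "?" + urlencode(params)


url = genbank_efetch_url("Z92910")
print(url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the record as plain text.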
GenBank Flat File Format
The GenBank flat file format consists of an annotation section and a sequence section.
The annotation section starts at a line beginning with the word "LOCUS". The sequence section
starts at a line beginning with the word "ORIGIN" and ends at a line containing only "//".
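The two-section layout can be illustrated with a short Python sketch that splits a record at the "ORIGIN" and "//" markers. The function name split_genbank_record and the toy record are made up for illustration; the toy is not a real GenBank entry.

```python
def split_genbank_record(text):
    """Split one GenBank flat file record into (annotation, sequence).

    annotation: the lines from LOCUS up to, but not including, ORIGIN.
    sequence:   the bases between ORIGIN and the "//" terminator, with
                position numbers and spaces removed.
    """
    annotation = []
    bases = []
    section = None
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            section = "annotation"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue  # the ORIGIN line itself carries no sequence data
        elif line.strip() == "//":
            break  # end of record
        if section == "annotation":
            annotation.append(line)
        elif section == "sequence":
            bases.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(annotation), "".join(bases)


# A made-up toy record, not a real GenBank entry.
toy = """LOCUS       TOY1          12 bp    DNA             SYN       01-JAN-2000
DEFINITION  Toy record for illustration.
ORIGIN
        1 acgtacgtac gt
//
"""
annotation, sequence = split_genbank_record(toy)
print(sequence)
```

Keeping the annotation as raw lines leaves the field-by-field interpretation, described next, to a separate step.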
1. The LOCUS field:
It consists of five subfields, namely:
• 1a Locus Name (e.g. HSHFE) - A tag for grouping similar sequences.
• The first two or three letters usually designate the organism; in this case, HS stands for Homo sapiens.
The remaining characters are associated with another group designation, such as the gene product.
In this example, the last three characters represent the gene symbol, HFE.
• 1b Sequence Length (12146 bp) - The total number of nucleotide base pairs (or amino acid
residues) in the sequence record.
• 1c Molecule Type (e.g. DNA) - The type of molecule that was sequenced.
• 1d GenBank Division (PRI) - The GenBank division to which the record belongs.
In this example, PRI stands for primate sequences.
• Other divisions include ROD (rodent sequences), MAM (other mammalian sequences), PLN (plant,
fungal, and algal sequences), and BCT (bacterial sequences).
• 1e Modification Date (23-July-1999) - The date of the most recent modification made to the record.
2. DEFINITION - A brief description of the sequence.
• The description may include the source organism name, gene or protein name, or a designation as
untranscribed or untranslated sequence (e.g., a promoter region).
• For sequences containing a coding region (CDS), the definition field may also contain a
"completeness" qualifier such as "complete CDS" or "exon 1."
3. ACCESSION (Z92910) - A unique identifier assigned to a complete sequence record.
• This number never changes, even if the record is modified.
4. VERSION (Z92910.1) - An identification number assigned to a single, specific sequence in
the database.
• This number is in the format "accession.version".
• If any changes are made to the sequence data, the version part of the number increases by one.
• E.g. U12345.1 becomes U12345.2.
5. GenInfo Identifier (GI) (1890179) - Also a sequence identification number.
• Whenever a sequence is changed, the version number is increased and a new GI is assigned.
6. KEYWORDS (haemochromatosis; HFE gene) - A keyword can be any word or phrase used
to describe the sequence.
7. SOURCE (human) - Usually contains an abbreviated or common name of the source organism.
8. ORGANISM (Homo sapiens) - The scientific name (usually genus and species) of the source organism.
9. REFERENCE - Citations of publications by the sequence authors that support the information
presented in the sequence record.
• Several references may be included in one record.
• References are automatically sorted from the oldest to the newest.
• Cited publications are searchable by author, article or publication title, journal title, or MEDLINE
unique identifier (UID).
10. FEATURES - The feature table, which gives the locations of coding regions and other biologically
significant sites in the sequence.
11. BASE COUNT - Gives the total number of adenine (A), cytosine (C), guanine (G), and
thymine (T) bases in the sequence.
12. ORIGIN - Contains the sequence data, which begins on the line immediately below the field
title.
The record ends with a line containing only "//".
• The GenBank division to which a record belongs is indicated by a three-letter abbreviation:
1. PRI - primate sequences
2. ROD - rodent sequences
3. MAM - other mammalian sequences
4. VRT - other vertebrate sequences
5. INV - invertebrate sequences
6. PLN - plant, fungal, and algal sequences
7. BCT - bacterial sequences
8. VRL - viral sequences
9. PHG - bacteriophage sequences
10. SYN - synthetic sequences
11. UNA - unannotated sequences
12. EST - EST sequences (expressed sequence tags)
13. PAT - patent sequences
14. STS - STS sequences (sequence tagged sites)
15. GSS - GSS sequences (genome survey sequences)
16. HTG - HTG sequences (high-throughput genomic sequences)
17. HTC - HTC sequences (unfinished high-throughput cDNA sequences)
18. ENV - environmental sampling sequences
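To make the LOCUS subfields, the division codes and the "accession.version" format concrete, here is a minimal Python sketch. It assumes the five-subfield LOCUS layout described above, with whitespace-separated values; current GenBank records can carry extra columns that this sketch does not handle. The function names parse_locus and parse_version are hypothetical, and the sample LOCUS line is assembled from the example values quoted above, so its column spacing is an assumption.

```python
# Division codes as listed above.
DIVISIONS = {
    "PRI": "primate sequences",
    "ROD": "rodent sequences",
    "MAM": "other mammalian sequences",
    "VRT": "other vertebrate sequences",
    "INV": "invertebrate sequences",
    "PLN": "plant, fungal, and algal sequences",
    "BCT": "bacterial sequences",
    "VRL": "viral sequences",
    "PHG": "bacteriophage sequences",
    "SYN": "synthetic sequences",
    "UNA": "unannotated sequences",
    "EST": "expressed sequence tags",
    "PAT": "patent sequences",
    "STS": "sequence tagged sites",
    "GSS": "genome survey sequences",
    "HTG": "high-throughput genomic sequences",
    "HTC": "unfinished high-throughput cDNA sequences",
    "ENV": "environmental sampling sequences",
}


def parse_locus(line):
    """Parse a LOCUS line with the five subfields described in the text.

    Assumed field order: LOCUS name length "bp" molecule division date.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("unexpected LOCUS line layout: " + line)
    _, name, length, _, molecule, division, date = fields
    return {
        "name": name,
        "length": int(length),
        "molecule": molecule,
        "division": division,
        "division_name": DIVISIONS.get(division, "unknown division"),
        "date": date,
    }


def parse_version(value):
    """Split an 'accession.version' identifier into (accession, version number)."""
    accession, _, version = value.partition(".")
    return accession, int(version)


locus = parse_locus("LOCUS       HSHFE       12146 bp    DNA             PRI       23-JUL-1999")
print(locus["name"], locus["length"], locus["division_name"])
print(parse_version("Z92910.1"))
```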
European Molecular Biology Laboratory (EMBL)
• The European Molecular Biology Laboratory (EMBL) is a molecular biology research institution
supported by 22 member states, four prospect member states and two associate member states.
• EMBL was created in 1974 and is an intergovernmental organization funded by public research
money from its member states.
• The Laboratory operates from five sites: the main laboratory in Heidelberg, and outstations in
Hinxton (the European Bioinformatics Institute (EBI), in England), Grenoble (France), Hamburg
(Germany), and Monterotondo (near Rome).
• EMBL groups and laboratories perform basic research in molecular biology and molecular
medicine and provide training for scientists, students and visitors.
• Israel is the only Asian state that has full membership.
• The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the
European Bioinformatics Institute (EBI), incorporates and distributes nucleotide sequences
from public sources.
• The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA).
• Data are exchanged between the collaborating databases on a daily basis.
• The web-based tool Webin is the preferred system for individual submission of nucleotide
sequences, including Third Party Annotation (TPA) and alignment data.
• Automatic submission procedures are used for data from large-scale genome
sequencing projects.
• The latest data collection can be accessed via FTP, email and WWW interfaces.
• The EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and
protein databases as well as many other specialist molecular biology databases.
• For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that
allow external users to compare their own sequences against the data in the EMBL Nucleotide
Sequence Database and other databases.
• All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
The EMBL Nucleotide Sequence Database
• The main activity of the group is the development, maintenance and distribution of a
comprehensive database of nucleotide sequences.
• The EMBL Nucleotide Sequence Database, produced in collaboration with GenBank and the DNA
Data Bank of Japan (DDBJ), is Europe's primary nucleotide sequence data resource.
• Each of these three groups collects a portion of the total sequence data reported worldwide. All
new and updated database entries are exchanged between the groups on a daily basis.
• Important sources of data have been secured through collaborations with genomic sequencing
projects and other groups, such as phylogenetic research groups, that produce large quantities of
new nucleotide sequence data.
• A typical entry (flat file) contains a sequence, a brief description for cataloging purposes, the
taxonomic description of the source organism, bibliographic information, and the feature table,
which contains the locations of coding regions and other biologically significant sites.
EMBL flat file format
ID LISOD standard; DNA; PRO; 756 BP.
XX
AC X64011; S78972;
XX
SV X64011.1
XX
DT 28-APR-1992 (Rel. 31, Created)
DT 30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE L.ivanovii sod gene for superoxide dismutase
XX
KW sod gene; superoxide dismutase.
XX
OS Listeria ivanovii
OC Bacteria; Firmicutes; Bacillus/Clostridium group;
OC Bacillus/Staphylococcus group; Listeria.
XX
RN [1]
RX MEDLINE; 92140371.
RA Haas A., Goebel W.;
RT "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT functional complementation in Escherichia coli and characterization of the
RT gene product.";
RL Mol. Gen. Genet. 231:313-322(1992).
XX
RN [2]
RP 1-756
RA Kreft J.;
RT ;
RL Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL Hubland, 8700 Wuerzburg, FRG
XX
DR SWISS-PROT; P28763; SODM_LISIV.
XX
FH Key Location/Qualifiers
FH
FT source 1..756
FT /db_xref="taxon:1638"
FT /organism="Listeria ivanovii"
FT /strain="ATCC 19119"
FT RBS 95..100
FT /gene="sod"
FT terminator 723..746
FT /gene="sod"
FT CDS 109..717
FT /db_xref="SWISS-PROT:P28763"
FT /transl_table=11
FT /gene="sod"
FT /EC_number="1.15.1.1"
FT /product="superoxide dismutase"
FT /protein_id="CAA45406.1"
FT /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
cgttatttaaggtgttacatagttctatggaaatagggtctatacctttcgccttacaat 60
gtaatttcttttcacataaataataaacaatccgaggaggaatttttaatgacttacgaa 120
ttaccaaaattaccttatacttatgatgctttggagccgaattttgataaagaaacaatg 180
gaaattcactatacaaagcaccacaatatttatgtaacaaaactaaatgaagcagtctca 240
ggacacgcagaacttgcaagtaaacctggggaagaattagttgctaatctagatagcgtt 300
cctgaagaaattcgtggcgcagtacgtaaccacggtggtggacatgctaaccatacttta 360
ttctggtctagtcttagcccaaatggtggtggtgctccaactggtaacttaaaagcagca 420
atcgaaagcgaattcggcacatttgatgaattcaaagaaaaattcaatgcggcagctgcg 480
gctcgttttggttcaggatgggcatggctagtagtgaacaatggtaaactagaaattgtt 540
tccactgctaaccaagattctccacttagcgaaggtaaaactccagttcttggcttagat 600
gtttgggaacatgcttattatcttaaattccaaaaccgtcgtcctgaatacattgacaca 660
ttttggaatgtaattaactgggatgaacgaaataaacgctttgacgcagcaaaataatta 720
tcgaaaggctcacttaggtgggtctttttatttcta 756
//
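The SQ header line of the entry above states the sequence length and the count of each base, so the counts should add up to the length. The Python sketch below parses that header line and performs the check. The function name parse_sq_header is hypothetical, and the regular expressions assume the "number, space, label, semicolon" layout seen in this one example.

```python
import re


def parse_sq_header(line):
    """Parse an EMBL SQ header line into (length, per-base counts)."""
    if not line.startswith("SQ"):
        raise ValueError("not an SQ line: " + line)
    # Total length, e.g. "756 BP;"
    length = int(re.search(r"(\d+)\s+BP;", line).group(1))
    # Per-base counts, e.g. "247 A;" ... "0 other;"
    counts = {
        base: int(number)
        for number, base in re.findall(r"(\d+)\s+(A|C|G|T|other);", line)
    }
    return length, counts


length, counts = parse_sq_header(
    "SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;"
)
print(length, counts)
print(sum(counts.values()) == length)
```

For this entry, 247 + 136 + 151 + 222 + 0 = 756, which matches the stated length.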
Description of flat file information:
ID - Identification.
AC - Accession number(s).
DT - Date.
DE - Description.
GN - Gene name(s).
OS - Organism species.
OG - Organelle.
OC - Organism classification.
RN - Reference number.
RP - Reference position.
RC - Reference comments.
RX - Reference cross-references.
RA - Reference authors.
RL - Reference location.
CC - Comments or notes.
DR - Database cross-references.
KW - Keywords.
FT - Feature table data.
SQ - Sequence header.
(blanks) - Sequence data.
// - Termination line.
Some entries do not contain all of the line types, and some line types occur many times in a single entry.
Each entry must begin with an identification line (ID) and end with a terminator line (//).
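The line-type rules above can be sketched in Python as follows: every annotation line is grouped under its two-letter code, lines without a code are treated as sequence data, and the entry is rejected unless it starts with ID and ends with "//". The function name parse_embl_entry is hypothetical, and the sample entry is an abridged excerpt of the record shown earlier, with the sequence cut to its first 20 bases.

```python
def parse_embl_entry(text):
    """Group the lines of one EMBL entry by their two-letter line code."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("ID"):
        raise ValueError("entry must begin with an ID line")
    if lines[-1].strip() != "//":
        raise ValueError("entry must end with a // terminator line")

    fields = {}
    sequence = []
    for line in lines[:-1]:  # skip the terminator line
        code, _, rest = line.partition(" ")
        if len(code) == 2 and code.isupper():
            if code != "XX":  # XX lines are blank spacers
                # A line type may occur many times, so collect a list.
                fields.setdefault(code, []).append(rest.strip())
        else:
            # Sequence data lines carry no code; keep letters only.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
    fields["sequence"] = "".join(sequence)
    return fields


# Abridged excerpt of the entry above; the sequence is truncated.
entry = """ID   LISOD standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
DE   L.ivanovii sod gene for superoxide dismutase
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
     cgttatttaa ggtgttacat        20
//
"""
fields = parse_embl_entry(entry)
print(fields["AC"], fields["OC"], fields["sequence"])
```

Storing each line type as a list reflects the note that some line types occur many times in a single entry.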
References:
• https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
• https://www.ncbi.nlm.nih.gov/genbank/
• https://www.embl.org/index.php
• https://www.slideshare.net/HafizMuhammadRaza/european-molecular-biology-laboratory-embl-129985837?qid=38c50267-4b68-4a95-b353-323d8826456f&v=&b=&from_search=1

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

PIR- Protein Information Resource
PIR- Protein Information ResourcePIR- Protein Information Resource
PIR- Protein Information Resource
 
Protein database
Protein databaseProtein database
Protein database
 
Ddbj
DdbjDdbj
Ddbj
 
swiss-prot<bioinformatics>
swiss-prot<bioinformatics>swiss-prot<bioinformatics>
swiss-prot<bioinformatics>
 
Swiss prot database
Swiss prot databaseSwiss prot database
Swiss prot database
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
sequence of file formats in bioinformatics
sequence of file formats in bioinformaticssequence of file formats in bioinformatics
sequence of file formats in bioinformatics
 
DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)
 
TrEMBL
TrEMBLTrEMBL
TrEMBL
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
Introduction to NCBI
Introduction to NCBIIntroduction to NCBI
Introduction to NCBI
 
EMBL
EMBLEMBL
EMBL
 
Introduction to Biological databases
Introduction to Biological databasesIntroduction to Biological databases
Introduction to Biological databases
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
Scop database
Scop databaseScop database
Scop database
 
YEAST TWO HYBRID SYSTEM
 YEAST TWO HYBRID SYSTEM YEAST TWO HYBRID SYSTEM
YEAST TWO HYBRID SYSTEM
 
Needleman-Wunsch Algorithm
Needleman-Wunsch AlgorithmNeedleman-Wunsch Algorithm
Needleman-Wunsch Algorithm
 
blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
Protein structure prediction (1)
Protein structure prediction (1)Protein structure prediction (1)
Protein structure prediction (1)
 
Protein data bank
Protein data bankProtein data bank
Protein data bank
 

Ähnlich wie Major biological nucleotide databases

Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
SBituila
 

Ähnlich wie Major biological nucleotide databases (20)

Gen bank
Gen bankGen bank
Gen bank
 
Gen bank (genetic sequence databank)
Gen bank (genetic sequence databank)Gen bank (genetic sequence databank)
Gen bank (genetic sequence databank)
 
Biological database
Biological databaseBiological database
Biological database
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Biological databases
Biological databasesBiological databases
Biological databases
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 
Rishi
RishiRishi
Rishi
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
 
Primary Databases.pptx
Primary Databases.pptxPrimary Databases.pptx
Primary Databases.pptx
 
NCBI
NCBINCBI
NCBI
 
Biological Databases | Access to sequence data and related information
Biological Databases | Access to sequence data and related information Biological Databases | Access to sequence data and related information
Biological Databases | Access to sequence data and related information
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Intro bioinfo
Intro bioinfoIntro bioinfo
Intro bioinfo
 

Mehr von Vidya Kalaivani Rajkumar

Mehr von Vidya Kalaivani Rajkumar (20)

Recombinant vaccines-Peptide Vaccines
Recombinant vaccines-Peptide Vaccines Recombinant vaccines-Peptide Vaccines
Recombinant vaccines-Peptide Vaccines
 
Transgenic plants- Abiotic stress tolerance
Transgenic plants- Abiotic stress toleranceTransgenic plants- Abiotic stress tolerance
Transgenic plants- Abiotic stress tolerance
 
Bioreactors in tissue engineering
Bioreactors in tissue engineeringBioreactors in tissue engineering
Bioreactors in tissue engineering
 
Tissue assembly in microgravity
Tissue assembly in microgravityTissue assembly in microgravity
Tissue assembly in microgravity
 
In vivo synthesis of tissues and organs
In vivo synthesis of tissues and organsIn vivo synthesis of tissues and organs
In vivo synthesis of tissues and organs
 
Bioartificial pancreas
Bioartificial pancreasBioartificial pancreas
Bioartificial pancreas
 
Biomaterials for tissue engineering
Biomaterials for tissue engineeringBiomaterials for tissue engineering
Biomaterials for tissue engineering
 
Haematopoietic system
Haematopoietic systemHaematopoietic system
Haematopoietic system
 
Fasta
FastaFasta
Fasta
 
Water vascular system of star fish
Water vascular system of star fishWater vascular system of star fish
Water vascular system of star fish
 
Cephalopodes are advance molluscs
Cephalopodes are advance molluscsCephalopodes are advance molluscs
Cephalopodes are advance molluscs
 
Beat air pollution
Beat air pollution Beat air pollution
Beat air pollution
 
Birth control methods
Birth control methodsBirth control methods
Birth control methods
 
Future of human evolution
Future of human evolutionFuture of human evolution
Future of human evolution
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Assignment on developmental zoology
Assignment on developmental zoologyAssignment on developmental zoology
Assignment on developmental zoology
 
Development of chick
Development of chickDevelopment of chick
Development of chick
 
Protein structure visualization tools-RASMOL
Protein structure visualization tools-RASMOLProtein structure visualization tools-RASMOL
Protein structure visualization tools-RASMOL
 
Swiss pdb viewer
Swiss pdb viewerSwiss pdb viewer
Swiss pdb viewer
 
Swiss PROT
Swiss PROT Swiss PROT
Swiss PROT
 

Kürzlich hochgeladen

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Kürzlich hochgeladen (20)

Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 

Major biological nucleotide databases

  • 1. Department of Zoology, GACW (2018-2019) Page 1 Major Biological databases Introduction: Database is convenient system to properly store, search and retrieve any type of data. Its help to easily handle and share large amount of data. Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high –throughput experiment technology and computational analysis. They contain information from genomics, proteomics, microarray gene expression etc. Variants of Biological Database 1. Primary Database 2. Secondary database 3. Composite Database Primary databases:  Contains original data from the researchers.  Public or open access mostly. Biological Databases Transcriptome databases Structure database Genome databases Sequence databases  Nucleotide  Protein Model Organism databases  PlasmoDB,  TAIR etc
  • 2. Department of Zoology, GACW (2018-2019) Page 2 Eg: NCBI GenBank, EmBL, DDBJ Secondary databases: A Secondary database contains additional information derived from the analysis of data available in primary databases. Manually created or automatically generated data are available. Eg TrEMBL, Pfam, Profiles, Scop, CATH GenBank (Genetic Sequence Databank) Introduction:  GenBank® is the genetic sequence database at the National Center for Biotechnology Information (NCBI).  It wasestablished in the year 1982 and now maintained by the National Center for Biotechnology (NCBI).  DNA sequences can be submitted to GenBank using several different methods.  It contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects.  It has a flat file structure that is an ASCII text file, readable & downloadable by both humans and computers. Sequence Submission:  GenBank is built by direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centers.  Only original sequences can be submitted to GenBank.  Direct submissions are made to GenBank using BankIt, which is a Web-based form, or the stand- alone submission program, Sequin.  Upon receipt of a sequence submission, the GenBank staff examines the originality of the data and assigns an accession number to the sequence and performs quality assurance checks.  The submissions are then released to the public database, where the entries are retrievable by Entrez or downloadable by FTP.  Bulk submissions of Expressed Sequence Tag (EST), Sequence-tagged site (STS), Genome Survey Sequence (GSS), and High-Throughput Genome Sequence (HTGS) data are most often submitted by large-scale sequencing centers.  The GenBank direct submissions group also processes complete microbial genome sequences.
  • 3. Department of Zoology, GACW (2018-2019) Page 3 GenBank flat file Format GenBank format (GenBank Flat File Format) consists of an annotation section and a sequence section. The start of the annotation section is marked by a line beginning with the word "LOCUS". The start of sequence section is marked by a line beginning with the word "ORIGIN" and the end of the section is marked by a line with only "//".
  • 4. Department of Zoology, GACW (2018-2019) Page 4 1. The LOCUS field: It consists of five different subfields, namely:  1a Locus Name (e.g. HSHFE) - It is a tag for grouping similar sequences.  The first two or three letters usually designate the organism.  In this case HS stands for Homo sapiens. The last several characters are associated with another group designation, such as gene product. In this example, the last three digits represent the gene symbol, HFE.  1b Sequence Length (12146 bp) – It is the total number of nucleotide base pairs (or amino acid residues) in the sequence record.  1c Molecule Type (e.g. DNA) - Type of molecule that was sequenced.  1d GenBank Division (PRI) - GenBank has different divisions.  In this example, PRI stands for primate sequences.  Other divisions include ROD (rodent sequences), MAM (other mammal sequences), PLN (plant, fungal, and algal sequences), &BCT (bacterial sequences). 2. 1e Modification Date (23-July-1999) - Date of most recent modification made to the record. DEFINITION: – It is a brief description of the sequence.  The description may include source organism name, gene or protein name, or designation as untranscribed or untranslated sequences (e.g., a promoter region).  For sequences containing a coding region (CDS), the definition field may also contain a “completeness” qualifier such as "complete CDS" or "exon 1." 3. ACCESSION (Z92910): – It is a unique identifier assigned to a complete sequence record.  This number never changes, even if the record is modified. 4. VERSION (Z92910.1) – It is an identification number assigned to a single, specific sequence in the database.
  • 5. Department of Zoology, GACW (2018-2019) Page 5  This number is in the format “accession.version.”  If any changes are made to the sequence data, the version part of the number will increase by one.  E.g. U12345.1 becomes U12345.2. 5. Gene Identifier (GI) (1890179) - Also a sequence identification number.  Whenever a sequence is changed, the version number is increased and a new GI is assigned. 6. KEYWORDS (haemochromatosis; HFE gene) – A “keyword” can be “any word or phrase used to describe the sequence”. 7. SOURCE (human) -Usually contains an abbreviated or common name of the source organism. 8. ORGANISM (Homo sapiens)- The scientific name (usually genus & species) 9. REFERENCE –It is a citation of publications by sequence authors that supports information presented in the sequence record.  Several references may be included in one record.  References are automatically sorted from the oldest to the newest.  Cited publications are searchable by author, article or publication title, journal title, or MEDLINE unique identifier (UID).
  • 6. Department of Zoology, GACW (2018-2019) Page 6 10. . The FEATURES Table: 11. BASE COUNT & ORIGIN: BASE COUNT - Base Count gives the total number of adenine (A), cytosine (C), guanine (G), and thymine (T) bases in the sequence. 12. ORIGIN - Origin contains the sequence data, which begins on the line immediately below the field title.
  • 7.
The record ends with a terminator line (//).
 GenBank Division shows the GenBank division to which a record belongs and is indicated by a three-letter abbreviation.
1. PRI - primate sequences
2. ROD - rodent sequences
3. MAM - other mammalian sequences
4. VRT - other vertebrate sequences
5. INV - invertebrate sequences
6. PLN - plant, fungal, and algal sequences
7. BCT - bacterial sequences
8. VRL - viral sequences
9. PHG - bacteriophage sequences
10. SYN - synthetic sequences
11. UNA - unannotated sequences
12. EST - EST sequences (expressed sequence tags)
13. PAT - patent sequences
14. STS - STS sequences (sequence tagged sites)
15. GSS - GSS sequences (genome survey sequences)
16. HTG - HTG sequences (high-throughput genomic sequences)
17. HTC - unfinished high-throughput cDNA sequencing
18. ENV - environmental sampling sequences
European Molecular Biology Laboratory (EMBL)
 The European Molecular Biology Laboratory (EMBL) is a molecular biology research institution supported by 22 member states, four prospect member states and two associate member states.
 EMBL was created in 1974 and is an inter-governmental organization funded by public research money from its member states.
 The Laboratory operates from five sites: the main laboratory in Heidelberg (Germany), and outstations in Hinxton (the European Bioinformatics Institute (EBI), in England), Grenoble (France), Hamburg (Germany), and Monterotondo (near Rome, Italy).
  • 8.
 EMBL groups and laboratories perform basic research in molecular biology and molecular medicine, and provide training for scientists, students and visitors.
 Israel is the only Asian state that has full membership.
 The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates and distributes nucleotide sequences from public sources.
 The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA).
 Data are exchanged between the collaborating databases on a daily basis.
 The web-based tool Webin is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data.
 Automatic submission procedures are used for data from large-scale genome sequencing projects.
 The latest data collection can be accessed via FTP, email and WWW interfaces.
 The EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases.
 For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database and other databases.
 All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
  • 9.
The EMBL Nucleotide Sequence database
 The main activity of the group is the development, maintenance and distribution of a comprehensive database of nucleotide sequences.
 The EMBL nucleotide sequence database, produced in collaboration with GenBank and the DNA Data Bank of Japan (DDBJ), is Europe’s primary nucleotide sequence data resource.
 Each of these three groups collects a portion of the total sequence data reported worldwide. All new and updated database entries are exchanged between the groups on a daily basis.
 Important sources of data have been secured through collaborations with genomic sequencing projects and other groups, such as phylogenetic research groups, that produce large quantities of new nucleotide sequence data.
 A typical entry (flat file) contains a sequence, a brief description for cataloging purposes, the taxonomic description of the source organism, bibliographic information, and the feature table, which contains the locations of coding regions and other biologically significant sites.
EMBL flat file format
ID   LISOD      standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
SV   X64011.1
XX
DT   28-APR-1992 (Rel. 31, Created)
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE   L.ivanovii sod gene for superoxide dismutase
XX
KW   sod gene; superoxide dismutase.
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
RN   [1]
RX   MEDLINE; 92140371.
RA   Haas A., Goebel W.;
RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT   functional complementation in Escherichia coli and characterization of the
RT   gene product.";
RL   Mol. Gen. Genet. 231:313-322(1992).
XX
RN   [2]
RP   1-756
RA   Kreft J.;
RT   ;
RL   Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL   Hubland, 8700 Wuerzburg, FRG
  • 10.
XX
DR   SWISS-PROT; P28763; SODM_LISIV.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..756
FT                   /db_xref="taxon:1638"
FT                   /organism="Listeria ivanovii"
FT                   /strain="ATCC 19119"
FT   RBS             95..100
FT                   /gene="sod"
FT   terminator      723..746
FT                   /gene="sod"
FT   CDS             109..717
FT                   /db_xref="SWISS-PROT:P28763"
FT                   /transl_table=11
FT                   /gene="sod"
FT                   /EC_number="1.15.1.1"
FT                   /product="superoxide dismutase"
FT                   /protein_id="CAA45406.1"
FT                   /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT                   HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT                   IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT                   DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
     cgttatttaaggtgttacatagttctatggaaatagggtctatacctttcgccttacaat        60
     gtaatttcttttcacataaataataaacaatccgaggaggaatttttaatgacttacgaa       120
     ttaccaaaattaccttatacttatgatgctttggagccgaattttgataaagaaacaatg       180
     gaaattcactatacaaagcaccacaatatttatgtaacaaaactaaatgaagcagtctca       240
     ggacacgcagaacttgcaagtaaacctggggaagaattagttgctaatctagatagcgtt       300
     cctgaagaaattcgtggcgcagtacgtaaccacggtggtggacatgctaaccatacttta       360
     ttctggtctagtcttagcccaaatggtggtggtgctccaactggtaacttaaaagcagca       420
     atcgaaagcgaattcggcacatttgatgaattcaaagaaaaattcaatgcggcagctgcg       480
     gctcgttttggttcaggatgggcatggctagtagtgaacaatggtaaactagaaattgtt       540
     tccactgctaaccaagattctccacttagcgaaggtaaaactccagttcttggcttagat       600
     gtttgggaacatgcttattatcttaaattccaaaaccgtcgtcctgaatacattgacaca       660
     ttttggaatgtaattaactgggatgaacgaaataaacgctttgacgcagcaaaataatta       720
     tcgaaaggctcacttaggtgggtctttttatttcta                               756
//
Description of flat file information:
ID - Identification.
AC - Accession number(s).
DT - Date.
DE - Description.
GN - Gene name(s).
OS - Organism species.
OG - Organelle.
OC - Organism classification.
RN - Reference number.
  • 11.
RP - Reference position.
RC - Reference comments.
RX - Reference cross-references.
RA - Reference authors.
RL - Reference location.
CC - Comments or notes.
DR - Database cross-references.
KW - Keywords.
FT - Feature table data.
SQ - Sequence header.
(blanks) - Sequence data.
// - Termination line.
Some entries do not contain all of the line types, and some line types occur many times in a single entry. Each entry must begin with an identification line (ID) and end with a terminator line (//).
References:
 https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
 https://www.ncbi.nlm.nih.gov/genbank/
 https://www.embl.org/index.php
 https://www.slideshare.net/HafizMuhammadRaza/european-molecular-biology-laboratory-embl-129985837?qid=38c50267-4b68-4a95-b353-323d8826456f&v=&b=&from_search=1
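The two-letter line codes of the EMBL flat file format can be used to split an entry into its fields. The following is a minimal Python sketch under stated assumptions, not the EBI's own parser: the function name parse_embl_entry is hypothetical, XX lines are treated as spacers, and SAMPLE is a deliberately shortened copy of the LISOD entry that keeps only the first sequence line, so its sequence is 60 bases long rather than 756.

```python
from collections import defaultdict

# Shortened copy of the LISOD entry: only the first sequence line is kept.
SAMPLE = """\
ID   LISOD      standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
DE   L.ivanovii sod gene for superoxide dismutase
XX
SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
     cgttatttaaggtgttacatagttctatggaaatagggtctatacctttcgccttacaat        60
//
"""


def parse_embl_entry(text):
    """Group the lines of one EMBL entry by their two-letter line code.

    Returns (fields, sequence): fields maps a code such as 'AC' to the list of
    line contents carrying that code; sequence is the concatenated bases.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("ID"):
        raise ValueError("entry must begin with an identification line (ID)")
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in lines:
        if line.startswith("//"):
            break  # terminator line: end of the entry
        if in_sequence:
            # Sequence lines hold bases plus a running count; keep letters only.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, content = line[:2], line[2:].strip()
        if code == "XX" or not code.strip():
            continue  # spacer or blank line
        fields[code].append(content)
        if code == "SQ":
            in_sequence = True  # everything up to // is sequence data
    else:
        raise ValueError("entry must end with a terminator line (//)")
    return dict(fields), "".join(sequence_parts)


if __name__ == "__main__":
    fields, sequence = parse_embl_entry(SAMPLE)
    print(fields["AC"], len(sequence))
```

Storing each code's lines in a list reflects the rule that some line types occur many times in a single entry, while the two checks enforce that an entry begins with ID and ends with //.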