Basic bioinformatics concepts,
                 databases and tools

                        Introduction to the training
                          and Sequence databases

                                        Joachim Jacob
                                    http://www.bits.vib.be

Updated 22 February 2012
http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod1-intro_H1_2012_SeqDBs.pdf
Scope
        Introductory training to Bioinformatics


        Exploring and understanding
        databases and software
        for everyday bioinformatics use



        If there is any term which is unclear,
             please stop me and ask me!
Bioinformatics ...

         Bio
         all data is derived from living samples

         Informatics
         that data is stored and analyzed in and with computers to obtain
           understanding



         Extremely broad description, for which however we
           will extract common principles during the course
Bioinformatics is present in every aspect
of life sciences research
Bioinformatics ...

       Bio
               - different types of living samples
       Informatics
               - storing and categorizing the information
         and          making it easily accessible
               - interpreting that information reliably
Bioinformatics … and its companion

      Bio
              - different types of living samples
      Informatics
              - storing and categorizing the information
        and          making it easily accessible
              - interpreting that information reliably
      Statistics
              - large numbers, observational data
The siblings of Bioinformatics
       Based on the biological component extracted from life, the
         measured properties and the ultimate goal of the
         analysis, different sub-disciplines of bioinformatics exist.


DNA          → Genomics
RNA          → Transcriptomics
proteins     → Proteomics
metabolites  → Metabolomics

Other sub-disciplines: Epigenomics, Structural bioinformatics, Systems biology,
Microbiomics, Interactomics, Metagenomics, Functional genomics, Comparative genomics
Mere data is worth nothing

Data = symbols
    e.g. CGCTACGCATATCGCT

Information = data that are processed to be useful; provides answers to
"who", "what", "where", and "when" questions. Also called metadata.
    e.g. - Dasypus novemcinctus
         - found in my garden
         - part of genome
         - sequenced in June 2010

Knowledge: application of data and information; answers "how" questions
    e.g. This species seems to be related to my neighbor's pet,
         because it also has this sequence

Understanding: appreciation of "why"
    e.g. Has the same mother

Wisdom

                                              http://www.systems-thinking.org/dikw/dikw.htm
?  →  data  →  knowledge  →  !
          Tools and approaches (Biology, Computer, Statistics)

Life sciences research as major 'end user' for the bioinformatics tools
and conclusions: 'tool user'

Bioinformatics research, as a specific branch on the boundary of life
science, mathematics and computer science: 'tool manufacturer'
This course is organised in several modules

Module 1: Sequence databases: what, where, how
Module 2: Sequence comparisons: searching, aligning
Module 3: Sequence analysis – domains in protein sequences and
 predicting functionality, standardisation and useful links
Module 4: Beyond sequences - additional important data sources
Module 5: Genome Browsers - integrating biological data and performing
 reproducible bioinformatics research in the Galaxy
Overview of the crash course
One tip for the future

          Be prepared for change...
            Information is fluid
            So are bioinfo tools


          Learn how to accommodate change
            Major resources are more stable
            Important concepts do not change often
Module 1

           Sequence databases
Module 1: Sequence databases

        Sequence databases store DNA and RNA sequences. In
        bioinformatics, they are (still) by far the largest
        collections of biological data, and they are used by many
        subdisciplines of bioinformatics.




                            http://www.ebi.ac.uk/embl/Services/DBStats/
... and growing




                  http://www.ebi.ac.uk/embl/Services/DBStats/
Three major nucleotide databanks host primary
sequence data
      European Nucleotide Archive (ENA) at EBI - http://www.ebi.ac.uk/
       Division EMBL-bank (European Molecular Biology Laboratory) (single)
       Trace Archive
       SRA Archive



      GenBank at NCBI - http://www.ncbi.nlm.nih.gov/
       maintained at NCBI (National Center for Biotechnology Information,
       USA)



      DDBJ (DNA Data Bank of Japan) - http://www.ddbj.nig.ac.jp/
       maintained at NIG/CIB (National Institute of Genetics, Center for
       Information Biology, Mishima, Japan)
These databases are filled with NA sequence
   information by scientists and consortia
   Submitters: large-scale sequencing projects, individual scientists, patent offices
        → primary sequence data (e.g. ACTGCTGCTA GCTAGCTGAT CTATGCTAGC TGTAGCTGAG)
        → primary sequence database

   Each primary sequence = one experiment
   Basically, all 'source' nucleotide material


Jennifer McDowall - http://www.biotnet.org/training-materials/nucleotide-sequence-databases-ena
Primary NA sequence can be produced by
   Sanger-based technologies or NGS technologies

   sample → DNA, or RNA → (RT) → cDNA

   Sanger
   Low output in number of seqs, high quality, 400-850 bp.
   Read profiles in .abi format. Stored in Trace Archive.

   NGS
   Different technologies. Extremely high output rate, low
   quality, 30 bp – 600 bp. Reads in .fastq format, stored in
   the SRA.

   These techniques can only read DNA strands,
   so RNA first needs to be converted to cDNA
   with reverse transcriptases prior to loading onto
   the machines.


Sanger overview: http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Obenrader
NGS overview: http://seqanswers.com/forums/showthread.php?t=3561
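As a small illustration of what "reads in .fastq format" means in practice, the sketch below reads FASTQ records and converts the quality string into numeric base-call qualities. It assumes the common layout of four lines per record and the Phred+33 quality encoding; other encodings exist, so treat the offset as an assumption. The function name `read_fastq` is made up for this example.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, qualities) from an iterable of FASTQ lines.

    Assumes 4 lines per record and Phred+33 encoded qualities (an assumption;
    check which encoding your sequencing platform used).
    """
    # Drop line endings and empty lines before grouping into records.
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input does not consist of 4-line records")
    for i in range(0, len(cleaned), 4):
        header, seq, plus, qual = cleaned[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ for {header}")
        # Phred+33: quality score = ASCII code of the character minus 33.
        yield header[1:], seq, [ord(char) - 33 for char in qual]


example = ["@read1", "ACGT", "+", "II5!"]
for read_id, seq, quals in read_fastq(example):
    print(read_id, seq, quals)
```

A low score at a position means the base call there is less reliable, which is why quality values travel together with the sequence in this format.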
Overview major DNA reading technologies




            Dennis Wall, NGS Data Analysis and Computation I course, Wall Lab
In the primary sequence dbs, entries fall into
two major categories
             High quality single submission (Sanger)
               - gene sequence (genomic – 'STD' data class)
               - mRNA sequence (via cDNA – 'STD')
               - BAC/YAC/cosmid sequences
               - genome sequencing projects (contigs,
               assemblies, WGS)
               - genome markers, STS (sequence tagged
               sites, unique short sequences from a
               genome)

             Low quality batch submissions
               - Expressed Sequence Tags (EST)
               - Genome Survey Sequences (GSS)
               - high-throughput sequence data (e.g. NGS)
                                  http://www.ebi.ac.uk/ena/about/formats
The batch submissions originate mostly from
sequencing centers
         Large-scale sequencing projects:
           chromosome → fragment → sequencing library → sequence reads
             → submission (e.g. whole genome shotgun)
           assemble sequence
             → submission
           annotation (e.g. cyp30, cyp309, insv, cg343)
             → submission
Each primary database stores its sequences
and batch submissions in its own way...
           - NCBI: ESTs are stored in dbEST (separate database)
           - ENA: ESTs are part of EMBL-bank in 'EST' data class

           Similar for GSS (see dbGSS at NCBI)


           ESTs: expressed sequence tags, often partial sequences
             derived from RNA in batch (sample → RNA → RNA-seq). See example:
                 >est1
                 ATCGACTAGCATCA
                 >est2
                 TCGACTAGCGACTA
                 >est3
                 CAGCATCATCGAC
http://www.biotnet.org/sites/biotnet.org/files/documents/17/2010_ena_v2.0.ppt

Batch submissions are marked and/or stored
differently than single submissions
  ENA structure

  TIER             CLASS                                  TYPE
  ENA-Annotation   Feature annotation                     1) EMBL-Bank
  ENA-Assembly     Assembly information
  ENA-Reads        Sequencing and sampling information    2) Trace Archive
                                                             - Raw data (capillary sequencing)
                                                          3) Sequence Read Archive
                                                             - Raw data (Next Gen sequencing)

  Batch submissions: Trace Archive and Sequence Read Archive.
  Data class ESTs (in EMBL-Bank) are also batch submissions.
The 'normal' submissions are a minority in
primary sequence databases




             http://www.ebi.ac.uk/ena/about/statistics#embl_bases_per_dataclass
Primary sequence dbs are synchronised and
every sequence receives a unique identifier
      All database maintainers assign and share a unique accession number (AC) for each
      sequence, besides their own ID number (info at NCBI). When a sequence is updated,
      the accession number is extended with a new version number, e.g. .1 (see SVA).
       Example of acc number: BC010109.2
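The accession-plus-version convention can be made concrete with a few lines of code. This is only a sketch: the helper name `split_accession` is hypothetical, and it assumes the version is whatever follows the first dot.

```python
def split_accession(versioned):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is returned as an int, or None when no version is given.
    """
    accession, dot, version = versioned.strip().partition(".")
    if not accession:
        raise ValueError("empty accession")
    if dot:
        return accession, int(version)
    return accession, None


print(split_accession("BC010109.2"))
print(split_accession("BC010109"))
```

Searching with the bare accession normally leads to the latest version, while the versioned form points to one specific state of the sequence.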



http://www.insdc.org/
International Nucleotide Sequence Database Collaboration (INSDC):
GenBank (+ SRA), DDBJ and ENA, synchronized daily.
Collaboration on features, taxonomy, ...

All use the same
- Accession IDs
- Project IDs
- Feature tables (see later)




                               http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)
One sequence entry contains three categories
of different types of information

   1. Info about sequence, submitters and literature (metadata)
   2. Annotations of the sequence (metadata related to the seq)
   3. Stretch of ATGC / AUGC sequence (the 'data', at the bottom)
   • A sequence record is called 'annotated' when biological information is
     added and linked to a position in the sequence
   • Annotations, also called 'features', are abbreviated as codes, which
     can be found in the Feature Tables




                                        http://www.ebi.ac.uk/embl/Documentation/FT_d
This sequence information can be written in
  different formats
   (plain) Text format, e.g. GenBank
         1. General info
              - Official shared accession
              - GenBank-specific identifier (simply counts up with each new record)
              - A lot of different identifiers! (~ number of databases)
                → conversion tools can translate identifiers when needed (see exercises)

*In humans: the HUGO Nomenclature Committee determines the right gene
name
                             http://mobyle.pasteur.fr/cgi-bin/portal.py#tutorials::seqfmt
2. Annotation
                                  db_xref = cross references,
                                  i.e. links to records of other
                                  databases which are related
                                  to this record (see later). The
                                  format is dbname:identifier




Feature name     Qualifier name
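Because a db_xref value follows the pattern dbname:identifier, it is easy to take apart programmatically. The sketch below is a minimal illustration; the function name `parse_db_xref` and the example values are chosen for this example and are not taken from a specific record.

```python
def parse_db_xref(value):
    """Split a db_xref qualifier value 'dbname:identifier' into its two parts.

    Only the first colon separates the database name from the identifier,
    because some identifiers contain a colon themselves.
    """
    cleaned = value.strip().strip('"')
    dbname, colon, identifier = cleaned.partition(":")
    if not colon or not dbname or not identifier:
        raise ValueError(f"not in dbname:identifier form: {value!r}")
    return dbname, identifier


for xref in ['taxon:9606', '"HGNC:HGNC:5"']:
    print(parse_db_xref(xref))
```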
3. Sequence




        Each protein sequence also receives an
        accession number
Other sequence formats
                 Fasta: minimal metadata, basically only sequence
                      >genename And a description
                      ATCGATGCAGCTATATCCTCGCGATCAGC
                      CGGACAGCTCTCGAGCGCATCGACGACGAC
                 ASN.1: Abstract Syntax Notation One
                 EMBL: all info as in GenBank (gb); online referred to as 'plain text'
                 XML
                 Fastq: sequence info and base 'call' quality
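To show how little structure the Fasta format has, here is a minimal parser sketch: a header line starting with '>' gives a name and an optional description, and all following lines until the next header are joined into one sequence. The function name `parse_fasta` is hypothetical; established libraries offer more robust parsers.

```python
def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples from Fasta text."""
    records = []
    name, description, chunks = None, "", []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


example = """>genename And a description
ATCGATGCAGCTATATCCTCGCGATCAGC
CGGACAGCTCTCGAGCGCATCGACGACGAC
"""
for name, description, sequence in parse_fasta(example):
    print(name, "|", description, "|", len(sequence), "bases")
```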
Important
'Format' has nothing to do with the program in which you save your file! You don't
have a choice: it needs to be 'plain text format' (.txt - not a file which can be
opened with MS Word such as .doc or .rtf files). Wordpad is a good choice for
this, as long as you save as plain text. 'Format' in bioinfo is all about how the
information is structured and written down in the plain text file.
                         http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
http://www.biotnet.org/sites/biotnet.org/files/documents/17/2010_ena_v2.0.ppt


Degree of annotation differs between entries
Batch submitted sequences are annotated poorly, single
submissions are annotated better.

  ENA structure

  TIER             CLASS                                  TYPE
  ENA-Annotation   Feature annotation                     1) EMBL-Bank
                                                             Good seq annotations
  ENA-Assembly     Assembly information
  ENA-Reads        Sequencing and sampling information    2) Trace Archive
                                                             - Raw data (capillary sequencing)
                                                          3) Sequence Read Archive
                                                             - Raw data (Next Gen sequencing)

Experiment information is of most importance in batch submissions
(e.g. which species, which technique, ...)
SRA contains batch submitted records of which
experiment information is of most importance




    Since the sequences are barely (or not) annotated, the
    experiment description is important: which machine, which
    organism, which tissue, which developmental stage,
    disease, treatment, …
How to get sequences into the db, and back out

 Submit
   Always submit your sequence data (mostly obliged by journals) and
   include your ACC number in articles (not any other number).
     - Sequin (GenBank stand alone)
     - Bankit (GenBank submit web tool)
     - Webin (EMBL online submission)

 Retrieve
   One or few sequences
     → Use one of the numerous web-based tools
        GenBank: Entrez
        EMBL: EB-eye
        MRS: developed for easy retrieval
   Many sequences (batch retrieval)
     → use ftp (file transfer protocol)
     → use perl (flexible programming language)
     → BioMart
        http://www.biomart.org/
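Scripted retrieval of a handful of records by accession number can be sketched as follows. The slide mentions perl; this example uses Python instead. It assumes NCBI's E-utilities `efetch` endpoint, which is not mentioned in the slides, and it only builds the request URL so that nothing is downloaded when the snippet runs. The helper name `efetch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint (not part of the original slides).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, db="nuccore", rettype="fasta"):
    """Build an efetch URL that requests the given accessions as plain text."""
    params = {
        "db": db,                     # database to query, e.g. nucleotide records
        "id": ",".join(accessions),   # one or more accession numbers
        "rettype": rettype,           # record format, e.g. 'fasta' or 'gb'
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


print(efetch_url(["BC010109.2"]))
```

The resulting URL can be opened in a browser or passed to `urllib.request.urlopen`. For large batches, ftp downloads remain the better route, and the usage policy of the service should be checked before sending many requests.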
Example of a primary NA sequence record (ENA)
                                           Text format




 Code usable for searching   |   Data linked to that code




                                         http://www.ebi.ac.uk/ena/about/formats
Primary sequence data contains a lot of
redundancy!

                                                             Chromosome sequence

                                                             Several gene sequences
                                                             from different labs

                                                             EST sequences
                                                             from transcripts

                                                             cDNA sequence



       All match the same gene. Often you end up in your
       database search with all these sequences...
       A lot of redundancy!
The primary sequences are the basis for
analyses that generate derived sequence data
       Scientists/Consortia → primary databases
             –   Source for further analyses. Which?
                   •   Create protein sequences
                   •   Curate the sequence database
                   •   Assemble genomes
                   •   Searching similarities
                   •   Aggregate information about one gene
                   •   …


                           Results stored in derived databases
Protein databases come in two kinds
The most important protein db is UniProt and
contains 'automatic' and manual entries
    UniProt Knowledge Base - 'the best annotated protein
      database of the world'
      http://www.uniprot.org/
Refseq - The NCBI way to reduce redundancy in
primary sequence data
   RefSeq is NCBI 'Reference Sequences' (prot and nuc)
      Redundancy from primary sequence data is reduced both
       automatically and by manual annotation of NA and protein
       sequences. 'one natural biological molecule = one entry'. Links
       back to the original primary sequences. Hugely popular and a
       basis for a lot of analyses.




                                                               Click to apply
                                                               refseq filter in
                                                               entrez search


                                           http://www.ncbi.nlm.nih.gov/RefSeq/
RefSeq has its own identifiers, not to be mixed
up with accession numbers
    RefSeq entry codes look similar to ACC numbers (but are not ACC numbers: note the
      underscore!); RefSeq records are also in GenBank format. Note: in the 'Features'
      section one can find the raw sequences from which the entry was derived. (Typical
      mistake: searching with a RefSeq code in UniProt.)
    NC_*   (curated) complete genomic element (chromosome, plasmid,...)
    NT_*   (automated) intermediate assembly from BAC
    NZ_*   (automated) incomplete genomic sequence from WGS
    NW_*   (automated) intermediate assembly from WGS
    NG_*   (curated) incomplete genomic element corresponding to gene
    NM_*   (curated) mRNA
    NR_*   (curated) non-coding RNA or predicted transcript of pseudogene
    NP_*   (curated) protein
    ZP_*   (automated) protein predicted from WGS sequence (NZ_*)
    YP_*   (curated) other predicted protein sequences from NCBI Genome Annotation Pipeline
    XM_*   (automated) mRNA
    XR_*   (automated) non-coding RNA or predicted transcript of pseudogene
    XP_*   (automated) protein

                                              http://www.ncbi.nlm.nih.gov/RefSeq/key.html
                                                       http://www.ncbi.nlm.nih.gov/RefSeq/
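The prefix table above lends itself to a simple lookup. The sketch below copies the prefixes and their meaning from the list on this slide and uses the underscore to tell RefSeq codes apart from primary accession numbers. The function name `classify_identifier` and the example identifiers are illustrative only.

```python
# Prefix → (status, molecule type), copied from the slide's RefSeq key.
REFSEQ_PREFIXES = {
    "NC": ("curated", "complete genomic element (chromosome, plasmid, ...)"),
    "NT": ("automated", "intermediate assembly from BAC"),
    "NZ": ("automated", "incomplete genomic sequence from WGS"),
    "NW": ("automated", "intermediate assembly from WGS"),
    "NG": ("curated", "incomplete genomic element corresponding to gene"),
    "NM": ("curated", "mRNA"),
    "NR": ("curated", "non-coding RNA or predicted transcript of pseudogene"),
    "NP": ("curated", "protein"),
    "ZP": ("automated", "protein predicted from WGS sequence"),
    "YP": ("curated", "other predicted protein sequences"),
    "XM": ("automated", "mRNA"),
    "XR": ("automated", "non-coding RNA or predicted transcript of pseudogene"),
    "XP": ("automated", "protein"),
}


def classify_identifier(identifier):
    """Return (source, status, molecule) for a RefSeq code, else a non-RefSeq marker."""
    prefix, underscore, _rest = identifier.strip().partition("_")
    if underscore and prefix in REFSEQ_PREFIXES:
        status, molecule = REFSEQ_PREFIXES[prefix]
        return "RefSeq", status, molecule
    # No known prefix plus underscore: possibly a primary accession number.
    return "not RefSeq", None, None


print(classify_identifier("NM_000546.6"))
print(classify_identifier("BC010109.2"))
```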
UniRef – UniProt redundancy reducing system for
protein sequences

      Non redundant protein sequences from
       UniProt
        ~ refseq
        Hiding redundant sequences by clustering them
        • UniRef100 = completely identical sequences
        • UniRef90 = 90% identical sequences
        • UniRef50 = 50% identical sequences
        See http://www.uniprot.org/help/uniref
NCBI's Gene – summarizes gene information
including sequence information from primary dbs
      Example of the gene NPR1 from A. thaliana
UniGene – summarizes transcriptomic
information around genes
And a lot more derived databases with
sequence information exist
         Repbase :
         repeats (Alu, …), maintained by Jerzy Jurka at the Genetic
           Information Research Institute (Mountain View CA, USA).
           CENSOR server allows to "clean" sequences.
           http://www.girinst.org/repbase
         MiRBase → published miRNA sequences
         http://www.mirbase.org/
         Eukaryotic promoter database
         http://www.epd.isb-sib.ch/
         UniVec
         GenBank subset + some sequences from commercial sources -
           ftp://ftp.ncbi.nih.gov/pub/UniVec/
The most important sequence databases
overview

      Prim seq data   Derived    Curated      Integrated search portals
      GB              GenPept    RefSeq       Entrez
      ENA             trEMBL                  ENA search, EB-eye
      DDBJ
                      UNIPROT    SwissProt    UniProt
Common gene annotations on sequences

 Genome sequence: e.g. Chr6

 Gene sequence:   enhancers/promoters, exons, introns, terminator

 mRNA:            5'UTR, CDS, 3'UTR, poly(A) tail (AAAAAAAAAAAAA)

 protein:         translated from the CDS using genetic code tables
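The last step of this picture, going from CDS to protein with a genetic code table, can be sketched in a few lines. The example assumes the standard genetic code (other tables exist, for instance for mitochondria) and a CDS that starts in frame. The names `CODON_TABLE` and `translate_cds` are made up for this illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a coding sequence (DNA or RNA letters) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))
```

Only the CDS part of the mRNA is translated; the 5'UTR, 3'UTR and poly(A) tail are not.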
Searching the database for your gene of interest

          First you have to determine for yourself
            which information you want

            - NA sequences vs. protein sequences
            - If NA, genomic sequences, or RNA derived
            - All possible sequences that exist, or curated ones
            - Protein sequences of which quality
            - ...
Entrez is a starting point for searches at NCBI
                           http://www.ncbi.nlm.nih.gov/sites/gquery
Visualising the db_xrefs in records at NCBI
ENA has its text-search portal
                                 http://www.ebi.ac.uk/ena/
Results from an ENA search are organised
following the ENA database structure
UniProt has a simple search box leading to a
sophisticated search results page
Complex searches can be achieved by using the
index codes in the database
                             e.g.

                              “oc=Primates and
                              de=complete and
                              de=cds and
                              de=MHC”

                             (oc, de = codes usable for searching)

                             Could answer: give me
                               all coding sequences
                               of MHC available in
                               primates.
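What such a combined query does can be mimicked on a few toy records. In the sketch below, `oc` stands for the organism classification line and `de` for the description line of a record; the three records and the helper `matches` are invented for illustration and do not come from a real database.

```python
def matches(record, conditions):
    """True if every (field, term) pair is found in the record (logical AND).

    Matching is a case-insensitive substring test, a simplification of what
    a real search index does.
    """
    return all(
        term.lower() in record.get(field, "").lower()
        for field, term in conditions
    )


# The query from the slide: oc=Primates and de=complete and de=cds and de=MHC
conditions = [("oc", "Primates"), ("de", "complete"), ("de", "cds"), ("de", "MHC")]

# Hypothetical records, for illustration only.
records = [
    {"id": "rec1", "oc": "Eukaryota; Mammalia; Primates",
     "de": "MHC class I antigen mRNA, complete cds"},
    {"id": "rec2", "oc": "Eukaryota; Mammalia; Rodentia",
     "de": "MHC class II antigen mRNA, complete cds"},
    {"id": "rec3", "oc": "Eukaryota; Mammalia; Primates",
     "de": "MHC class I gene, partial cds"},
]

hits = [record["id"] for record in records if matches(record, conditions)]
print(hits)
```

Each extra condition narrows the result: the second record fails on the organism, the third on the word "complete".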
Meta-search tools can search different
sequence databases at once.
    MRS
   Open Source, developed by Maarten Hekkelman at Radboud U.
   (Nijmegen, the Netherlands). Allows searching in different databases at
   once, and provides also statistics on the databases.




Alternatives: ACNUC, SRS
Logical operators
         Searching involves making combinations of conditions.
         Here the difference between logical AND, OR and NOT is explained with
         Venn diagrams.




            Q1 AND Q2
                &


            Q1 NOT Q2
                !



              Q1 OR Q2
                  |
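The same three combinations can be tried out with sets, where each set holds the records returned by one query. The record names are placeholders.

```python
# Hypothetical result sets of two separate queries.
q1 = {"rec1", "rec2", "rec3"}
q2 = {"rec2", "rec3", "rec4"}

q1_and_q2 = q1 & q2   # AND: records found by both queries
q1_not_q2 = q1 - q2   # NOT: records found by Q1 but not by Q2
q1_or_q2 = q1 | q2    # OR: records found by at least one query

print(sorted(q1_and_q2))
print(sorted(q1_not_q2))
print(sorted(q1_or_q2))
```

AND makes a result smaller, OR makes it larger, and NOT removes one result from the other.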
Hands-on!

        Every module ends with an exercise
          session.

        We will now explore how data is stored in different
         sequence databases. You get …. minutes for this
         exercise.
            Afterwards, we summarize some of the difficulties
              some of you might have experienced.
Summary
          This course is organised in several modules
          Module 1: Sequence databases
          Three major nucleotide databanks host primary sequence data
          These databases are filled with NA sequence information by scientists and consortia
          The batch submissions originate mostly from sequencing centers
          Each primary database stores its sequences and batch submissions in its own way...
          Batch submissions are marked and/or stored differently than single submissions
          The 'normal' submissions are a minority in primary sequence databases
          Primary sequence dbs are synchronised and every sequence receives a unique identifier
          One sequence entry contains three categories of different types of information
          This sequence information can be written in different formats
          Degree of annotation differs between entries
          SRA contains batch submitted records of which experiment information is of most importance
          How to get sequences into the db, and back out
          Primary sequence data contains a lot of redundancy!
          The primary sequences are the basis for analyses that generate derived sequence data
          Protein databases come in two kinds
          The most important protein db is UniProt and contains 'automatic' and manual entries
          Refseq - The NCBI way to reduce redundancy in primary sequence data
          RefSeq has its own identifiers, not to be mixed up with accession numbers
          UniRef – UniProt redundancy reducing system for protein sequences
          NCBI's Gene – summarizes gene information including sequence information from primary dbs
          UniGene – summarizes transcriptomic information around genes
          And a lot more derived databases with sequence information exist
          Searching the database for your gene of interest
          Entrez is a starting point for searches at NCBI
          Visualising the db_xrefs in records at NCBI
          ENA has its text-search portal
          Results from an ENA search are organised following the ENA database structure
          UniProt has a simple search box leading to a sophisticated search results page
          Complex searches can be achieved by using the index codes in the database
          Meta-search tools can search different sequence databases at once.
          Hands-on!

 
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...Araport
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionRai University
 
Bioinformatics Final Report
Bioinformatics Final ReportBioinformatics Final Report
Bioinformatics Final ReportShruthi Choudary
 

Viewers also liked (20)

Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
databases in bioinformatics
databases in bioinformaticsdatabases in bioinformatics
databases in bioinformatics
 
BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.
 
BITS: Basics of Sequence similarity
BITS: Basics of Sequence similarityBITS: Basics of Sequence similarity
BITS: Basics of Sequence similarity
 
BITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS: Basics of sequence analysis
BITS: Basics of sequence analysis
 
Bioinformatics Analysis of Nucleotide Sequences
Bioinformatics Analysis of Nucleotide SequencesBioinformatics Analysis of Nucleotide Sequences
Bioinformatics Analysis of Nucleotide Sequences
 
Bioinformatics principles and applications
Bioinformatics principles and applicationsBioinformatics principles and applications
Bioinformatics principles and applications
 
Bioinformatics Final Presentation
Bioinformatics Final PresentationBioinformatics Final Presentation
Bioinformatics Final Presentation
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Drug discovery and development
Drug discovery and developmentDrug discovery and development
Drug discovery and development
 
L01 ecture 01-
L01 ecture 01-L01 ecture 01-
L01 ecture 01-
 
Protein structure alignment beyond spatial proximity 3 dsig_2012
Protein structure alignment beyond spatial proximity 3 dsig_2012Protein structure alignment beyond spatial proximity 3 dsig_2012
Protein structure alignment beyond spatial proximity 3 dsig_2012
 
Bioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyBioinformatics in dermato-oncology
Bioinformatics in dermato-oncology
 
STRING - Protein networks from data and text mining
STRING - Protein networks from data and text miningSTRING - Protein networks from data and text mining
STRING - Protein networks from data and text mining
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blast
 
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene prediction
 
Bioinformatics Final Report
Bioinformatics Final ReportBioinformatics Final Report
Bioinformatics Final Report
 

Similar to BITS: Basics of sequence databases

Introduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdfIntroduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdfkigaruantony
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Sijo A
 
BIOINFO unit 1.pptx
BIOINFO unit 1.pptxBIOINFO unit 1.pptx
BIOINFO unit 1.pptxrnath286
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxxRowlet
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EITESANGO
 
Bioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformaticsBioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformaticsProf. Wim Van Criekinge
 
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...Thitichai Sripan
 
Introduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptxIntroduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptxRAJESHKUMAR428748
 
Protein function and bioinformatics
Protein function and bioinformaticsProtein function and bioinformatics
Protein function and bioinformaticsNeil Saunders
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekData Driven Innovation
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBioinformaticsCentre
 
bioinfomatics
bioinfomaticsbioinfomatics
bioinfomaticsnguyenpg
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsAyeshaYousaf20
 

Similar to BITS: Basics of sequence databases (20)

Introduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdfIntroduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdf
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
BIOINFO unit 1.pptx
BIOINFO unit 1.pptxBIOINFO unit 1.pptx
BIOINFO unit 1.pptx
 
Cloud bioinformatics 2
Cloud bioinformatics 2Cloud bioinformatics 2
Cloud bioinformatics 2
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Bioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformaticsBioinformatica 29-09-2011-t1-bioinformatics
Bioinformatica 29-09-2011-t1-bioinformatics
 
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
 
Introduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptxIntroduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptx
 
Protein function and bioinformatics
Protein function and bioinformaticsProtein function and bioinformatics
Protein function and bioinformatics
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Data base in detail
Data base in detailData base in detail
Data base in detail
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
 
Biological database
Biological databaseBiological database
Biological database
 
bioinfomatics
bioinfomaticsbioinfomatics
bioinfomatics
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 

More from BITS

RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5BITS
 
RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4BITS
 
RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6BITS
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1BITS
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3BITS
 
Productivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsProductivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsBITS
 
Text mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformaticsText mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformaticsBITS
 
The structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsThe structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsBITS
 
Managing your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformaticsManaging your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformaticsBITS
 
Introduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsIntroduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsBITS
 
BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS
 
BITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome levelBITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome levelBITS
 
BITS - Comparative genomics: gene family analysis
BITS - Comparative genomics: gene family analysisBITS - Comparative genomics: gene family analysis
BITS - Comparative genomics: gene family analysisBITS
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS
 
BITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS
 
BITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysisBITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysisBITS
 
BITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS
 
BITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS
 

More from BITS (20)

RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5
 
RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4
 
RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3
 
Productivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsProductivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformatics
 
Text mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformaticsText mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformatics
 
The structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsThe structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformatics
 
Managing your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformaticsManaging your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformatics
 
Introduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsIntroduction to Linux for bioinformatics
Introduction to Linux for bioinformatics
 
BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics data
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra tool
 
BITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome levelBITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome level
 
BITS - Comparative genomics: gene family analysis
BITS - Comparative genomics: gene family analysisBITS - Comparative genomics: gene family analysis
BITS - Comparative genomics: gene family analysis
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomics
 
BITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry data
 
BITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysisBITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysis
 
BITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS - Search engines for mass spec data
BITS - Search engines for mass spec data
 
BITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS - Introduction to proteomics
BITS - Introduction to proteomics
 

Recently uploaded

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationNeilDeclaro1
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 

Recently uploaded (20)

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health Education
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 

BITS: Basics of sequence databases

  • 1. Basic bioinformatics concepts, databases and tools: Introduction to the training and sequence databases. Joachim Jacob, http://www.bits.vib.be. Updated 22 February 2012. http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod1-intro_H1_2012_SeqDBs.pdf
  • 2. Scope: Introductory training in bioinformatics. Exploring and understanding databases and software for everyday bioinformatics use. If any term is unclear, please stop me and ask!
  • 3. Bioinformatics ... Bio: all data is derived from living samples. Informatics: that data is stored and analyzed in and with computers to obtain understanding. This is an extremely broad description, from which we will nevertheless extract common principles during the course.
  • 4-12. Bioinformatics is present in every aspect of life sciences research (this title repeats across slides 4 to 12; slide 6 adds the label "sequences").
  • 13. Bioinformatics ... Bio - different types of living samples Informatics - storing and categorizing the information and making it easily accessible - interpreting that information reliably
  • 14. Bioinformatics … and its companion Bio - different types of living samples Informatics - storing and categorizing the information and making it easily accessible - interpreting that information reliably Statistics - large numbers, observational data
  • 15. The siblings of Bioinformatics Based on the biological component extracted from life, the measured properties and the ultimate goal of the analysis, different sub-disciplines of bioinformatics exist. Components: DNA, RNA, proteins, metabolites. Sub-disciplines: Genomics, Transcriptomics, Proteomics, Metabolomics, Epigenomics, Structural bioinformatics, Systems biology, Microbiomics, Interactomics, Metagenomics, Functional genomics, Comparative genomics
  • 16. Mere data is worth nothing. Data = symbols (e.g. CGCTACGCATATCGCT). Information = data that are processed to be useful; provides answers to "who", "what", "where", and "when" questions. Also called metadata (e.g. Dasypus novemcinctus; found in my garden; part of genome; sequenced in June 2010). Knowledge: application of data and information; answers "how" questions (e.g. this species seems to be related to my neighbor's pet, because it also has this sequence). Understanding: appreciation of "why" (e.g. has the same mother). Wisdom. http://www.systems-thinking.org/dikw/dikw.htm
  • 17. Life sciences research is the major 'end user' of the bioinformatics tools and conclusions ('tool user'): it turns data into knowledge (? → !). Bioinformatics research is a specific branch on the boundary of life science, mathematics and computer science (Biology, Computer, Statistics) and supplies the tools and approaches ('tool manufacturer').
  • 18. This course is organised in several modules Module 1: Sequence databases: what, where, how Module 2: Sequence comparisons: searching, aligning Module 3: Sequence analysis – domains in protein sequences and predicting functionality, standardisation and useful links Module 4: Beyond sequences - additional important data sources Module 5: Genome Browsers - integrating biological data and performing reproducible bioinformatics research in the Galaxy
  • 19. Overview of the crash course
  • 20. One tip for the future Be prepared for change... Information is fluid. So are bioinfo tools. Learn how to accommodate change. Major resources are more stable. Important concepts do not change often.
  • 21. Module 1 Sequence databases
  • 22. Module 1: Sequence databases Sequence databases store DNA and RNA sequences. In bioinformatics, they are (still) by far the largest collections of biological data, and they are used by many subdisciplines of bioinformatics. http://www.ebi.ac.uk/embl/Services/DBStats/
  • 23. ... and growing http://www.ebi.ac.uk/embl/Services/DBStats/
  • 24. Three major nucleotide databanks host primary sequence data. European Nucleotide Archive (ENA) at EBI - http://www.ebi.ac.uk/ - divisions: EMBL-bank (European Molecular Biology Laboratory) (single), Trace Archive, SRA Archive. GenBank at NCBI - http://www.ncbi.nlm.nih.gov/ - maintained at NCBI (National Center for Biotechnology Information, USA). DDBJ (DNA Data Bank of Japan) - http://www.ddbj.nig.ac.jp/ - maintained at NIG/CIB (National Institute of Genetics, Center for Information Biology, Mishima, Japan)
  • 25. These databases are filled with NA sequence information by scientists and consortia. Sources: large-scale sequencing projects, individual scientists, patent offices. They deliver primary sequence data (e.g. ACTGCTGCTA GCTAGCTGAT CTATGCTAGC TGTAGCTGAG) to the primary sequence database. Each primary sequence = one experiment. Basically, all 'source' nucleotide material. Jennifer McDowall - http://www.biotnet.org/training-materials/nucleotide-sequence-databases-ena
  • 26. Primary NA sequence can be produced by Sanger-based technologies or NGS technologies. Sanger: low output in number of seqs, high quality, 400-850 bp. Read profiles in .abi format. Stored in the Trace Archive. NGS: different technologies. Extremely high output rate, low quality, 30 bp to 600 bp. Reads in .fastq format, stored in the SRA. (Diagram: sample → DNA, or sample → RNA → RT → cDNA.) These techniques can only read DNA strands, so RNA first needs to be converted to cDNA with reverse transcriptase before it is loaded onto the machines. Sanger overview: http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Obenrader NGS overview: http://seqanswers.com/forums/showthread.php?t=3561
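The .fastq format mentioned on this slide stores each read as four lines: an identifier, the bases, a separator and one quality character per base. A minimal sketch of reading one record follows. It assumes the common Phred+33 quality encoding, which the slides do not specify, and the record shown is invented for illustration.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (read_id, sequence, scores)."""
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, i.e. score = ASCII code of the character - 33.
    scores = [ord(char) - 33 for char in quality]
    return header[1:], sequence, scores


# Hypothetical read, not taken from any real run.
example_record = ["@read1", "ACGT", "+", "II5!"]
read_id, bases, scores = parse_fastq_record(example_record)
print(read_id, bases, scores)
```

A higher score means a more reliable base call, which is why quality filtering is usually the first step when working with NGS reads.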
  • 27. Overview major DNA reading technologies Dennis Wall, NGS Data Analysis and Computation I course, Wall Lab
  • 28. In the primary sequence dbs a distinction can be made between two major categories. High quality single submissions (Sanger): - gene sequence (genomic; 'STD' data class) - mRNA sequence (via cDNA; 'STD') - BAC/YAC/cosmid sequences - genome sequencing projects (contigs, assemblies, WGS) - genome markers, STS (sequence tagged sites, unique short sequences from a genome). Low quality batch submissions: - Expressed Sequence Tags (EST) - Genome Survey Sequences (GSS) - high-throughput sequence data (e.g. NGS). http://www.ebi.ac.uk/ena/about/formats
  • 29. The batch submissions originate mostly from sequencing centers. Large-scale sequencing projects: chromosome → fragment → sequencing library (submission) → sequence reads, e.g. whole genome shotgun (submission) → assemble sequence (submission) → annotation (e.g. cyp30, cyp309, insv, cg343)
  • 30. Each primary database stores its sequences and batch submissions in its own way... - NCBI: ESTs are stored in dbEST (separate database) - ENA: ESTs are part of EMBL-bank in the 'EST' data class. Similar for GSS (see dbGSS at NCBI). ESTs: expressed sequence tags, often partial sequences derived from RNA in batch (sample → RNA → RNA-seq). See example: >est1 ATCGACTAGCATCA >est2 TCGACTAGCGACTA >est3 CAGCATCATCGAC
  • 31. http://www.biotnet.org/sites/biotnet.org/files/documents/17/2010_ena_v2.0.ppt Batch submissions are marked and/or stored differently than single submissions. ENA structure (tier / class / type): ENA-Annotation (feature annotation): 1) EMBL-Bank; data class ESTs are also batch submissions. ENA-Assembly: assembly information. ENA-Reads (sequencing and sampling information; batch submissions): 2) Trace Archive - raw data (capillary sequencing); 3) Sequence Read Archive - raw data (Next Gen sequencing)
  • 32. The 'normal' submissions are a minority in primary sequence databases http://www.ebi.ac.uk/ena/about/statistics#embl_bases_per_dataclass
  • 33. Primary sequence dbs are synchronised and every sequence receives a unique identifier. All database maintainers assign and share a unique accession number (AC) for each sequence, besides their own ID number (info at NCBI). Sequences can get updated, and the accession number is then extended with a version number, e.g. .1 (see SVA). Example of acc number: BC010109.2. International Nucleotide Sequence Database Collaboration (http://www.insdc.org/): GenBank, DDBJ and ENA (+ SRA) collaborate on features, taxonomy, ... and are synchronized daily. All use the same - accession IDs - project IDs - feature tables (see later). http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)
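The accession.version convention on this slide can be illustrated with a short sketch that separates the stable accession from its version suffix. The function name is hypothetical, and the rule applied (everything after the first dot is an integer version) is a simplifying assumption.

```python
def split_accession(identifier):
    """Return (accession, version); version is None when no '.N' suffix is present."""
    accession, dot, version = identifier.partition(".")
    if not dot:
        return accession, None
    if not version.isdigit():
        raise ValueError(f"unexpected version suffix in {identifier!r}")
    return accession, int(version)


# BC010109.2 is the example accession shown on the slide.
print(split_accession("BC010109.2"))
print(split_accession("BC010109"))
```

Quoting the versioned form in a paper pins the exact sequence used, while the bare accession always refers to the record as a whole.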
  • 34. One sequence entry contains three categories of different types of information 1. Info about sequence, submitters and literature (metadata) 2. Annotations of the sequence (metadata related to the seq) 3. Stretch of ATGC / AUGC sequence (the 'data', at the bottom) • A sequence record is called 'annotated' when biological information is added and linked to a position in the sequence • Annotations, also called 'features', are abbreviated as codes, which can be found in the Feature Tables http://www.ebi.ac.uk/embl/Documentation/FT_d
  • 35. This sequence information can be written in different formats. (Plain) text format, e.g. GenBank. 1. General info: the official shared accession; the GenBank-specific identifier (simply counts up with each new entry). A lot of different identifiers, roughly as many as there are databases! → conversion tools that translate identifiers are needed (see exercises). *In humans: the HUGO Nomenclature Committee determines the right gene name. http://mobyle.pasteur.fr/cgi-bin/portal.py#tutorials::seqfmt
  • 36. 2. Annotation: db_xref = cross references = links to records in other databases that are related to this record (see later). The format is dbname:identifier. Each annotation consists of a feature name and qualifier names.
  • 37. 3. Sequence: each protein sequence also receives an accession number
  • 38. Other sequence formats. Fasta (minimal metadata, basically only sequence): >genename And a description ATCGATGCAGCTATATCCTCGCGATCAGC CGGACAGCTCTCGAGCGCATCGACGACGAC. ASN.1: Abstract Syntax Notation (ASN.1). EMBL: all info as in gb, online referred to as 'plain text'. XML. Fastq: sequence info and base 'call' quality. Important: 'format' has nothing to do with the program you use to save your file! You don't have a choice: it needs to be 'plain text format' (.txt - not a file which can be opened with MS Word such as .doc or .rtf files). Wordpad is a good choice for this. 'Format' in bioinfo is all about how the information is structured and written down in the plain text file. http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
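As a small illustration of the dbname:identifier convention, the sketch below splits a db_xref value at the first colon. The helper name is hypothetical, and splitting only at the first colon is an assumption made so that identifiers which themselves contain a colon stay intact.

```python
def parse_db_xref(value):
    """Split a db_xref qualifier value into (database name, identifier)."""
    dbname, colon, identifier = value.partition(":")
    if not colon or not dbname or not identifier:
        raise ValueError(f"not in dbname:identifier form: {value!r}")
    return dbname, identifier


# 'taxon:9606' points to the NCBI Taxonomy entry for Homo sapiens.
print(parse_db_xref("taxon:9606"))
```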
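To make the point that 'format' is about how the plain text is structured, here is a minimal sketch of a FASTA reader in Python. It assumes the usual convention that a line starting with '>' opens a record and that all following lines up to the next '>' belong to its sequence. The input is the example from the slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_text = """>genename And a description
ATCGATGCAGCTATATCCTCGCGATCAGC
CGGACAGCTCTCGAGCGCATCGACGACGAC
"""
records = parse_fasta(fasta_text)
for header, sequence in records:
    print(header, len(sequence))
```

For real work an established parser (for example the one in Biopython) is the safer choice; the sketch only shows how little structure the format has.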
  • 39. http://www.biotnet.org/sites/biotnet.org/files/documents/17/2010_ena_v2.0.ppt Degree of annotation differs between entries. Batch submitted sequences are annotated poorly, single submissions are annotated better. ENA structure (tier / class / type): ENA-Annotation (feature annotation): 1) EMBL-Bank, good sequence annotations. ENA-Assembly: assembly information. ENA-Reads (sequencing and sampling information): 2) Trace Archive - raw data (capillary sequencing); 3) Sequence Read Archive - raw data (Next Gen sequencing). Experiment information is of most importance in batch submissions (e.g. which species, which technique, ...)
  • 40. SRA contains batch submitted records of which experiment information is of most importance. Since the sequences are barely (or not) annotated, the experiment description is important: which machine, which organism, which tissue, which developmental stage, disease, treatment, …
  • 41. How to get sequences into the db, and back out. Submit: always submit your sequence data (mostly obliged by journals) and include your ACC number in articles (not any other number). Submission tools: Sequin (GenBank, stand-alone), Bankit (GenBank web submission tool), Webin (EMBL online submission). Retrieve, one or a few sequences: → use one of the numerous web-based tools. GenBank: Entrez. EMBL: EB-eye. MRS: developed for easy retrieval. Retrieve, many sequences (batch retrieval): → use ftp (file transfer protocol) → use perl (flexible programming language) → BioMart http://www.biomart.org/
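For batch retrieval, a script can request records by accession instead of clicking through a web form. The sketch below only builds a request URL for NCBI's E-utilities `efetch` service, which is a swapped-in route not named on the slide (the slide suggests ftp, perl or BioMart). The function name and defaults are assumptions, and no network call is made here.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nuccore", rettype="fasta"):
    """Build an efetch URL that asks for the given accessions as plain text."""
    params = {
        "db": db,                     # assumption: nucleotide records
        "id": ",".join(accessions),   # several IDs are sent comma-separated
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


# BC010109.2 is the example accession used earlier in the slides.
url = build_efetch_url(["BC010109.2"])
print(url)
```

The resulting URL can be opened with any HTTP client; for large batches, NCBI's usage guidelines on request rates should be checked first.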
  • 42. Example of a primary NA sequence record (ENA) http://www.ebi.ac.uk/ena/about/formats
  • 43. Example of a primary NA sequence record (ENA). Text format: each line starts with a code usable for searching, followed by the data linked to that code. http://www.ebi.ac.uk/ena/about/formats
  • 44. Primary sequence data contains a lot of redundancy! A chromosome sequence, several gene sequences from different labs, EST sequences from transcripts and a cDNA sequence can all match the same gene. Often your database search returns all of these sequences... A lot of redundancy!
  • 45. The primary sequences are the basis for analyses that generate derived sequence data Scientists/Consortia → primary databases – Source for further analyses. Which? • Create protein sequences • Curate the sequence database • Assemble genomes • Searching similarities • Aggregate information about one gene • … Results stored in derived databases
  • 46. Protein databases come in two kinds
  • 47. The most important protein db is UniProt and contains 'automatic' and manual entries UniProt Knowledge Base - 'the best annotated protein database of the world' http://www.uniprot.org/
  • 48. The most important protein db is UniProt and contains 'automatic' and manual entries
  • 49. Refseq - The NCBI way to reduce redundancy in primary sequence data RefSeq is NCBI 'Reference Sequences' (prot and nuc) Redundancy from primary sequence data is reduced both automatically and by manual annotation of NA and protein sequences. 'one natural biological molecule = one entry'. Links back to the original primary sequences. Hugely popular and a basis for a lot of analyses. Click to apply refseq filter in entrez search http://www.ncbi.nlm.nih.gov/RefSeq/
  • 50. RefSeq has its own identifiers, not to be mixed up with accession numbers. RefSeq entry codes look similar to ACC numbers (but are not ACC numbers: note the underscore!), and RefSeq records are also in GenBank format. Note: in the 'Features' section one can find the raw sequences from which the entry was derived. (Typical mistake: searching with a RefSeq code in UniProt.) NC_* (curated) complete genomic element (chromosome, plasmid,...); NT_* (automated) intermediate assembly from BAC; NZ_* (automated) incomplete genomic sequence from WGS; NW_* (automated) intermediate assembly from WGS; NG_* (curated) incomplete genomic element corresponding to gene; NM_* (curated) mRNA; NR_* (curated) non-coding RNA or predicted transcript of pseudogene; NP_* (curated) protein; ZP_* (automated) protein predicted from WGS sequence (NZ_*); YP_* (curated) other predicted protein sequences from NCBI Genome Annotation Pipeline; XM_* (automated) mRNA; XR_* (automated) non-coding RNA or predicted transcript of pseudogene; XP_* (automated) protein. http://www.ncbi.nlm.nih.gov/RefSeq/key.html http://www.ncbi.nlm.nih.gov/RefSeq/
  • 51. UniRef – UniProt's redundancy-reducing system for protein sequences. Non-redundant protein sequences from UniProt (comparable to RefSeq). Redundant sequences are hidden by clustering them: • UniRef100 = completely identical sequences • UniRef90 = sequences clustered at 90% identity • UniRef50 = sequences clustered at 50% identity. See http://www.uniprot.org/help/uniref
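The prefix table on this slide lends itself to a simple lookup. The sketch below copies the prefixes and descriptions from the slide and reports what kind of record an identifier is. The function name and the test identifier NM_000000.1 are hypothetical.

```python
# Prefix → (curation status, molecule type), transcribed from the slide.
REFSEQ_PREFIXES = {
    "NC": ("curated", "complete genomic element"),
    "NT": ("automated", "intermediate assembly from BAC"),
    "NZ": ("automated", "incomplete genomic sequence from WGS"),
    "NW": ("automated", "intermediate assembly from WGS"),
    "NG": ("curated", "incomplete genomic element corresponding to gene"),
    "NM": ("curated", "mRNA"),
    "NR": ("curated", "non-coding RNA or predicted transcript of pseudogene"),
    "NP": ("curated", "protein"),
    "ZP": ("automated", "protein predicted from WGS sequence"),
    "YP": ("curated", "other predicted protein sequence"),
    "XM": ("automated", "mRNA"),
    "XR": ("automated", "non-coding RNA or predicted transcript of pseudogene"),
    "XP": ("automated", "protein"),
}


def describe_refseq(identifier):
    """Return (status, type) for a RefSeq identifier, or None if it is not one."""
    prefix, underscore, _rest = identifier.partition("_")
    if underscore and prefix in REFSEQ_PREFIXES:
        return REFSEQ_PREFIXES[prefix]
    return None  # no underscore prefix: probably a primary accession number


print(describe_refseq("NM_000000.1"))  # hypothetical RefSeq mRNA identifier
print(describe_refseq("BC010109.2"))   # primary accession from the earlier slide
```

A check like this helps avoid the typical mistake named on the slide, such as pasting a RefSeq code into a database that expects a different identifier type.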
  • 52. NCBI's Gene – summarizes gene information including sequence information from primary dbs Example of the gene NPR1 from A. thaliana
  • 53. UniGene – summarizes transcriptomic information around genes
  • 54. And a lot more derived databases with sequence information exist. Repbase: repeats (Alu, …), maintained by Jerzy Jurka at the Genetic Information Research Institute (Mountain View CA, USA). The CENSOR server allows you to "clean" sequences. http://www.girinst.org/repbase MiRBase → published miRNA sequences http://www.mirbase.org/ Eukaryotic promoter database http://www.epd.isb-sib.ch/ UniVec: GenBank subset + some sequences from commercial sources - ftp://ftp.ncbi.nih.gov/pub/UniVec/
  • 55. The most important sequence databases: overview. Primary seq data: GB, ENA, DDBJ. Derived: GenPept, trEMBL. Curated: RefSeq, SwissProt. trEMBL and SwissProt together form UNIPROT. Integrated search portals: Entrez, ENA search, EB-eye, UniProt.
  • 56. Common gene annotations on sequences. Genome sequence (e.g. Chr6): enhancers/promoters, terminator. Gene sequence: exons, introns. mRNA: 5'UTR, CDS, 3'UTR, poly(A) tail (AAAAAAAAAAAAA). Protein: derived from the CDS via the genetic code tables.
  • 57. Searching the database for your gene of interest. First you have to determine for yourself which information you want: - NA sequences vs. protein sequences - If NA, genomic sequences, or RNA derived - All possible sequences that exist, or curated ones - Protein sequences of which quality - ...
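The step from CDS to protein via a genetic code table can be sketched in a few lines. The example assumes the standard genetic code (NCBI translation table 1); the slides do not say which table is meant, and other organisms or organelles use different ones. The input sequences are invented.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA letters
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[start:start + 3]]  # KeyError on ambiguous bases
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Only the CDS part of an mRNA should be passed in; the UTRs and the poly(A) tail listed on the slide are not translated.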
  • 58. Entrez is a starting point for searches at NCBI http://www.ncbi.nlm.nih.gov/sites/gquery
  • 59. Visualising the db_xrefs in records at NCBI
  • 60. ENA has its text-search portal http://www.ebi.ac.uk/ena/
  • 61. Results from an ENA search are organised following the ENA database structure
  • 62. UniProt has a simple search box leading to a sophisticated search results page
  • 63. Complex searches can be achieved by using the index codes (codes usable for searching) in the database, e.g. “oc=Primates and de=complete and de=cds and de=MHC”. This could answer: give me all coding sequences of MHC available in primates.
  • 64. Meta-search tools can search different sequence databases at once. MRS Open Source, developed by Maarten Hekkelman at Radboud U. (Nijmegen, the Netherlands). Allows searching in different databases at once, and provides also statistics on the databases. Alternatives: ACNUC, SRS
  • 65. Logical operators. Searching involves making combinations of conditions. Here the difference between a logical AND, OR and NOT is explained by Venn diagrams. Q1 AND Q2 (&); Q1 NOT Q2 (!); Q1 OR Q2 (|)
  • 66. Hands-on! Every module ends with an exercise session. We will now explore how data is stored in different sequence databases. You get …. minutes for this exercise. Afterwards, we summarize some of the difficulties some of you might have experienced.
  • 67. Summary. This course is organised in several modules; Module 1: Sequence databases; Three major nucleotide databanks host primary sequence data; These databases are filled with NA sequence information by scientists and consortia; The batch submissions originate mostly from sequencing centers; Each primary database stores its sequences and batch submissions in its own way...; Batch submissions are marked and/or stored differently than single submissions; The 'normal' submissions are a minority in primary sequence databases; Primary sequence dbs are synchronised and every sequence receives a unique identifier; One sequence entry contains three categories of different types of information; This sequence information can be written in different formats; Degree of annotation differs between entries; SRA contains batch submitted records of which experiment information is of most importance; How to get sequences into the db, and back out; Primary sequence data contains a lot of redundancy!; The primary sequences are the basis for analyses that generate derived sequence data; Protein databases come in two kinds; The most important protein db is UniProt and contains 'automatic' and manual entries; Refseq - The NCBI way to reduce redundancy in primary sequence data; RefSeq has its own identifiers, not to be mixed up with accession numbers; UniRef – UniProt's redundancy-reducing system for protein sequences; NCBI's Gene – summarizes gene information including sequence information from primary dbs; UniGene – summarizes transcriptomic information around genes; And a lot more derived databases with sequence information exist; Searching the database for your gene of interest; Entrez is a starting point for searches at NCBI; Visualising the db_xrefs in records at NCBI; ENA has its text-search portal; Results from an ENA search are organised following the ENA database structure; UniProt has a simple search box leading to a sophisticated search results page; Complex searches can be achieved by using the index codes in the database; Meta-search tools can search different sequence databases at once; Hands-on!
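The three operators behave exactly like set operations on the lists of hits returned by two queries. A minimal sketch with Python sets follows; the record identifiers are invented placeholders, not real accessions.

```python
# Hypothetical hit lists for two queries, e.g. Q1 = 'oc=Primates', Q2 = 'de=MHC'.
q1_hits = {"rec1", "rec2", "rec3"}
q2_hits = {"rec2", "rec3", "rec4"}

both = q1_hits & q2_hits     # Q1 AND Q2: records matching both conditions
either = q1_hits | q2_hits   # Q1 OR Q2: records matching at least one condition
only_q1 = q1_hits - q2_hits  # Q1 NOT Q2: records matching Q1 but not Q2

print(sorted(both), sorted(either), sorted(only_q1))
```

AND narrows a search, OR widens it, and NOT removes an unwanted subset, which is what the Venn diagrams on the slide depict.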