Welcome to BIOINFORMATICS
                   -MiRON
Outline
   Workshop chronology (see handouts)
   Brief background information
   Applications & role
   Bioinformatics tools
   Practical classes
   Problem solving exercises
   What’s expected of you ?
   Questions/comments are welcome at all
    points
Aims
   To introduce the concepts and language of
    bioinformatics.
   To provide an understanding of how nucleic acid
    and protein sequence data is obtained and
    analysed.
   To develop skills in utilising online databases and
    interpreting data.
   To develop an understanding of how bioinformatics
    can be applied to solve specific problems in
    biomedical science.
   To develop transferable IT and communications
    skills.
In this workshop…..
   You will learn about how data is
    generated and analysed
   As well as what the generated data can
    tell us about the molecular biology of
    organisms
   And various practical applications of
    this knowledge
What is bioinformatics?
Why bioinformatics?
   Over the past decade massive amounts
    of sequence data have been generated
   This has more recently been joined by
    gene expression data obtained from
    microarrays and proteomic technologies
   This vast amount of data can only be
    analysed using various specialised
    computer algorithms
Main Topics (Review)
   Genome organisation and analysis
   Functional genomics
   Advanced techniques in molecular biology
   Archives, information retrieval and alignments:
   Nucleic acid sequence databases; genome
    databases; protein sequence databases; database
    searching
   Dot plots (similarity matrix) and sequence
    alignments (PSI-BLAST);
   Genome expression: Microarray analysis,
    proteomics, eukaryotic genome expression
What bioinformaticians think
they are
What they do
Examples of Bioinformatics
    Database interfaces
        Genbank/EMBL/DDBJ, Medline, SwissProt, PDB,
         …
    Sequence alignment
        BLAST, FASTA
    Multiple sequence alignment
        Clustal W, MultAlin, DiAlign
    Gene finding
        Genscan, GenomeScan, GeneMark, GRAIL
    Protein Domain analysis and identification
        Pfam, BLOCKS, ProDom
    Pattern Identification/Characterization
        Gibbs Sampler, AlignACE, MEME
    Protein Folding prediction
        PredictProtein, SWISS-MODEL
Five websites that all biologists
    should know
   NCBI (The National Center for Biotechnology Information)
       http://www.ncbi.nlm.nih.gov/
   EBI (The European Bioinformatics Institute)
       http://www.ebi.ac.uk/
   The Canadian Bioinformatics Resource
       http://www.cbr.nrc.ca/
   SwissProt/ExPASy (Swiss Bioinformatics Resource)
       http://expasy.cbr.nrc.ca/sprot/
   PDB (The Protein Data Bank)
       http://www.rcsb.org/PDB/
Remember while using web
    server-based tools

   You are using someone else’s
    computer
   You are (probably) getting a reduced
    set of options or capacity
   Servers are great for sporadic or
    proof-of-principle work, but for intensive work,
    the software should be obtained and
    run locally
Human Gene Index Database
   HGI is a database of expressed DNA
    sequences, mostly made of ESTs, which are
    a type of partial cDNA
   EST stands for Expressed Sequence Tag
   These short sequences were created using
    essentially the same method used to make
    cDNAs
   As such they represent the expressed part of
    a genome and are made from mRNA which is
    ultimately expressed from GENES
Gene Structure
Similarity Searching
   There are a variety of computer
    programs that are used for making
    comparisons between DNA sequences.
   The most popular is known as BLAST
    (Basic Local Alignment Search Tool)
   BLAST is free at the NCBI website
BLAST is Complex
   Similarity searching relies on the concepts of
    alignment and distance between pairs of
    sequences.
   Distances can only be measured between
    aligned sequences (match vs. mismatch at
    each position).
   A similarity search is a process of testing the
    best alignment of a query sequence with
    every sequence in a database.
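The match vs. mismatch idea above can be sketched in a few lines of Python. This is only an illustration, not how BLAST scores alignments: the function names are made up for this example, and it assumes the two sequences are already aligned, with "-" marking gaps.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns where both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both characters agree and are not gaps.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def mismatch_distance(aligned_a, aligned_b):
    """Number of columns that differ (a mismatch, or a gap in one sequence)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)
```

Any pair of Query/Sbjct rows from a BLAST report can be passed to these functions once the coordinates are stripped off.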
Workshop -1 (database search & inference of possible
     homology)

     Please refer to getting started with bioinformatics




    INTRO TO BLAST
   Basic Local Alignment Search Tool
   It compares a query sequence with those held in nucleotide
    databases by aligning the query with previously characterised
    genes, which helps to identify genes.
   The emphasis of this tool is on finding regions of sequence
    similarity between the query and the database sequences.
   These sequence alignments can yield clues about the structure
    and function of a novel sequence, and about its evolutionary
    history and homology with other sequences in the database.
BLAST has Automatic
Translation
   BLASTX makes automatic translation (in all
    6 reading frames) of your DNA query
    sequence to compare with protein
    databanks
   TBLASTN makes automatic translation of
    an entire DNA database to compare with
    your protein query sequence
   Only make a DNA-DNA search if you are
    working with a sequence that does not code
    for protein.
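To make "all 6 reading frames" concrete, here is a rough Python sketch of six-frame translation. It assumes the standard genetic code and plain A/C/G/T input; the function names are chosen for this example and are not part of BLAST. Codons containing any other character are shown as "X".

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate three offsets on the given strand and three on its reverse complement."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, six_frames("ATGGCCTAA") gives "MA*" in frame +1 and "LGH" in frame -1.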
A typical sequence ready for
        submission to BLAST
>THC2465887
GGCTGCGGAGGACCGACCGTCCCCACGCCTGCCGCCCCGCGACCCCGACCGCCAGCATGATCGCCGCGCAGCTCCTGGCC
TATTACTTCACGGAGCTGAAGGATGACCAGGTCAAAAAGATTGACAAGTATCTCTATGCCATGCGGCTCTCCGATGAAAC
TCTCATAGATATCATGACTCGCTTCAGGAAGGAGATGAAGAATGGCCTCTCCCGGGATTTTAATCCAACAGCCACAGTCA
AGATGTTGCCAACATTCGTAAGGTCCATTCCTGATGGCTCTGAAAAGGGAGATTTCATTGCCCTGGATCTTGGTGGGTCT
TCCTTTCGAATTCTGCGGGTGCAAGTGAATCATGAGAAAAACCAGAATGTTCACATGGAGTCCGAGGTTTATGACACCCC
AGAGAACATCGTGCACGGCAGTGGAAGCCAGCTTTTTGATCATGTTGCTGAGTGCCTGGGAGATTTCATGGAGAAAAGGA
AGATCAAGGACAAGAAGTTACCTGTGGGATTCACGTTTTCTTTTCCTTGCCAACAATCCAAAATAGATGAGGCCATCCTG
ATCACCTGGACAAAGCGATTTAAAGCGAGCGGAGTGGAAGGAGCAGATGTGGTCAAACTGCTTAACAAAGCCATCAAAAA
GCGAGGGGACTATGATGCCAACATCGTAGCTGTGGTGAA
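The sequence above is in FASTA format: a ">" header line followed by lines of sequence. A minimal Python reader might look like the sketch below. It is an assumption-laden illustration (the function name is invented, the identifier is taken as the text up to the first space, and nothing is validated), not a replacement for an established parser.

```python
def parse_fasta(text):
    """Return {identifier: sequence} for FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Identifier = header text up to the first whitespace.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before a '>' header line")
        else:
            records[name].append(line.upper())
    # Join the wrapped sequence lines of each record into one string.
    return {key: "".join(chunks) for key, chunks in records.items()}
```

Checking the length and the identifier of each record this way is a cheap sanity test before pasting a sequence into a search form.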
BLAST OUTPUT
BLAST line-up of human v canine partial cDNAs for
hexokinase 1


  Query:   3034 TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA 3086
                |||||| | |||||| ||||||||    |   ||| ||||||||||| |||||||| |||
  Sbjct:     75 TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA 134

  Query:   3087 TTTCTAATGTATGCATTCATCCAACAGAGTTATTTATTGGCTGGAGATGGAAAATCACAC 3146
                |||| | | | ||||||| || ||||||||||||||||||    ||||| ||| |||| |
  Sbjct:    135 TTTCCAGTCTCTGCATTCGTCTGACAGAGTTATTTATTGGCCCAAGATGAAAAGTCACGC 194

  Query:   3147 CACCTGACAGGCCTTCTGGG-CCTCCAAAGCCCATCCTTGGGGTTCCCCCTCCCTGTGTG 3205
                || | | |||||||| |||| ||||   ||||| |||||||||   | | |||||||||
  Sbjct:    195 CATCCGCCAGGCCTTATGGGGCCTCTGCAGCCCGTCCTTGGGGACACATC-CCCTGTGTG 253

  Query:   3206 AAATGTATTATCACCAGCAGACACTGCCGGGCCTCC-C-TCCCGGGGGCACTGCCTGAAG 3263
                ||||||||||||||||||||||||||||||| |||| | |||| |||||| | | |
  Sbjct:    254 AAATGTATTATCACCAGCAGACACTGCCGGGACTCCTCCTCCCAGGGGCA-T-CTTAGCT 311

  Query:   3264 GCGAG-TGTGGGCATAGCATTAGCTGCTTCCTCCCCTCCTG-GCA-CCCACTGTGGCC-T 3319
                ||    |   | | ||||     ||||| || | ||| | | | |||| | || | |
  Sbjct:    312 GCTTCCTCCCGTCCCAGCACCCACTGCTGTCTGGCGTCCCGAGGATCCCA-TCAGGACGT 370

  Query:   3320 GGC-ATCGCATCGTGGTGTGTCAATGCCACAAAATCGTGTGTCCGTGGAACCAGTCCTAG 3378
                | | || || | | ||||      | ||    || | || ||| | | ||    || |
  Sbjct:    371 GTCCATGCCACTGAGTCGTGTG--T-CCGTGGAA-C-TG-GTCAGAGCCACT--TCGTGA 422

  Query:   3379 CCGCGTGTGACAGTCTTGCATTCTGTTTGTCTCGTGGGGGGAGGTGGACAG-TCCTGCGG 3437
                | | | || || ||| | ||| | | | | ||                || ||||| ||
  Sbjct:    423 CAGTCT-TG-CATTCTGTCTGTCT--TGGGGTGGNNGGNAAGNNNNNCCANNTCCTGTGG 478

  Query:   3438 -AAAT--GTGTCTTGTCTCCATTTGGA-TAAAA-GGAA-CCAA--CCAACAAACAATGCC 3489
                 |||   | | |||| |||||||||| ||||| |||| |||| ||||||| || ||||
  Sbjct:    479 GAAAAAGGGGCCTTGGCTCCATTTGGGGTAAAAAGGAAACCAAACCCAACAA-CAGTGCC 537

  Query:   3490 A-TCACTGG-AATTTCCC-ACCG-CTTT--GTGAGCCGTG-TCGTATGA-CCTAGTAAAC 3541
                  ||| ||| |||| ||| | | |||| ||||||| || | |||||| ||||| ||
  Sbjct:    538 CCTCATTGGGAATTCCCCCATTGGCTTTTTGTGAGCCATGGTTGTATGAACCTAGGTAAA 597

  Query:   3542 TTTGT 3546
                 || |
  Sbjct:    598 CTTNT 602
Understand the
Statistics!
   BLAST produces an E-value for every match
       The E-value is the number of matches this good expected by
        chance; for small values it is close to a P value
   A match is generally considered significant if the
    E-value < 0.05 (smaller numbers are more significant)
   Very low E-values (e.g. 1e-100) indicate homologs or
    identical genes
   Moderate E-values suggest related genes
   Long regions of moderate similarity are more
    important than short regions of high identity.
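The link between E-values and P-values can be written down directly. Strictly, the E-value is the expected number of chance hits scoring at least as well, and the probability of seeing at least one such hit is P = 1 - exp(-E), so the two nearly coincide only when E is small. The Python sketch below also uses the standard relation E = m * n * 2^(-S') for a bit score S'. The function names are invented here, and the raw lengths m and n are a simplifying assumption, since real BLAST uses corrected "effective" lengths.

```python
import math


def expected_hits(bit_score, query_length, database_length):
    """E = m * n * 2**(-S'): chance hits expected at or above bit score S'."""
    return query_length * database_length * 2.0 ** (-bit_score)


def p_value_from_e_value(e_value):
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    # expm1 keeps precision when E is very small.
    return -math.expm1(-e_value)
```

With E = 0.05 the P-value is about 0.049, but with E = 1 it is only about 0.63, which is why large E-values should not be read as probabilities.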
BLAST is Approximate
   BLAST makes similarity searches very
    quickly because it takes shortcuts.
       looks for short matching “words” (11 bases by default in classic BLASTN)

   It also makes errors
       misses some important similarities
       makes many incorrect matches
            easily fooled by repeats or skewed composition
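The "word" shortcut can be illustrated with a toy Python sketch: index every word of the query, then scan the subject for exact word hits ("seeds"). This is a simplified picture under stated assumptions, not the real BLAST implementation, which goes on to extend and score each seed; the function names and the default word size of 11 are taken from the slide rather than from BLAST source code.

```python
from collections import defaultdict


def index_words(sequence, word_size=11):
    """Map each word of length word_size to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index


def find_seeds(query, subject, word_size=11):
    """Return (query_pos, subject_pos) pairs where an exact word is shared."""
    index = index_words(query, word_size)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        # .get avoids adding empty entries to the defaultdict.
        for i in index.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds
```

The sketch also shows why similarities get missed: two sequences that differ at every tenth base share no 11-base word, so no seed is found, while a simple repeat produces a flood of seeds.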
Bad Genome
Annotation
   Gene finding is at best only 90%
    accurate.
   New sequences are automatically
    annotated with BLAST scores.
   Bad annotations propagate
   It's going to take us 10-20 years or more
    to sort this mess out!
Conclusions
   We have only touched small parts of
    the elephant
   Trial and error (intelligently) is often
    your best tool
   Keep up with the main five sites, and
    you’ll have a pretty good idea of what is
    happening and available
