Bio153 Microbial Genomics

               Professor Mark Pallen
            University of Birmingham
Microbial Genomics
   General features of microbial genomes
   Historical overview
   Genome sequencing, annotation and analysis
   Genome evolution
   What can we learn from a genome sequence?
General features of genomes
Microbial
   Small WYSIWYG ("what you see is what you get") genomes (Mbp)
   Gene density high (>90%)
       intergenic regions short
       very little repetitive or non-coding DNA
       introns very rare
   Protein-coding genes (CDS) short (~1 kbp)
   Operons with promoters just upstream
   Fewer non-coding RNAs
Human
   Very large genomes (Gbp)
   Gene density low
       only 25% is genes
       because of introns, only ~1% codes for protein
   Genes can span ≥30 kbp
   Genes have ~3 transcripts
       splicing and splice variants
Bacterial genome organisation

Chromosomes
   Most commonly a single circular chromosome (always DNA)
       BUT some species have linear chromosome(s) (e.g. Borrelia, Streptomyces, Rhodococcus)
       BUT a few species have two chromosomes (e.g. Vibrio cholerae)
   Can be a mix of circular and linear (e.g. Agrobacterium tumefaciens, B. burgdorferi)
Plasmids
   Independent autonomous replicons; can be circular or linear
   May integrate into the chromosome
   Copy number varies from 1 to tens
   Often carry non-essential genes that confer an adaptive advantage in certain conditions
Bacterial Genome Size
   species that occupy restricted ecological
    niches (e.g. obligate intracellular parasites and
    endosymbionts) tend to have smaller genomes
    (<1.5 Mb) than generalist bacteria
       smallest known bacterial genome:
        Carsonella ruddii, 160 kb! (Nakabachi et al. 2006)
       BUT mitochondrial genomes are smaller
   largest genomes found in bacteria with complex
    developmental cycles, e.g. Streptomyces
       largest bacterial genome: Sorangium cellulosum, 13 Mb
Bacterial genomes are made from DNA
   In 1944, Oswald Avery, Colin MacLeod, and Maclyn
    McCarty showed that DNA (not proteins) was the genetic
    material responsible for inheritance.
       Identified DNA as the "transforming principle" while studying
        Streptococcus pneumoniae
       Avery, Oswald T., Colin M. MacLeod, and Maclyn McCarty.
        Studies on the chemical nature of the substance inducing
        transformation of pneumococcal types. Journal of Experimental
        Medicine. 1944 Feb 1; 79(2): 137-158.
   In 1952, this work was supported by Alfred Hershey and
    Martha Chase, who showed that only the DNA of a virus
    needs to enter a bacterium to infect it.
       Used radioactively labelled bacteriophage
       Hershey AD and Chase M. Independent functions of viral
        protein and nucleic acid in growth of bacteriophage. Journal of
        General Physiology. 1952. 36: 39-56.
Viral genomes are variable
   Use RNA or DNA but not
    both in genome
       Some have RNA genomes!
   Grouped into families
    depending on
       type of genome: DNA or
        RNA, single- or double-
        stranded
       Typically dozens of genes
        or fewer
       Large genomes in poxviruses
        (~200 kb)
       Massive genomes in
        megaviruses (~1 Mbp!)
Microbial Genomics Timeline

Year   Milestone
1977   Invention of dideoxy chain-terminator sequencing (“Sanger sequencing”)
1977   Sequencing of the 5.4-kilobase genome of bacteriophage phiX174
1981   First human mitochondrial genome sequence
1982   Determination of the 48.5-kilobase genome sequence of bacteriophage lambda through the first use
       of shotgun sequencing
1986   Development of automated fluorescent sequencing
1995   First complete genome sequences of free-living bacteria obtained (Haemophilus influenzae and
       Mycoplasma genitalium)
1996   Mycoplasma becomes the first bacterial genus with completely sequenced genomes from two
       different species (M. genitalium and M. pneumoniae)
1997   First genome sequences from Escherichia coli and Bacillus subtilis
1998   First genome sequence from Mycobacterium tuberculosis; genome sequence of
       Rickettsia prowazekii provides first evidence of reductive evolution
Microbial Genomics Timeline
Year       Milestone
1999       Helicobacter pylori becomes the first species with completely sequenced genomes from two
           isolates
2000       Meningococcal genome sequence primes first application of reverse vaccinology
2001       Second E. coli genome sequences reveal unexpected level of horizontal gene transfer;
           genome sequence of M. leprae provides compelling evidence of bacterial pseudogenes and
           reductive evolution; first publication reporting genome sequences of two strains from one
           species (Staphylococcus aureus)
2002       Genome sequencing of multiple strains of Bacillus anthracis to provide markers for forensic
           epidemiology
2003       Genome sequencing of uncultivable Tropheryma whipplei leads to design of axenic growth
           medium
2004       Genome sequence of mimivirus blurs distinctions between bacteria and viruses;
           bacterial metagenomics survey of the Sargasso Sea yields >1 million new genes
2005       Whole-genome sequencing used to identify target of new anti-tuberculosis drug;
           Mycoplasma genitalium genome sequenced using pyrosequencing
2006-2011  Rise of next-generation or high-throughput sequencing
The first genome sequences
   The first sequenced gene was from bacteriophage MS2
       The gene encoding the coat protein
       1972
       Min Jou W, Haegeman G, Ysebaert M, and Fiers W. Nucleotide
        sequence of the gene coding for the bacteriophage MS2 coat
        protein. Nature. 1972 May 12; 237(5350): 82-88.
   The first sequenced genome was bacteriophage MS2
       1976
       RNA genome is 3,569 nucleotides
       Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant
        D, Merregaert J, Min Jou W, Molemans F, Raeymaekers A, Van
        den Berghe A, Volckaert G, and Ysebaert M. Complete
        nucleotide sequence of bacteriophage MS2 RNA: primary and
        secondary structure of the replicase gene. Nature. 1976 Apr 8;
        260(5551): 500-507.
The first genome sequences
   The first sequenced DNA genome was bacteriophage ΦX174
       1977
       5,386 nucleotides (single-stranded DNA genome)
       Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes
        CA, Hutchison CA, Slocombe PM, and Smith M. Nucleotide
        sequence of bacteriophage phi X174 DNA. Nature. 1977;
        265(5596): 687-695.
   The first sequenced bacterial genome was Haemophilus
    influenzae
       1995
       1,830,140 base pairs
       Fleischmann R, Adams M, White O, Clayton R, Kirkness
        E, Kerlavage A, Bult C, Tomb J, Dougherty B, Merrick J, et al.
        Whole-genome random sequencing and assembly of
        Haemophilus influenzae Rd. Science. 1995; 269(5223): 496-512.
Overview of a genome project
   Choose strain
       Fresh isolate or tractable lab strain?
   Choose strategy
       Shotgun sequencing
       Paired-end sequencing
       Draft or complete?
   Choose chemistry
       Sanger; 454; Illumina; Ion Torrent
   Assembly
       Automated
   Closure and finishing
       Manually intensive
       Difficulty depends on how repetitive the genome is
   Data release
       Immediate or delayed?
   Annotation
       Manually intensive bottleneck
   Publication
Methods for genome sequencing – historic
Sanger method sequencing
   Sanger F and Coulson AR. A rapid method for
    determining sequences in DNA by primed synthesis
    with DNA polymerase. Journal of Molecular Biology.
    1975 94: 441-448.
   Step 1, a sequence-specific DNA primer is radiolabeled
   Step 2, the primer is annealed to the template DNA
   Step 3, the primer is extended by DNA polymerase
       Incorporation of a deoxynucleotide - further extension possible
       Incorporation of a dideoxynucleotide – chain termination
   Four reactions set up
       ddATP, dATP, dCTP, dGTP, dTTP
       ddCTP, dATP, dCTP, dGTP, dTTP
       ddGTP, dATP, dCTP, dGTP, dTTP
       ddTTP, dATP, dCTP, dGTP, dTTP
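To make the four-tube logic concrete, here is a minimal Python sketch that assumes ideal, error-free synthesis in which every possible chain-terminated fragment is produced. The helper names (`new_strand`, `sanger_reactions`, `read_gel`) and the short template are hypothetical, chosen for illustration. Each ddNTP tube yields fragments that end at that base; sorting all bands by length, as the gel does, reads out the newly synthesised strand.

```python
# Idealised four-tube dideoxy (Sanger) logic: assumes every possible
# terminated fragment is made and there are no synthesis errors.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def new_strand(template):
    """Strand synthesised on a template written 5'->3' (its reverse complement)."""
    return template.translate(COMPLEMENT)[::-1]


def sanger_reactions(strand):
    """For each ddNTP tube, list the lengths of chain-terminated fragments.

    A fragment of length n is produced wherever the n-th base of the new
    strand equals the dideoxy base added to that tube.
    """
    return {dd: [i + 1 for i, base in enumerate(strand) if base == dd] for dd in "ACGT"}


def read_gel(reactions):
    """Order all bands by length (shortest runs furthest) and read each band's tube."""
    bands = sorted((length, dd) for dd, lengths in reactions.items() for length in lengths)
    return "".join(dd for _, dd in bands)


if __name__ == "__main__":
    strand = new_strand("AACGT")  # hypothetical template
    tubes = sanger_reactions(strand)
    print(tubes)            # fragment lengths per ddNTP tube
    print(read_gel(tubes))  # sequence of the newly synthesised strand
```

Note that the gel reads the new strand, which is the reverse complement of the template written 5'→3'.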
Methods for genome sequencing – automated Sanger sequencing
   Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C,
    Kent SBH, and Hood LE. Fluorescence detection in automated DNA
    sequence analysis. Nature. 1986; 321: 674-679.
   Replaced radioisotopes with fluorescent dyes
       Safer for the researchers
       Each of the four chain-termination reactions could be labelled with a different-coloured dye
       The four reactions could therefore be run together in one lane rather than in separate lanes
       Bands could be detected by their fluorescence as they migrated past a fixed detector
       This allowed automatic gel reading
   Further improvements were made
       Improved dye chemistry using fluorescent dideoxy-terminators (DuPont): Prober
        JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ,
        Jensen MA, and Baumeister K. A system for rapid DNA sequencing with
        fluorescent chain-terminating dideoxynucleotides. Science. 1987; 238: 336-341.
       Replacing slab gels with reusable capillary tubes: Ruiz-Martinez MC, Berka J,
        Belenkii A, Foret F, Miller AW, and Karger BL. DNA sequencing by capillary
        electrophoresis with replaceable linear polyacrylamide and laser-induced
        fluorescence detection. Analytical Chemistry. 1993; 65: 2851-2858.
Whole-Genome Shotgun Sanger Sequencing
[Figure: bacterial chromosome → random shearing → size selection → cloning into plasmid vector → pick colonies to create shotgun library → plasmid preps → sequence each insert with two primers]
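How much random shotgun sequencing is enough? A standard way to reason about this (not stated on the slide) is the Lander-Waterman model, which treats read start positions as random: with N reads of length L on a genome of length G, the mean coverage is c = NL/G and the expected fraction of bases never sequenced is about e^(-c). The Python sketch below compares that estimate with a simple simulation; the function names and all parameters in the usage lines are hypothetical.

```python
import math
import random


def coverage_depth(n_reads, read_len, genome_len):
    """Mean number of reads covering each base: c = N * L / G."""
    return n_reads * read_len / genome_len


def expected_uncovered_fraction(c):
    """Lander-Waterman (Poisson) estimate of the fraction of bases never sequenced."""
    return math.exp(-c)


def simulate_uncovered_fraction(genome_len, read_len, n_reads, seed=0):
    """Drop reads at random start positions on a linear genome; return the uncovered fraction."""
    rng = random.Random(seed)
    diff = [0] * (genome_len + 1)  # difference array: +1 at read start, -1 after read end
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    depth = uncovered = 0
    for pos in range(genome_len):
        depth += diff[pos]
        if depth == 0:
            uncovered += 1
    return uncovered / genome_len


if __name__ == "__main__":
    c = coverage_depth(1000, 500, 100_000)  # hypothetical project
    print(c, expected_uncovered_fraction(c))
    print(simulate_uncovered_fraction(100_000, 500, 1000, seed=1))
```

With these hypothetical values c = 5, and e^(-5) leaves roughly 0.7% of bases uncovered; the simulation on a linear sequence tends to come out slightly higher because the ends receive fewer reads.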
High-throughput Sequencing
   100x faster, 100x cheaper!
       A disruptive technology
   Several technologies in the marketplace from 2007
    onwards
       454 (Roche)
       Illumina
       Ion Torrent
       PacBio
   Fundamentally new approaches
       Solid-phase amplification of clonal templates in “molecular
        colonies”
           Massive increase in number of “clones” compensates for shorter
            read length
       New chemistries for sequence reading
         454: pyrophosphate detection on base addition
         Illumina: fluorescently labelled reversible-terminator nucleotides; the dye and blocking group are removed after each cycle
High-Throughput Shotgun Sequencing
[Figure: bacterial chromosome → random shearing → size selection → add adapters → amplify → sequence]
454 sequencing
Emulsion-based clonal amplification
[Figure: anneal sstDNA to an excess of DNA capture beads → emulsify beads and PCR reagents in water-in-oil microreactors → clonal amplification occurs inside microreactors → break microreactors and enrich for DNA-positive beads]
Pyrosequencing
   DNA template with primer is mixed with the enzymes (DNA polymerase, ATP
    sulfurylase, luciferase and apyrase) and the two substrates adenosine
    5′-phosphosulfate (APS) and luciferin
1. One of the four nucleotides is added to the reaction.
2. If it is complementary to the next base in the template strand, DNA
   polymerase incorporates it.
3. Pyrophosphate (PPi) is released and converted to ATP by ATP sulfurylase
   in the presence of APS.
4. ATP serves as a substrate for luciferase, producing light.
5. Excess nucleotides are degraded by apyrase.
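The five steps above repeat for each nucleotide in turn. The Python sketch below models the resulting "flowgram" under an idealised assumption that light output is exactly proportional to the number of bases added in a flow; the function names and the cyclic flow order are illustrative assumptions. Because a homopolymer run is incorporated in a single flow, its length has to be inferred from signal intensity, which is why pyrosequencing data are prone to homopolymer length errors.

```python
# Idealised pyrosequencing flowgram: assumes the light signal for each flow is
# exactly the number of bases incorporated. The flow order "TACG" is an
# assumption for illustration.

def flowgram(strand, flow_order="TACG"):
    """Return (nucleotide, signal) for each flow until the whole strand is synthesised."""
    if set(strand) - set(flow_order):
        raise ValueError("strand contains bases that are never flowed")
    signals = []
    pos = 0
    while pos < len(strand):
        for nucleotide in flow_order:
            count = 0
            # The polymerase extends through a whole homopolymer run in a single
            # flow, so the signal scales with the run length.
            while pos < len(strand) and strand[pos] == nucleotide:
                count += 1
                pos += 1
            signals.append((nucleotide, count))
    return signals


def call_bases(signals):
    """Convert an idealised flowgram back into sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    flows = flowgram("TTAGGC")  # hypothetical newly synthesised strand
    print(flows)
    print(call_bases(flows))
```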
Illumina Sequencing
The Sequence Assembly Problem
   Sequencing technologies generate reads of <1000
    bp
   These reads must be assembled into a single
    continuous genomic sequence.
   Shotgun sequencing exploits many overlapping
    sequences (high coverage) to infer ordering directly
    from the sequences themselves
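A simple way to see how overlaps order reads is a greedy assembler: repeatedly merge the two sequences with the longest suffix-prefix overlap. The Python sketch below is a toy that assumes error-free reads from one strand and a minimum overlap length; it is not the algorithm of any particular assembler (production tools typically use overlap-layout-consensus or de Bruijn graph methods), and the reads are hypothetical.

```python
# Toy greedy overlap assembler: assumes error-free, single-strand reads.

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["GTTAGCCTAGG", "ATGGCGTACG", "GCGTACGTTAGC"]  # hypothetical, shuffled
    print(greedy_assemble(reads))
```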
The Repeat Problem
   Repeats at read ends can be assembled in multiple ways
    Correct
    ATTTATGTGTGTGTGGTGTG
              GTGTGGTGTGCACTACTGCT
                         ACTACTGCTGACTACTGTGTGGTGTG
                                         GTGTGGTGTGATATCCCT

    Incorrect
    ATTTATGTGTGTGTGGTGTG
              GTGTGGTGTGATATCCCT

    ACTACTGCTGACTACTGTGTGGTGTG
                    GTGTGGTGTGCACTACTGCT
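The ambiguity in the repeat example can be made explicit by listing, for each read, which other reads its end overlaps. In this Python sketch the read names and the minimum overlap of 8 bases are assumptions: read1 and read3 both end in the repeat GTGTGGTGTG, so each has two possible successors (read2 and read4), and overlaps alone cannot say which joins are correct.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


# The four reads from the repeat example (names are hypothetical labels).
READS = {
    "read1": "ATTTATGTGTGTGTGGTGTG",
    "read2": "GTGTGGTGTGCACTACTGCT",
    "read3": "ACTACTGCTGACTACTGTGTGGTGTG",
    "read4": "GTGTGGTGTGATATCCCT",
}


def successors(reads, min_len=8):
    """Map each read to the set of reads whose start overlaps its end."""
    return {
        name: {other for other, seq2 in reads.items()
               if other != name and overlap(seq, seq2, min_len)}
        for name, seq in reads.items()
    }


if __name__ == "__main__":
    for name, nxt in sorted(successors(READS).items()):
        print(name, sorted(nxt))
```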
Paired-end sequencing
[Figure: bacterial chromosome → random shearing → size selection for 3 kb or 8 kb etc. → add linkers → circularise → shear and select on size and presence of linkers → add adapters → paired-end sequencing; obtain sequences from either side of the linker, a known distance apart in the genome]
   Create long fragments of known length
   Obtain sequence from paired ends a known distance apart
   Allows assembly of contigs across repeats into scaffolds
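The value of paired ends can be illustrated with a back-of-the-envelope gap estimate. Assume a library with a known insert size, and a read pair whose first read lies on contig A and second on contig B, with orientation already resolved. The insert then spans the rest of contig A, the gap, and the start of contig B, so gap = insert - (length of A - start of read 1) - (end of read 2 on B). The Python sketch below averages this over pairs; the function names, coordinates and numbers are hypothetical.

```python
from statistics import mean


def gap_from_pair(a_start, b_end, contig_a_len, insert_size):
    """Gap implied by one pair: insert = (contig A from read-1 start) + gap + (contig B up to read-2 end)."""
    return insert_size - (contig_a_len - a_start) - b_end


def estimate_gap(pairs, contig_a_len, insert_size):
    """Mean gap over (a_start, b_end) pairs; a_start is 0-based, b_end is end-exclusive.

    A negative estimate would suggest that the two contigs actually overlap.
    """
    return mean(gap_from_pair(a, b, contig_a_len, insert_size) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical 3 kb library, 10 kb contig A, two pairs straddling the gap
    print(estimate_gap([(8500, 1200), (8800, 1400)], contig_a_len=10_000, insert_size=3000))
```

Real insert sizes vary around their nominal value, which is why averaging over many straddling pairs matters.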
Genome Assembly
[Figure: Contig 1, Contig 2 and Contig 3 ordered into a scaffold; labels: sequence gap, physical gap]
Re-sequencing
   Short reads (<200 bp) make de novo assembly inefficient
   Instead, they are mapped against a reference genome
   Re-sequencing is like assembling a jigsaw puzzle using the image on the lid
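Mapping reads to a reference can be sketched with a toy "seed and check" approach: index every k-mer of the reference, look up each read's first k bases, count mismatches at the candidate positions, then pile up the mapped reads and report positions where the majority base differs from the reference (a candidate SNP). This is an illustrative Python sketch with hypothetical names and data; practical mappers such as BWA or Bowtie use compressed full-text indexes, handle indels and base qualities, and are paired with statistical variant callers.

```python
from collections import Counter, defaultdict


def build_index(ref, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read, ref, index, k, max_mismatches=2):
    """Seed with the read's first k bases, then keep the placement with fewest mismatches."""
    best = None
    for pos in index.get(read[:k], []):
        window = ref[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


def call_snps(ref, reads, k=5, min_depth=2):
    """Pile up mapped reads; report (position, ref base, majority base) where they differ."""
    index = build_index(ref, k)
    pileup = [Counter() for _ in ref]
    for read in reads:
        hit = map_read(read, ref, index, k)
        if hit is None:
            continue  # unmapped read
        pos, _ = hit
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    snps = []
    for i, counts in enumerate(pileup):
        if sum(counts.values()) >= min_depth:
            base, _ = counts.most_common(1)[0]
            if base != ref[i]:
                snps.append((i, ref[i], base))
    return snps


if __name__ == "__main__":
    reference = "GATTACAGGCTTCAGTCCAA"                  # hypothetical reference
    reads = ["ATTACAGGCG", "TACAGGCGTC", "CAGGCGTCAG"]  # hypothetical reads
    print(call_snps(reference, reads))
```

A seed that itself contains the variant would fail to map in this sketch, one reason real mappers try several seeds per read.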
Genome annotation
   Annotation is the addition of information about predicted sequence features
    to the flat file of DNA sequence
   Identification of potential coding sequences - CDS
   Homology searches to predict function
   Other features can be annotated as well
       rRNAs
       Potential promoters
       tRNAs
       Small non-coding RNAs
       Repeat sequences
       Insertion sequences (ISs), transposons, gene fragments
   Location of the origin of replication
   Determination of the number of bases, genes, and
    G+C%.
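Two of the summary statistics listed above, base counts and G+C%, are straightforward to compute once the sequence has been read from its FASTA file. A minimal Python sketch follows; the function names are hypothetical, it assumes plain FASTA input, and as a design choice it excludes ambiguous symbols such as N from the G+C denominator.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts).upper() for h, parts in records.items()}


def gc_percent(seq):
    """G+C as a percentage of unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    for name, seq in read_fasta(">example\nGGCC\nat\n").items():  # hypothetical record
        print(name, len(seq), round(gc_percent(seq), 1))
```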
How do we go from this…?
>Escherichia coli K-12 MG1655_3870656-3890655
      TGCTGCTGCCTGCTGCGCGGTGCGCTCTACGGATTGCCCGGCGCGATAGAGATCGCTGCCTAAGCCCGCCCCTGCACAACCTGCGTCTATCCACTGCGCCAGGTTTTCTGCGTCACGCCGCAAC
      GGCAAAGACTGCGATGTCCGATGGCAATACCGCTTTTAACGCTTTGATGTATTGCGGACCAAAAGCCGATGACGGAAATATTTTCAGCGCCTGCGGCGCCCGCTTCGAGCGCGGTAAAGGCTTCG
      GTCGCCGTCGCGCAGCCGGGGCAGACGTCATGCCGTAGCCCACCGCACGGCGGATCACTTCACTATGGATATTGGGCGTAACGATGAGCTGACAGCCCATCCTGGCGAGCGCATCGACCTGTT
      CAGGTTTCAGTACCGTACCTGCGCCAATCAACGCCTTGTCGCCGTACGCATCAACGATGCGGGAATGCTTTGCTCCCATTGTGGGGAATTCAGCGGGATTTCAACCGCGTCGAACCCGGCGTCAA
      TCACCGCGCCAACATGCGCCAGCGCCTCGTCGGGCGTAATACCGCGCAAAATGGCGATCAGCGGGAGTTTAGTTTGCCACTGCATGAGGATGCTCCTTATACCAGCCTGAAATGCCGTGTCGCC
      CGCCACCGCCGTCACGTCGCAACCCATCGCCTGAAAGGCTTGCTGGTAGCGCGCGGTCAGCGATGTTCCGGCGACAAGGGTGATGGCGTGTTGATGGGCCACATAGTCGCGCATACTGGGACC
      TCTGCGCCAATCAACAAACCAGAGAGAAATTCGCTGACCTGTTCGCGGGGAAGTGTTCCCAGCACATGCGAGGCGCGAACTTCAAAAAGCTGCGGCAATATGGCGGGCGTATTAAGACCACGCT
      CAAGGCCAGCTGTGAAGGCATCGGCAGGTTTTCCTGCGGCGGCAAACCTGCGCCAATCAATGAGTGATTTAACAGTAAATGATGTAATTCACCGGTCATCACGGTGCGAAAATCGTTGATTTGCTG
      GCTATCGGCCTGCACCCATTTGCAATGGGTTCCGGGCATGACATAAAGAGAGGAAGAGCCAGAGCTCGCGCGCCGATCAATTGTGTTTCTTCGCCGCGCATCACATTGTGGTTATCGTCATGAGA
      GACACATAATCCGGGAATAATCCAGATATTGTCGCCAACTGACGTTAATTGTTCGCCAATAGACGAAAAACAGGCAGGAACAGATAATACGGTGCAACTTTCCAGCCGACGTTGCTGCCAACCATT
      CCTGCCATTACCACTGGCGTTTTCTCTTCACGCCAGTCGGTCGTGACTTCTGCTAACACCGCAGCCGGAGATTTTCCGTTCAGGCGCGTGACGCCTGCTTCTGATTGCCTGCTCTCAGGCAGTGG
      TCGCCCTGATAAAGCCAGGCGCGCAGATTGGTCGATCCCCAGTCAATTGCGATGTAGCGAGCTGTCATGTGATTTCCTTTAACCTTCGTGTCGAGCTGGCGATCATGGTAAGCGCCGCCTGCTCT
      GCCGCATCGCCGTCCTGATGCGTATCGCATCGAACAGCGCCTTATGTTCCTGGAGCGTTTGCGGCATGTTGGCCTCATCGCCCATCCAGGTTCGTTCAAAAACCGCCCGCTGCAGCGAACTGATC
      GCAATGCTAAGTTGCTGTAACACCGGGTTATGCACCGACTGCAGCACCGCTCGTGGTAGCGAATATCCGCTTCGTTAAACGCTTCGCGGTCCTGATTGTTGGCAATCATCTCGTTCAGCGCCGATT
      CAATCTGCGCCAGATCGCTGGAAGTCGCGCGCTCTGCTCCCAACGGGCAATCGCCGGTTCCACCAGATTTCGCACTTCGTCATGGCACTGATAAGCCGTGGGTCGTAGTCATTTTCCAGCACCCA
      TTGCAGTACGTCAGTGTCGAGGTAATTCCACTGGTTACGCGGTGCCACAAACGCCCCGCGATAACGTTTCATTTCAATCAGCCGCTTCGCCATCAGCGAACGGAACACCCACGGATGATGTTGCG
      CGAGGTTGCAAACTCCTCACAGAGTTCCGCCTCAGCCGGAAGCGGCGAGCCTGGCACGTATTTGCCGTGAACGATCTGTTTACCCAGCGTAATGACAATGCGATCGGTTTTATTGAGAGTCATGG
      AGAGTCCTTGTGCTTGTATGTTCTTCTCTACTTTACCCCGATCGATGCATAACGCGGCAACTTTGTAGTACCAGCGTGATGACGTTCGCGTTTGCCGTGCGTGTAATGTAGTACAAACTTATATTGTT
      GTACTACAATTTAGATCACAAAAAGAACAATGCATAAAAAATGACATGCGTCGGGCAGAAATCTGAAAAGGGATATCAGGCGCTAAACAGGAGGGAAAGAAGAGTATGCTTTCAACGGCTTAGCTA
      CTCGTTTAAAGGATTAATCATGAAGTTGAATTTTAAGGGATTTTTTAAGGCTGCCGGTTTATTCCCACTGCGCTGATGCTTTCAGGCTGTATCTCGTATGCTCTGGTTTCCCATACCGCAAAGGGTAG
      TTCAGGAAAGTATCAATCGCAGTCAGACACCATCACTGGGCTATCGCAGGCAAAAGATAGTAATGGAACAAAAGGCTATGTTTTTGTAGGGGAATCGTGGATTACCTTATCACTGATGGTGCCGAT
      GACATCGTTAAGATGCTCAATGATCCAGCACTTAACCGGCACAATATTCAGGTTGCCGATGACGCAAGATTTGTTTTAAATGCGGGGAAAAAGAAATTTACCGGCACAATATCGCTTTACTACTACG
      GAATAACGAAGAAGAAAAGGCACTGGCAACGCATTATGGTTTTGCCTGTGGTGTTCAACACTGTACCAGGTCACTGGAAAACCTAAAAGGCACAATCCATGAGAAAAATAAAAACATGGATTACTCA
      AAGGTGATGGCGTTCTACCATCCATTTAAGTGCGATTTTATGAATACTATTCACCCAGAGGCATTCCGGGATGGTGTTTCCGCAGCATTACTGCCAGTGACTGTTACGCTGGACATCATTACTGCAC
      CGCTGCAATTTCTGGTTGTATATGCAGTAAACCAATAATCAGTAAGCGGGCAAACCGTTTATGCTGTTTGCCCGCCCACAGATTAATTCAGCACATACTTCTCAATAGCAAACGCCACGCCATCTTCA
      AGGTTAGATTTGGTGACAAAGTTCGCCACTTCTTTCACTGAAGGAATAGCGTTATCCATCGCCACACCGACGCCTGCATATTAATCATTGCGATATCGTTTTCCTGATCGCCAATCGCCATGATTTCT
      TCCGGTTTAATACCTAACACGTCGGCCAGTGATTTCACCCCCGTACCTTTGTTAACGCGTTTATCGAGGATTTCGAGGAAGTACGGCGCACTTTTCAGCACGGTATATTCTCTTTCACTTCCTGCGG
      AATACGCGCGATAGCCTGGTCGAGGATGGCGGGTTCATCAATCATCATCACTTTCAGGAACTGGGTATTGGGGTCCATTTTCTCCGCTTCGCAGAACACCAGCGGAATGGTGGCAACGAAGGATT
      CATGCACCGTGTGTAGCTGATATCACGGTTGGCGGTGTACAGCGTGGTGCGGTCCAGGGCGTGGAAATGAGAACCGACTTCGCGAGAGAGTTTTTCCAGGAAACGATAGTCGTCATAGCTGAGA
      GCAGTTTGCGCCACGGTGCTACCATCAGCGGCCTTCTGTACCACGCGCCGTTATAAGTAATGCAGTAGTCGCCCGGCTGTTCCATATGCAGCTCTTTCAGGTAGTTGTGCACACCTGCATACGGG
      CGACCCGTCGTTAGCACGACATTCACGCCACGGGCGCGAGCTGCGGCAATCGCATTTTTAACGGCGGGTGAAAGGTGTGATCGGGCAGCAGAAGGGTGCCATCCATATCGATAGCAATGAGTTT
      AATAGCCATGAGTTCCCCAGGTAGATTGGTTCCTGACCCATGCTAACGCGATTCCGCTCAAAAATCAGTACAACACCCGAGGGAAAAGGGGGATGCAACGCGCGTGCGTGCTCCCTTTTTGCTTA
      GCGGAAGAGTTTCCCTTTCAGCAGTTCCATGCCTGCGGAAAGCAGATCGTTATTGGCTTGTGGTGACACTTCACCTTGCGGTGAGAGCGCATCAATAATCTTCGGCAATTGTTCTGCCAGTAAACT
      GGAAGCTGACTGGTATCCACGCCAAGTTTTTGCCCGAGATCGGACACCGCATTTGTGCCGAGCGCCGATTCCAGTTGCTCGCCACTAACCGATTGATTGCCCTGTTGATTACTCAGCCAGGTTGA
      GAGAATGGCCCCTAAGCCGCCACTTTGCAGTTTTTCCACAGCACCTGAATGCCGCCCTGCTCCTCAACCCAACTTAAAATAGCCTGATATTTCCCCGCATCGCCTTTCAGAAAGGCACCGACAACTT
      CATCAAAAAGCCCCATGATAATCACCTGTAAAGCGTTACGTGTTGACCCAAAAAGTATAGATTTGCGGATGATAATTGCGGATTGCAGAAATAAAAAGGGCGGAGATGATCTCCGCCCTTTTCTTAT
      AGCTTCTTGCCGGATGCGGCGTGAACGCCTTATCCGGCCTACAAAATCATGAAAATTCAATACATTGCAAGATTTTCGTAGGCCTGATAAGCGTGCGCATCAGGCACGCTCGCATGGTTAGCGCCA
      TTAAATATCGATATTCGCCGCTTTCAGGGCGTTCTCTTCAATAAACGCACGGCGCGGTTCAACGGCGTCGCCCATCAGCGTGGTGAACAACTGGTCGGCAGCAATCGCATCTTTAACGGTAACCG
      CAGCATACGACGACTTTCCGGGTCCATAGTGGTTTCCCACAGCTGTTCCGGGTTCATCTCGCCCAGACCTTTATAACGCTGGATGGAGAGGCCGCGACGGGACTCTTTCACCAGCCAGTCCAGCG
      CCTGCTCGAAGCTGGCTACCGGCTGACGCGCTCGCCACGTTCGATAAACGCATCTTCTTCCAGCAAGCCACGCAGTTTCTCACCCAGCGTGCAGATACGACGATATTCGCCACCGGTGATAAACT
      CGTGATCCAGCGGATAGTCAGTATCCACACCGTGGGTACGCACGCGAACAATCGGCTCAACAGGTTTTGCTCAGCATTGGTGTGAACATCAAACTTCCACTGGCTGCCGTGCTGTTCTTTGTCGTT
      CAGTTCGCTGACCAGCGCGTTCACCCAGCGGGTAACGGTCTGCTCATCAGAAAGGTCAGCTTCCGTCAACGTCGGCTGATAGATAAGTCTTTCAGCATTGCTTTCGGATAACGACGCTCCATACG
      ATTGATCATTTTCTGCGTCGCGTTGTACTCAGATACCAGTTTCTCTAACGCTTCGCCAGCCAATGCCGGTGCACTGGCGTTGGTGTGCAGCGTTGCGCCGTCCAGCGCGATAGAGATTGGTACTG
      ATCCATCGCTTCGTCGTCTTTAATGTACTGTTCCTGCTTGCCTTTCTTCACTTTGTACAGCGGCGGCTGAGCGATGTAGACGTGACCGCGTTCAACGATTTCCGGCATCTGACGATAGAAGAAGGT
      CAACAGCAGCGTACGAATGTGGAGCCGTCGACGTCCGCATCGGTCATGATGATGATGCTGTGATAACGCAGTTTGTCCGGGTTGTACTCGTCACGACCGATACCACAGCCAAGCGCGGTGATAA
      GCGTCGCCACTTCCTGAGAAGAGAGCATCTTATCGAAGCGCGCTTTCTCGACTTGAGGATTTTACCCTTCAGCGGCAGAATCGCCTGGTTCTTGCGGTTACGCCCCTGCTTCGCAGAGCCGCCCG
      CGGAGTCCCCTTCCACCAGGTACAGTTCGGAAAGCGCCGGATCGCGTTCCTGGCAGTCTGCCAGTTTGCCCGGCAGGCCCGCAAGTCGAGCGCACCTTTACGGCGGGTCATTTCACGCGCGCG
      ACGCGGCGCTTCACGGGCACGGGCAGCATCGATAATTTTGCCAACCACGATTTTCGCGTCGGTTGGGTTTTCCAGCAGGTATTCTGCCAGCAGTTCGTTCATCTGCTGTTCAACGCCGATTTCACC
      TCAGAAGAAACCAGTTTGTCTTTGGTCTGGGAGGAGAATTTCGGGTCCGGCACTTTCACGGAAACGACCGCAATCAGGCCTTCACGCGCATCGTCACCGGTGGCGCTGACTTTGGCTTTTTTGCT
      GTAGCCTTCTTTGTCCATTAGGCGTTCAGGGTACGGGTCATCGCCGCACGGAAGCCTGCCAGGTGAGTACCGCCGTCACGCTGCGGAATGTTGTTGGTAAAGCAGTAGATGTTTTCCTGGAAGCC
      ATCGTTCCACTGCAACGCCACTTCGACGCCAATACCGTCTTTTTCAGTGAGAAGTAGAAGATATTCGGGTGGATCGGCGTTTTGTTCTTGTTCAGATATTCAACGAACGCCTTGATGCCGCCTTCAT
      AGTGGAAGTGGTCTTCTTTGCCGTCGCGCTTGTCGCGCAGACGAATGGAAACGCCGGAGTTGAGGAACGACAACTCCGCAGACGTTTCGCCAGAATTTCATATTCGAACTCGGTCACATTGGTGA
      AGGTTTCGAGGCTGGGCCAGAAACGCACCATGGTGCCGGTTTTTTCAGTCTCGCCGGTAACCGCCAGCGGGGCCTGCGGTACACCGTGTTCGTAGATCTGACGGTGATTTTACCCTCGCGCTGG
      ATAACCAGCTCCAGTTTTTGCGACAGGGCGTTTACTACCGAAACACCAACGCCGTGCAGACCGCCGGACACTTTATAGGAGTTATCGTCAAATTTACCGCCTGCGTGCAGAACGGTCATGATCACT
      TCCGCCGCCGA
…to this?
FT gene complement(9299..10702)
FT /db_xref="GenBank:2367266"
FT /gene="dnaA"
FT /note="b3702"
FT CDS complement(9299..10702)
FT /db_xref="GI:2367267"
FT /db_xref="PID:g2367267"
FT /function="putative regulator; DNA - replication, repair,
FT restriction/modification"
FT /codon_start=1
FT /protein_id="AAC76725.1"
FT /gene="dnaA"
FT /translation="MSLSLWQQCLARLQDELPATEFSMWIRPLQAELSDNTLALYAPNR
FT FVLDWVRDKYLNNINGLLTSFCGADAPQLRFEVGTKPVTQTPQAAVTSNVAAPAQVAQT
FT QPQRAAPSTRSGWDNVPAPAEPTYRSNVNVKHTFDNFVEGKSNQLARAAARQVADNPGG
FT AYNPLFLYGGTGLGKTHLLHAVGNGIMARKPNAKVVYMHSERFVQDMVKALQNNAIEEF
FT KRYYRSVDALLIDDIQFFANKERSQEEFFHTFNALLEGNQQIILTSDRYPKEINGVEDR
FT LKSRFGWGLTVAIEPPELETRVAILMKKADENDIRLPGEVAFFIAKRLRSNVRELEGAL
FT NRVIANANFTGRAITIDFVREALRDLLALQEKLVTIDNIQKTVAEYYKIKVADLLSKRR
FT SRSVARPRQMAMALAKELTNHSLPEIGDAFGGRDHTTVLHACRKIEQLREESHDIKEDF
FT SNLIRTLSS"
FT /product="DNA biosynthesis; initiation of chromosome
FT replication; can be transcription regulator"
FT /transl_table=11
FT /note="f467; 100 pct identical to DNAA_ECOLI SW: P03004;
FT CG Site No. 851"
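Feature locations such as complement(9299..10702) are 1-based and inclusive, and complement(...) means the feature lies on the reverse strand. The minimal Python sketch below parses only simple spans of this kind (it does not handle join, partial < or > ends, or remote references); the function names are hypothetical. For this dnaA CDS the span is 10702 - 9299 + 1 = 1,404 nt, i.e. 468 codons, enough for 467 amino acids plus a stop codon.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Simple spans only, e.g. "9299..10702" or "complement(9299..10702)".
SIMPLE_LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)(?(1)\))$")


def parse_location(location):
    """Return (start, end, strand) for a simple feature location; 1-based, inclusive."""
    match = SIMPLE_LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(2)), int(match.group(3))
    strand = -1 if match.group(1) else 1
    return start, end, strand


def extract_feature(sequence, location):
    """Slice the feature out of the sequence, reverse-complementing reverse-strand features."""
    start, end, strand = parse_location(location)
    segment = sequence[start - 1:end]  # 1-based inclusive -> 0-based half-open
    if strand == -1:
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment


if __name__ == "__main__":
    start, end, strand = parse_location("complement(9299..10702)")
    length = end - start + 1
    print(length, length // 3, strand)  # nucleotides, codons, strand
```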

Or this?
An ORF is not a CDS!
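An ORF is just an open reading frame: a run of codons in one frame uninterrupted by a stop codon
There are many more ORFs than protein-coding genes (CDSs) in a genome
[Figure: ORFs in a genome; labels: non-coding ORFs; CDSs (note an ORF can extend upstream of the start codon)]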
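Following the ORF/CDS distinction above, an ORF can be found mechanically as a stretch of codons between stop codons in one reading frame, while a candidate CDS would begin at a start codon inside it. The Python sketch below scans the three forward frames only and reports the first ATG in each ORF; the function name and tuple layout are hypothetical, and real annotation also scans the reverse complement and relies on gene-prediction models to decide which ORFs are genuine CDSs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """ORFs ending in a stop codon on the forward strand.

    Returns (frame, orf_start, orf_end, first_atg) with 0-based, end-exclusive
    coordinates; orf_end includes the stop codon, and first_atg is None if the
    ORF contains no ATG. The first run in each frame is open at the 5' end of
    the sequence rather than bounded by an upstream stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    first_atg = next(
                        (j for j in range(run_start, i, 3) if seq[j:j + 3] == "ATG"), None
                    )
                    orfs.append((frame, run_start, i + 3, first_atg))
                run_start = i + 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("TAACCCATGAAATAG", min_codons=3))  # hypothetical toy sequence
```

On this toy sequence the single reported ORF starts at position 3 while its first ATG is at position 6, so the ORF extends upstream of the start codon.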
The Problem of Frameshift Errors
      (below each sequence: translations in reading frames 1, 2 and 3; • = stop codon)

      Actual sequence

 ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
 M S T A K L V K S K A T N L L Y T R N D V S D S E K
 • V P L N • L N Q K R P I C F I P A T M S P T A R K
  E Y R • I S • I K S D Q S A L Y P Q R C L R Q R E K

      Frameshifted sequence after single base error

 ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
 M S T A K L V K S K S D Q S A L Y P Q R C L R Q R E
 • V P L N • L N Q K A T N L L Y T R N D V S D S E K
  E Y R • I S • I K K R P I C F I P A T M S P T A R K
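The effect of a single-base insertion can be reproduced by translating both versions of the example in the first reading frame. This Python sketch builds the standard genetic code from the usual TCAG-ordered table; inserting one extra A after base 30 (a hypothetical sequencing error matching the example) leaves the first ten residues unchanged and changes the residues downstream.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


ORIGINAL = "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
# Hypothetical sequencing error: one extra A inserted after base 30
SHIFTED = ORIGINAL[:30] + "A" + ORIGINAL[30:]

if __name__ == "__main__":
    print(translate(ORIGINAL))
    print(translate(SHIFTED))
```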
Homology
   Similarities in form (sequence) allow us to infer similarities in “meaning” (structure and function)
       the cat sat on the mat
       die Katze sass auf der Matte
   Homology is not just sequence similarity
       Two sequences can be similar without any common ancestry, particularly if they are of low complexity

   vge|GBant88-2      ITLITCVSVKDNSKRYVVAG
   vge|GEfae9-178     LTLITCDQATKTTGRIIVIA
   vge|GSpne1-403     MTLITCDPIPTFNKRLLVNF
   sortase_staur      LTLITCDDYNEKTGVWEKRK
Types of Homology
   Homologues can be
    divided into
       Orthologues: lines of
        descent congruent with
        whole genome
       Paralogues: result of
        gene duplication
       Xenologues: result of
        horizontal gene transfer (HGT)
Homology Searches
   The aim of homology searches is to identify sequences
    in sequence databases that are homologous to your
    sequence.
   This involves comparing your sequence with all the
    database sequences
       looking for stretches of sequence that appear to be similar
       then scoring the matches and ranking them
       a measure of the significance of each match is given
   The most commonly used program for homology searches is
    BLAST
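BLAST itself is a heuristic, but the quantity it approximates is a local alignment score. The Python sketch below computes an exact Smith-Waterman local alignment score with a toy scoring scheme (+2 match, -1 mismatch, -2 per gap position; these values are assumptions, and protein searches normally use a substitution matrix such as BLOSUM62) and ranks a hypothetical database by score. It does not compute the E-values that BLAST reports as its measure of significance.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def rank_hits(query, database):
    """Score the query against every database entry; rank by score (ties broken by name)."""
    scores = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    db = {"seq1": "GGACGTGG", "seq2": "TTTTTTTT", "seq3": "ACGTACGT"}  # hypothetical
    print(rank_hits("ACGT", db))
```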
Bacterial Genome Dynamics
[Figure: bacterial genome dynamics, showing gene loss, gene duplication, gene gain and gene change. Labelled processes: drastic downsizing in isolated intracellular niches; accumulation of pseudogenes and IS elements after shift to new niche; horizontal gene transfer by phage, plasmids, pathogenicity islands; rapid emergence of genetically uniform pathogens from variable ancestral populations; recombination and rearrangements; single nucleotide polymorphisms (SNPs)]
Horizontal gene transfer
   Horizontal (or lateral) gene transfer denotes any
    transfer, exchange or acquisition of genetic material that
    differs from the normal mode of transmission from
    parents to offspring (vertical transmission).

[Figure: vertical gene transfer versus horizontal gene transfer]
Bacterial mobile genetic elements
   Transposons
       pieces of DNA that act as “jumping genes”, changing their location
        within a chromosome or plasmid
       encode a transposase that catalyses the transposition
        event
       can carry resistance or virulence genes
   Insertion sequences (IS elements)
       transposable elements that encode only the transposase
       multiple copies of the same IS within a genome provide targets
        for homologous recombination, rearrangements and
        replicon fusions
   Conjugative transposons
       normally integrated into the chromosome
       excise, then are transferred to recipient cells by conjugation
Bacterial mobile genetic elements
   Plasmids
       self-replicating extrachromosomal replicons
       usually circular but can be linear
       Can carry resistance or virulence genes
   Bacteriophages
       bacterial viruses; can carry virulence genes
       can insert into bacterial chromosome as prophages
        (lysogeny)
   Integrons
       complex natural cloning and gene expression systems
        able to capture promoterless gene cassettes by site-
        specific recombination
       allow formation of large arrays of gene cassettes
        transferred as a whole between different replicons.
Genomic islands
   large chromosomal regions, part of the flexible gene
    pool
   previously transferred by other mobile genetic
    elements
   present in some bacteria but absent in close
    relatives
   carry multiple genes that increase phenotypic
    versatility
   contribute to dynamic character of bacterial
    chromosomes and can be excised from the
    chromosome and transferred to other recipients
   pathogenicity islands contain dozens of genes that
    allow a quantum leap to complex new virulence traits
Core genomes and Pangenomes
   Core genome
       pool of genes shared by all members of a bacterial
        species
   Accessory or dispensable genome
       pool of genes present in some but not all genomes within
        the same bacterial species
   Pangenome
       global gene repertoire of a bacterial species, comprising the
        core genome + accessory genome
   Metagenome
       global gene repertoire of mixed microbial population
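The core, accessory and pan-genome definitions above map directly onto set operations. In this Python sketch the strain names and gene identifiers are hypothetical placeholders: the core genome is the intersection of all gene sets, the pangenome is their union, and the accessory genome is the difference between the two.

```python
from collections import Counter

# Hypothetical gene content for three strains of one species.
GENOMES = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3", "g4", "g6"},
}


def pangenome(genomes):
    """Return (core, accessory, pan) gene sets for a dict of strain -> gene set."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core, pan


def strain_specific(genomes):
    """Accessory genes found in exactly one strain."""
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    return {name: {g for g in genes if counts[g] == 1} for name, genes in genomes.items()}


if __name__ == "__main__":
    core, accessory, pan = pangenome(GENOMES)
    print(sorted(core), sorted(accessory), len(pan))
    print(strain_specific(GENOMES))
```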
Escherichia coli Core and Pan-genomes




                         Welch et al. Proc Natl Acad Sci U S A. 2002 Dec 24;99(26):17020-4
Metagenomics
   Environmental shotgun
    sequencing
       DNA extracted from
        mixed microbial
        communities sequenced
        en masse
   Assembled into contigs
       Typically only small
        contigs can be obtained
Uses of a genome sequence
   Gene discovery
       Fuelling hypothesis-driven research on pathogen biology
   Comparative genomics
       SNP discovery and genomic epidemiology
   Functional genomics
       Transcriptomics
       Proteomics
       Interactome
       Structural Genomics
       Mass Mutagenesis
Haemolytic-uraemic syndrome
   Shiga-toxin-producing E. coli (STEC)
       bloody diarrhoea; damage to kidneys and brain
       anaemia; loss of platelets
German E. coli O104:H4 outbreak

   May-July 2011
   >4000 cases
   >40 deaths
   Linked to sprouted seeds
   High risk of haemolytic-
    uraemic syndrome
   Females particularly at risk


        Frank et al. DOI: 10.1056/NEJMoa1106483
Take-away messages from the genome
   Pathogens don't bother with passports!
       Not a new strain: something similar was seen in Germany ten
        years ago and in Korea
       The closest genome-sequenced strain was isolated in the Central
        African Republic in the late 1990s and belongs to an
        enteroaggregative lineage
   German STEC probably comes from a lineage
    circulating in human populations rather than from an
    animal source (unlike E. coli O157)
Take-away messages
   Bacteria evolve
    quickly
       Virulence factors in E.
        coli can jump from one
        lineage to another on
        mobile genetic
        elements
       Pathotypes can
        overlap and evolve
       Antibiotic resistance
        seen where no
        obvious prior use of
        antibiotics
Take-away messages from genome sequence
   Genome sequencing brings the advantages of
       open-endedness (revealing the “unknown unknowns”),
       universal applicability
       ultimate in resolution
   Bench-top sequencing platforms now generate data
    sufficiently quickly and cheaply to have an impact on
    real-world clinical and epidemiological problems
Comprehensive Coverage of Human Microbiome
Comprehensive coverage of tree of life
What will you do when you can sequence everything?

Bio153 microbial genomics 2012

  • 29. …to this?
    FT   gene            complement(9299..10702)
    FT                   /db_xref="GenBank:2367266"
    FT                   /gene="dnaA"
    FT                   /note="b3702"
    FT   CDS             complement(9299..10702)
    FT                   /db_xref="GI:2367267"
    FT                   /db_xref="PID:g2367267"
    FT                   /function="putative regulator; DNA - replication, repair,
    FT                   restriction/modification"
    FT                   /codon_start=1
    FT                   /protein_id="AAC76725.1"
    FT                   /gene="dnaA"
    FT                   /translation="MSLSLWQQCLARLQDELPATEFSMWIRPLQAELSDNTLALYAPNR
    FT                   FVLDWVRDKYLNNINGLLTSFCGADAPQLRFEVGTKPVTQTPQAAVTSNVAAPAQVAQT
    FT                   QPQRAAPSTRSGWDNVPAPAEPTYRSNVNVKHTFDNFVEGKSNQLARAAARQVADNPGG
    FT                   AYNPLFLYGGTGLGKTHLLHAVGNGIMARKPNAKVVYMHSERFVQDMVKALQNNAIEEF
    FT                   KRYYRSVDALLIDDIQFFANKERSQEEFFHTFNALLEGNQQIILTSDRYPKEINGVEDR
    FT                   LKSRFGWGLTVAIEPPELETRVAILMKKADENDIRLPGEVAFFIAKRLRSNVRELEGAL
    FT                   NRVIANANFTGRAITIDFVREALRDLLALQEKLVTIDNIQKTVAEYYKIKVADLLSKRR
    FT                   SRSVARPRQMAMALAKELTNHSLPEIGDAFGGRDHTTVLHACRKIEQLREESHDIKEDF
    FT                   SNLIRTLSS"
    FT                   /product="DNA biosynthesis; initiation of chromosome
    FT                   replication; can be transcription regulator"
    FT                   /transl_table=11
    FT                   /note="f467; 100 pct identical to DNAA_ECOLI SW: P03004;
    FT                   CG Site No. 851"
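Moving between the raw sequence and the feature table is largely coordinate bookkeeping. The minimal Python sketch below assumes only simple `start..end` and `complement(start..end)` locations (no joins or fuzzy ends), and its helper names are hypothetical. It converts a 1-based, inclusive EMBL/GenBank location into a slice of the genome and reverse-complements minus-strand features. As a sanity check on the dnaA record: complement(9299..10702) spans 10702 - 9299 + 1 = 1404 nt. That is 468 codons, or 467 amino acids plus the stop codon, consistent with the "f467" note.

```python
import re

# Maps each base to its Watson-Crick complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_simple_location(location: str) -> tuple[int, int, int]:
    """Parse 'start..end' or 'complement(start..end)' into (start, end, strand).

    Coordinates stay 1-based and inclusive, as in EMBL/GenBank feature tables.
    join(...), fuzzy ends such as '<1' and remote references are not handled.
    """
    location = location.strip()
    m = re.fullmatch(r"complement\((\d+)\.\.(\d+)\)", location)
    if m:
        return int(m.group(1)), int(m.group(2)), -1
    m = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if m:
        return int(m.group(1)), int(m.group(2)), 1
    raise ValueError(f"unsupported location: {location!r}")


def extract_feature(genome: str, location: str) -> str:
    """Return the feature's sequence, read 5'->3' on the feature's own strand."""
    start, end, strand = parse_simple_location(location)
    sub = genome[start - 1:end]  # 1-based inclusive -> 0-based half-open slice
    if strand == -1:
        sub = sub.translate(COMPLEMENT)[::-1]  # reverse complement
    return sub


print(parse_simple_location("complement(9299..10702)"))  # (9299, 10702, -1)
print(extract_feature("AAACATTTT", "complement(4..6)"))  # ATG
```

For real records, a maintained parser (for example Biopython's feature-location handling) is a safer choice than this hand-rolled regex.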
  • 31. An ORF is not a CDS! An ORF is just an open reading frame, a run of codons uninterrupted by a stop codon. A genome contains many more ORFs than protein-coding genes (CDSs). [Figure: non-coding ORFs vs CDSs; note that an ORF can extend upstream of the start codon.]
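A six-frame scan makes the ORF/CDS distinction concrete. The minimal sketch below assumes the stop-to-stop definition of an ORF: an uninterrupted run of sense codons ending in a stop codon. That definition is why an ORF can extend upstream of the real start codon. The names `find_orfs` and `min_codons` are hypothetical, and the default threshold of 30 codons is an arbitrary choice. The code only enumerates ORFs. Deciding which ones are genuine CDSs needs further evidence, such as start codons, ribosome-binding sites or homology, and this sketch does not attempt that.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for stop-to-stop ORFs in all six frames.

    start/end are 0-based, end-exclusive, include the terminal stop codon and are
    given in the coordinates of the strand scanned ('-' = reverse complement).
    Open stretches that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame  # first codon after the previous stop (or frame start)
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if (i - run_start) // 3 >= min_codons:
                        orfs.append((strand, frame, run_start, i + 3))
                    run_start = i + 3
    return orfs


print(find_orfs("ATGAAATAA", min_codons=2))  # [('+', 0, 0, 9)]
```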
  • 32. The Problem of Frameshift Errors (• = stop codon)
    Actual sequence:
    ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
    Frame 1: M S T A K L V K S K A T N L L Y T R N D V S D S E K
    Frame 2: • V P L N • L N Q K R P I C F I P A T M S P T A R K
    Frame 3: E Y R • I S • I K S D Q S A L Y P Q R C L R Q R E K
    Frameshifted sequence after a single-base error (one extra A in the run of As near position 30):
    ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
    Frame 1: M S T A K L V K S K S D Q S A L Y P Q R C L R Q R E
    Frame 2: • V P L N • L N Q K A T N L L Y T R N D V S D S E K
    Frame 3: E Y R • I S • I K K R P I C F I P A T M S P T A R K
    After the error, frame 1 switches from the true protein (…K A T N L L…) to the old frame 3 product (…K S D Q S A…), while the true protein continues in frame 2.
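The effect of the single-base insertion can be reproduced by translating both sequences in the three forward frames. This is a minimal sketch using the standard genetic code. Bacterial translation table 11 has the same amino-acid assignments and differs only in its start codons. The helper names are illustrative, and stops print as `*` rather than the slide's •.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order; '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1 or 2); a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons containing N or other ambiguity
        for i in range(frame, len(seq) - 2, 3)
    )


actual = "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
shifted = "ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA"

for name, s in (("actual", actual), ("shifted", shifted)):
    for frame in range(3):
        print(name, "frame", frame + 1, translate(s, frame))
```

The output shows the frame 1 product diverging right after `MSTAKLVKSK`, exactly where the extra base sits.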
  • 33. Homology
    - Similarities in form (sequence) allow us to infer similarities in "meaning" (structure and function), much as "the cat sat on the mat" and "die Katze sass auf der Matte" resemble each other in both form and meaning.
    - Homology is not just sequence similarity: homologous sequences share a common ancestor.
    - Two sequences can be similar without any common ancestry, particularly if they are of low complexity.
    - Example aligned segments:
      vge|GBant88-2   ITLITCVSVKDNSKRYVVAG
      vge|GEfae9-178  LTLITCDQATKTTGRIIVIA
      vge|GSpne1-403  MTLITCDPIPTFNKRLLVNF
      sortase_staur   LTLITCDDYNEKTGVWEKRK
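Low-complexity sequence is usually detected from its skewed residue composition. The sketch below is a much simpler stand-in for dedicated filters such as SEG (proteins) and DUST (nucleotides). It computes the Shannon entropy of each sliding window and flags windows below a cut-off. The window size of 12 and threshold of 2.2 bits are arbitrary assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the residue composition, in bits per residue."""
    n = len(seq)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in Counter(seq).values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy falls below `threshold` (an arbitrary cut-off)."""
    return [
        i for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < threshold
    ]


for name, seg in [("vge|GBant88-2", "ITLITCVSVKDNSKRYVVAG"), ("poly-Q", "Q" * 20)]:
    print(name, round(shannon_entropy(seg), 2), low_complexity_windows(seg))
```

A poly-glutamine run scores 0 bits in every window, whereas the sortase-like segments are compositionally diverse. Similarity between the latter is therefore less likely to be a mere compositional artefact.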
  • 34. Types of Homology
    Homologues can be divided into:
    - Orthologues: separated by speciation, so their lines of descent are congruent with that of the whole genome
    - Paralogues: result of gene duplication
    - Xenologues: result of horizontal gene transfer (HGT)
  • 35. Homology Searches
    - The aim of a homology search is to identify sequences in sequence databases that are homologous to your sequence.
    - This involves comparing your sequence with every database sequence:
      - looking for stretches of sequence that appear similar
      - then scoring the matches and ranking them
      - and reporting a measure of the significance of each match
    - The most commonly used program for homology searches is BLAST.
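BLAST is a fast heuristic. It seeds on short word matches, extends them, scores with a substitution matrix such as BLOSUM62 and reports E-values as its significance measure. The sketch below instead computes the exact Smith-Waterman local alignment score, which BLAST's heuristics approximate, and ranks database entries by it. The scores (match +2, mismatch -1, linear gap -2), the function names and the toy database are all assumptions. No significance statistics are computed.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0, so alignments can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database entry and rank by descending score."""
    scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


database = {  # toy "database" built from the aligned segments on the Homology slide
    "vge|GEfae9-178": "LTLITCDQATKTTGRIIVIA",
    "vge|GSpne1-403": "MTLITCDPIPTFNKRLLVNF",
    "sortase_staur": "LTLITCDDYNEKTGVWEKRK",
}
print(rank_hits("ITLITCVSVKDNSKRYVVAG", database))
```

Only the score is tracked, not the alignment itself, which keeps memory at two rows of the dynamic-programming matrix.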
  • 36. Bacterial Genome Dynamics
    - Gene loss: drastic downsizing in isolated intracellular niches; accumulation of pseudogenes and IS elements after a shift to a new niche
    - Gene gain: horizontal gene transfer by phage, plasmids and pathogenicity islands
    - Gene duplication
    - Gene change: recombination and rearrangements; single nucleotide polymorphisms (SNPs)
    - Rapid emergence of genetically uniform pathogens from variable ancestral populations
  • 37. Horizontal gene transfer
    - Horizontal (or lateral) gene transfer denotes any transfer, exchange or acquisition of genetic material that differs from the normal mode of transmission from parents to offspring (vertical transmission).
    [Figure: vertical vs horizontal gene transfer]
  • 38. Bacterial mobile genetic elements
    - Transposons
      - pieces of DNA that act as "jumping genes", changing their location within a chromosome or plasmid
      - encode a transposase that catalyses the transposition event
      - can carry resistance or virulence genes
    - Insertion sequences (IS elements)
      - transposable elements that encode only the transposase
      - multiple copies of the same IS within a genome provide targets for homologous recombination, rearrangements and replicon fusions
    - Conjugative transposons
      - normally integrated into the chromosome
      - excise and are then transferred to recipient cells by conjugation
  • 39. Bacterial mobile genetic elements
    - Plasmids
      - self-replicating extrachromosomal replicons
      - usually circular but can be linear
      - can carry resistance or virulence genes
    - Bacteriophages
      - bacterial viruses; can carry virulence genes
      - can insert into the bacterial chromosome as prophages (lysogeny)
    - Integrons
      - complex natural cloning and gene-expression systems that capture promoterless gene cassettes by site-specific recombination
      - allow the formation of large arrays of gene cassettes that are transferred as a whole between different replicons
  • 40. Genomic islands
    - large chromosomal regions, part of the flexible gene pool
    - previously transferred by other mobile genetic elements
    - present in some bacteria but absent in close relatives
    - carry multiple genes that increase phenotypic versatility
    - contribute to the dynamic character of bacterial chromosomes; can be excised from the chromosome and transferred to other recipients
    - pathogenicity islands contain dozens of genes that allow a "quantum leap" to complex new virulence properties
  • 41. Core genomes and Pangenomes
    - Core genome: pool of genes shared by all members of a bacterial species
    - Accessory (or dispensable) genome: pool of genes present in some but not all genomes within the same bacterial species
    - Pangenome: global gene repertoire of a bacterial species, comprising the core genome plus the accessory genome
    - Metagenome: global gene repertoire of a mixed microbial population
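These definitions map directly onto set operations: the pangenome is the union of each genome's gene families, the core genome is their intersection, and the accessory genome is the difference. The minimal sketch below assumes genes have already been clustered into families (real pipelines do this by sequence similarity). The function name, strain names and family labels are hypothetical.

```python
def pangenome_partition(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Partition gene families into core, accessory and pan-genome sets."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return {"core": set(), "accessory": set(), "pan": set()}
    pan = set().union(*gene_sets)        # present in at least one genome
    core = set.intersection(*gene_sets)  # present in every genome
    return {"core": core, "accessory": pan - core, "pan": pan}


# Hypothetical strains and gene-family labels, for illustration only.
genomes = {
    "strain1": {"fam1", "fam2", "fam3"},
    "strain2": {"fam1", "fam2", "fam4"},
    "strain3": {"fam1", "fam2", "fam3", "fam5"},
}
parts = pangenome_partition(genomes)
print(sorted(parts["core"]))       # ['fam1', 'fam2']
print(sorted(parts["accessory"]))  # ['fam3', 'fam4', 'fam5']
```

Adding more genomes can only shrink the core and grow the pangenome, which is why core and pan-genome curves are usually plotted against the number of genomes sampled.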
  • 42. Escherichia coli Core and Pan-genomes Welch et al. Proc Natl Acad Sci U S A. 2002 Dec 24;99(26):17020-4
  • 43. Metagenomics
    - Environmental shotgun sequencing: DNA extracted from mixed microbial communities is sequenced en masse
    - Reads are assembled into contigs
    - Typically only small contigs can be obtained
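Assembly into contigs can be illustrated with a toy greedy overlap merger. Real assemblers, using de Bruijn graphs or overlap-layout-consensus, are far more elaborate. The sketch assumes error-free reads from a single strand and a hypothetical `min_overlap` cut-off, and its function names are illustrative. It repeatedly joins the pair of contigs with the longest suffix-prefix overlap. Merging stops once no sufficient overlap remains, which hints at one reason metagenomic contigs stay short: thin coverage of each organism in a mixed community leaves many gaps.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlap >= min_overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep first-seen order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlap long enough: remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGT", "CGTACG", "ACGTTA"]))  # ['ATGCGTACGTTA']
```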
  • 44. Uses of a genome sequence
    - Gene discovery
    - Fuelling hypothesis-driven research on pathogen biology
    - Comparative genomics
    - SNP discovery and genomic epidemiology
    - Functional genomics
      - Transcriptomics
      - Proteomics
      - Interactome
    - Structural genomics
    - Mass mutagenesis
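SNP-based genomic epidemiology ultimately compares isolates by how many single-nucleotide differences separate them. The minimal sketch below computes pairwise SNP distances. It assumes the genomes are already aligned to the same length, for example after mapping reads to a reference. Variant calling, quality filtering and recombination masking are out of scope. The function and isolate names are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str, ignore: str = "-N") -> int:
    """Count aligned positions where two sequences differ, skipping gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def snp_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every pair of isolates in the alignment."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    }


# Hypothetical isolates, aligned to the same length.
alignment = {
    "isolate1": "ACGTACGTAC",
    "isolate2": "ACGTACGTAC",
    "isolate3": "ACGAACGTNC",
}
for pair, d in snp_matrix(alignment).items():
    print(pair, d)
```

Ambiguous positions (N) are skipped rather than counted, so poorly covered sites do not inflate distances between closely related outbreak isolates.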
  • 45. Haemolytic-uraemic syndrome
    - Shiga-toxin-producing E. coli (STEC)
    - bloody diarrhoea; damage to kidneys and brain
    - anaemia; loss of platelets
  • 46. German E. coli O104:H4 outbreak
    - May-July 2011
    - >4000 cases
    - >40 deaths
    - Linked to sprouting seeds
    - High risk of haemolytic-uraemic syndrome
    - Females particularly at risk
    Frank et al. DOI: 10.1056/NEJMoa1106483
  • 48. Take-away messages from the genome
    - Pathogens don't bother with passports!
    - Not a new strain: something similar was seen in Germany ten years ago and in Korea
    - The closest genome-sequenced strain was isolated in the Central African Republic in the late 1990s and belongs to an enteroaggregative lineage
    - The German STEC probably comes from a lineage circulating in human populations rather than from an animal source (unlike E. coli O157)
  • 49. Take-away messages
    - Bacteria evolve quickly
    - Virulence factors in E. coli can jump from one lineage to another on mobile genetic elements
    - Pathotypes can overlap and evolve
    - Antibiotic resistance is seen where there has been no obvious prior use of antibiotics
  • 51. Take-away messages from genome sequence
    - Genome sequencing brings the advantages of:
      - open-endedness (revealing the "unknown unknowns")
      - universal applicability
      - the ultimate in resolution
    - Bench-top sequencing platforms now generate data quickly and cheaply enough to have an impact on real-world clinical and epidemiological problems
  • 52. Comprehensive Coverage of Human Microbiome
  • 54. What will you do when you can sequence everything?