3. TIGRTIGR
Topics of Discussion
• Introduction to genome sequencing and
analysis
• Need for “phylogenomic” approaches
• Phylogenomic examples
– Species evolution
– Lateral vs. vertical evolution
– Gene function
– Gene duplications
– Genome rearrangements
4. TIGRTIGR
The Institute for Genomic Research
(TIGR)
• A not-for-profit institution with a staff of ~350
• Funded primarily by government
grants
• Departments:
– Research Departments
– Bioinformatics
– Sequencing Core
8. TIGRTIGR
General Steps in Analysis of
Complete Genomes
• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological data
9. TIGRTIGR
Comparative Genomics
• Comparison of genomes between species
• Identify differences
– SNPs
– Indels
– Rearrangements
– Presence/absence of genes, pathways, features
• Correlating with phenotypic differences
• Can be used to improve on every step in genome
analysis
14. TIGRTIGR
Evolution and Genomics Overlap
• Genome sequences contain a record of the
evolution of a species and all its genes
• Evolutionary analysis is the key to
interpreting genome sequences and making
the most use out of them
15. TIGRTIGR
Phylogenomics?
Evolutionary information improves genome analysis
-Classification of multigene families
-Predicting functions
-Origins of genes and pathways
Genomics information improves evolutionary
reconstructions
-More sequences of genes
-Unbiased sampling
-Presence/absence needed to infer certain events
Feedback loop between two types of analysis
17. TIGRTIGR
Why Completeness is Important
• Improves characterization of genome features
– Gene order, replication origins
• Better comparative genomics
– Genome duplications, inversions
• Determination of presence and absence of particular genes and
features is less subjective
• Missing sequence might be important (e.g., centromere)
• Allows researchers to focus on biology not sequencing
• Facilitates large scale correlation studies
• Controls for contamination
18. TIGRTIGR
Uses of Phylogenomics
• Species evolution and systematics
• Lateral versus vertical evolution
• Gene function
• Gene and genome duplications
• Genome rearrangements
20. TIGRTIGR
Species Evolution I:
Major Evolutionary Transitions
• Analysis of S. pombe genome (Wood et al. 2002)
• Compared the genomes of eukaryotes to those of
prokaryotes
• Asked: “Are there genes in all eukaryotes with no obvious
homologs in any prokaryote?”
• Found ~200 genes, including many with known major
roles in “eukaryotic” features such as the cytoskeleton and
chromatin, as well as many with no known function
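The question on this slide amounts to a presence/absence filter over gene families. A minimal sketch, assuming a hypothetical table that maps each gene family to the genomes in which a homolog was detected (the family names and genome sets below are illustrative, not data from Wood et al.):

```python
# Hypothetical presence/absence data: family -> genomes with a detectable homolog.
EUKARYOTES = {"S. pombe", "S. cerevisiae", "C. elegans", "A. thaliana"}
PROKARYOTES = {"E. coli", "B. subtilis", "M. jannaschii"}

presence = {
    "famA": {"S. pombe", "S. cerevisiae", "C. elegans", "A. thaliana"},
    "famB": {"S. pombe", "S. cerevisiae", "C. elegans", "A. thaliana", "E. coli"},
    "famC": {"S. pombe", "S. cerevisiae"},
}

def group_specific(presence, required, excluded):
    """Return families found in every `required` genome and in no `excluded` genome."""
    return sorted(
        fam for fam, genomes in presence.items()
        if required <= genomes and not (genomes & excluded)
    )

eukaryote_only = group_specific(presence, EUKARYOTES, PROKARYOTES)
print(eukaryote_only)
```

The same filter, with multicellular species as `required` and single-celled species as `excluded`, would express the multicellularity question asked of the same genome. The result is only as good as the homolog detection behind the table and the completeness of the genomes.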
22. TIGRTIGR
Evolutionary Transitions II:
Single- vs. Multi-Cellularity
• Analysis of S. pombe genome (Wood et al. 2002)
• Compared multi-cellular vs. single-cellular species
• Asked “Are there genes in all multi-cellular and not in any
single-cellular?”
• Found only 3
• Concluded that multicellularity was likely the result of
gene regulatory processes
24. TIGRTIGR
Species Evolution III:
Uncultured Microbes
• Vast majority of microbes have never been cultured
• Usually studied indirectly by cloning rRNA genes and using
position within rRNA tree to predict biology
• These predictions are frequently inaccurate
25. TIGRTIGR
Genomics does not require initial
culturing step.
• Isolate, by filtration, all bacteria in a water sample
• Extract total DNA in very large pieces
• Clone those pieces as BACs into E. coli to obtain enough DNA.
• Sequence the BACs like a bacterial genome.
[Diagram: Natural water -> Filter, concentrate -> Extract DNA -> Clone into BACs -> Sequence -> Gene list]
27. TIGRTIGR
Using an rRNA anchor allowed the identification of a new
form of phototrophy: proteorhodopsin
Beja et al. 2000
28. TIGRTIGR
Uses of Phylogenomics
• Species evolution and systematics
• Lateral versus vertical evolution
• Gene function
• Gene and genome duplications
• Genome rearrangements
30. TIGRTIGR
Examples of Horizontal
Transfers
• Antibiotic resistance genes
• Insertion sequences
• Agrobacterium Ti plasmid
• Toxin degradation genes on plasmids
• Virus and phage gene acquisition and
transfer
• Organelle to nucleus transfers
31. TIGRTIGR
Why Gene Transfers Are Useful to Identify
• Laterally transferred genes frequently involved in
environmental adaptations and/or pathogenicity
• Identification of vectors of gene transfer (e.g.,
transposons, integrons, phage)
• Identify species associations in the environment
(e.g., Thermotoga and Archaea, Nelson et al.)
• Identify organelle derived genomes in eukaryotic
genomes
33. TIGRTIGR
• Claim
– “Hundreds of human genes appear likely to have resulted from
horizontal transfer from bacteria at some point in the vertebrate
lineage.”
• Evidence
– Genes match bacteria but not non-vertebrate eukaryotes
– Or genes have stronger match to bacteria than to non-vertebrates
38. TIGRTIGR
Number of pBVTs Depends
on # of Genomes Analyzed
[Figure: bar chart. Axis “Number of protein sets” with categories 1, 2, 3, 4, 5, Other; value axis 0 to 1800; legend: Fruit fly, C. elegans, Arabidopsis, Yeast, Parasites]
Salzberg et al. 2001
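The dependence on the number of genomes follows from the evidence rule itself: a gene stays a candidate transfer only while no screened non-vertebrate proteome contains a homolog. A minimal sketch of that effect, using invented gene names and hit sets (none of these values come from Salzberg et al.):

```python
# Hypothetical data: for each human gene that matches bacteria, the set of
# non-vertebrate proteomes in which a homolog is also detected.
non_vertebrate_hits = {
    "gene1": set(),
    "gene2": {"Fruit fly"},
    "gene3": {"Arabidopsis", "Yeast"},
    "gene4": {"Parasites"},
    "gene5": set(),
}

def candidates_remaining(hits, proteomes_in_order):
    """Count candidate transfers left after each additional proteome is screened."""
    counts = []
    screened = set()
    for proteome in proteomes_in_order:
        screened.add(proteome)
        remaining = [g for g, found in hits.items() if not (found & screened)]
        counts.append(len(remaining))
    return counts

order = ["Fruit fly", "C. elegans", "Arabidopsis", "Yeast", "Parasites"]
counts = candidates_remaining(non_vertebrate_hits, order)
print(counts)
```

The count can only stay level or fall as proteomes are added, which is why a candidate list built from few genomes tends to overstate the number of transfers.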
44. TIGRTIGR
Mitochondrial Genome
Integration into A. thaliana chrII
[Figure: A. thaliana chromosome II (coordinates 3.2E+06 to 3.6E+06) aligned with the alternative mitochondrial genome form (coordinates 0 to 4E+05); segments labeled A', B, C and D'; a possible insertion point is marked]
45. TIGRTIGR
A. thaliana Nuclear Proteins:
Best Matches to Complete Genomes
[Figure: bar chart of best matches (0 to 4000) per complete genome. Genomes: CHLTE, PORGI, BACSU, MCYTU, BBUR, TREPA, CHLPN, ECOLI, NEIME, RICPR, CAUCR, HELPY, SYNSP, AQUAE, DEIRA, THEMA, AERPE, ARCFU, METJA, METTH, PYRAB, CELEG, YEAST, DROME; groups labeled B, A, E]
49. TIGRTIGR
A. thaliana T1E2.8 is a
Chloroplast Derived HSP60
[Figure: HSP60 gene tree with ARATH T1E2.8 marked by asterisks; clade labels: Eukarya, Archaea, Bacteria, Cyano/Cpst]
50. TIGRTIGR
Uses of Phylogenomics
• Species evolution and systematics
• Lateral versus vertical evolution
• Gene function
• Gene and genome duplications
• Genome rearrangements
51. TIGRTIGR
Predicting Function
• Identification of motifs
– Short regions of sequence similarity that are indicative of
general activity
– e.g., ATP binding
• Homology/similarity based methods
– Gene sequence is searched against a database of other
sequences
– If significantly similar genes are found, their functional
information is used
• Problem
– Genes frequently have similarity to hundreds of motifs
and multiple genes, not all with the same function
53. TIGRTIGR
Blast Search of H. pylori “MutS”
Sequences producing significant alignments:        Score (bits)  E value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN   117           3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN   69            1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN   64            3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN   62            2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN   57            4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN   57            4e-07
• Blast search pulls up Syn. sp. MutS#2 with a
much higher score (and much lower E value) than the
other MutS homologs
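The gap between the top hit and the rest can be read directly off the hit list. A small sketch that parses the scores shown on this slide and reports how far the best hit stands above the runner-up; treating a large gap as a hint of a distinct subfamily is an assumption made here for illustration, not a rule from the talk:

```python
# Hit lines taken from the slide: identifier, bit score, E value.
hits_text = """\
sp|P73625|MUTS_SYNY3 117 3e-25
sp|P74926|MUTS_THEMA 69 1e-10
sp|P44834|MUTS_HAEIN 64 3e-09
sp|P10339|MUTS_SALTY 62 2e-08
sp|O66652|MUTS_AQUAE 57 4e-07
sp|P23909|MUTS_ECOLI 57 4e-07
"""

def parse_hits(text):
    """Turn whitespace-separated hit lines into (identifier, score, evalue) tuples."""
    hits = []
    for line in text.splitlines():
        ident, score, evalue = line.split()
        hits.append((ident, float(score), float(evalue)))
    return hits

hits = parse_hits(hits_text)
ranked = sorted(hits, key=lambda h: h[2])   # lowest E value first
best, runner_up = ranked[0], ranked[1]
gap = best[1] - runner_up[1]                # bit-score gap to the next hit
print(best[0], gap)
```

A ranking like this shows which sequence is most similar, not which is most closely related; that distinction is what the tree-based analysis later in the talk addresses.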
55. TIGRTIGR
H. pylori and MutS
• Prior to this genome, all species that
encoded a MutS homolog also encoded
a MutL homolog
• Experimental studies have shown
MutS and MutL always work together
in mismatch repair
• Problem: what do we conclude about
H. pylori mismatch repair?
58. TIGRTIGR
Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of the MutS family. Leaf labels recovered: Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco; several species appear more than once]
66. TIGRTIGR
Evolutionary Method
Phylogenetic Prediction of Gene Function
• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest
[Figure: Examples A and B applied to genes 1 to 6 (1A, 2A, 3A, 1B, 2B, 3B) from Species 1, 2 and 3, shown beside the actual evolution (assumed to be unknown); nodes are labeled “Duplication” or “Duplication?”, and one outcome is labeled “Ambiguous”]
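The last two steps of the method (overlaying known functions on the tree and inferring a function for the gene of interest) can be sketched in a few lines. This is a deliberately simplified illustration: the tree, the annotations, and the rule "take the smallest clade that contains the query and at least one annotated gene" are assumptions chosen for the example, and the sketch ignores the duplication inference shown in the figure.

```python
# Hypothetical gene tree as nested tuples; leaves are gene names.
tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
# Hypothetical experimentally known functions.
known = {"1A": "repair", "3A": "repair", "1B": "recombination"}

def leaves(node):
    """List all leaf names under a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def predict(node, query, known):
    """Return the set of known functions in the smallest clade that holds
    `query` plus at least one other annotated gene (None if there is none)."""
    if isinstance(node, str):
        return None
    for child in node:
        if query in leaves(child):
            result = predict(child, query, known)
            if result:
                return result
    functions = {known[g] for g in leaves(node) if g in known and g != query}
    return functions or None

print(predict(tree, "2A", known))   # one function: a clear prediction
print(predict(("X", ("1A", "1B")), "X", known))   # two functions: ambiguous
```

A single function in the returned set corresponds to a clear prediction; more than one corresponds to the ambiguous case, where the tree alone cannot decide.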
69. TIGRTIGR
DNA Ligase Tree
[Figure: phylogenetic tree of DNA ligases from eukaryotes, poxviruses and other viruses, archaea, and bacteria; clades labeled Ligase IV, Viral ligases, Ligase I, and Archaeal Ligase]
70. TIGRTIGR
Problems with Similarity Based
Functional Prediction
• Prone to database error propagation.
• Cannot identify orthologous groups reliably.
• Perform poorly in cases of evolutionary rate variation and
non-hierarchical trees (similarity will not reflect evolutionary
relationships in these cases)
• May be misled by modular proteins or large
insertion/deletion events.
• Are not set up to deal with expanding data sets.
73. TIGRTIGR
Alkylation Repair Genes
[Figure: domain architecture of alkylation repair proteins. Domains: AlkA domain (O6-Me-G glycosylase), Ogt domain (O6-Me-G alkyltransferase), Ada domain (transcriptional regulator). Proteins shown: Ada E. coli, Ada H. infl, Ogt E. coli, Ogt H. infl, Ogt Gram+, Ogt D. radio, AlkA Gram+, AlkA E. coli, MGMT Euks]
75. TIGRTIGR
Types of Molecular Homology
• Homologs: genes that are descended from a common ancestor (e.g.,
all globins)
• Orthologs: homologs that have diverged after speciation events (e.g.,
human and chimp β-globins)
• Paralogs: homologs that have diverged after gene duplication events
(e.g., α and β globin).
• Xenologs: homologs that have diverged after lateral transfer events
• Positional homology: common ancestry of specific amino acid or
nucleotide positions in different genes
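These definitions depend only on the event at the most recent common ancestor of the two genes in the gene tree. A minimal sketch, assuming a gene tree whose internal nodes have already been labeled as speciation, duplication, or transfer events (the tree below is a toy version of the globin example; the node labels are supplied by hand, not inferred):

```python
# Hypothetical gene tree: internal nodes are (event, left, right); leaves are gene names.
tree = ("duplication",
        ("speciation", "human_alpha", "chimp_alpha"),
        ("speciation", "human_beta", "chimp_beta"))

def leaves(node):
    """Set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    return leaves(node[1]) | leaves(node[2])

def relationship(node, a, b):
    """Classify two genes by the event at their most recent common ancestor.
    Assumes `a` and `b` are two different leaves of the tree."""
    if isinstance(node, str):
        raise ValueError("both genes must be distinct leaves of the tree")
    for child in (node[1], node[2]):
        members = leaves(child)
        if a in members and b in members:
            return relationship(child, a, b)
    return {"speciation": "orthologs",
            "duplication": "paralogs",
            "transfer": "xenologs"}[node[0]]

print(relationship(tree, "human_beta", "chimp_beta"))
print(relationship(tree, "human_alpha", "human_beta"))
```

Note that human α-globin and chimp β-globin come out as paralogs even though they are from different species, because their common ancestor is the duplication node.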
76. TIGRTIGR
Caution: Homology Based
Predictions Have Many Flaws
• Not all orthologs have the same function
• Homology cannot be used to characterize
novel pathways (e.g., D. radiodurans)
• Absence of genes can be important to
phenotypes (e.g., pathogenicity)
80. TIGRTIGR
Unusual Features of D. radiodurans
DNA Repair Genes
Process                     Genes
Nucleotide excision repair  Two UvrAs
Base excision repair        Four MutY-Nths
Recombination               RecD but not RecBC
Replication                 Four Pol genes
dNTP pools                  Many MutTs, two RRases
Other                       UVDE
81. TIGRTIGR
Problem:
The list of DNA repair gene homologs
in the D. radiodurans genome is not
significantly different from those of other
bacterial genomes of similar size
82. TIGRTIGR
Repair Studies in Different Species
(determined by Medline searches as of 1998)
Humans         7028
E. coli        3926
S. cerevisiae  988
Drosophila     387
B. subtilis    284
S. pombe       116
Xenopus        56
C. elegans     25
A. thaliana    20
Methanogens    16
Haloferax      5
Giardia        0
88. TIGRTIGR
Chlorobium tepidum Strain TLS
C. tepidum mat in highly sulfidic
“Travelodge Stream”,
Rotorua, New Zealand
(from Castenholz and Pierson, 1995)
Phase contrast photomicrograph
of a 48-hour culture and electron
micrograph of a thin cell section
(from Wahlund et al., 1991)
89. TIGRTIGR
Phylogenetic Profile -
C. tepidum Chlorophyll
Synthesis
Wu and Eisen, unpublished
5002_cobalamin biosynthesis protein CbiG/precorrin-4 C11-methyltransferase
3939_precorrin-3B C17-methyltransferase/precorrin-8X methylmutase cbiJH
882_cobyric acid synthase cbiP
3160_dsrN protein dsrN
862_cobyrinic acid a,c-diamide synthase cbiA-1
4010_cobN protein, putative
2641_magnesium-protoporphyrin methyltransferase bchH-3
1498_magnesium-protoporphyrin methyltransferase bchH-1
4003_cobN protein, putative
2636_magnesium-protoporphyrin methyltransferase bchH-2
4008_magnesium-chelatase, subunit I chlI-2
4007_magnesium-chelatase, subunit D/I family
1504_magnesium-chelatase, subunit I chlI-1
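A phylogenetic profile records, for each gene, in which genomes a homolog is present; genes with similar profiles are candidates for a shared pathway. The sketch below shows the general idea only. The gene names are borrowed from the list above, but the presence/absence vectors are invented for illustration and are not the unpublished Wu and Eisen data.

```python
# Hypothetical presence (1) / absence (0) of each gene across five genomes.
profiles = {
    "bchH": (1, 1, 0, 0, 1),
    "chlI": (1, 1, 0, 0, 1),
    "cobN": (1, 0, 1, 0, 1),
    "recA": (1, 1, 1, 1, 1),
}

def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))

def most_similar(query, profiles):
    """Gene whose profile differs least from the query's profile."""
    others = [g for g in profiles if g != query]
    return min(others, key=lambda g: hamming(profiles[query], profiles[g]))

print(most_similar("bchH", profiles))
```

Hamming distance is one simple choice of comparison; other measures could be substituted without changing the overall approach.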
93. TIGRTIGR
Uses of Phylogenomics
• Species evolution and systematics
• Lateral versus vertical evolution
• Gene function
• Gene and genome duplications
• Genome rearrangements
95. TIGRTIGR
Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Improves functional predictions
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in
different parts of a genome
• Lineage specific duplications may be indicative
of species’ specific adaptations
96. TIGRTIGR
Levels of Paralogy Within A Genome
• All
– All members of a gene family are linked together
• Top matches
– Only top matching pairs are linked together.
Therefore, if in a large gene family, only the pair
from the most recent duplication event is included
• Recent
– Operational definition based on comparison to other
species. Only pairs which are more similar to each
other than to selected other species are included.
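The three levels can be expressed as three filters over the same within-genome similarity table. A minimal sketch with invented scores; reading "top matches" as reciprocal top matches is an assumption made here so that only the most recently duplicated pair of a family is kept:

```python
# Hypothetical similarity scores between genes within one genome.
within = {
    ("g1", "g2"): 90, ("g1", "g3"): 40, ("g2", "g3"): 45,
    ("g4", "g5"): 60,
}
# Hypothetical best score of each gene against the selected other species.
best_other = {"g1": 70, "g2": 75, "g3": 50, "g4": 80, "g5": 85}

def score_lookup(pairs):
    """Build gene -> {partner: score} from the pair table."""
    table = {}
    for (a, b), s in pairs.items():
        table.setdefault(a, {})[b] = s
        table.setdefault(b, {})[a] = s
    return table

def all_pairs(pairs):
    """All: every linked pair in the family."""
    return {frozenset(p) for p in pairs}

def top_matches(pairs):
    """Top matches: pairs in which each gene is the other's best within-genome hit."""
    table = score_lookup(pairs)
    top = {g: max(partners, key=partners.get) for g, partners in table.items()}
    return {frozenset((g, h)) for g, h in top.items() if top[h] == g}

def recent(pairs, best_other):
    """Recent: pairs more similar to each other than either is to the other species."""
    return {frozenset((a, b)) for (a, b), s in pairs.items()
            if s > best_other[a] and s > best_other[b]}

print(len(all_pairs(within)), top_matches(within), recent(within, best_other))
```

With these scores the g1/g2 pair passes all three filters, while g4/g5 is a top match but not recent, because both genes are more similar to a gene in the comparison species than to each other.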
97. TIGRTIGR
C. pneumoniae Paralogs by Position
[Dot plot: query ORF position (x-axis) vs. subject ORF position (y-axis), both 0 to 1,250,000]
98. TIGRTIGR
C. pneumoniae Paralogs -
Lineage Specific
[Dot plot: query ORF position (x-axis) vs. subject ORF position (y-axis), both 0 to 1,250,000]
100. TIGRTIGR
B. anthracis lineage specific duplications
ORF04205 molybdopterin biosynthesis protein MoeA (moeA)
ORF05907 molybdopterin biosynthesis protein MoeA (moeA)
ORF02636 molybdopterin biosynthesis protein MoeA (moeA)
ORF04204 molybdopterin biosynthesis protein MoeB, putative
ORF05908 molybdopterin biosynthesis protein MoeB, putative
ORF02634 molybdopterin biosynthesis protein MoeB, putative
ORF05904 molybdopterin converting factor, subunit 1 (moaD)
ORF02639 molybdopterin converting factor, subunit 1 (moaD)
ORF04206 molybdopterin converting factor, subunit 2 (moaE)
ORF05905 molybdopterin converting factor, subunit 2 (moaE)
ORF02638 molybdopterin converting factor, subunit 2 (moaE)
Based on Read et al. submitted
101. TIGRTIGR
S. aureus Lineage Specific Duplications
ORF02715 4-diphosphocytidyl-2C-methyl-D-erythritol synthase, putative
ORF02712 alcohol dehydrogenase, zinc-containing
ORF00701 alpha-hemolysin precursor (2X)
ORF00717 antibacterial protein
ORF02597 capsular polysaccharide biosynthesis proteins CapABC (2X)
ORF00804 cell wall hydrolase (3X)
ORF00657 cell wall surface anchor family protein (2X)
ORF00358 clumping factor (2X)
ORF01758 deoxyribose-phosphate aldolase (deoC)
ORF02579 purine nucleoside phosphorylase (deoD)
ORF01031 drug transporter, putative
ORF00805 endopeptidase resistance gene (eprH)
ORF00706 exotoxin 1,3,4,5, unknown (2X)
ORF02184 fibronectin (2X)
ORF00097 glycosyl transferase, group 1 family protein (3X)
ORF02086 IgG-binding protein (2X)
ORF02431 integrase/recombinase, core domain family (3X)
Analysis done with S. Gill
102. TIGRTIGR
S. aureus Lineage Specific Duplications
ORF00137 conserved hypothetical protein
ORF00138 conserved hypothetical protein
ORF00139 conserved hypothetical protein
ORF00140 conserved hypothetical protein
ORF00141 conserved hypothetical protein
ORF00142 conserved hypothetical protein
ORF00143 conserved hypothetical protein
ORF00144 conserved hypothetical protein
ORF00145 conserved hypothetical protein
ORF00146 conserved hypothetical protein
ORF00148 conserved hypothetical protein
ORF00667 conserved hypothetical protein
ORF01251 conserved hypothetical protein
ORF02160 conserved hypothetical protein
ORF02166 conserved hypothetical protein
ORF02170 conserved hypothetical protein
ORF02171 conserved hypothetical protein
ORF02507 conserved hypothetical protein
ORF02745 conserved hypothetical protein
ORF02760 conserved hypothetical protein
ORF02762 conserved hypothetical protein
ORF02763 conserved hypothetical protein
ORF02766 conserved hypothetical protein
ORF02768 conserved hypothetical protein
ORF02769 conserved hypothetical protein
ORF02770 conserved hypothetical protein
ORF02771 conserved hypothetical protein
ORF02772 conserved hypothetical protein
ORF02773 conserved hypothetical protein
ORF02774 conserved hypothetical protein
ORF02896 conserved hypothetical protein
ORF02974 conserved hypothetical protein
ORF02711 conserved hypothetical protein UPF0007
ORF02614 conserved hypothetical protein, authentic frameshift
ORF00286 hypothetical protein
ORF00338 hypothetical protein
ORF00361 hypothetical protein
ORF00412 hypothetical protein
ORF00415 hypothetical protein
ORF00614 hypothetical protein
ORF00697 hypothetical protein
ORF00703 hypothetical protein
ORF00705 hypothetical protein
ORF00875 hypothetical protein
ORF00876 hypothetical protein
ORF00877 hypothetical protein
ORF00879 hypothetical protein
ORF00888 hypothetical protein
ORF00889 hypothetical protein
ORF01024 hypothetical protein
ORF01041 hypothetical protein
ORF01089 hypothetical protein
ORF01091 hypothetical protein
ORF01092 hypothetical protein
ORF01093 hypothetical protein
ORF01095 hypothetical protein
ORF01446 hypothetical protein
ORF01462 hypothetical protein
ORF01918 hypothetical protein
ORF02099 hypothetical protein
ORF02102 hypothetical protein
ORF02158 hypothetical protein
ORF02159 hypothetical protein
ORF02172 hypothetical protein
ORF02430 hypothetical protein
ORF02434 hypothetical protein
ORF02530 hypothetical protein
ORF02531 hypothetical protein
ORF02532 hypothetical protein
ORF02533 hypothetical protein
ORF02534 hypothetical protein
Analysis done with S. Gill
103. TIGRTIGR
Lineage Specific Duplications in Wolbachia wMel
Annotation
ankyrin repeat domain protein
ankyrin repeat domain protein
ankyrin repeat domain protein
ankyrin repeat domain protein
ankyrin repeat domain protein
ankyrin repeat domain protein
ankyrin repeat domain protein
conserved domain protein
conserved domain protein
conserved domain protein
conserved domain protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein
conserved hypothetical protein, FRAMESHIFT
conserved hypothetical protein, POINT MUTATION
conserved hypothetical protein, degenerate
conserved hypothetical protein, FRAMESHIFT
conserved hypothetical protein, FRAMESHIFT
conserved hypothetical protein, FRAMESHIFT
conserved hypothetical protein, FRAMESHIFT
conserved hypothetical protein, interruption-C
conserved hypothetical protein, POINT MUTATION
conserved hypothetical protein, POINT MUTATION
conserved hypothetical protein, truncated
conserved hypothetical protein, truncation
DNA mismatch repair protein MutL (mutL)
DNA repair protein RadC, putative
DNA repair protein RadC, putative, truncation
DNA repair protein RadC, truncation
DnaJ domain protein
DnaJ domain protein
exopolysaccharide synthesis protein ExoD-related protein
exopolysaccharide synthesis protein ExoD-related protein
HNH endonuclease family protein
HNH endonuclease family protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
hypothetical protein
major facilitator family transporter
major facilitator family transporter
major facilitator family transporter
membrane protein, putative
membrane protein, putative
membrane protein, putative
MutL family protein
Na+/H+ antiporter family protein
Na+/H+ antiporter, putative
permease, putative
portal protein, FRAMESHIFT
portal protein, FRAMESHIFT
prophage LambdaW1, DNA methylase
prophage LambdaW1, terminase large subunit, putative
prophage LambdaW2, ankyrin repeat domain protein
prophage LambdaW2, ankyrin repeat domain protein
prophage LambdaW2, baseplate assembly protein J, putative
prophage LambdaW2, baseplate assembly protein V, putative, FRAMESHIFT
prophage LambdaW2, baseplate assembly protein V, putative, FRAMESHIFT
prophage LambdaW2, baseplate assembly protein W, putative
prophage LambdaW2, minor tail protein Z, putative, FRAMESHIFT
prophage LambdaW2, site-specific recombinase, resolvase family
prophage LambdaW4, ankyrin repeat domain protein
prophage LambdaW4, DNA methylase
prophage LambdaW4, portal protein, FRAMESHIFT
prophage LambdaW4, portal protein, FRAMESHIFT
prophage LambdaW4, terminase large subunit, putative
prophage LambdaW5, ankyrin repeat domain protein
prophage LambdaW5, ankyrin repeat domain protein
prophage LambdaW5, ankyrin repeat domain protein
prophage LambdaW5, baseplate assembly protein J, putative, FRAMESHIFT
prophage LambdaW5, baseplate assembly protein V, putative
prophage LambdaW5, baseplate assembly protein W, putative
prophage LambdaW5, minor tail protein Z, putative, degenerate, FRAMESHIFT
prophage LambdaW5, site-specific recombinase, resolvase family
regulatory protein RepA, putative
regulatory protein RepA, putative
reverse transcriptase, putative
reverse transcriptase, putative
reverse transcriptase, putative
sodium/alanine symporter family protein
sodium/alanine symporter family protein
TenA/THI-4 family protein
transcriptional regulator
transcriptional regulator
transcriptional regulator
transcriptional regulator
transcriptional regulator
transcriptional regulator
transcriptional regulator, putative
translation elongation factor Tu (tuf)
translation elongation factor Tu (tuf)
transposase, degenerate
transposase, IS4 family
transposase, IS4 family
transposase, IS4 family
transposase, IS5 family, interruption-N
transposase, IS5 family, truncation
transposase, putative, degenerate
transposase, putative, degenerate
transposase, putative, degenerate
type IV secretion system protein VirB4, putative
UDP-N-acetylglucosamine pyrophosphorylase-related protein
104. TIGRTIGR
MutL Duplication in Wolbachia wMel
ORF01096 DNA mismatch repair protein MutL (mutL)
ORF00446 MutL family protein
106. TIGRTIGR
Superoxide Dismutase Duplication
in D. radiodurans
[Figure: superoxide dismutase gene tree. Leaves: D. radiodurans 2, D. radiodurans 1, V. cholerae, E. coli, M. tuberculosis, B. subtilis, A. aeolicus 1, A. aeolicus 2, C. elegans, Yeast]
see White et al. (1999)
109. TIGRTIGR
Uses of Phylogenomics
• Species evolution and systematics
• Lateral versus vertical evolution
• Gene function
• Gene and genome duplications
• Genome rearrangements
110. TIGRTIGR
X-files
Eisen et al. 2000. Genome Biology 1(6): 11.1-11.9
Also see Tillier and Collins. 2000. Nature Genetics
26(2): 195-7 and Suyama and Bork. 2001. Trends in Genetics
17: 10-13.
111. TIGRTIGR
V. cholerae vs. E. coli All
[Dot plot: V. cholerae coordinates (x-axis, 0 to 3,000,000) vs. E. coli coordinates (y-axis, 0 to 5,000,000)]
112. TIGRTIGR
V. cholerae vs. E. coli Best
[Dot plot: V. cholerae coordinates (x-axis, 0 to 3,000,000) vs. E. coli coordinates (y-axis, 0 to 5,000,000)]
113. TIGRTIGR
V. cholerae vs. E. coli if Top
[Dot plot: V. cholerae coordinates (x-axis, 0 to 3,000,000) vs. E. coli coordinates (y-axis, 0 to 5,000,000)]
114. TIGRTIGR
V. cholerae vs. E. coli
Top Matches, Rotated
[Dot plot: V. cholerae ORF coordinates (x-axis, 0 to 3,000,000) vs. E. coli ORF coordinates (y-axis, 0 to 5,000,000)]
115. TIGRTIGR
Duplication and Gene Loss Model
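[Figure: duplication and gene loss model. A segment with genes A to F is duplicated, giving copies A to F and A' to F'; differential gene loss then leaves different subsets in each lineage. Gene sets shown after loss: A, C, D, F, A', B', E' and B, C, D, F, A', B', D', E'; lineage labels: E. coli and V. cholerae]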
117. TIGRTIGR
V. cholerae vs. E. coli
Best Matching Proteins by Location
[Dot plot: V. cholerae ORF coordinates (x-axis, 0 to 3,000,000) vs. E. coli ORF coordinates (y-axis, 0 to 5,000,000)]
118. TIGRTIGR
C. trachomatis vs. C. pneumoniae Dot Plot
[Dot plot: C. trachomatis MoPn vs. C. pneumoniae AR39, with the origin and terminus marked]
119. TIGRTIGR
M. leprae vs. M. tuberculosis Whole
Genome Alignment
[Plot: Mycobacterium leprae (x-axis, 0 to 3,000,000) vs. Mycobacterium tuberculosis (y-axis, 0 to 4,000,000)]
120. TIGRTIGR
B. subtilis vs. S. aureus
[Plot: axis ranges 0 to 3000 and 2,632,200 to 2,636,700]
Analysis with S. Gill
121. TIGRTIGR
P. putida vs. P.aeruginosa Orthologs
[Plot: axis ranges 9,945,700 to 9,951,700 and 0 to 8000]
Analysis with K. Nelson
124. TIGRTIGR
Why Are Inversions Symmetrical
Around the Origin?
• Genetic studies in Salmonella and E. coli
suggest that there may be strong selection
against other inversions
• See:
– Mahan, Segall, Schmid and Roth
– Liu and Sanderson
– Rebollo, Francois, and Louarn
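The geometry behind the X-shaped dot plots can be shown with a toy model. The sketch below assumes a circular genome with the origin at coordinate 0 and applies a single inversion centered on the origin; the genome length, gene positions, and inversion size are all invented. It illustrates only what a symmetric inversion does to coordinates and says nothing about the selection that may favor such inversions.

```python
def distance_from_origin(position, length):
    """Shortest distance around the circle from the origin (coordinate 0)."""
    return min(position, length - position)

def symmetric_inversion(position, length, half_width):
    """Map a coordinate through an inversion centered on the origin that
    reverses everything within `half_width` of the origin."""
    if distance_from_origin(position, length) <= half_width:
        return (length - position) % length   # flipped to the other replichore
    return position                           # outside the inverted segment

length = 1000
genes = [50, 200, 400, 700, 950]              # hypothetical gene positions
after = [symmetric_inversion(p, length, 250) for p in genes]
print(list(zip(genes, after)))
```

Every gene keeps its distance from the origin; inverted genes only change side. Plotting the new position against the old one therefore puts each gene on one of two lines, y = x or y = length - x, which together form an X.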