TIGRTIGR
Topics of Discussion
• Introduction to phylogenomics
• Phylogenomics Examples
– Functional prediction
– Identifying “unusual” genes in genomes
– Gene duplication
– Genetic exchange within genomes
– Gene loss
– Horizontal gene transfer
– Specialization
– Comparing close relatives
– Species evolution
“Nothing in biology makes sense
except in the light of evolution.”
Theodosius Dobzhansky (1973)
Uses of Evolutionary Analysis in
Molecular Biology
• Identification of mutation patterns (e.g., ts/tv ratio)
• Amino-acid/nucleotide substitution patterns useful in
structural studies (e.g., rRNA)
• Sequence searching matrices (e.g., PAM, Blosum)
• Motif analysis (e.g., Blocks)
• Functional predictions
• Classifying multigene families
• Evolutionary history puts other information into
perspective (e.g., duplications, gene loss)
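As a small illustration of the first bullet, the sketch below counts observed transitions and transversions between two aligned DNA sequences and reports their ratio. The function names are hypothetical, and the count is the raw observed one: it assumes the sequences are already aligned and applies no correction for multiple substitutions at a site.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def ts_tv_counts(seq1, seq2):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


def ts_tv_ratio(seq1, seq2):
    """Return the observed ts/tv ratio (uncorrected)."""
    ts, tv = ts_tv_counts(seq1, seq2)
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv
```

For example, `ts_tv_counts("AACCGGTT", "GACTGTTA")` finds two transitions (A/G, C/T) and two transversions (G/T, T/A).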
Phylogenomic Analysis
Phylogenomics involves combining evolutionary
reconstructions of genes, proteins, pathways, and
species with analysis of complete genome
sequences.
Why use Phylogenomics?
Evolutionary information improves genome analysis
-Classification of multigene families
-Predicting functions
-Origins of genes and pathways
Genomics information improves evolutionary
reconstructions
-More sequences of genes
-Unbiased sampling
-Presence/absence needed to infer certain events
Feedback loop between two types of analysis
Uses of Phylogenomics I:
Functional Predictions
Predicting Function
• Identification of motifs
• Homology/similarity based methods
– Highest hit
– Top hits
– Clusters of orthologous groups
– HMM models
– Structural threading and modeling
– Evolutionary reconstructions
Types of Molecular Homology
• Homologs: genes that are descended from a common
ancestor (e.g., all globins)
• Orthologs: homologs that have diverged after speciation
events (e.g., human and chimp β-globins)
• Paralogs: homologs that have diverged after gene
duplication events (e.g., α and β globin).
• Xenologs: homologs that have diverged after lateral
transfer events
• Positional homology: common ancestry of specific amino
acid or nucleotide positions in different genes
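The ortholog/paralog distinction can be sketched in code: two homologs are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication. The tree encoding below (nested tuples of the form `(event, left, right)`) and the function names are assumptions made for this illustration; xenologs are not modeled.

```python
def leaves(node):
    """Return the set of gene names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(tree, gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor."""
    event, left, right = tree
    for child in (left, right):
        if not isinstance(child, str):
            under_child = leaves(child)
            if gene_a in under_child and gene_b in under_child:
                # Both genes sit in the same subtree, so the last common
                # ancestor is deeper in the tree.
                return relationship(child, gene_a, gene_b)
    return "orthologs" if event == "speciation" else "paralogs"


# Globin example from the definitions above: an old duplication produced
# alpha and beta globin, and each copy later diverged by speciation.
GLOBIN_TREE = (
    "duplication",
    ("speciation", "human_alpha", "chimp_alpha"),
    ("speciation", "human_beta", "chimp_beta"),
)
```

With this tree, `relationship(GLOBIN_TREE, "human_beta", "chimp_beta")` gives `"orthologs"`, while `relationship(GLOBIN_TREE, "human_alpha", "human_beta")` gives `"paralogs"`.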
Blast Search of H. pylori “MutS”
Sequences producing significant alignments:           Score (bits)  E value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN      117           3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN       69           1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN       64           3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN       62           2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN       57           4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN       57           4e-07
• Blast search pulls up Synechocystis sp. MutS#2 with a
much higher score (and a much lower E value) than the
other MutS homologs
H. pylori and MutS
• Prior to this genome, all species that
encoded a MutS homolog also encoded a
MutL homolog
• Experimental studies have shown MutS and
MutL always work together in mismatch
repair
• Problem: what do we conclude about H.
pylori mismatch repair?
Table 3. Presence of MutS Homologs in Complete Genome Sequences
Species                                    # of MutS Homologs
Bacteria
  Escherichia coli K12                     1
  Haemophilus influenzae Rd KW20           1
  Neisseria gonorrhoeae                    1
  Helicobacter pylori 26695                1
  Mycoplasma genitalium G-37               0
  Mycoplasma pneumoniae M129               0
  Bacillus subtilis 168                    2
  Streptococcus pyogenes                   2
  Synechocystis sp. PCC6803                2
  Treponema pallidum Nichols               1
  Borrelia burgdorferi B31                 2
  Aquifex aeolicus                         2
  Deinococcus radiodurans R1               2
Archaea
  Archaeoglobus fulgidus VC-16, DSM4304    0
  Methanococcus jannaschii DSM 2661        0
  Methanobacterium thermoautotrophicum ∆H  1
Eukaryotes
  Saccharomyces cerevisiae                 6
  Homo sapiens                             5
MutS Alignment
[Figure: multiple sequence alignment of four conserved regions (motifs I-IV) of MutS family proteins. Sequences, in order: MSH6 Yeast, MSH6 Mouse, MSH3 Human, MSH3 Yeast, MutS Aquae, MutS Bacsu, MutS Synsp, MSH1 Pombe, MSH1 Yeast, MSH2 Human, MSH2 Yeast, MutS2 Synsp, MutS2 Aquae, MutS2 Bacsu, MSH4 Human, MSH4 Yeast, MSH5 Human, MSH5 Yeast.]
Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of MutS family proteins. Leaves are labeled by species code: Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco. Several species appear more than once because they encode multiple MutS homologs.]
MutS Subfamilies
[Figure: the MutS family tree with its subfamilies outlined and labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2.]
MutS Subfamilies
• MutS1 Bacterial MMR
• MSH1 Euk - mitochondrial MMR
• MSH2 Euk - all MMR in nucleus
• MSH3 Euk - loop MMR in nucleus
• MSH6 Euk - base:base MMR in nucleus
• MutS2 Bacterial - function unknown
• MSH4 Euk - meiotic crossing-over
• MSH5 Euk - meiotic crossing-over
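Once each homolog has been placed in a subfamily, the list above can be applied mechanically to a genome. The sketch below is a minimal illustration: the dictionary restates the subfamily list, and the function names are hypothetical. Any prediction is only as good as the subfamily assignment, which comes from the tree rather than from this code.

```python
# Roles as listed for each MutS subfamily above.
SUBFAMILY_ROLE = {
    "MutS1": "mismatch repair (bacteria)",
    "MSH1": "mismatch repair (mitochondria)",
    "MSH2": "mismatch repair (nucleus, all)",
    "MSH3": "mismatch repair (nucleus, loops)",
    "MSH6": "mismatch repair (nucleus, base:base)",
    "MutS2": "function unknown",
    "MSH4": "meiotic crossing-over",
    "MSH5": "meiotic crossing-over",
}


def predicted_roles(subfamilies):
    """Return the distinct roles predicted for a genome's MutS homologs."""
    return sorted({SUBFAMILY_ROLE[name] for name in subfamilies})


def has_mmr_subfamily(subfamilies):
    """True if at least one homolog belongs to a subfamily with a known MMR role."""
    return any(
        SUBFAMILY_ROLE[name].startswith("mismatch repair") for name in subfamilies
    )
```

A genome whose only MutS homolog falls in MutS2 gives `has_mmr_subfamily(["MutS2"]) == False`: the presence of a "MutS" homolog alone does not predict mismatch repair.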
Table 3. Presence of MutS Homologs in Complete Genome Sequences
Species                                    # of MutS Homologs  Which Subfamilies?
Bacteria
  Escherichia coli K12                     1                   MutS1
  Haemophilus influenzae Rd KW20           1                   MutS1
  Neisseria gonorrhoeae                    1                   MutS1
  Helicobacter pylori 26695                1                   MutS2
  Mycoplasma genitalium G-37               0                   -
  Mycoplasma pneumoniae M129               0                   -
  Bacillus subtilis 168                    2                   MutS1, MutS2
  Streptococcus pyogenes                   2                   MutS1, MutS2
  Synechocystis sp. PCC6803                2                   MutS1, MutS2
  Treponema pallidum Nichols               1                   MutS1
  Borrelia burgdorferi B31                 2                   MutS1, MutS2
  Aquifex aeolicus                         2                   MutS1, MutS2
  Deinococcus radiodurans R1               2                   MutS1, MutS2
Archaea
  Archaeoglobus fulgidus VC-16, DSM4304    0                   -
  Methanococcus jannaschii DSM 2661        0                   -
  Methanobacterium thermoautotrophicum ∆H  1                   MutS2
Eukaryotes
  Saccharomyces cerevisiae                 6                   MSH1-6
  Homo sapiens                             5                   MSH2-6
Overlaying Functions onto Tree
[Figure: the MutS family tree with the subfamilies MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, and MSH2 labeled, ready for known functions to be overlaid.]
Functional Prediction Using Tree
[Figure: the MutS family tree with known functions overlaid on each subfamily:]
• MutS1: repair of loops and mismatches
• MSH1: repair in mitochondria
• MSH2: repair of loops and mismatches in nucleus
• MSH3: repair of loops in nucleus
• MSH6: repair of mismatches in nucleus
• MSH4: meiotic crossing-over
• MSH5: meiotic crossing-over
• MutS2: unknown functions
Table 3. Presence of MutS Homologs in Complete Genome Sequences
Species                                    # of MutS Homologs  Which Subfamilies?  MutL Homologs
Bacteria
  Escherichia coli K12                     1                   MutS1               1
  Haemophilus influenzae Rd KW20           1                   MutS1               1
  Neisseria gonorrhoeae                    1                   MutS1               1
  Helicobacter pylori 26695                1                   MutS2               -
  Mycoplasma genitalium G-37               -                   -                   -
  Mycoplasma pneumoniae M129               -                   -                   -
  Bacillus subtilis 168                    2                   MutS1, MutS2        1
  Streptococcus pyogenes                   2                   MutS1, MutS2        1
  Mycobacterium tuberculosis               -                   -                   -
  Synechocystis sp. PCC6803                2                   MutS1, MutS2        1
  Treponema pallidum Nichols               1                   MutS1               1
  Borrelia burgdorferi B31                 2                   MutS1, MutS2        1
  Aquifex aeolicus                         2                   MutS1, MutS2        1
  Deinococcus radiodurans R1               2                   MutS1, MutS2        1
Archaea
  Archaeoglobus fulgidus VC-16, DSM4304    -                   -                   -
  Methanococcus jannaschii DSM 2661        -                   -                   -
  Methanobacterium thermoautotrophicum ∆H  1                   MutS2               -
Eukaryotes
  Saccharomyces cerevisiae                 6                   MSH1-6              3+
  Homo sapiens                             5                   MSH2-6              3+
Why was the MutS2 Family Missed?
Blast Search of Syn. sp. MutS#2
Sequences producing significant alignments:                Score (bits)  E value
sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT       91            3e-17
sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN         87            4e-16
sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS      83            1e-14
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS      81            3e-14
sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS      81            4e-14
sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA      80            5e-14
• Blast search pulls up standard MutS genes,
but with only a moderate E value (about 10^-17)
Problems with Similarity Based
Functional Prediction
• Prone to database error propagation.
• Cannot identify orthologous groups reliably.
• Perform poorly in cases of evolutionary rate
variation and non-hierarchical trees (similarity will
not reflect evolutionary relationships in these cases)
• May be misled by modular proteins or large
insertion/deletion events.
• Are not set up to deal with expanding data sets.
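The rate-variation problem can be shown with a toy case. The sketch below assumes a hypothetical tree ((query, sister), cousin) in which the sister gene evolves quickly; all branch lengths are invented for illustration. Because similarity tracks total path length rather than branching order, the "highest hit" for the query is the more distant cousin.

```python
# Pairwise distances are sums of branch lengths along the tree
# ((query:1, sister:5):1, cousin:1). All values are hypothetical.
DISTANCES = {
    frozenset({"query", "sister"}): 1 + 5,      # true closest relative, fast-evolving
    frozenset({"query", "cousin"}): 1 + 1 + 1,  # more distant, slow-evolving
    frozenset({"sister", "cousin"}): 5 + 1 + 1,
}


def top_hit(query, distances):
    """Return the sequence with the smallest distance to the query."""
    candidates = {pair: d for pair, d in distances.items() if query in pair}
    best_pair = min(candidates, key=candidates.get)
    (hit,) = set(best_pair) - {query}
    return hit
```

Here `top_hit("query", DISTANCES)` returns `"cousin"`, although `"sister"` is the closest relative in the tree. A function transferred from the top hit would come from the wrong gene.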
Non-hierarchical Tree
[Figure: example of a non-hierarchical tree relating sequences 1-6.]
Evolutionary Rate Variation
[Figure: example tree of sequences 1-6 with unequal evolutionary rates among branches.]
Rate Variation and Duplication
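[Figure: gene tree for Species 1, 2, and 3 in which a duplication produced two paralogous groups (1A, 2A, 3A and 1B, 2B, 3B), with rate variation among branches.]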
Alkylation Repair Genes
[Figure: domain composition of alkylation repair proteins.]
Domains shown:
• AlkA domain (O6-Me-G glycosylase)
• Ogt domain (O6-Me-G alkyltransferase)
• Ada domain (transcriptional regulator)
Proteins shown: Ada E. coli, Ada H. infl, Ogt E. coli, Ogt H. infl, Ogt Gram+, Ogt D. radio, AlkA Gram+, AlkA E. coli, MGMT Euks
Evolutionary Method: Phylogenetic Prediction of Gene Function
[Figure: flowchart of the method, illustrated with two examples. Example A is a gene family of sequences 1-6; Example B is a family from Species 1, 2, and 3 in which a duplication produced genes 1A, 2A, 3A and 1B, 2B, 3B. The actual evolutionary history is assumed to be unknown.]
Steps of the method:
• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest (the inference may be ambiguous)
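The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as follows. This is a deliberately simplified stand-in rather than the exact procedure behind the slide: it finds the smallest clade that contains the gene of interest plus at least one characterized gene, and reports the functions found there. The tree encoding (nested tuples), the function names, and the function labels are assumptions for illustration.

```python
def leaves(node):
    """Return the leaf names under a node (a leaf is a string, a clade a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def predict_function(tree, gene, known):
    """Return the set of functions seen in the smallest clade that contains
    `gene` and at least one other, characterized gene.

    One function means a clear prediction; several mean the prediction is
    ambiguous; an empty set means no characterized homolog is in the tree.
    """
    if gene not in leaves(tree):
        raise ValueError(f"{gene!r} is not in the tree")
    # Record every clade on the path from the root down to the gene.
    path = []
    node = tree
    while not isinstance(node, str):
        path.append(node)
        node = next(child for child in node if gene in leaves(child))
    # Walk back up from the smallest clade to the largest.
    for clade in reversed(path):
        functions = {
            known[leaf] for leaf in leaves(clade) if leaf in known and leaf != gene
        }
        if functions:
            return functions
    return set()


# Example B from the figure: a duplication produced the A and B groups.
GENE_TREE = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
KNOWN = {"1A": "function A", "3A": "function A",
         "1B": "function B", "3B": "function B"}
```

With these inputs, `predict_function(GENE_TREE, "2A", KNOWN)` returns `{"function A"}`, even if 2A happened to be more similar in sequence to a gene from the B group. When the nearest characterized relatives disagree, the returned set has more than one entry and the prediction should be treated as ambiguous.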
[Figure, four panels (A-D): phylogenetic prediction of function applied to the MutS family. The panels show the tree with full gene names (for example MutS.Aquae, MSH2.Human, Swi4.Spombe, GTBP.Human, MutS.Helpy, MutS2.Metth, MutS2.Saugl), the same tree with species labels and the subfamilies MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, and MSH2 outlined, and the known functions overlaid on each subfamily:]
• MutS1: all MMR (bacteria)
• MSH1: MMR in mitochondria
• MSH2: all MMR in nucleus
• MSH3: MMR of large loops in nucleus
• MSH6: MMR of mismatches and small loops in nucleus
• MSH4: segregation and crossover
• MSH5: segregation and crossover
• MutS2: no function shown
Clustering vs. Neighbor-joining
[Figure: two trees of the same MutS family sequences, one built by UPGMA (clustering) and one by neighbor-joining, which group the sequences differently. Sequences: MutS2.Syns, MutS2.Bacs, MutS2.Help, MutS2.Deir, Mutsl.Mett, MSH4.Celeg, MSH4.Yeast, MSH4.human, mMutS.Saco, MSH3.yeast, C23C11.Spo, MSH1.Yeast, MSH3.Human, REP1.Mouse, GTBP.Mouse, GTBP.Human, MSH6.Yeast, MSH5.Human, MSH5.Celeg, MSH5.Yeast, MSH2.Human, MSH2.Mouse, MSH2.Yeast, MutS.Ecoli, MutS.Synsp, MutS.Deira, MutS.Bacsu.]
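Why the two methods can disagree is easiest to see in the rule each uses to pick the first pair to join. The sketch below implements only that pair-selection step (not full tree building) on a hypothetical four-taxon distance matrix derived from the tree ((A:1, B:4):1, (C:1, D:4)), in which B and D evolve quickly. UPGMA joins the pair with the smallest raw distance; neighbor-joining first corrects each distance for how far the two taxa are from everything else.

```python
from itertools import combinations

TAXA = ["A", "B", "C", "D"]

# Additive distances for the hypothetical tree ((A:1, B:4):1, (C:1, D:4)).
DIST = {
    frozenset("AB"): 5, frozenset("CD"): 5,
    frozenset("AC"): 3, frozenset("AD"): 6,
    frozenset("BC"): 6, frozenset("BD"): 9,
}


def upgma_first_join(dist, taxa):
    """UPGMA criterion: join the pair with the smallest distance."""
    return min(combinations(taxa, 2), key=lambda pair: dist[frozenset(pair)])


def nj_first_join(dist, taxa):
    """Neighbor-joining criterion: join the pair minimizing
    Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j), where r(i) is the sum of
    distances from i to all other taxa."""
    n = len(taxa)
    total = {
        t: sum(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa
    }

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - total[i] - total[j]

    # (A, B) and (C, D) tie on Q here; min() keeps the first one it meets.
    return min(combinations(taxa, 2), key=q)
```

On this matrix UPGMA first joins A with C (the two slowly evolving sequences), which contradicts the tree the distances came from, while neighbor-joining first joins A with B, a true sister pair.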
Deinococcus radiodurans
UvrA Gene Family
• UvrA has conserved role in nucleotide excision repair in
bacteria (part of UvrABCD complex)
• UvrA homologs found in all complete bacterial genomes
• Some UvrA homologs have been found to be involved in
resistance to DNA damaging antibiotics
• UvrA accumulates at membrane after DNA damage
• All UvrAs are members of the ABC transporter family
• Possible role in DNA damage export?
UvrAs in D. radiodurans
• UvrA homolog in D. radiodurans shown to be
part of UV endonuclease α complex
• D. radiodurans genome sequence reveals a second
UvrA gene - on the large megaplasmid
• D. radiodurans known to export DNA repair
products (e.g., damaged bases) out of cell after
damage
• Export may be important for radiation resistance
(Battista 1997)
UvrA Evolution
• Originated by gene duplication of an ABC transporter
• Subsequently, there was a tandem duplication of the
ABC transporter motif within UvrA
• Ancient duplication into UvrA1 and UvrA2
subfamilies
• UvrA1s - conserved role in NER
• UvrA2s - transport of DNA damage?
• UvrA2 in D. radiodurans may be from lateral transfer
Evolution of UvrA Family
[Figure, two panels.]
A. ABC Transporters: tree of ABC transporter families (OppDF, UUP, NodI, LivF, XylG, NrtDC, PstB, MDR, HlyB, TAP1, CFTR, SUR) showing where the UvrA1 and UvrA2 branches fall.
B. UvrA Subfamily: a duplication in the UvrA family separates two groups.
• UvrA2: UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans
• UvrA1: UvrA from H. influenzae, E. coli, N. gonorrhoeae, R. prowazekii, S. mutans, S. pyogenes, S. pneumoniae, B. subtilis, M. luteus, M. tuberculosis, M. thermoautotrophicum, H. pylori, C. jejuni, P. gingivalis, C. tepidum, D. radiodurans (UvrA1), T. thermophilus, T. pallidum, B. burgdorferi, T. maritima, A. aeolicus, Synechocystis sp.
UvrA Evolution
[Figure: diversification of the ABC family. Labeled events: gene duplication and tandem duplication. Labeled stages: ABC; ABC1 and ABC2; UvrA with halves UvrAN and UvrAC; UvrA1 (UvrA1N, UvrA1C) and UvrA2 (UvrA2N, UvrA2C).]
Three V. cholerae Photolyases
[Figure: phylogenetic tree of the photolyase gene family showing the three photolyase homologs in V. cholerae (marked with asterisks): ORFA00965, ORF02295.Vibch, and ORF01792.Vibch.]
Group labels on the tree: MTHF type, Class I CPD Photolyases, 6-4 Photolyases, Blue Light Receptors, 8-HDF type, CPD Photolyases
Other sequences in the tree: Phr.S thyp, PHR E. coli, phr.neucr, Phr.Tricho, Phr.Yeast, Phr.B firm, phr.strpy, phr.haloba, PHR STRGR, pCRY1.huma, phr.mouse, phr2.human, phr2.mouse, phr.drosop, phr3.Synsp, phr.neigo, Phr.Adiant, Phr2.Adian, Phr3.Adian, phr.tomato, CRY1 ARATH, phr.phycom, CRY2 ARATH, PHH1.arath, PHR1 SINAL, phr.chlamy, PHR ANANI, phr.Synsp, PHR SYNY3, phr.Theth, Rh.caps
MFS phylogenetic tree
[Figure: phylogenetic tree of major facilitator superfamily (MFS) transporters.]
Family labels on the tree: OFA, OHS, NHS, OPA, PHS, SHS, MHS, SPACS, NNP, DHA14, DHA12, UMF, FGHS, MCP
Sequences in the tree: Bmr Bsu, TetB Eco, Vmt1 Rno, Mmr Sco, EmrB Eco, QacA Sau, Sge1 Sce, TetK Sau, NarK Bsu, NasA Bsu, CrnA Eni, NapO Ocu, Ykh4 Cel, Hup1 Cke, AraE Eco, Itr1 Sce, Gtr1 Hsa, ProP Eco, KgtP Eco, CitA Sty, HI1104 Hin, NanT Eco, YjhB Eco, Ycy8 Sce, YaeC Spo, Pho84 Sce, UhpT Eco, PgtP Sty, UhpC Eco, GlpT Bsu, NupG Eco, XapB Eco, LacY Eco, LacY Kpn, RafB Eco, CscB Eco, YhjX Eco, Y38K Tte, OxlT Ofo, T02G5 Cel, XpcT Hsa, Mct2 Rno, Gal Bab, FucP Eco, Yhe7 Sce, Yhe0 Sce, YK86 Sce
Uses of Phylogenomics II:
Knowing when to Not Predict
Functions
DNA Repair Genes in D. radiodurans Complete Genome
Process                        Genes in D. radiodurans
Nucleotide Excision Repair     UvrABCD, UvrA2
Base Excision Repair           AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease                Xth
Mismatch Excision Repair       MutS, MutL
Recombination
  Initiation                   RecFJNRQ, SbcCD, RecD
  Recombinase                  RecA
  Migration and resolution     RuvABC, RecG
Replication                    PolA, PolC, PolX, phage Pol
Ligation                       DnlJ
dNTP pools, cleanup            MutTs, RRase
Other                          LexA, RadA, HepA, UVDE, MutS2
Recombination Genes in Genomes
Pathway |------------------------------Bacteria---------------------------| |---Archaea---| Euks
Protein Name(s)
Initiation
RecBCD pathway
RecB + + - - - - - - + + - + - - - - - - - -
RecC + + - - - - - - + ±+ - ± - - - - - - - -
RecD + + - - ± - - - + ±+ - ++ - ± ±+ - - - - -
RecF pathway
RecF + + + - + - - + + - + ± - - + - - ± ± ±
RecJ + + + + + - - + - + + + + + + - - - - -
RecO + + - - + - - + + - - - - - ± - - - - -
RecR + + + ±+ + - - + + - + + - + + - - - - -
RecN + + + + + - - + + - + - ± + + - - ± ± -
RecQ + + - - + - - + - - + - - - + - - - - + ++
RecE pathway
RecE/ExoVIII + - - - - - - - - - - - - - - - - - - -
RecT + - - - + - - - - - - - - - - - - - - -
SbcBCD pathway
SbcB/ExoI + + - - - - - - - - - - - - - - - - - -
SbcC + - - - + - - + - + + - + + + ± ± ± ± ± ±
SbcD + - - - + - - + - + + - + + + ± ± ± ± ± ±
AddAB Pathway
AddA/RexA - - + - + - - - - - + + - ± - - - - - -
AddB/RexB - - - - + - - - - - - - - - - - - - - -
Rad52 pathway
Rad52, Rad59 - - - - - - - - - - - - - - - - - - - ++ +
Mre11/Rad32 ± - - - ± - - ± - ± ± - ± ± ± + + + + + +
Rad50 ± - - - ± - - ± - ± ± - ± ± ± + + + ± + +
Recombinase
RecA, Rad51 + + + + + + + + + + + + + + + + + + + ++ ++
Branch migration
RuvA + + + + + + + + + + + + + - + - - - - -
RuvB + + + + + + + + + + + + + - + - - - - -
RecG + + + + + - - + + + + - + + + - - - - -
Resolvases
RuvC + + + + - - - + + - + + + - + - - - - -
RecG + + + + + - - + + + + - + + + - - - - -
Rus + - - - - - - - ±+ - - - - ±+ - - - - - -
CCE1 - - - - - - - - - - - - - - - - - - - +
Other recombination proteins
Rad54 - - - - - - - - - - - - - - - - - - - + +
Rad55 - - - - - - - - - - - - - - - - - - - + +
Rad57 - - - - - - - - - - - - - - - - - - - + +
Xrs2 - - - - - - - - - - - - - - - - - - - +
Unusual Features of D. radiodurans DNA Repair Genes
Process                        Genes
Nucleotide excision repair     Two UvrAs
Base excision repair           Four MutY-Nths
Recombination                  RecD but not RecBC
Replication                    Four Pol genes
dNTP pools                     Many MutTs, two RRases
Other                          UVDE
Problem:
The list of DNA repair gene homologs
in the D. radiodurans genome is not
significantly different from that of other
bacterial genomes of similar size
Repair Studies in Different Species
(determined by Medline searches as of 1998)
Humans 7028
E. coli 3926
S. cerevisiae 988
Drosophila 387
B. subtilis 284
S. pombe 116
Xenopus 56
C. elegans 25
A. thaliana 20
Methanogens 16
Haloferax 5
Giardia 0
Gain and Loss of Repair Genes
[Figure: inferred gains and losses of DNA repair genes mapped onto the branches of a species tree. Species: Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp (Bacteria); Metjn, Arcfu, Metth (Archaea); Human, Yeast (Eukaryotes); one branch is marked "from mitochondria". Branch annotations are gene names with a prefix: "+" for gains (for example +UvrABCD, +RecA, +Rad52,53,54) and "-" for losses (for example -MutLS, -RecFORQ, -PhrI), plus entries prefixed "d" (for example dMutS, dRecA/SMS) and "t" (for example tUvrABCD, tRad25). A "?" marks an uncertain event.]
Evolution of Uracil Glycosylase
• Ung activity has evolved many times (many
non-homologous proteins have uracil-DNA glycosylase
activity)
• Therefore, absence of homologs of these genes
should not be used to infer likely absence of
activity
• However, presence of homologs of Ung and MUG
genes can be used to indicate presence of activity
because all homologs of these genes have this
activity
Evolution of Photoreactivation
• All known enzymes that perform photoreactivation are part of
a single large photolyase gene family
• Some members of the family do not function as photolyases,
but instead work as blue-light receptors
• If a species does not encode a member of the photolyase gene
family, it likely does not have photoreactivation capability
• If a species encodes a photolyase homolog, one cannot conclude
from that alone that it has photolyase activity
• Position of photolyase homologs within photolyase tree helps
predict what activities they have
Evolution of Alkyltransferases
• All known alkyltransferases share a conserved,
homologous alkyltransferase domain
• Therefore, if a species does not encode any
protein with this domain, it likely does not have
alkyltransferase activity
• If a species does encode a member of this gene
family, it likely has alkyltransferase activity
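The three cases above (uracil glycosylase, photoreactivation, alkyltransferases) differ in which direction an inference can be drawn from the presence or absence of homologs. The sketch below records those rules as a lookup table; the dictionary layout and function name are hypothetical, and the wording of each outcome paraphrases the bullets.

```python
# For each activity: (inference when a homolog is present,
#                     inference when no homolog is found).
INFERENCE_RULES = {
    # Evolved many times, so absence of Ung/MUG homologs is uninformative.
    "uracil-DNA glycosylase": ("activity likely present", "no inference"),
    # One family, but some members are blue-light receptors instead.
    "photoreactivation": ("no inference without tree position",
                          "activity likely absent"),
    # One conserved domain shared by all known alkyltransferases.
    "alkyltransferase": ("activity likely present", "activity likely absent"),
}


def infer_activity(activity, homolog_present):
    """Return what can be concluded about an activity from homolog presence."""
    if_present, if_absent = INFERENCE_RULES[activity]
    return if_present if homolog_present else if_absent
```

For example, `infer_activity("photoreactivation", True)` returns `"no inference without tree position"`: a photolyase homolog only suggests photoreactivation once its position in the photolyase tree is known.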
Uses of Phylogenomics III:
Gene Duplication
Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Aids functional predictions
• Recent duplications may be indicative of
species-specific adaptations
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in
different parts of genome
MurA Homologs in A. thaliana
[Figure: phylogenetic tree of 53 MurA homologs in A. thaliana. Each leaf is labeled with the species code, chromosome (I, II, IV, or V), and a clone-based gene identifier, for example ARATH I F5F19.7.]
MurA Homologs in A. thaliana
colored by chromosome
[Figure: the same tree with leaves colored by chromosome. Genes by chromosome:]
• Chromosome I (3): F5F19.7, F22O13.23, F1N21.16
• Chromosome V (16): T21B04.1, T21B04.16, T19K24.12, T19K24.13, T19K24.17, T21B04.11, T21B04.14, T21B04.10, T21B04.13, T21B04.12, T19K24.15, T19K24.16, T19K24.11, T19K24.10, T21B04.15, T19K24.14
• Chromosome IV (12): F7N22.13, T3H13.11, T24H24.6, T3H13.13, T24M8.2, F7N22.10, T3F12.3, T7M24.1, T3F12.8, F9D12.2, F28J12.70, T3F12.12
• Chromosome II (22): F26C24.7, F23M2.29, T13H18.11, F9B22.14, F27C21.3, T9F8.6, T13P21.2, F5O4.4, T13E11.2, T13E11.15, F27L4.10, F26B6.15, F23M2.24, F9B22.8, T13P21.20, T13E11.10, T13P21.21, T13P21.3, T13E11.20, T13E11.21, T13E11.9, T11J7.3
TIGRTIGR
Recent Duplications
• Gene duplication is frequently accompanied
by functional divergence
• Evolutionary analysis can identify recent
duplications with no bias towards type of
gene
• Location of duplicates can help identify
mechanisms of duplication
TIGRTIGR
MutY-Nth
DEIRAORF00829
DEIRAORF02784
DEIRA
AQUAE
METJA
METTH
THEMA
CHLTR
HAEIN
MCYTU
THEMA
METTH
PYRHO
AQUAE
METJA
ARCFU
CELEG
VIBCH
ECOLI
HAEIN
TREPA
RICPR
AQUAE
BACSU
CAMJE
HELPY
MCYTU
SYNSP
CHLPN
CHLTR
BBUR
TIGRTIGR
Expansion of MCP Family in V. cholerae
E.coligi1787690
B.subtilisgi2633766
Synechocystissp. gi1001299
Synechocystissp. gi1001300
Synechocystissp. gi1652276
Synechocystissp.gi1652103
H.pylori gi2313716
H.pylori99 gi4155097
C.jejuniCj1190c
C.jejuniCj1110c
A.fulgidusgi2649560
A.fulgidusgi2649548
B.subtilisgi2634254
B.subtilisgi2632630
B.subtilisgi2635607
B.subtilisgi2635608
B.subtilisgi2635609
B.subtilisgi2635610
B.subtilisgi2635882
E.coligi1788195
E.coligi2367378
E.coligi1788194
E.coligi1789453
C.jejuniCj0144
C.jejuniCj0262c
H.pylori gi2313186
H.pylori99 gi4154603
C.jejuniCj1564
C.jejuniCj1506c
H.pylori gi2313163
H.pylori99 gi4154575
H.pylori gi2313179
H.pylori99 gi4154599
C.jejuniCj0019c
C.jejuniCj0951c
C.jejuniCj0246c
B.subtilisgi2633374
T.maritima TM0014
T.pallidumgi3322777
T.pallidumgi3322939
T.pallidumgi3322938
B.burgdorferi gi2688522
T.pallidumgi3322296
B.burgdorferi gi2688521
T.maritima TM0429
T.maritima TM0918
T.maritima TM0023
T.maritima TM1428
T.maritima TM1143
T.maritima TM1146
P.abyssiPAB1308
P.horikoshiigi3256846
P.abyssiPAB1336
P.horikoshiigi3256896
P.abyssiPAB2066
P.horikoshiigi3258290
P.abyssiPAB1026
P.horikoshiigi3256884
D.radiodurans DR A00354
D.radiodurans DRA0353
D.radiodurans DRA0352
P.abyssiPAB1189
P.horikoshiigi3258414
B.burgdorferi gi2688621
M.tuberculosisgi1666149
V. cholerae VC0512
V. cholerae VCA1034
V. cholerae VCA0974
V. cholerae VCA0068
V. cholerae VC0825
V. cholerae VC0282
V. cholerae VCA0906
V. cholerae VCA0979
V. cholerae VCA1056
V. cholerae VC1643
V. cholerae VC2161
V. cholerae VCA0923
V. cholerae VC0514
V. cholerae VC1868
V. cholerae VCA0773
V. cholerae VC1313
V. cholerae VC1859
V. cholerae VC1413
V. cholerae VCA0268
V. cholerae VCA0658
V. cholerae VC1405
V. cholerae VC1298
V. cholerae VC1248
V. cholerae VCA0864
V. cholerae VCA0176
V. cholerae VCA0220
V. cholerae VC1289
V. cholerae VCA1069
V. cholerae VC2439
V. cholerae VC1967
V. cholerae VCA0031
V. cholerae VC1898
V. cholerae VCA0663
V. cholerae VCA0988
V. cholerae VC0216
V. cholerae VC0449
V. cholerae VCA0008
V. cholerae VC1406
V. cholerae VC1535
V. cholerae VC0840
V. cholerae VC0098
V. cholerae VCA1092
V. cholerae VC1403
V. cholerae VCA1088
V. cholerae VC1394
V. cholerae VC0622
NJ
TIGRTIGR
Phosphate Transporters
ARCFU
SYNSP
THEMA
AQUAE
METJA
MCYTU
MCYTU
VIBCH
ECOLI
DEIRA_ORF00198
DEIRA_ORFA00139
DEIRA_ORF00510
TIGRTIGR
Levels of Paralogy Within A Genome
• All
– All members of a gene family are linked together
• Top matches
– Only top matching pairs are linked together.
Therefore, if in a large gene family, only the pair
from the most recent duplication event is included
• Recent
– Operational definition based on comparison to other
species. Only pairs which are more similar to each
other than to selected other species are included.
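The three levels above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `paralog_levels`, the score dictionaries, and the toy gene names are hypothetical and are not taken from the talk; "score" stands for any similarity score where higher means more similar.

```python
def paralog_levels(within_hits, best_other):
    """Return the (all, top, recent) sets of paralog pairs for one genome.

    within_hits: {(gene_a, gene_b): similarity score} for hits within the genome.
    best_other:  {gene: best similarity score against the selected other species}.
    """
    # Collapse A-vs-B and B-vs-A hits into one unordered pair, keeping the best score.
    pairs = {}
    for (query, subject), score in within_hits.items():
        if query == subject:
            continue
        key = frozenset((query, subject))
        pairs[key] = max(score, pairs.get(key, float("-inf")))

    # All: every pair of family members that hit each other.
    all_pairs = set(pairs)

    # Top: for each gene, keep only its best-matching partner.
    best = {}
    for pair, score in pairs.items():
        for gene in pair:
            if gene not in best or score > best[gene][1]:
                best[gene] = (pair, score)
    top_pairs = {pair for pair, _ in best.values()}

    # Recent: top pairs whose members are more similar to each other
    # than either one is to any of the selected other species.
    recent_pairs = {
        pair for pair in top_pairs
        if all(pairs[pair] > best_other.get(gene, float("-inf")) for gene in pair)
    }
    return all_pairs, top_pairs, recent_pairs


# Toy data (hypothetical scores).
hits = {("a1", "a2"): 900, ("a1", "a3"): 300, ("a2", "a3"): 280}
other = {"a1": 500, "a2": 450, "a3": 400}
all_pairs, top_pairs, recent_pairs = paralog_levels(hits, other)
```

In this toy family only the a1/a2 pair counts as "recent", because a3 is more similar to a gene in another species than to either of its paralogs.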
TIGRTIGR
C. pneumoniae Paralogs - All
[Dot plot: subject ORF position vs. query ORF position; both axes run from 0 to 1,250,000]
TIGRTIGR
C. pneumoniae Paralogs - Top
[Dot plot: subject ORF position vs. query ORF position; both axes run from 0 to 1,250,000]
TIGRTIGR
C. pneumoniae Paralogs - Recent
[Dot plot: subject ORF position vs. query ORF position; both axes run from 0 to 1,250,000]
TIGRTIGR
E. coli Paralogs - All
[Dot plot: subject coordinates vs. query coordinates; both axes run from 0 to 4,500,000]
TIGRTIGR
E. coli Paralogs - Top
[Dot plot: subject coordinates vs. query coordinates; both axes run from 0 to 4,500,000]
TIGRTIGR
E. coli Paralogs - Recent
[Dot plot: subject coordinates vs. query coordinates; both axes run from 0 to 4,500,000]
TIGRTIGR
Recent Duplications in N. meningitidis
[Dot plot: chromosome position of recent duplicate vs. chromosome position of query; both axes run from 0 to 4000]
TIGRTIGR
[Dot plots of best-match chromosome position vs. query ORF chromosome position]
A. C. pneumoniae AR39 (both axes 0 to 1,500,000)
B. C. trachomatis MoPn (both axes 0 to 1,000,000)
TIGRTIGR
Uses of Phylogenomics IV:
Genetic Exchange within Genomes
TIGRTIGR
Circular Maps
TIGRTIGR
D. radiodurans Transposase Family
DEIRA_ORF01427_transposase__ps
DEIRA_ORF01431_transposase_{Sy
DEIRA_ORF03257_transposase_{Sy
DEIRA_ORFB01001_transposase__p
DEIRA_ORFB01020_transposase_{S
DEIRA_ORFB01025_transposase_{S
DEIRA_ORFB01012_transposase_{S
DEIRA_ORFB01035_transposase_{S
DEIRA_ORFC0021_transposase_{Sy
DEIRA_ORFC0025_hypothetical_pr
DEIRA_ORFC0018_transposase__ps
ORFB
ORF0
ORFC
TIGRTIGR
TIGRTIGR
Uses of Phylogenomics V:
Gene Loss
TIGRTIGR
Why Gene Loss is Useful to Identify
• Indicates that gene is not absolutely required for
survival
• Helps distinguish likelihood of gene transfers
• Correlated loss of same gene in different species
may indicate selective advantage of loss of that
gene
• Correlated loss of genes in a pathway indicates a
conserved association among those genes
TIGRTIGR
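Example of Tracing Gene Loss
[Figure: species tree with three rows beneath it: kingdom (Arch, Euks, Bacteria), species abbreviation (MT MJ SC HS AA DR TA BS MG MP BB TP HP HI EC SS MT), and presence or absence of the gene. The evolutionary origin of the gene and the inferred losses are marked on the tree.]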
TIGRTIGR
TIGRTIGR
5
1
2
3
4
E.coli
H.influenzae
N.gonorrhoeae
H.pylori
Syn.sp
B.subtilis
S.pyogenes
M.pneumoniae
M.genitalium
A.aeolicus
D.radiodurans
T.pallidum
B.burgdorferi
A.aeolicus
S.pyogenes
B.subtilis
Syn.sp
D.radiodurans
B.burgdorferi
Syn.sp
B.subtilis
S.pyogenes
A.aeolicus
D.radiodurans
B.burgdorferi
MutS2
MutS1
A. B.
Gene
Duplication
Gene
Duplication
Ancient Duplication in MutS Family
TIGRTIGR
Need for Phylogenomics Example:
Gene Duplication and Loss
• Genome analysis required to determine number of
homologs in different species
• Evolutionary analysis required to divide into
orthology groups and identify gene duplications
• Genome analysis is then required to determine
presence and absence of orthologs
• Then loss of orthologs can be traced onto
evolutionary tree of species
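The last step, tracing loss of orthologs onto the species tree, can be sketched as follows. This is a minimal illustration, assuming a single gene origin and no lateral transfer (a Dollo-style parsimony assumption that the slides do not state); the nested-tuple tree format, the function names, and the species labels are hypothetical.

```python
def leaves(tree):
    """Yield leaf names of a nested-tuple tree such as (("A", "B"), "C")."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree:
            yield from leaves(child)


def trace_losses(tree, present):
    """Place one origin and the implied losses of an ortholog on a species tree.

    tree:    nested tuples of species names.
    present: set of species in which the ortholog was found.
    Returns (origin_clade, lost_clades); each clade is a sorted tuple of species.
    """
    def clade(node):
        return tuple(sorted(leaves(node)))

    def has_gene(node):
        return any(leaf in present for leaf in leaves(node))

    # Origin: walk down from the root while only one child lineage carries the gene.
    node = tree
    while not isinstance(node, str):
        carriers = [child for child in node if has_gene(child)]
        if len(carriers) != 1:
            break
        node = carriers[0]

    # Losses: the largest subtrees below the origin in which no species has the gene.
    losses = []

    def walk(n):
        if not has_gene(n):
            losses.append(clade(n))
        elif not isinstance(n, str):
            for child in n:
                walk(child)

    walk(node)
    return clade(node), losses


# Toy species tree and presence data (hypothetical).
species_tree = ((("Ecoli", "Hinf"), "Ngon"), (("Bsub", "Spyo"), ("Mpne", "Mgen")))
origin, losses = trace_losses(species_tree, {"Ecoli", "Hinf", "Bsub", "Spyo"})
```

Here the gene is inferred to be ancestral to all seven species, with one loss on the Ngon lineage and one loss in the ancestor of Mpne and Mgen.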
TIGRTIGR
Uses of Phylogenomics VII:
Specialization
TIGRTIGR
Circular Maps
TIGRTIGR
Species Distribution of Homologs of
D. radiodurans Genes
[Histograms: frequency vs. number of species with high hits (x-axis 0 to 20); panels labeled PapaBear, MamaBear, BabyBear and E.coli]
TIGRTIGR
Megaplasmid I:
Iron Utilization/Iron Transport
ORFB040 Na+/H+ antiporter
ORFB042 iron ABC transporter, ATP-binding protein
ORFB044 iron ABC transporter, permease protein
ORFB045 iron ABC transporter, permease protein
ORFB046 iron-chelator utilization protein
ORFB047 iron ABC transporter, periplasmic substrate bp
ORFB067 putative metal binding protein
ORFB141 iron-chelator utilization protein
ORFB074 hemin ABC transporter, periplasmic hemin bp
ORFB075 hemin ABC transporter, permease protein
ORFB076 hemin ABC transporter, ATP-binding protein
TIGRTIGR
Specialized Genetic Elements
(Chromosome II and Megaplasmid)
• Many two component systems
• Nitrogen metabolism
• LexA
• Ribonucleotide reductase
• UvrA2
• Many transcription factors (e.g., HepA)
• Iron metabolism
TIGRTIGR
Uses of Phylogenomics VIII:
Comparison of Closely Related
Genomes
TIGRTIGR
V. cholerae vs. E. coli All Hits
[Dot plot: E. coli coordinates (0 to 5,000,000) vs. V. cholerae coordinates (0 to 3,000,000)]
TIGRTIGR
V. cholerae vs. E. coli Top Hits
[Dot plot: E. coli coordinates (0 to 5,000,000) vs. V. cholerae coordinates (0 to 3,000,000)]
TIGRTIGR
V. cholerae vs. E. coli
Only if EC-Orf is Closest in All Genomes
[Dot plot: E. coli coordinates (0 to 5,000,000) vs. V. cholerae coordinates (0 to 3,000,000)]
TIGRTIGR
V. cholerae vs. E. coli Proteins
Top
[Plot: V. cholerae ORF coordinates, axis 0 to 4,000,000]
TIGRTIGR
V. cholerae vs. E. coli F+R
[Plot: axis 0 to 5,000,000; series labeled Bert, Ecoli R and Ecoli]
TIGRTIGR
S. pneumoniae vs. S. pyogenes DNA F+R
[Plot: axis 0 to 2,000,000; series labeled BSP vs Spyo]
TIGRTIGR
M. tuberculosis vs. M. leprae DNA
[Plot: axis 0 to 4,000,000; series labeled M1]
TIGRTIGR
C. trachomatis vs C. pneumoniae Dot Plot
[Dot plot: C. trachomatis MoPn vs. C. pneumoniae AR39, with origin and termination marked]
TIGRTIGR
Duplication and Gene Loss Model
[Diagram: a block of genes A to F is duplicated to give a second copy A' to F'; different genes are then lost from the two copies in the E. coli and V. cholerae lineages]
TIGRTIGR
Figure 4
[Diagram: circular gene-order maps (genes numbered 1 to 32) for the common ancestor of A and B and for descendant genomes A1 to A3 and B1 to B3, showing successive inversions around the origin (*) and around the terminus (*)]
TIGRTIGR
M. tuberculosis strain phylogeny (Indels)
TIGRTIGR
Musser-Type Evolution (Indel Phylogeny)
98a
107a
43a
73a
105a
133a
114a
169a
218a
290a
160a
159a
13a
18a
26a
30a
32a
53a
58a
70a
96a
97a
100a
124a
204a
208a
236a
239a
249a
286a
99a
279a
205a
304a
54a
155a
165a
CDC1551a
223a
110a
122a
245a
313a
36a
40a
71a
79a
168a
254a
283a
312a
4a
12a
41a
42a
52a
77a
187a
214a
81a
129a
274a
220a
64a
48a
55a
60a
72a
80a
83a
85a
89a
91a
95a
111a
170a
171a
182a
212a
219a
225a
244a
278a
301a
195a
2a
123a
207a
306a
69a
94a
101a
102a
112a
113a
121a
132a
211a
222a
235a
250a
284a
285a
N1a
87a
117a
120a
136a
191a
237a
261a
37a
131a
269a
240a
63a
197a
206a
75a
108a
263a
128a
172a
162a
86a
38a
109a
119a
248a
6a
65a
68a
189a
66a
106a
227a
31a
78a
202a
213a
62a
163a
224a
256a
276a
287a
173a
291a
252a
281a
295a
310a
251a
151a
188a
292a
140a
141a
103a
174a
229a
259a
H37Rv
88a
44a
74a
76a
126a
282a
166a
210a
84a
TIGRTIGR
Consistency Indices (Indel Phylogeny)
Calculated over stored trees
[Plot: consistency index (CI, 0.00 to 1.00; maximum, average and minimum) for each character, 1 to 20]
TIGRTIGR
M. tuberculosis strain phylogeny
(Indels/SNPs)
TIGRTIGR
Musser-Type Evolution (Combined
Phylogeny)
TIGRTIGR
Consistency Indices (Combined Phylogeny)
Calculated over stored trees
[Plot: consistency index (CI, 0.00 to 1.00; maximum, average and minimum) for each character; the character labels on the x-axis are not recoverable from the extraction]
TIGRTIGR
Uses of Phylogenomics VI:
Horizontal Gene Transfer and
Species Evolution
TIGRTIGR
Vertical Inheritance
TIGRTIGR
Examples of Horizontal Transfers
• Antibiotic resistance genes on plasmids
• Insertion sequences
• Pathogenicity islands
• Toxin resistance genes on plasmids
• Agrobacterium Ti plasmid
• Viruses and viroids
• Organelle to nucleus transfers
TIGRTIGR
Why Gene Transfers Are Useful to Identify
• Laterally transferred genes frequently involved in
environmental adaptations and/or pathogenicity
• Helps identify transposons, integrons, and other
vectors of gene transfer
• Helps identify species associations in the
environment
TIGRTIGR
Steps in Lateral Gene Transfer
[Diagram: lateral gene transfer shown in steps 1, 2, 3-5 and 6 among lineages A, B, C and D]
TIGRTIGR
How to Infer Gene Transfers
• Unusual distribution patterns
• Unusual nucleotide composition
• High sequence similarity to supposedly
distantly related species
• Unusual gene trees
• Observe transfer events
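As one illustration of the "unusual nucleotide composition" criterion, the sketch below flags genes whose GC content deviates strongly from the genome-wide mean. It is a simplified example, not the method used in the talk: the z-score cutoff of 2.0, the function names, and the toy sequences are assumptions made for illustration, and real analyses usually also look at codon usage.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_outliers(genes, z_cutoff=2.0):
    """Return {gene: z-score} for genes whose GC content is unusual.

    genes: {gene name: DNA sequence}. The cutoff is an arbitrary illustrative value.
    """
    gc = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        return {}  # every gene has the same GC content, so nothing stands out
    return {
        name: (value - mu) / sigma
        for name, value in gc.items()
        if abs(value - mu) / sigma >= z_cutoff
    }


# Toy genome: nine genes at 50% GC and one AT-rich gene (hypothetical data).
genome = {f"g{i}": "ATGC" * 5 for i in range(1, 10)}
genome["alien"] = "ATAT" * 5
flagged = gc_outliers(genome)
```

As the table on the next slide notes, composition alone is weak evidence: selection can also produce unusual GC content, and old transfers or transfers between similar genomes leave no signal.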
TIGRTIGR
Inferring Lateral Transfers
Observation                         | Other Causes                         | Always Occurs
Unusual distribution                | Sampling bias                        | Not if recipient already has gene.
Unusual GC/codons                   | Selection                            | Not if donor/recipient similar. Not if it occurred long ago.
High hit to "distant" species       | Selection, rate variation, gene loss | Usually.
Incongruent trees                   | Bad trees, missed paralogs           | Usually.
Correlation of above with neighbors | Selection                            | Only if genes keep order after transfer.
TIGRTIGR
E. coli and S. typhimurium Transfer
E. coli S. typhimurium
Old Model
E. coli S. typhimurium
New Model
TIGRTIGR
PGK
Neighbor-joining; bootstrap; 50% majority rule consensus
outgroup = Archaea
T. maritima
M. genitalium
M. pneumoniae
A. aeolicus
B. burgdorferi
T. pallidum
B. subtilis
Synechocystis
E. coli
H. influenzae
H. pylori
M. tuberculosis
S. cerevisiae
A. fulgidus
M. jannaschii
M. thermoauto
P. horikoshii
89
57
100
59
58
58 100
83
100
B. subtilis
S. cerevisiae
T. maritima
H. pylori
M. pneumoniae
M. genitalium
Synechocystis
B. burgdorferi
T. pallidum
P. horikoshii
M. jannaschii
M. thermoauto
A. fulgidus
A. aeolicus
H. influenzae
E. coli
M. tuberculosis
TIGRTIGR
Archaeal genes in bacterial genomes*
Bacterial species    | Best hits to Archaeal
Thermotoga maritima  | 451 (24%)
Aquifex aeolicus     | 246 (16%)
Synechocystis sp.    | 126 (4%)
Borrelia burgdorferi | 45 (3.6%)
Escherichia coli     | 99 (2.3%)
* 10^-5 over 60% of sequence
TIGRTIGR
Evidence for lateral gene transfer in
Thermotoga
1. 81 archaeal-like genes are clustered in 15 regions which
range in size from ~ 4 to 20 kb; many share conserved gene
order with their archaeal counterparts.
2. Many of the archaeal-like genes correspond to regions with
a significantly different base composition than the rest of
the chromosome.
3. Some of these regions are associated with a 30 bp repeat
structure found only in thermophiles.
4. Initial phylogenetic analyses of some of these genes lend
support to lateral gene transfer.
TIGRTIGR
Region TM0987 - TM1003 (21 kb Archaea-like stretch)
[Gene map: Thermotoga ORFs 0987 to 1003, each keyed by whether its homolog is archaeal, bacterial or eukaryotic; percentage values (48% to 79%) label individual ORFs; a transposon and a regulatory protein are marked]
TIGRTIGR
Species Distribution of Top Hits:
A. thaliana Chr II
[Bar chart: number of top hits (0 to 1000) for categories E, A, B and Syn. sp]
TIGRTIGR
A. thaliana T1E2.8 is a
Chloroplast Derived HSP60
ARATH-T1E2.8 **********
ECOL
HAEIN
VIBCH
VIBCH
RICPR
YEAST
CHLPN
CHLTR
AQUAE
CAMJE
HELPY
BBUR
TREPA
THEMA
BACSU
DEIRA
MCYTU
MCYTU
SYNSP
SYNSP
ODONT CPST
MYCGE
MYCPN
CHLPN
CHLTR
CHLPN
CHLTR
ARCFU
ARCFU
METJA
PYRHO
METTH
METTH
YEAST
YEAST
YEAST
YEAST
CELEG
YEAST
YEAST
YEAST
CELEG
YEAST
YEAST
CELEG
YEAST
CELEG
CELEG
Eukarya
Archaea
Bacteria
Cyano/Cpst
TIGRTIGR
ParA Phylogeny
pOMB25.Bor
BBl32.Borb
Borbu3
Borbu.2
BBM32.Borb
CP32-6.Bor
BBA20.Borb
Cp18.Borbu
pOMB10.Bor
pLp7E.Borb
BBE19.Borb
BBB12.Borb
BBN32.Borb
BBF13.Borb
BBH28.Borb
BBK21.Borb
BBU05.Borb
BBJ17.Borb
BBQ08.Borb
BBF24.Borb
OrfC.Borbu
BBG08.Borb
Pyrab
Pyrho
YZ24 METJA
IncC1.Enta
IncC2.Enta
INC1 ECOLI
INC2 ECOLI
Orf.pRK2
IncC.pRK2
pM3.ParA
ORF3.Pseae
ORFB.Psepu
2603.Vibch*****
ParA.Strco
Strco2
Strco3
Myctu4
Mycle3
Deira.Chro
Soj.Trepa
SOJ BACSU
Ricpr
YGI1 PSEPU
ParA.Caucr
pAG1.Corgl
Mycle
Mycle2
Rv1708.Myc
Strco
Rv3213.Myc
Helpy99
Helpy26695
A00900.Vib*****
ParB.pR27.
ParA.pMT1.
parA.pMT1
parA.phage
ParA phage
ORFA00900
SOPA ECOLI
F-Plasmid
PhageN13
pCD1.Yerpe
pCD1#2.Yer
pYVe227.Ye
pNL1.Sphar
pQPH1.Coxb
p42d.Rhile
p42d.Rhiet
REPA AGRRA
pRiA4b.Agr
pTiB6S3.Ag
pTi-SAKURA
pRL8JI.Rhi
Y4CK Plasm
ParA.Raleu
pL6.5.Psef
Chr2.Deira
MP1#2.Deir
MP1.Deira
PX02.Bacan
ORF298.Clo
SojC.Halsp
Borbu4
sojD.Halsp
plasmid.St
SojB.Halsp
ParA.Rhoer
SOJ MYCPN
SOJ MYCGE
MinD2.Pyra
Pyrho2
pK214.Lacl
PatA.synsp
Deira.ParA
pCHL1.Chlt2
GP5D CHLTR
pCHL1.Chlt
Chltr
Chlps
Chlps2
Chlpn
Chltr2
Chlpn2
Chromosomal
Plasmid
and
Phage
BBQ08.Borb
Chlamydial
Inc
Borrelia
Plasmids
Archaea
Misc
Evolution of Chromosome Partitioning Proteins (ParA)
TIGRTIGR
Best Matches by Genetic Element
(D. radiodurans)
[Chart: axis 0 to 0.4; categories A, B and C]
TIGRTIGR
N. meningitidis hits vs. genome size
Proteome Comparison of N. meningitidis to other Complete Genomes
[Scatter plot: number of N. meningitidis ORFs that have a significant hit (0 to 1250) vs. total ORFs in genome (0 to 5000); labeled points include Archaea, H. influenzae, V. cholerae and E. coli]
TIGRTIGR
Horizontal Gene Transfer II
TIGRTIGR
Reconciling a Tree of Life in the
Context of Lateral Gene Transfer
TIGRTIGR
rRNA Tree of Complete Genomes
Mycobacterium tuberculosis
Bacillus subtilis
Synechocystis sp.
Caenorhabditis elegans
Drosophila melanogaster
Saccharomyces cerevisiae
Methanobacterium thermoautotrophicum
Archaeoglobus fulgidus
Pyrococcus horikoshii
Methanococcus jannaschii
Aeropyrum pernix
Aquifex aeolicus
Thermotoga maritima
Deinococcus radiodurans
Treponema pallidum
Borrelia burgdorferi
Helicobacter pylori
Campylobacter jejuni
Neisseria meningitidis
Escherichia coli
Vibrio cholerae
Haemophilus influenzae
Rickettsia prowazekii
Mycoplasma pneumoniae
Mycoplasma genitalium
Chlamydia trachomatis
Chlamydia pneumoniae
0.05 changes
Archaea
Bacteria
Eukarya
TIGRTIGR
Whole Genome Phylogeny
TIGRTIGR
rRNA vs. Whole Genome Trees
Mycobacterium tuberculosis
Bacillus subtilis
Synechocystis sp.
Caenorhabditis elegans
Drosophila melanogaster
Saccharomyces cerevisiae
Methanobacterium thermoautotrophicum
Archaeoglobus fulgidus
Pyrococcus horikoshii
Methanococcus jannaschii
Aeropyrum pernix
Aquifex aeolicus
Thermotoga maritima
Deinococcus radiodurans
Treponema pallidum
Borrelia burgdorferi
Helicobacter pylori
Campylobacter jejuni
Neisseria meningitidis
Escherichia coli
Vibrio cholerae
Haemophilus influenzae
Rickettsia prowazekii
Mycoplasma pneumoniae
Mycoplasma genitalium
Chlamydia trachomatis
Chlamydia pneumoniae
0.05 changes
Archaea
Bacteria
Eukarya
TIGRTIGR
Top Hits of D. radiodurans Genes
[Chart categories: Gram Positives, Archaea, Eukaryotes, Syn. sp, Aquifex, Thermotoga, Proteobacteria]
TIGRTIGR
rRNA Suggested Deinococcus-Thermus Relationship
From Embley et al., Syst. Appl. Microbiol. 16: 25-29 (1993)
TIGRTIGR
Serratia marcescens
Proteus mirabilis
Proteus vulgaris
Escherichia coli
Erwinia carotovora
Yersinia pestis
Enterobacter agglomerans
Vibrio anguillarum
Vibrio cholerae
Haemophilus influenzae
Pseudomonas fluorescens
Pseudomonas putida
Pseudomonas aeruginosa
Azotobacter vinelandii
Acinetobacter calcoaceticus
Methylophilus methylotrophus
Methylomonas clara
Methylobacillus flagellatum
Burkholderia cepacia
Bordetella pertussis
Xanthomonas oryzae
Legionella pneumophila
Acidiphilium facilis
Thiobacillus ferrooxidans
Neisseria gonorrhoeae
Rhizobium viciae
Myxococcus xanthus 1
Myxococcus xanthus 2
Campylobacter jejuni
Streptomyces violaceus
Streptomyces lividans
Streptomyces ambofaciens
Mycobacterium leprae
Mycobacterium tuberculosis
Corynebacterium glutamicum
Arabidopsis thaliana
CPST
Synechococcus sp. PCC7002
Synechococcus sp. PCC7942
Anabaena variabilis
Thermotoga maritima
Lactococcus lactis
Streptococcus pneumoniae
Staphylococcus aureus
Bacillus subtilis
Acholeplasma laidlawii
Borrelia burgdorferi
Mycoplasma pulmonis
Mycoplasma mycoides
Bacteroides fragilis
Chlamydia trachomatis
Thermus thermophilus
Thermus aquaticus
Deinococcus radiodurans
Aquifex pyrophilus
0.10
α
γ1
γ2
β
Gram '+' High GC Cyanobacteria
Gram '+' Low GC
D/T
Magnetospirillum magnetotacticum
Helicobacter pylori
ε
δ
95
98
79
100
100
100
90
63
100
94
84
100
95
10088
93 91
75
100
100
100
100
100
8398
100
100
100
Rhizobium phaseoli
Agrobacterium tumefaciens
Rhizobium meliloti
Brucella abortus
Rhodobacter sphaeroides
Rhodobacter capsulatus
Rickettsia prowazekii
Acetobacter polyoxogenes
72 97
78
100
71
100
100 77
88
100
61
55
54
48
49
42
48
46
50
63
46
100
40
TIGRTIGR
Deinococcus-Thermus Comparison
• Took all available T. thermophilus proteins
• Searched against database of all available
complete genomes (including D. radiodurans)
• Identified the top-scoring FASTA hit (most significant p value) for each protein
• Phylogenetic analysis of all genes with >4
homologs
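The top-hit tally described above can be sketched as follows. This is an assumed, simplified version: the function name, the tuple layout of the hits, and the toy scores are hypothetical, and "score" is taken to mean any value where higher is a better match (a p value would need the comparison reversed).

```python
from collections import Counter


def top_hit_species(hits):
    """Count, per species, how many query proteins have their best hit there.

    hits: iterable of (query protein, subject species, score); higher score is better.
    """
    best = {}
    for query, species, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (species, score)
    return Counter(species for species, _ in best.values())


# Toy search results for three T. thermophilus proteins (hypothetical scores).
search_results = [
    ("tt_recA", "D. radiodurans", 820.0),
    ("tt_recA", "E. coli", 640.0),
    ("tt_polA", "D. radiodurans", 510.0),
    ("tt_polA", "B. subtilis", 495.0),
    ("tt_xyz", "A. fulgidus", 130.0),
    ("tt_xyz", "D. radiodurans", 95.0),
]
tally = top_hit_species(search_results)
```

A tally like this is only a first screen; the slide's final step, phylogenetic analysis of the gene families, is what tests whether the top hits reflect true relatedness.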
TIGRTIGR
Top Hits of T. thermophilus Proteins
[Chart categories: Other Bacteria, Archaea, D. radiodurans]
TIGRTIGR
Significance of Deinococcus-
Thermus Relationship
• Mechanisms of extreme heat, radiation, and
desiccation resistance may be similar
• Complete genome of Thermus will be very
useful in identifying novel genes in
Deinococcus
• Shows utility of incomplete genome
sequences.
TIGRTIGR
Outline of Phylogenomics
Gene Evolution Events
Phenotype Predictions
Database
Species tree Presence/Absence Gene trees
Congruence Evol. Distribution
F(x) Predictions
Pathway Evolution
TIGRTIGR
TIGRTIGR
Steps in Phylogenomic Analysis
• Create database of genes of interest
• Presence/absence of homologs in complete genomes
• Phylogenetic trees of each gene family
• Infer evolutionary events (gene origin, duplication, loss and
transfer)
• Refine presence/absence (orthologs, paralogs, subfamilies)
• Functional predictions and functional evolution
• Analysis of pathways
TIGRTIGR
Phylogenomics I:
Presence/Absence of Homologs
• Important to have complete genomes
• Similarity searches with high “homology
threshold” (to prevent false positives)
• Iterative searches (to prevent false negatives)
• Multiple sequence alignments to confirm
assignment of homology and to divide up
multi-domain proteins
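A presence/absence table of the kind this step produces can be sketched as below. The sketch assumes that similarity search results are already available as (gene family, genome, E-value) records; the function name, the record layout, and the 1e-5 default threshold are illustrative assumptions, and the iterative searching and alignment checks listed above are not shown.

```python
def presence_absence(hits, genomes, max_evalue=1e-5):
    """Build {gene family: {genome: True/False}} from similarity search hits.

    hits:       iterable of (gene family, genome, E-value).
    genomes:    the complete genomes that were searched.
    max_evalue: strict "homology threshold"; weaker hits are ignored.
    """
    table = {}
    for family, genome, evalue in hits:
        row = table.setdefault(family, {g: False for g in genomes})
        if genome in row and evalue <= max_evalue:
            row[genome] = True
    return table


# Toy data (hypothetical families and E-values).
genomes = ["E. coli", "B. subtilis", "M. genitalium"]
hits = [
    ("recA", "E. coli", 1e-80),
    ("recA", "B. subtilis", 1e-75),
    ("recA", "M. genitalium", 1e-60),
    ("mutS", "E. coli", 1e-90),
    ("mutS", "B. subtilis", 1e-70),
    ("mutS", "M. genitalium", 0.5),  # too weak to count as a homolog
]
table = presence_absence(hits, genomes)
```

Absence is only meaningful here because the genomes are complete, which is the point of the first bullet.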
TIGRTIGR
Phylogenomics II:
Phylogenetic Analysis of Homologs
• Multiple sequence alignment
• Mask alignment (exclude certain regions)
– ambiguous regions of alignment
– hypervariable regions and regions with large gaps
• Phylogenetic tree with method of choice
• Robustness checks
– bootstrapping
– compare trees with different alignments
– compare trees with different tree-building methods
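The masking step can be illustrated with a small sketch that drops gap-rich alignment columns before tree building. The 50% gap cutoff, the function name, and the toy alignment are assumptions for illustration; masking of ambiguous or hypervariable regions is often done by eye or with dedicated tools and is not shown.

```python
def mask_alignment(alignment, max_gap_fraction=0.5):
    """Remove columns in which more than max_gap_fraction of sequences have a gap.

    alignment: {sequence name: aligned sequence}; all sequences must be the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    keep = [
        i for i in range(length)
        if sum(alignment[name][i] == "-" for name in names) / len(names) <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment: the third column is mostly gaps and is removed.
aln = {"s1": "MK-LV", "s2": "MR-LV", "s3": "MKALI", "s4": "M--LV"}
masked = mask_alignment(aln)
```

Comparing trees built from masked and unmasked alignments is one of the robustness checks listed above.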
TIGRTIGR
Phylogenomics III:
Inferring Evolutionary Events
• Infer evolutionary distribution patterns (overlay
presence/absence onto species tree)
• Compare gene tree vs. species tree
• Compare gene tree vs. evolutionary distribution
• Infer gene duplication and transfer events
• Combine gene transfer and duplication
information with evolutionary distribution
analysis to infer gene loss, gene origin, and
timing of gene duplications and transfers
TIGRTIGR
Phylogenomics IV:
Functional Predictions and Evolution
• Overlay experimentally determined functions
onto gene tree
• Infer changes in function
– many changes suggest that caution should be used in
making new predictions
• Predict functions based on position in tree
relative to genes with known functions and
based on orthology groups
TIGRTIGR
Phylogenomics V:
Pathway Analysis
• Correlated presence/absence of all genes in pathway in
different species?
– If not, maybe non-orthologous gene displacement
– Alternatively, pathway may be different between species
• Correlated evolutionary events for genes in pathway
– loss of all genes at once
– correlated duplications?
• Compare evolution of function between pathways
– The number of times an activity has evolved helps in making
predictions of function/phenotype
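Correlated presence/absence across species can be checked with a simple comparison of phylogenetic profiles. The sketch below is one minimal way to do it, assuming each gene already has a 0/1 presence vector over the same ordered list of species; the function name and the toy profiles are hypothetical.

```python
from itertools import combinations


def profile_agreement(profiles):
    """Return {(gene_a, gene_b): fraction of species where presence/absence agrees}.

    profiles: {gene: tuple of 0/1 flags, one per species, in the same species order}.
    """
    agreement = {}
    for a, b in combinations(sorted(profiles), 2):
        pa, pb = profiles[a], profiles[b]
        agreement[(a, b)] = sum(x == y for x, y in zip(pa, pb)) / len(pa)
    return agreement


# Toy profiles over five species (hypothetical data).
profiles = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),  # same distribution as geneA: candidate pathway partner
    "geneC": (1, 0, 1, 0, 1),
}
scores = profile_agreement(profiles)
```

Genes in one pathway that are gained and lost together should score near 1.0; a pathway member that scores low is a candidate for non-orthologous gene displacement or for a pathway that differs between species.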
TIGRTIGR
Evolution as a Screening
Method
• Gene duplications
• Gene loss
• Lateral gene transfers
• Organellar genes
• Structurally constrained genes
• Correlated evolutionary changes
TIGRTIGR
Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution
– (ds/dn)
– Structurally constrained genes
– Correlated evolutionary changes
• Lateral gene transfers
– Organellar genes
– Pathogenicity islands
• Subdividing gene families
– Orthologs vs paralogs
– Functional predictions
– Subfamilies
– Motif identification
• Gene duplications
• Gene loss
TIGRTIGR
Genome Sequences Allow
“Hypothesisless Research”
• DNA microarrays
• Proteomics
• GC skew and other nucleotide composition
analyses
• Parallel genome wide genetic experiments
• Evolutionary genome scanning
• Phylogenetic profiles
TIGRTIGR
Evolutionary Diversity Still Poorly
Represented in Complete Genomes
Tmf-penden
R-rubrum3
Azs-brasi2
Rm-vanniel
Rhb-legum8
Bdr-japoni
Spg-capsul
Ric-prowaz
Ste-maltop
Spr-voluta
Rub-gelat2
Rcy-purpur
Nis-gonor1
Hrh-halch2
Alm-vinosm
Ps-aerugi3
E-coli
Myx-xanthu
Bde-stolpi
Dsv-desulf
Dsb-postga
C-leptum
C-butyric4
C-pasteuri
Eub-barker
C-quercico
Hel-chlor2
Acp-laidla
M-capricol
C-ramosum
B-stearoth
Eco-faecal
Lis-monoc3
B-cereus4
B-subtilis
Stc-therm3
L-delbruck
L-casei
Fus-nuclea
Glb-violac
Olst-lut_C
ZeamaysC
Nost-muscr
Syn-6301
Tnm-lapsum
Flx-litora
Cy-lytica
Emb-brevi2
Bac-fragil
Prv-rumcol
Prb-difflu
Cy-hutchin
Flx-canada
Sap-grandi
Chl-limico
Wln-succi2
Hlb-pylor6
Cam-jejun5
Stm-ambofa
Arb-globif
Cor-xerosi
Bif-bifidu
Cfx-aurant
Tmc-roseum
Aqu-pyroph
env-SBAR12
env-SBAR16
Msr-barker
Tpl-acidop
Msp-hungat
Hf-volcani
Mb-formici
Mt-fervid1
Tc-celer
Arg-fulgid
Mpy-kandl1
Mc-vanniel
Mc-jannasc
env-pJP27
Sul-acalda
Thp-tenax
env-pJP89
Tt-maritim
Fer-island
Mei-ruber4
D-radiodur
Chd-psitta
Acbt-capsl
env-MC18
Pir-staley
Lpn-illini
Lps-interK
Spi-stenos
Trp-pallid
Bor-burgdo
Spi-haloph
Brs-hyodys
Fib-sucS85
A. rRNA tree of Bacterial and Archaeal Major Groups
B. Groups with Completed Genomes Highlighted
[Both panels show the same rRNA tree, divided into Bacteria and Archaea, with the taxon labels listed above]
TIGRTIGR
Acknowledgements
• Genome duplications: S. Salzberg, J. Heidelberg, O.
White, A. Stoltzfus, J. Peterson
• Genome sequences and analysis: J. Heidelberg, T.
Read, H. Tettelin, K. Nelson, J. Peterson, R.
Fleischmann
• Horizontal transfers: K. Nelson, W. F. Doolittle
• TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul,
Seqcore
• $$$: DOE, NSF, NIH, ONR
TIGRTIGR
TIGRTIGR
TIGR
Other people
Mom and Dad
S. Karlin
M. Feldman
A. M. Campbell
R. Fernald
R. Shafer
D. Ackerly
D. Goldstein
M. Eisen
J. Courcelle
R. Myers
C. M. Cavanaugh
P. Hanawalt
NSF
J. Heidelberg
T. Read
S. Kaul
M-I Benito
J. C. Venter
C. Fraser
S. Salzberg
O. White
K. Nelson
$$$
ONR
DOE
NIH
H. Tettelin
TIGRTIGR
Using Evolutionary Analysis To
Help Identify Novel Features in D.
radiodurans
TIGRTIGR
Origins of Extreme Resistance
• Functional divergence
• Evolution of novel genes/processes
• Acquire genes from other species
• Gene duplication and functional divergence
• Enhanced catalytic efficiency and/or
coordination
TIGRTIGR
TIGRTIGR
SNF2 Family of Proteins (1995)
• SNF2 family defined by presence of conserved DNA-
dependent ATPase domain
• 100s of proteins
• Diversity of functions:
– transcriptional activation (SNF2)
– transcriptional repression (MOT1)
– Recombination (RAD54)
– transcription-coupled repair (CSB)
– post-replication repair (RAD5)
– chromosome segregation (lodestar)
– Many with unknown functions
• Some species have 15+ representatives
TIGRTIGR
How to Sort Out Diversity in SNF2 Family
• Presence of additional motifs
– RING fingers
– Bromodomains
– Chromodomains
• Interactions with other proteins
• Evolutionary relationships
– Orthology and paralogy
– Subfamilies
– Relationships among subfamilies
TIGRTIGR
SNF2 Alignment
BRM
hBRM
hBRG1
mBRG1
STH1
SNF2
YB95
F37A4
ISWI
SNF2L
CHD1
SYGP
ETL1
FUN30
MOT1
ERCC6
RAD26
YB53
DNRPPX
hNUCP
mNUCP
RAD5
spRAD8
HIP116
RAD16
LODE
NPH42
HepA
B.cereus ORF
[Schematic alignment: for each protein, its subfamily (CHD1, SNF2, SNF2L, ETL1, MOT1, RAD16, ERCC6, RAD54) and the positions of helicase motifs I, Ia, Ib, II, III, IV, V and VI, drawn to scale (0 to 500 aa); additional motifs are marked C, R and Br]
TIGRTIGR
SNF2 Subfamilies
Subfamily | Conserved Function
SNF2      | Transcription activation (Swi/Snf complex)
SNF2L     | Transcription activation (NURF complex)
CHD1      | Chromatin remodelling
ETL1      | Unknown
MOT1      | Transcription repression
CSB       | Transcription-coupled repair
Rad54     | Recombinational repair
Rad16     | Chromatin access for DNA repair
HepA      | Bacterial RNA polymerase subunit
HepA2     | Unknown
TIGRTIGR
What Evolutionary Analysis
Reveals About the SNF2 Family
• Ancient duplication into two lineages may distinguish
genes by type of activity
• Multiple subfamilies with distinct sequences and
functions.
• Presence of particular orthologs can be predicted in
species for which they have not been cloned.
• Predict functions of uncharacterized members by
orthology.
• Addition of motifs to SNF2 domain occurred early in
eukaryotic evolution.
• Many duplications within eukaryotes.
• Classification into subfamilies helps search for
functional motifs
TIGRTIGR
ETL1_M.m
YA19_S.c
CHD1_M.m
SYGP4_S.c
MOT1_S.c
ERCC6_H.s
RAD26_S.c
NUCP_H.s
NUCP_M.m
YB53_S.c
RAD54_S.c
DNRPPX_S.p
RAD5_S.c
RAD8_S.p
HIP116A_H.s
RAD16_S.c
LODE._D.m
NPHCG_42
HEPA._E.c
YB95_S.c
F37A4_C.e
ISWI_D.m
SNF2L_H.s
BRM_D.m
BRM_H.s
BRG1_H.s
BRG1_M.m
STH1_S.c
SNF2_S.c
SNF2
SNF2L
CHD1
ETL1
CSB
RAD54
RAD16
LODE
Evolution of the SNF2 Family of Proteins
TIGRTIGR

Evolutionary Genome Scanning - talk by J. Eisen in 2000 at MBL Molecular Evolution Course

  • 12. Blast Search of H. pylori “MutS”
    Sequences producing significant alignments:        Score (bits)   E value
    sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN       117         3e-25
    sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN        69         1e-10
    sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN        64         3e-09
    sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN        62         2e-08
    sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN        57         4e-07
    sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN        57         4e-07
    • The Blast search pulls up Synechocystis sp. MutS#2 with a much higher score (and a much lower E value) than the other MutS homologs
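The gap between the top hit and the rest of this hit list can be measured directly. The following is a minimal Python sketch, assuming whitespace-separated hit lines whose last two fields are the bit score and the E value, as on the slide. `parse_hits` and `evalue_gap` are hypothetical helper names and are not part of BLAST.

```python
import math

# Hit lines as shown on the slide: identifier, description, score (bits), E value.
HITS = """\
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN 117 3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN 69 1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN 64 3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN 62 2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN 57 4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN 57 4e-07
"""


def parse_hits(text):
    """Return a list of (identifier, bit_score, e_value) tuples."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # The last two fields are the bit score and the E value.
        hits.append((fields[0], float(fields[-2]), float(fields[-1])))
    return hits


def evalue_gap(hits):
    """Orders of magnitude separating the best and second-best E values."""
    ranked = sorted(hits, key=lambda hit: hit[2])
    return math.log10(ranked[1][2] / ranked[0][2])


hits = parse_hits(HITS)
best = min(hits, key=lambda hit: hit[2])
print(best[0], "leads by", round(evalue_gap(hits), 1), "orders of magnitude")
```

A large gap like this one shows that the query resembles one particular homolog far more than the others. That is the kind of pattern a highest-hit annotation would hide.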
  • 13. H. pylori and MutS
    • Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog
    • Experimental studies have shown MutS and MutL always work together in mismatch repair
    • Problem: what do we conclude about H. pylori mismatch repair?
  • 14. Table 3. Presence of MutS Homologs in Complete Genome Sequences
    Species                                          # of MutS Homologs
    Bacteria
      Escherichia coli K12                           1
      Haemophilus influenzae Rd KW20                 1
      Neisseria gonorrhoeae                          1
      Helicobacter pylori 26695                      1
      Mycoplasma genitalium G-37                     0
      Mycoplasma pneumoniae M129                     0
      Bacillus subtilis 168                          2
      Streptococcus pyogenes                         2
      Synechocystis sp. PCC6803                      2
      Treponema pallidum Nichols                     1
      Borrelia burgdorferi B31                       2
      Aquifex aeolicus                               2
      Deinococcus radiodurans R1                     2
    Archaea
      Archaeoglobus fulgidus VC-16, DSM4304          0
      Methanococcus jannaschii DSM 2661              0
      Methanobacterium thermoautotrophicum ∆H        1
    Eukaryotes
      Saccharomyces cerevisiae                       6
      Homo sapiens                                   5
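A presence/absence table like Table 3 becomes easy to query once it is encoded as data. The sketch below is a minimal Python illustration, with the counts copied from the table and strain designations omitted for brevity. The names `MUTS_HOMOLOGS`, `species_without_homolog`, and `species_with_multiple` are hypothetical and chosen only for this example.

```python
# Number of MutS homologs per species, copied from Table 3
# (strain designations omitted for brevity).
MUTS_HOMOLOGS = {
    "Bacteria": {
        "Escherichia coli": 1,
        "Haemophilus influenzae": 1,
        "Neisseria gonorrhoeae": 1,
        "Helicobacter pylori": 1,
        "Mycoplasma genitalium": 0,
        "Mycoplasma pneumoniae": 0,
        "Bacillus subtilis": 2,
        "Streptococcus pyogenes": 2,
        "Synechocystis sp.": 2,
        "Treponema pallidum": 1,
        "Borrelia burgdorferi": 2,
        "Aquifex aeolicus": 2,
        "Deinococcus radiodurans": 2,
    },
    "Archaea": {
        "Archaeoglobus fulgidus": 0,
        "Methanococcus jannaschii": 0,
        "Methanobacterium thermoautotrophicum": 1,
    },
    "Eukaryotes": {
        "Saccharomyces cerevisiae": 6,
        "Homo sapiens": 5,
    },
}


def species_without_homolog(table):
    """Species with no MutS homolog, in table order."""
    return [
        species
        for counts in table.values()
        for species, n in counts.items()
        if n == 0
    ]


def species_with_multiple(table):
    """Species with more than one MutS homolog, in table order."""
    return [
        species
        for counts in table.values()
        for species, n in counts.items()
        if n > 1
    ]


print("No MutS homolog:", species_without_homolog(MUTS_HOMOLOGS))
print("Multiple MutS homologs:", species_with_multiple(MUTS_HOMOLOGS))
```

Species with several homologs are the cases where a similarity search alone cannot say which copy a new sequence corresponds to, so these are the genomes where a tree of the family is most informative.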
  • 15. MutS Alignment
    [Figure: multiple sequence alignment of MutS homologs; the residue-level alignment is not recoverable from the extracted text.]
GNNFCYNHKLKPG ICTKSDA IRVAELAGFPMEALKEARE I LG . . . TEETLTMLYQVKKGVCDQSFG IHVAELANFPKHV I ECAKQKAL . . . DDED I TLLYKVEPG I SDQSFG IHVAEVVQFPEK IVKMAKRKAN . . . . DQSLSPTYRLLWG I PGRSNALA IAQRLGLPLA IVEQAKDKLG . . . . RETLKPLYK IAYNTVGESMAFY IAQKYG I PSEV I E IAKRHVG . . . . I ETLSPTYKLL IGVPGRSNAFE I SKRLGLPDH I IGQAKSEMT SRNKEA I LYTYKLSKGLTEEKNYGLKAAEVSSLPPS IVLDAKE I TT . . . . DNSVKMNYQLTQKSVA I ENSG IRVVKK I FNPD I IAEAYNMDS . . . CEDGNDLVFFYQVCEGVAKASHASHTAAQAGLPDKLVARGKEV EDHESEG I TFLFKVKEG I SKQSFG IYCAKVCGLSRD IVERAEELSR ----------------I------------------ -----------II------------ ------------III------------ ------IV------ MSH6__Yeast MSH6__Mouse MSH3__Human MSH3__Yeast MutS__Aquae MutS__Bacsu MutS__Synsp MSH1__Pombe MSH1__Yeast MSH2__Human MSH2__Yeast MutS2_Synsp MutS2_Aquae MutS2_Bacsu MSH4__Human MSH4__Yeast MSH5__Human MSH5__Yeast MSH6__Yeast MSH6__Mouse MSH3__Human MSH3__Yeast MutS__Aquae MutS__Bacsu MutS__Synsp MSH1__Pombe MSH1__Yeast MSH2__Human MSH2__Yeast MutS2_Synsp MutS2_Aquae MutS2_Bacsu MSH4__Human MSH4__Yeast MSH5__Human MSH5__Yeast MSH6__Yeast MSH6__Mouse MSH3__Human MSH3__Yeast MutS__Aquae MutS__Bacsu MutS__Synsp MSH1__Pombe MSH1__Yeast MSH2__Human MSH2__Yeast MutS2_Synsp MutS2_Aquae MutS2_Bacsu MSH4__Human MSH4__Yeast MSH5__Human MSH5__Yeast
  • 16. Phylogenetic Tree of MutS Family [Figure: phylogenetic tree of MutS homologs; tips are labeled with species abbreviations (e.g., Aquae, Trepa, Ecoli, Bacsu, Synsp, Deira, Helpy, Metth, Spombe, Yeast, Arath, Celeg, Fly, Xenla, Rat, Mouse, Human).]
  • 18. MutS Subfamilies • MutS1 Bacterial MMR • MSH1 Euk - mitochondrial MMR • MSH2 Euk - all MMR in nucleus • MSH3 Euk - loop MMR in nucleus • MSH6 Euk - base:base MMR in nucleus • MutS2 Bacterial - function unknown • MSH4 Euk - meiotic crossing-over • MSH5 Euk - meiotic crossing-over
  • 19. Table 3. Presence of MutS Homologs in Complete Genome Sequences
      Species | # of MutS Homologs | Which Subfamilies?
      Bacteria
      Escherichia coli K12 | 1 | MutS1
      Haemophilus influenzae Rd KW20 | 1 | MutS1
      Neisseria gonorrhoeae | 1 | MutS1
      Helicobacter pylori 26695 | 1 | MutS2
      Mycoplasma genitalium G-37 | 0 | -
      Mycoplasma pneumoniae M129 | 0 | -
      Bacillus subtilis 168 | 2 | MutS1, MutS2
      Streptococcus pyogenes | 2 | MutS1, MutS2
      Synechocystis sp. PCC6803 | 2 | MutS1, MutS2
      Treponema pallidum Nichols | 1 | MutS1
      Borrelia burgdorferi B31 | 2 | MutS1, MutS2
      Aquifex aeolicus | 2 | MutS1, MutS2
      Deinococcus radiodurans R1 | 2 | MutS1, MutS2
      Archaea
      Archaeoglobus fulgidus VC-16, DSM4304 | 0 | -
      Methanococcus jannaschii DSM 2661 | 0 | -
      Methanobacterium thermoautotrophicum ∆H | 1 | MutS2
      Eukaryotes
      Saccharomyces cerevisiae | 6 | MSH1-6
      Homo sapiens | 5 | MSH2-6
  • 20. Overlaying Functions onto Tree [Figure: the MutS family tree with clades labeled by subfamily: MutS1, MSH1, MSH2, MSH3, MSH6, MutS2, MSH4, MSH5.]
  • 21. Functional Prediction Using Tree [Figure: the MutS family tree with each subfamily clade annotated by function.] • MutS1: Repair of Loops and Mismatches • MSH1: Repair in Mitochondria • MSH2: Repair of Loops and Mismatches in Nucleus • MSH3: Repair of Loops in Nucleus • MSH6: Repair of Mismatches in Nucleus • MutS2: Unknown Functions • MSH4: Meiotic Crossing-Over • MSH5: Meiotic Crossing-Over
  • 22. Table 3. Presence of MutS Homologs in Complete Genome Sequences
      Species | # of MutS Homologs | Which Subfamilies? | MutL Homologs
      Bacteria
      Escherichia coli K12 | 1 | MutS1 | 1
      Haemophilus influenzae Rd KW20 | 1 | MutS1 | 1
      Neisseria gonorrhoeae | 1 | MutS1 | 1
      Helicobacter pylori 26695 | 1 | MutS2 | -
      Mycoplasma genitalium G-37 | - | - | -
      Mycoplasma pneumoniae M129 | - | - | -
      Bacillus subtilis 168 | 2 | MutS1, MutS2 | 1
      Streptococcus pyogenes | 2 | MutS1, MutS2 | 1
      Mycobacterium tuberculosis | - | - | -
      Synechocystis sp. PCC6803 | 2 | MutS1, MutS2 | 1
      Treponema pallidum Nichols | 1 | MutS1 | 1
      Borrelia burgdorferi B31 | 2 | MutS1, MutS2 | 1
      Aquifex aeolicus | 2 | MutS1, MutS2 | 1
      Deinococcus radiodurans R1 | 2 | MutS1, MutS2 | 1
      Archaea
      Archaeoglobus fulgidus VC-16, DSM4304 | - | - | -
      Methanococcus jannaschii DSM 2661 | - | - | -
      Methanobacterium thermoautotrophicum ∆H | 1 | MutS2 | -
      Eukaryotes
      Saccharomyces cerevisiae | 6 | MSH1-6 | 3+
      Homo sapiens | 5 | MSH2-6 | 3+
  • 23. Why Was the MutS2 Family Missed? Blast Search of Syn. sp. MutS#2
      Sequences producing significant alignments | Score (bits) | E Value
      sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT | 91 | 3e-17
      sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN | 87 | 4e-16
      sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS | 83 | 1e-14
      sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS | 81 | 3e-14
      sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS | 81 | 4e-14
      sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA | 80 | 5e-14
      • Blast search pulls up standard MutS genes but with only a moderate p value (10^-17)
  • 24. Problems with Similarity-Based Functional Prediction • Similarity-based methods are prone to database error propagation. • They cannot identify orthologous groups reliably. • They perform poorly in cases of evolutionary rate variation and non-hierarchical trees (similarity will not reflect evolutionary relationships in these cases). • They may be misled by modular proteins or large insertion/deletion events. • They are not set up to deal with expanding data sets.
  • 27. Rate Variation and Duplication [Figure: tree of genes 1A, 2A, 3A and 1B, 2B, 3B from Species 1, Species 2, and Species 3, with the duplication that separates the A and B groups marked.]
  • 28. Alkylation Repair Genes [Figure: domain architecture of alkylation repair proteins. Domains: AlkA Domain (O6-Me-G glycosylase), Ogt Domain (O6-Me-G alkyltransferase), Ada Domain (transcriptional regulator). Proteins shown: Ada E. coli, Ada H. infl, Ogt E. coli, Ogt H. infl, Ogt Gram+, Ogt D. radio, AlkA Gram+, AlkA E. coli, MGMT Euks.]
  • 29. Evolutionary Method: Phylogenetic Prediction of Gene Function [Figure: flowchart of the method applied to two examples (Example A, Example B), using genes 1A-3A and 1B-3B from Species 1-3, alongside the actual evolution (assumed to be unknown). Inferred duplications are marked "Duplication?" and one outcome is marked "Ambiguous".] Method: 1. Choose gene(s) of interest 2. Identify homologs 3. Align sequences 4. Calculate gene tree 5. Overlay known functions onto tree 6. Infer likely function of gene(s) of interest
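The last two steps of the method (overlay known functions onto the tree, then infer the likely function) can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used for these slides: the tree is a hypothetical nested tuple built from the gene names in the figure (1A-3B), the function labels are invented, and `predict_function` is a hypothetical helper that takes the smallest clade around the query that contains at least one annotated gene.

```python
def leaves(tree):
    """Return the leaf names under a nested-tuple tree (a leaf is a string)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def predict_function(tree, known, query):
    """Predict a function for `query` from its closest annotated relatives.

    Walks from the root toward the query and keeps the function set of the
    smallest clade that still contains an annotated gene other than the query.
    Assumes `query` is a leaf of `tree`.
    """
    best = None
    node = tree
    while not isinstance(node, str):
        funcs = {known[name] for name in leaves(node)
                 if name != query and name in known}
        if funcs:
            best = funcs
        # Descend into the child that contains the query gene.
        node = next(child for child in node if query in leaves(child))
    if best is None:
        return "unknown"
    return next(iter(best)) if len(best) == 1 else "ambiguous"


# Hypothetical gene tree: a duplication separates the A and B groups.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
# Hypothetical annotations; 2A, 1B and 2B are uncharacterized.
known_functions = {"1A": "function A", "3A": "function A", "3B": "function B"}

print(predict_function(gene_tree, known_functions, "2A"))
print(predict_function(gene_tree, known_functions, "2B"))
```

When the closest annotated clade contains genes with different functions, the helper reports "ambiguous" rather than picking one, which mirrors the ambiguous case shown in the figure.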
  • 30. [Figure, panels A-D: (A) MutS family tree with gene names at the tips (e.g., MutS.Aquae, MSH2.Human, MSH1.Yeast, MSH3.Yeast, MSH6.Yeast, MutS2.Metth, MSH4.Yeast, MSH5.Yeast); (B, C) the same tree with species names at the tips and subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2; (D) the tree with a function assigned to each subfamily.] Functions in panel D: • MutS1: All MMR (Bacteria) • MSH1: MMR in Mitochondria • MSH2: All MMR in Nucleus • MSH3: MMR of Large Loops in Nucleus • MSH6: MMR of Mismatches and Small Loops in Nucleus • MSH4: Segregation & Crossover • MSH5: Segregation & Crossover
  • 31. Clustering vs. Neighbor-joining [Figure: two trees of the same MutS family sequences (e.g., MutS.Ecoli, MutS.Bacsu, MSH2.Human, MSH3.Human, GTBP.Human, MSH6.Yeast, MSH1.Yeast, MSH4.Yeast, MSH5.Yeast, MutS2.Syns, MutS2.Bacs), one built by UPGMA clustering and one by Neighbor-Joining.]
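One way to see why the two methods can disagree is to compare which pair each would join first when evolutionary rates differ. The sketch below is a toy illustration with invented distances, not a reconstruction of the MutS trees: the four hypothetical genes come from a true tree ((A_slow, A_fast), (B_slow, B_fast)) with branch lengths 1 for the slow genes, 4 for the fast genes, and 1 for the internal branch. For four taxa the first join fixes the unrooted topology.

```python
from itertools import combinations


def upgma_first_pair(dist, taxa):
    """UPGMA joins the pair with the smallest raw distance."""
    return min(combinations(taxa, 2), key=lambda pair: dist[frozenset(pair)])


def nj_first_pair(dist, taxa):
    """Neighbor-joining joins the pair minimizing Q(i, j) = (n-2)*d(i, j) - r_i - r_j,
    where r_i is the sum of distances from i to all other taxa."""
    n = len(taxa)
    row_sum = {t: sum(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - row_sum[i] - row_sum[j]

    return min(combinations(taxa, 2), key=q)


taxa = ["A_slow", "A_fast", "B_slow", "B_fast"]
# Additive distances implied by the hypothetical branch lengths above.
dist = {
    frozenset(("A_slow", "A_fast")): 5,
    frozenset(("A_slow", "B_slow")): 3,
    frozenset(("A_slow", "B_fast")): 6,
    frozenset(("A_fast", "B_slow")): 6,
    frozenset(("A_fast", "B_fast")): 9,
    frozenset(("B_slow", "B_fast")): 5,
}

print("UPGMA joins first:", upgma_first_pair(dist, taxa))
print("NJ joins first:   ", nj_first_pair(dist, taxa))
```

In this toy case the smallest raw distance links the two slowly evolving genes from different groups, so a clustering step based on similarity alone pairs them, while the rate-corrected criterion pairs the two A genes as in the assumed true tree.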
  • 33. UvrA Gene Family • UvrA has conserved role in nucleotide excision repair in bacteria (part of UvrABCD complex) • UvrA homologs found in all complete bacterial genomes • Some UvrA homologs have been found to be involved in resistance to DNA-damaging antibiotics • UvrA accumulates at membrane after DNA damage • All UvrAs are members of the ABC transporter family • Possible role in DNA damage export?
  • 34. UvrAs in D. radiodurans • UvrA homolog in D. radiodurans shown to be part of UV endonuclease α complex • D. radiodurans genome sequence reveals a second UvrA gene - on the large megaplasmid • D. radiodurans known to export DNA repair products (e.g., damaged bases) out of cell after damage • Export may be important for radiation resistance (Battista 1997)
  • 35. UvrA Evolution • Originated by gene duplication of an ABC transporter • Subsequently, there was a tandem duplication of the ABC transporter motif within UvrA • Ancient duplication into UvrA1 and UvrA2 subfamilies • UvrA1s - conserved role in NER • UvrA2s - transport of DNA damage? • UvrA2 in D. radiodurans may be from lateral transfer
  • 36. Evolution of UvrA Family [Figure, two panels. (A) ABC Transporters: tree including OppDF, UUP, NodI, LivF, XylG, NrtDC, PstB, MDR, HlyB, TAP1, CFTR, SUR, UvrA1, and UvrA2. (B) UvrA Subfamily: a duplication in the UvrA family separates UvrA2 (UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans) from UvrA1 (UvrA of H. influenzae, E. coli, N. gonorrhoeae, R. prowazekii, S. mutans, S. pyogenes, S. pneumoniae, B. subtilis, M. luteus, M. tuberculosis, M. thermoautotrophicum, H. pylori, C. jejuni, P. gingivalis, C. tepidum, T. thermophilus, T. pallidum, B. burgdorferi, T. maritima, A. aeolicus, Synechocystis sp., and uvra1 of D. radiodurans).]
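  • 37. UvrA Evolution [Figure: model of UvrA evolution. Diversification of the ABC family; a tandem duplication of the ABC domain (ABC1, ABC2) gives UvrA with N- and C-terminal halves (UvrAN, UvrAC); a gene duplication then gives UvrA1 (UvrA1N, UvrA1C) and UvrA2 (UvrA2N, UvrA2C).]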
  • 38. Three Photolyase Homologs in V. cholerae [Figure: photolyase family tree with the three V. cholerae homologs (ORFA00965, ORF02295.Vibch, ORF01792.Vibch) marked by asterisks. Other tips include PHR E. coli, Phr.Yeast, phr.mouse, phr2.human, CRY1 ARATH, CRY2 ARATH, and PHR SYNY3. Group labels: MTHF type, Class I CPD Photolyases, 6-4 Photolyases, Blue Light Receptors, 8-HDF type, CPD Photolyases.]
  • 39. MFS phylogenetic tree [Figure: tree of major facilitator superfamily transporters (e.g., Bmr Bsu, TetB Eco, QacA Sau, AraE Eco, Gtr1 Hsa, LacY Eco, UhpT Eco, GlpT Bsu, FucP Eco) with family labels: OFA, OHS, NHS, OPA, PHS, SHS, MHS, SP, ACS, NNP, DHA14, DHA12, UMF, FGHS, MCP.]
  • 40. Uses of Phylogenomics II: Knowing When Not to Predict Functions
  • 41. DNA Repair Genes in D. radiodurans Complete Genome
      Process | Genes in D. radiodurans
      Nucleotide Excision Repair | UvrABCD, UvrA2
      Base Excision Repair | AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
      AP Endonuclease | Xth
      Mismatch Excision Repair | MutS, MutL
      Recombination: Initiation | RecFJNRQ, SbcCD, RecD
      Recombination: Recombinase | RecA
      Recombination: Migration and resolution | RuvABC, RecG
      Replication | PolA, PolC, PolX, phage Pol
      Ligation | DnlJ
      dNTP pools, cleanup | MutTs, RRase
      Other | LexA, RadA, HepA, UVDE, MutS2
  • 42. Recombination Genes in Genomes [Table: presence (+), absence (-), or uncertain presence (±) of each protein in each genome. Columns run from Bacteria through Archaea to Euks; the individual species column labels are not preserved.]
      Initiation: RecBCD pathway
      RecB | + + - - - - - - + + - + - - - - - - - -
      RecC | + + - - - - - - + ±+ - ± - - - - - - - -
      RecD | + + - - ± - - - + ±+ - ++ - ± ±+ - - - - -
      Initiation: RecF pathway
      RecF | + + + - + - - + + - + ± - - + - - ± ± ±
      RecJ | + + + + + - - + - + + + + + + - - - - -
      RecO | + + - - + - - + + - - - - - ± - - - - -
      RecR | + + + ±+ + - - + + - + + - + + - - - - -
      RecN | + + + + + - - + + - + - ± + + - - ± ± -
      RecQ | + + - - + - - + - - + - - - + - - - - + ++
      Initiation: RecE pathway
      RecE/ExoVIII | + - - - - - - - - - - - - - - - - - - -
      RecT | + - - - + - - - - - - - - - - - - - - -
      Initiation: SbcBCD pathway
      SbcB/ExoI | + + - - - - - - - - - - - - - - - - - -
      SbcC | + - - - + - - + - + + - + + + ± ± ± ± ± ±
      SbcD | + - - - + - - + - + + - + + + ± ± ± ± ± ±
      Initiation: AddAB pathway
      AddA/RexA | - - + - + - - - - - + + - ± - - - - - -
      AddB/RexB | - - - - + - - - - - - - - - - - - - - -
      Initiation: Rad52 pathway
      Rad52, Rad59 | - - - - - - - - - - - - - - - - - - - ++ +
      Mre11/Rad32 | ± - - - ± - - ± - ± ± - ± ± ± + + + + + +
      Rad50 | ± - - - ± - - ± - ± ± - ± ± ± + + + ± + +
      Recombinase
      RecA, Rad51 | + + + + + + + + + + + + + + + + + + + ++ ++
      Branch migration
      RuvA | + + + + + + + + + + + + + - + - - - - -
      RuvB | + + + + + + + + + + + + + - + - - - - -
      RecG | + + + + + - - + + + + - + + + - - - - -
      Resolvases
      RuvC | + + + + - - - + + - + + + - + - - - - -
      RecG | + + + + + - - + + + + - + + + - - - - -
      Rus | + - - - - - - - ±+ - - - - ±+ - - - - - -
      CCE1 | - - - - - - - - - - - - - - - - - - - +
      Other recombination proteins
      Rad54 | - - - - - - - - - - - - - - - - - - - + +
      Rad55 | - - - - - - - - - - - - - - - - - - - + +
      Rad57 | - - - - - - - - - - - - - - - - - - - + +
      Xrs2 | - - - - - - - - - - - - - - - - - - - +
  • 43. Unusual Features of D. radiodurans DNA Repair Genes
      Process | Genes
      Nucleotide excision repair | Two UvrAs
      Base excision repair | Four MutY-Nths
      Recombination | RecD but not RecBC
      Replication | Four Pol genes
      dNTP pools | Many MutTs, two RRases
      Other | UVDE
  • 44. Problem: The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from the lists for other bacterial genomes of similar size
  • 45. Repair Studies in Different Species (determined by Medline searches as of 1998)
      Humans | 7028
      E. coli | 3926
      S. cerevisiae | 988
      Drosophila | 387
      B. subtilis | 284
      S. pombe | 116
      Xenopus | 56
      C. elegans | 25
      A. thaliana | 20
      Methanogens | 16
      Haloferax | 5
      Giardia | 0
  • 46. Gain and Loss of Repair Genes [Figure: species tree spanning BACTERIA (Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp), ARCHAEA (Metjn, Arcfu, Metth), and EUKARYOTES (Human, Yeast). Repair-gene events are marked on branches with the prefixes +, -, d, and t on gene names (e.g., +UvrABCD, +RecA, -MutLS, -RecFJORQN, dMutS, dRecA, tRad25, tUvrABCD); uncertain events carry a question mark, and one set of genes is labeled "from mitochondria".]
  • 47. Evolution of Uracil Glycosylase • Ung activity has evolved many times (many non-homologous proteins have uracil-DNA glycosylase activity) • Therefore, absence of homologs of these genes should not be used to infer likely absence of activity • However, presence of homologs of Ung and MUG genes can be used to indicate presence of activity because all homologs of these genes have this activity
  • 48. Evolution of Photoreactivation • All known enzymes that perform photoreactivation are part of a single large photolyase gene family • Some members of the family do not function as photolyases, but instead work as blue-light receptors • If a species does not encode a member of the photolyase gene family, it likely does not have photoreactivation capability • If a species encodes a photolyase homolog, one cannot conclude from that alone that it has photolyase activity • Position of photolyase homologs within the photolyase tree helps predict what activities they have
  • 49. Evolution of Alkyltransferases • All known alkyltransferases share a conserved, homologous alkyltransferase domain • Therefore, if a species does not encode any protein with this domain, it likely does not have alkyltransferase activity • If a species does encode a member of this gene family, it likely has alkyltransferase activity
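The three cases on slides 47 to 49 differ in which direction an inference from homolog presence or absence is safe. A compact way to write that down is a small rule table. The sketch below is only an illustration: the dictionary keys, the `infer_activity` helper, and the returned phrases are hypothetical names chosen here, and the rules encode nothing beyond what the three slides state.

```python
# For each repair activity: (presence of a homolog implies activity,
#                            absence of any homolog implies no activity)
RULES = {
    # Activity evolved many times, but all Ung/MUG homologs have it.
    "uracil glycosylase": (True, False),
    # Single gene family, but some members are blue-light receptors.
    "photoreactivation": (False, True),
    # Single conserved domain, and members of the family have the activity.
    "alkyltransferase": (True, True),
}


def infer_activity(process, has_homolog):
    """Return what can be concluded about an activity from homolog presence/absence."""
    presence_informative, absence_informative = RULES[process]
    if has_homolog:
        return "likely present" if presence_informative else "cannot conclude"
    return "likely absent" if absence_informative else "cannot conclude"


for process in RULES:
    for has_homolog in (True, False):
        print(process, "| homolog found:", has_homolog, "->",
              infer_activity(process, has_homolog))
```

For photoreactivation, the "cannot conclude" outcome is where the position of the homolog within the photolyase tree would be consulted next, as the slide notes.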
  • 50. Uses of Phylogenomics III: Gene Duplication
  • 51. Why Duplications Are Useful to Identify • Allows division into orthologs and paralogs • Aids functional predictions • Recent duplications may be indicative of species-specific adaptations • Helps identify mechanisms of duplication • Can be used to study mutation processes in different parts of genome
  • 52. MurA Homologs in A. thaliana [Figure: tree of MurA homologs in A. thaliana; each tip is labeled with its chromosome (I, II, IV, or V) and a clone-based gene name (e.g., ARATH I F5F19.7, ARATH II T13E11.2, ARATH IV T3F12.3, ARATH V T19K24.12).]
  • 53. MurA Homologs in A. thaliana colored by chromosome [Figure: the same tree with tips colored by chromosome; labels are grouped as chromosome I (e.g., F5F19.7, F22O13.23, F1N21.16), chromosome V (e.g., T21B04.1, T19K24.12), chromosome IV (e.g., F7N22.13, T3H13.11), and chromosome II (e.g., F26C24.7, T13E11.2).]
  • 54. Recent Duplications • Gene duplication is frequently accompanied by functional divergence • Evolutionary analysis can identify recent duplications with no bias towards type of gene • Location of duplicates can help identify mechanisms of duplication
  • 56. Expansion of MCP Family in V. cholerae [Figure: neighbor-joining (NJ) tree of MCP homologs from E. coli, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis, and V. cholerae. The V. cholerae genes (e.g., VC0512, VCA1034, VCA0974, VC0825) form a large part of the tree; many tips are marked with asterisks.]
  • 58. Levels of Paralogy Within a Genome • All – All members of a gene family are linked together • Top matches – Only top matching pairs are linked together. Therefore, in a large gene family, only the pair from the most recent duplication event is included • Recent – Operational definition based on comparison to other species. Only pairs that are more similar to each other than to genes in selected other species are included.
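The three levels can be expressed as filters over within-genome similarity scores. The sketch below is one possible reading of the definitions, not the pipeline behind the plots that follow: `paralog_levels` is a hypothetical helper, the scores are invented, and "recent" is interpreted here as a top-matching pair whose within-genome score exceeds both genes' best scores against the selected other species.

```python
def paralog_levels(within, outside):
    """Split paralog pairs into the 'all', 'top', and 'recent' levels.

    within:  {(gene1, gene2): similarity score} for pairs in one genome
    outside: {gene: best similarity score against selected other species}
    """
    # All: every pair of family members with a detectable match.
    all_pairs = {frozenset(pair) for pair in within}

    # Top: for each gene, keep only its best-scoring partner in the genome.
    best = {}
    for (a, b), score in within.items():
        for gene, partner in ((a, b), (b, a)):
            if gene not in best or score > best[gene][1]:
                best[gene] = (partner, score)
    top = {frozenset((gene, partner)) for gene, (partner, _) in best.items()}

    # Recent: top pairs more similar to each other than to any other species.
    recent = {
        frozenset((gene, partner))
        for gene, (partner, score) in best.items()
        if score > outside.get(gene, 0) and score > outside.get(partner, 0)
    }
    return all_pairs, top, recent


# Hypothetical three-member family: g1 and g2 duplicated recently, g3 is older.
within = {("g1", "g2"): 900, ("g1", "g3"): 300, ("g2", "g3"): 280}
outside = {"g1": 500, "g2": 480, "g3": 400}

all_pairs, top, recent = paralog_levels(within, outside)
print(len(all_pairs), "pairs at level All")
print(sorted(sorted(p) for p in top), "at level Top")
print(sorted(sorted(p) for p in recent), "at level Recent")
```

Each stricter level removes pairs from the one before it, which is why the "Recent" dot plots are expected to be much sparser than the "All" plots.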
  • 59. C. pneumoniae Paralogs - All [Figure: dot plot of paralog pairs; x-axis: Query Orf Position, y-axis: Subject Orf Position, both 0 to 1,250,000.]
  • 60. C. pneumoniae Paralogs - Top [Figure: dot plot; x-axis: Query Orf Position, y-axis: Subject Orf Position, both 0 to 1,250,000.]
  • 61. C. pneumoniae Paralogs – Recent [Figure: dot plot; x-axis: Query Orf Position, y-axis: Subject Orf Position, both 0 to 1,250,000.]
  • 62. E. coli Paralogs - All [Figure: dot plot; x-axis: Query Coordinates, y-axis: Subject Coordinates, both 0 to 4,500,000.]
  • 63. E. coli Paralogs - Top [Figure: dot plot; x-axis: Query Coordinates, y-axis: Subject Coordinates, both 0 to 4,500,000.]
  • 64. E. coli Paralogs - Recent [Figure: dot plot; x-axis: Query Coordinates, y-axis: Subject Coordinates, both 0 to 4,500,000.]
  • 65. Recent Duplications in N. meningitidis [Figure: dot plot; x-axis: Chromosome Position of Query, y-axis: Chromosome Position of Recent Duplicate, both 0 to 4000.]
  • 66. [Figure, two panels: (A) C. pneumoniae AR39, axes 0 to 1,500,000; (B) C. trachomatis MoPn, axes 0 to 1,000,000. In both, x-axis: Query ORF Chromosome Position, y-axis: Best Match Chromosome Position.]
  • 67. Uses of Phylogenomics IV: Genetic Exchange within Genomes
  • 69. D. radiodurans Transposase Family [Figure: tree of D. radiodurans transposase genes: DEIRA_ORF01427, DEIRA_ORF01431, DEIRA_ORF03257, DEIRA_ORFB01001, DEIRA_ORFB01020, DEIRA_ORFB01025, DEIRA_ORFB01012, DEIRA_ORFB01035, DEIRA_ORFC0021, DEIRA_ORFC0025 (annotated as a hypothetical protein), and DEIRA_ORFC0018. Group labels: ORFB, ORF0, ORFC.]
  • 72. Why Gene Loss is Useful to Identify • Indicates that gene is not absolutely required for survival • Helps distinguish likelihood of gene transfers • Correlated loss of same gene in different species may indicate selective advantage of loss of that gene • Correlated loss of genes in a pathway indicates a conserved association among those genes
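  • 73. Example of Tracing Gene Loss [Figure: species tree covering the kingdoms Euks, Arch, and Bacteria, with the presence or absence of a gene shown for each species (species abbreviations: MT, MJ, SC, HS, AA, DR, TA, BS, MG, MP, BB, TP, HP, HI, EC, SS, MT). The evolutionary origin of the gene and the inferred loss events are marked on the tree.]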
  • 75. Need for Phylogenomics Example: Gene Duplication and Loss • Genome analysis required to determine number of homologs in different species • Evolutionary analysis required to divide into orthology groups and identify gene duplications • Genome analysis is then required to determine presence and absence of orthologs • Then loss of orthologs can be traced onto evolutionary tree of species
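The final step, tracing loss of an ortholog onto the species tree, can be sketched as follows. This is a simplified illustration under strong assumptions: the gene is taken to have a single origin at the root of the tree shown, there is no horizontal transfer, and each maximal clade lacking the gene counts as one loss. The `infer_losses` helper and the tree topology are hypothetical; the presence and absence values follow the MutS1 column of Table 3 for the six species used.

```python
def leaves(tree):
    """Return the leaf names under a nested-tuple tree (a leaf is a string)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def infer_losses(tree, present):
    """Return the clades in which the gene is inferred to have been lost.

    Each maximal clade with no species carrying the gene is reported once,
    as a sorted tuple of species names.
    """
    losses = []

    def walk(node):
        names = leaves(node)
        if not any(name in present for name in names):
            losses.append(tuple(sorted(names)))  # whole clade lacks the gene
            return
        if isinstance(node, str):
            return  # a single species that has the gene
        for child in node:
            walk(child)

    walk(tree)
    return losses


# Illustrative species tree (topology assumed for this example only).
species_tree = ((("Ecoli", "Haein"), "Helpy"), (("Mycge", "Mycpn"), "Bacsu"))
# MutS1 orthologs per Table 3: present in E. coli, H. influenzae, B. subtilis.
has_muts1 = {"Ecoli", "Haein", "Bacsu"}

print(infer_losses(species_tree, has_muts1))
```

Under these assumptions the two Mycoplasma species share one inferred loss on their common branch, while the H. pylori loss is a separate event.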
  • 76. Uses of Phylogenomics VII: Specialization
  • 78. Species Distribution of Homologs of D. radiodurans Genes [Figure: four histograms; x-axis: Number of Species With High Hits (0 to 20), y-axis: Frequency. Panel labels: PapaBear, MamaBear, BabyBear, E. coli.]
  • 79. Megaplasmid I: Iron Utilization/Iron Transport
      ORFB040 Na+/H+ antiporter
      ORFB042 iron ABC transporter, ATP-binding protein
      ORFB044 iron ABC transporter, permease protein
      ORFB045 iron ABC transporter, permease protein
      ORFB046 iron-chelator utilization protein
      ORFB047 iron ABC transporter, periplasmic substrate bp
      ORFB067 putative metal binding protein
      ORFB141 iron-chelator utilization protein
      ORFB074 hemin ABC transporter, periplasmic hemin bp
      ORFB075 hemin ABC transporter, permease protein
      ORFB076 hemin ABC transporter, ATP-binding protein
  • 80. Specialized Genetic Elements (Chromosome II and Megaplasmid) • Many two-component systems • Nitrogen metabolism • LexA • Ribonucleotide reductase • UvrA2 • Many transcription factors (e.g., HepA) • Iron metabolism
  • 81. Uses of Phylogenomics VIII: Comparison of Closely Related Genomes
  • 82. V. cholerae vs. E. coli: All Hits [Figure: dot plot of V. cholerae coordinates (0 to 3,000,000) against E. coli coordinates (0 to 5,000,000).]
  • 83. V. cholerae vs. E. coli: Top Hits [Figure: dot plot of V. cholerae coordinates (0 to 3,000,000) against E. coli coordinates (0 to 5,000,000).]
  • 84. V. cholerae vs. E. coli: Only if EC-Orf is Closest in All Genomes [Figure: dot plot of V. cholerae coordinates (0 to 3,000,000) against E. coli coordinates (0 to 5,000,000).]
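The three dot plots differ only in how hits are filtered before plotting. A rough sketch of the two stricter filters is given below; the `Hit` record, the ORF identifiers, positions, and scores are invented for illustration, and the slides do not say which search program or score was used.

```python
from collections import namedtuple

# Hypothetical hit record: one similarity-search hit of a query ORF.
Hit = namedtuple("Hit", "query query_pos genome subject subject_pos score")


def top_hits_in(hits, genome):
    """Best-scoring hit of each query ORF within one target genome."""
    best = {}
    for h in hits:
        if h.genome != genome:
            continue
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return best


def closest_in_all_genomes(hits, genome):
    """Keep a query only if its best hit over every genome lies in `genome`."""
    overall = {}
    for h in hits:
        if h.query not in overall or h.score > overall[h.query].score:
            overall[h.query] = h
    return {q: h for q, h in overall.items() if h.genome == genome}


def dot_plot_points(best):
    """(query position, subject position) pairs ready for plotting."""
    return sorted((h.query_pos, h.subject_pos) for h in best.values())


hits = [
    Hit("VC0001", 1000, "E. coli", "b0001", 5000, 300),
    Hit("VC0001", 1000, "E. coli", "b0900", 900000, 120),
    Hit("VC0001", 1000, "H. influenzae", "HI0010", 12000, 350),
    Hit("VC0002", 2500, "E. coli", "b0002", 7000, 410),
    Hit("VC0002", 2500, "H. influenzae", "HI0020", 30000, 200),
]
top_points = dot_plot_points(top_hits_in(hits, "E. coli"))
strict_points = dot_plot_points(closest_in_all_genomes(hits, "E. coli"))
print(top_points, strict_points)
```

The strict filter drops VC0001 because its best match overall is in another genome, which is the kind of pruning that removes paralog noise from the plot.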
  • 85. V. cholerae vs. E. coli Proteins Top [Figure: dot plot against V. cholerae ORF coordinates (0 to 4,000,000).]
  • 86. V. cholerae vs. E. coli F+R [Figure: dot plot; series labeled "Bert Ecoli" and "R Ecoli".]
  • 87. S. pneumoniae vs. S. pyogenes DNA F+R [Figure: dot plot; series labeled "BSP vs Spyo".]
  • 88. M. tuberculosis vs. M. leprae DNA [Figure: dot plot; series labeled "M1".]
  • 89. C. trachomatis vs C. pneumoniae Dot Plot [Figure: C. trachomatis MoPn against C. pneumoniae AR39, with origin and termination marked.]
  • 90. Duplication and Gene Loss Model [Figure: an ancestral gene order A B C D E F is duplicated to give A B C D E F and A' B' C' D' E' F'; different subsets of the two copies are then lost in the E. coli and V. cholerae lineages.]
  • 91. [Figure 4: a common ancestor of lineages A and B with gene order 1 to 32; successive inversions around the origin and around the terminus (marked *) give genomes A1, A2, A3 and B1, B2, B3.]
  • 92. M. tuberculosis strain phylogeny (Indels)
  • 93. Musser-Type Evolution (Indel Phylogeny) [Figure: tree of M. tuberculosis strains built from indels; labeled tips include CDC1551 and H37Rv.]
  • 94. Consistency Indices (Indel Phylogeny) [Figure: maximum, average, and minimum consistency index (CI, 0.00 to 1.00) for characters 1 to 20, calculated over stored trees.]
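For readers unfamiliar with the statistic, the consistency index of a character is the minimum number of changes it could require (number of observed states minus one) divided by the number of changes it requires on the tree. The sketch below counts changes with Fitch parsimony on a rooted binary tree; the tree, the strain names, and the indel codings are hypothetical, and the slides do not state which software produced the plotted values.

```python
def fitch_steps(node, states):
    """Return (possible state set, change count) for a nested 2-tuple tree."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    # No shared state: one change is needed on this branch pair.
    return lset | rset, lsteps + rsteps + 1


def consistency_index(tree, states):
    """CI = minimum possible changes / changes observed on the tree.

    `states` must hold exactly one state per leaf of `tree`.
    A constant character needs no changes and is reported as 1.0.
    """
    min_steps = len(set(states.values())) - 1
    _, steps = fitch_steps(tree, states)
    return 1.0 if steps == 0 else min_steps / steps


# Hypothetical strains; each indel is coded 1 (present) or 0 (absent).
tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
indel_a = {"s1": 1, "s2": 1, "s3": 0, "s4": 0, "s5": 0}  # fits the tree
indel_b = {"s1": 1, "s2": 0, "s3": 0, "s4": 1, "s5": 0}  # needs two changes
ci_a = consistency_index(tree, indel_a)
ci_b = consistency_index(tree, indel_b)
print(ci_a, ci_b)
```

A CI of 1.0 means the character changed once and is fully consistent with the tree; lower values indicate homoplasy, which is what the maximum, average, and minimum curves summarize across stored trees.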
  • 95. M. tuberculosis strain phylogeny (Indels/SNPs)
  • 97. Consistency Indices (Combined Phylogeny) [Figure: maximum, average, and minimum consistency index (CI, 0.00 to 1.00) per character, calculated over stored trees; characters grouped under the labels Musser, Smear, and Site.]
  • 98. Uses of Phylogenomics VI: Horizontal Gene Transfer and Species Evolution
  • 100. Examples of Horizontal Transfers • Antibiotic resistance genes on plasmids • Insertion sequences • Pathogenicity islands • Toxin resistance genes on plasmids • Agrobacterium Ti plasmid • Viruses and viroids • Organelle to nucleus transfers
  • 101. Why Gene Transfers Are Useful to Identify • Laterally transferred genes are frequently involved in environmental adaptations and/or pathogenicity • Helps identify transposons, integrons, and other vectors of gene transfer • Helps identify species associations in the environment
  • 102. Steps in Lateral Gene Transfer [Figure: steps 1, 2, 3-5, and 6 shown across lineages A, B, C, and D.]
  • 103. How to Infer Gene Transfers • Unusual distribution patterns • Unusual nucleotide composition • High sequence similarity to supposedly distantly related species • Unusual gene trees • Observe transfer events
  • 104. Inferring Lateral Transfers (observation; other causes; does it always occur?) • Unusual distribution; other causes: sampling bias; not if recipient already has gene • Unusual GC/codons; other causes: selection; not if donor and recipient are similar, not if it occurred long ago • High hit to "distant" species; other causes: selection, rate variation, gene loss; usually • Incongruent trees; other causes: bad trees, missed paralogs; usually • Correlation of above with neighbors; other causes: selection; only if genes keep order after transfer
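The "unusual GC" criterion can be illustrated with a simple screen that flags genes whose GC fraction lies far from the genome-wide mean. This is only a sketch: the gene names, sequences, and the z-score cutoff of 2.0 are invented, and, as the slide notes, such a flag can also arise from selection and will miss transfers that are old or come from a donor of similar composition.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_outliers(genes, z_cutoff=2.0):
    """Names of genes whose GC fraction deviates strongly from the mean.

    `z_cutoff` is an arbitrary illustrative threshold, not a value
    taken from the slides.
    """
    fractions = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(fractions.values())
    sd = pstdev(fractions.values())
    if sd == 0:
        return []
    return sorted(n for n, f in fractions.items() if abs(f - mu) / sd > z_cutoff)


# Hypothetical 20-base "genes"; orfX is AT-rich compared with the rest.
genes = {
    "orf1": "ATGC" * 5,
    "orf2": "ATGC" * 4 + "GGCA",
    "orf3": "ATGC" * 4 + "ATTG",
    "orf4": "ATGC" * 5,
    "orf5": "ATGC" * 5,
    "orf6": "ATGC" * 5,
    "orfX": "ATAT" * 4 + "GCGC",
}
flagged = gc_outliers(genes)
print(flagged)
```

In practice a flag like this would be combined with the other rows of the table (distribution, top hits, gene trees, neighboring genes) rather than used alone.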
  • 105. E. coli and S. typhimurium Transfer [Figure: old model and new model of transfer between E. coli and S. typhimurium.]
  • 106. PGK • Neighbor-joining; bootstrap; 50% majority rule consensus; outgroup = Archaea [Figure: two renderings of the PGK tree with bootstrap values, covering T. maritima, M. genitalium, M. pneumoniae, A. aeolicus, B. burgdorferi, T. pallidum, B. subtilis, Synechocystis, E. coli, H. influenzae, H. pylori, M. tuberculosis, S. cerevisiae, A. fulgidus, M. jannaschii, M. thermoautotrophicum, and P. horikoshii.]
  • 107. Archaeal genes in bacterial genomes* (bacterial species: best hits to Archaea) • Thermotoga maritima: 451 (24%) • Aquifex aeolicus: 246 (16%) • Synechocystis sp.: 126 (4%) • Borrelia burgdorferi: 45 (3.6%) • Escherichia coli: 99 (2.3%) • * 10^-5 over 60% of sequence
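The footnote can be read as a filter on best hits: a match counts only if its significance value is at most 10^-5 and the alignment covers at least 60% of the sequence. That reading is an assumption, as are the dictionary keys and the toy records in the sketch below, which shows how such a tally might be computed.

```python
def is_significant(hit, max_evalue=1e-5, min_coverage=0.60):
    """Apply the slide's footnote as interpreted here (an assumption):
    significance value <= 1e-5 and alignment spanning >= 60% of the query."""
    coverage = hit["aligned_len"] / hit["query_len"]
    return hit["evalue"] <= max_evalue and coverage >= min_coverage


def best_hits_to(best_hits, domain, total_orfs):
    """Count ORFs whose best hit is a significant match in `domain`."""
    n = sum(1 for h in best_hits if h["domain"] == domain and is_significant(h))
    return n, n / total_orfs


# Hypothetical best hit of each of four ORFs in a bacterial genome.
best_hits = [
    {"domain": "Archaea", "evalue": 1e-30, "aligned_len": 300, "query_len": 400},
    {"domain": "Archaea", "evalue": 1e-8, "aligned_len": 100, "query_len": 400},
    {"domain": "Bacteria", "evalue": 1e-50, "aligned_len": 390, "query_len": 400},
    {"domain": "Archaea", "evalue": 1e-3, "aligned_len": 380, "query_len": 400},
]
count, fraction = best_hits_to(best_hits, "Archaea", total_orfs=len(best_hits))
print(count, fraction)
```

Only the first record passes both conditions; the second fails on coverage and the fourth on significance, which is why the coverage requirement matters for multi-domain proteins.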
  • 108. Evidence for lateral gene transfer in Thermotoga 1. 81 archaeal-like genes are clustered in 15 regions that range in size from ~4 to 20 kb; many share conserved gene order with their archaeal counterparts. 2. Many of the archaeal-like genes correspond to regions with a significantly different base composition from the rest of the chromosome. 3. Some of these regions are associated with a 30 bp repeat structure found only in thermophiles. 4. Initial phylogenetic analyses of some of these genes lend support to lateral gene transfer.
  • 109. Region TM0987 - TM1003 (21 kb Archaea-like stretch) [Figure: map of Thermotoga ORFs 0987 to 1003 with a legend for Thermotoga ORF, Archaea homolog, Bacterial homolog, and Eukaryote homolog; percentages between 48% and 79% are shown for individual ORFs, and a transposon and a regulatory protein are labeled.]
  • 110. Species Distribution of Top Hits: A. thaliana Chr II [Figure: number of top hits (0 to 1000) for categories labeled E, A, B, and Syn. sp.]
  • 111. A. thaliana T1E2.8 is a Chloroplast Derived HSP60 [Figure: HSP60 tree with A. thaliana T1E2.8 highlighted; groups labeled Eukarya, Archaea, Bacteria, and Cyano/Cpst.]
  • 112. ParA Phylogeny: Evolution of Chromosome Partitioning Proteins (ParA) [Figure: tree of ParA homologs with groups labeled Chromosomal, Plasmid and Phage, Borrelia Plasmids, Chlamydial Inc, Archaea, and Misc; the two V. cholerae sequences (2603.Vibch and A00900.Vib) are marked with asterisks.]
  • 113. Best Matches by Genetic Element (D. radiodurans) [Figure: bar chart, scale 0 to 0.4, with categories labeled C, B, A, and 0.]
  • 114. N. meningitidis hits vs. genome size • Proteome comparison of N. meningitidis to other complete genomes [Figure: number of N. meningitidis ORFs that have a significant hit (0 to 1250) against total ORFs in genome (0 to 5000); labeled points include Archaea, H. influenzae, V. cholerae, and E. coli.]
  • 116. Reconciling a Tree of Life in the Context of Lateral Gene Transfer
  • 117. rRNA Tree of Complete Genomes [Figure: rRNA tree (scale bar 0.05 changes) with Archaea, Bacteria, and Eukarya labeled. Species: Mycobacterium tuberculosis, Bacillus subtilis, Synechocystis sp., Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Pyrococcus horikoshii, Methanococcus jannaschii, Aeropyrum pernix, Aquifex aeolicus, Thermotoga maritima, Deinococcus radiodurans, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Campylobacter jejuni, Neisseria meningitidis, Escherichia coli, Vibrio cholerae, Haemophilus influenzae, Rickettsia prowazekii, Mycoplasma pneumoniae, Mycoplasma genitalium, Chlamydia trachomatis, Chlamydia pneumoniae.]
  • 119. rRNA vs. Whole Genome Trees [Figure: tree (scale bar 0.05 changes) of the same species as on slide 117, with Archaea, Bacteria, and Eukarya labeled.]
  • 121. rRNA Suggested Deinococcus-Thermus Relationship • From Embley et al., Syst. Appl. Microbiol. 16: 25-29 (1993)
  • 122. [Figure: bacterial phylogenetic tree (scale bar 0.10) with bootstrap values; labeled groups are the proteobacterial subdivisions α, β, γ1, γ2, δ, and ε, Gram-positive high GC, Gram-positive low GC, Cyanobacteria (including the Arabidopsis thaliana chloroplast), and D/T (Thermus thermophilus, Thermus aquaticus, Deinococcus radiodurans).]
  • 123. Deinococcus-Thermus Comparison • Took all available T. thermophilus proteins • Searched against database of all available complete genomes (including D. radiodurans) • Identified the top hit for each protein (most significant FASTA p-value) • Phylogenetic analysis of all genes with >4 homologs
  • 124. Top Hits of T. thermophilus Proteins [Figure: share of top hits falling in D. radiodurans, other Bacteria, and Archaea.]
  • 125. Significance of Deinococcus-Thermus Relationship • Mechanisms of extreme heat, radiation, and desiccation resistance may be similar • Complete genome of Thermus will be very useful in identifying novel genes in Deinococcus • Shows utility of incomplete genome sequences
  • 126. Outline of Phylogenomics [Diagram linking: Database, Presence/Absence, Gene trees, Species tree, Congruence, Evol. Distribution, Gene Evolution Events, F(x) Predictions, Pathway Evolution, Phenotype Predictions.]
  • 127. Steps in Phylogenomic Analysis • Create database of genes of interest • Presence/absence of homologs in complete genomes • Phylogenetic trees of each gene family • Infer evolutionary events (gene origin, duplication, loss and transfer) • Refine presence/absence (orthologs, paralogs, subfamilies) • Functional predictions and functional evolution • Analysis of pathways
  • 128. Phylogenomics I: Presence/Absence of Homologs • Important to have complete genomes • Similarity searches with high “homology threshold” (to prevent false positives) • Iterative searches (to prevent false negatives) • Multiple sequence alignments to confirm assignment of homology and to divide up multi-domain proteins
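A presence/absence table of this kind can be assembled from search results roughly as shown below. The gene and genome names, the hit tuples, and the 1e-10 cutoff standing in for the slide's high "homology threshold" are all assumptions; the sketch also skips the iterative searches and the alignment-based confirmation that the slide calls for.

```python
def presence_absence(hits, genes, genomes, max_evalue=1e-10):
    """Build {gene: {genome: bool}} from (gene, genome, evalue) hits.

    `max_evalue` is an illustrative stringent cutoff, not a value
    given in the slides.
    """
    profile = {g: {sp: False for sp in genomes} for g in genes}
    for gene, genome, evalue in hits:
        if gene in profile and genome in profile[gene] and evalue <= max_evalue:
            profile[gene][genome] = True
    return profile


# Hypothetical search results.
genes = ["geneA", "geneB"]
genomes = ["sp1", "sp2", "sp3"]
hits = [
    ("geneA", "sp1", 1e-80),
    ("geneA", "sp2", 1e-75),
    ("geneB", "sp2", 1e-40),
    ("geneB", "sp1", 1e-4),  # too weak to count as a homolog here
]
profile = presence_absence(hits, genes, genomes)
for gene in genes:
    print(gene, "".join("+" if profile[gene][sp] else "-" for sp in genomes))
```

An absence in this table is only meaningful when the genome is complete, which is the first point on the slide.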
  • 129. Phylogenomics II: Phylogenetic Analysis of Homologs • Multiple sequence alignment • Mask alignment (exclude certain regions) – ambiguous regions of alignment – hypervariable regions and regions with large gaps • Phylogenetic tree with method of choice • Robustness checks – bootstrapping – compare trees with different alignments – compare trees with different tree-building methods
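One part of the masking step, excluding gap-rich columns, is easy to sketch. The function name, the toy alignment, and the 50% gap cutoff are illustrative assumptions; identifying ambiguous or hypervariable regions needs additional criteria (or manual inspection) that this sketch does not attempt.

```python
def mask_alignment(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds the cutoff.

    `alignment` maps sequence names to equal-length aligned strings
    that use "-" for gaps.
    """
    names = list(alignment)
    rows = [alignment[n] for n in names]
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(r[i] == "-" for r in rows) / len(rows) <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Hypothetical alignment; the third column is mostly gaps.
alignment = {
    "seqA": "MK-LVA",
    "seqB": "MKTLVA",
    "seqC": "M--L-A",
    "seqD": "MK-LIA",
}
masked = mask_alignment(alignment)
for name, seq in masked.items():
    print(name, seq)
```

Building trees from both the masked and unmasked alignments is one way to carry out the "compare trees with different alignments" robustness check.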
  • 130. Phylogenomics III: Inferring Evolutionary Events • Infer evolutionary distribution patterns (overlay presence/absence onto species tree) • Compare gene tree vs. species tree • Compare gene tree vs. evolutionary distribution • Infer gene duplication and transfer events • Combine gene transfer and duplication information with evolutionary distribution analysis to infer gene loss, gene origin, and timing of gene duplications and transfers
  • 131. Phylogenomics IV: Functional Predictions and Evolution • Overlay experimentally determined functions onto gene tree • Infer changes in function – many changes suggest caution should be used in making new predictions • Predict functions based on position in tree relative to genes with known functions and based on orthology groups
  • 132. Phylogenomics V: Pathway Analysis • Correlated presence/absence of all genes in pathway in different species? – If not, maybe non-orthologous gene displacement – Alternatively, pathway may be different between species • Correlated evolutionary events for genes in pathway – loss of all genes at once – correlated duplications? • Compare evolution of function between pathways – The number of times an activity has evolved helps in making predictions of function/phenotype
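The first question on the pathway slide, whether all genes of a pathway are present or absent together, can be checked directly against a presence/absence table. The table, gene names, and species names below are hypothetical, and `pathway_status` is an illustrative helper rather than part of any published tool.

```python
def pathway_status(profile, pathway_genes, genomes):
    """Classify each genome as having a complete, partial, or absent pathway."""
    status = {}
    for sp in genomes:
        missing = [g for g in pathway_genes if not profile[g][sp]]
        if not missing:
            status[sp] = "complete"
        elif len(missing) == len(pathway_genes):
            status[sp] = "absent"
        else:
            status[sp] = "partial, missing " + ", ".join(missing)
    return status


# Hypothetical presence/absence table for a three-gene pathway.
profile = {
    "geneA": {"sp1": True, "sp2": True, "sp3": False},
    "geneB": {"sp1": True, "sp2": False, "sp3": False},
    "geneC": {"sp1": True, "sp2": True, "sp3": False},
}
status = pathway_status(profile, ["geneA", "geneB", "geneC"], ["sp1", "sp2", "sp3"])
for sp, result in status.items():
    print(sp, result)
```

A "partial" result is the case the slide singles out: the missing step may be carried out by a non-orthologous gene, or the pathway may differ in that species.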
  • 133. Evolution as a Screening Method • Gene duplications • Gene loss • Lateral gene transfers • Organellar genes • Structurally constrained genes • Correlated evolutionary changes
  • 134. Evolutionary Genome Scanning • Distribution patterns/phylogenetic profiles • Patterns of evolution – (ds/dn) – Structurally constrained genes – Correlated evolutionary changes • Lateral gene transfers – Organellar genes – Pathogenicity islands • Subdividing gene families – Orthologs vs paralogs – Functional predictions – Subfamilies – Motif identification • Gene duplications • Gene loss
  • 135. Genome Sequences Allow “Hypothesisless Research” • DNA microarrays • Proteomics • GC skew and other nucleotide composition analyses • Parallel genome-wide genetic experiments • Evolutionary genome scanning • Phylogenetic profiles
  • 136. Evolutionary Diversity Still Poorly Represented in Complete Genomes [Figure: A. rRNA tree of bacterial and archaeal major groups. B. The same tree with groups that have completed genomes highlighted.]
  • 137. Acknowledgements • Genome duplications: S. Salzberg, J. Heidelberg, O. White, A. Stoltzfus, J. Peterson • Genome sequences and analysis: J. Heidelberg, T. Read, H. Tettelin, K. Nelson, J. Peterson, R. Fleischmann • Horizontal transfers: K. Nelson, W. F. Doolittle • TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul, Seqcore • $$$: DOE, NSF, NIH, ONR
  • 138. TIGR: J. Heidelberg, T. Read, S. Kaul, M-I. Benito, J. C. Venter, C. Fraser, S. Salzberg, O. White, K. Nelson, H. Tettelin • Other people: Mom and Dad, S. Karlin, M. Feldman, A. M. Campbell, R. Fernald, R. Shafer, D. Ackerly, D. Goldstein, M. Eisen, J. Courcelle, R. Myers, C. M. Cavanaugh, P. Hanawalt • $$$: NSF, ONR, DOE, NIH
  • 139. Using Evolutionary Analysis To Help Identify Novel Features in D. radiodurans
  • 140. Origins of Extreme Resistance • Functional divergence • Evolution of novel genes/processes • Acquire genes from other species • Gene duplication and functional divergence • Enhanced catalytic efficiency and/or coordination
  • 142. SNF2 Family of Proteins (1995) • SNF2 family defined by presence of conserved DNA-dependent ATPase domain • 100s of proteins • Diversity of functions: – transcriptional activation (SNF2) – transcriptional repression (MOT1) – recombination (RAD54) – transcription-coupled repair (CSB) – post-replication repair (RAD5) – chromosome segregation (lodestar) – many with unknown functions • Some species have 15+ representatives
  • 143. How to Sort Out Diversity in SNF2 Family • Presence of additional motifs – RING fingers – Bromodomains – Chromodomains • Interactions with other proteins • Evolutionary relationships – Orthology and paralogy – Subfamilies – Relationships among subfamilies
  • 144. SNF2 Alignment [Figure: domain layout (scale 0 to 500 aa) of SNF2-family proteins, showing helicase motifs I, Ia, Ib, II, III, IV, V, and VI plus additional motifs labeled C, R, and Br, grouped by subfamily (SNF2, SNF2L, CHD1, ETL1, MOT1, ERCC6, RAD54, RAD16). Proteins shown: BRM, hBRM, hBRG1, mBRG1, STH1, SNF2, YB95, F37A4, ISWI, SNF2L, CHD1, SYGP, ETL1, FUN30, MOT1, ERCC6, RAD26, YB53, DNRPPX, hNUCP, mNUCP, RAD5, spRAD8, HIP116, RAD16, LODE, NPH42, HepA, B. cereus ORF.]
  • 145. SNF2 Subfamilies (subfamily: conserved function) • SNF2: transcription activation (Swi/Snf complex) • SNF2L: transcription activation (NURF complex) • CHD1: chromatin remodelling • ETL1: unknown • MOT1: transcription repression • CSB: transcription-coupled repair • Rad54: recombinational repair • Rad16: chromatin access for DNA repair • HepA: bacterial RNA polymerase subunit • HepA2: unknown
  • 146. What Evolutionary Analysis Reveals About the SNF2 Family • Ancient duplication into two lineages may distinguish genes by type of activity • Multiple subfamilies with distinct sequences and functions • Presence of particular orthologs can be predicted in species for which they have not been cloned • Predict functions of uncharacterized members by orthology • Addition of motifs to SNF2 domain occurred early in eukaryotic evolution • Many duplications within eukaryotes • Classification into subfamilies helps search for functional motifs
