6. TIGRTIGR
Why use Phylogenomics?
Evolutionary information improves genome analysis
-Classification of multigene families
-Predicting functions
-Origins of genes and pathways
Genomics information improves evolutionary
reconstructions
-More sequences of genes
-Unbiased sampling
-Presence/absence needed to infer certain events
Feedback loop between two types of analysis
9. TIGRTIGR
Predicting Function
• Identification of motifs
• Homology/similarity based methods
– Highest hit
– Top hits
– Clusters of orthologous groups
– HMM models
– Structural threading and modeling
– Evolutionary reconstructions
10. TIGRTIGR
Types of Molecular Homology
• Homologs: genes that are descended from a common
ancestor (e.g., all globins)
• Orthologs: homologs that have diverged after speciation
events (e.g., human and chimp β-globins)
• Paralogs: homologs that have diverged after gene
duplication events (e.g., α and β globin).
• Xenologs: homologs that have diverged after lateral
transfer events
• Positional homology: common ancestry of specific amino
acid or nucleotide positions in different genes
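The ortholog/paralog distinction above can be sketched in code. This is a minimal illustration, assuming a gene tree whose internal nodes are already labeled as speciation or duplication events; the tree, the gene names, and the `classify` helper are hypothetical, and xenologs (lateral transfer) are not modeled.

```python
# Hypothetical toy gene tree: each internal node is (event, left, right),
# where event is "speciation" or "duplication"; leaves are gene names.
TREE = ("duplication",
        ("speciation", "human_alpha", "chimp_alpha"),
        ("speciation", "human_beta", "chimp_beta"))

def leaves(node):
    """Return the list of leaf names under a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def classify(node, a, b):
    """Classify genes a and b by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("a and b must be two distinct leaves of the tree")
    event, left, right = node
    for child in (left, right):
        names = leaves(child)
        if a in names and b in names:
            return classify(child, a, b)  # both genes on one side: descend
    names = leaves(node)
    if a not in names or b not in names:
        raise ValueError("both genes must be in the tree")
    # The genes split at this node, so its event defines their relationship.
    return "orthologs" if event == "speciation" else "paralogs"
```

Under these assumptions, human and chimp β-globin split at a speciation node (orthologs), while α and β globin split at the duplication node (paralogs).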
12. TIGRTIGR
Blast Search of H. pylori “MutS”
Sequences producing significant alignments:          Score (bits)  E value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN     117           3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN     69            1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN     64            3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN     62            2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN     57            4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN     57            4e-07
• Blast search pulls up Syn. sp. MutS#2 with a
much more significant E value (3e-25) than the
other MutS homologs
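The size of the gap between the top hit and the rest can be put on a common scale. In Karlin-Altschul statistics the expected number of chance hits is E = m·n·2^(-S), where S is the bit score and m·n is the search space, so for two hits from the same search the ratio of E values depends only on the difference in bits. The sketch below assumes that relation holds exactly; reported scores and E values are rounded, so agreement is only approximate. The function name is illustrative.

```python
def evalue_ratio(bits_a, bits_b):
    """E(a) / E(b) for two hits from the same search (same search space),
    assuming E = m * n * 2**(-bits)."""
    return 2.0 ** (bits_b - bits_a)

# Bit scores and E values of the top two hits in the table above.
top_bits, second_bits = 117, 69
predicted = evalue_ratio(top_bits, second_bits)   # 2**-48
reported = 3e-25 / 1e-10                          # ratio of reported E values

print(f"predicted ratio {predicted:.2e}, reported ratio {reported:.2e}")
```

A 48-bit difference corresponds to roughly fifteen orders of magnitude in E value, which is why the Synechocystis MutS#2 hit stands apart from the other MutS homologs.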
13. TIGRTIGR
H. pylori and MutS
• Prior to this genome, all species that
encoded a MutS homolog also encoded a
MutL homolog
• Experimental studies have shown MutS and
MutL always work together in mismatch
repair
• Problem: what do we conclude about H.
pylori mismatch repair?
15. TIGRTIGR
MutS Alignment
[Figure: multiple sequence alignment of conserved motifs I, II, III, and IV from 18 MutS family proteins: MSH6 (Yeast, Mouse), MSH3 (Human, Yeast), MutS (Aquae, Bacsu, Synsp), MSH1 (Pombe, Yeast), MSH2 (Human, Yeast), MutS2 (Synsp, Aquae, Bacsu), MSH4 (Human, Yeast), MSH5 (Human, Yeast)]
16. TIGRTIGR
Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of MutS family proteins. Tip labels are species abbreviations, including Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, and mSaco]
20. TIGRTIGR
Overlaying Functions onto Tree
[Figure: the same phylogenetic tree of MutS family proteins, with subfamily names overlaid on the clades: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]
21. TIGRTIGR
Functional Prediction Using Tree
[Figure: the same phylogenetic tree of MutS family proteins, with the known function of each subfamily overlaid on its clade]
MSH1: Repair in Mitochondria
MSH3: Repair of Loops in Nucleus
MSH6: Repair of Mismatches in Nucleus
MutS1: Repair of Loops and Mismatches
MSH4: Meiotic Crossing-Over
MSH5: Meiotic Crossing-Over
MutS2: Unknown Functions
MSH2: Repair of Loops and Mismatches in Nucleus
23. TIGRTIGR
Why was the MutS2 Family Missed?
Blast Search of Syn. sp. MutS#2
Sequences producing significant alignments:             Score (bits)  E value
sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT    91            3e-17
sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN      87            4e-16
sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS   83            1e-14
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS   81            3e-14
sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS   81            4e-14
sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA   80            5e-14
• Blast search pulls up standard MutS genes
but with only a moderate E value (about 10^-17)
24. TIGRTIGR
Problems with Similarity Based
Functional Prediction
• Prone to database error propagation.
• Cannot identify orthologous groups reliably.
• Perform poorly in cases of evolutionary rate
variation and non-hierarchical trees (similarity will
not reflect evolutionary relationships in these cases)
• May be misled by modular proteins or large
insertion/deletion events.
• Are not set up to deal with expanding data sets.
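The rate-variation problem listed above can be shown with a toy case. This sketch assumes a hypothetical gene tree in which a duplication produced subfamilies A and B, and subfamily A evolves much faster than B. All names and branch lengths are invented, and tree (patristic) distance stands in for sequence dissimilarity.

```python
# Hypothetical rooted gene tree as child -> (parent, branch_length).
# A duplication at the root gives subfamilies A and B; A evolves fast.
PARENT = {
    "A": ("root", 0.5), "B": ("root", 0.5),
    "1A": ("A", 5.0), "2A": ("A", 5.0),   # species 1 and 2, subfamily A
    "1B": ("B", 0.5), "2B": ("B", 0.5),   # species 1 and 2, subfamily B
}

def path_to_root(leaf):
    """Map the leaf and each of its ancestors to its distance from the leaf."""
    dist, node, out = 0.0, leaf, {leaf: 0.0}
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        out[parent] = dist
        node = parent
    return out

def patristic(a, b):
    """Tree distance between two leaves via their closest shared ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    return min(pa[n] + pb[n] for n in pa if n in pb)

def best_hit(query, candidates):
    """The candidate closest to the query, i.e. its 'top hit'."""
    return min((c for c in candidates if c != query),
               key=lambda c: patristic(query, c))

LEAVES = ["1A", "2A", "1B", "2B"]
print(best_hit("1A", LEAVES))  # a B-subfamily paralog, not the ortholog 2A
```

In this setup the ortholog of 1A is 2A, yet the closest sequence is a slowly evolving paralog, so a top-hit rule would assign 1A to the wrong subfamily while the tree places it correctly.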
28. TIGRTIGR
Alkylation Repair Genes
[Figure: domain architecture of alkylation repair proteins]
Domains:
AlkA domain (O6-Me-G glycosylase)
Ogt domain (O6-Me-G alkyltransferase)
Ada domain (transcriptional regulator)
Proteins shown: Ada E. coli, Ada H. infl, Ogt E. coli, Ogt H. infl, Ogt Gram+, Ogt D. radio, AlkA Gram+, AlkA E. coli, MGMT Euks
29. TIGRTIGR
Evolutionary Method
[Figure: PHYLOGENETIC PREDICTION OF GENE FUNCTION. Flowchart of the method, worked through for Example A (genes 1 to 6) and Example B (genes 1A, 2A, 3A, 1B, 2B, 3B) from Species 1, Species 2, and Species 3, alongside the actual evolution (assumed to be unknown). Possible duplications are marked "Duplication?" on the trees, and one prediction is marked "Ambiguous".]
Steps in the method:
1. Choose gene(s) of interest
2. Identify homologs
3. Align sequences
4. Calculate gene tree
5. Overlay known functions onto tree
6. Infer likely function of gene(s) of interest
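The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched as follows. This is a simplified illustration, not the procedure used in the talk: the tree, the gene names, the function labels, and the rule "take the function shared by the smallest annotated clade containing the query" are all assumptions made for the example.

```python
# Hypothetical gene tree as nested tuples; leaves are gene names.
TREE = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
# Hypothetical experimentally determined functions for some genes.
KNOWN = {"1A": "repair", "3A": "repair", "1B": "transport", "3B": "transport"}

def leaf_set(node):
    """Set of gene names under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node))

def predict(tree, query, known):
    """Walk from the root toward the query; return the function shared by the
    smallest clade that contains the query and at least one annotated gene,
    or "ambiguous" if that clade mixes functions."""
    node, answer = tree, "ambiguous"
    while not isinstance(node, str):
        funcs = {known[g] for g in leaf_set(node) if g in known and g != query}
        if not funcs:
            break                      # no annotated relatives this deep
        answer = funcs.pop() if len(funcs) == 1 else "ambiguous"
        node = next(child for child in node if query in leaf_set(child))
    return answer

print(predict(TREE, "2A", KNOWN))
print(predict(TREE, "2B", KNOWN))
```

A query that sits outside both annotated subfamilies gets "ambiguous" rather than a guess, which mirrors the ambiguous case in the figure.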
33. TIGRTIGR
UvrA Gene Family
• UvrA has conserved role in nucleotide excision repair in
bacteria (part of UvrABCD complex)
• UvrA homologs found in all complete bacterial genomes
• Some UvrA homologs have been found to be involved in
resistance to DNA damaging antibiotics
• UvrA accumulates at membrane after DNA damage
• All UvrAs are members of the ABC transporter family
• Possible role in DNA damage export?
34. TIGRTIGR
UvrAs in D. radiodurans
• UvrA homolog in D. radiodurans shown to be
part of UV endonuclease α complex
• D. radiodurans genome sequence reveals a second
UvrA gene - on the large megaplasmid
• D. radiodurans known to export DNA repair
products (e.g., damaged bases) out of cell after
damage
• Export may be important for radiation resistance
(Battista 1997)
35. TIGRTIGR
UvrA Evolution
• Originated by gene duplication of an ABC transporter
• Subsequently, there was a tandem duplication of the
ABC transporter motif within UvrA
• Ancient duplication into UvrA1 and UvrA2
subfamilies
• UvrA1s - conserved role in NER
• UvrA2s - transport of DNA damage?
• UvrA2 in D. radiodurans may be from lateral transfer
36. TIGRTIGR
Evolution of UvrA Family
[Figure, two panels]
A. ABC Transporters: tree with branches OppDF; UUP; NodI; LivF; XylG; NrtDC; PstB; MDR; HlyB; TAP1; CFTR, SUR; UvrA1; UvrA2
B. UvrA Subfamily: tree showing a duplication in the UvrA family that separates the UvrA1 and UvrA2 groups
UvrA2 group: UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans
UvrA1 group: UvrA H. influenzae, UvrA E. coli, UvrA N. gonorrhoeae, UvrA R. prowazekii, UvrA S. mutans, UvrA S. pyogenes, UvrA S. pneumoniae, UvrA B. subtilis, UvrA M. luteus, UvrA M. tuberculosis, UvrA M. thermoautotrophicum, UvrA H. pylori, UvrA C. jejuni, UvrA P. gingivalis, UvrA C. tepidum, UvrA1 D. radiodurans, UvrA T. thermophilus, UvrA T. pallidum, UvrA B. burgdorferi, UvrA T. maritima, UvrA A. aeolicus, UvrA Synechocystis sp.
43. TIGRTIGR
Unusual Features of D. radiodurans
DNA Repair Genes
Process                       Genes
Nucleotide excision repair    Two UvrAs
Base excision repair          Four MutY-Nths
Recombination                 RecD but not RecBC
Replication                   Four Pol genes
dNTP pools                    Many MutTs, two RRases
Other                         UVDE
44. TIGRTIGR
Problem:
List of DNA repair gene homologs
in D. radiodurans genome is not
significantly different from other
bacterial genomes of similar size
45. TIGRTIGR
Repair Studies in Different Species
(determined by Medline searches as of 1998)
Humans 7028
E. coli 3926
S. cerevisiae 988
Drosophila 387
B. subtilis 284
S. pombe 116
Xenopus 56
C. elegans 25
A. thaliana 20
Methanogens 16
Haloferax 5
Giardia 0
47. TIGRTIGR
Evolution of Uracil Glycosylase
• Ung activity has evolved many times (many
non-homologous proteins have uracil-DNA
glycosylase activity)
• Therefore, absence of homologs of these genes
should not be used to infer likely absence of
activity
• However, presence of homologs of Ung and MUG
genes can be used to indicate presence of activity
because all homologs of these genes have this
activity
48. TIGRTIGR
Evolution of Photoreactivation
• All known enzymes that perform photoreactivation are part of
a single large photolyase gene family
• Some members of the family do not function as photolyases,
but instead work as blue-light receptors
• If a species does not encode a member of the photolyase gene
family, it likely does not have photoreactivation capability
• If a species encodes a photolyase, one cannot conclude it has
photolyase activity
• Position of photolyase homologs within photolyase tree helps
predict what activities they have
49. TIGRTIGR
Evolution of Alkyltransferases
• All known alkyltransferases share a conserved,
homologous alkyltransferase domain
• Therefore, if a species does not encode any
protein with this domain, it likely does not have
alkyltransferase activity
• If a species does encode a member of this gene
family, it likely has alkyltransferase activity
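The three repair families above differ in what the presence or absence of a homolog allows one to conclude. A small lookup table makes the asymmetry explicit. This is a sketch only: the family keys, the wording of the conclusions, and the `predict_activities` helper are illustrative labels, not an established schema.

```python
# What a homolog's presence or absence lets one conclude, following the
# slides on uracil glycosylase, photoreactivation, and alkyltransferases.
RULES = {
    # Activity evolved many times, so absence of Ung/MUG homologs says little.
    "uracil_glycosylase": {"present": "activity likely",
                           "absent": "no conclusion"},
    # One gene family, but some members are blue-light receptors.
    "photolyase":         {"present": "no conclusion",
                           "absent": "activity unlikely"},
    # One conserved domain, and members share the activity.
    "alkyltransferase":   {"present": "activity likely",
                           "absent": "activity unlikely"},
}

def predict_activities(homologs_found):
    """homologs_found maps family name -> bool (homolog detected in genome)."""
    return {family: RULES[family]["present" if found else "absent"]
            for family, found in homologs_found.items()}

print(predict_activities({"uracil_glycosylase": False,
                          "photolyase": True,
                          "alkyltransferase": False}))
```

For the photolyase family, a "no conclusion" result is the cue to look at where the homolog falls in the photolyase tree.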
51. TIGRTIGR
Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Aids functional predictions
• Recent duplications may be indicative of
species-specific adaptations
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in
different parts of genome
52. TIGRTIGR
MurA Homologs in A. thaliana
[Figure: phylogenetic tree of MurA homologs in A. thaliana. Each tip label has the form "ARATH <chromosome> <clone>.<gene number>", for example ARATH I F5F19.7; the homologs lie on chromosomes I, II, IV, and V]
53. TIGRTIGR
MurA Homologs in A. thaliana
colored by chromosome
[Figure: the same tree of MurA homologs, with tip labels colored by chromosome. Labels by chromosome:]
ARATH I: F5F19.7, F22O13.23, F1N21.16
ARATH V: T21B04.1, T21B04.16, T19K24.12, T19K24.13, T19K24.17, T21B04.11, T21B04.14, T21B04.10, T21B04.13, T21B04.12, T19K24.15, T19K24.16, T19K24.11, T19K24.10, T21B04.15, T19K24.14
ARATH IV: F7N22.13, T3H13.11, T24H24.6, T3H13.13, T24M8.2, F7N22.10, T3F12.3, T7M24.1, T3F12.8, F9D12.2, F28J12.70, T3F12.12
ARATH II: F26C24.7, F23M2.29, T13H18.11, F9B22.14, F27C21.3, T9F8.6, T13P21.2, F5O4.4, T13E11.2, T13E11.15, F27L4.10, F26B6.15, F23M2.24, F9B22.8, T13P21.20, T13E11.10, T13P21.21, T13P21.3, T13E11.20, T13E11.21, T13E11.9, T11J7.3
54. TIGRTIGR
Recent Duplications
• Gene duplication is frequently accompanied
by functional divergence
• Evolutionary analysis can identify recent
duplications with no bias towards type of
gene
• Location of duplicates can help identify
mechanisms of duplication
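As one way to use the location of duplicates, the sketch below groups family members that sit next to each other on the same clone, which would be consistent with tandem duplication. It assumes tip labels of the form "ARATH <chromosome> <clone>.<gene number>" and, as a further assumption, that consecutive gene numbers on a clone mean physically adjacent genes. The `clusters` helper and its `max_gap` parameter are hypothetical.

```python
import re
from collections import defaultdict

# A few tip labels from the MurA tree, in cleaned-up form.
LABELS = ["ARATH V T21B04.10", "ARATH V T21B04.11", "ARATH V T21B04.12",
          "ARATH II T13E11.20", "ARATH II T13E11.21", "ARATH I F5F19.7"]

def clusters(labels, max_gap=1):
    """Group genes on the same clone whose gene numbers differ by <= max_gap."""
    by_clone = defaultdict(list)
    for label in labels:
        m = re.fullmatch(r"ARATH (\w+) (\w+)\.(\d+)", label)
        if m is None:
            continue                      # skip labels in another format
        by_clone[(m.group(1), m.group(2))].append(int(m.group(3)))
    out = []
    for key, nums in sorted(by_clone.items()):
        nums.sort()
        run = [nums[0]]
        for n in nums[1:]:
            if n - run[-1] <= max_gap:
                run.append(n)             # extend the current run
            else:
                if len(run) > 1:
                    out.append((key, run))
                run = [n]                 # start a new run
        if len(run) > 1:
            out.append((key, run))
    return out

for (chromosome, clone), genes in clusters(LABELS):
    print(chromosome, clone, genes)
```

Runs of adjacent family members on one clone point toward tandem duplication, whereas related copies on different chromosomes suggest another mechanism.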
56. TIGRTIGR
Expansion of MCP Family in V. cholerae
[Figure: neighbor-joining (NJ) tree of MCP homologs from complete genomes: E. coli, B. subtilis, Synechocystis sp., H. pylori, H. pylori 99, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis, and V. cholerae. Most tips are labeled by species and gi number or gene name; the V. cholerae genes (VC and VCA identifiers) form a large group, and asterisks mark a subset of branches]
58. TIGRTIGR
Levels of Paralogy Within A Genome
• All
– All members of a gene family are linked together
• Top matches
– Only top matching pairs are linked together.
Therefore, if in a large gene family, only the pair
from the most recent duplication event is included
• Recent
– Operational definition based on comparison to other
species. Only pairs which are more similar to each
other than to selected other species are included.
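The three levels of paralogy can be written as three filters over the same set of within-genome matches. This is a minimal sketch under stated assumptions: the scores are invented, each gene's best score against the selected other species is taken as given, and the function names are hypothetical.

```python
# Hypothetical within-genome similarity scores between gene pairs, plus each
# gene's best score against a selected other species (all values invented).
PAIRS = {("g1", "g2"): 900, ("g1", "g3"): 300, ("g2", "g3"): 310}
BEST_OTHER_SPECIES = {"g1": 500, "g2": 450, "g3": 400}

def all_pairs(pairs):
    """'All': every linked pair in the gene family."""
    return set(pairs)

def top_pairs(pairs):
    """'Top matches': keep a pair only if it is the best within-genome
    match of at least one of its two members."""
    best = {}
    for (a, b), score in pairs.items():
        for gene in (a, b):
            if gene not in best or score > best[gene][1]:
                best[gene] = ((a, b), score)
    return {pair for pair, _ in best.values()}

def recent_pairs(pairs, best_other):
    """'Recent': the pair is more similar to each other than either member
    is to anything in the selected other species."""
    return {(a, b) for (a, b), score in pairs.items()
            if score > best_other[a] and score > best_other[b]}

print(len(all_pairs(PAIRS)), len(top_pairs(PAIRS)),
      len(recent_pairs(PAIRS, BEST_OTHER_SPECIES)))
```

Each definition is stricter than the one before, so the number of plotted points shrinks from "All" to "Top" to "Recent".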
59. TIGRTIGR
C. pneumoniae Paralogs - All
[Figure: dot plot of paralog pairs. x axis: Query ORF Position (0 to 1,250,000); y axis: Subject ORF Position (0 to 1,250,000)]
60. TIGRTIGR
C. pneumoniae Paralogs - Top
[Figure: dot plot of paralog pairs. x axis: Query ORF Position (0 to 1,250,000); y axis: Subject ORF Position (0 to 1,250,000)]
72. TIGRTIGR
Why Gene Loss is Useful to Identify
• Indicates that gene is not absolutely required for
survival
• Helps distinguish likelihood of gene transfers
• Correlated loss of same gene in different species
may indicate selective advantage of loss of that
gene
• Correlated loss of genes in a pathway indicates a
conserved association among those genes
73. TIGRTIGR
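Example of Tracing Gene Loss
[Figure: species tree with the presence or absence of a gene marked for each species, and the evolutionary origin of the gene and an inferred loss marked on the tree. Kingdoms: Euks, Arch, Bacteria. Species abbreviations: MT MJ SC HS AA DR TA BS MG MP BB TP HP HI EC SS MT]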
75. TIGRTIGR
Need for Phylogenomics Example:
Gene Duplication and Loss
• Genome analysis required to determine number of
homologs in different species
• Evolutionary analysis required to divide into
orthology groups and identify gene duplications
• Genome analysis is then required to determine
presence and absence of orthologs
• Then loss of orthologs can be traced onto
evolutionary tree of species
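The final step, tracing ortholog loss onto the species tree, can be sketched with a Dollo-style rule: the gene is assumed to arise once and can only be lost afterwards, so lateral transfer is ignored. The species tree, the presence/absence values, and the helper names are hypothetical, and at least one species is assumed to have the gene.

```python
# Hypothetical species tree (nested tuples) and ortholog presence/absence.
SPECIES_TREE = ((("EC", "HI"), "HP"), ("BS", ("MG", "MP")))
PRESENT = {"EC": True, "HI": True, "HP": False,
           "BS": True, "MG": False, "MP": False}

def tips(node):
    """List of species under a node."""
    return [node] if isinstance(node, str) else [t for c in node for t in tips(c)]

def origin(node, present):
    """Smallest clade containing every species that has the gene."""
    if not isinstance(node, str):
        for i, child in enumerate(node):
            others = [t for j, c in enumerate(node) if j != i for t in tips(c)]
            if not any(present[t] for t in others):
                return origin(child, present)   # gene confined to this child
    return node

def losses(node, present):
    """Maximal all-absent subclades inside a clade: one inferred loss each."""
    if not any(present[t] for t in tips(node)):
        return [tuple(tips(node))]
    if isinstance(node, str):
        return []
    return [loss for child in node for loss in losses(child, present)]

clade = origin(SPECIES_TREE, PRESENT)
print(losses(clade, PRESENT))
```

Absence in both MG and MP counts as a single loss on their shared branch rather than two independent losses, which is what mapping onto the tree adds over a flat presence/absence table.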
78. TIGRTIGR
Species Distribution of Homologs of
D. radiodurans Genes
[Figure: four histograms. x axis: Number of Species With High Hits (0 to 20); y axis: Frequency. Panel labels: PapaBear, MamaBear, BabyBear, E. coli]
79. TIGRTIGR
Megaplasmid I:
Iron Utilization/Iron Transport
ORFB040 Na+/H+ antiporter
ORFB042 iron ABC transporter, ATP-binding protein
ORFB044 iron ABC transporter, permease protein
ORFB045 iron ABC transporter, permease protein
ORFB046 iron-chelator utilization protein
ORFB047 iron ABC transporter, periplasmic substrate bp
ORFB067 putative metal binding protein
ORFB141 iron-chelator utilization protein
ORFB074 hemin ABC transporter, periplasmic hemin bp
ORFB075 hemin ABC transporter, permease protein
ORFB076 hemin ABC transporter, ATP-binding protein
80. TIGRTIGR
Specialized Genetic Elements
(Chromosome II and Megaplasmid)
• Many two component systems
• Nitrogen metabolism
• LexA
• Ribonucleotide reductase
• UvrA2
• Many transcription factors (e.g., HepA)
• Iron metabolism
82. TIGRTIGR
V. cholerae vs. E. coli All Hits
[Figure: dot plot. x axis: V. cholerae Coordinates (0 to 3,000,000); y axis: E. coli Coordinates (0 to 5,000,000)]
83. TIGRTIGR
V. cholerae vs. E. coli Top Hits
[Figure: dot plot. x axis: V. cholerae Coordinates (0 to 3,000,000); y axis: E. coli Coordinates (0 to 5,000,000)]
84. TIGRTIGR
V. cholerae vs. E. coli
Only if EC-Orf is Closest in All Genomes
[Figure: dot plot. x axis: V. cholerae Coordinates (0 to 3,000,000); y axis: E. coli Coordinates (0 to 5,000,000)]
85. TIGRTIGR
V. cholerae vs. E. coli Proteins
Top
[Figure: plot against V. cholerae ORF Coordinates; tick values 0 to 4,000,000]
86. TIGRTIGR
V. cholerae vs. E. coli F+R
[Figure: plot with tick values 0 to 5,000,000; legend: Bert, Ecoli R, Ecoli]
89. TIGRTIGR
C. trachomatis vs C. pneumoniae Dot Plot
[Figure: dot plot with axes C. trachomatis MoPn and C. pneumoniae AR39; the origin and termination of replication are marked]
90. TIGRTIGR
Duplication and Gene Loss Model
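[Figure: model in which an ancestral gene order A, B, C, D, E, F is duplicated to give a second copy A', B', C', D', E', F', after which different genes are lost in each lineage. Retained sets shown: A, C, D, F, A', B', E' and B, C, D, F, A', B', D', E'. Lineages labeled: E. coli, V. cholerae]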
100. TIGRTIGR
Examples of Horizontal Transfers
• Antibiotic resistance genes on plasmids
• Insertion sequences
• Pathogenicity islands
• Toxin resistance genes on plasmids
• Agrobacterium Ti plasmid
• Viruses and viroids
• Organelle to nucleus transfers
101. TIGRTIGR
Why Gene Transfers Are Useful to Identify
• Laterally transferred genes frequently involved in
environmental adaptations and/or pathogenicity
• Helps identify transposons, integrons, and other
vectors of gene transfer
• Helps identify species associations in the
environment
103. TIGRTIGR
How to Infer Gene Transfers
• Unusual distribution patterns
• Unusual nucleotide composition
• High sequence similarity to supposedly
distantly related species
• Unusual gene trees
• Observe transfer events
104. TIGRTIGR
Inferring Lateral Transfers
Observation                          Other Causes                           Always Occurs
Unusual distribution                 Sampling bias                          Not if recipient already has gene.
Unusual GC/codons                    Selection                              Not if donor/recipient similar.
                                                                            Not if it occurred long ago.
High hit to "distant" species        Selection, rate variation, gene loss   Usually.
Incongruent trees                    Bad trees, missed paralogs             Usually.
Correlation of above with neighbors  Selection                              Only if genes keep order after transfer.
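The "unusual GC" observation can be screened for with a simple per-gene statistic. This is a minimal sketch, not a validated detector: the z-score cutoff of 2 is an arbitrary assumption, the function names are hypothetical, and, as the table notes, composition can also be unusual because of selection and will look normal if donor and recipient are similar or the transfer is old.

```python
from statistics import mean, pstdev

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def unusual_genes(genes, z_cutoff=2.0):
    """Return names of genes whose GC content lies more than z_cutoff
    standard deviations from the mean over all genes supplied."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []                      # all genes identical in GC content
    return [name for name, v in gc.items() if abs(v - mu) / sigma > z_cutoff]

# Toy genome: nine genes at 40% GC and one at 100% GC.
toy = {f"gene{i}": "ATGCATGCAT" for i in range(9)}
toy["island"] = "GCGCGCGCGC"
print(unusual_genes(toy))
```

A flagged gene is a candidate only; the table's other lines of evidence (distribution, gene trees, neighboring genes) are needed before calling it a transfer.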
105. TIGRTIGR
E. coli and S. typhimurium Transfer
[Figure: two panels, Old Model and New Model, each relating E. coli and S. typhimurium]
106. TIGRTIGR
PGK
Neighbor-joining; bootstrap; 50% majority rule consensus; outgroup = Archaea
[Figure: two trees of the same 17 taxa: T. maritima, M. genitalium, M. pneumoniae, A. aeolicus, B. burgdorferi, T. pallidum, B. subtilis, Synechocystis, E. coli, H. influenzae, H. pylori, M. tuberculosis, S. cerevisiae, A. fulgidus, M. jannaschii, M. thermoauto, P. horikoshii. Bootstrap values shown on branches: 89, 57, 100, 59, 58, 58, 100, 83, 100]
107. TIGRTIGR
Archaeal genes in bacterial genomes*
Bacterial species          Best hits to Archaeal
Thermotoga maritima        451 (24%)
Aquifex aeolicus           246 (16%)
Synechocystis sp.          126 (4%)
Borrelia burgdorferi       45 (3.6%)
Escherichia coli           99 (2.3%)
* 10^-5 over 60% of sequence
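A count like the one in the table can be sketched as a filter over search hits. The reading of the footnote used here is an assumption: a hit counts only if its E value is at most 10^-5 and the match covers at least 60% of the sequence. The tuple layout, the `ARCHAEA` set, and the function name are hypothetical.

```python
# Example archaeal species names; any real analysis would need the full list.
ARCHAEA = {"A. fulgidus", "M. jannaschii", "P. horikoshii"}

def archaeal_best_hits(hits, max_evalue=1e-5, min_coverage=0.60):
    """hits: iterable of (query_gene, subject_species, evalue, coverage).
    Count queries whose best hit passing both cutoffs is to an archaeon."""
    best = {}
    for gene, species, evalue, coverage in hits:
        if evalue > max_evalue or coverage < min_coverage:
            continue                              # fails a cutoff
        if gene not in best or evalue < best[gene][1]:
            best[gene] = (species, evalue)        # keep the most significant
    return sum(1 for species, _ in best.values() if species in ARCHAEA)

# Invented hits for three query genes.
HITS = [
    ("tm1", "P. horikoshii", 1e-40, 0.90),
    ("tm1", "E. coli",       1e-20, 0.90),
    ("tm2", "A. fulgidus",   1e-50, 0.30),   # significant but too short
    ("tm2", "B. subtilis",   1e-10, 0.80),
    ("tm3", "M. jannaschii", 1e-3,  0.90),   # not significant enough
]
print(archaeal_best_hits(HITS))
```

As the preceding table on lateral transfers cautions, a best hit to a distant species can also result from rate variation or gene loss, so these counts are a starting point rather than proof of transfer.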
108. TIGRTIGR
Evidence for lateral gene transfer in
Thermotoga
1. 81 archaeal-like genes are clustered in 15 regions which
range in size from ~ 4 to 20 kb; many share conserved gene
order with their archaeal counterparts.
2. Many of the archaeal-like genes correspond to regions with
a significantly different base composition than the rest of
the chromosome.
3. Some of these regions are associated with a 30 bp repeat
structure found only in thermophiles.
4. Initial phylogenetic analyses of some of these genes lend
support to lateral gene transfer.
114. TIGRTIGR
N. meningitidis hits vs. genome size
[Figure: scatter plot titled "Proteome Comparison of N. meningitidis to other Complete Genomes". One axis: Number of N. meningitidis ORFs that have a significant hit (0 to 1250); other axis: Total ORFs in Genome (0 to 5000). Labeled points: Archaea, H. influenzae, V. cholerae, E. coli]
123. TIGRTIGR
Deinococcus-Thermus Comparison
• Took all available T. thermophilus proteins
• Searched against database of all available
complete genomes (including D. radiodurans)
• Identified the gene with the most significant FASTA match (best p value)
• Phylogenetic analysis of all genes with >4
homologs
125. TIGRTIGR
Significance of Deinococcus-
Thermus Relationship
• Mechanisms of extreme heat, radiation, and
desiccation resistance may be similar
• Complete genome of Thermus will be very
useful in identifying novel genes in
Deinococcus
• Shows utility of incomplete genome
sequences.
126. TIGRTIGR
Outline of Phylogenomics
[Diagram with boxes: Database; Species tree, Presence/Absence, Gene trees; Congruence, Evol. Distribution; Gene Evolution Events; F(x) Predictions, Pathway Evolution, Phenotype Predictions]
127. TIGRTIGR
Steps in Phylogenomic Analysis
• Create database of genes of interest
• Presence/absence of homologs in complete genomes
• Phylogenetic trees of each gene family
• Infer evolutionary events (gene origin, duplication, loss and
transfer)
• Refine presence/absence (orthologs, paralogs, subfamilies)
• Functional predictions and functional evolution
• Analysis of pathways
128. TIGRTIGR
Phylogenomics I:
Presence/Absence of Homologs
• Important to have complete genomes
• Similarity searches with high “homology
threshold” (to prevent false positives)
• Iterative searches (to prevent false negatives)
• Multiple sequence alignments to confirm
assignment of homology and to divide up
multi-domain proteins
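A presence/absence table built with a strict similarity threshold might look like the sketch below. The input layout, the function name, and the `max_evalue` cutoff are assumptions for illustration; the slides do not give a specific threshold for this step.

```python
def presence_absence(hits, families, genomes, max_evalue=1e-10):
    """Build a family x genome table of booleans.

    hits: iterable of (family, genome, evalue) tuples (hypothetical layout).
    A family counts as present in a genome only if at least one hit
    passes the strict E-value cutoff, which limits false positives.
    """
    table = {fam: {g: False for g in genomes} for fam in families}
    for fam, genome, evalue in hits:
        if fam in table and genome in table[fam] and evalue <= max_evalue:
            table[fam][genome] = True
    return table


# Toy hits (made-up values): famY has only a weak hit in genome2.
hits = [
    ("famX", "genome1", 1e-40),
    ("famX", "genome2", 1e-25),
    ("famY", "genome1", 1e-30),
    ("famY", "genome2", 1e-3),
]
table = presence_absence(hits, ["famX", "famY"], ["genome1", "genome2"])
for fam, row in table.items():
    print(fam, row)
```

Absences found this way are provisional: as the slide notes, iterative searches and alignments are still needed to rule out false negatives.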
129. TIGRTIGR
Phylogenomics II:
Phylogenetic Analysis of Homologs
• Multiple sequence alignment
• Mask alignment (exclude certain regions)
– ambiguous regions of alignment
– hypervariable regions and regions with large gaps
• Phylogenetic tree with method of choice
• Robustness checks
– bootstrapping
– compare trees with different alignments
– compare trees with different tree-building methods
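The masking and bootstrapping steps above can be sketched on an alignment stored as a dictionary of equal-length strings. This is a minimal sketch under stated assumptions: the gap-fraction rule stands in for whatever masking criteria are actually used, and both function names are hypothetical.

```python
import random


def mask_alignment(alignment, max_gap_fraction=0.5):
    """Drop columns in which more than max_gap_fraction of sequences have a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(alignment[n][i] == "-" for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in cols) for n in names}


# Toy alignment (made up): the third column is mostly gaps.
aln = {"a": "MK-LV", "b": "MR-LV", "c": "M-ALI"}
masked = mask_alignment(aln)
print(masked)
replicate = bootstrap_columns(masked, random.Random(0))
print(replicate)
```

Each bootstrap replicate would be passed to the tree-building method of choice, and the fraction of replicate trees containing a given clade is that clade's bootstrap support.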
130. TIGRTIGR
Phylogenomics III:
Inferring Evolutionary Events
• Infer evolutionary distribution patterns (overlay
presence/absence onto species tree)
• Compare gene tree vs. species tree
• Compare gene tree vs. evolutionary distribution
• Infer gene duplication and transfer events
• Combine gene transfer and duplication
information with evolutionary distribution
analysis to infer gene loss, gene origin, and
timing of gene duplications and transfers
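Overlaying presence/absence onto a species tree can be sketched with Fitch parsimony on a binary tree. This is one possible approach, not necessarily the one used for these slides: the nested-tuple tree format, the function names, and the rule that an ambiguous root is treated as "absent" are all assumptions.

```python
def annotate(node, leaf_states):
    """Bottom-up Fitch pass. Returns (name, state_set, children)."""
    if isinstance(node, str):
        return (node, {leaf_states[node]}, ())
    left = annotate(node[0], leaf_states)
    right = annotate(node[1], leaf_states)
    shared = left[1] & right[1]
    return (None, shared or (left[1] | right[1]), (left, right))


def leaves(ann):
    """Leaf names under an annotated node."""
    name, _, kids = ann
    return [name] if not kids else [x for k in kids for x in leaves(k)]


def events(ann, parent_state=None, out=None):
    """Top-down pass. Returns a list of ("gain" | "loss", leaves_below_branch)."""
    if out is None:
        out = []
    _, states, kids = ann
    if parent_state is not None and parent_state in states:
        state = parent_state
    else:
        state = min(states)  # assumption: an ambiguous root counts as absent
    if parent_state is not None and state != parent_state:
        out.append(("gain" if state else "loss", leaves(ann)))
    for kid in kids:
        events(kid, state, out)
    return out


# Toy species tree and presence/absence pattern (made up).
tree = ((("A", "B"), "C"), ("D", "E"))
present = {"A": True, "B": True, "C": False, "D": False, "E": False}
print(events(annotate(tree, present)))
```

Parsimony cannot by itself tell a gain by origin from a gain by transfer. As the slide says, that distinction needs the gene tree compared against the species tree.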
131. TIGRTIGR
Phylogenomics IV:
Functional Predictions and Evolution
• Overlay experimentally determined functions
onto gene tree
• Infer changes in function
– many changes suggest that caution should be used in
making new predictions
• Predict functions based on position in tree
relative to genes with known functions and
based on orthology groups
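Predicting function from position in the gene tree can be sketched as follows: find the smallest clade that contains the query gene and at least one characterized gene, and make a prediction only if the characterized genes in that clade agree. The tree format, the function names, and the toy gene labels are assumptions for illustration.

```python
def leaf_names(node):
    """All leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaf_names(child)]


def predict_function(tree, query, known):
    """Return the function shared by all characterized genes in the smallest
    clade containing the query and any characterized gene, or None if those
    genes disagree (or no characterized gene exists)."""
    if query not in leaf_names(tree):
        raise ValueError("query gene is not in the tree")
    node, prediction = tree, None
    while not isinstance(node, str):
        functions = {known[g] for g in leaf_names(node) if g in known and g != query}
        if not functions:
            break  # no characterized genes this deep; keep the last answer
        prediction = functions.pop() if len(functions) == 1 else None
        node = next(child for child in node if query in leaf_names(child))
    return prediction


# Toy gene tree (made up): q1 and q2 are uncharacterized genes.
known = {"a1": "activation", "a2": "activation", "r1": "recombination"}
tree = ((("a1", "a2"), "q1"), ("r1", "q2"))
print(predict_function(tree, "q1", known))
print(predict_function(tree, "q2", known))
```

Returning None when neighbors disagree reflects the caution on the slide: where function has changed within a clade, a prediction from tree position is not reliable.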
132. TIGRTIGR
Phylogenomics V:
Pathway Analysis
• Correlated presence/absence of all genes in pathway in
different species?
– If not, maybe non-orthologous gene displacement
– Alternatively, pathway may be different between species
• Correlated evolutionary events for genes in pathway
– loss of all genes at once
– correlated duplications?
• Compare evolution of function between pathways
– The number of times an activity has evolved helps in making
predictions of function/phenotype
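The first check above, whether all genes of a pathway are present or absent together, can be sketched as a per-species report. The data layout, the function name, and the species labels are hypothetical; the gene names only echo the MutS/MutL case discussed earlier and carry no real data.

```python
def pathway_report(presence, pathway, species):
    """Classify each species as having the pathway complete, partial, or absent.

    presence: dict mapping gene name -> set of species that have a homolog.
    Returns dict species -> (status, list_of_missing_genes).
    """
    report = {}
    for sp in species:
        missing = [g for g in pathway if sp not in presence.get(g, set())]
        if not missing:
            status = "complete"
        elif len(missing) == len(pathway):
            status = "absent"
        else:
            status = "partial"
        report[sp] = (status, missing)
    return report


# Toy presence data (made up): sp2 has mutS but no mutL.
presence = {"mutS": {"sp1", "sp2"}, "mutL": {"sp1"}}
report = pathway_report(presence, ["mutS", "mutL"], ["sp1", "sp2", "sp3"])
for sp, (status, missing) in report.items():
    print(sp, status, missing)
```

A "partial" result is where the slide's two explanations apply: either a non-orthologous gene fills the gap, or the pathway differs in that species.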
133. TIGRTIGR
Evolution as a Screening
Method
• Gene duplications
• Gene loss
• Lateral gene transfers
• Organellar genes
• Structurally constrained genes
• Correlated evolutionary changes
137. TIGRTIGR
Acknowledgements
• Genome duplications: S. Salzberg, J. Heidelberg, O.
White, A. Stoltzfus, J. Peterson
• Genome sequences and analysis: J. Heidelberg, T.
Read, H. Tettelin, K. Nelson, J. Peterson, R.
Fleischmann
• Horizontal transfers: K. Nelson, W. F. Doolittle
• TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul,
Seqcore
• $$$: DOE, NSF, NIH, ONR
138. TIGRTIGR
• TIGR: J. Heidelberg, T. Read, S. Kaul, M-I. Benito, J. C. Venter,
C. Fraser, S. Salzberg, O. White, K. Nelson, H. Tettelin
• Other people: Mom and Dad, S. Karlin, M. Feldman, A. M. Campbell,
R. Fernald, R. Shafer, D. Ackerly, D. Goldstein, M. Eisen, J. Courcelle,
R. Myers, C. M. Cavanaugh, P. Hanawalt
• $$$: NSF, ONR, DOE, NIH
142. TIGRTIGR
SNF2 Family of Proteins (1995)
• SNF2 family defined by presence of conserved DNA-
dependent ATPase domain
• 100s of proteins
• Diversity of functions:
– transcriptional activation (SNF2)
– transcriptional repression (MOT1)
– recombination (RAD54)
– transcription-coupled repair (CSB)
– post-replication repair (RAD5)
– chromosome segregation (lodestar)
– Many with unknown functions
• Some species have 15+ representatives
143. TIGRTIGR
How to Sort Out Diversity in SNF2 Family
• Presence of additional motifs
– RING fingers
– Bromodomains
– Chromodomains
• Interactions with other proteins
• Evolutionary relationships
– Orthology and paralogy
– Subfamilies
– Relationships among subfamilies
146. TIGRTIGR
What Evolutionary Analysis
Reveals About the SNF2 Family
• Ancient duplication into two lineages may distinguish
genes by type of activity
• Multiple subfamilies with distinct sequences and
functions.
• Presence of particular orthologs can be predicted in
species for which they have not been cloned.
• Predict functions of uncharacterized members by
orthology.
• Addition of motifs to SNF2 domain occurred early in
eukaryotic evolution.
• Many duplications within eukaryotes.
• Classification into subfamilies helps search for
functional motifs