This document summarizes research on analyzing microbial communities and relationships using genomic tools. It discusses how gene profiles and trees can be used to understand microbial identities and relationships. Network approaches are presented as useful for capturing complex relationships between microbes, including lateral gene transfer. Phylogenetic reconciliation and supertree methods are described for inferring evolutionary histories and minimum gene transfer events needed to explain gene tree discordance with species trees. The analysis of one particular microbe, Lachnozilla, is discussed as a case study.
11. Gene transfer matters
PNAS, 2012
“…pathogen-driven inflammatory responses in the gut can generate transient enterobacterial
blooms in which conjugative transfer occurs at unprecedented rates.”
PLoS Biol, 2007
“…lateral gene transfer, mobile elements, and gene amplification have played important roles in
affecting the ability of gut-dwelling Bacteroidetes to vary their cell surface, sense their
environment, and harvest nutrient resources present in the distal intestine.”
17. From profile to distance matrix
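Gene profile: one row per genome (A to F), one column per gene (Gene 1, Gene 2, Gene 3, Gene 4, ..., Gene n)
Per-gene similarity scores: S_1 = 0.91, S_2 = 0.82, S_3 = 0.72, S_4 = 0.89
$$d_{A,B} = 1.0 - \frac{1}{n} \sum_{g=1}^{n} S_g$$
Resulting distance matrix:
      A      B      C
A     0      0.165  0.252
B     0.165  0      0.297
C     0.252  0.297  0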
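The profile-to-distance step can be checked with a few lines of Python. This is a minimal sketch that assumes the four similarity scores on the slide belong to the pair (A, B) and that n = 4; the function name `profile_distance` is hypothetical.

```python
def profile_distance(similarities):
    """Distance between two genomes: 1.0 minus the mean per-gene similarity."""
    if not similarities:
        raise ValueError("need at least one per-gene similarity score")
    return 1.0 - sum(similarities) / len(similarities)

# Scores taken from the slide; assigning them to the pair (A, B) is an assumption.
scores_ab = [0.91, 0.82, 0.72, 0.89]
d_ab = profile_distance(scores_ab)
print(round(d_ab, 3))  # 0.165, the (A, B) entry of the distance matrix
```

Repeating the calculation for every pair of genomes fills in the rest of the matrix.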
18. Neighbor-joining
Start with a ‘star’ tree
At each iteration, split off the pair of taxa that minimizes the total sum
of branch lengths in the tree
Choose groups x and y to minimize the Q-criterion, which combines:
• the distance matrix entry for (x,y)
• the weighted distance from x and from y to all leaves
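The slide's own formula did not survive in this transcript, so the sketch below uses the standard textbook form of the neighbor-joining criterion, Q(x,y) = (n − 2)·d(x,y) − Σ_k d(x,k) − Σ_k d(y,k). The four-taxon distance matrix and the function name `q_criterion` are hypothetical and chosen only for illustration.

```python
def q_criterion(names, d):
    """Return Q(x, y) for every unordered pair of taxa in `names`."""
    n = len(names)
    # Sum of distances from each taxon to all others (d[x][x] is 0).
    row_sum = {x: sum(d[x][k] for k in names) for x in names}
    q = {}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            q[(x, y)] = (n - 2) * d[x][y] - row_sum[x] - row_sum[y]
    return q

# Hypothetical additive distances for the tree ((A,B),(C,D)).
names = ["A", "B", "C", "D"]
d = {
    "A": {"A": 0, "B": 3, "C": 5, "D": 6},
    "B": {"A": 3, "B": 0, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "C": 0, "D": 7},
    "D": {"A": 6, "B": 7, "C": 7, "D": 0},
}
q = q_criterion(names, d)
best_pair = min(q, key=q.get)  # pair to split off from the star tree
print(best_pair, q[best_pair])
```

In this toy matrix the pairs (A,B) and (C,D) tie for the lowest Q value, which is expected for a four-taxon tree: joining either cherry gives the same topology.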
23. Limitations of neighbor-net
• Neighbor-net still imposes a constraint on the
relationships among genomes: “long-distance”
connections cannot be shown
24. Explicit connections between genomes
• Make each genome a vertex in a graph G = (V, E)
V = {A,B,C,D,E,F,…}
E = {{A,B},…}
For some threshold t:
{A,B} ∈ E iff d_{A,B} ≤ t
or if some other condition is satisfied
Figure: vertices A and B joined by an edge with weight w_{A,B}
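The threshold rule can be sketched in Python using the three-genome distance matrix from slide 17. The threshold value t = 0.26 and the choice of edge weight w = 1 − d are assumptions made for this example; the slide does not specify either.

```python
from itertools import combinations

def threshold_graph(dist, t):
    """Build (V, E): connect two genomes when their distance is at most t."""
    vertices = sorted(dist)
    edges = {}
    for a, b in combinations(vertices, 2):
        if dist[a][b] <= t:
            # Assumed weighting: similarity = 1 - distance.
            edges[frozenset((a, b))] = 1.0 - dist[a][b]
    return vertices, edges

dist = {
    "A": {"A": 0.0, "B": 0.165, "C": 0.252},
    "B": {"A": 0.165, "B": 0.0, "C": 0.297},
    "C": {"A": 0.252, "B": 0.297, "C": 0.0},
}
vertices, edges = threshold_graph(dist, t=0.26)
for pair, weight in edges.items():
    print(sorted(pair), round(weight, 3))
```

With this threshold, A is linked to both B and C, while B and C (distance 0.297) stay unconnected. Unlike a tree or a neighbor-net, nothing prevents an edge between any two genomes.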
25. Linear programming
• Weighting networks based on straight genome-genome similarity highlights close relatives and redundancy
• LP introduces a weighting scheme that constrains connections and promotes distinct relationships
26. P. aeruginosa
P. fluorescens
P. lePewtida
P. syringae
P. entomophila
P. stutzeri
P. mendocina
“Plume”
Holloway and Beiko, BMC Evol Biol (2010)
27. Some like it hot
Pyrococcus furiosus
Optimal growth temperature: 100°C
31. Phylogenetic tree reconciliation
Figure: species tree S and gene tree G, related by a lateral gene transfer that is modelled as a subtree prune and regraft (SPR) operation
Whidden et al., Syst Biol (2014)
32.
For two rooted trees, dSPR is equal to the number of components in a maximum agreement forest (MAF), minus 1.
So building a MAF is equivalent to inferring the minimum number of SPR events needed to reconcile a species tree with a gene tree.
The problem is NP-hard.
Example: MAF components = 2, so dSPR = 2 − 1 = 1
Bordewich and Semple, Ann Combinatorics (2005)
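To make the SPR distance concrete, here is a brute-force sketch that finds it by breadth-first search over SPR moves. It is not the MAF-based method cited on the slide, and its cost grows explosively with tree size, so it is only usable on tiny examples. Trees are written as nested 2-tuples of leaf names; all function names are hypothetical.

```python
from collections import deque

def canon(t):
    """Canonical form of a rooted binary tree (child order is ignored)."""
    if isinstance(t, str):
        return t
    return tuple(sorted((canon(t[0]), canon(t[1])), key=repr))

def nodes(t):
    """Yield every node of t, identified by the subtree hanging below it."""
    yield t
    if not isinstance(t, str):
        yield from nodes(t[0])
        yield from nodes(t[1])

def prune(t, s):
    """Cut subtree s out of t and suppress the leftover parent node."""
    if isinstance(t, str):
        return t
    a, b = t
    if a == s:
        return b
    if b == s:
        return a
    return (prune(a, s), prune(b, s))

def regraft(t, target, s):
    """Attach subtree s to the edge directly above node `target`."""
    if t == target:
        return (t, s)
    if isinstance(t, str):
        return t
    return (regraft(t[0], target, s), regraft(t[1], target, s))

def spr_neighbours(t):
    """All trees reachable from t with one subtree prune and regraft."""
    t = canon(t)
    out = set()
    for s in nodes(t):
        if s == t:
            continue  # the whole tree cannot be pruned
        rest = prune(t, s)
        for target in nodes(rest):
            out.add(canon(regraft(rest, target, s)))
    return out

def spr_distance(t1, t2):
    """Minimum number of SPR moves turning t1 into t2 (same leaf set)."""
    start, goal = canon(t1), canon(t2)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for nb in spr_neighbours(tree):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    raise ValueError("trees do not share the same leaf set")

species_tree = (("A", "B"), ("C", "D"))
gene_tree = (("A", ("B", "C")), "D")
print(spr_distance(species_tree, gene_tree))  # 1: move C next to B
```

A single move (pruning C and regrafting it beside B) reconciles the two example trees, which corresponds to a two-component agreement forest: {C} and the tree on {A, B, D}.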
33.
Chris’s algorithm, comparing trees T1 and T2:
Case 1 (separate components)
Case 2 (one pendant node)
Case 3 (several pendant nodes)
34. Fixed-parameter tractability
• Problem is dominated by Case 3 (3 alternatives)
• Cut all candidate edges at each step = linear-time 3-approximation
• Decision problem: $O(2.42^k n)$ to decide if SPR distance ≤ k
• Running time is exponential in the SPR distance, NOT the number of leaves, therefore FPT
Chris Whidden + Norbert Zeh
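A quick calculation shows why the split between k and n matters. The sketch below simply evaluates the bound 2.42^k · n with all constant factors ignored, so the numbers are growth rates rather than running times; the function name `search_bound` is hypothetical.

```python
def search_bound(k, n, base=2.42):
    """Evaluate base**k * n, the growth term of the O(2.42^k n) bound."""
    return base ** k * n

for n in (100, 10_000):
    for k in (5, 10, 20):
        print(f"n={n:>6}  k={k:>2}  bound ~ {search_bound(k, n):.3g}")
```

Multiplying the number of leaves by 100 multiplies the bound by 100, while each additional SPR operation multiplies it by 2.42. Large trees with few transfers therefore remain tractable.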
36. SPR Supertrees
Supertree: a tree that satisfies some optimality criterion with respect to a set of input trees
SPR supertree: given a set of gene trees, find a tree that minimizes the total number of SPR operations vs. all gene trees
Building an SPR supertree: assemble an initial tree, then propose SPR operations and evaluate each candidate’s total SPR distance from the input trees
Whidden et al., 2014
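The search loop described above can be sketched as a simple hill climb. This toy version makes strong assumptions that the published method does not: every gene tree must contain exactly the same leaves as the supertree, SPR distances are computed by exhaustive breadth-first search, and the search stops at the first local optimum. All names are hypothetical.

```python
from collections import deque

def canon(t):
    """Canonical form of a rooted binary tree given as nested 2-tuples."""
    if isinstance(t, str):
        return t
    return tuple(sorted((canon(t[0]), canon(t[1])), key=repr))

def nodes(t):
    yield t
    if not isinstance(t, str):
        yield from nodes(t[0])
        yield from nodes(t[1])

def prune(t, s):
    if isinstance(t, str):
        return t
    a, b = t
    if a == s:
        return b
    if b == s:
        return a
    return (prune(a, s), prune(b, s))

def regraft(t, target, s):
    if t == target:
        return (t, s)
    if isinstance(t, str):
        return t
    return (regraft(t[0], target, s), regraft(t[1], target, s))

def spr_neighbours(t):
    """All trees one SPR move away from t."""
    t = canon(t)
    out = set()
    for s in nodes(t):
        if s != t:
            rest = prune(t, s)
            out.update(canon(regraft(rest, x, s)) for x in nodes(rest))
    return out

def distances_from(tree):
    """Exact SPR distance from `tree` to every tree on the same leaves."""
    start = canon(tree)
    dist, queue = {start: 0}, deque([start])
    while queue:
        cur = queue.popleft()
        for nb in spr_neighbours(cur):
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist

def spr_supertree(gene_trees, initial):
    """Hill-climb over SPR moves to reduce the total SPR distance."""
    tables = [distances_from(g) for g in gene_trees]

    def cost(t):
        return sum(table[t] for table in tables)

    best = canon(initial)
    while True:
        cand = min(spr_neighbours(best), key=lambda t: (cost(t), repr(t)))
        if cost(cand) >= cost(best):
            return best, cost(best)
        best = cand

gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", ("B", "C")), "D"),
]
tree, total = spr_supertree(gene_trees, initial=((("A", "C"), "B"), "D"))
print(tree, total)
```

Here the search settles on ((A,B),(C,D)), which matches two gene trees exactly and is one SPR move from the third. That one leftover move is the kind of event that would be read as a candidate LGT.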
37. Why SPR supertrees?
1. Explicit representation of LGT events
2. Branches broken in the MAF → implied LGT events. Can build a graph of connections
38. 244 bacterial genomes
40,631 gene trees
= Bacterial SPR supertree
LGT patterns for Clostridium
Whidden et al., 2014
41. Phylogenetic profile based on extremely good matches to other genomes (> 95% identity, > 95% coverage) = “recent” LGT events
C. difficile
….
“Virulence-associated protein”
Mobile DNA
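The filtering rule behind such a profile can be sketched as follows. The `Hit` record, its field names, and the example hits are hypothetical; only the two thresholds (more than 95% identity and more than 95% coverage) come from the slide.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    gene: str            # gene in the genome of interest
    genome: str          # other genome containing the match
    pct_identity: float
    pct_coverage: float

def recent_lgt_profile(hits, min_identity=95.0, min_coverage=95.0):
    """Map each gene to the genomes holding a near-identical, near-full-length match."""
    profile = {}
    for hit in hits:
        if hit.pct_identity > min_identity and hit.pct_coverage > min_coverage:
            profile.setdefault(hit.gene, set()).add(hit.genome)
    return profile

# Made-up hits for illustration only.
hits = [
    Hit("gene_1", "genome_X", 99.2, 100.0),
    Hit("gene_1", "genome_Y", 96.0, 80.0),   # fails the coverage cut-off
    Hit("gene_2", "genome_X", 88.5, 99.0),   # fails the identity cut-off
    Hit("gene_3", "genome_Z", 97.1, 98.4),
]
profile = recent_lgt_profile(hits)
print(profile)
```

Genes that keep a match after filtering are the candidates for "recent" transfer; the set of partner genomes per gene is the profile.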
48. Lachnozilla in graph form
(it all makes sense now)
Legionaminic acid
Acetylneuraminic acid
(pathogen associated)
Bacteroides pectinophilus
Butyrivibrio proteoclasticus
Eubacterium plexicaudatum
Roseburia
Neighbors
Weirdly named isolates