Perl for PhyloInformatics What did we learn? What did we do?
Tree Concepts What are phylogenetic trees?
Phylogenetic Trees Describe the historical relationships among lineages of organisms or their parts, such as their genes.
Tree terminology [Figure: tree with tips A to F, labeling operational taxonomic units (OTUs) / taxa, sisters, internal nodes, terminal nodes or tips, root, and branches]
Interpreting phylogenies These trees are the same shape
Rooted vs. unrooted trees [Figure: a rooted and an unrooted tree of taxa A to F] Rooted tree: Has a root that denotes common ancestry. Unrooted tree: Only specifies the degree of kinship among taxa but not the evolutionary path.
Rooted and unrooted trees The number of rooted and unrooted trees for n species is $N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ and $N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
A simple example
Why more rooted than unrooted? On an unrooted tree, the root can be placed on any of the branches.
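The two counting formulas, and the reason rooted trees outnumber unrooted ones, can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the formulas are assumed to count bifurcating trees with labeled tips.

```python
from math import factorial

def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 labeled species."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees for n >= 3 labeled species."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

for n in range(3, 11):
    # An unrooted tree with n tips has 2n - 3 branches,
    # and the root can be placed on any one of them.
    assert rooted_trees(n) == unrooted_trees(n) * (2 * n - 3)
    print(n, unrooted_trees(n), rooted_trees(n))
```

For four species this gives 3 unrooted and 15 rooted trees: each unrooted tree has five branches on which a root can sit.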
Trees and classification
Monophyletic A monophyletic group is a group of organisms which forms a clade, meaning that it consists of an ancestor and all its descendants. (Most clades on our Supertree are monophyletic.)
Paraphyletic A paraphyletic group consists of an ancestor and some, but not all, of its descendants: it excludes species that share a common ancestor with its members.
Polyphyletic A polyphyletic group is one whose members' most recent common ancestor is not a member of the group.
Example: birds and reptiles Reptiles, without the birds, form a paraphyletic group.
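The monophyly definition can be tested mechanically once a tree is in memory. The sketch below assumes a tree written as nested Python tuples and uses a deliberately simplified four-taxon tree invented for illustration; it is not the Supertree from the course.

```python
def tips(tree):
    """Return the set of tip names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))

def clades(tree):
    """Return every clade (an ancestor plus all its descendants) as a tip set."""
    found = {tips(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found

def is_monophyletic(tree, group):
    """A group is monophyletic if its tips are exactly the tips of some clade."""
    return frozenset(group) in clades(tree)

# Hypothetical, simplified tree: birds are sister to crocodiles.
TREE = (("lizards", "snakes"), ("crocodiles", "birds"))

print(is_monophyletic(TREE, {"lizards", "snakes", "crocodiles"}))           # reptiles without birds
print(is_monophyletic(TREE, {"lizards", "snakes", "crocodiles", "birds"}))  # reptiles with birds
```

Telling a paraphyletic group from a polyphyletic one additionally requires knowing whether the most recent common ancestor belongs to the group, which this sketch does not model.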
Change and time
Cladograms and phylograms [Figure: trees of taxa A to F] Phylograms: Branch lengths are proportional to the amount of change that occurred on that branch (these are the gene trees before r8s). Cladograms: Branch lengths are not proportional to the amount of change (this is the Supertree from Monday).
Ultrametric trees If the distance from the root represents time (not change) we can use trees to study how fast new species form. (This is our final tree after we put it all together.)
Types of data What evidence are phylogenetic trees based on?
Distance data Example: DNA-DNA hybridization. The more closely related two species are, the more similar their DNA. The more similar the DNA, the stronger the bond between the two strands, and the shorter the distance.
Morphological characters Example: the shape of spider webs.
Molecular sequence data I am sure you have all heard about DNA sequencing. Amino acid sequences are often used for more distantly related species.
Types of Data Two categories Numerical data Evolutionary distance between two species Usually derived from sequence data Character data Each character has a finite number of states E.g. number of legs = 1, 2, 4 DNA = {A, C, T, G}
Tree reconstruction
Distance methods Types of data Distance matrices: DNA-DNA hybridization Computed from sequences Examples UPGMA is the oldest distance matrix method Neighbor-joining is more commonly used
Distance data When using sequences, distance-based methods must first transform the sequence data into a pairwise distance matrix for use during tree inference.
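Turning aligned sequences into a pairwise matrix can be sketched as follows. The distance used here is the plain proportion of differing sites (the p-distance), which is an assumption made for illustration; real analyses usually apply a substitution-model correction. The three sequences are made up.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def distance_matrix(alignment: dict) -> dict:
    """Map every ordered pair of taxon names to its p-distance."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a in alignment
        for b in alignment
    }

# Hypothetical toy alignment.
ALIGNMENT = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}

matrix = distance_matrix(ALIGNMENT)
print(matrix[("A", "B")], matrix[("A", "C")], matrix[("B", "C")])
```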
Neighbor-Joining Methods 1. Maintain a pairwise distance matrix 2. Find the closest two taxa 3. Collapse them into one row (internal node) and recompute the distance from the merged row to every other row 4. Loop to 2 5. Build the tree as you go
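The loop on this slide (find the closest pair, merge it, recompute distances, repeat) can be sketched in a few lines. Note the assumption: merging the pair with the smallest raw distance and averaging the distances is the UPGMA rule, used here because it is the simplest instance of the loop. Neighbor-joining proper picks the pair with a corrected criterion that also accounts for each taxon's distance to all others. The function name and input layout are illustrative.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by repeatedly merging the closest pair.

    dist maps (name_a, name_b) tuples to distances.
    Returns the tree as nested tuples.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of tips in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Step 2: find the closest two clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Step 3: distance from the merged row to every other row,
        # weighted by how many tips each half contains.
        for other in sizes:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))

# Hypothetical distances between three taxa.
tree = upgma(["A", "B", "C"], {("A", "B"): 0.125, ("A", "C"): 0.375, ("B", "C"): 0.25})
print(tree)
```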
Character methods Types of data Any homologized data: Morphological data Molecular sequences Examples Optimality-criterion methods: Maximum parsimony Maximum likelihood Bayesian methods: MCMC
What is homology? Example: forelimbs Definition Homology means any similarity between characters that is due to their shared ancestry. Anatomical structures that evolved from the same structure in some ancestor species are homologous. In genetics, homology can be observed in aligned DNA sequences.
What is an “optimality criterion”? An optimality criterion is simply a way to quantify, using a number, how well a tree fits the data relative to other trees. Examples are parsimony tree length (this is how the Supertree was optimized on the CIPRES cluster) and likelihood score. The posterior probability can also be seen as an optimality criterion.
Parsimony tree length Tree length is the minimum number of reconstructed changes needed to explain the data on a given tree. The most parsimonious tree is the tree with the fewest changes.
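Tree length can be computed per character and summed over the alignment. The sketch below uses Fitch's algorithm, a standard way to count the minimum number of changes on a binary tree; the slides do not name the method, so treat that choice, the nested-tuple tree format, and the toy alignment as assumptions.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # binary trees only
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: one change must have happened here.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Hypothetical toy alignment and two candidate trees.
ALN = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))

print(tree_length(TREE_1, ALN), tree_length(TREE_2, ALN))
```

Under these assumptions the first tree needs fewer changes than the second, so it is the more parsimonious of the two.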
Finding the optimal tree Under an optimality criterion, trees need to be compared with one another to find the one with the best score (the shortest length under parsimony, the highest likelihood under ML). When we talk about MP and ML trees, this is usually done with hill-climbing algorithms.
…but this is not the whole story! Maximum Parsimony assumes a very simple model for evolutionary change, namely that change is rare. Molecular evolution in particular can be modeled in more realistic ways, using substitution models. There are more complex ways to explore tree space than just hill-climbing (such as the Parsimony Ratchet). We can also sample different areas of tree space to see how optimality is distributed, using MCMC.
Substitution models
Base frequencies and substitution rates
Additional parameters Gamma distribution Invariant sites Perhaps some sites never change. Maybe specify their proportion?
Likelihood and the number of parameters More parameters always lead to a better fit to the data
Likelihood and the number of parameters More parameters always lead to a higher value of the likelihood, whether or not the additional parameters provide a ‘significantly’ better fit to the data
Are the extra parameters justified? Likelihood ratio statistic: $2 \log \left( \frac{\text{Maximum Likelihood} \mid H_1}{\text{Maximum Likelihood} \mid H_0} \right)$ Has a chi-squared distribution with degrees of freedom = number of additional parameters (We did this with ModelTest)
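The likelihood ratio test can be sketched with the standard library alone. The chi-squared tail probability below uses the closed-form series for integer degrees of freedom; the function names and the example log-likelihoods are made up for illustration and are not ModelTest output.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: normal tail plus a finite correction series.
    total = erfc(sqrt(half))
    term = sqrt(2.0 * x / pi) * exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def likelihood_ratio_test(lnl_null: float, lnl_alt: float, extra_params: int):
    """Return (statistic, p-value) for two nested models given log-likelihoods."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical log-likelihoods: the richer model has one extra parameter.
statistic, p_value = likelihood_ratio_test(-1005.0, -1000.0, extra_params=1)
print(statistic, p_value)
```

A small p-value suggests the extra parameters are justified; a large one suggests the simpler model fits about as well.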
How did we use the substitution models? Each substitution has an associated likelihood given a branch of a certain length and the estimated model parameters. A function is derived to represent the likelihood of the data given the tree, branch-lengths and additional parameters. Optimise the branch lengths to get the maximum likelihood estimate.
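As a small worked case of "optimise the branch lengths", the sketch below finds the maximum likelihood distance between two aligned sequences. It assumes the Jukes-Cantor model (equal base frequencies, equal rates), chosen only because it is the simplest substitution model; the slides do not name a model here, and real programs optimise all branches of a tree jointly.

```python
from math import exp, log

def jc69_log_likelihood(t: float, n_sites: int, n_diffs: int) -> float:
    """Log-likelihood of two sequences separated by branch length t.

    t is measured in expected substitutions per site.
    """
    decay = exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # site ends in the same base
    p_diff = 0.25 - 0.25 * decay   # site ends in one specific other base
    n_same = n_sites - n_diffs
    # 0.25 is the equilibrium frequency of the starting base.
    return n_same * log(0.25 * p_same) + n_diffs * log(0.25 * p_diff)

def ml_branch_length(n_sites: int, n_diffs: int, lo=1e-9, hi=10.0, iterations=200):
    """Ternary search for the branch length that maximises the likelihood."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if jc69_log_likelihood(m1, n_sites, n_diffs) < jc69_log_likelihood(m2, n_sites, n_diffs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Hypothetical data: 10 differences in 100 aligned sites.
estimate = ml_branch_length(100, 10)
analytic = -0.75 * log(1.0 - 4.0 * 0.10 / 3.0)  # closed-form Jukes-Cantor distance
print(estimate, analytic)
```

The numerical optimum agrees with the closed-form Jukes-Cantor distance, which is a useful sanity check on the likelihood function.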
Estimating node ages
Rate smoothing r8s methods attempt to simultaneously estimate unknown divergence times and smooth the rapidity of rate change along lineages. This is done by invoking some function that penalizes rates that change too quickly from branch to neighboring branch.
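One possible form of such a penalty is the sum of squared rate differences between each branch and its parent branch. This is an assumption made for illustration: the slide only says "some function", and the sketch is not r8s's actual implementation. Branch names and rates are made up.

```python
def smoothing_penalty(rates: dict, parent: dict) -> float:
    """Sum of squared rate differences between each branch and its parent branch.

    rates maps branch name -> substitution rate.
    parent maps branch name -> parent branch name (None at the root).
    """
    return sum(
        (rates[branch] - rates[parent[branch]]) ** 2
        for branch in rates
        if parent.get(branch) in rates
    )

# Hypothetical three-branch tree: branches A and B descend from branch AB.
PARENT = {"AB": None, "A": "AB", "B": "AB"}

smooth = smoothing_penalty({"AB": 1.0, "A": 1.0, "B": 1.0}, PARENT)
jumpy = smoothing_penalty({"AB": 1.0, "A": 3.0, "B": 0.5}, PARENT)
print(smooth, jumpy)
```

A clock-like tree gets no penalty, while abrupt rate changes between neighboring branches are penalized heavily.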
Given a cladogram, how do we infer the divergence dates of the true tree? [Figure: the supertree with tips A to E, whose branch lengths are NOT time, and two gene trees with tips A, C, E and A, B, D, E] The relative lengths of some branches can be obtained from genes that fit an MLK model.
[Figure: gene trees with tips A, C, E and A, B, D, E, and the “true tree” with tips A, B, E, D, C drawn on a time axis; labels: Simmons, Hackman] Estimates from multiple molecular sequences can subsequently be combined by calibrating the gene trees on a common node, and applying the resulting node depths to the supertree.
Where did we get the other dates? If there is no extinction and constant speciation (!), the expected waiting time from one speciation event to the next is 1/n (in units of the inverse per-lineage speciation rate), where n = number of lineages. This is a little more complicated if we take multiple labeled histories into account… …but we can come up with expected ages this way.
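The waiting-time argument translates directly into expected node ages. The sketch below assumes a pure-birth process with a per-lineage speciation rate `rate` (a name introduced here), measures ages back from the most recent speciation event, and ignores the labeled-histories complication mentioned on the slide.

```python
def expected_node_ages(n_tips: int, rate: float = 1.0) -> list:
    """Expected ages of the speciation events, from the root to the most recent.

    While k lineages exist, the expected wait for the next split is 1 / (k * rate).
    The event that produced k lineages is therefore followed by the waits
    for k, k + 1, ..., n_tips - 1 lineages.
    """
    ages = []
    for k in range(2, n_tips + 1):
        ages.append(sum(1.0 / (j * rate) for j in range(k, n_tips)))
    return ages

ages = expected_node_ages(4)
print(ages)
```

For four tips and a rate of 1, the root is expected to be 1/2 + 1/3 time units older than the most recent split.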
PhyloInformatics
What is PhyloInformatics? A made-up word! We’ve seen that we have to deal with data of different types (trees, sequences, alignments, metadata). These are part of complex workflows or pipelines. We “do” phyloinformatics when we come up with repeatable ways to automate these pipelines.
The power of UNIX UNIX is very useful for phyloinformatics: Everything is text-based Everything can be scripted and called from other programs Many programs for phylogenetics are available on UNIX platforms Everything can be piped together to create larger workflows
The power of Perl Perl allows us to chain other UNIX tools together Many perl libraries exist for dealing with biological data Easy to learn, quick to develop
Join us! We do a lot more phyloinformatics: Hackathons Google Summer of Code Ongoing projects Stay in touch, we can help each other!
谢谢! Thank you!

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Perl for Phyloinformatics

  • 1. Perl for PhyloInformatics What did we learn? What did we do?
  • 2. Tree Concepts What are phylogenetic trees?
  • 3. Phylogenetic Trees Describe the historical relationships among lineages of organisms or their parts, such as their genes.
  • 4. Tree terminology: operational taxonomic units (OTUs) or taxa, sister taxa, internal nodes, terminal nodes (tips), branches, and the root. [Figure: example tree with tips A to F.]
  • 5. Interpreting phylogenies These trees are the same shape
  • 6. Rooted vs. unrooted trees. Rooted tree: has a root that denotes common ancestry. Unrooted tree: only specifies the degree of kinship among taxa, not the evolutionary path. [Figure: a rooted and an unrooted tree of taxa A to F.]
  • 7. Rooted and unrooted trees: The number of rooted and unrooted trees for n species is $N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ and $N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$.
  • 9. Why more rooted than unrooted? On an unrooted tree, the root can be placed on any of the branches.
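The two counting formulas from slide 7 can be checked numerically. The sketch below is an illustration only; the function names `count_rooted` and `count_unrooted` are made up for this example, and it assumes strictly bifurcating trees with labelled tips. It also shows the point of slide 9: an unrooted tree of n taxa has 2n - 3 branches, and rooting it on each branch in turn gives 2n - 3 rooted trees.

```python
from math import factorial


def count_rooted(n):
    """Rooted bifurcating trees for n >= 2 labelled taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def count_unrooted(n):
    """Unrooted bifurcating trees for n >= 3 labelled taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    # Each unrooted tree has 2n - 3 branches on which the root can be placed.
    print(n, count_unrooted(n), count_rooted(n), count_unrooted(n) * (2 * n - 3))
```

For four taxa this gives 3 unrooted and 15 rooted trees, and the last two columns agree for every n.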
  • 11. Monophyletic A monophyletic group is a group of organisms which forms a clade, meaning that it consists of an ancestor and all its descendants. (Most clades on our Supertree are monophyletic.)
  • 12. Paraphyletic: A paraphyletic group contains an ancestor and some, but not all, of its descendants; it excludes species that share a common ancestor with its members.
  • 13. Polyphyletic A polyphyletic group is one whose members' most recent common ancestor is not a member of the group.
  • 14. Example: birds and reptiles Reptiles, without the birds, form a paraphyletic group.
  • 16. Cladograms and phylograms. Phylograms: branch lengths are proportional to the amount of change that occurred on that branch (these are the gene trees before r8s). Cladograms: branch lengths are not proportional to the amount of change (this is the Supertree from Monday).
  • 17. Ultrametric trees If the distance from the root represents time (not change) we can use trees to study how fast new species form. (This is our final tree after we put it all together.)
  • 18. Types of data What evidence are phylogenetic trees based on?
  • 19. Distance data Example: DNA-DNA hybridization. The more closely related two species are, the more similar their DNA. The more similar the DNA, the stronger the bond between the two strands, and the shorter the distance.
  • 20. Morphological characters Example: the shape of spider webs.
  • 21. Molecular sequence data I am sure you have all heard about DNA sequencing. Amino acid sequences are often used for more distantly related species.
  • 22. Types of data: two categories. Numerical data: the evolutionary distance between two species, usually derived from sequence data. Character data: each character has a finite number of states, e.g. number of legs = 1, 2, 4; DNA = {A, C, T, G}.
  • 24. Distance methods. Types of data: distance matrices, either measured directly (DNA-DNA hybridization) or computed from sequences. Examples: UPGMA is the oldest distance matrix method; neighbor-joining is more commonly used.
  • 25. Distance data: When using sequences, distance-based methods must first transform the sequence data into a pairwise distance matrix, which is then used during tree inference.
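As a rough illustration of turning aligned sequences into pairwise distances, the sketch below computes the proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3). The choice of Jukes-Cantor, the function names, and the toy alignment are assumptions made for this example; the slides do not say which distance was used.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with gaps or ambiguity codes."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined (infinite) once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Return (names, matrix) with corrected distances for every pair of sequences."""
    names = sorted(alignment)
    matrix = [[jukes_cantor(p_distance(alignment[a], alignment[b])) for b in names]
              for a in names]
    return names, matrix


# Hypothetical toy alignment, for illustration only.
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGTTT"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, ["%.3f" % d for d in row])
```

The corrected distance is always at least as large as the raw p-distance, because it accounts for multiple changes at the same site.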
  • 26. Neighbor-joining methods: (1) Maintain a pairwise distance matrix. (2) Find the closest two taxa. (3) Collapse them into one row (an internal node) and recompute the distance from the merged row to every other row. (4) Loop to step 2, building the tree as you go.
  • 27. Character methods. Types of data: any homologized data, such as morphological data or molecular sequences. Examples: optimality-criterion methods (maximum parsimony, maximum likelihood) and Bayesian methods (MCMC).
  • 28. What is homology? Definition: homology means any similarity between characters that is due to their shared ancestry. Anatomical structures that evolved from the same structure in some ancestor species are homologous (example: forelimbs). In genetics, homology can be observed in aligned DNA sequences.
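The loop on slide 26 can be sketched in a few lines. Note the assumptions: this version merges the pair with the smallest raw distance and averages distances weighted by cluster size, which is UPGMA-style clustering. Neighbor-joining proper selects the pair with a rate-corrected criterion and also estimates branch lengths, so treat this only as an illustration of the "find, collapse, recompute, repeat" structure. The function name and the toy distances are hypothetical.

```python
def cluster_closest_pairs(names, dist):
    """Build a Newick topology by repeatedly merging the two closest clusters.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    d = dict(dist)                      # working copy of the distance matrix
    sizes = {name: 1 for name in names}
    clusters = list(names)
    while len(clusters) > 1:
        # Step 2: find the closest two clusters.
        a, b = min(((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
                   key=lambda pair: d[frozenset(pair)])
        merged = "(%s,%s)" % (a, b)
        # Step 3: collapse them and recompute distances to every other cluster.
        for c in clusters:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return clusters[0] + ";"


# Hypothetical distances for four taxa.
toy = {
    frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0,
    frozenset("AD"): 10.0, frozenset("BD"): 10.0, frozenset("CD"): 10.0,
}
print(cluster_closest_pairs(["A", "B", "C", "D"], toy))  # (D,(C,(A,B)));
```

The tree is built as a nested Newick string while the matrix shrinks by one row per iteration, mirroring "build tree as you go".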
  • 29. What is an “optimality criterion”? An optimality criterion is simply a way to quantify, using a number, how well a tree fits the data relative to other trees. Examples are parsimony tree length (this is how the Supertree was optimized on the CIPRES cluster) and likelihood score. The posterior probability can also be seen as an optimality criterion.
  • 30. Parsimony tree length: Tree length is the minimum number of reconstructed changes. The most parsimonious tree is the tree with the fewest changes.
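One standard way to obtain the minimum number of changes for a single character on a fixed tree is Fitch's algorithm; the slides do not say which algorithm the course software used, so this is a sketch under that assumption. Trees are represented here as nested two-element tuples of tip names, a convention chosen only for this example, and the character states are invented.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree: nested 2-tuples with tip names (strings) at the leaves.
    states: dict mapping each tip name to its observed state.
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):           # a tip: its observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:                          # children agree: no change needed
            return common
        changes += 1                        # children disagree: count one change
        return left | right

    state_set(tree)
    return changes


def tree_length(tree, alignment):
    """Parsimony tree length: sum of per-site changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch_changes(tree, {tip: seq[i] for tip, seq in alignment.items()})
               for i in range(n_sites))


# Hypothetical single-site data: A and B share G, the others have T.
site = {"A": "G", "B": "G", "C": "T", "D": "T", "E": "T"}
print(fitch_changes(((("A", "B"), "C"), ("D", "E")), site))  # 1
print(fitch_changes(((("A", "C"), "B"), ("D", "E")), site))  # 2
```

The first tree groups A with B and explains the data with one change, so under parsimony it is preferred over the second tree, which needs two.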
  • 31. Finding the optimal tree: Under an optimality criterion, trees need to be compared with one another to find the one with the best score (the shortest tree length for parsimony, the highest likelihood for ML). When we talk about MP and ML trees, this is usually done with hill-climbing algorithms.
  • 32. …but this is not the whole story! Maximum parsimony assumes a very simple model of evolutionary change, namely that change is rare. Molecular evolution in particular can be modeled in more realistic ways, using substitution models. There are more complex ways to explore tree space than just hill-climbing (such as the Parsimony Ratchet). We can also sample different areas of tree space to see how optimality is distributed, using MCMC.
  • 34. Base frequencies and substitution rates
  • 35. Additional parameters: the gamma distribution (for rate variation among sites) and invariant sites. Perhaps some sites never change; maybe specify their proportion?
  • 36. Likelihood and the number of parameters: More parameters always lead to a better fit to the data.
  • 37. Likelihood and the number of parameters: More parameters always lead to a higher value of the likelihood, whether or not the additional parameters provide a ‘significantly’ better fit to the data.
  • 38. Are the extra parameters justified? Likelihood ratio statistic: $2 \log \left( \frac{\text{maximum likelihood} \mid H_1}{\text{maximum likelihood} \mid H_0} \right)$, which has a chi-squared distribution with degrees of freedom (dof) equal to the number of additional parameters. (We did this with ModelTest.)
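The likelihood ratio test on slide 38 can be worked through by hand. The sketch below is not how ModelTest is implemented; it only evaluates the statistic 2(ln L1 - ln L0) and its chi-squared tail probability using the standard library. The helper names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """Upper tail probability of a chi-squared distribution with integer dof >= 1.

    Uses Q(a + 1, h) = Q(a, h) + h^a e^(-h) / Gamma(a + 1), starting from
    Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = e^(-h), where h = x / 2.
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    while a < dof / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p-value) for nested models, given maximized log-likelihoods."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods: the richer model adds one parameter.
stat, p = likelihood_ratio_test(-1234.5, -1230.0, extra_params=1)
print("statistic = %.1f, p = %.4f" % (stat, p))  # statistic = 9.0, p = 0.0027
```

A small p-value, as in this made-up case, suggests the extra parameter improves the fit by more than chance alone would explain.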
  • 39. How did we use the substitution models? Each substitution has an associated likelihood given a branch of a certain length and the estimated model parameters. A function is derived to represent the likelihood of the data given the tree, branch-lengths and additional parameters. Optimise the branch lengths to get the maximum likelihood estimate.
  • 41. Rate smoothing r8s methods attempt to simultaneously estimate unknown divergence times and smooth the rapidity of rate change along lineages. This is done by invoking some function that penalizes rates that change too quickly from branch to neighboring branch.
  • 42. Given a cladogram (the supertree), how do we infer the divergence dates of the true tree? The branch lengths of the supertree are NOT time. The relative lengths of some branches can be obtained from genes that fit an MLK model. [Figure: supertree of taxa A to E alongside smaller trees for subsets of the taxa.]
  • 43. Estimates from multiple molecular sequences can subsequently be combined by calibrating the gene trees on a common node, and applying the resulting node depths to the supertree. [Figure labels: “true tree”, time axis, Simmons, Hackman.]
  • 44. Where did we get the other dates? If there is no extinction and constant speciation (!), the expected waiting time from one speciation event to the next is 1/n, where n=number of lineages. This is a little more complicated if we take multiple labeled histories into account… …but we can come up with expected ages this way.
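The waiting-time argument on slide 44 can be turned into numbers. The sketch below assumes a pure-birth process with a constant per-lineage speciation rate (`rate`, a parameter introduced here; the slide implicitly uses a rate of 1), so the expected wait while k lineages exist is 1/(k × rate). It ignores the labelled-histories refinement the slide mentions, and the function names are hypothetical.

```python
def expected_waiting_times(n_tips, rate=1.0):
    """Expected waiting time while k lineages exist, for k = 2 .. n_tips - 1."""
    return [1.0 / (k * rate) for k in range(2, n_tips)]


def expected_speciation_times(n_tips, rate=1.0):
    """Expected time since the root split at which the tree reaches 2, 3, ..., n_tips lineages."""
    times = [0.0]                       # the root split creates the first two lineages
    for wait in expected_waiting_times(n_tips, rate):
        times.append(times[-1] + wait)  # each wait is added to the running total
    return times


for lineages, t in enumerate(expected_speciation_times(6), start=2):
    print("%d lineages after an expected %.3f time units" % (lineages, t))
```

Because the waits shrink as 1/k, later speciation events are expected to be packed more closely together than early ones.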
  • 46. What is PhyloInformatics? A made-up word! We’ve seen that we have to deal with data of different types (trees, sequences, alignments, metadata). These are part of complex workflows or pipelines. We “do” phyloinformatics when we come up with repeatable ways to automate these pipelines.
  • 47. The power of UNIX: UNIX is very useful for phyloinformatics. Everything is text-based. Everything can be scripted and called from other programs. Many programs for phylogenetics are available on UNIX platforms. Everything can be piped together to create larger workflows.
  • 48. The power of Perl: Perl allows us to chain other UNIX tools together. Many Perl libraries exist for dealing with biological data. Easy to learn, quick to develop.
  • 49. Join us! We do a lot more phyloinformatics: hackathons, Google Summer of Code, ongoing projects. Stay in touch, we can help each other!