Big data and outlier loci:
A cautionary tale with genome-scale
phylogenetic data
Lyndon M. Coghill¹, Vinson Doyle¹, Van Wishingrad², Robert C. Thomson² & Jeremy M. Brown¹
[Figure: node support labels "1.0" and "1.0?"]
Genome-scale Data Use Increasing for
Phylogenetics
[Figure: plot of published genomic-scale phylogenies (y-axis, 0 to 25,000) by year (x-axis, 2000 to 2015)]
Background | Identifying Outlier Genes | What’s driving outliers | Take Home
Large datasets are desirable but…
• The process can be complicated.
• Different data generation methods produce different results.
• How this process affects the quality of these datasets is poorly understood.
[Figure labels: ?, Lab, Magic, Pipeline.canned()]
An Example (Turtle Placement)
1. Chiari et al.
2. Fong et al.
3. Wang et al.
4. Crawford et al.
5. Lu et al.
6. Shaffer et al.
All supported archosaur sister placement
Bayes Factors as branch-specific support
• Alternative measure of support for topological relationships.
• Ratio of marginal likelihoods between two hypotheses.
\[ \text{Bayes Factor} = \frac{P(\text{Data} \mid \text{Hypothesis 1})}{P(\text{Data} \mid \text{Hypothesis 2})} \]
H1: Bi-partition is present
H2: Bi-partition is absent
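As a minimal sketch of the ratio above: marginal likelihoods are usually reported on the natural-log scale, so the Bayes factor becomes a difference of logs, and the 2ln(BF) scale is twice that difference. The function name and the numbers below are hypothetical and are not taken from the talk.

```python
import math

def two_ln_bf(log_ml_h1: float, log_ml_h2: float) -> float:
    """Return 2*ln(BF), where BF = P(Data | H1) / P(Data | H2).

    Both inputs are natural-log marginal likelihoods, so the ratio
    of likelihoods becomes a difference of logs.
    """
    return 2.0 * (log_ml_h1 - log_ml_h2)

# Hypothetical log marginal likelihoods for one gene (made-up numbers).
log_ml_present = -10234.7  # H1: bipartition is present
log_ml_absent = -10239.2   # H2: bipartition is absent

support = two_ln_bf(log_ml_present, log_ml_absent)
bayes_factor = math.exp(log_ml_present - log_ml_absent)
print(f"2ln(BF) = {support:.1f}, BF = {bayes_factor:.1f}")
```

Positive values favor H1 (the bipartition is present) and negative values favor H2.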
Bayes Factors (Turtle Placement)
• Calculated two marginal likelihoods to examine turtle placement:
  1. Constrained turtle placement to a single position in the tree.
  2. Considered all other hypothesized positions for turtles.
[Figure panels: Archosaur Sister Placement; All Other Placements]
Bayes Factors Support for Turtle Placement
[Figure panels: Chiari, Crawford, Fong, Shaffer, Lu, Wang]
Low number of genes with strong support
What genes support croc sister placement?
• Comparison of posterior probabilities to 2ln(BF) values for croc and turtle monophyly.
• 248 genes from the Chiari dataset.
What genes support croc sister placement?
• Comparison of posterior probabilities to 2ln(BF) values for croc sister placement.
• 248 genes from the Chiari dataset.
Testing the effect of outliers
• Examine the most extreme outlier genes supporting croc sister placement.
• ~1% of genes were outliers with strong support:
  • Wang dataset: 15 / 1113 genes
  • Chiari dataset: 2 / 248 genes
• What is their effect on inference…?
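One way such a test could be set up is sketched below: rank genes by 2ln(BF), set aside the top fraction, and rerun the joint analysis on the rest. The slides do not spell out the exact rule used to pick outliers, so the 1% cutoff by rank, the function name, and the simulated support values are all assumptions for illustration only.

```python
import random

def top_outliers(two_ln_bf_by_gene: dict[str, float], fraction: float = 0.01) -> list[str]:
    """Return the genes with the largest 2ln(BF), keeping `fraction` of all genes."""
    n_keep = max(1, round(fraction * len(two_ln_bf_by_gene)))
    ranked = sorted(two_ln_bf_by_gene, key=two_ln_bf_by_gene.get, reverse=True)
    return ranked[:n_keep]

# Simulated per-gene support (not real data): mostly weak values,
# plus two planted genes with extreme support.
rng = random.Random(0)
support = {f"gene_{i:03d}": rng.gauss(0.0, 2.0) for i in range(248)}
support["gene_017"] = 95.0
support["gene_101"] = 120.0

outliers = top_outliers(support)
# Genes that would go into the "outliers removed" analysis.
kept = {gene: value for gene, value in support.items() if gene not in outliers}
print(outliers, len(kept))
```

A rank-based cutoff is only one choice; a fixed 2ln(BF) threshold would be an equally plausible alternative.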
Effect of outlier genes on topology
[Figure panels: All Genes; Top 1% of BF outlier genes removed. Node support labels: 1.0, 1.0]
Brown et al. Syst. Biol. In review.
What’s driving the outliers?
• Paralogy
• Systematic Error
[Figure: gene copies A, A, B, B with a duplication event]
Evidence of Paralogy
• BLAST genes against the closest genome.
• Pull hits > 70% (~2–3).
• Hits are non-contiguous.
• Concatenate hits.
• Infer a new tree.
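The filtering and concatenation steps above could look roughly like the sketch below. It makes several assumptions that are not stated in the slides: that "> 70%" means percent identity, that BLAST was run with a custom tabular format listing `sseqid pident qstart qend sseq`, and that hits are joined in order of their position on the query. The helper names and the example hits are hypothetical.

```python
from typing import NamedTuple

MIN_IDENTITY = 70.0  # assumed meaning of "hits > 70%": percent identity

class Hit(NamedTuple):
    sseqid: str
    pident: float
    qstart: int
    qend: int
    sseq: str

def parse_hits(tabular: str) -> list[Hit]:
    """Parse tab-separated lines of: sseqid, pident, qstart, qend, sseq."""
    hits = []
    for line in tabular.strip().splitlines():
        sseqid, pident, qstart, qend, sseq = line.split("\t")
        hits.append(Hit(sseqid, float(pident), int(qstart), int(qend), sseq))
    return hits

def hit_contig(hits: list[Hit], min_identity: float = MIN_IDENTITY) -> str:
    """Keep hits above the identity cutoff and join them in query order."""
    kept = [h for h in hits if h.pident > min_identity]
    kept.sort(key=lambda h: h.qstart)
    # Alignment gap characters are dropped before concatenation.
    return "".join(h.sseq.replace("-", "") for h in kept)

# Made-up hits on three scaffolds; the third falls below the cutoff.
example = "\n".join([
    "\t".join(["scaffold_12", "88.5", "301", "420", "ACGTTGCA"]),
    "\t".join(["scaffold_03", "91.2", "1", "150", "TTGACCA-GT"]),
    "\t".join(["scaffold_40", "64.0", "160", "290", "GGGGCCCC"]),
])
contig = hit_contig(parse_hits(example))
print(contig)
```

The resulting hit contig would then be aligned with the original sequences before inferring the new tree.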
[Figure: Original Sequence + Hit Contig (Hit 1, Hit 2, Hit 3)]
Coming Attractions
• Paralogy
• Systematic Error
• Model Fit
[Figure panels: Systematic Error; Random Error]
Bayesian Posterior Prediction
I. Draw trees and parameters from the posterior distribution.
II. Use those draws to simulate new datasets.
III. Summarize each dataset using a test statistic.
IV. Compare the empirical test statistic value to the simulated distribution.
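The four steps can be illustrated with a deliberately simple stand-in model. The sketch below swaps the phylogenetic model (trees and substitution parameters) for a coin-flip model with a Beta posterior, and uses the number of switches between adjacent values as the test statistic. All names, the prior, and the data are assumptions chosen for illustration, not the setup used in the talk.

```python
import random

def n_switches(seq: list[int]) -> int:
    """Test statistic: number of adjacent positions whose values differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

def posterior_predictive_p(data: list[int], n_draws: int = 2000, seed: int = 1):
    rng = random.Random(seed)
    ones, n = sum(data), len(data)
    observed = n_switches(data)
    simulated = []
    for _ in range(n_draws):
        # I. Draw a parameter from the posterior
        #    (Beta(1, 1) prior with a Bernoulli likelihood).
        p = rng.betavariate(1 + ones, 1 + n - ones)
        # II. Simulate a new dataset of the same size from that draw.
        replicate = [1 if rng.random() < p else 0 for _ in range(n)]
        # III. Summarize the simulated dataset with the test statistic.
        simulated.append(n_switches(replicate))
    # IV. Compare the empirical value to the simulated distribution
    #     (lower-tail posterior predictive p-value).
    p_lower = sum(s <= observed for s in simulated) / n_draws
    return observed, p_lower

# Made-up data that an independent coin-flip model fits poorly:
# the values are strongly clustered, so there is only one switch.
data = [0] * 15 + [1] * 15
observed, p_lower = posterior_predictive_p(data)
print(observed, p_lower)
```

A p-value near 0 or 1 indicates that the model rarely generates data resembling the empirical dataset for that statistic, which is the kind of poor fit the model-fit check is meant to flag.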
Take Home
• Support can be misleading when using genomic-scale data.
• Standard support values hide a lot of variation in underlying data.
• Some loci have outlying extreme support values.
• Caution:
• Outlier loci included in joint analyses can have huge influence.
• Small differences in analytical choices can have huge influence on results.
• Using Bayes Factors as a measure of support can help identify some of
this hidden variation.
Acknowledgements
Brown Lab
Guifang Zhou
Genevieve Mount
David Morris
DEB-1355071
DEB-1354506
DBI-1356796
