PREDICTING THE CLINICAL IMPACT OF HUMAN MUTATION
WITH DEEP NEURAL NETWORKS
JOURNAL CLUB PRESENTATION BY:
BRIAN M. SCHILDER, BIOINFORMATICIAN
RAJ LAB, DEPARTMENT OF NEUROSCIENCE
ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
01/15/2019
Sundaram et al. 2018
AUTHORS
• 1. Illumina Artificial Intelligence Laboratory, Illumina Inc, San Diego, CA, USA.
• 2. Department of Computer Science, Stanford University, Stanford, CA, USA.
• 3. National Science Foundation Center for Big Learning, University of Florida,
Gainesville, FL, USA.
• 4. Analytic and Translational Genetics Unit (ATGU), Department of Medicine,
Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
• 5. Toyota Technological Institute at Chicago, Chicago, IL, USA.
Illumina-ti???
BACKGROUND
CHALLENGES IN GENETICS
• Difficult to predict the effect of rare variants
• Effects are subtle and/or undocumented
• Poses a major obstacle in both population-wide and personalized medicine
• Limited by the lack of an adequate dataset
• Too small
• Limited diversity
• Does not span the whole genome
• Literature & expert curator biases
• ~50% of ClinVar variants come from 200 genes
• Supervised ML can learn these biases
Assimes & Roberts (2016)
HUMAN DIVERSITY BOTTLENECK
• Relatively little evolutionary time and extremely long generation times have left the human population with very little genetic diversity
• Chimpanzees and other non-human primates have far more genetic diversity than humans (2X as many SNVs) despite a much smaller population size today
• Toba supervolcano eruption: Sumatra, ~75 KYA
• Decade-long winter (all the way to Vermont), 1000 years of lower temps
• Human population bottleneck attributed to this event (only 2-10k individuals)
• Though see Yost et al. (2018)
[Figure: Toba supervolcano cloud radius]
[Figure: Bottleneck event. A large fraction of human ancestral diversity was lost]
Image sources:
https://www.quora.com/What-would-happen-if-the-Supervolcano-Toba-erupted-tomorrow
http://wallace.genetics.uga.edu/
https://en.wikipedia.org/wiki/Population_bottleneck
HYPOTHESIS
• Leveraging the genomic diversity of closely related non-human primates (NHPs) can enhance prediction
accuracy of pathogenicity in human variants
PART I: ASSESS
Common variants in other primates are largely benign in humans
OBJECTIVE
Assess whether the frequency of NHP variants can serve as a reasonable
proxy for pathogenicity of those equivalent variants in humans.
Singletons: The rarest of rare variants, variants seen exactly one time.
Balancing selection: A class of selective regimes that maintain polymorphism above what is expected under neutrality.
Identical-by-state: Two different alleles derived from different evolutionary histories with the same effect (i.e. resulting in the same amino acid change)
METHODS
• Databases:
• Humans (123,000+ individuals, 85k variants)
• Exome Aggregation Consortium (ExAC)
• Genome Aggregation Database (gnomAD)
• ClinVar
• NHPs (134 individuals, 300K variants)
• Great Ape Genome Project
• Single Nucleotide Polymorphism Database (dbSNP)
• Included only orthologous variants (identical-by-state)
• Each primate species contributes more variants than all of ClinVar (~42K)
• Key Comparisons:
1. Human vs. chimpanzee (~6 MY divergence)
2. Human vs. 6 primate species (≤ ~35 MY divergence)
3. Human vs. mouse, pig, goat, cow, chicken & zebrafish (≤ ~450 MY divergence)
Six Non-human Primate Species
• Great apes: Pan troglodytes (n=24+35), Pan paniscus (n=13), Gorilla gorilla (n=27), Pongo abelii (n=10)
• Old World monkey: Macaca mulatta (n=16)
• New World monkey: Callithrix jacchus (n=9)
Image sources: arkive.org, oregonzoo.org, minizoo.cz, nationalgeographic.com
RESULTS
• 27% of missense mutations that are benign in distant species are actually deleterious in humans
• This figure is only 9% if you use NHPs
• NHPs offer the benefit of a more diverse sample while still being very relevant to humans
• Small but diverse NHP sample = large benefits
[Figures: Fig. 1 and Fig. 2 of Sundaram et al. 2018]
MSR: missense/synonymous ratio
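As a rough illustration of the MSR defined above, the sketch below counts missense and synonymous variants in a list of consequence labels and divides one by the other. The function name, the label strings, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def missense_synonymous_ratio(consequences):
    """Return the missense/synonymous ratio (MSR) for a list of consequence labels.

    Labels other than "missense" and "synonymous" are ignored.
    """
    counts = Counter(consequences)
    if counts["synonymous"] == 0:
        raise ValueError("MSR is undefined without synonymous variants")
    return counts["missense"] / counts["synonymous"]


# Hypothetical toy data (not from the paper): one list per frequency class.
common_variants = ["missense"] * 3 + ["synonymous"] * 6
rare_variants = ["missense"] * 10 + ["synonymous"] * 5

print(missense_synonymous_ratio(common_variants))  # 0.5
print(missense_synonymous_ratio(rare_variants))    # 2.0
```

In this toy case the MSR is lower among common variants than among rare ones, which is the kind of depletion one would expect if selection removes deleterious missense variants before they become common.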
PART II: PREDICT
A deep learning network for variant pathogenicity classification
OBJECTIVE
Create a more accurate predictive model of human variant pathogenicity
using NHP and human variants + deep learning
METHODS
• Training Dataset of Common/Benign Variants:
• 300K NHP variants
• 84K human variants
• PrimateAI
• A novel deep learning-based predictive model
• 1D Convolutional Neural Network (CNN)
• Automatic feature extraction
• Prediction function:
• How likely is a mutation to be a common/benign vs. rare/pathogenic variant?
Input: multi-alignment (51 AAs x 99 vertebrates), secondary structure, solvent accessibility
Hidden layers: hierarchical features
Output: 0-1 pathogenicity score
Separate CNN models to predict:
• Secondary structure (SPIDER2; Heffernan et al., 2016): helix, beta sheet, or coil
• Solvent accessibility (DeepCNF; Wang et al., 2016): buried, intermediate, or exposed
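The multi-alignment input (51 amino acid positions x 99 vertebrates) is described in the speaker notes as being turned into position frequency matrices. Below is a minimal sketch of that idea on a toy alignment of 3 positions and 3 species. The function name, the gap handling (gaps and unknown characters are skipped), and the toy sequences are assumptions for illustration, not the authors' implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def position_frequency_matrix(alignment):
    """Build a position frequency matrix from aligned sequences.

    alignment: list of equal-length strings, one per species.
    Returns one row per alignment position, each row holding 20 frequencies
    in AMINO_ACIDS order. Gaps ("-") and unknown characters are skipped
    (an assumption of this sketch).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for pos in range(length):
        counts = [0] * len(AMINO_ACIDS)
        for seq in alignment:
            aa = seq[pos]
            if aa in index:
                counts[index[aa]] += 1
        total = sum(counts)
        # A column with no residues at all becomes a row of zeros.
        matrix.append([c / total if total else 0.0 for c in counts])
    return matrix


# Hypothetical toy alignment: 3 species x 3 positions.
toy_alignment = ["MKV", "MKI", "MR-"]
pfm = position_frequency_matrix(toy_alignment)
print(len(pfm), len(pfm[0]))             # 3 20
print(pfm[0][AMINO_ACIDS.index("M")])    # 1.0
```

With the dimensions given on the slide, the same function would return 51 rows of 20 frequencies for a window centered on the variant.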
RESULTS
• PrimateAI example output along all SCN2A AA positions
• PrimateAI outperforms existing tools on 10K withheld common primate variants
• PrimateAI can distinguish between DDD cases vs. sibling controls
• PrimateAI outperforms existing tools on DDD cases vs. sibling controls
• PrimateAI had a 91% accuracy score (next best model: only 80%)
DDD: Deciphering Developmental Disorders cohort with 4,293 cases and 2,517 sibling controls.
RESULTS II
• Needed to demonstrate that PrimateAI is not just scoring based on genes with higher rates of de novo mutation
• Repeated with only de novo missense variants within 605 disease genes
• However, a greater diversity of disease genes is needed before generalizing to all Mendelian disorders
PrimateAI AUC almost at max!
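For readers less familiar with the metric, and assuming the AUC on this slide is the area under the ROC curve, it can be read as the probability that a randomly chosen case variant scores higher than a randomly chosen control variant. The sketch below computes it by direct pairwise comparison. The function name and the scores are made up for illustration and are not PrimateAI output.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has
    the higher score; ties count as half a win.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical pathogenicity scores (0-1), not real data.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(roc_auc(cases, controls))  # 8 of the 9 pairs are ranked correctly
```

An AUC of 1.0 means every case outranks every control, and 0.5 corresponds to random ranking.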
RESULTS III
• Actually compared 20 other tools (Supp Fig. 9)
• PrimateAI outperformed them all in all test sets
PART III: IDENTIFY
Novel candidate gene discovery
NOVEL CANDIDATE GENE DISCOVERY
• Increases enrichment of de novo missense mutations in DDD patients from 1.5-fold to 2.2-fold
• Identified 14 additional candidate genes in intellectual disability
PART IV: COMPARE
Comparison with human expert curation
COMPARE WITH HUMAN EXPERT CURATION
• Curators tend to:
• Overly rely on straightforward metrics like Grantham score
• Underutilize secondary structure and solvent accessibility
Table 2. Comparison of the difference in Grantham score, protein surface-exposure, and amino acid sequence conservation
between human expert annotated variants in ClinVar and de novo variants in DDD cases versus controls.
CONCLUSIONS
• Adding even a few primate species disproportionately improves pathogenicity prediction of human
variants
• “134 individuals from six non-human primate species examined in this study contribute nearly four times as
many common missense variants as the 123,136 humans from the ExAC study”
• Training PrimateAI on more distant species decreases performance
FUTURE DIRECTIONS
• More NHP species, more samples (high return for even a few)
• 27 NHP species now on UCSC Genome Browser
• Non-coding variants in conserved regions
REFERENCES
1. Sundaram, L., Gao, H., Padigepati, S. R., McRae, J. F., Li, Y., Kosmicki, J. A., … Farh, K. K.-H. (2018). Predicting the clinical impact of human mutation with deep
neural networks. Nature Genetics, 50(8), 1161–1170. https://doi.org/10.1038/s41588-018-0167-z
2. https://www.smithsonianmag.com/smart-news/ancient-humans-weathered-toba-supervolcano-just-fine-180968479/
3. Yost, C. L., Jackson, L. J., Stone, J. R., & Cohen, A. S. (2018). Subdecadal phytolith and charcoal records from Lake Malawi, East Africa imply minimal effects
on human evolution from the ∼74 ka Toba supereruption. Journal of Human Evolution, 116, 75–94. https://doi.org/10.1016/j.jhevol.2017.11.005
4. Assimes, T. L., & Roberts, R. (2016). Genetics: Implications for Prevention and Management of Coronary Artery Disease. Journal of the American College of
Cardiology, 68(25), 2797–2818. https://doi.org/10.1016/j.jacc.2016.10.039
5. Landrum, M. J., Lee, J. M., Benson, M., Brown, G., Chao, C., Chitipiralla, S., … Maglott, D. R. (2016). ClinVar: Public archive of interpretations of clinically
relevant variants. Nucleic Acids Research, 44(D1), D862–D868. https://doi.org/10.1093/nar/gkv1222

Editor's notes

  1. Common chimp variants were defined as those occurring ≥2 times in a cohort of 24. 99.8% of variants have been under purifying selection. Major histocompatibility complex (MHC) regions were excluded.
  2. Other factors accounted for include mutation rate, technical artifacts such as sequencing coverage, and factors impacting neutral genetic drift such as gene conversion. As the number of unlabeled variants greatly exceeds the size of the labeled benign training dataset, we trained eight networks in parallel, each using a different set of unlabeled variants matched to the benign training dataset, to obtain a consensus prediction. Three 51-length position frequency matrices are generated from multiple sequence alignments of 99 vertebrates, including one for 11 primates, one for 50 mammals excluding primates, and one for 38 vertebrates excluding primates and mammals. SPIDER2: Structural Property prediction with Integrated DEep neuRal network. DeepCNF: Deep Learning extension of Conditional Neural Fields (CNF).
  3. Given that the DDD population largely consists of index cases of affected children without affected first degree relatives, it is essential to show that the classifier has not inflated its accuracy by favoring pathogenicity in genes with de novo dominant modes of inheritance. We restricted the analysis to 605 genes that were nominally significant for disease association in the DDD study, calculated from protein-truncating variation only.
  4. Simulations with ExAC show that discovery of common human variants (>0.1% allele frequency) plateaus quickly after only a few hundred individuals.
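
Note 2 mentions training eight networks in parallel to obtain a consensus prediction, but does not say how the eight outputs are combined. The sketch below assumes the simplest option, an unweighted mean of the per-network scores. The function name and the scores are hypothetical.

```python
from statistics import mean


def consensus_score(network_scores):
    """Combine per-network pathogenicity scores into one consensus score.

    Assumes an unweighted mean; the source does not state the actual rule.
    """
    if not network_scores:
        raise ValueError("need at least one network score")
    if any(not 0.0 <= s <= 1.0 for s in network_scores):
        raise ValueError("scores are expected to lie in [0, 1]")
    return mean(network_scores)


# Hypothetical scores from eight networks for a single variant.
scores = [0.82, 0.79, 0.91, 0.85, 0.77, 0.88, 0.80, 0.86]
print(round(consensus_score(scores), 3))  # 0.835
```

Averaging keeps the consensus on the same 0-1 scale as the individual networks, so it can be thresholded in the same way.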