SFselect-E
Ensemble classification techniques for
detecting signatures of natural selection from
site frequency spectrum
Andrew Stewart
JHU
Spring 2014
Introduction
● Searching for signatures of selection
● SFselect (Ronen, 2013)
● Multi-K (Whiteman, 2010)
● Introducing: SFselect-E
Contents
1) The selection classification problem
2) Overview of SVM classification with
SFselect
3) Ensemble preprocessing with Multi-*
4) Generating model variance
5) Introducing SFselect-E
6) Experimental Results
7) Conclusion
Natural selection
● Population genetics
● Evolution: Descent with modification
● Selection
o Directional
▪ Positive
▪ Negative
o Neutral
Classifying natural selection
● Record of demographic history
● Increased LD, reduced variation
● Site frequency spectrum
o e.g., Tajima’s D
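As a worked illustration of an SFS-based summary statistic, the sketch below computes Tajima's D directly from an unfolded site frequency spectrum, using the constants from Tajima (1989). The function name `tajimas_d` and the input layout (a list of counts for derived-allele classes 1 to n-1) are assumptions made for this example, not part of SFselect.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS.

    sfs[i - 1] is the number of segregating sites whose derived allele
    appears in exactly i of the n sampled chromosomes, for i = 1 .. n-1.
    """
    n = len(sfs) + 1
    seg_sites = sum(sfs)
    if n < 4 or seg_sites == 0:
        # For n < 4 the variance term is zero, so D is undefined.
        return float("nan")

    # Mean number of pairwise differences (pi), recovered from the spectrum.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * count for i, count in enumerate(sfs, start=1)) / pairs

    # Watterson's estimator and the variance constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)
```

A spectrum proportional to 1/i (the neutral expectation) gives D close to 0, while an excess of singletons gives a negative D.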
Background: SFselect (Ronen, 2013)
● Scaled Site Frequency Spectrum
● Linear kernel Support Vector Machines
● Trained on extensive population simulations
o SFselect, SFselect-s, SFselect-XP
Background: Multi-K Clustering
● Bootstrap aggregation
o Random sampling
o Aggregation method
o Highly accurate, but computationally expensive
● Multi-K
o Iterative K-means clustering
o Classify new points by proximity to cluster centroids
o Optimize K_end with cross validation
● Multi-KX, Multi-SVD
Generating ensemble diversity
● Generating ensemble diversity
o Generalizers
o Specializers
● Applied to SFS classification:
o Improve overall classification accuracy?
o Produce classifiers robust to wide variations in
genetic diversity
SFselect-E
● SFselect General SVM
● SFselect-E: Bagging approach
● SFselect-E: Multi-K approach
Population simulations
● 1000 individuals
● s = [0.005, 0.01, 0.02, 0.04, 0.08]
● t = [0, 50, 150, 200, …, 3500, 4000]
● n = 500
● labels = [-1, 1] (neutral, selected)
Training the standard model
● Compute allele frequencies
● Scale, normalize, bin into vectors
● Trained linear kernel SVM on entire dataset
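The vectorization steps above can be sketched roughly as follows. This is a minimal illustration, assuming that "scale" means multiplying frequency class i by i (so that the neutral expectation, proportional to 1/i, becomes flat) and that binning is by derived-allele frequency. The function name `scaled_sfs_vector` and the parameter `num_bins` are hypothetical and are not taken from the SFselect code.

```python
def scaled_sfs_vector(derived_counts, n, num_bins=10):
    """Turn per-site derived-allele counts into a binned, scaled SFS vector.

    derived_counts: one integer per segregating site, each in 1 .. n-1.
    n: number of sampled chromosomes.
    num_bins: length of the output feature vector (illustrative default).
    """
    # 1) Site frequency spectrum: xi[i] = number of sites with derived count i.
    xi = [0] * n
    for count in derived_counts:
        if not 0 < count < n:
            raise ValueError("derived counts must lie strictly between 0 and n")
        xi[count] += 1

    # 2) Scale class i by i, and 3) bin by derived-allele frequency i / n.
    vector = [0.0] * num_bins
    for i in range(1, n):
        bin_index = min(i * num_bins // n, num_bins - 1)
        vector[bin_index] += i * xi[i]

    # 4) Normalize so the features sum to 1 (all zeros if there are no sites).
    total = sum(vector)
    return [v / total for v in vector] if total else vector
```

Vectors of this kind, labeled -1 (neutral) or 1 (selected), could then be passed to any linear SVM implementation, for example scikit-learn's `LinearSVC`.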
Computational limits
● Very time intensive
o Population simulations
o Vectorization of SFS
o Training SVMs on SFS
● Simulations grouped/indexed by replicate
o Proved a major limitation on ensemble sampling
SFselect-E: Bagging approach
● Random sampling
o k = 100, n = 200
● Aggregation
o Majority voting
● Validation
o Cross validation
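A minimal sketch of the bagging scheme above, with k component models each trained on n examples and combined by majority vote. It assumes sampling with replacement, and it uses a simple nearest-class-mean learner as a stand-in for the linear SVM so that the example stays dependency-free. All function names here are hypothetical.

```python
import random
from collections import Counter


def fit_nearest_mean(X, y):
    """Stand-in base learner: predict the class whose mean vector is closest."""
    sums, counts = {}, Counter(y)
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
    means = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, means[label]))
        return min(means, key=sq_dist)

    return predict


def fit_bagged(X, y, fit, k=100, n=200, seed=0):
    """Train k models, each on n examples drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        idx = [rng.randrange(len(X)) for _ in range(n)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_majority(models, x):
    """Aggregate the component predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Any callable with the same `fit(X, y) -> predict` shape, such as a wrapper around a linear SVM, can be passed in place of `fit_nearest_mean`.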
SFselect-E: Multi-K approach
Iterative K-means clustering of the training data D
K_start = 2 to K_end = 8
Train on each K
Cross validation to determine the optimal K_end
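The procedure above might look roughly like the sketch below. It is an illustration under stated assumptions, not the published Multi-K algorithm: each cluster is represented by its majority label (a stand-in for training a classifier per K), a new point takes the label of its nearest centroid at each K, and the votes across K are combined by majority. All function names are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], x))


def kmeans(X, k, rng, iterations=20):
    """Plain Lloyd's algorithm, initialised from k distinct data points."""
    centroids = [list(p) for p in rng.sample(X, k)]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for x in X:
            members[nearest(centroids, x)].append(x)
        for c, pts in enumerate(members):
            if pts:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return centroids


def fit_multi_k(X, y, k_start=2, k_end=8, seed=0):
    """For each K, store (centroid, majority label) for every non-empty cluster."""
    rng = random.Random(seed)
    layers = []
    for k in range(k_start, k_end + 1):
        centroids = kmeans(X, k, rng)
        labels = [Counter() for _ in range(k)]
        for x, label in zip(X, y):
            labels[nearest(centroids, x)][label] += 1
        layers.append([(c, votes.most_common(1)[0][0])
                       for c, votes in zip(centroids, labels) if votes])
    return layers


def predict_multi_k(layers, x):
    """Each K votes with the label of its nearest centroid; majority wins."""
    votes = Counter()
    for layer in layers:
        centroids = [c for c, _ in layer]
        votes[layer[nearest(centroids, x)][1]] += 1
    return votes.most_common(1)[0][0]
```

Choosing `k_end` by cross validation would mean repeating `fit_multi_k` for several candidate values and keeping the one with the best held-out accuracy.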
Experimental analysis: K-fold C.V.
How to cross validate an ensemble?
For each fold K_i, hold out K_i and train on D − K_i
Test classifier on K_i
Report mean accuracy (# correct classifications)
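One way to read the procedure above is that the whole ensemble is retrained inside every fold and scored only on the held-out part. The sketch below assumes that reading. The name `k_fold_accuracy`, the interleaved fold assignment, and the `fit(X, y) -> predict` interface are choices made for this example.

```python
def k_fold_accuracy(X, y, fit, folds=5):
    """Mean held-out accuracy over `folds` interleaved folds.

    fit(X_train, y_train) must return a callable mapping one example to a
    label; for an ensemble, that callable wraps the aggregated vote.
    """
    if not 2 <= folds <= len(X):
        raise ValueError("folds must be between 2 and the number of examples")
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

Because the score is a fraction of correct classifications per fold, multiplying the result by 100 gives a percentage.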
Experimental analysis: C.V. Results
Model                     Accuracy
Standard SFselect SVM     74.28
Bagged SFselect-E SVM     73.86
Multi-K SFselect-E SVM    NA
Experimental analysis: Time series
● For t = [0, 4000], test D_t
o Neutral vs Selected
o Dependent t-test on time sample accuracies
o p-value of 2.0136 × 10^-24
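For reference, the statistic behind a dependent (paired) t-test can be computed as in the sketch below. It assumes the paired samples are two lists of accuracies measured at the same time points. The function name `paired_t_statistic` is hypothetical, and the sketch returns only the t statistic and degrees of freedom. A library routine such as `scipy.stats.ttest_rel` also reports the p-value.

```python
import math


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for a dependent (paired) t-test."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    m = len(diffs)
    mean = sum(diffs) / m
    var = sum((d - mean) ** 2 for d in diffs) / (m - 1)  # sample variance
    if var == 0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / m), m - 1
```

A large absolute t with many time points corresponds to a very small p-value.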
Conclusions
● SFselect-E consistent with SFselect
o No separation of specialized classifiers
o Smaller subsets?
● Limitations of structure of training data as
implemented in SFselect
● Model variance best obtained by separating
by s, t.
Conclusions
● Computing time for training a major obstacle
● Multi-SVD preprocessing could reduce
training time
● Refactoring required first
Future work
Refactor to treat populations independently
Bagging: random sampling across s, t
Multi-K: hierarchical clustering of training data
Multi-KX, Multi-SVD
SFselect-s as component models
Future work
Cross population: SFselect-XP, XP-SFS
Cross species: SFS + conserved regions
XS-SFS
Tune ensemble diversity to population genetic
diversity
