Lecture 6:
Ensemble Methods
October 2013
Machine Learning for Language Technology
Marina Santini, Uppsala University
Department of Linguistics and Philology
Where we are…
- Previous lectures covered several learning methods:
  - Decision trees
  - Nearest neighbors
  - Linear classifiers
  - Structured prediction
- This lecture:
  - How to combine classifiers
Combining Multiple Learners
Thanks to E. Alpaydin and Oscar Täckström
Wisdom of the Crowd
- Guess the weight of an ox
- The average of people's guesses is close to the true weight
- It is better than most individual guesses, and better than the cattle experts' guesses
- Intuitively, the law of large numbers: independent errors tend to cancel out when averaged…
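The averaging effect can be illustrated with a small simulation. This is a sketch under loudly stated assumptions: the true weight (500), the number of guessers (800) and the noise level are hypothetical values, and every guess is assumed to be unbiased and independent of the others, which is the condition under which averaging makes the errors cancel out.

```python
import random
import statistics

def crowd_experiment(true_weight=500.0, n_people=800, noise_sd=50.0, seed=0):
    """Simulate hypothetical unbiased, independent, noisy guesses and compare
    the crowd average with the individual guesses."""
    rng = random.Random(seed)
    guesses = [rng.gauss(true_weight, noise_sd) for _ in range(n_people)]
    crowd_estimate = statistics.mean(guesses)
    crowd_error = abs(crowd_estimate - true_weight)
    # Fraction of individuals whose own guess is further from the truth than the crowd average.
    beaten = sum(abs(g - true_weight) > crowd_error for g in guesses) / n_people
    return crowd_estimate, crowd_error, beaten

estimate, error, beaten = crowd_experiment()
print(f"crowd estimate {estimate:.1f}, error {error:.1f}, better than {beaten:.0%} of individuals")
```

If the guesses shared a common bias, averaging would not remove it; the cancellation relies on the errors being independent.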
Definition
 An ensemble of classifiers is a set of classifiers
whose individual decisions are combined in some
way to classify new examples (Dietterich, 2000)
Diversity vs accuracy
- For an ensemble of classifiers to be more accurate than any of its individual members, the individual classifiers composing it must be accurate and diverse:
  - An accurate classifier is one whose error rate on new examples is better than random guessing.
  - Two classifiers are diverse if they make different errors on new data points.
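Why accuracy and diversity must go together can be made concrete with a worked calculation. Assume, as an idealisation, that L binary classifiers each have error rate p and make their errors independently (maximal diversity). The majority vote is wrong only if more than half of them are wrong, which is a binomial tail probability. The values 21 and 0.3 below are illustrative:

```python
from math import comb

def majority_vote_error(n_classifiers: int, error_rate: float) -> float:
    """Probability that a majority vote of n independent binary classifiers,
    each wrong with probability error_rate, is wrong (n assumed odd)."""
    k_min = n_classifiers // 2 + 1  # smallest number of wrong votes that flips the majority
    return sum(
        comb(n_classifiers, k) * error_rate**k * (1 - error_rate) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )

print(majority_vote_error(21, 0.3))  # accurate members: ensemble error far below 0.3
print(majority_vote_error(21, 0.6))  # worse-than-random members: ensemble error above 0.6
```

With p = 0.3 the ensemble error drops far below the individual error; with p = 0.6 voting makes things worse, which is why each member must be accurate. Real classifiers are rarely fully independent, so actual gains are smaller.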
Why it can be a good idea to build an ensemble
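- There are three fundamental reasons why it is possible to build good ensembles (Dietterich, 2000):
  1. Statistical: with little training data, many hypotheses in H fit the data equally well; averaging them reduces the risk of choosing the wrong one.
  2. Computational: even with enough data, local search may get stuck in local optima; combining learners started from different points can approximate the true function better.
  3. Representational: the true function f may not be representable by any of the hypotheses in H; weighted sums of hypotheses drawn from H may expand the space of representable functions.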
Distinctions
- Base learner
  - An arbitrary learning algorithm that could also be used on its own
- Ensemble
  - A learning algorithm composed of a set of base learners. The base learners may be organized in some structure
- However, the distinction is not completely clear-cut
  - E.g. a linear classifier can be seen as a combination of multiple simple learners, in the sense that each dimension is a simple predictor…
The main purpose of an ensemble:
maximising individual accuracy and diversity
- Different learners use different:
  - Algorithms
  - Hyperparameters
  - Representations / modalities / views
  - Training sets
  - Subproblems
Practical Example
Rationale
- No Free Lunch Theorem: there is no algorithm that is always the most accurate in all situations.
- Therefore, generate a group of base learners which, when combined, have higher accuracy.
Methods for Constructing Ensembles
Approaches…
- How do we generate base learners that complement each other?
- How do we combine the outputs of the base learners for maximum accuracy?
- Examples:
  - Voting
  - Bootstrap resampling
  - Bagging
  - Boosting
  - AdaBoost
  - Stacking
  - Cascading
Voting
- Linear combination
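Voting as a linear combination can be sketched as follows, assuming the notation y_i = Σ_j w_j d_ji, where d_ji is base learner j's score for class i and the weights w_j are non-negative and normalised to sum to 1. The function names and the example scores are hypothetical:

```python
from typing import List, Sequence

def weighted_vote(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Return y_i = sum_j w_j * d_ji for every class i.

    scores[j][i] is base learner j's score d_ji for class i; the weights are
    assumed non-negative and are normalised here so that they sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_classes = len(scores[0])
    return [sum(w * d[i] for w, d in zip(weights, scores)) / total for i in range(n_classes)]

def vote(scores, weights) -> int:
    """Predicted class = argmax_i y_i."""
    y = weighted_vote(scores, weights)
    return max(range(len(y)), key=y.__getitem__)

scores = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]  # three learners, two classes
print(vote(scores, [1, 1, 1]))  # 0
print(vote(scores, [1, 2, 2]))  # 1
```

With equal weights the first learner's strong support for class 0 wins; giving the other two learners more weight flips the decision to class 1.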
Fixed Combination Rules
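Fixed combination rules apply a predefined, not learned, function to the base learners' outputs for each class. A minimal sketch of several commonly used rules (average, median, minimum, maximum, product); the rule names, the `combine` helper and the scores are illustrative:

```python
import math
import statistics

# Each rule maps the L base learners' scores for one class (d_1i, ..., d_Li) to y_i.
FIXED_RULES = {
    "sum": statistics.mean,    # (1/L) * sum_j d_ji
    "median": statistics.median,  # robust to a single outlying learner
    "minimum": min,            # pessimistic: every learner must support the class
    "maximum": max,            # optimistic: one confident learner suffices
    "product": math.prod,      # treats the scores like independent likelihoods
}

def combine(scores_per_learner, rule):
    """scores_per_learner[j][i]: score of learner j for class i.
    Returns (predicted class, combined scores y)."""
    f = FIXED_RULES[rule]
    n_classes = len(scores_per_learner[0])
    y = [f([d[i] for d in scores_per_learner]) for i in range(n_classes)]
    return max(range(n_classes), key=y.__getitem__), y

scores = [[0.1, 0.9], [0.6, 0.4], [0.7, 0.3]]
for rule in FIXED_RULES:
    print(rule, combine(scores, rule)[0])
```

On these scores most rules pick class 1, but the median picks class 0, which shows that the choice of rule matters.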
Bootstrap Resampling
- Daumé (2012): 150
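A bootstrap sample is drawn by sampling N instances uniformly with replacement from a training set of size N, so some instances appear several times and others not at all. A minimal sketch (the helper name is hypothetical):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) instances uniformly at random *with replacement*."""
    return rng.choices(data, k=len(data))

rng = random.Random(42)
data = list(range(1000))
sample = bootstrap_sample(data, rng)
unique_fraction = len(set(sample)) / len(data)
print(len(sample), round(unique_fraction, 3))  # 1000 and a value near 0.632
```

For large N, a bootstrap sample is expected to contain about 1 - 1/e ≈ 63.2% of the distinct original instances.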
Bagging (bootstrap+aggregating)
- Use bootstrapping to generate L training sets
- Train L base learners using an unstable learning procedure**
- During testing, take the average (for classification, a vote)
- In bagging, generating complementary base learners is left to chance and to the instability of the learning method.
**Unstable algorithm: one where a small change in the training set causes a large difference in the resulting base learners.
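The bagging procedure above can be sketched in a few lines. Assumptions: the data are 1-D pairs (x, y) with y in {-1, +1}, and the base learner is a hypothetical decision stump chosen to keep the example short (full decision trees are the textbook example of an unstable learner; a stump is less so). For classification, "taking the average" is implemented as a majority vote:

```python
import random

def train_stump(data):
    """Hypothetical base learner: 1-D decision stump minimising training error.
    data is a non-empty list of (x, y) pairs with y in {-1, +1}."""
    best = None
    for t in sorted({x for x, _ in data}):
        for p in (1, -1):
            errors = sum((p if x >= t else -p) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, p)
    _, t, p = best
    return lambda x: p if x >= t else -p

def bagging(data, n_learners=25, seed=0):
    """Train n_learners stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    learners = [train_stump(rng.choices(data, k=len(data))) for _ in range(n_learners)]

    def predict(x):
        # Average of the +/-1 votes; with an odd number of learners there are no ties.
        return 1 if sum(h(x) for h in learners) >= 0 else -1
    return predict

data = [(i / 20, 1 if i >= 10 else -1) for i in range(20)]
model = bagging(data)
print(model(0.9), model(0.1))  # 1 -1
```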
Boosting: Weak learner vs Strong learner
- In boosting, we actively try to generate complementary base learners by training the next learner on the mistakes of the previous learners.
- The original boosting algorithm (Schapire, 1990) combines three weak learners to generate a strong learner.
- A weak learner has an error probability less than 1/2, which makes it better than random guessing on a two-class problem.
- A strong learner has an arbitrarily small error probability.
Boosting (ii) [Alpaydin, 2012: 431]
- Given a large training set, we randomly divide it into three parts, X1, X2 and X3.
- We use X1 to train d1.
- We then feed X2 to d1. All instances of X2 misclassified by d1, together with as many instances of X2 on which d1 is correct, form the training set of d2.
- We then feed X3 to d1 and d2. The instances on which d1 and d2 disagree form the training set of d3.
- During testing, given an instance, we give it to d1 and d2; if they agree, that is the response; otherwise the response of d3 is taken as the output.
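The three-learner scheme can be sketched as follows. This is an illustration under assumptions not stated on the slide: 1-D data with labels in {-1, +1}, a hypothetical decision stump as the weak learner, and "as many instances on which d1 is correct" read as an equal number drawn at random from X2:

```python
import random

def train_stump(data):
    """Hypothetical weak learner: 1-D decision stump minimising training error.
    data is a non-empty list of (x, y) pairs with y in {-1, +1}."""
    best = None
    for t in sorted({x for x, _ in data}):
        for p in (1, -1):
            errors = sum((p if x >= t else -p) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, p)
    _, t, p = best
    return lambda x: p if x >= t else -p

def original_boosting(data, seed=0):
    """d1 on X1, d2 on d1's mistakes (plus as many correct cases), d3 on disagreements."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)  # random division into three parts
    k = len(data) // 3
    x1, x2, x3 = data[:k], data[k:2 * k], data[2 * k:]

    d1 = train_stump(x1)

    # Training set of d2: all X2 instances d1 misclassifies plus equally many it gets right.
    wrong = [(x, y) for x, y in x2 if d1(x) != y]
    right = [(x, y) for x, y in x2 if d1(x) == y]
    x2_train = wrong + rng.sample(right, min(len(wrong), len(right)))
    d2 = train_stump(x2_train) if x2_train else d1  # d1 made no mistakes on X2

    # Training set of d3: the X3 instances on which d1 and d2 disagree.
    disagree = [(x, y) for x, y in x3 if d1(x) != d2(x)]
    d3 = train_stump(disagree) if disagree else d1

    def predict(x):
        a, b = d1(x), d2(x)
        return a if a == b else d3(x)  # d3 decides only when d1 and d2 disagree
    return predict

data = [(i / 60, 1 if i >= 30 else -1) for i in range(60)]
model = original_boosting(data)
print(model(0.9), model(0.1))
```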
Boosting: drawback
- Though it is quite successful, the disadvantage of the original boosting method is that it requires a very large training sample.
AdaBoost (adaptive boosting)
- Uses the same training set over and over, so the training set need not be large.
- The classifiers must be simple so that they do not overfit.
- Can combine an arbitrary number of base learners, not only three.
AdaBoost
- Generate a sequence of base learners, each focusing on the previous one's errors.
- The probability of a correctly classified instance (of being drawn into the next learner's training set) is decreased, and the probability of a misclassified instance is increased.
- This has the effect that the next classifier focuses more on the instances misclassified by the previous classifier.
[Alpaydin, 2012: 432-433]
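A compact sketch of AdaBoost with decision stumps. Note that it uses the common reweighting variant (each instance carries a weight that the base learner minimises) rather than drawing a new training sample from the instance probabilities; the vote weight alpha = 0.5 ln((1 - err) / err) is the usual discrete-AdaBoost choice. The stump learner and the toy data are hypothetical:

```python
import math

def train_weighted_stump(xs, ys, w):
    """Hypothetical base learner: 1-D stump minimising the weighted error."""
    best = None
    for t in sorted(set(xs)):
        for p in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if (p if xi >= t else -p) != yi)
            if best is None or err < best[0]:
                best = (err, t, p)
    err, t, p = best

    def stump(x):
        return p if x >= t else -p
    return err, stump

def adaboost(xs, ys, n_rounds=10):
    """Discrete AdaBoost with instance reweighting; labels ys are in {-1, +1}."""
    n = len(xs)
    w = [1.0 / n] * n  # start from uniform instance weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        err, h = train_weighted_stump(xs, ys, w)
        if err >= 0.5:  # base learner no better than random: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))  # vote weight grows as err shrinks
        learners.append(h)
        alphas.append(alpha)
        if err == 0:  # perfect base learner: later rounds cannot improve on it
            break
        # Misclassified instances get heavier, correctly classified ones lighter.
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        # Weighted vote of all base learners.
        score = sum(a * h(x) for a, h in zip(alphas, learners))
        return 1 if score >= 0 else -1
    return predict, alphas

xs = list(range(10))
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]  # positive class is the middle interval
predict, alphas = adaboost(xs, ys, n_rounds=3)
print([predict(x) for x in xs] == ys)  # True
```

No single stump can separate the middle interval 3 to 6 from the rest, but after three rounds the weighted vote of three stumps classifies all ten training points correctly.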
AdaBoost: Testing
- Given an instance, all the classifiers decide and a weighted vote is taken.
- The weights increase with the base learners' accuracies on the training set.
- Result: improved accuracy.
- The success of AdaBoost is due to its property of increasing the margin. If the margin increases, the training instances are better separated and errors are less likely. (This aim is similar to that of SVMs.)
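The margin of a weighted vote can be computed directly. A sketch, assuming labels and base predictions in {-1, +1} and the normalised margin y · Σ_j α_j h_j(x) / Σ_j α_j, which lies in [-1, 1]: it is positive when the vote is correct and larger when the vote is more decisive. The function name and the example numbers are hypothetical:

```python
def normalized_margins(votes, alphas, labels):
    """votes[j][n] in {-1, +1} is base learner j's prediction on instance n;
    alphas[j] >= 0 is its vote weight; labels[n] in {-1, +1}."""
    total = sum(alphas)
    return [
        y * sum(a * v[n] for a, v in zip(alphas, votes)) / total
        for n, y in enumerate(labels)
    ]

votes = [[1, 1, -1], [1, -1, -1], [1, 1, 1]]  # three learners, three instances
alphas = [0.5, 0.3, 0.2]
labels = [1, 1, -1]
print(normalized_margins(votes, alphas, labels))  # approximately [1.0, 0.4, 0.6]
```

All three margins are positive, so the weighted vote is correct on every instance; the second instance, with the smallest margin, is the one closest to being misclassified.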
Stacking (i)
- In stacked generalization, the combiner f( ) is another learner and is not restricted to being a linear combination as in voting.
Stacking (ii)
- The combiner system should learn how the base learners make errors.
- Stacking is a means of estimating and correcting for the biases of the base learners.
- Therefore, the combiner should be trained on data that was not used to train the base learners.
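A minimal sketch of stacking that respects the last point: the data are split, the base learners are trained on the first part, and the combiner is trained on their outputs on the held-out part. As an assumption for brevity, the combiner is a simple lookup table from the tuple of base outputs to the most frequent true label (a learned, non-linear combiner), falling back to a majority vote for unseen output patterns. All names and the biased example learner are hypothetical:

```python
from collections import Counter, defaultdict

def train_stacking(base_trainers, data, holdout_fraction=0.5):
    """base_trainers: functions mapping a list of (x, y) pairs to a classifier x -> y.
    data is assumed to be in random order already."""
    cut = int(len(data) * (1 - holdout_fraction))
    train_part, holdout = data[:cut], data[cut:]
    base = [train(train_part) for train in base_trainers]

    # Level-1 data: base-learner outputs on the held-out instances, with the true labels.
    table = defaultdict(Counter)
    for x, y in holdout:
        table[tuple(h(x) for h in base)][y] += 1

    def predict(x):
        pattern = tuple(h(x) for h in base)
        if pattern in table:
            # Learned combiner: most frequent true label for this output pattern.
            return table[pattern].most_common(1)[0][0]
        return Counter(pattern).most_common(1)[0][0]  # unseen pattern: plain majority vote
    return predict

def train_inverted(train_part):
    # Hypothetical, systematically biased base learner: always predicts the wrong class.
    return lambda x: 0 if x > 0 else 1

xs = [3, -4, 8, -1, 5, -9, 2, -6, 7, -2, 1, -5, 9, -3, 6, -7, 4, -8]
data = [(x, 1 if x > 0 else 0) for x in xs]
model = train_stacking([train_inverted], data)
print(model(5), model(-5))  # 1 0
```

Voting over the single, systematically wrong learner would always be wrong, whereas the combiner learns from the held-out data to invert its output, i.e. it corrects for the learner's bias.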
Cascading
- Use dj only if the preceding learners are not confident
- Cascade the learners in order of complexity
Cascading
- Cascading is a multistage method: we use dj only if all preceding learners are not confident.
- Associated with each learner is a confidence wj such that dj is said to be confident of its output, and can be used, if wj > θj (the threshold).
- Not confident covers both misclassifications and the instances for which the posterior is not high enough; the next learner is trained on these instances.
- Important: the idea is that an early, simple classifier handles the majority of instances, and a more complex classifier is used only for a small percentage, so it does not significantly increase the overall complexity.
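The test-time behaviour of a cascade can be sketched as follows. Assumptions: each stage returns a dictionary of class posteriors, its confidence wj is taken to be the maximum posterior, and the last stage always answers. The stage functions are hypothetical:

```python
def cascade_predict(stages, x):
    """stages: list of (classifier, threshold) pairs ordered from simplest to most complex.
    Stage j answers only if every earlier stage was not confident (wj <= θj)."""
    for j, (classify, theta) in enumerate(stages):
        posterior = classify(x)
        label, confidence = max(posterior.items(), key=lambda item: item[1])
        is_last = j == len(stages) - 1
        if confidence > theta or is_last:
            return label, j  # predicted label and the stage that produced it

def cheap(x):
    # Hypothetical simple stage: confident only on "easy" inputs.
    return {"pos": 0.95, "neg": 0.05} if x > 10 else {"pos": 0.55, "neg": 0.45}

def expensive(x):
    # Hypothetical complex stage, consulted only when the cheap stage is unsure.
    return {"pos": 0.2, "neg": 0.8}

stages = [(cheap, 0.9), (expensive, 0.0)]
print(cascade_predict(stages, 20))  # ('pos', 0)
print(cascade_predict(stages, 3))   # ('neg', 1)
```

Only the uncertain input reaches the expensive stage, which is why the overall cost stays close to that of the simple classifier when most inputs are easy.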
Summary
- It is often a good idea to combine several learning methods.
- We want diverse classifiers, so that their errors cancel out.
- However, remember: ensemble methods do not get a free lunch either…
Example
- In the case of arc-factored graph-based parsing, we relied on finding the highest-scoring spanning tree over a dense graph built over the input.
- A dense graph is a graph that contains all possible arcs between the words (wordforms).
- A spanning tree is a tree that has an incoming arc for each word.
Example:
Ensemble MST Dependency Parsing
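One way to build an ensemble dependency parser on top of the spanning-tree view is to let each parser vote for the arcs it predicts, and then search for the maximum spanning tree in the resulting vote-weighted dense graph. The sketch below uses brute-force enumeration, which is only feasible for toy sentences; practical systems use an efficient maximum spanning tree algorithm such as Chu-Liu/Edmonds. Heads are encoded as a list where position i-1 holds the head of word i and 0 is the artificial root; this encoding and all names are assumptions, and the single-root constraint is not enforced:

```python
from collections import Counter
from itertools import product

def vote_graph(parses):
    """Count, over all parsers, how often each arc (head, dependent) is predicted."""
    votes = Counter()
    for heads in parses:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] += 1
    return votes

def is_tree(heads):
    """True if every word reaches the root (0) without running into a cycle."""
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def ensemble_mst(parses):
    """Brute-force maximum spanning tree over the vote-weighted dense graph."""
    n = len(parses[0])
    votes = vote_graph(parses)
    best, best_score = None, -1
    # Every word may take any other word or the root as its head (the dense graph).
    candidates = [[h for h in range(n + 1) if h != dep] for dep in range(1, n + 1)]
    for heads in product(*candidates):
        if is_tree(heads):
            score = sum(votes[(h, d)] for d, h in enumerate(heads, start=1))
            if score > best_score:
                best, best_score = list(heads), score
    return best, best_score

parses = [[2, 0, 2], [2, 0, 1], [3, 0, 2]]  # three hypothetical parsers, 3-word sentence
print(ensemble_mst(parses))  # ([2, 0, 2], 7)
```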
Conclusions
- Combining multiple learners has been a popular topic in machine learning since the early 1990s, and research has been going on ever since.
- Recently, it has been noticed that ensembles do not always improve accuracy, and research has started to focus on the criteria that a good ensemble should satisfy or how to form a good one.
Reading
- Dietterich (2000)
- Alpaydin (2010): Ch. 17
- Daumé (2012): Ch. 11
Thanks for your attention!

 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 

Lecture 6: Ensemble Methods

  • 1. Lecture 6: Ensemble Methods. October 2013. Machine Learning for Language Technology. Marina Santini, Uppsala University, Department of Linguistics and Philology
  • 2. Where we are…
    - Previous lectures covered several different learning methods: decision trees, nearest neighbors, linear classifiers, and structured prediction.
    - This lecture: how to combine classifiers.
  • 3. Combining Multiple Learners. Thanks to E. Alpaydin and Oscar Täckström.
  • 4. Wisdom of the Crowd
    - Task: guess the weight of an ox.
    - The average of the crowd's guesses was close to the true weight: better than most individual guesses and better than the cattle experts' guesses.
    - Intuitively, the law of large numbers…
  • 5. Definition
    - An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way to classify new examples (Dietterich, 2000).
  • 6. Diversity vs accuracy
    - For an ensemble of classifiers to be more accurate than any of its individual members, the individual classifiers must be accurate and diverse (Dietterich, 2000):
      - An accurate classifier is one whose error rate on new examples is better than random guessing.
      - Two classifiers are diverse if they make different errors on new data points.
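Why accuracy and diversity matter can be made concrete with a small calculation, under the idealized assumption (rarely met exactly in practice) that L binary classifiers make *independent* errors, each with the same error rate p. A majority vote is then wrong only when more than half of them are wrong at once, which is a binomial tail probability. Dietterich (2000) uses this argument with 21 classifiers at error rate 0.3, for which the majority vote errs with probability of roughly 0.026. The function name below is illustrative.

```python
from math import comb


def majority_vote_error(n_classifiers: int, p: float) -> float:
    """Probability that a majority vote of n_classifiers independent binary
    classifiers, each wrong with probability p, is wrong.

    Assumes n_classifiers is odd so that ties cannot occur.
    """
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    # Smallest number of simultaneously wrong classifiers that wins the vote.
    k_min = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


for p in (0.3, 0.5, 0.6):
    print(p, round(majority_vote_error(21, p), 4))
```

With p = 0.3 the vote is far more accurate than any member; with p = 0.5 (random guessing) it gains nothing; with p > 0.5 it is worse than a single member, which is why each classifier must be accurate. If the errors are correlated (low diversity), the independence assumption fails and the gain shrinks.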
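  • 7. Why it can be a good idea to build an ensemble
    - There are three fundamental reasons why good ensembles can be built (Dietterich, 2000):
      1. Statistical: with little training data, many hypotheses in H can fit the data equally well; combining them reduces the risk of choosing the wrong one.
      2. Computational: even with enough data, local search can get stuck in local optima; combining runs started from different points may approximate the true function better than any single run.
      3. Representational: the true function f may not be representable by any hypothesis in H, but weighted sums of hypotheses drawn from H can expand the space of representable functions.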
  • 8. Distinctions
    - Base learner: an arbitrary learning algorithm that could also be used on its own.
    - Ensemble: a learning algorithm composed of a set of base learners, which may be organized in some structure.
    - However, the distinction is not completely clear-cut: e.g. a linear classifier can be seen as a combination of multiple simple learners, in the sense that each dimension is a simple predictor…
  • 9. The main purpose of an ensemble: maximising individual accuracy and diversity
    - Different learners use different algorithms, hyperparameters, representations/modalities/views, training sets, or subproblems.
  • 10. Practical Example
  • 11. Rationale
    - No Free Lunch Theorem: there is no algorithm that is always the most accurate in all situations.
    - Therefore, generate a group of base learners which, when combined, achieve higher accuracy.
  • 12. Methods for Constructing Ensembles
  • 13. Approaches…
    - How do we generate base learners that complement each other?
    - How do we combine the outputs of the base learners for maximum accuracy?
    - Examples: voting, bootstrap resampling, bagging, boosting, AdaBoost, stacking, cascading.
  • 14. Voting: linear combination
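The slide's formula is not reproduced in this transcript. In Alpaydin's notation (assumed here), each of the L base learners d_j outputs a score d_{ji} for class C_i, and voting takes a weighted linear combination of these scores:

```latex
y_i = \sum_{j=1}^{L} w_j \, d_{ji}, \qquad w_j \ge 0, \qquad \sum_{j=1}^{L} w_j = 1,
\qquad \text{choose } C_i \text{ if } y_i = \max_{k} y_k .
```

With equal weights w_j = 1/L and 0/1 outputs (d_{ji} = 1 if learner j picks class i), this reduces to plain plurality voting.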
  • 15. Fixed Combination Rules
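Fixed combination rules apply a fixed function, with no training, to each class's scores across the L base learners and then pick the class with the highest combined score. The slide's table is not reproduced in this transcript; the sketch below assumes the rules commonly listed for this purpose (sum/average, weighted sum, median, minimum, maximum, product). The function names and example scores are illustrative.

```python
import math
import statistics
from typing import Callable, List, Sequence

Rule = Callable[[List[float]], float]


def combine(scores: Sequence[Sequence[float]], rule: Rule) -> int:
    """scores[j][i] is base learner j's score for class i; return the winning class index."""
    n_classes = len(scores[0])
    combined = [rule([s[i] for s in scores]) for i in range(n_classes)]
    return max(range(n_classes), key=combined.__getitem__)


def weighted_sum(weights: Sequence[float]) -> Rule:
    """Weighted-sum rule; the weights are assumed non-negative and summing to 1."""
    return lambda column: sum(w * x for w, x in zip(weights, column))


RULES = {
    "sum": statistics.fmean,  # sum rule normalized to an average (same winner)
    "weighted sum": weighted_sum([0.2, 0.4, 0.4]),
    "median": statistics.median,
    "minimum": min,
    "maximum": max,
    "product": math.prod,
}

# Three learners, two classes: learner 0 strongly prefers class 0,
# learners 1 and 2 mildly prefer class 1.
scores = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
for name, rule in RULES.items():
    print(name, combine(scores, rule))
```

Here the median rule picks class 1 while the other rules pick class 0: the median ignores how strongly the single dissenting learner prefers class 0, whereas sum and product are swayed by it.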
  • 16. Bootstrap Resampling
    - Daumé (2012): 150
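Bootstrap resampling draws a new training set of the same size N from the original one by sampling uniformly *with replacement*, so some instances appear several times and others not at all. A minimal sketch (the function name is illustrative); for large N a bootstrap sample contains on average about 1 − 1/e ≈ 63.2% of the distinct original instances.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) items uniformly with replacement from data."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


rng = random.Random(42)
data = list(range(10))
sample = bootstrap_sample(data, rng)
print(sample)                        # some items repeat, others are missing
print(len(set(sample)) / len(data))  # fraction of distinct originals included
```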
  • 17. Bagging (bootstrap + aggregating)
    - Use bootstrapping to generate L training sets.
    - Train L base learners using an unstable learning procedure.
    - During testing, take the average of the base learners' outputs (or a majority vote when the outputs are class labels).
    - In bagging, generating complementary base learners is left to chance and to the instability of the learning method.
    - An unstable algorithm is one where a small change in the training set causes a large difference in the resulting base learner.
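The bagging procedure can be sketched in plain Python. Assumptions not in the slides: inputs are one-dimensional, labels are 0/1, a one-level decision tree (stump) serves as a minimal stand-in for an unstable learner such as a full decision tree, and outputs are combined by majority vote because they are class labels. All names are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[float], int]


def train_stump(xs: Sequence[float], ys: Sequence[int]) -> Classifier:
    """Fit a 1-D threshold rule 'x <= t -> left, else right' with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(xs: Sequence[float], ys: Sequence[int],
            n_learners: int = 25, seed: int = 0) -> Classifier:
    """Train n_learners stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    learners: List[Classifier] = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        learners.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x: float) -> int:
        votes = Counter(h(x) for h in learners)
        return votes.most_common(1)[0][0]  # odd n_learners: no ties with two classes

    return predict


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = bagging(xs, ys)
print([model(x) for x in xs])
```

Each stump's threshold depends on which points its bootstrap sample happens to contain, so the stumps differ from one another; the majority vote smooths out that variation.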
  • 18. Boosting: Weak learner vs Strong learner
    - In boosting, we actively try to generate complementary base learners by training the next learner on the mistakes of the previous learners.
    - The original boosting algorithm (Schapire, 1990) combines three weak learners to generate a strong learner.
    - A weak learner has error probability less than 1/2, which makes it better than random guessing on a two-class problem.
    - A strong learner has arbitrarily small error probability.
  • 19. Boosting (ii) [Alpaydin, 2012: 431]
    - Given a large training set, we randomly divide it into three parts, X1, X2 and X3.
    - We use X1 to train d1.
    - We then take X2 and feed it to d1. We take all instances misclassified by d1, together with as many instances from X2 on which d1 is correct; these together form the training set of d2.
    - We then take X3 and feed it to d1 and d2. The instances on which d1 and d2 disagree form the training set of d3.
    - During testing, given an instance, we give it to d1 and d2; if they agree, that is the response; otherwise the response of d3 is taken as the output.
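The three-learner procedure of slide 19 can be sketched in Python. This is a minimal illustration under assumptions not in the slides: the weak learner is a hypothetical one-dimensional decision stump, labels are 0/1, and when d1 makes no mistakes on X2 (or d1 and d2 never disagree on X3) the sketch simply reuses d1, a guard the original description does not specify.

```python
import random
from typing import Callable, Sequence

Classifier = Callable[[float], int]


def train_stump(xs: Sequence[float], ys: Sequence[int]) -> Classifier:
    """Fit a 1-D threshold rule 'x <= t -> left, else right' with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def boost_three(xs: Sequence[float], ys: Sequence[int], seed: int = 0) -> Classifier:
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)  # random split into X1, X2, X3
    k = len(idx) // 3
    part1, part2, part3 = idx[:k], idx[k:2 * k], idx[2 * k:]

    def fit(ids):
        return train_stump([xs[i] for i in ids], [ys[i] for i in ids])

    d1 = fit(part1)  # d1: trained on X1
    # d2: X2 instances d1 gets wrong, plus as many that d1 gets right.
    wrong = [i for i in part2 if d1(xs[i]) != ys[i]]
    right = [i for i in part2 if d1(xs[i]) == ys[i]]
    d2 = fit(wrong + right[:len(wrong)]) if wrong else d1
    # d3: X3 instances on which d1 and d2 disagree.
    disagree = [i for i in part3 if d1(xs[i]) != d2(xs[i])]
    d3 = fit(disagree) if disagree else d1

    def predict(x: float) -> int:
        a, b = d1(x), d2(x)
        return a if a == b else d3(x)  # d3 only breaks disagreements

    return predict


xs = list(range(30))
ys = [int(x >= 15) for x in xs]
model = boost_three(xs, ys)
print([model(x) for x in xs])
```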
  • 20. Boosting: drawback
    - Though quite successful, the original boosting method has the disadvantage that it requires a very large training sample.
  • 21. AdaBoost (adaptive boosting)
    - Uses the same training set over and over, so the training set need not be large.
    - The base classifiers must be simple so that they do not overfit.
    - Can combine an arbitrary number of base learners, not only three.
  • 22. AdaBoost
    - Generates a sequence of base learners, each focusing on the previous one's errors.
    - The probability of a correctly classified instance (i.e. of drawing it into the next learner's training set) is decreased, and the probability of a misclassified instance is increased.
    - As a result, the next classifier focuses more on the instances misclassified by the previous classifier. [Alpaydin, 2012: 432-433]
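A sketch of one reweighting step, assuming the AdaBoost.M1-style update described in Alpaydin (2012): given the weighted error ε_j of learner d_j, set β_j = ε_j / (1 − ε_j), multiply the probability of each correctly classified instance by β_j (which is below 1), and renormalize. The next learner would then be trained on a sample drawn with these probabilities. The function name and the ε_j = 0 guard are choices made for this sketch.

```python
from typing import List, Sequence, Tuple


def adaboost_reweight(p: Sequence[float],
                      correct: Sequence[bool]) -> Tuple[List[float], float]:
    """One AdaBoost reweighting step.

    p[t] is the current probability of instance t; correct[t] says whether the
    latest base learner classified instance t correctly. Returns the new
    probabilities and beta = eps / (1 - eps), where eps is the weighted error.
    """
    eps = sum(pt for pt, ok in zip(p, correct) if not ok)
    if not 0.0 < eps < 0.5:
        # eps >= 1/2: no better than chance, so boosting stops.
        # eps == 0: every weight would be zeroed out (guard chosen for this sketch).
        raise ValueError(f"weighted error {eps} is outside (0, 1/2)")
    beta = eps / (1.0 - eps)
    # Shrink the probability of correctly classified instances by beta.
    new_p = [pt * beta if ok else pt for pt, ok in zip(p, correct)]
    z = sum(new_p)
    return [pt / z for pt in new_p], beta


p, beta = adaboost_reweight([0.25] * 4, [True, True, True, False])
print(p, beta)
```

After each step the misclassified instances together carry exactly half of the total probability, since ε_j / (ε_j + β_j(1 − ε_j)) = 1/2; in the example the single misclassified instance goes from 0.25 to 0.5.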
  • 23. AdaBoost: Testing
    - Given an instance, all the classifiers decide and a weighted vote is taken.
    - The weights reflect the base learners' accuracies on the training set (more accurate learners get larger weights), which improves accuracy.
    - The success of AdaBoost is due to its property of increasing the margin. If the margin increases, the training instances are better separated and errors are less likely. (This aim is similar to that of SVMs.)
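Assuming the same AdaBoost.M1-style formulation, at test time learner j votes for its predicted class with weight log(1/β_j), where β_j = ε_j / (1 − ε_j), so a learner with lower weighted training error (smaller β_j) gets a larger say. A minimal sketch with an illustrative function name; each β_j is assumed to lie strictly between 0 and 1.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def adaboost_vote(predictions: Sequence[Hashable], betas: Sequence[float]) -> Hashable:
    """Weighted vote: learner j votes for its predicted class with weight log(1/beta_j)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for label, beta in zip(predictions, betas):
        scores[label] += math.log(1.0 / beta)  # smaller beta (lower error) -> larger weight
    return max(scores, key=scores.get)


print(adaboost_vote([0, 1, 1], [0.1, 0.5, 0.5]))  # → 0
print(adaboost_vote([0, 1, 1], [0.4, 0.5, 0.5]))  # → 1
```

In the first call one accurate learner (β = 0.1, weight ≈ 2.30) outvotes two weaker ones (weight ≈ 0.69 each); in the second it does not.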
  • 24. Stacking (i)
    - In stacked generalization, the combiner f() is another learner and is not restricted to being a linear combination as in voting.
  • 25. Stacking (ii)
    - The combiner system should learn how the base learners make errors.
    - Stacking is a means of estimating and correcting for the biases of the base learners.
    - Therefore, the combiner should be trained on data not used to train the base learners.
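To make "learning how the base learners make errors" concrete, here is a deliberately simple combiner trained on held-out data: a lookup table from each pattern of base-learner outputs to the most frequent true label. This is an illustrative choice, not the slide's method; in practice the combiner f() is often a standard learner such as logistic regression or a decision tree. The base learners and data below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Sequence, Tuple

BaseLearner = Callable[[float], int]


def train_lookup_combiner(base_learners: Sequence[BaseLearner],
                          holdout_x: Sequence[float],
                          holdout_y: Sequence[int]) -> Callable[[float], int]:
    """Learn, on held-out data, which true label goes with each pattern of base-learner outputs."""
    counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for x, y in zip(holdout_x, holdout_y):
        pattern = tuple(h(x) for h in base_learners)
        counts[pattern][y] += 1
    table = {pattern: c.most_common(1)[0][0] for pattern, c in counts.items()}
    fallback = Counter(holdout_y).most_common(1)[0][0]  # for patterns never seen

    def predict(x: float) -> int:
        pattern = tuple(h(x) for h in base_learners)
        return table.get(pattern, fallback)

    return predict


# Two hypothetical, already-trained base learners with systematic errors.
def h1(x: float) -> int:
    return int(x > 3)


def h2(x: float) -> int:
    return int(x > 7)


holdout_x = list(range(11))
holdout_y = [int(3 < x <= 7) for x in holdout_x]
stacked = train_lookup_combiner([h1, h2], holdout_x, holdout_y)
print([stacked(x) for x in (2, 5, 9)])  # → [0, 1, 0]
```

Neither base learner is right everywhere (h1 errs for x > 7, h2 for every x > 3), but the combiner learns that the pattern (1, 0) means class 1 and (1, 1) means class 0, correcting both learners' systematic errors.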
  • 26. Cascading
    - Use dj only if the preceding learners are not confident.
    - Cascade learners in order of complexity.
  • 27. Cascading
    - Cascading is a multistage method: we use dj only if all preceding learners are not confident.
    - Associated with each learner is a confidence wj, such that dj is said to be confident of its output, and its output can be used, if wj > θj (the threshold).
    - Not confident: this covers misclassifications as well as instances for which the posterior is not high enough.
    - Important: the idea is that an early simple classifier handles the majority of instances, and a more complex classifier is used only for a small percentage, so it does not significantly increase the overall complexity.
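A sketch of cascade prediction, assuming each learner returns a label together with a confidence (for example its highest posterior) and that the learners are ordered from cheapest to most complex. The two learners and the threshold value below are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A learner returns (label, confidence), e.g. its most probable class and that posterior.
Learner = Callable[[float], Tuple[int, float]]


def cascade_predict(x: float, learners: Sequence[Learner],
                    thresholds: Sequence[float]) -> Tuple[int, int]:
    """Return (label, index of the stage that produced it).

    Learner j is used only if every earlier learner was not confident, i.e. its
    confidence did not exceed its threshold theta_j. The last learner has no
    threshold: if reached, its output is always accepted.
    """
    for stage, (learner, theta) in enumerate(zip(learners, thresholds)):
        label, confidence = learner(x)
        if confidence > theta:
            return label, stage
    label, _ = learners[-1](x)
    return label, len(learners) - 1


def cheap(x: float) -> Tuple[int, float]:
    # Hypothetical fast learner: confidence grows with distance from its boundary at 0.5.
    return int(x > 0.5), min(1.0, abs(x - 0.5) * 2)


def expensive(x: float) -> Tuple[int, float]:
    # Hypothetical slower learner with a slightly different boundary; always accepted.
    return int(x > 0.45), 1.0


for x in (0.1, 0.48, 0.9):
    print(x, cascade_predict(x, [cheap, expensive], thresholds=[0.5]))
```

Only the instance close to the cheap learner's boundary (0.48) reaches the second stage, which is the point of cascading: most instances stop at the cheap first stage.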
  • 28. Summary
    - It is often a good idea to combine several learning methods.
    - We want diverse classifiers, so that their errors cancel out.
    - However, remember: ensemble methods do not get a free lunch either…
  • 29. Example
    - In arc-factored graph-based parsing, we relied on finding a spanning tree over a dense graph built over the input.
    - A dense graph is a graph that contains all possible arcs between the word forms.
    - A spanning tree is a tree that has exactly one incoming arc for each word.
  • 30. Example: Ensemble MST Dependency Parsing
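The slide's figure is not reproduced in this transcript. One common way to combine dependency parsers in an MST framework, sketched here under that assumption, is to let each parser vote for the arcs in its output, use the vote counts as arc weights in the dense graph, and decode the maximum spanning tree. For readability the sketch finds the tree by brute force over all head assignments, which is only feasible for very short sentences; a real implementation would use an MST algorithm such as Chu-Liu/Edmonds. It also does not enforce a single child of the root. Function names and the three example parses are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

Arc = Tuple[int, int]  # (head, dependent); words are 1..n, 0 is the artificial root


def arc_votes(parses: Sequence[Sequence[int]]) -> Dict[Arc, int]:
    """Count how many parsers propose each arc. parses[k][d - 1] is the head of word d."""
    votes: Dict[Arc, int] = {}
    for heads in parses:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] = votes.get((head, dep), 0) + 1
    return votes


def is_tree(heads: Sequence[int]) -> bool:
    """True if following heads from every word reaches the root 0 without a cycle."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # cycle (including a self-loop)
            seen.add(node)
            node = heads[node - 1]
    return True


def max_spanning_tree(n: int, votes: Dict[Arc, int]) -> Optional[Tuple[int, ...]]:
    """Brute-force search for the highest-scoring tree (only feasible for tiny n)."""
    best, best_score = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        score = sum(votes.get((h, d), 0) for d, h in enumerate(heads, start=1))
        if score > best_score:
            best, best_score = heads, score
    return best


# Three hypothetical parsers' head assignments for a 3-word sentence.
parses = [(2, 0, 2), (2, 0, 2), (0, 1, 2)]
print(max_spanning_tree(3, arc_votes(parses)))  # → (2, 0, 2)
```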
  • 31. Conclusions
    - Combining multiple learners has been a popular topic in machine learning since the early 1990s, and research has been going on ever since.
    - Recently, it has been noticed that ensembles do not always improve accuracy, and research has started to focus on the criteria that a good ensemble should satisfy and on how to form one.
  • 32. Reading
    - Dietterich (2000)
    - Alpaydin (2010): Ch. 17
    - Daumé (2012): Ch. 11
  • 33. Thanks for your attention!