Random Forest and K Nearest Neighbor
K Nearest Neighbor (KNN)
Logic of KNN
• Find the historical records that look as similar as possible to the new record.
Which group will I be classified into?
KNN instances and distance measure
• Each instance (sample) is represented as a vector of numbers, so all instances correspond to points in an n-dimensional Euclidean space.
North Carolina state bird: p = (p1, p2, ..., pn)
Dinosaur: q = (q1, q2, ..., qn)
• How to measure the distance between instances?
Euclidean distance: d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
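The Euclidean distance between two instances p and q can be sketched in a few lines of Python. This is only an illustration: the function name and the sample vectors are made up for the example and do not come from the slides.

```python
import math

def euclidean_distance(p, q):
    """Return the straight-line distance between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of dimensions")
    # Sum the squared differences per dimension, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical 2-D instances (made-up numbers, not from the slides).
p = (0.0, 0.0)
q = (3.0, 4.0)
print(euclidean_distance(p, q))  # 5.0
```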
K nearest neighbor
• The classification uses the k nearest neighbors of the new record, so you need to pick k. People often pick 1, 3, or 5.
Question: Why is the number of nearest neighbors often an odd number?
Answer: Because the classification is decided by majority vote, and with two classes an odd k cannot produce a tie!
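A minimal sketch of the KNN vote, assuming Euclidean distance and a plain majority vote. The function name, the training points, and the labels are hypothetical; math.dist requires Python 3.8 or newer.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points.

    train is a list of (vector, label) pairs.
    """
    # Sort the historical records by Euclidean distance to the new record.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], new_point))
    # Keep the labels of the k closest records and count the votes.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated groups.
train = [
    ((1.0, 1.0), "bird"),
    ((1.5, 2.0), "bird"),
    ((2.0, 1.5), "bird"),
    ((8.0, 8.0), "dinosaur"),
    ((9.0, 8.5), "dinosaur"),
    ((8.5, 9.5), "dinosaur"),
]
print(knn_classify(train, (2.0, 2.0), k=3))  # bird
print(knn_classify(train, (8.0, 9.0), k=3))  # dinosaur
```

With two classes and an odd k, the vote count can never be split evenly, which is the point of the question above.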
Random Forest
Random Forest is an ensemble of many decision trees.
Example of a Decision Tree
Training Data

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

http://www.scribd.com/doc/56167859/7/Decision-Tree-Classification-Task
Apply Model to Test Data
Test Data

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree.
Refund = No leads to the MarSt node, and MarSt = Married leads to the leaf NO.
Assign Cheat to “No”.
http://www.scribd.com/doc/56167859/7/Decision-Tree-Classification-Task
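The walk through the example tree can be written as nested conditions. This is a hand-coded sketch of that one tree, not a learning algorithm. The function name is hypothetical, and sending an income of exactly 80K down the "No" branch is an assumption, since the slide only labels < 80K and > 80K.

```python
def classify_cheat(refund, marital_status, taxable_income_k):
    """Walk the example tree from the root and return the predicted Cheat label.

    taxable_income_k is the taxable income in thousands (e.g. 80 for 80K).
    """
    if refund == "Yes":                  # root node: Refund
        return "No"
    if marital_status == "Married":      # second level: MarSt
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K is treated like the < 80K branch.
    return "Yes" if taxable_income_k > 80 else "No"

# The ten training records from the slide: (Refund, Marital Status, Income in K, Cheat).
training_data = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]
correct = sum(classify_cheat(r, s, inc) == cheat for r, s, inc, cheat in training_data)
print(f"{correct} of {len(training_data)} training records reproduced")  # 10 of 10

# The test record: Refund = No, Married, 80K.
print(classify_cheat("No", "Married", 80))  # No
```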
Special feature of decision trees in a random forest
Trees should not be pruned.
Each individual tree overfits (it does not generalize well), but this is okay after taking the majority vote (which will be explained later).
Pruning a tree is NOT allowed in the random forest world!
Logic of ensemble
• A high-dimensional pattern recognition problem is as complicated as an elephant to a blind man: too many perspectives to touch and to know!
A single decision tree is like a single blind man. It is subject to overfitting and is unstable.
“Unstable” means that small changes in the training set lead to large changes in predictions.
The logic of ensemble - continued
A single blind man is limited. Why not send many blind men, let them investigate the elephant from different perspectives, and then aggregate their opinions?
The MANY blind men approach is like random forest, an ensemble of many trees!
In random forest, each tree is like a blind man: it uses the training set (the part of the elephant it touched) to draw conclusions (build the trained model) and then to make predictions.
Translating it into a little bit of jargon...
• Random forest is an ensemble classifier made of many decision trees.
• Each tree casts a vote at its terminal nodes. (For a binary endpoint, the vote will be “YES” or “NO”.)
• The final prediction depends on the majority vote of the trees.
• The motivation for generating multiple trees is to increase predictive accuracy.
Need to get some ensemble rules...
• To keep a blind man from announcing that an elephant is like a carpet, there must be some rules so that the votes make as much sense as possible in aggregation.
[Figure: elephant (hair) vs. carpet]
Bootstrap (randomness by the samples)
Bootstrap sampling: create new training sets by random sampling from the original data WITH replacement.
[Figure: the Dataset is resampled into Bootstrap Dataset 1, Bootstrap Dataset 2, Bootstrap Dataset 3, ...; each one leaves out its own OOB (out-of-bag) samples (around 1/3)]
Bootstrap data (about 2/3 of the training data) is used to grow the tree, and the OOB samples are used for self-testing: to evaluate the performance of each tree and to get an unbiased estimate of the classification error.
Bootstrap sampling is the mainstream choice for random forest. People sometimes use sampling without replacement instead.
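A small sketch of bootstrap sampling and the resulting OOB set, using only the standard library. The function name, the seed, and the dataset size are arbitrary choices for the illustration. The "about 2/3" and "around 1/3" figures come from the chance that a row is never drawn in N draws, (1 - 1/N)^N, which approaches 1/e (roughly 0.37) as N grows.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample of row indices and return (in_bag, out_of_bag)."""
    # Sampling WITH replacement: the same index may be drawn more than once.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    # Rows that were never drawn form the out-of-bag (OOB) set.
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
in_bag, oob = bootstrap_split(n, rng)
print(len(set(in_bag)) / n)  # fraction of distinct rows used to grow the tree
print(len(oob) / n)          # OOB fraction, expected to be near 1/e
```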
Random subspace (randomness by features)
• For a bootstrap sample with M predictors, at each node m (m < M) variables are selected at random, and only those m features are considered for splitting. This lets the trees grow using different features, like letting each blind man see the data from a different perspective.
• Find the best split on the selected m variables.
• The value of m is fixed when the forest is grown.
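The per-node feature draw can be sketched as follows. The function name and the values M = 9, m = 3 are hypothetical. A commonly used heuristic for classification is m close to the square root of M, but the slides do not prescribe a value.

```python
import random

def candidate_features(n_features, m, rng):
    """Pick m distinct feature indices out of n_features for one node split."""
    if not 0 < m < n_features:
        raise ValueError("m must satisfy 0 < m < M")
    # rng.sample draws without replacement, so the m indices are distinct.
    return sorted(rng.sample(range(n_features), m))

rng = random.Random(42)
M, m = 9, 3  # hypothetical sizes; m stays fixed for the whole forest
for node in range(3):
    # A fresh random subset is drawn at every node, not once per tree.
    print(f"node {node}: consider features {candidate_features(M, m, rng)}")
```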
How to classify new objects using random forest?
Pass the input vector down each of the trees in the forest. Each tree gives a classification (a vote), and the forest chooses the classification that has the majority of votes (over all the trees in the forest).
[Figure: the new sample is passed to Tree 1, Tree 2, Tree 3, Tree 4, ..., Tree n]
Final decision: majority vote
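The aggregation step is just a vote count. A minimal sketch, with a hypothetical function name and made-up votes from five trees:

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical votes from five trees for one new sample.
tree_votes = ["NO", "YES", "NO", "NO", "YES"]
print(majority_vote(tree_votes))  # NO
```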
Review in stats language
• Definition: Random forest is a learning ensemble consisting of bagging (or another type of re-sampling) of un-pruned decision tree learners with a randomized selection of features at each split.
• Random forest algorithm
  • Let Ntrees be the number of trees to build.
  • For each of the Ntrees iterations:
    1. Select a new bootstrap (or other type of re-sampling) sample from the training set.
    2. Grow an un-pruned tree on this bootstrap sample.
    3. At each internal node, randomly select mtry predictors and determine the best split using only these predictors.
    4. Do not perform cost-complexity pruning. Save the tree as is, alongside those built thus far.
  • Output the overall prediction as the average response (regression) or majority vote (classification) from all individually trained trees.
http://www.dabi.temple.edu/~hbling/8590.002/Montillo_RandomForests_4-2-2009.pdf
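The algorithm can be sketched in plain Python for classification on numeric features. This is a toy illustration under stated assumptions, not a reference implementation: splits are scored with Gini impurity (the slides do not name a criterion), a node stops splitting when no candidate split lowers the impurity, class labels must not be tuples, and all function names and data points are hypothetical. The regression case (averaging) is not covered.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among the given features, or None if no split helps."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:  # skip the max so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, mtry, rng):
    """Grow a tree without pruning; a leaf is a label, a node is (feature, threshold, left, right)."""
    if len(set(labels)) == 1:
        return labels[0]
    features = rng.sample(range(len(rows[0])), mtry)  # step 3: random mtry predictors
    split = best_split(rows, labels, features)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], mtry, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], mtry, rng))

def predict_tree(tree, row):
    """Follow the splits until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def random_forest(rows, labels, n_trees, mtry, seed=0):
    """Build n_trees trees, each on its own bootstrap sample (steps 1, 2, and 4)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # step 1: bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Hypothetical data: two well-separated classes in two dimensions.
rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.9), (1.2, 2.2),
        (8.0, 8.5), (8.8, 9.1), (9.5, 7.9), (7.7, 9.4)]
labels = ["A"] * 4 + ["B"] * 4
forest = random_forest(rows, labels, n_trees=25, mtry=1)
print(predict_forest(forest, (0.5, 0.5)))  # A
print(predict_forest(forest, (9.0, 9.0)))  # B
```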
Pattern recognition is fun
Lunar mining robot
“Give me a place to stand on, and I will move the Earth with a lever.” - Archimedes
Give the machine enough data and algorithms, and he/she will behave much like you.
Mars Rover

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Random Forest and KNN is fun

  • 1. Random Forest and K Nearest Neighbor
  • 3. Logic of KNN: Find the historical records that look as similar as possible to the new record. Which group will the new record be classified into?
  • 4. KNN instances and distance measure: Each instance (sample) is represented as a vector of numbers, so all instances correspond to points in an n-dimensional Euclidean space. North Carolina state bird: p = (p1, p2, ..., pn). Dinosaur: q = (q1, q2, ..., qn). How to measure the distance between instances? Euclidean distance: d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2).
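The Euclidean distance between two instances can be sketched in a few lines of Python. This is only an illustration: the function name `euclidean_distance` and the sample vectors are assumptions, not part of the slides.

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two equal-length numeric vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical 2-dimensional instances (a 3-4-5 right triangle).
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```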
  • 5. K nearest neighbor: The new record is classified by its k nearest neighbors, so you need to pick k; 1, 3, and 5 are common choices. Question: Why is the number of nearest neighbors often an odd number? Answer: because the classification is decided by majority vote, and an odd k cannot produce a tie between two classes!
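A minimal sketch of KNN classification by majority vote, assuming Euclidean distance and a tiny made-up training set. The name `knn_classify`, the labels, and the data points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training records.

    train is a list of (vector, label) pairs.
    """
    # Sort historical records by Euclidean distance to the new record, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns the label with the most votes; with an odd k and
    # two classes there is never a tie to break.
    return votes.most_common(1)[0][0]

# Hypothetical training data with two classes.
train = [
    ((1.0, 1.0), "bird"),
    ((1.2, 0.8), "bird"),
    ((0.9, 1.1), "bird"),
    ((5.0, 5.0), "dinosaur"),
    ((5.2, 4.9), "dinosaur"),
]
print(knn_classify(train, (1.1, 1.0), k=3))  # bird
```

With k = 3 and a new point near (5.1, 5.0), the three nearest neighbors would be two dinosaurs and one bird, so the vote is 2 to 1.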
  • 6. Random Forest Random Forest is an ensemble of many decision trees.
  • 7. Example of a Decision Tree
    Training Data:
    Tid  Refund  Marital Status  Taxable Income  Cheat
    1    Yes     Single          125K            No
    2    No      Married         100K            No
    3    No      Single          70K             No
    4    Yes     Married         120K            No
    5    No      Divorced        95K             Yes
    6    No      Married         60K             No
    7    Yes     Divorced        220K            No
    8    No      Single          85K             Yes
    9    No      Married         75K             No
    10   No      Single          90K             Yes
    Decision Tree (splitting attributes: Refund, MarSt, TaxInc):
    Refund = Yes: NO
    Refund = No: test MarSt
      MarSt = Married: NO
      MarSt = Single, Divorced: test TaxInc
        TaxInc < 80K: NO
        TaxInc > 80K: YES
    http://www.scribd.com/doc/56167859/7/Decision-Tree-Classification-Task
  • 8. Apply Model to Test Data
    Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
    Decision Tree:
    Refund = Yes: NO
    Refund = No: test MarSt
      MarSt = Married: NO
      MarSt = Single, Divorced: test TaxInc
        TaxInc < 80K: NO
        TaxInc > 80K: YES
    Start from the root of the tree.
    http://www.scribd.com/doc/56167859/7/Decision-Tree-Classification-Task
  • 13. Apply Model to Test Data
    Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
    The record follows the Refund = No branch, then the MarSt = Married branch, and reaches a NO leaf.
    Assign Cheat to "No".
    http://www.scribd.com/doc/56167859/7/Decision-Tree-Classification-Task
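The walk from the root to a leaf can be sketched as nested tests. This is an illustration of the example tree only; the function name `classify_cheat` and the dictionary keys are assumptions, and the handling of an income of exactly 80K is a guess because the slide labels the branches only "< 80K" and "> 80K".

```python
def classify_cheat(record):
    """Walk the example tree from the root and return the predicted Cheat label."""
    # Root: Refund.
    if record["Refund"] == "Yes":
        return "No"
    # Refund = No: test marital status.
    if record["MarSt"] == "Married":
        return "No"
    # Single or Divorced: test taxable income (in thousands).
    # Assumption: exactly 80K is sent to the "> 80K" side; the slide does not say.
    return "No" if record["TaxInc"] < 80 else "Yes"

test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print(classify_cheat(test_record))  # No
```

The test record never reaches the income test, so the ambiguity at 80K does not affect its classification.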
  • 14. Special feature of the decision trees in a random forest: Trees should not be pruned. Each individual tree overfits (does not generalize well), but this is okay after taking the majority vote (which will be explained later). Persecuting a tree is NOT allowed in the random forest world!
  • 15. Logic of ensemble: A high-dimensional pattern recognition problem is as complicated as an elephant is to a blind man: too many perspectives to touch and to know! A single decision tree is like a single blind man. It is subject to overfitting and is unstable. "Unstable" means that small changes in the training set lead to large changes in predictions.
  • 16. The logic of ensemble (continued): A single blind man is limited. Why not send many blind men, let them investigate the elephant from different perspectives, and then aggregate their opinions? The MANY blind men approach is like random forest, an ensemble of many trees! In random forest, each tree is like a blind man: it uses its training set (the part of the elephant it touched) to draw conclusions (build the trained model) and then to make predictions.
  • 17. Translating it into a little jargon: Random forest is an ensemble classifier made of many decision trees. Each tree casts a vote at its terminal nodes. (For a binary endpoint, the vote will be "YES" or "NO".) The final prediction depends on the majority vote of the trees. The motivation for generating multiple trees is to increase predictive accuracy.
  • 18. Need to get some ensemble rules: To keep a blind man from announcing that an elephant is like a carpet, there must be some rules so that the votes make as much sense as possible in aggregate. (Figure labels: elephant (hair), carpet.)
  • 19. Bootstrap (randomness by the samples): Bootstrap sampling creates new training sets by random sampling from the original data WITH replacement. (Figure: the dataset is resampled into Bootstrap Dataset 1, Bootstrap Dataset 2, Bootstrap Dataset 3, and so on, each leaving around 1/3 of the samples out-of-bag, OOB.) The bootstrap data (about 2/3 of the training data) is used to grow the tree, and the OOB samples are used for self-testing: to evaluate the performance of each tree and to get an unbiased estimate of the classification error. Bootstrap sampling is the mainstream choice in random forest; people sometimes use sampling without replacement.
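The "about 2/3 in, around 1/3 out" split follows from sampling with replacement: a given record is missed by all n draws with probability (1 - 1/n)^n, which approaches e^-1, roughly 0.368, as n grows. A minimal sketch that checks this empirically; the function name `bootstrap_sample`, the seed, and the dataset size are assumptions.

```python
import random

def bootstrap_sample(n, rng):
    """Draw n record indices with replacement; return (in_bag, oob) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    # Out-of-bag records are the ones that were never drawn.
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
n = 10_000
in_bag, oob = bootstrap_sample(n, rng)
print("distinct in-bag fraction:", len(set(in_bag)) / n)
print("out-of-bag fraction:", len(oob) / n)
```

The in-bag list still has n entries because some records are drawn more than once; only the number of distinct records shrinks to roughly two thirds.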
  • 20. Random subspace (randomness by features): For a bootstrap sample with M predictors, at each node m (m < M) variables are selected at random, and only those m features are considered for splitting. This lets trees grow using different features, like letting each blind man see the data from a different perspective. Find the best split on the selected m variables. The value of m is fixed while the forest is grown.
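A sketch of one node's split search under random feature selection. The slides do not say how "best split" is scored, so Gini impurity is an assumption here, as are the function names and the toy data.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split_on_random_features(X, y, m, rng):
    """Select m of the M features at random and find the best split among them.

    Returns (candidates, best), where best is (feature, threshold, score)
    or None if no selected feature can split the data.
    """
    M = len(X[0])
    candidates = rng.sample(range(M), m)  # m distinct features, m < M
    best = None
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # Weighted impurity of the two child nodes; lower is better.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, t, score)
    return candidates, best

# Hypothetical data: feature 0 separates the classes, features 1 and 2 do not.
X = [(1, 5, 0), (2, 3, 1), (3, 6, 0), (10, 4, 1), (11, 5, 0), (12, 3, 1)]
y = ["a", "a", "a", "b", "b", "b"]
candidates, best = best_split_on_random_features(X, y, m=2, rng=random.Random(1))
print("features considered:", candidates, "best split:", best)
```

When feature 0 is not among the selected candidates, the node has to settle for a weaker split, which is what makes the trees differ from one another.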
  • 21. How to classify new objects using random forest? Put the input vector down each of the trees in the forest. Each tree gives a classification (a vote), and the forest chooses the classification with the most votes over all the trees in the forest. (Figure: the new sample is passed to Tree 1, Tree 2, Tree 3, Tree 4, ..., Tree n; the final decision is the majority vote.)
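The voting step can be sketched as follows. The three "trees" here are hypothetical stand-in functions rather than trained models, and the field names `income` and `refund` are made up for the illustration.

```python
from collections import Counter

def forest_predict(trees, x):
    """Send x down every tree and return the class with the most votes."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each maps a record to "YES" or "NO".
trees = [
    lambda x: "YES" if x["income"] > 80 else "NO",
    lambda x: "YES" if x["refund"] == "No" else "NO",
    lambda x: "NO",
]
print(forest_predict(trees, {"income": 95, "refund": "No"}))  # YES
```

Here two trees vote "YES" and one votes "NO", so the forest answers "YES".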
  • 22. Review in stats language
    Definition: Random forest is a learning ensemble consisting of bagging (or another type of re-sampling) of un-pruned decision tree learners with a randomized selection of features at each split.
    Random forest algorithm:
    Let Ntrees be the number of trees to build. For each of Ntrees iterations:
      1. Select a new bootstrap (or other type of re-sampling) sample from the training set.
      2. Grow an un-pruned tree on this bootstrap sample.
      3. At each internal node, randomly select mtry predictors and determine the best split using only these predictors.
      4. Do not perform cost-complexity pruning. Save the tree as is, alongside those built thus far.
    Output the overall prediction as the average response (regression) or the majority vote (classification) from all individually trained trees.
    http://www.dabi.temple.edu/~hbling/8590.002/Montillo_RandomForests_4-2-2009.pdf
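The algorithm above can be sketched end to end in plain Python for classification. This is a toy illustration under stated assumptions, not a reference implementation: splits are scored with Gini impurity (the slides do not name a criterion), features are numeric, class labels are strings, and a node whose selected predictors cannot split the data simply becomes a majority-label leaf. All function names and the data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, mtry, rng):
    """Grow an un-pruned tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    if len(set(y)) == 1:
        return y[0]
    best = None
    for f in rng.sample(range(len(X[0])), mtry):      # step 3: mtry random predictors
        for t in sorted({row[f] for row in X})[:-1]:  # skip the max so both sides are non-empty
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = sum(len(s) * gini([y[i] for i in s]) for s in (left, right))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # selected predictors are constant here (a simplification)
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, rng))

def tree_predict(node, x):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def random_forest(X, y, ntrees, mtry, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(ntrees):
        idx = [rng.randrange(len(X)) for _ in X]  # step 1: bootstrap sample
        # steps 2 and 4: grow the tree fully and keep it without pruning
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def forest_predict(forest, x):
    """Majority vote over all trees (classification)."""
    return Counter(tree_predict(t, x) for t in forest).most_common(1)[0][0]

# Hypothetical data: two well-separated classes, three numeric predictors.
X = [(1.0, 1.0, 2.0), (1.5, 2.0, 1.0), (2.0, 1.2, 1.5), (1.2, 1.8, 2.2),
     (8.0, 9.0, 8.5), (9.0, 8.5, 9.5), (8.5, 9.5, 8.0), (9.5, 8.0, 9.0)]
y = ["NO"] * 4 + ["YES"] * 4
forest = random_forest(X, y, ntrees=25, mtry=2)
print(forest_predict(forest, (0.5, 0.5, 0.5)), forest_predict(forest, (10.0, 10.0, 10.0)))
```

For regression, the final step would average the trees' numeric outputs instead of counting votes; that variant is not shown here.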
  • 23. Pattern recognition is fun. "Give me a place to stand on, and I will move the Earth with a lever." - Archimedes. Give the machine enough data and algorithms, and it will behave much like you. (Images: lunar mining robot, Mars rover.)