ML Decision Tree_2.pptx

BCSE209L Machine Learning
Module: Decision Trees
Dr. R. Jothi
Decision Tree Induction Algorithms
■ ID3
  ■ Can handle both numerical and categorical features
  ■ Feature selection – Entropy
■ CART (continuous features and continuous label)
  ■ Can handle both numerical and categorical features
  ■ Feature selection – Gini
  ■ Generally used for both regression and classification
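As a quick, practical illustration of the two criteria (a minimal sketch assuming scikit-learn is available; the library detail is not part of the original slides), the same tree learner can be induced with either impurity measure:

# Minimal sketch (assumes scikit-learn is installed): the `criterion`
# argument switches the impurity measure used to select splits.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

gini_tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
entropy_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# The two criteria usually produce similar (not identical) trees.
print("gini tree depth:", gini_tree.get_depth())
print("entropy tree depth:", entropy_tree.get_depth())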
Measure of Impurity: GINI
• The Gini index measures the probability that a randomly chosen example from node t would be misclassified if it were labeled at random according to the class distribution at t.
• Gini index for a given node t with classes j:

$\mathrm{GINI}(t) = 1 - \sum_{j} [\, p(j \mid t) \,]^2$

NOTE: p(j | t) is computed as the relative frequency of class j at node t.
GINI Index: Example
• Example: Two classes C1 and C2, and node t has 5 C1 and 5 C2 examples. Compute Gini(t).
• Gini(t) = 1 – [p(C1|t)² + p(C2|t)²] = 1 – [(5/10)² + (5/10)²]
• = 1 – [¼ + ¼] = ½
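A minimal Python sketch (an addition, not part of the original slides) that computes the node Gini from raw class counts and reproduces the ½ above:

# Gini impurity of a single node from its class counts:
# GINI(t) = 1 - sum_j p(j|t)^2
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Node t with 5 examples of C1 and 5 of C2:
print(gini([5, 5]))  # 0.5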



For a two-class node the Gini index always lies between 0 and 0.5: a value of 0 means the node is pure (all of its examples belong to a single class), while 0.5 means the classes are evenly mixed, so a random guess at that node does no better than chance.
More on Gini
• The worst Gini occurs when every class probability equals 1/nc, where nc is the number of classes.
• For 2-class problems the worst Gini is therefore ½.
• How do we get the best Gini? Come up with an example for a node t with 10 examples from classes C1 and C2:
  • 10 C1 and 0 C2
• Now what is the Gini?
  • 1 – [(10/10)² + (0/10)²] = 1 – [1 + 0] = 0
• So 0 is the best Gini.
• So for 2-class problems, Gini varies from 0 (best) to ½ (worst).
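A short sketch (not from the slides) sweeping p = P(C1) over [0, 1] to illustrate the claimed range for two classes:

# For a 2-class node with P(C1) = p and P(C2) = 1 - p:
# GINI = 1 - p^2 - (1 - p)^2, which is 0 at p in {0, 1} and peaks at p = 0.5.
for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    g = 1 - p**2 - (1 - p)**2
    print(f"p = {p:.1f}  Gini = {g:.3f}")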
Some More Examples
• Below we see the Gini values for 4 nodes with different class distributions, ordered from best (purest) to worst (most impure). See the next slide for the detailed computations.
• Note that thus far we are only computing GINI for one node. We still need to compute it for a split and then compute the change in Gini relative to the parent node.

C1 = 0, C2 = 6: Gini = 0.000
C1 = 1, C2 = 5: Gini = 0.278
C1 = 2, C2 = 4: Gini = 0.444
C1 = 3, C2 = 3: Gini = 0.500
Examples for computing GINI

$\mathrm{GINI}(t) = 1 - \sum_{j} [\, p(j \mid t) \,]^2$

Node with C1 = 0, C2 = 6:
P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Gini = 1 – P(C1)² – P(C2)² = 1 – 0 – 1 = 0

Node with C1 = 1, C2 = 5:
P(C1) = 1/6, P(C2) = 5/6
Gini = 1 – (1/6)² – (5/6)² = 0.278

Node with C1 = 2, C2 = 4:
P(C1) = 2/6, P(C2) = 4/6
Gini = 1 – (2/6)² – (4/6)² = 0.444
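The same three computations, checked with a small Python sketch (added here for convenience; not part of the original slides):

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The three nodes above, each with 6 examples in total:
for c1, c2 in [(0, 6), (1, 5), (2, 4)]:
    print(f"C1 = {c1}, C2 = {c2}:  Gini = {gini([c1, c2]):.3f}")
# Expected output: 0.000, 0.278, 0.444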
Examples for Computing Error

$\mathrm{Error}(t) = 1 - \max_{i} P(i \mid t)$

Node with C1 = 0, C2 = 6:
P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Error = 1 – max(0, 1) = 1 – 1 = 0

Node with C1 = 1, C2 = 5:
P(C1) = 1/6, P(C2) = 5/6
Error = 1 – max(1/6, 5/6) = 1 – 5/6 = 1/6

Node with C1 = 2, C2 = 4:
P(C1) = 2/6, P(C2) = 4/6
Error = 1 – max(2/6, 4/6) = 1 – 4/6 = 1/3
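A matching sketch for the misclassification error of a node (again an addition, not from the slides):

# Misclassification error of a node: Error(t) = 1 - max_i P(i|t)
def classification_error(counts):
    total = sum(counts)
    return 1.0 - max(c / total for c in counts)

for c1, c2 in [(0, 6), (1, 5), (2, 4)]:
    print(f"C1 = {c1}, C2 = {c2}:  Error = {classification_error([c1, c2]):.3f}")
# Expected output: 0.000, 0.167 (1/6), 0.333 (1/3)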


Comparison among Splitting Criteria
For a 2-class problem, all three measures (entropy, Gini index, misclassification error) are 0 when the node is pure (p = 0 or p = 1) and reach their maximum when the classes are evenly mixed (p = 0.5); entropy peaks at 1, while Gini and error peak at 0.5.
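An illustrative sketch (not in the original deck) tabulating the three criteria as a function of p = P(C1):

import math

# Impurity of a 2-class node as a function of p = P(C1):
def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    return 1 - p**2 - (1 - p)**2

def error(p):
    return 1 - max(p, 1 - p)

for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"p = {p:.2f}  entropy = {entropy(p):.3f}  gini = {gini(p):.3f}  error = {error(p):.3f}")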
Example: Construct a Decision Tree using the Gini Index
• For each candidate attribute, compute the weighted Gini index of the split it induces; the attribute giving the lowest weighted Gini (equivalently, the largest reduction in Gini) is selected.
• Therefore, attribute B will be chosen to split the node.
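The training table from the original slides is not reproduced here, so the sketch below uses hypothetical class counts for two attributes A and B purely to illustrate how the comparison is made:

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(partitions):
    # partitions: one [C1, C2] class-count list per child node of the split
    n = sum(sum(part) for part in partitions)
    return sum(sum(part) / n * gini(part) for part in partitions)

# Hypothetical splits (class counts [C1, C2] in each child) -- illustrative only:
splits = {
    "A": [[4, 3], [3, 4]],  # children stay mixed
    "B": [[6, 1], [1, 6]],  # children are much purer
}
for name, parts in splits.items():
    print(name, round(weighted_gini(parts), 3))
print("chosen attribute:", min(splits, key=lambda a: weighted_gini(splits[a])))  # -> B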
Gini vs Entropy
• Computationally, entropy is more expensive because it requires logarithms; consequently, the Gini index is faster to calculate.
• Accuracy with the entropy criterion is often slightly better, but not always.
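A rough micro-benchmark sketch of that cost difference (assumes NumPy; absolute timings depend on the machine and are only indicative):

import timeit
import numpy as np

# 10,000 random 5-class probability vectors (Dirichlet samples are strictly
# positive, so log2 is safe here).
p = np.random.dirichlet(np.ones(5), size=10_000)

gini_time = timeit.timeit(lambda: 1 - np.sum(p**2, axis=1), number=100)
entropy_time = timeit.timeit(lambda: -np.sum(p * np.log2(p), axis=1), number=100)
print(f"gini: {gini_time:.3f}s   entropy: {entropy_time:.3f}s")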
Table 11.6

Algorithm: ID3
Splitting criterion: Information Gain
$\alpha(A, D) = E(D) - E_A(D)$
where
$E(D)$ = entropy of D (a measure of uncertainty) $= -\sum_{i=1}^{k} p_i \log_2 p_i$,
with D having the set of k classes $c_1, c_2, \ldots, c_k$ and $p_i = |C_{i,D}| / |D|$; here $C_{i,D}$ is the set of tuples with class $c_i$ in D.
$E_A(D)$ = weighted average entropy when D is partitioned on the values of attribute A $= \sum_{j=1}^{m} \frac{|D_j|}{|D|} E(D_j)$,
where m is the number of distinct values of attribute A.
Remarks:
• The algorithm calculates $\alpha(A_i, D)$ for every attribute $A_i$ in D and chooses the attribute with the maximum $\alpha(A_i, D)$.
• The algorithm can handle both categorical and numerical attributes.
• It favors splitting on attributes that have a large number of distinct values.
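A compact sketch of the ID3 gain computation on a categorical attribute (the attribute values and labels below are made up for illustration):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """alpha(A, D) = E(D) - E_A(D) for a categorical attribute A."""
    n = len(labels)
    weighted = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        weighted += len(subset) / n * entropy(subset)
    return entropy(labels) - weighted

# Hypothetical attribute values and class labels:
A = ["sunny", "sunny", "rain", "rain", "rain", "overcast"]
y = ["no",    "no",    "yes",  "yes",  "no",   "yes"]
print(round(information_gain(A, y), 3))  # about 0.541 for this toy data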
Algorithm: CART
Splitting criterion: Gini Index
$\gamma(A, D) = G(D) - G_A(D)$
where
$G(D)$ = Gini index of D (a measure of impurity) $= 1 - \sum_{i=1}^{k} p_i^2$, with $p_i = |C_{i,D}| / |D|$ and D having k classes, and
$G_A(D) = \frac{|D_1|}{|D|} G(D_1) + \frac{|D_2|}{|D|} G(D_2)$,
when D is partitioned into two data sets $D_1$ and $D_2$ based on some values of attribute A.
Remarks:
• The algorithm evaluates all binary partitions over the possible values of attribute A and chooses the binary partition with the maximum $\gamma(A, D)$.
• The algorithm becomes computationally very expensive when attribute A has a large number of values.
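A sketch of the CART-style gain for one candidate binary partition (the data and the chosen partition are hypothetical):

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_gain(values, labels, left_values):
    """gamma(A, D) = G(D) - G_A(D) for the binary partition left_values vs the rest.
    Assumes both sides of the partition are non-empty."""
    left  = [lab for v, lab in zip(values, labels) if v in left_values]
    right = [lab for v, lab in zip(values, labels) if v not in left_values]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical data; try the binary partition {sunny} vs {rain, overcast}:
A = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
y = ["no",    "no",    "yes",  "no",   "yes",      "yes"]
print(gini_gain(A, y, {"sunny"}))  # 0.25 for this toy partition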
Algorithm: C4.5
Splitting criterion: Gain Ratio
$\beta(A, D) = \dfrac{\alpha(A, D)}{E_A^{*}(D)}$
where
$\alpha(A, D)$ = information gain of D (same as in ID3), and
$E_A^{*}(D)$ = splitting information $= -\sum_{j=1}^{m} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$,
when D is partitioned into $D_1, D_2, \ldots, D_m$ corresponding to the m distinct values of attribute A.
Remarks:
• The attribute A with the maximum value of $\beta(A, D)$ is selected for splitting.
• Splitting information acts as a kind of normalization: it counteracts the bias of information gain towards attributes with a large number of distinct values.
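A sketch of the gain-ratio computation on the same kind of hypothetical attribute (again, the data is made up for illustration):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """beta(A, D) = alpha(A, D) / E*_A(D): information gain over splitting information."""
    n = len(labels)
    gain, split_info = entropy(labels), 0.0
    for v, cnt in Counter(values).items():
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= cnt / n * entropy(subset)
        split_info -= cnt / n * math.log2(cnt / n)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical attribute values and class labels:
A = ["sunny", "sunny", "rain", "rain", "rain", "overcast"]
y = ["no",    "no",    "yes",  "yes",  "no",   "yes"]
print(round(gain_ratio(A, y), 3))  # about 0.371 for this toy data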
In addition to these splitting criteria, decision tree induction algorithms also differ in a few other important characteristics.