Induction of Decision Trees
Blaž Zupan and Ivan Bratko
magix.fri.uni-lj.si/predavanja/uisp
CS583, Bing Liu, UIC 2
An example application
• An emergency room in a hospital measures 17
variables (e.g., blood pressure, age, etc.) of newly
admitted patients.
• A decision is needed: whether to put a new
patient in an intensive-care unit.
• Due to the high cost of ICU, those patients who
may survive less than a month are given higher
priority.
• Problem: to predict high-risk patients and
discriminate them from low-risk patients.
CS583, Bing Liu, UIC 3
Another application
• A credit card company receives thousands of
applications for new cards. Each application
contains information about an applicant,
– age
– marital status
– annual salary
– outstanding debts
– credit rating
– etc.
• Problem: to decide whether an application should
be approved, or to classify applications into two
categories, approved and not approved.
CS583, Bing Liu, UIC 4
Machine learning and our focus
• Like human learning from past experiences.
• A computer does not have “experiences”.
• A computer system learns from data, which
represent some “past experiences” of an
application domain.
• Our focus: learn a target function that can be
used to predict the values of a discrete class
attribute, e.g., approved or not-approved, and
high-risk or low-risk.
• The task is commonly called: Supervised
learning, classification, or inductive learning.
CS583, Bing Liu, UIC 5
• Data: A set of data records (also called
examples, instances or cases) described by
– k attributes: A1, A2, … Ak.
– a class: Each example is labelled with a
pre-defined class.
• Goal: To learn a classification model from the
data that can be used to predict the classes
of new (future, or test) cases/instances.
The data and the goal
CS583, Bing Liu, UIC 6
An example: data (loan application)
[Table: 15 loan applications with their attributes and the class label Approved or not]
CS583, Bing Liu, UIC 7
An example: the learning
task
• Learn a classification model from the data
• Use the model to classify future loan applications
into
– Yes (approved) and
– No (not approved)
• What is the class for the following case/instance?
CS583, Bing Liu, UIC 8
Supervised vs. unsupervised
Learning
• Supervised learning: classification is seen as
supervised learning from examples.
– Supervision: The data (observations,
measurements, etc.) are labeled with pre-defined
classes. It is as if a “teacher” gives the classes
(supervision).
– Test data are classified into these classes too.
• Unsupervised learning (clustering)
– Class labels of the data are unknown
– Given a set of data, the task is to establish the
existence of classes or clusters in the data
CS583, Bing Liu, UIC 9
Supervised learning process:
two steps
 Learning (training): Learn a model using the
training data
 Testing: Test the model using unseen test
data to assess the model accuracy:
Accuracy = (Number of correct classifications) / (Total number of test cases)
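As a minimal sketch of this measure (an illustration, not part of the original slides), in Python:

def accuracy(predicted, actual):
    # Accuracy = number of correct classifications / total number of test cases
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

print(accuracy(["yes", "no", "yes"], ["yes", "yes", "yes"]))  # 2 of 3 correct: 0.666...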
CS583, Bing Liu, UIC 10
What do we mean by learning?
• Given
– a data set D,
– a task T, and
– a performance measure M,
a computer system is said to learn from D to
perform the task T if after learning the
system’s performance on T improves as
measured by M.
• In other words, the learned model helps the
system to perform T better as compared to
no learning.
CS583, Bing Liu, UIC 11
An example
• Data: Loan application data
• Task: Predict whether a loan should be
approved or not.
• Performance measure: accuracy.
No learning: classify all future applications (test
data) to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
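A small sketch of the no-learning baseline above; the 9 Yes / 6 No split is taken from the slide, and the label list itself is only an illustration:

from collections import Counter

train_labels = ["Yes"] * 9 + ["No"] * 6                  # the 15 loan examples
majority = Counter(train_labels).most_common(1)[0][0]    # "Yes"
baseline = train_labels.count(majority) / len(train_labels)
print(majority, baseline)                                # Yes 0.6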
CS583, Bing Liu, UIC 12
Fundamental assumption of
learning
Assumption: The distribution of training
examples is identical to the distribution of
test examples (including future unseen
examples).
• In practice, this assumption is often violated
to a certain degree.
• Strong violations will clearly result in poor
classification accuracy.
• To achieve good accuracy on the test data,
training examples must be sufficiently
representative of the test data.
Analysis of Severe Trauma Patients Data
[Decision tree figure: root PH_ICU (the worst pH value at ICU) with branches <7.2, 7.2-7.33, >7.33; internal node APPT_WORST (the worst active partial thromboplastin time) with branches <78.7, >=78.7; leaves: Death 0.0 (0/15), Well 0.82 (9/11), Death 0.0 (0/7), Well 0.88 (14/16)]
PH_ICU and APPT_WORST are exactly the two factors
(theoretically) advocated to be the most important ones in
the study by Rotondo et al., 1997.
Breast Cancer Recurrence
Tree induced by Assistant Professional
Interesting: the accuracy of this tree compared to that of medical
specialists
[Decision tree figure: root Involved Nodes (< 3 / >= 3); further tests on Degree of Malig (< 3 / >= 3), Tumor Size (< 15 / >= 15), and Age; leaves with class counts: no rec 125 / recurr 39; recurr 27 / no_rec 10; no rec 30 / recurr 18; no rec 4 / recurr 1; no rec 32 / recurr 0]
Prostate cancer recurrence
[Decision tree figure: root Secondary Gleason Grade (1,2 / 3 / 4 / 5); PSA Level (≤14.9 / >14.9); Stage (T1c,T2a,T2b,T2c / T1ab,T3); Primary Gleason Grade (2,3 / 4); leaves labeled No / Yes]
CS583, Bing Liu, UIC 16
Introduction
• Decision tree learning is one of the most
widely used techniques for classification.
– Its classification accuracy is competitive with
other methods, and
– it is very efficient.
• The classification model is a tree, called
decision tree.
• C4.5 by Ross Quinlan is perhaps the best-known
system. It can be downloaded from the Web.
CS583, Bing Liu, UIC 17
The loan data (reproduced)
[Table: the loan data again, with the class label Approved or not]
CS583, Bing Liu, UIC 18
A decision tree from the loan
data
 Decision nodes and leaf nodes (classes)
CS583, Bing Liu, UIC 19
Use the decision tree
[Figure: the new case passed down the tree; predicted class: No]
CS583, Bing Liu, UIC 20
Is the decision tree unique?
 No. Here is a simpler tree.
 We prefer a smaller and accurate tree: it is
easier to understand and tends to perform better.
 Finding the best tree is
NP-hard.
 All current tree-building
algorithms are heuristic
algorithms.
An Example Data Set and Decision Tree
[Decision tree figure:
outlook?
  sunny -> yes
  rainy -> company?
    no -> no
    big -> yes
    med -> sailboat? (small -> yes; big -> no)]
#   Outlook  Company  Sailboat  Sail?
1   sunny    big      small     yes
2   sunny    med      small     yes
3   sunny    med      big       yes
4   sunny    no       small     yes
5   sunny    big      big       yes
6   rainy    no       small     no
7   rainy    med      small     yes
8   rainy    big      big       yes
9   rainy    no       big       no
10  rainy    med      big       no
Classification
[Figure: the same decision tree applied to two new cases; following the tree, case 1 (sunny) -> yes and case 2 (rainy, company = big) -> yes]
#   Outlook  Company  Sailboat  Sail?
1   sunny    no       big       ?
2   rainy    big      small     ?
Induction of Decision Trees
• Data Set (Learning Set)
– Each example = Attributes + Class
• Induced description = Decision tree
• TDIDT
– Top Down Induction of Decision Trees
• Recursive Partitioning
Some TDIDT Systems
• ID3 (Quinlan 79)
• CART (Breiman et al. 84)
• Assistant (Cestnik et al. 87)
• C4.5 (Quinlan 93)
• See5 (Quinlan 97)
• ...
• Orange (Demšar, Zupan 98-03)
TDIDT Algorithm
• Also known as ID3 (Quinlan)
• To construct decision tree T from learning
set S:
– If all examples in S belong to some class C Then
make leaf labeled C
– Otherwise
• select the “most informative” attribute A
• partition S according to A’s values
• recursively construct subtrees T1, T2, ..., for the
subsets of S (a Python sketch follows)
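A minimal sketch of this recursion, assuming a `select` scoring function that implements whichever “most informative” measure is chosen (information gain, gain ratio, or Gini, all discussed later); an illustration, not the original ID3 code:

from collections import Counter

def tdidt(examples, attributes, select):
    """examples: list of (attribute-dict, class) pairs; select(examples, a) scores attribute a."""
    classes = [c for _, c in examples]
    if len(set(classes)) == 1:            # all examples in S belong to some class C
        return classes[0]                 # -> make a leaf labeled C
    if not attributes:                    # no attributes left: fall back to majority class
        return Counter(classes).most_common(1)[0][0]
    A = max(attributes, key=lambda a: select(examples, a))   # "most informative" attribute
    tree = {A: {}}
    for v in {x[A] for x, _ in examples}:                    # partition S by A's values
        subset = [(x, c) for x, c in examples if x[A] == v]
        tree[A][v] = tdidt(subset, [a for a in attributes if a != A], select)
    return tree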
Hunt’s Algorithm
[Figure: growing the tree on the tax data in successive refinements:
(1) a single leaf: Don’t Cheat
(2) Refund? (Yes -> Don’t Cheat; No -> Don’t Cheat)
(3) Refund? (Yes -> Don’t Cheat; No -> Marital Status? (Single, Divorced -> Cheat; Married -> Don’t Cheat))
(4) Refund? (Yes -> Don’t Cheat; No -> Marital Status? (Single, Divorced -> Taxable Income? (< 80K -> Don’t Cheat; >= 80K -> Cheat); Married -> Don’t Cheat))]
TDIDT Algorithm
• Resulting tree T is:
[Figure: attribute A at the root; A’s values v1, v2, ..., vn on the branches, leading to subtrees T1, T2, ..., Tn]
Another Example
#   Outlook   Temperature  Humidity  Windy  Play
1   sunny     hot          high      no     N
2   sunny     hot          high      yes    N
3   overcast  hot          high      no     P
4   rainy     moderate     high      no     P
5   rainy     cold         normal    no     P
6   rainy     cold         normal    yes    N
7   overcast  cold         normal    yes    P
8   sunny     moderate     high      no     N
9   sunny     cold         normal    no     P
10  rainy     moderate     normal    no     P
11  sunny     moderate     normal    yes    P
12  overcast  moderate     high      yes    P
13  overcast  hot          normal    no     P
14  rainy     moderate     high      yes    N
Simple Tree
[Decision tree figure:
Outlook?
  sunny -> Humidity? (high -> N; normal -> P)
  overcast -> P
  rainy -> Windy? (yes -> N; no -> P)]
Complicated Tree
[Decision tree figure: a much larger tree for the same data, rooted at Temperature (cold / moderate / hot), with repeated tests on Outlook, Windy, and Humidity, and even a null leaf]
Attribute Selection Criteria
• Main principle
– Select attribute which partitions the learning set into
subsets as “pure” as possible
• Various measures of purity
– Information-theoretic
– Gini index
– χ2 (chi-square)
– ReliefF
– ...
• Various improvements
– probability estimates
– normalization
– binarization, subsetting
Information-Theoretic Approach
• To classify an object, a certain amount of
information is needed
– I, information
• After we have learned the value of attribute A,
we only need some remaining amount of
information to classify the object
– Ires, residual information
• Gain
– Gain(A) = I – Ires(A)
• The most ‘informative’ attribute is the one that
minimizes Ires, i.e., maximizes Gain
Entropy
• The average amount of information I needed to
classify an object is given by the entropy measure:
  I = −Σc p(c) · log2 p(c)
• For a two-class problem:
[Figure: entropy as a function of p(c1), equal to 0 at p(c1) = 0 or 1 and peaking at 1 bit at p(c1) = 0.5]
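A sketch of the entropy measure as a Python helper (the standard definition, consistent with the numbers used later in the deck):

import math

def entropy(class_counts):
    # I = -sum over classes c of p(c) * log2 p(c); empty classes contribute 0
    total = sum(class_counts)
    return -sum(n / total * math.log2(n / total) for n in class_counts if n > 0)

print(entropy([7, 7]))   # 1.0 bit for a balanced two-class set
print(entropy([14, 0]))  # 0.0 bits for a pure set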
Residual Information
• After applying attribute A, S is partitioned
into subsets according to the values v of A
• Ires is the weighted sum of the
amounts of information for the subsets:
  Ires(A) = −Σv p(v) · Σc p(c|v) · log2 p(c|v)
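Continuing the sketch, Ires weights each subset’s entropy by the fraction of examples that fall into it (this reuses the `entropy` helper from the previous snippet):

from collections import Counter

def residual_information(examples, attribute):
    """examples: list of (attribute-dict, class) pairs."""
    i_res, total = 0.0, len(examples)
    for v in {x[attribute] for x, _ in examples}:            # subsets by value v of A
        subset = [c for x, c in examples if x[attribute] == v]
        counts = list(Counter(subset).values())
        i_res += len(subset) / total * entropy(counts)       # weighted by p(v)
    return i_res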
Triangles and Squares
#   Color   Outline  Dot  Shape
1   green   dashed   no   triangle
2   green   dashed   yes  triangle
3   yellow  dashed   no   square
4   red     dashed   no   square
5   red     solid    no   square
6   red     solid    yes  triangle
7   green   solid    no   square
8   green   dashed   no   triangle
9   yellow  solid    yes  square
10  red     solid    no   square
11  green   solid    yes  square
12  yellow  dashed   yes  square
13  yellow  solid    no   square
14  red     dashed   yes  triangle
Triangles and Squares
Data Set: A set of classified objects
[Figure: the same 14 objects from the table above, drawn as triangles and squares]
Entropy
• 5 triangles
• 9 squares
• class probabilities: p(triangle) = 5/14, p(square) = 9/14
• entropy: I = −(5/14)·log2(5/14) − (9/14)·log2(9/14) ≈ 0.940 bits
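Checking the arithmetic for this slide:

import math

p_tri, p_sq = 5 / 14, 9 / 14   # class probabilities
I = -(p_tri * math.log2(p_tri) + p_sq * math.log2(p_sq))
print(round(I, 3))             # 0.940 bits for the whole data set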
Entropy reduction by data set partitioning
[Figure: the data set split by Color? into red, yellow, and green subsets]
Entropy of the attribute’s values
[Figure: the residual entropy computed for each Color? subset (red, yellow, green) and combined into Ires(Color)]
Information Gain
[Figure: Gain(Color) computed as the entropy of the whole set minus the residual entropy of the Color? partition]
Information Gain of The Attribute
• Attributes
– Gain(Color) = 0.247
– Gain(Outline) = 0.152
– Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is
chosen (here, Color)
• This heuristic is local (local minimization of
impurity); see the sketch below
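The sketch below reproduces these gains on the triangles-and-squares table; `H` and `gain` restate the entropy and residual-information helpers from earlier so the snippet runs on its own:

import math
from collections import Counter

def H(counts):
    t = sum(counts)
    return -sum(n / t * math.log2(n / t) for n in counts if n)

def gain(examples, attr):
    base = H(list(Counter(c for _, c in examples).values()))
    i_res = 0.0
    for v in {x[attr] for x, _ in examples}:
        sub = [c for x, c in examples if x[attr] == v]
        i_res += len(sub) / len(examples) * H(list(Counter(sub).values()))
    return base - i_res

rows = [  # (Color, Outline, Dot, Shape) from the table above
    ("green", "dashed", "no", "triangle"), ("green", "dashed", "yes", "triangle"),
    ("yellow", "dashed", "no", "square"), ("red", "dashed", "no", "square"),
    ("red", "solid", "no", "square"), ("red", "solid", "yes", "triangle"),
    ("green", "solid", "no", "square"), ("green", "dashed", "no", "triangle"),
    ("yellow", "solid", "yes", "square"), ("red", "solid", "no", "square"),
    ("green", "solid", "yes", "square"), ("yellow", "dashed", "yes", "square"),
    ("yellow", "solid", "no", "square"), ("red", "dashed", "yes", "triangle"),
]
examples = [({"Color": c, "Outline": o, "Dot": d}, s) for c, o, d, s in rows]
for a in ("Color", "Outline", "Dot"):
    print(a, round(gain(examples, a), 3))   # Color 0.247, Outline 0.152, Dot 0.048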
[Figure: one subset of the Color? split (the one later split on Outline) considered for further splitting]
Within that subset:
Gain(Outline) = 0.971 – 0 = 0.971 bits
Gain(Dot) = 0.971 – 0.951 = 0.020 bits
[Figure: the tree so far, with one Color? branch split on Outline? (dashed / solid); another subset considered next]
Within that subset:
Gain(Outline) = 0.971 – 0.951 = 0.020 bits
Gain(Dot) = 0.971 – 0 = 0.971 bits
[Figure: the tree after splitting one Color? branch on Dot? (yes / no) and another on Outline? (dashed / solid)]
Decision Tree
[Final decision tree:
Color?
  red -> Dot? (yes -> triangle; no -> square)
  yellow -> square
  green -> Outline? (dashed -> triangle; solid -> square)]
A Defect of Ires
• Ires favors attributes with many values
• Such an attribute splits S into many subsets, and if
these are small, they will tend to be pure anyway
• One way to rectify this is through a corrected
measure, the information gain ratio
Information Gain Ratio
• I(A) is the amount of information needed to
determine the value of attribute A:
  I(A) = −Σv p(v) · log2 p(v)
• Information gain ratio:
  GainRatio(A) = Gain(A) / I(A)
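A sketch of the correction, reusing `H`, `gain`, and `examples` from the previous snippet:

def split_information(examples, attr):
    # I(A): entropy of A's own value distribution
    values = [x[attr] for x, _ in examples]
    return H(list(Counter(values).values()))

def gain_ratio(examples, attr):
    return gain(examples, attr) / split_information(examples, attr)

for a in ("Color", "Outline", "Dot"):
    print(a, round(gain_ratio(examples, a), 3))   # Color 0.156, Outline 0.152, Dot 0.049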
Information Gain Ratio
[Figure: GainRatio(Color) computed for the Color? split: 0.247 / 1.58 ≈ 0.156]
Information Gain and Information Gain Ratio
A        |v(A)|  Gain(A)  GainRatio(A)
Color    3       0.247    0.156
Outline  2       0.152    0.152
Dot      2       0.048    0.049
Gini Index
• Another sensible measure of impurity
(i and j are classes):
  Gini = Σi<j p(i) · p(j)
• After applying attribute A, the resulting Gini
index is
  Gini(A) = Σv p(v) · Gini(Sv)
• Gini can be interpreted as expected error rate
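A sketch of both quantities, reusing the `examples` list from the earlier gain snippet. The pairwise form Σi<j p(i)·p(j) is used here because it reproduces the GiniGain numbers in the table at the end of the deck (for two classes it is half of the more common 1 − Σ p(i)²):

from itertools import combinations
from collections import Counter

def gini(counts):
    # Gini = sum over class pairs i < j of p(i) * p(j)
    t = sum(counts)
    ps = [n / t for n in counts]
    return sum(pi * pj for pi, pj in combinations(ps, 2))

def gini_gain(examples, attr):
    base = gini(list(Counter(c for _, c in examples).values()))
    after = 0.0
    for v in {x[attr] for x, _ in examples}:   # Gini(A): weighted Gini of the subsets
        sub = [c for x, c in examples if x[attr] == v]
        after += len(sub) / len(examples) * gini(list(Counter(sub).values()))
    return base - after

for a in ("Color", "Outline", "Dot"):
    print(a, round(gini_gain(examples, a), 3))   # Color 0.058, Outline 0.046, Dot 0.015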
Gini Index
[Figure: Gini index of the whole triangles-and-squares data set]
Gini Index for Color
[Figure: Gini index computed for each Color? subset (red, yellow, green) and combined into Gini(Color)]
Gain of Gini Index
GiniGain(A) = Gini − Gini(A)
Three Impurity Measures
A        Gain(A)  GainRatio(A)  GiniGain(A)
Color    0.247    0.156         0.058
Outline  0.152    0.152         0.046
Dot      0.048    0.049         0.015
• These impurity measures assess the effect of a
single attribute
• The “most informative” criterion they define is
local (and “myopic”)
• It does not reliably predict the effect of several
attributes applied jointly
Editor’s notes
1. Induction is different from deduction, and a DBMS does not support induction; the result of induction is higher-level information or knowledge: general statements about the data. There are many approaches; refer to the lecture notes for CS3244 available at the Co-Op. We focus on three approaches here. Other examples: instance-based learning, other neural networks, concept learning (version space, Focus, AQ11, …), genetic algorithms, reinforcement learning.