Module 12: Machine Learning
Lesson 35: Rule Induction and Decision Tree - I
12.3 Decision Trees
Decision trees are a class of learning models that are more robust to noise and more
powerful than concept learning. Consider the problem of classifying a star based on a
set of astronomical measurements. The task can be represented naturally by a set of
decisions on each measurement, arranged in a tree-like fashion:
[Tree: the root tests Luminosity. The branch Luminosity > T1 leads to the terminal
node Type C; the branch Luminosity <= T1 leads to a node that tests Mass, with
Mass <= T2 giving Type A and Mass > T2 giving Type B.]
12.3.1 Decision Tree: Definition
• A decision-tree learning algorithm approximates a target concept using a tree
representation, where each internal node corresponds to an attribute and every
terminal node corresponds to a class.
• There are two types of nodes:
o Internal node: splits into different branches according to the different values
the corresponding attribute can take. Example: luminosity <= T1 or
luminosity > T1.
o Terminal node: assigns a class to the examples that reach it.
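A minimal sketch of this two-node-type structure in code (Python; the class and field
names are illustrative, not from the lesson, and string branch labels such as "<=T1"
stand in for real threshold tests):

    from dataclasses import dataclass
    from typing import Dict, Union

    @dataclass
    class Leaf:
        """Terminal node: the class assigned to examples that reach it."""
        label: str

    @dataclass
    class Internal:
        """Internal node: tests one attribute and branches on its value."""
        attribute: str
        children: Dict[str, Union["Internal", "Leaf"]]  # branch label -> subtree

    # The star example from 12.3: Luminosity first, then Mass on one branch.
    star_tree = Internal("luminosity", {
        "<=T1": Internal("mass", {"<=T2": Leaf("Type A"), ">T2": Leaf("Type B")}),
        ">T1":  Leaf("Type C"),
    })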
12.3.2 Classifying Examples Using Decision Tree
To classify an example X we start at the root of the tree and check the value on X of
the attribute tested at that node. We follow the branch corresponding to that value and
jump to the next node. We continue until we reach a terminal node and take its class
as our best prediction.
Example: X = (Luminosity <= T1, Mass > T2)
[Tree traversal for X: the root tests Luminosity; since Luminosity <= T1 we move to
the Mass node, and since Mass > T2 we reach the terminal node Type B, which is the
assigned class. The other branches lead to Type A (Mass <= T2) and Type C
(Luminosity > T1).]
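Continuing the sketch from 12.3.1 (reusing the hypothetical Leaf/Internal classes and
star_tree), the traversal is a loop that follows one branch per internal node:

    def classify(node, example):
        """Follow the branch matching the example's value for each tested
        attribute until a terminal node is reached; return its class."""
        while isinstance(node, Internal):
            node = node.children[example[node.attribute]]
        return node.label

    # X = (Luminosity <= T1, Mass > T2)
    x = {"luminosity": "<=T1", "mass": ">T2"}
    print(classify(star_tree, x))  # -> Type B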
Decision trees adopt a DNF (Disjunctive Normal Form) representation. For a fixed class,
every branch from the root of the tree to a terminal node with that class is a conjunction
of attribute values; different branches ending in that class form a disjunction.
In the following example, the rules for class A are: (~x1 & ~x2) OR (x1 & ~x3)
[Tree: the root tests x1. For x1 = 0 the next node tests x2 (x2 = 0 gives class A,
x2 = 1 gives class B); for x1 = 1 the next node tests x3 (x3 = 0 gives class A,
x3 = 1 gives class C).]
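The same DNF reading can be computed by collecting root-to-leaf paths. A sketch
(again using the hypothetical Leaf/Internal classes from 12.3.1) applied to the
x1/x2/x3 tree, where each returned path is one conjunct:

    def rules_for(node, target, path=()):
        """Return every root-to-leaf path ending in class `target`. Each path
        is a conjunction of (attribute, value) tests; the list of paths is
        their disjunction, i.e. the DNF for that class."""
        if isinstance(node, Leaf):
            return [path] if node.label == target else []
        rules = []
        for value, child in node.children.items():
            rules.extend(rules_for(child, target, path + ((node.attribute, value),)))
        return rules

    tree = Internal("x1", {
        "0": Internal("x2", {"0": Leaf("A"), "1": Leaf("B")}),
        "1": Internal("x3", {"0": Leaf("A"), "1": Leaf("C")}),
    })
    print(rules_for(tree, "A"))
    # [(('x1', '0'), ('x2', '0')), (('x1', '1'), ('x3', '0'))]
    # i.e. (~x1 & ~x2) OR (x1 & ~x3)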
12.3.3 Decision Tree Construction
There are different ways to construct trees from data. We will concentrate on the
top-down, greedy search approach:
Basic idea:
1. Choose the best attribute a* to place at the root of the tree.
2. Separate the training set D into subsets {D1, D2, ..., Dk}, where each subset Di
contains the examples having the same value for a* (this partition step is sketched
in code after the list).
3. Recursively apply the algorithm to each subset Di until all examples in a subset
have the same class, or there are too few of them to split further.
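A sketch of the partition in step 2, assuming each example is a dict of attribute
values (the toy rows below are invented for illustration):

    from collections import defaultdict

    def partition(D, a_star):
        """Split training set D into one subset per observed value of a*."""
        subsets = defaultdict(list)
        for example in D:
            subsets[example[a_star]].append(example)
        return subsets

    D = [{"size": "<=t1", "humidity": "<=t3",       "class": "P"},
         {"size": ">t1",  "humidity": ">t2",        "class": "N"},
         {"size": ">t1",  "humidity": ">t3 & <=t2", "class": "P"}]
    print(sorted(partition(D, "size")))  # -> ['<=t1', '>t1']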
Illustration:
[Figure: training examples plotted in the (size, humidity) plane, with threshold t1
on the size axis and thresholds t2 and t3 on the humidity axis. Class P: poisonous;
Class N: non-poisonous.]
Attributes: size and humidity.
Size has two values: > t1 or <= t1.
Humidity has three values: > t2, (> t3 and <= t2), <= t3.
Suppose we choose size as the best attribute:
[Partial tree: the root tests size. The branch size <= t1 ends in a terminal node
labeled P (all examples there are poisonous); the branch size > t1 is still
unresolved and must be split further.]
Suppose we choose humidity as the next best attribute:
[Tree: the root tests size. The branch size <= t1 ends in a terminal node P. The
branch size > t1 leads to a humidity node with three branches: humidity > t2 gives N;
humidity > t3 and <= t2 gives P; humidity <= t3 gives N.]
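The learned tree can be written out as nested threshold tests. A sketch, assuming
numeric thresholds t1, t2, t3 are given (the function name is illustrative):

    def classify_example(size, humidity, t1, t2, t3):
        """The learned tree: test size first, then humidity on the > t1 branch."""
        if size <= t1:
            return "P"      # poisonous
        if humidity > t2:
            return "N"      # non-poisonous
        if humidity > t3:   # i.e. t3 < humidity <= t2
            return "P"
        return "N"          # humidity <= t3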
Steps (a code sketch of the full procedure follows the list):
• Create a root for the tree.
• If all examples are of the same class, or the number of examples is below a
threshold, return that class.
• If no attributes are available, return the majority class.
• Let a* be the best attribute.
• For each possible value v of a*:
o Add a branch below a* labeled “a* = v”.
o Let Sv be the subset of examples where a* = v.
o Recursively apply the algorithm to Sv.
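Putting the steps together (a sketch: the criterion for the "best" attribute is not
defined in this lesson, so choose_best below is a deliberately naive placeholder, and
the example format and size threshold are assumptions):

    from collections import Counter, defaultdict

    MIN_EXAMPLES = 2  # illustrative "number of examples" threshold

    def choose_best(D, attributes):
        """Placeholder: the real selection criterion (e.g. information gain)
        comes later in the module; here we just take the first attribute."""
        return attributes[0]

    def build_tree(D, attributes):
        """Top-down greedy construction. Examples are dicts with a 'class'
        key; returns a class label (terminal node) or {(a*, v): subtree}."""
        classes = [x["class"] for x in D]
        majority = Counter(classes).most_common(1)[0][0]
        if len(set(classes)) == 1 or len(D) < MIN_EXAMPLES or not attributes:
            return majority
        a_star = choose_best(D, attributes)
        subsets = defaultdict(list)
        for x in D:
            subsets[x[a_star]].append(x)                # Sv: examples with a* = v
        remaining = [a for a in attributes if a != a_star]
        return {(a_star, v): build_tree(Sv, remaining)  # branch labeled a* = v
                for v, Sv in subsets.items()}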