SlideShare a Scribd company logo
1 of 15
ISSUES IN DECISION TREE LEARNING
Practical issues in learning decision trees include
1. Determining how deeply to grow the decision tree,
2. Handling continuous attributes,
choosing an appropriate attribute selection measure,
3. Handling training data with missing attribute values,
4. Handling attributes with differing costs, and
improving computational efficiency.
1. AVOIDING OVERFITTING THE DATA
 When we are designing a machine learning model, a
model is said to be a good machine learning model, if it
generalizes any new input data from the problem domain
in a proper way.
 This helps us to make predictions in the future data, that
data model has never seen.
 Underfitting
 A machine learning algorithm is said to have underfitting
when it cannot capture the underlying trend of the data.
 Underfitting destroys the accuracy of our machine learning
model.
 Its occurrence simply means that our model or the algorithm
does not fit the data well enough.
 It usually happens when we have less data to build an
accurate model and also when we try to build a linear model
with a non-linear data.
 Overfitting
 A machine learning algorithm is said to be overfitted, when
we train it with a lot of data.
 When a model gets trained with so much of data, it starts
learning from the noise and inaccurate data entries in our
data set.
 Then the model does not categorize the data correctly,
because of too much of details and noise.
 A solution to avoid over fitting is using a linear algorithm if
we have linear data or using the parameters like the
maximal depth if we are using decision trees.
 Definition — Overfit: Given a hypothesis space H, a
hypothesis h ∈ H is said to overfit the training data if there
exists some alternative hypothesis h’ ∈ H, such that h has
smaller error than h’ over the training examples, but h’ has
a smaller error than h over the entire distribution of
instances.
 Lets try to understand the effect of adding the following
positive training example, incorrectly labeled as negative, to
the training examples Table.
 <Sunny, Hot, Normal, Strong, ->, Example is noisy because
the correct label is +.
Given the original error-free data, ID3 produces the
decision tree shown in Figure.
AVOIDING OVER FITTING
 There are several approaches to avoiding overfitting in
decision tree learning. These can be grouped into two
classes:

- Pre-pruning (avoidance): Stop growing the tree
earlier, before it reaches the point where it perfectly
classifies the training data

- Post-pruning (recovery): Allow the tree to overfit the
data, and then post-prune the tree
 Criterion used to determine the correct final tree
size
 Use a separate set of examples, distinct from the
training examples, to evaluate the utility of post-
pruning nodes from the tree
 Use all the available data for training, but apply a
statistical test to estimate whether expanding (or
pruning) a particular node is likely to produce an
improvement beyond the training set
1. REDUCED ERROR PRUNING
 How exactly might we use a validation set to prevent
overfitting? One approach, called reduced-error pruning
(Quinlan 1987), is to consider each of the decision nodes in the
tree to be candidates for pruning.
 Reduced-error pruning, is to consider each of the decision
nodes in the tree to be candidates for pruning
 Pruning a decision node consists of removing the subtree
rooted at that node, making it a leaf node, and assigning it the
most common classification of the training examples affiliated
with that node
 Nodes are removed only if the resulting pruned tree performs
no worse than-the original over the validation set.
 Reduced error pruning has the effect that any leaf node added
due to coincidental regularities in the training set is likely to be
pruned because these same coincidences are unlikely to occur
in the validation set
2. RULE POST-PRUNING
 Rule post-pruning involves the following steps:
 Infer the decision tree from the training set, growing the
tree until the training data is fit as well as possible and
allowing over fitting to occur.
 Convert the learned tree into an equivalent set of rules by
creating one rule for each path from the root node to a
leaf node.
 Prune (generalize) each rule by removing any
preconditions that result in improving its estimated
accuracy.
 Sort the pruned rules by their estimated accuracy, and
consider them in this sequence when classifying
subsequent instances.
THERE ARE THREE MAIN ADVANTAGES BY CONVERTING
THE DECISION TREE TO RULES BEFORE PRUNING
 Converting to rules allows distinguishing among the
different contexts in which a decision node is used.
 Because each distinct path through the decision tree
node produces a distinct rule, the pruning decision
regarding that attribute test can be made differently for
each path.
 Converting to rules removes the distinction between
attribute tests that occur near the root of the tree and
those that occur near the leaves.
 Thus, it avoid messy bookkeeping issues such as how to
reorganize the tree if the root node is pruned while
retaining part of the subtree below this test.
 Converting to rules improves readability. Rules are often
easier for to understand.
2. INCORPORATING CONTINUOUS-VALUED
ATTRIBUTES
 Our initial definition of ID3 is restricted to attributes
that take on a discrete set of values.
 1. The target attribute whose value is predicted by
learned tree must be discrete valued.
 2. The attributes tested in the decision nodes of the
tree must also be discrete valued.
 This second restriction can easily be removed so
that continuous-valued decision attributes can be
incorporated into the learned tree.
3. ALTERNATIVE MEASURES FOR SELECTING
ATTRIBUTES
 There is a natural bias in the information gain measure
that favours attributes with many values over those with
few values.
 As an extreme example, consider the attribute Date, which
has a very large number of possible values. What is wrong
with the attribute Date?
 Simply put, it has so many possible values that it is bound
to separate the training examples into very small subsets.
 Because of this, it will have a very high information gain
relative to the training examples.
 How ever, having very high information gain, its a very
poor predictor of the target function over unseen
instances.
4. HANDLING MISSING ATTRIBUTE VALUES
 In certain cases, the available data may be missing
values for some attributes.
 For example, in a medical domain in which we wish
to predict patient outcome based on various
laboratory tests, it may be that the Blood-Test-
Result is available only for a subset of the patients.
 In such cases, it is common to estimate the missing
attribute value based on other examples for which
this attribute has a known value.
5. HANDLING ATTRIBUTES WITH DIFFERING
COSTS
 In some learning tasks the instance attributes may
have associated costs.
 For example, in learning to classify medical
diseases we might describe patients in terms of
attributes such as Temperature, BiopsyResult,
Pulse, BloodTestResults, etc.
 These attributes vary significantly in their costs,
both in terms of monetary cost and cost to patient
comfort.

More Related Content

What's hot

# Neural network toolbox
# Neural network toolbox # Neural network toolbox
# Neural network toolbox VineetKumar508
 
Performance analysis and randamized agoritham
Performance analysis and randamized agorithamPerformance analysis and randamized agoritham
Performance analysis and randamized agorithamlilyMalar1
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine LearningKuppusamy P
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithmPradip Kumar
 
Congetion Control.pptx
Congetion Control.pptxCongetion Control.pptx
Congetion Control.pptxNaveen Dubey
 
Stressen's matrix multiplication
Stressen's matrix multiplicationStressen's matrix multiplication
Stressen's matrix multiplicationKumar
 
Dijkstra & flooding ppt(Routing algorithm)
Dijkstra & flooding ppt(Routing algorithm)Dijkstra & flooding ppt(Routing algorithm)
Dijkstra & flooding ppt(Routing algorithm)Anshul gour
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
Data mining-primitives-languages-and-system-architectures2641
Data mining-primitives-languages-and-system-architectures2641Data mining-primitives-languages-and-system-architectures2641
Data mining-primitives-languages-and-system-architectures2641Aiswaryadevi Jaganmohan
 
AI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction ProblemAI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction ProblemMohammad Imam Hossain
 
8 queens problem using back tracking
8 queens problem using back tracking8 queens problem using back tracking
8 queens problem using back trackingTech_MX
 

What's hot (20)

# Neural network toolbox
# Neural network toolbox # Neural network toolbox
# Neural network toolbox
 
Performance analysis and randamized agoritham
Performance analysis and randamized agorithamPerformance analysis and randamized agoritham
Performance analysis and randamized agoritham
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
Congetion Control.pptx
Congetion Control.pptxCongetion Control.pptx
Congetion Control.pptx
 
Stressen's matrix multiplication
Stressen's matrix multiplicationStressen's matrix multiplication
Stressen's matrix multiplication
 
Cross-Validation
Cross-ValidationCross-Validation
Cross-Validation
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Dijkstra & flooding ppt(Routing algorithm)
Dijkstra & flooding ppt(Routing algorithm)Dijkstra & flooding ppt(Routing algorithm)
Dijkstra & flooding ppt(Routing algorithm)
 
Greedy algorithm
Greedy algorithmGreedy algorithm
Greedy algorithm
 
Turbo prolog 2.0 basics
Turbo prolog 2.0 basicsTurbo prolog 2.0 basics
Turbo prolog 2.0 basics
 
Data Exploration in R.pptx
Data Exploration in R.pptxData Exploration in R.pptx
Data Exploration in R.pptx
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Data mining-primitives-languages-and-system-architectures2641
Data mining-primitives-languages-and-system-architectures2641Data mining-primitives-languages-and-system-architectures2641
Data mining-primitives-languages-and-system-architectures2641
 
asymptotic notation
asymptotic notationasymptotic notation
asymptotic notation
 
AI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction ProblemAI 7 | Constraint Satisfaction Problem
AI 7 | Constraint Satisfaction Problem
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
Daa notes 3
Daa notes 3Daa notes 3
Daa notes 3
 
8 queens problem using back tracking
8 queens problem using back tracking8 queens problem using back tracking
8 queens problem using back tracking
 
Computer networks - Channelization
Computer networks - ChannelizationComputer networks - Channelization
Computer networks - Channelization
 

Similar to Issues in DTL.pptx

Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Applicationaciijournal
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONaciijournal
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Applicationaciijournal
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2Gokulks007
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted treesNihar Ranjan
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfAnanthReddy38
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challengesijcnes
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf321106410027
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxAsrithaKorupolu
 
Machine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkMachine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkGabriel Hughes PhD
 

Similar to Issues in DTL.pptx (20)

Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Research
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
dm1.pdf
dm1.pdfdm1.pdf
dm1.pdf
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challenges
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
Machine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkMachine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talk
 

More from Ramakrishna Reddy Bijjam

Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxRamakrishna Reddy Bijjam
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxRamakrishna Reddy Bijjam
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxRamakrishna Reddy Bijjam
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxRamakrishna Reddy Bijjam
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxRamakrishna Reddy Bijjam
 

More from Ramakrishna Reddy Bijjam (20)

Arrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptxArrays to arrays and pointers with arrays.pptx
Arrays to arrays and pointers with arrays.pptx
 
Auxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptxAuxiliary, Cache and Virtual memory.pptx
Auxiliary, Cache and Virtual memory.pptx
 
Python With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptxPython With MongoDB in advanced Python.pptx
Python With MongoDB in advanced Python.pptx
 
Pointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptxPointers and single &multi dimentionalarrays.pptx
Pointers and single &multi dimentionalarrays.pptx
 
Certinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptxCertinity Factor and Dempster-shafer theory .pptx
Certinity Factor and Dempster-shafer theory .pptx
 
Auxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptxAuxiliary Memory in computer Architecture.pptx
Auxiliary Memory in computer Architecture.pptx
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
K Means Clustering in ML.pptx
K Means Clustering in ML.pptxK Means Clustering in ML.pptx
K Means Clustering in ML.pptx
 
Pandas.pptx
Pandas.pptxPandas.pptx
Pandas.pptx
 
Python With MongoDB.pptx
Python With MongoDB.pptxPython With MongoDB.pptx
Python With MongoDB.pptx
 
Python with MySql.pptx
Python with MySql.pptxPython with MySql.pptx
Python with MySql.pptx
 
PYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdfPYTHON PROGRAMMING NOTES RKREDDY.pdf
PYTHON PROGRAMMING NOTES RKREDDY.pdf
 
BInary file Operations.pptx
BInary file Operations.pptxBInary file Operations.pptx
BInary file Operations.pptx
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
CSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptxCSV JSON and XML files in Python.pptx
CSV JSON and XML files in Python.pptx
 
HTML files in python.pptx
HTML files in python.pptxHTML files in python.pptx
HTML files in python.pptx
 
Regular Expressions in Python.pptx
Regular Expressions in Python.pptxRegular Expressions in Python.pptx
Regular Expressions in Python.pptx
 
datareprersentation 1.pptx
datareprersentation 1.pptxdatareprersentation 1.pptx
datareprersentation 1.pptx
 
Apriori.pptx
Apriori.pptxApriori.pptx
Apriori.pptx
 
Eclat.pptx
Eclat.pptxEclat.pptx
Eclat.pptx
 

Recently uploaded

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

Issues in DTL.pptx

  • 1. ISSUES IN DECISION TREE LEARNING Practical issues in learning decision trees include 1. Determining how deeply to grow the decision tree, 2. Handling continuous attributes, choosing an appropriate attribute selection measure, 3. Handling training data with missing attribute values, 4. Handling attributes with differing costs, and improving computational efficiency.
  • 2. 1. AVOIDING OVERFITTING THE DATA  When we are designing a machine learning model, a model is said to be a good machine learning model, if it generalizes any new input data from the problem domain in a proper way.  This helps us to make predictions in the future data, that data model has never seen.
  • 3.  Underfitting  A machine learning algorithm is said to have underfitting when it cannot capture the underlying trend of the data.  Underfitting destroys the accuracy of our machine learning model.  Its occurrence simply means that our model or the algorithm does not fit the data well enough.  It usually happens when we have less data to build an accurate model and also when we try to build a linear model with a non-linear data.
  • 4.  Overfitting  A machine learning algorithm is said to be overfitted, when we train it with a lot of data.  When a model gets trained with so much of data, it starts learning from the noise and inaccurate data entries in our data set.  Then the model does not categorize the data correctly, because of too much of details and noise.  A solution to avoid over fitting is using a linear algorithm if we have linear data or using the parameters like the maximal depth if we are using decision trees.
  • 5.  Definition — Overfit: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h’ ∈ H, such that h has smaller error than h’ over the training examples, but h’ has a smaller error than h over the entire distribution of instances.  Lets try to understand the effect of adding the following positive training example, incorrectly labeled as negative, to the training examples Table.  <Sunny, Hot, Normal, Strong, ->, Example is noisy because the correct label is +. Given the original error-free data, ID3 produces the decision tree shown in Figure.
  • 6.
  • 7. AVOIDING OVER FITTING  There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:  - Pre-pruning (avoidance): Stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data  - Post-pruning (recovery): Allow the tree to overfit the data, and then post-prune the tree
  • 8.  Criterion used to determine the correct final tree size  Use a separate set of examples, distinct from the training examples, to evaluate the utility of post- pruning nodes from the tree  Use all the available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set
  • 9. 1. REDUCED ERROR PRUNING  How exactly might we use a validation set to prevent overfitting? One approach, called reduced-error pruning (Quinlan 1987), is to consider each of the decision nodes in the tree to be candidates for pruning.  Reduced-error pruning, is to consider each of the decision nodes in the tree to be candidates for pruning  Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node  Nodes are removed only if the resulting pruned tree performs no worse than-the original over the validation set.  Reduced error pruning has the effect that any leaf node added due to coincidental regularities in the training set is likely to be pruned because these same coincidences are unlikely to occur in the validation set
  • 10. 2. RULE POST-PRUNING  Rule post-pruning involves the following steps:  Infer the decision tree from the training set, growing the tree until the training data is fit as well as possible and allowing over fitting to occur.  Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.  Prune (generalize) each rule by removing any preconditions that result in improving its estimated accuracy.  Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
  • 11. THERE ARE THREE MAIN ADVANTAGES BY CONVERTING THE DECISION TREE TO RULES BEFORE PRUNING  Converting to rules allows distinguishing among the different contexts in which a decision node is used.  Because each distinct path through the decision tree node produces a distinct rule, the pruning decision regarding that attribute test can be made differently for each path.  Converting to rules removes the distinction between attribute tests that occur near the root of the tree and those that occur near the leaves.  Thus, it avoid messy bookkeeping issues such as how to reorganize the tree if the root node is pruned while retaining part of the subtree below this test.  Converting to rules improves readability. Rules are often easier for to understand.
  • 12. 2. INCORPORATING CONTINUOUS-VALUED ATTRIBUTES  Our initial definition of ID3 is restricted to attributes that take on a discrete set of values.  1. The target attribute whose value is predicted by learned tree must be discrete valued.  2. The attributes tested in the decision nodes of the tree must also be discrete valued.  This second restriction can easily be removed so that continuous-valued decision attributes can be incorporated into the learned tree.
  • 13. 3. ALTERNATIVE MEASURES FOR SELECTING ATTRIBUTES  There is a natural bias in the information gain measure that favours attributes with many values over those with few values.  As an extreme example, consider the attribute Date, which has a very large number of possible values. What is wrong with the attribute Date?  Simply put, it has so many possible values that it is bound to separate the training examples into very small subsets.  Because of this, it will have a very high information gain relative to the training examples.  How ever, having very high information gain, its a very poor predictor of the target function over unseen instances.
  • 14. 4. HANDLING MISSING ATTRIBUTE VALUES  In certain cases, the available data may be missing values for some attributes.  For example, in a medical domain in which we wish to predict patient outcome based on various laboratory tests, it may be that the Blood-Test- Result is available only for a subset of the patients.  In such cases, it is common to estimate the missing attribute value based on other examples for which this attribute has a known value.
  • 15. 5. HANDLING ATTRIBUTES WITH DIFFERING COSTS  In some learning tasks the instance attributes may have associated costs.  For example, in learning to classify medical diseases we might describe patients in terms of attributes such as Temperature, BiopsyResult, Pulse, BloodTestResults, etc.  These attributes vary significantly in their costs, both in terms of monetary cost and cost to patient comfort.