MARKET BASKET ANALYSIS
LEARNING OBJECTIVES:
• EXPLAIN WHAT ASSOCIATION RULES AND ITEM SETS ARE.
• DESCRIBE THE BASIC PROCESS FOR MARKET BASKET ANALYSIS.
• KNOW WHEN TO APPLY MARKET BASKET ANALYSIS.
• UNDERSTAND THE STRENGTHS AND WEAKNESSES OF MARKET BASKET ANALYSIS.
By Obakeng Brian Pheelwane & Marc Berman – Group 14
ASSOCIATION RULES
• Association discovery finds items that imply the presence of other items in the same transaction.
• Association rules are in the form:
If <Left Hand Side (LHS)> then <Right Hand Side (RHS)>
• To indicate the validity and importance of a rule, each rule has two parameters:
• Support Factor
• Confidence Factor.
• RHS usually has one item; LHS has one or more items.
EXAMPLES OF ASSOCIATION RULES
In a database of transactions involving two items (X, Y) in a departmental store, an example association rule is:
If X then Y
That is, if a customer buys X, then he/she also buys Y in 70% of those cases. X and Y are bought together in
20% of all purchases made at the departmental store.
Therefore, the rule has a 70% confidence factor and a 20% support factor.
CONFIDENCE AND SUPPORT FACTORS
• Assume we have an association rule indicated as LHS -> RHS.
• T is the total number of cases in the database.
• X is the number of cases covered by the LHS.
• Y is the number of cases covered by the RHS.
• XY is the number of cases covered by both the LHS and the RHS,
indicated by the overlapping area in Figure 1.
Figure 1: Confidence and support factors visualised
Figure 2: Confidence and support factor formula
• Confidence factor is calculated based upon the number of cases present
in both the left and right hand sides of the scenario, divided by the total
number of cases in the left hand side.
• Support factor is calculated based upon the number of cases present in
both the left and right hand sides of the scenario, divided by the total
number of cases in the database.
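The two definitions above amount to confidence = XY / X and support = XY / T. A minimal Python sketch of that calculation follows; the function name and the toy transactions are hypothetical and only serve to illustrate the counts T, X and XY.

```python
def support_and_confidence(transactions, lhs, rhs):
    """Return (support, confidence) for the rule LHS -> RHS."""
    lhs, rhs = set(lhs), set(rhs)
    t = len(transactions)                                         # T: all cases
    x = sum(1 for tx in transactions if lhs <= set(tx))           # X: cases covered by the LHS
    xy = sum(1 for tx in transactions if (lhs | rhs) <= set(tx))  # XY: cases covered by both sides
    support = xy / t if t else 0.0
    confidence = xy / x if x else 0.0
    return support, confidence


# Hypothetical toy data, not taken from the slides.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

support, confidence = support_and_confidence(transactions, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here T = 5, X = 4 (baskets containing bread) and XY = 3 (baskets containing both bread and milk).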
THE BASIC PROCESS OF MARKET BASKET ANALYSIS
1. Choosing the right item set.
• The objective is to define a set of items such that, when association rules are formed among these items,
some of the rules have a meaningful interpretation and may prove useful.
• Several methods for generating the right item sets:
• Use a taxonomy to find the right level, ranging from general items to specific items (see Figure 3)
• Use virtual items (see Figures 4 & 5)
• A combination of both
• The taxonomy and virtual items (prepared by the users or a domain expert) help users
choose the right item set while exploring the data for useful rules.
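One way to picture the taxonomy approach is to replace each item with its parent category before rules are generated. The sketch below assumes a small, hypothetical taxonomy stored as a child-to-parent mapping; the item names and helper functions are illustrative only.

```python
# Hypothetical taxonomy: each item maps to its more general parent.
TAXONOMY = {
    "cola": "soft drinks",
    "lemonade": "soft drinks",
    "soft drinks": "beverages",
    "lager": "beer",
    "beer": "beverages",
}


def generalise(item, levels=1):
    """Move an item up the taxonomy by at most `levels` steps."""
    for _ in range(levels):
        if item not in TAXONOMY:
            break  # already at the most general level we know about
        item = TAXONOMY[item]
    return item


def generalise_transaction(transaction, levels=1):
    """Rewrite a transaction at a more general taxonomy level."""
    return {generalise(item, levels) for item in transaction}


basket = {"cola", "lager", "crisps"}
level1 = generalise_transaction(basket, levels=1)
level2 = generalise_transaction(basket, levels=2)
print(level1)
print(level2)
```

Moving up a level merges items, so the item set shrinks and each remaining item occurs more often; items without an entry in the mapping (here "crisps") are left unchanged.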
Figure 3: Taxonomies start with the most general and move to increasing detail.
Figure 4: This is an example of a poor choice of virtual items, since the rules are likely to be
redundant.
The problem is that the rules simply repeat the definition of the virtual items.
Figure 5: This is an example of a good choice of virtual items, though one must be careful
not to totally encompass the items used for analysis, as this would create redundancy
again.
BASIC PROCESS CONTINUED
2. Generating rules:
• The rule generation process involves generating the co-occurrence matrix, which counts the frequencies of
co-occurrence between n items in the item set.
• To generate a rule of n items of the form:
If X1, X2, …, X(n-1) then Xn
a co-occurrence matrix of n items is required.
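The counting step can be sketched as follows. This is a minimal illustration that stores the co-occurrence counts sparsely in a Counter keyed by item combinations rather than in a dense matrix; the function name and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(transactions, n=2):
    """Count how often each combination of n items appears together."""
    counts = Counter()
    for tx in transactions:
        # Sort so that each combination has one canonical key.
        for combo in combinations(sorted(set(tx)), n):
            counts[combo] += 1
    return counts


# Hypothetical toy data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]

pair_counts = cooccurrence_counts(transactions, n=2)
for pair, count in sorted(pair_counts.items()):
    print(pair, count)
```

Setting n to 3 or more counts larger combinations in the same way, which is where the cost starts to grow quickly.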
Number of items on LHS    Total number of combinations
1                         100
2                         4,950
3                         161,700
4                         3,921,225
5                         75,287,520
6                         1,192,052,400
7                         16,007,560,800
8                         186,087,894,300
Figure 5: This is a computationally expensive process, especially when a
large data set is present.
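The totals in the table match the binomial coefficient C(100, k), so the sketch below assumes an item set of 100 distinct items. That figure is inferred from the values and is not stated on the slide.

```python
from math import comb

N_ITEMS = 100  # assumption: inferred from the table, not stated in the source


def combination_counts(n_items, max_size):
    """Number of ways to choose k items out of n_items, for k = 1..max_size."""
    return {k: comb(n_items, k) for k in range(1, max_size + 1)}


counts = combination_counts(N_ITEMS, 8)
for k, total in counts.items():
    print(f"{k}\t{total:,}")
```

Each extra item multiplies the count by roughly (100 - k) / (k + 1), which is why the totals climb so steeply over the first few rows.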
BASIC PROCESS CONTINUED
3. Identifying useful rules that are unknown, valid and actionable.
• First, specify threshold values for the confidence factor and support factor, so that the rule generation
algorithm automatically filters out rules which are not supported by the data.
• Second, human judgement is required to assess the interestingness, validity and actionability of the
rules which have passed through the automatic filter.
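The automatic part of this step can be sketched as a simple threshold filter. The Rule record, the threshold values and the candidate rules below are all hypothetical; only the first rule reuses the 20% support and 70% confidence from the earlier example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: frozenset
    rhs: str
    support: float
    confidence: float


def filter_rules(rules, min_support, min_confidence):
    """Keep only rules that meet both thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]


# Hypothetical candidate rules.
candidate_rules = [
    Rule(frozenset({"X"}), "Y", support=0.20, confidence=0.70),
    Rule(frozenset({"A", "B"}), "C", support=0.01, confidence=0.90),  # too rare
    Rule(frozenset({"D"}), "E", support=0.30, confidence=0.40),       # too unreliable
]

kept = filter_rules(candidate_rules, min_support=0.05, min_confidence=0.60)
for rule in kept:
    print(sorted(rule.lhs), "->", rule.rhs)
```

The rules that survive still need the human review described in the second bullet; the filter only removes rules the data does not support.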
WHEN TO APPLY MARKET BASKET ANALYSIS
• Problems that consist of well-defined items that group together in potentially interesting ways.
• Time-series problems that can be adapted for market basket analysis by relatively simple data
transformations.
STRENGTHS AND WEAKNESSES
Strengths:
• Clear and understandable results
• Supports undirected data mining
• Works on variable-length data
• Simple computational process
Weaknesses:
• Computation increases exponentially as the problem size grows.
• Limited support for attributes on the data.
• Difficult to determine the right number of items.
• It discounts rare items.
DISSOCIATION RULES
• Similar to association rules, except that a negation “NOT” may be applied to an item. An example of a
dissociation rule is:
• If X and not Y then Z.
Problems with dissociation rules:
1. Doubling the items significantly slows down the runtime
2. The size of transactions grows because they include inverted items
3. They tend to produce rules in which all items are inverted, because the frequency of the inverted items is
usually much larger.
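These problems are easy to see once inverted items are added to a transaction explicitly. The sketch below is one hypothetical encoding, in which every item missing from a basket is recorded as a "not <item>" entry; the helper name and sample items are illustrative.

```python
def add_inverted_items(transaction, all_items):
    """Extend a transaction with a 'not <item>' entry for every absent item."""
    inverted = {f"not {item}" for item in all_items if item not in transaction}
    return set(transaction) | inverted


# Hypothetical item set and basket.
all_items = {"X", "Y", "Z", "W"}
basket = {"X", "Z"}

extended = add_inverted_items(basket, all_items)
print(sorted(extended))
print(len(basket), "->", len(extended))
```

With this encoding every transaction grows to the size of the full item set, and the number of distinct items doubles. In a realistic store where a basket holds only a handful of the available items, almost every entry is an inverted one, which is why rules made up entirely of inverted items tend to dominate.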
WHAT WE HAVE LEARNED:
• We have learned about association and dissociation rules.
• How to generate more specialised items using taxonomy and virtual items.
• When to apply Market Basket Analysis
• Finally, the strengths and weaknesses of Market Basket Analysis
REVIEW QUESTIONS
1. Discuss the similarities and differences between a decision rule and an association rule in terms of rule structure and
how it is used.
Decision rule (Separate-and-conquer)
Decision rules are closely related to decision trees. The terminal nodes of a tree can be grouped into rules. A decision rule attempts to find a
partial solution for a part of a problem, looking for the optimal solution to the problem.
How it is used:
- One partial solution in each step
Association rule
An association rule does not have a target. Association discovery finds all rules that exist in the data and attempts to find a full set of solutions to a problem,
looking for the optimal solution to the problem.
How it is used:
- Multiple combinations in each step
2. Discuss the due caution one should have when applying association rules. Relate your explanation to
the definition of data mining: Data mining is a process of extracting previously unknown, valid, and
actionable information from large databases and then using the information to make crucial decisions.
3. Compare the model selection process in predictive modelling with the similar process in market basket
analysis. Answer the following questions in your comparison:
i. What is a model?
A model in predictive modelling tasks is one built to make predictions for unseen data.
E.g. the trained model is used to make a positive or negative diagnosis about a disease for a new patient.
A model in market basket analysis takes the form of a set of rules that describe the associations between
attributes; the rules are not meant for prediction.
ii. How do the model selection processes differ?
The model selection process in predictive modelling is guided by maximising a measure determined
during the problem definition step, so this process can be carried out objectively.
The model selection process in the market basket analysis is more subjective, although a few measures
can be used to reduce the set of candidate rules.