SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Association Rule Mining with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
1 / 30
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
2 / 30
Association Rule Mining with R 1 
I basic concepts of association rules 
I association rules mining with R 
I pruning redundant rules 
I interpreting and visualizing association rules 
I recommended readings 
1Chapter 9: Association Rules, R and Data Mining: Examples and Case 
Studies. http://www.rdatamining.com/docs/RDataMining.pdf 
3 / 30
Association Rules 
Association rules are rules presenting association or correlation 
between itemsets. 
support(A ) B) = P(A [ B) 
con
dence(A ) B) = P(BjA) 
= 
P(A [ B) 
P(A) 
lift(A ) B) = 
con
dence(A ) B) 
P(B) 
= 
P(A [ B) 
P(A)P(B) 
where P(A) is the percentage (or probability) of cases containing 
A. 
4 / 30
Association Rule Mining Algorithms in R 
I APRIORI 
I a level-wise, breadth-
rst algorithm which counts transactions 
to
nd frequent itemsets and then derive association rules from 
them 
I apriori() in package arules 
I ECLAT 
I
nds frequent itemsets with equivalence classes, depth-
rst 
search and set intersection instead of counting 
I eclat() in the same package 
5 / 30
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
6 / 30
The Titanic Dataset 
I The Titanic dataset in the datasets package is a 4-dimensional 
table with summarized information on the fate of passengers 
on the Titanic according to social class, sex, age and survival. 
I To make it suitable for association rule mining, we reconstruct 
the raw data as titanic.raw, where each row represents a 
person. 
I The reconstructed raw data can also be downloaded at 
http://www.rdatamining.com/data/titanic.raw.rdata. 
7 / 30
load("./data/titanic.raw.rdata") 
## draw a sample of 5 records 
idx <- sample(1:nrow(titanic.raw), 5) 
titanic.raw[idx, ] 
## Class Sex Age Survived 
## 950 Crew Male Adult No 
## 2176 3rd Female Adult Yes 
## 1716 Crew Male Adult Yes 
## 1001 Crew Male Adult No 
## 48 3rd Female Child No 
summary(titanic.raw) 
## Class Sex Age Survived 
## 1st :325 Female: 470 Adult:2092 No :1490 
## 2nd :285 Male :1731 Child: 109 Yes: 711 
## 3rd :706 
## Crew:885 
8 / 30
Function apriori() 
Mine frequent itemsets, association rules or association hyperedges 
using the Apriori algorithm. The Apriori algorithm employs 
level-wise search for frequent itemsets. 
Default settings: 
I minimum support: supp=0.1 
I minimum con
dence: conf=0.8 
I maximum length of rules: maxlen=10 
9 / 30
library(arules) 
rules.all <- apriori(titanic.raw) 
## 
## parameter specification: 
## confidence minval smax arem aval originalSupport support 
## 0.8 0.1 1 none FALSE TRUE 0.1 
## minlen maxlen target ext 
## 1 10 rules FALSE 
## 
## algorithmic control: 
## filter tree heap memopt load sort verbose 
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE 
## 
## apriori - find association rules with the apriori algorithm 
## version 4.21 (2004.05.09) (c) 1996-2004 Christian ... 
## set item appearances ...[0 item(s)] done [0.00s]. 
## set transactions ...[10 item(s), 2201 transaction(s)] done ... 
## sorting and recoding items ... [9 item(s)] done [0.00s]. 
## creating transaction tree ... done [0.00s]. 
## checking subsets of size 1 2 3 4 done [0.00s]. 
## writing ... [27 rule(s)] done [0.00s]. 
## creating S4 object ... done [0.00s]. 
10 / 30
inspect(rules.all) 
## lhs rhs support confidence lift 
## 1 {} => {Age=Adult} 0.9505 0.9505 1.0000 
## 2 {Class=2nd} => {Age=Adult} 0.1186 0.9158 0.9635 
## 3 {Class=1st} => {Age=Adult} 0.1449 0.9815 1.0327 
## 4 {Sex=Female} => {Age=Adult} 0.1931 0.9043 0.9514 
## 5 {Class=3rd} => {Age=Adult} 0.2849 0.8881 0.9344 
## 6 {Survived=Yes} => {Age=Adult} 0.2971 0.9198 0.9678 
## 7 {Class=Crew} => {Sex=Male} 0.3916 0.9740 1.2385 
## 8 {Class=Crew} => {Age=Adult} 0.4021 1.0000 1.0521 
## 9 {Survived=No} => {Sex=Male} 0.6197 0.9154 1.1640 
## 10 {Survived=No} => {Age=Adult} 0.6533 0.9651 1.0154 
## 11 {Sex=Male} => {Age=Adult} 0.7574 0.9630 1.0132 
## 12 {Sex=Female, 
## Survived=Yes} => {Age=Adult} 0.1436 0.9186 0.9665 
## 13 {Class=3rd, 
## Sex=Male} => {Survived=No} 0.1917 0.8275 1.2223 
## 14 {Class=3rd, 
## Survived=No} => {Age=Adult} 0.2163 0.9015 0.9485 
## 15 {Class=3rd, 
## Sex=Male} => {Age=Adult} 0.2099 0.9059 0.9531 
## 16 {Sex=Male, 
## Survived=Yes} => {Age=Adult} 0.1536 0.9210 0.9690 
11 / 30
# rules with rhs containing "Survived" only 
rules <- apriori(titanic.raw, 
control = list(verbose=F), 
parameter = list(minlen=2, supp=0.005, conf=0.8), 
appearance = list(rhs=c("Survived=No", 
"Survived=Yes"), 
default="lhs")) 
## keep three decimal places 
quality(rules) <- round(quality(rules), digits=3) 
## order rules by lift 
rules.sorted <- sort(rules, by="lift") 
12 / 30
inspect(rules.sorted) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 
## 2 {Class=2nd, 
## Sex=Female, 
## Age=Child} => {Survived=Yes} 0.006 1.000 3.096 
## 3 {Class=1st, 
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 
## 4 {Class=1st, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.064 0.972 3.010 
## 5 {Class=2nd, 
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 
## 6 {Class=Crew, 
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 
## 7 {Class=Crew, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.009 0.870 2.692 
## 8 {Class=2nd, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.036 0.860 2.663 
## 9 {Class=2nd, 
13 / 30
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
14 / 30
Redundant Rules 
inspect(rules.sorted[1:2]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
## 2 {Class=2nd, 
## Sex=Female, 
## Age=Child} => {Survived=Yes} 0.006 1 3.096 
I Rule #2 provides no extra knowledge in addition to rule #1, 
since rules #1 tells us that all 2nd-class children survived. 
I When a rule (such as #2) is a super rule of another rule (#1) 
and the former has the same or a lower lift, the former rule 
(#2) is considered to be redundant. 
I Other redundant rules in the above result are rules #4, #7 
and #8, compared respectively with #3, #6 and #5. 
15 / 30
Remove Redundant Rules 
## find redundant rules 
subset.matrix <- is.subset(rules.sorted, rules.sorted) 
subset.matrix[lower.tri(subset.matrix, diag = T)] <- NA 
redundant <- colSums(subset.matrix, na.rm = T) >= 1 
## which rules are redundant 
which(redundant) 
## [1] 2 4 7 8 
## remove redundant rules 
rules.pruned <- rules.sorted[!redundant] 
16 / 30
Remaining Rules 
inspect(rules.pruned) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 
## 2 {Class=1st, 
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 
## 3 {Class=2nd, 
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 
## 4 {Class=Crew, 
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 
## 5 {Class=2nd, 
## Sex=Male, 
## Age=Adult} => {Survived=No} 0.070 0.917 1.354 
## 6 {Class=2nd, 
## Sex=Male} => {Survived=No} 0.070 0.860 1.271 
## 7 {Class=3rd, 
## Sex=Male, 
## Age=Adult} => {Survived=No} 0.176 0.838 1.237 
## 8 {Class=3rd, 
## Sex=Male} => {Survived=No} 0.192 0.827 1.222 
17 / 30
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
18 / 30
inspect(rules.pruned[1]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
Did children of the 2nd class have a higher survival rate than other 
children? 
19 / 30
inspect(rules.pruned[1]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
Did children of the 2nd class have a higher survival rate than other 
children? 
The rule states only that all children of class 2 survived, but 
provides no information at all to compare the survival rates of 
dierent classes. 
19 / 30
Rules about Children 
rules - apriori(titanic.raw, control = list(verbose=F), 
parameter = list(minlen=3, supp=0.002, conf=0.2), 
appearance = list(default=none, rhs=c(Survived=Yes), 
lhs=c(Class=1st, Class=2nd, Class=3rd, 
Age=Child, Age=Adult))) 
rules.sorted - sort(rules, by=confidence) 
inspect(rules.sorted) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} = {Survived=Yes} 0.010904 1.0000 3.0956 
## 2 {Class=1st, 
## Age=Child} = {Survived=Yes} 0.002726 1.0000 3.0956 
## 3 {Class=1st, 
## Age=Adult} = {Survived=Yes} 0.089505 0.6176 1.9117 
## 4 {Class=2nd, 
## Age=Adult} = {Survived=Yes} 0.042708 0.3602 1.1149 
## 5 {Class=3rd, 
## Age=Child} = {Survived=Yes} 0.012267 0.3418 1.0580 
## 6 {Class=3rd, 
## Age=Adult} = {Survived=Yes} 0.068605 0.2408 0.7455 
20 / 30
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
21 / 30
library(arulesViz) 
plot(rules.all) 
Scatter plot for 27 rules 
1.25 
1.2 
1.15 
1.1 
1.05 
1 
0.95 
lift 
0.2 0.4 0.6 0.8 
1 
0.95 
0.9 
0.85 
support 
confidence 
22 / 30
plot(rules.all, method = grouped) 
Grouped matrix for 27 rules 
size: support 
1 (Class=Crew +2) 
1 (Class=Crew +1) 
1 (Class=3rd +2) 
1 (Age=Adult +1) 
2 (Class=Crew +1) 
2 (Class=Crew +0) 
2 (Survived=No +0) 
2 (Class=3rd +1) 
2 (Class=Crew +2) 
1 (Class=3rd +2) 
1 (Class=1st +0) 
1 (Sex=Male +1) 
1 (Sex=Male +0) 
1 (Class=1st +−1) 
2 (Survived=Yes +1) 
1 (Sex=Female +1) 
2 (Class=2nd +3) 
1 (Sex=Female +0) 
1 (Class=3rd +1) 
1 (Class=3rd +0) 
color: lift 
{Age=Adult} 
{Survived=No} 
{Sex=Male} 
LHS 
RHS 
23 / 30

Weitere ähnliche Inhalte

Was ist angesagt?

Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis Peter Reimann
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
dplyr Package in R
dplyr Package in Rdplyr Package in R
dplyr Package in RVedant Shah
 
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Edureka!
 
Data analysis
Data analysisData analysis
Data analysisLizzyL1
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With REdureka!
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouseSwapnilSaurav7
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data MiningKamal Acharya
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-exportFAO
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysishktripathy
 
Data mining Concepts and Techniques
Data mining Concepts and Techniques Data mining Concepts and Techniques
Data mining Concepts and Techniques Justin Cletus
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisEva Durall
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Edureka!
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regressionAkhilesh Joshi
 

Was ist angesagt? (20)

Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
dplyr Package in R
dplyr Package in Rdplyr Package in R
dplyr Package in R
 
Data Visualization Tools
Data Visualization ToolsData Visualization Tools
Data Visualization Tools
 
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...Statistics For Data Science | Statistics Using R Programming Language | Hypot...
Statistics For Data Science | Statistics Using R Programming Language | Hypot...
 
R decision tree
R   decision treeR   decision tree
R decision tree
 
Data analysis
Data analysisData analysis
Data analysis
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With R
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouse
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
02 data
02 data02 data
02 data
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data Mining
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
 
3 Data Mining Tasks
3  Data Mining Tasks3  Data Mining Tasks
3 Data Mining Tasks
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
 
Data mining Concepts and Techniques
Data mining Concepts and Techniques Data mining Concepts and Techniques
Data mining Concepts and Techniques
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regression
 

Andere mochten auch

Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningWan Aezwani Wab
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule miningDeepa Jeya
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
Data mining fp growth
Data mining fp growthData mining fp growth
Data mining fp growthShihab Rahman
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with RYanchang Zhao
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data MiningYanchang Zhao
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methodsProf.Nilesh Magar
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with RYanchang Zhao
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisMahendra Gupta
 
Association rule visualization technique
Association rule visualization techniqueAssociation rule visualization technique
Association rule visualization techniquemustafasmart
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with RYanchang Zhao
 
Reproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageReproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageRevolution Analytics
 

Andere mochten auch (20)

Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule Mining
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Data mining fp growth
Data mining fp growthData mining fp growth
Data mining fp growth
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Association rule visualization technique
Association rule visualization techniqueAssociation rule visualization technique
Association rule visualization technique
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with R
 
Reproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint PackageReproducibility with Revolution R Open and the Checkpoint Package
Reproducibility with Revolution R Open and the Checkpoint Package
 

Ähnlich wie Association Rule Mining with R

R Activity in Biostatistics
R Activity in BiostatisticsR Activity in Biostatistics
R Activity in BiostatisticsLarry Sultiz
 
Bigger Data v Better Math
Bigger Data v Better MathBigger Data v Better Math
Bigger Data v Better MathBrent Schneeman
 
Slide3.ppt
Slide3.pptSlide3.ppt
Slide3.pptbutest
 
Peterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_ProjectPeterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_Projectjpeterson2058
 
Profiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applicationsProfiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applicationsJano Suchal
 
Beautiful python - PyLadies
Beautiful python - PyLadiesBeautiful python - PyLadies
Beautiful python - PyLadiesAlicia Pérez
 
The Ring programming language version 1.5.3 book - Part 36 of 184
The Ring programming language version 1.5.3 book - Part 36 of 184The Ring programming language version 1.5.3 book - Part 36 of 184
The Ring programming language version 1.5.3 book - Part 36 of 184Mahmoud Samir Fayed
 
TAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RTAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RFayan TAO
 
data frames.pptx
data frames.pptxdata frames.pptx
data frames.pptxRacksaviR
 
Hierarchies of LifeExperiment 1 Classification of Common Objects.docx
Hierarchies of LifeExperiment 1 Classification of Common Objects.docxHierarchies of LifeExperiment 1 Classification of Common Objects.docx
Hierarchies of LifeExperiment 1 Classification of Common Objects.docxpooleavelina
 
[1062BPY12001] Data analysis with R / April 19
[1062BPY12001] Data analysis with R / April 19[1062BPY12001] Data analysis with R / April 19
[1062BPY12001] Data analysis with R / April 19Kevin Chun-Hsien Hsu
 
Market Basket Analysis in R
Market Basket Analysis in RMarket Basket Analysis in R
Market Basket Analysis in RRsquared Academy
 
Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)Theodore Grammatikopoulos
 
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Matt Harrison
 
Structured data type
Structured data typeStructured data type
Structured data typeOmkar Majukar
 
Day 1b R structures objects.pptx
Day 1b   R structures   objects.pptxDay 1b   R structures   objects.pptx
Day 1b R structures objects.pptxAdrien Melquiond
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.David Tollmyr
 

Ähnlich wie Association Rule Mining with R (20)

R Activity in Biostatistics
R Activity in BiostatisticsR Activity in Biostatistics
R Activity in Biostatistics
 
R and data mining
R and data miningR and data mining
R and data mining
 
R programming
R programmingR programming
R programming
 
Next Level Testing
Next Level TestingNext Level Testing
Next Level Testing
 
Bigger Data v Better Math
Bigger Data v Better MathBigger Data v Better Math
Bigger Data v Better Math
 
Slide3.ppt
Slide3.pptSlide3.ppt
Slide3.ppt
 
Peterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_ProjectPeterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_Project
 
Profiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applicationsProfiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applications
 
Beautiful python - PyLadies
Beautiful python - PyLadiesBeautiful python - PyLadies
Beautiful python - PyLadies
 
The Ring programming language version 1.5.3 book - Part 36 of 184
The Ring programming language version 1.5.3 book - Part 36 of 184The Ring programming language version 1.5.3 book - Part 36 of 184
The Ring programming language version 1.5.3 book - Part 36 of 184
 
TAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with RTAO Fayan_Report on Top 10 data mining algorithms applications with R
TAO Fayan_Report on Top 10 data mining algorithms applications with R
 
data frames.pptx
data frames.pptxdata frames.pptx
data frames.pptx
 
Hierarchies of LifeExperiment 1 Classification of Common Objects.docx
Hierarchies of LifeExperiment 1 Classification of Common Objects.docxHierarchies of LifeExperiment 1 Classification of Common Objects.docx
Hierarchies of LifeExperiment 1 Classification of Common Objects.docx
 
[1062BPY12001] Data analysis with R / April 19
[1062BPY12001] Data analysis with R / April 19[1062BPY12001] Data analysis with R / April 19
[1062BPY12001] Data analysis with R / April 19
 
Market Basket Analysis in R
Market Basket Analysis in RMarket Basket Analysis in R
Market Basket Analysis in R
 
Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)
 
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
 
Structured data type
Structured data typeStructured data type
Structured data type
 
Day 1b R structures objects.pptx
Day 1b   R structures   objects.pptxDay 1b   R structures   objects.pptx
Day 1b R structures objects.pptx
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.
 

Mehr von Yanchang Zhao

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisYanchang Zhao
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationYanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rYanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-cardYanchang Zhao
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slidesYanchang Zhao
 

Mehr von Yanchang Zhao (11)

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 

Kürzlich hochgeladen

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Kürzlich hochgeladen (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Association Rule Mining with R

  • 1. Association Rule Mining with R Yanchang Zhao http://www.RDataMining.com 30 September 2014 1 / 30
  • 2. Outline Introduction Association Rule Mining Removing Redundancy Interpreting Rules Visualizing Association Rules Further Readings and Online Resources 2 / 30
  • 3. Association Rule Mining with R 1 I basic concepts of association rules I association rules mining with R I pruning redundant rules I interpreting and visualizing association rules I recommended readings 1Chapter 9: Association Rules, R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf 3 / 30
  • 4. Association Rules Association rules are rules presenting association or correlation between itemsets. support(A ) B) = P(A [ B) con
  • 5. dence(A ) B) = P(BjA) = P(A [ B) P(A) lift(A ) B) = con
  • 6. dence(A ) B) P(B) = P(A [ B) P(A)P(B) where P(A) is the percentage (or probability) of cases containing A. 4 / 30
  • 7. Association Rule Mining Algorithms in R I APRIORI I a level-wise, breadth-
  • 8. rst algorithm which counts transactions to
  • 9. nd frequent itemsets and then derive association rules from them I apriori() in package arules I ECLAT I
  • 10. nds frequent itemsets with equivalence classes, depth-
  • 11. rst search and set intersection instead of counting I eclat() in the same package 5 / 30
  • 12. Outline Introduction Association Rule Mining Removing Redundancy Interpreting Rules Visualizing Association Rules Further Readings and Online Resources 6 / 30
  • 13. The Titanic Dataset I The Titanic dataset in the datasets package is a 4-dimensional table with summarized information on the fate of passengers on the Titanic according to social class, sex, age and survival. I To make it suitable for association rule mining, we reconstruct the raw data as titanic.raw, where each row represents a person. I The reconstructed raw data can also be downloaded at http://www.rdatamining.com/data/titanic.raw.rdata. 7 / 30
  • 14. load("./data/titanic.raw.rdata") ## draw a sample of 5 records idx <- sample(1:nrow(titanic.raw), 5) titanic.raw[idx, ] ## Class Sex Age Survived ## 950 Crew Male Adult No ## 2176 3rd Female Adult Yes ## 1716 Crew Male Adult Yes ## 1001 Crew Male Adult No ## 48 3rd Female Child No summary(titanic.raw) ## Class Sex Age Survived ## 1st :325 Female: 470 Adult:2092 No :1490 ## 2nd :285 Male :1731 Child: 109 Yes: 711 ## 3rd :706 ## Crew:885 8 / 30
  • 15. Function apriori() Mine frequent itemsets, association rules or association hyperedges using the Apriori algorithm. The Apriori algorithm employs level-wise search for frequent itemsets. Default settings: I minimum support: supp=0.1 I minimum con
  • 16. dence: conf=0.8 I maximum length of rules: maxlen=10 9 / 30
  • 17. library(arules) rules.all <- apriori(titanic.raw) ## ## parameter specification: ## confidence minval smax arem aval originalSupport support ## 0.8 0.1 1 none FALSE TRUE 0.1 ## minlen maxlen target ext ## 1 10 rules FALSE ## ## algorithmic control: ## filter tree heap memopt load sort verbose ## 0.1 TRUE TRUE FALSE TRUE 2 TRUE ## ## apriori - find association rules with the apriori algorithm ## version 4.21 (2004.05.09) (c) 1996-2004 Christian ... ## set item appearances ...[0 item(s)] done [0.00s]. ## set transactions ...[10 item(s), 2201 transaction(s)] done ... ## sorting and recoding items ... [9 item(s)] done [0.00s]. ## creating transaction tree ... done [0.00s]. ## checking subsets of size 1 2 3 4 done [0.00s]. ## writing ... [27 rule(s)] done [0.00s]. ## creating S4 object ... done [0.00s]. 10 / 30
  • 18. inspect(rules.all) ## lhs rhs support confidence lift ## 1 {} => {Age=Adult} 0.9505 0.9505 1.0000 ## 2 {Class=2nd} => {Age=Adult} 0.1186 0.9158 0.9635 ## 3 {Class=1st} => {Age=Adult} 0.1449 0.9815 1.0327 ## 4 {Sex=Female} => {Age=Adult} 0.1931 0.9043 0.9514 ## 5 {Class=3rd} => {Age=Adult} 0.2849 0.8881 0.9344 ## 6 {Survived=Yes} => {Age=Adult} 0.2971 0.9198 0.9678 ## 7 {Class=Crew} => {Sex=Male} 0.3916 0.9740 1.2385 ## 8 {Class=Crew} => {Age=Adult} 0.4021 1.0000 1.0521 ## 9 {Survived=No} => {Sex=Male} 0.6197 0.9154 1.1640 ## 10 {Survived=No} => {Age=Adult} 0.6533 0.9651 1.0154 ## 11 {Sex=Male} => {Age=Adult} 0.7574 0.9630 1.0132 ## 12 {Sex=Female, ## Survived=Yes} => {Age=Adult} 0.1436 0.9186 0.9665 ## 13 {Class=3rd, ## Sex=Male} => {Survived=No} 0.1917 0.8275 1.2223 ## 14 {Class=3rd, ## Survived=No} => {Age=Adult} 0.2163 0.9015 0.9485 ## 15 {Class=3rd, ## Sex=Male} => {Age=Adult} 0.2099 0.9059 0.9531 ## 16 {Sex=Male, ## Survived=Yes} => {Age=Adult} 0.1536 0.9210 0.9690 11 / 30
  • 19. # rules with rhs containing "Survived" only rules <- apriori(titanic.raw, control = list(verbose=F), parameter = list(minlen=2, supp=0.005, conf=0.8), appearance = list(rhs=c("Survived=No", "Survived=Yes"), default="lhs")) ## keep three decimal places quality(rules) <- round(quality(rules), digits=3) ## order rules by lift rules.sorted <- sort(rules, by="lift") 12 / 30
  • 20. inspect(rules.sorted) ## lhs rhs support confidence lift ## 1 {Class=2nd, ## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 ## 2 {Class=2nd, ## Sex=Female, ## Age=Child} => {Survived=Yes} 0.006 1.000 3.096 ## 3 {Class=1st, ## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 ## 4 {Class=1st, ## Sex=Female, ## Age=Adult} => {Survived=Yes} 0.064 0.972 3.010 ## 5 {Class=2nd, ## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 ## 6 {Class=Crew, ## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 ## 7 {Class=Crew, ## Sex=Female, ## Age=Adult} => {Survived=Yes} 0.009 0.870 2.692 ## 8 {Class=2nd, ## Sex=Female, ## Age=Adult} => {Survived=Yes} 0.036 0.860 2.663 ## 9 {Class=2nd, 13 / 30
  • 21. Outline Introduction Association Rule Mining Removing Redundancy Interpreting Rules Visualizing Association Rules Further Readings and Online Resources 14 / 30
  • 22. Redundant Rules inspect(rules.sorted[1:2]) ## lhs rhs support confidence lift ## 1 {Class=2nd, ## Age=Child} => {Survived=Yes} 0.011 1 3.096 ## 2 {Class=2nd, ## Sex=Female, ## Age=Child} => {Survived=Yes} 0.006 1 3.096 I Rule #2 provides no extra knowledge in addition to rule #1, since rules #1 tells us that all 2nd-class children survived. I When a rule (such as #2) is a super rule of another rule (#1) and the former has the same or a lower lift, the former rule (#2) is considered to be redundant. I Other redundant rules in the above result are rules #4, #7 and #8, compared respectively with #3, #6 and #5. 15 / 30
  • 23. Remove Redundant Rules ## find redundant rules subset.matrix <- is.subset(rules.sorted, rules.sorted) subset.matrix[lower.tri(subset.matrix, diag = T)] <- NA redundant <- colSums(subset.matrix, na.rm = T) >= 1 ## which rules are redundant which(redundant) ## [1] 2 4 7 8 ## remove redundant rules rules.pruned <- rules.sorted[!redundant] 16 / 30
  • 24. Remaining Rules inspect(rules.pruned) ## lhs rhs support confidence lift ## 1 {Class=2nd, ## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 ## 2 {Class=1st, ## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 ## 3 {Class=2nd, ## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 ## 4 {Class=Crew, ## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 ## 5 {Class=2nd, ## Sex=Male, ## Age=Adult} => {Survived=No} 0.070 0.917 1.354 ## 6 {Class=2nd, ## Sex=Male} => {Survived=No} 0.070 0.860 1.271 ## 7 {Class=3rd, ## Sex=Male, ## Age=Adult} => {Survived=No} 0.176 0.838 1.237 ## 8 {Class=3rd, ## Sex=Male} => {Survived=No} 0.192 0.827 1.222 17 / 30
  • 25. Outline Introduction Association Rule Mining Removing Redundancy Interpreting Rules Visualizing Association Rules Further Readings and Online Resources 18 / 30
  • 26. inspect(rules.pruned[1]) ## lhs rhs support confidence lift ## 1 {Class=2nd, ## Age=Child} => {Survived=Yes} 0.011 1 3.096 Did children of the 2nd class have a higher survival rate than other children? 19 / 30
  • 27. inspect(rules.pruned[1]) ## lhs rhs support confidence lift ## 1 {Class=2nd, ## Age=Child} => {Survived=Yes} 0.011 1 3.096 Did children of the 2nd class have a higher survival rate than other children? The rule states only that all children of class 2 survived, but provides no information at all to compare the survival rates of dierent classes. 19 / 30
  • 28. Rules about Children rules - apriori(titanic.raw, control = list(verbose=F), parameter = list(minlen=3, supp=0.002, conf=0.2), appearance = list(default=none, rhs=c(Survived=Yes), lhs=c(Class=1st, Class=2nd, Class=3rd, Age=Child, Age=Adult))) rules.sorted - sort(rules, by=confidence) inspect(rules.sorted) ## lhs rhs support confidence lift ## 1 {Class=2nd, ## Age=Child} = {Survived=Yes} 0.010904 1.0000 3.0956 ## 2 {Class=1st, ## Age=Child} = {Survived=Yes} 0.002726 1.0000 3.0956 ## 3 {Class=1st, ## Age=Adult} = {Survived=Yes} 0.089505 0.6176 1.9117 ## 4 {Class=2nd, ## Age=Adult} = {Survived=Yes} 0.042708 0.3602 1.1149 ## 5 {Class=3rd, ## Age=Child} = {Survived=Yes} 0.012267 0.3418 1.0580 ## 6 {Class=3rd, ## Age=Adult} = {Survived=Yes} 0.068605 0.2408 0.7455 20 / 30
  • 29. Outline Introduction Association Rule Mining Removing Redundancy Interpreting Rules Visualizing Association Rules Further Readings and Online Resources 21 / 30
  • 30. library(arulesViz) plot(rules.all) Scatter plot for 27 rules 1.25 1.2 1.15 1.1 1.05 1 0.95 lift 0.2 0.4 0.6 0.8 1 0.95 0.9 0.85 support confidence 22 / 30
  • 31. plot(rules.all, method = grouped) Grouped matrix for 27 rules size: support 1 (Class=Crew +2) 1 (Class=Crew +1) 1 (Class=3rd +2) 1 (Age=Adult +1) 2 (Class=Crew +1) 2 (Class=Crew +0) 2 (Survived=No +0) 2 (Class=3rd +1) 2 (Class=Crew +2) 1 (Class=3rd +2) 1 (Class=1st +0) 1 (Sex=Male +1) 1 (Sex=Male +0) 1 (Class=1st +−1) 2 (Survived=Yes +1) 1 (Sex=Female +1) 2 (Class=2nd +3) 1 (Sex=Female +0) 1 (Class=3rd +1) 1 (Class=3rd +0) color: lift {Age=Adult} {Survived=No} {Sex=Male} LHS RHS 23 / 30
  • 32. plot(rules.all, method = graph) Graph for 27 rules {Class=3rd,Sex=Male,Age=Adult} {Sex=Male,Survived=No} {Class=3rd,Survived=No} {Sex=Male,Survived=Yes} {Survived=No} {Age=Adult} {Age=Adult,Survived=No} {} {Class=1st} {Sex=Female} {Class=2nd} {Class=3rd,Age=Adult,Survived=No} {Class=3rd,Sex=Male,Survived=No} {Class=3rd,Sex=Male} {Class=3rd} {Class=Crew,Age=Adult,Survived=No} {Class=Crew,Age=Adult} {Class=Crew,Sex=Male,Survived=No} {Class=Crew,Sex=Male} {Class=Crew,Survived=No} {Class=Crew} {Sex=Female,Survived=Yes} {Sex=Male} {Survived=Yes} width: support (0.119 − 0.95) color: lift (0.934 − 1.266) 24 / 30
  • 33. plot(rules.all, method = graph, control = list(type = items)) Graph for 27 rules Class=1st Class=2nd Survived=Yes Class=3rd Class=Crew Sex=Female Age=Adult Sex=Male Survived=No size: support (0.119 − 0.95) color: lift (0.934 − 1.266) 25 / 30
  • 34. plot(rules.all, method = paracoord, control = list(reorder = TRUE)) Parallel coordinates plot for 27 rules 3 2 1 rhs Class=1st Survived=No Survived=Yes Class=3rd Class=2nd Sex=Male Sex=Female Class=Crew Age=Adult Position 26 / 30
  • 35. Outline Introduction Association Rule Mining Removing Redundancy Interpreting Rules Visualizing Association Rules Further Readings and Online Resources 27 / 30
  • 36. Further Readings I More than 20 interestingness measures, such as chi-square, conviction, gini and leverage Tan, P.-N., Kumar, V., and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In Proc. of KDD '02, pages 32-41, New York, NY, USA. ACM Press. I Post mining of association rules, such as selecting interesting association rules, visualization of association rules and using association rules for classi
  • 37. cation Yanchang Zhao, Chengqi Zhang and Longbing Cao (Eds.). Post-Mining of Association Rules: Techniques for Eective Knowledge Extraction, ISBN 978-1-60566-404-0, May 2009. Information Science Reference. I Package arulesSequences: mining sequential patterns http://cran.r-project.org/web/packages/arulesSequences/ 28 / 30
  • 38. Online Resources I Chapter 9: Association Rules, in book R and Data Mining: Examples and Case Studies http://www.rdatamining.com/docs/RDataMining.pdf I R Reference Card for Data Mining http://www.rdatamining.com/docs/R-refcard-data-mining.pdf I Free online courses and documents http://www.rdatamining.com/resources/ I RDataMining Group on LinkedIn (7,000+ members) http://group.rdatamining.com I RDataMining on Twitter (1,700+ followers) @RDataMining 29 / 30
  • 39. The End Thanks! Email: yanchang(at)rdatamining.com 30 / 30