DATA MINING
TECHNIQUES
UNIT-III
Association Rule Mining
• An All Electronics customer buys a PC and a digital camera.
What should you recommend to him next?
Frequent patterns and association rules are the knowledge that you want to mine.
• Frequent patterns: patterns that appear frequently in a data set
• Frequent itemsets: sets of items, such as milk and bread, that appear together frequently in a transaction data set
• Frequent subsequences: sequences of items, such as first buying a PC and then a digital camera, that occur frequently in a transaction data set
• Frequent substructures: structural forms such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences; a substructure that occurs frequently is called a frequent structured pattern
Basic Concepts
• Mining frequent patterns plays an essential role in mining associations and correlations, and it also supports data classification, clustering, and other data mining tasks.
• Market basket analysis:
customer 1: milk, bread, cereal
customer 2: milk, bread, sugar, eggs
customer 3: milk, bread, butter
customer 4: sugar, eggs
• Which groups or sets of items are customers likely to purchase on a given trip to a store?
Association Rules
• Support and confidence are two measures of rule interestingness:
Support: the usefulness of discovered rules
Confidence: the certainty of discovered rules
computer ⇒ antivirus_software [support = 2%, confidence = 60%]
Support: 2% of all the transactions under analysis show that a computer and antivirus software are purchased together.
Confidence: 60% of the customers who purchased a computer also bought the antivirus software.
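• Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
• Frequent itemsets, closed itemsets, and association rules:
I = {I1, I2, ..., In}: the set of all items
D: the task-relevant data, a database of transactions
T: a transaction, a nonempty itemset with T ⊆ I
Rule: A ⇒ B, where A ⊂ I, B ⊂ I, A ≠ ∅, B ≠ ∅, and A ∩ B = ∅
Support(A ⇒ B) = P(A ∪ B): the relative support, i.e. the fraction of transactions in D that contain every item of A and B
Confidence(A ⇒ B) = P(B | A)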
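• Itemset: a set of items
• k-itemset: an itemset that contains k items
• Occurrence frequency of an itemset: the number of transactions that contain the itemset, also called its frequency, support count, or count
• Minimum support threshold: if the relative support of an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset.
• Confidence can be computed from support counts alone:
$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support}(A \cup B)}{\text{support}(A)} = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}$$
• Thus, once the frequent itemsets and their support counts are known, the rules and their confidences follow directly, so the problem of mining association rules can be reduced to that of mining frequent itemsets.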
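As a quick check of these definitions, the Python sketch below computes support and confidence by plain counting over the four market-basket transactions listed earlier (customers 1 to 4). The helper names are illustrative only, not taken from any library.

```python
# Market-basket transactions from the slide (customers 1 to 4)
transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def support(a, b, db):
    """Relative support of A => B, i.e. P(A u B): fraction of transactions containing A and B."""
    return support_count(a | b, db) / len(db)


def confidence(a, b, db):
    """Confidence of A => B, i.e. support_count(A u B) / support_count(A)."""
    return support_count(a | b, db) / support_count(a, db)


print(support({"milk"}, {"bread"}, transactions))       # 0.75
print(confidence({"milk"}, {"bread"}, transactions))    # 1.0
print(confidence({"bread"}, {"butter"}, transactions))  # 0.3333333333333333
```

So milk ⇒ bread holds in 75% of all baskets and in every basket that contains milk, while bread ⇒ butter has a confidence of only 1/3.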
Frequent Itemsets in a Data Set (Association Rule Mining)
• Association mining searches for frequent itemsets in a data set. Frequent mining usually uncovers interesting associations and correlations between itemsets in transactional and relational databases. In short, frequent mining shows which items appear together in a transaction or relation.
• Need for association mining:
Frequent mining is the generation of association rules from a transactional data set. If two items X and Y are frequently purchased together, it is a good idea to place them together in stores or to offer a discount on one item with the purchase of the other, which can increase sales. For example, a customer who buys milk and bread is likely to also buy butter.
So the association rule is {milk, bread} ⇒ {butter}, and the seller can suggest that the customer buy butter if he/she buys milk and bread.
Important Definitions:
• Support: one of the measures of interestingness; it tells how useful a rule is. 5% support means that 5% of the transactions in the database follow the rule.
• Support(A → B) = Support_count(A ∪ B) / |D|, where |D| is the total number of transactions (Support_count(A ∪ B) on its own is the absolute support)
• Confidence: tells how certain a rule is. A confidence of 60% for {milk, bread} → {butter} means that 60% of the customers who purchased milk and bread also bought butter.
• Confidence(A → B) = Support_count(A ∪ B) / Support_count(A)
• If a rule satisfies both minimum support and minimum confidence, it is a strong rule.
• Support_count(X): the number of transactions in which X appears. If X is A ∪ B, it is the number of transactions in which A and B are both present.
1. Maximal itemset: an itemset is maximal frequent if it is frequent and none of its supersets is frequent.
2. Closed itemset: an itemset is closed if none of its immediate supersets has the same support count as the itemset.
3. K-itemset: an itemset that contains K items is a K-itemset. An itemset of any size is frequent if its support count is greater than or equal to the minimum support count.
Example on Finding Frequent Itemsets
• Consider the given dataset of transactions.
• Let the minimum support count be 3.
• The relation that holds is: maximal frequent ⇒ closed frequent ⇒ frequent (every maximal frequent itemset is closed, and every closed frequent itemset is frequent).
• 1-frequent:
• {A} = 3 // not closed due to {A, C}; not maximal
• {B} = 4 // not closed due to {B, D}; not maximal
• {C} = 4 // not closed due to {C, D}; not maximal
• {D} = 5 // closed, since no immediate superset has the same count; not maximal
• 2-frequent:
• {A, B} = 2 // not frequent, because support count < minimum support count; ignore
• {A, C} = 3 // not closed due to {A, C, D}
• {A, D} = 3 // not closed due to {A, C, D}
• {B, C} = 3 // not closed due to {B, C, D}
• {B, D} = 4 // closed, but not maximal due to {B, C, D}
• {C, D} = 4 // closed, but not maximal due to {B, C, D}
• 3-frequent:
• {A, B, C} = 2 // not frequent, because support count < minimum support count; ignore
• {A, B, D} = 2 // not frequent, because support count < minimum support count; ignore
• {A, C, D} = 3 // maximal frequent (and therefore closed)
• {B, C, D} = 3 // maximal frequent (and therefore closed)
• 4-frequent:
• {A, B, C, D} = 2 // not frequent; ignore
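The classification above can be reproduced by brute force. This is a minimal Python sketch that assumes a hypothetical five-transaction dataset, constructed only so that its support counts match the ones listed above (it is not necessarily the table from the original slide). It enumerates every itemset, keeps the frequent ones, and tests each for closedness and maximality against its immediate supersets.

```python
from itertools import combinations

# Hypothetical transactions, built only so that the support counts match
# the ones listed above (the slide's own table may differ).
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"B", "D"},
]
MIN_SUP = 3
items = sorted(set().union(*transactions))


def count(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)


# Brute force: enumerate every nonempty itemset and keep the frequent ones
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        if count(s) >= MIN_SUP:
            frequent[s] = count(s)


def immediate_supersets(s):
    return [s | {i} for i in items if i not in s]


# Closed: no immediate superset has the same support count
closed = {s for s, c in frequent.items()
          if all(count(sup) != c for sup in immediate_supersets(s))}
# Maximal: no immediate superset is frequent (enough, by the Apriori property)
maximal = {s for s in frequent
           if all(sup not in frequent for sup in immediate_supersets(s))}

for s in sorted(frequent, key=lambda x: (len(x), sorted(x))):
    labels = [name for name, group in (("closed", closed), ("maximal", maximal)) if s in group]
    print(sorted(s), frequent[s], ", ".join(labels))
```

On this data the closed frequent itemsets are {D}, {B, D}, {C, D}, {A, C, D}, and {B, C, D}, and only {A, C, D} and {B, C, D} are maximal, in line with the annotations above. Brute-force enumeration is exponential in the number of items, so it is only suitable for toy examples like this one.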
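AR as a Two-Step Process
• Find all frequent itemsets.
• Generate strong association rules from the frequent itemsets.
• Challenge in mining frequent itemsets: if an itemset is frequent, each of its subsets is frequent as well, so a long frequent itemset (for example, one of length 100) contains 2^100 − 1 frequent sub-itemsets. Closed and maximal frequent itemsets help overcome this:
• Closed frequent itemset: an itemset X is closed in a data set D if there exists no proper super-itemset Y such that Y has the same support count as X in D; X is a closed frequent itemset if it is both closed and frequent.
• Maximal frequent itemset: an itemset X is a maximal frequent itemset in a data set D if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in D.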
Example: closed and maximal frequent itemsets
• A transaction database has only two transactions: {⟨a1, a2, ..., a100⟩; ⟨a1, a2, ..., a50⟩}, with min_sup = 1.
• We find two closed frequent itemsets and their support counts:
C = {{a1, a2, ..., a100}: 1; {a1, a2, ..., a50}: 2}
• There is only one maximal frequent itemset:
M = {{a1, a2, ..., a100}: 1}
• We cannot include {a1, a2, ..., a50} as a maximal frequent itemset because it has a frequent superset, {a1, a2, ..., a100}.
• C: the set of closed frequent itemsets; M: the set of maximal frequent itemsets
• The set of closed frequent itemsets contains complete information regarding the frequent itemsets.
• For example, from C we can derive:
(i) {a2, a45}: 2, since {a2, a45} is a sub-itemset of the itemset {a1, a2, ..., a50}: 2
(ii) {a8, a55}: 1, since {a8, a55} is not a sub-itemset of the previous itemset but is a sub-itemset of {a1, a2, ..., a100}: 1
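Frequent Itemset Mining Methods: Apriori and FP-Growth
• Apriori algorithm:
Finding frequent itemsets by confined candidate generation
A seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets
The name of the algorithm reflects the fact that it uses prior knowledge of frequent itemset properties.
Apriori property: all nonempty subsets of a frequent itemset must also be frequent.
Join step: the candidate set Ck is generated by joining L(k−1), the set of frequent (k−1)-itemsets, with itself.
Prune step: any candidate in Ck that has a (k−1)-subset not in L(k−1) cannot be frequent (by the Apriori property) and is removed; a database scan then counts the remaining candidates to obtain Lk.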
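A minimal Python sketch of the level-wise loop with the join and prune steps, assuming `min_sup` is an absolute support count. For brevity, the join forms the union of any two frequent (k−1)-itemsets that yields a k-itemset; this is a simpler variant of the textbook's prefix-based join, and after the prune step both produce the same candidates. The usage runs it on the market-basket data from the earlier slide.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset: support_count} for all frequent itemsets.

    `min_sup` is an absolute support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count 1-itemsets and keep the frequent ones (L1)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate with an infrequent (k-1)-subset cannot be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Scan D to count the surviving candidates, giving Lk
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(level)
        k += 1
    return frequent


# Usage on the market-basket data from the slide, minimum support count 2
baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
result = apriori(baskets, 2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each pass of the `while` loop corresponds to one full database scan, which is the cost the efficiency techniques and FP-growth later try to reduce.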
Example: problem
Problem (contd.)
Generating Association Rules from Frequent Itemsets
• Once the frequent itemsets from the transactions have been found, it is straightforward to generate strong association rules from them.
• Strong association rules satisfy both minimum support and minimum confidence.
• Confidence:
$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}$$
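• Association rules are generated as follows:
For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty proper subset s of l, output the rule "s ⇒ (l − s)" if
$$\frac{\text{support\_count}(l)}{\text{support\_count}(s)} \ge \text{min\_conf}$$
Because the rules are generated from frequent itemsets, each one automatically satisfies minimum support.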
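The rule-generation procedure can be sketched in Python as follows. The function name `generate_rules` is hypothetical; it assumes the frequent itemsets and their support counts are already available (for instance, from an Apriori run) as a dict from frozenset to support count. The usage reuses the support counts of {A, C, D} and its subsets from the worked example on finding frequent itemsets, with min_conf = 80%.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Yield (s, l - s, confidence) for every strong rule s => (l - s).

    `frequent` maps frozenset -> support count and must contain every
    frequent itemset, so the count of each subset s can be looked up.
    """
    for l, l_count in frequent.items():
        if len(l) < 2:
            continue
        # every nonempty proper subset s of l is a candidate antecedent
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                conf = l_count / frequent[s]
                if conf >= min_conf:
                    yield s, l - s, conf


# Support counts of {A, C, D} and its subsets from the worked example
frequent = {
    frozenset("A"): 3, frozenset("C"): 4, frozenset("D"): 5,
    frozenset("AC"): 3, frozenset("AD"): 3, frozenset("CD"): 4,
    frozenset("ACD"): 3,
}
rules = list(generate_rules(frequent, min_conf=0.8))
for s, rest, conf in rules:
    print(sorted(s), "=>", sorted(rest), round(conf, 2))
```

No database scan is needed at this stage: every confidence is a ratio of two support counts that were already computed while mining the frequent itemsets.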
Example: problem
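Improving the efficiency of Apriori
• Hash-based technique: a hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1. While scanning each transaction to find the frequent 1-itemsets, all 2-itemsets of the transaction can be hashed into the buckets of a hash table and the corresponding bucket counts increased; a 2-itemset whose bucket count is below the support threshold cannot be frequent and is removed from the candidate set.
• Example: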
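A Python sketch of the hash-based idea for k = 2. It assumes a hypothetical bucket function h(x, y) = (order(x) × 10 + order(y)) mod 7, where order is an item's position in sorted order, and a hypothetical dataset over items A, B, C, D with a minimum support count of 3. Any hash function works, because a bucket count is always an upper bound on the support count of each pair in that bucket.

```python
from itertools import combinations


def hash_filtered_pairs(transactions, min_sup, n_buckets=7):
    """Candidate 2-itemsets (C2) that survive hash-based bucket filtering.

    During the scan that counts 1-itemsets, every 2-itemset of each
    transaction is hashed into a bucket and that bucket's count is increased.
    A bucket count is an upper bound on the support count of every pair in
    the bucket, so a pair whose bucket count is below min_sup cannot be frequent.
    """
    order = {item: i for i, item in enumerate(sorted(set().union(*transactions)))}

    def h(x, y):
        # hypothetical hash function on the items' sorted positions (x before y)
        return (order[x] * 10 + order[y]) % n_buckets

    item_counts = {}
    buckets = [0] * n_buckets
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for x, y in combinations(sorted(t), 2):
            buckets[h(x, y)] += 1

    l1 = sorted(i for i, c in item_counts.items() if c >= min_sup)
    # Join L1 with itself, but keep only pairs whose bucket count reaches min_sup
    return {frozenset((x, y)) for x, y in combinations(l1, 2) if buckets[h(x, y)] >= min_sup}


# Hypothetical A, B, C, D dataset, minimum support count 3
transactions = [set("ABCD"), set("ABCD"), set("ACD"), set("BCD"), set("BD")]
c2 = hash_filtered_pairs(transactions, min_sup=3)
print(sorted(sorted(p) for p in c2))
```

Here {A, B} is the only pair in its bucket, whose count is 2, so it is dropped from C2 before the second scan ever counts candidate pairs.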
• Transaction reduction: reduce the number of transactions scanned in future iterations.
• A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets.
• Such a transaction can be marked or removed from further consideration.
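Transaction reduction is a one-line filter. In this Python sketch the function name is hypothetical, and the data is the same hypothetical A, B, C, D dataset (minimum support count 3), whose frequent 3-itemsets are {A, C, D} and {B, C, D}.

```python
def reduce_transactions(transactions, frequent_k):
    """Keep only transactions that contain at least one frequent k-itemset.

    A transaction without any frequent k-itemset cannot contain a frequent
    (k+1)-itemset, so later scans can skip it.
    """
    return [t for t in transactions if any(s <= t for s in frequent_k)]


# Hypothetical A, B, C, D dataset and its frequent 3-itemsets at minimum support count 3
transactions = [set("ABCD"), set("ABCD"), set("ACD"), set("BCD"), set("BD")]
frequent_3 = [frozenset("ACD"), frozenset("BCD")]
reduced = reduce_transactions(transactions, frequent_3)
print(len(reduced))  # 4: {B, D} is skipped when counting candidate 4-itemsets
```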
• Partitioning (two database scans): partitioning the data to find candidate itemsets requires only two database scans to mine the frequent itemsets.
• Phase I:
Divide the transactions of D into n nonoverlapping partitions.
Find the local frequent itemsets of each partition, using as its minimum support count min_sup multiplied by the number of transactions in that partition.
Any itemset that is frequent in D must occur as a frequent itemset in at least one of the partitions.
Therefore, all local frequent itemsets are candidate itemsets in D.
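• Phase II:
A second scan of D is conducted to determine the global frequent itemsets, so D is scanned only once in each phase.
• Sampling: mine a random sample S of D instead of D itself, using a support threshold lower than min_sup to reduce the chance of missing globally frequent itemsets
• Dynamic itemset counting: the database is partitioned into blocks marked by start points, and new candidate itemsets can be added at any start point instead of only before a full scan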
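The two-phase partitioning approach can be sketched in Python as below. Assumptions: the minimum support is given as an integer percentage (`min_sup_pct`) so that thresholds can be computed with exact integer ceiling division, the local miner is a brute-force search that is adequate only for tiny partitions, and the data is the hypothetical A, B, C, D dataset split into two partitions.

```python
from itertools import combinations


def local_frequent(transactions, min_count):
    """Brute-force miner for one small partition (fine only for toy data)."""
    items = sorted(set().union(*transactions))
    found = set()
    for k in range(1, len(items) + 1):
        level = {
            frozenset(c) for c in combinations(items, k)
            if sum(1 for t in transactions if set(c) <= t) >= min_count
        }
        if not level:
            break  # no frequent k-itemset means no frequent (k+1)-itemset
        found |= level
    return found


def partition_mine(transactions, min_sup_pct, n_parts):
    """Two-scan partitioning; min_sup_pct is an integer percentage."""
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase I: local frequent itemsets of every partition are global candidates
    candidates = set()
    for part in parts:
        local_min = -(-min_sup_pct * len(part) // 100)  # ceil(pct * |part| / 100)
        candidates |= local_frequent(part, local_min)

    # Phase II: one more scan of D counts every candidate
    global_min = -(-min_sup_pct * len(transactions) // 100)
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= global_min}


transactions = [set("ABCD"), set("ABCD"), set("ACD"), set("BCD"), set("BD")]
result = partition_mine(transactions, min_sup_pct=60, n_parts=2)
print(len(result), result[frozenset("D")])  # 11 5
```

Phase I may produce false candidates (here every subset of {A, B, C, D} is locally frequent in the first partition), but it never misses a globally frequent itemset, and Phase II removes the false candidates.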
A database has five transactions. Let min_sup = 60% and min_conf = 80%.
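A pattern-growth approach for mining frequent itemsets
• Apriori algorithm: disadvantages
• Its candidate generate-and-test method reduces the size of candidate sets, which leads to good performance gains.
• However, it can suffer from two nontrivial costs: it may still need to generate a huge number of candidate sets, and it may need to repeatedly scan the whole database and check a large set of candidates by pattern matching.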
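Frequent pattern growth, or FP-growth (divide and conquer)
• Mines the complete set of frequent itemsets without such costly candidate generation
• First, it compresses the database representing frequent items into an FP-tree, which retains the itemset association information:
• Scan D once to find the frequent items and their support counts, and sort them in descending order of support count; this sorted list is L.
• Create the root of the tree, labelled "null".
• Scan D a second time.
• Process the items in each transaction in L order and create a branch for each transaction; transactions that share a prefix share the corresponding path, whose node counts are incremented.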
Mining the FP-tree
• Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its conditional pattern base, i.e. the set of prefix paths in the FP-tree that co-occur with the suffix pattern.
• Then construct its conditional FP-tree and perform mining recursively on that tree.
• Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from a conditional FP-tree.
• This method substantially reduces the search cost.
• Algorithm: FP-growth
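A compact Python sketch of FP-tree construction (two scans, items inserted in L order) and of recursive mining through conditional pattern bases. The names `Node`, `build_tree`, `fp_growth`, and `mine` are hypothetical; `min_sup` is an absolute count, ties in support are broken alphabetically so that L order is deterministic, and the usage data is the hypothetical A, B, C, D dataset.

```python
from collections import defaultdict


class Node:
    """FP-tree node: an item, its count, a parent link, and child nodes."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_sup):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    counts = defaultdict(int)
    for items, n in weighted_transactions:  # first scan: support of each item
        for item in items:
            counts[item] += n
    counts = {i: c for i, c in counts.items() if c >= min_sup}

    root = Node(None, None)  # root labelled "null"
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, n in weighted_transactions:  # second scan: insert each transaction
        # keep frequent items only, in L order (descending support, ties alphabetical)
        path = sorted((i for i in items if i in counts), key=lambda i: (-counts[i], i))
        node = root
        for item in path:
            if item not in node.children:  # no shared prefix: create a new branch node
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, counts


def fp_growth(weighted_transactions, min_sup, suffix=()):
    """Yield (frequent itemset, support count) pairs by pattern growth."""
    header, counts = build_tree(weighted_transactions, min_sup)
    for item in sorted(counts, key=lambda i: (counts[i], i)):  # least frequent first
        pattern = suffix + (item,)
        yield frozenset(pattern), counts[item]
        # Conditional pattern base: the prefix path above every node holding `item`
        base = []
        for node in header[item]:
            prefix, p = [], node.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, node.count))
        # Build the conditional FP-tree from the base and mine it recursively
        yield from fp_growth(base, min_sup, pattern)


def mine(transactions, min_sup):
    return dict(fp_growth([(t, 1) for t in transactions], min_sup))


# Hypothetical A, B, C, D dataset, minimum support count 3
transactions = [set("ABCD"), set("ABCD"), set("ACD"), set("BCD"), set("BD")]
patterns = mine(transactions, 3)
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The conditional pattern bases are passed down as weighted prefix paths, so the recursion reuses `build_tree` for every conditional FP-tree instead of rescanning the original database.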
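Mining frequent itemsets using the vertical data format
• In the vertical data format, the data are stored as {item: TID_set}, where TID_set is the set of identifiers of the transactions that contain the item; the support count of an itemset is the size of the intersection of its items' TID sets.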
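A sketch in the style of the Eclat algorithm: transactions are converted into TID sets, and frequent itemsets are mined depth-first by intersecting TID sets, so no further database scan is needed after the conversion. The function names, the TIDs T1 to T5, and the data are hypothetical; the minimum support count is 3.

```python
def vertical_format(transactions):
    """Convert horizontal data {TID: items} into vertical data {item: TID set}."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat(prefix, extensions, min_sup, out):
    """Depth-first search over TID sets; support count = size of the TID set.

    `extensions` is a list of (item, tidset) pairs, where tidset holds the
    TIDs of transactions containing both `prefix` and `item`.
    """
    for i, (item, tids) in enumerate(extensions):
        if len(tids) < min_sup:
            continue
        itemset = prefix | {item}
        out[itemset] = len(tids)
        # Intersecting TID sets gives the supports of the longer itemsets
        longer = [(other, tids & other_tids) for other, other_tids in extensions[i + 1:]]
        eclat(itemset, longer, min_sup, out)
    return out


# Hypothetical data (items A, B, C, D; TIDs T1 to T5), minimum support count 3
data = {
    "T1": {"A", "B", "C", "D"},
    "T2": {"A", "B", "C", "D"},
    "T3": {"A", "C", "D"},
    "T4": {"B", "C", "D"},
    "T5": {"B", "D"},
}
vertical = vertical_format(data)
result = eclat(frozenset(), sorted(vertical.items()), 3, {})
print(sorted(vertical["A"]))                  # ['T1', 'T2', 'T3']
print(len(result), result[frozenset("BCD")])  # 11 3
```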
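Mining closed and maximal patterns
• How can we mine closed frequent itemsets?
• Strategies include:
Item merging
Sub-itemset pruning
Item skipping
• When a new frequent itemset is derived, it is necessary to perform two kinds of closure checking:
Superset checking: checks whether the new frequent itemset is a superset of some already found closed itemset with the same support
Subset checking: checks whether the new frequent itemset is a subset of an already found closed itemset with the same support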
Pattern Evaluation Methods
• Strong rules are not necessarily interesting:
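• From association analysis to correlation analysis:
• Correlation rule: A ⇒ B [support, confidence, correlation]
• Correlation measure: for example, lift(A, B) = P(A ∪ B) / (P(A)P(B)); a value less than 1 indicates negative correlation, a value of 1 indicates independence, and a value greater than 1 indicates positive correlation
Pattern Evaluation Methods: chi-square measure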
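A minimal Python sketch of two correlation measures, lift and the χ² statistic of the 2×2 contingency table for itemsets A and B, both computed from transaction counts. The counts in the usage are hypothetical, and all row and column totals are assumed to be nonzero (otherwise an expected count would be zero).

```python
def lift(n_ab, n_a, n_b, n):
    """lift(A, B) = P(A u B) / (P(A) * P(B)), computed from transaction counts."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))


def chi_square(n_ab, n_a, n_b, n):
    """Pearson chi-square statistic for the 2x2 contingency table of A and B.

    Expected count of each cell = row total * column total / n.
    """
    observed = {
        (True, True): n_ab,                    # A and B
        (True, False): n_a - n_ab,             # A, not B
        (False, True): n_b - n_ab,             # B, not A
        (False, False): n - n_a - n_b + n_ab,  # neither
    }
    row = {True: n_a, False: n - n_a}
    col = {True: n_b, False: n - n_b}
    chi2 = 0.0
    for (in_a, in_b), obs in observed.items():
        expected = row[in_a] * col[in_b] / n
        chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical counts: 10,000 transactions, 6,000 contain A, 7,500 contain B, 4,000 contain both
print(round(lift(4000, 6000, 7500, 10000), 2))        # 0.89
print(round(chi_square(4000, 6000, 7500, 10000), 1))  # 555.6
```

With these counts the rule A ⇒ B would have a confidence of about 67%, yet lift is below 1 and the observed count of {A, B} (4,000) is below its expected count (4,500), so A and B are in fact negatively correlated.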
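Comparison of pattern evaluation measures
• All_confidence
• Max_confidence
• Kulczynski (Kulc)
• Cosine
• Null transactions: transactions that do not contain any of the itemsets being examined
• Null-invariant: a measure whose value is not affected by the number of null transactions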
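The four measures listed above can be computed from three support counts. This Python sketch uses the standard definitions: all_confidence = sup(A ∪ B) / max(sup(A), sup(B)), max_confidence = max(P(A | B), P(B | A)), Kulc = (P(A | B) + P(B | A)) / 2, and cosine = sup(A ∪ B) / √(sup(A) · sup(B)). Lift is included only for contrast. The counts in the usage are hypothetical.

```python
import math


def null_invariant_measures(n_ab, n_a, n_b):
    """All_confidence, max_confidence, Kulczynski, and cosine of itemsets A and B.

    n_ab, n_a, n_b are the support counts of A u B, A, and B. The total number
    of transactions is never used, so adding null transactions (containing
    neither A nor B) cannot change any of these values.
    """
    p_a_given_b = n_ab / n_b
    p_b_given_a = n_ab / n_a
    return {
        "all_confidence": n_ab / max(n_a, n_b),
        "max_confidence": max(p_a_given_b, p_b_given_a),
        "kulczynski": (p_a_given_b + p_b_given_a) / 2,
        "cosine": n_ab / math.sqrt(n_a * n_b),
    }


def lift(n_ab, n_a, n_b, n):
    """Lift, for contrast: depends on the total count n, so it is not null-invariant."""
    return (n_ab * n) / (n_a * n_b)


# Hypothetical counts: same A and B counts, growing number of null transactions
for n in (2_300, 100_000):
    print(n, round(lift(1000, 1100, 1100, n), 2), null_invariant_measures(1000, 1100, 1100))
```

Adding null transactions raises the total n, which changes lift sharply, while the four null-invariant measures stay the same.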
Advanced pattern mining
• What is pattern mining?
• Pattern mining: a road map
Basic patterns: frequent patterns, closed patterns, max-patterns, infrequent or rare patterns, negative patterns
Based on the abstraction levels involved in a pattern: single-level association rules, multilevel association rules
Based on the number of dimensions involved in the rule or pattern: single-dimensional association rules/patterns, multidimensional association rules/patterns
• Based on the types of values handled in the rule or pattern: Boolean
association rule, quantitative association rule
• Based on the constraints or criteria used to mine selective patterns: constraint-based, approximate, compressed, near-match, top-k, and redundancy-aware top-k patterns
• Based on the kinds of data and features to be mined: sequential patterns, structural patterns
• Based on application domain-specific semantics
• Based on data analysis usages: pattern-based classification, pattern-based clustering
Pattern mining in multilevel,
multidimensional space
• Mining multilevel associations
• Using uniform minimum support for all levels
• Using reduced minimum support at lower levels
• Using item or group-based minimum support
• Mining multidimensional associations
Single-dimensional or intradimensional association rules
Multidimensional or interdimensional association rules
• Mining quantitative association rules
A data cube method
A clustering-based method
A statistical analysis method to uncover exceptional behaviours
• Mining rare patterns and negative patterns
Constraint-based frequent pattern mining
• Constraints include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
• Metarule-guided mining of association rules
• Constraint-based pattern generation
• An efficient frequent pattern mining processor can prune its search space during mining in two ways:
Pruning the pattern search space
Pruning the data search space
• There are five categories of pattern mining constraints:
Antimonotonic
Monotonic
Succinct
Convertible
Inconvertible
• Pruning the data space with data pruning constraints:
Data succinctness
Data antimonotonicity
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 

Data mining techniques unit III

  • 2. Association Rule Mining
    • AllElectronics: a customer buys a PC and a digital camera. What should you recommend to them next? Frequent patterns and association rules are the knowledge you want to mine to answer this.
    • Frequent patterns: patterns that appear frequently in a data set.
    • Frequent itemsets: sets of items, such as milk and bread, that appear together frequently in a transactional data set.
    • Frequent subsequences: sequences of items that frequently occur in order, e.g., buying a PC, then a digital camera, then a memory card, in a shopping-history database.
    • Frequent substructures: structural forms such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences; a substructure that occurs frequently is called a frequent structured pattern.
  • 3. Basic Concepts
    • Mining frequent patterns plays an essential role in mining associations and correlations, and in other data mining tasks such as classification and clustering.
    • Market basket analysis:
      customer 1: milk, bread, cereal
      customer 2: milk, bread, sugar, eggs
      customer 3: milk, bread, butter
      customer 4: sugar, eggs
    • Which groups or sets of items are customers likely to purchase on a given trip to the store?
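To make the market-basket question concrete, here is a minimal Python sketch that counts how many of the four baskets above contain each pair of items. It is a brute-force count meant only for illustration; the helper name `pair_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# The four market-basket transactions listed on the slide.
transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for t in transactions:
        # sorted() gives every pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)
for pair, n in counts.most_common(2):
    print(pair, n)
# ('bread', 'milk') 3
# ('eggs', 'sugar') 2
```

{bread, milk} co-occurs in three of the four baskets, so it is the most obvious candidate for a frequent pair; {eggs, sugar} follows with two.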
  • 4. Association Rules
    • Support and confidence are two measures of rule interestingness:
      Support: reflects the usefulness of discovered rules.
      Confidence: reflects the certainty of discovered rules.
    • Example: computer => antivirus_software [support = 2%, confidence = 60%]
      Support: 2% of all the transactions under analysis show that a computer and antivirus software are purchased together.
      Confidence: 60% of the customers who purchased a computer also bought the antivirus software.
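  • 5. Association Rules
    • Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
    • Frequent itemsets, closed itemsets, and association rules:
      I = {I1, I2, ..., In}: the set of all items
      D: the task-relevant data, a database of transactions
      T: a transaction, i.e., a set of items such that T ⊆ I
      Rule: A => B, where A ⊂ I, B ⊂ I, A and B are nonempty, and A ∩ B = ∅
      Support(A => B) = P(A ∪ B), the relative support: the fraction of transactions that contain both A and B
      Confidence(A => B) = P(B | A)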
  • 6. Association Rules
    • Itemsets
    • k-itemsets: itemsets that contain k items
    • Occurrence frequency of an itemset: the number of transactions that contain the itemset (its support count)
    • Minimum support threshold: if the relative support of an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset.
    • Confidence(A => B) = P(B | A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)
    • Because confidence can be computed from the support counts of A and A ∪ B, the problem of mining association rules can be reduced to that of mining frequent itemsets.
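The support and confidence formulas above can be checked on the slide 3 baskets. This is a sketch assuming transactions are Python sets; `support_count`, `support`, and `confidence` are hypothetical helper names, and `Fraction` keeps the results exact. It also assumes A occurs in at least one transaction, otherwise the confidence is undefined.

```python
from fractions import Fraction

# The four market-basket transactions from slide 3.
transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    """Relative support of A => B: support_count(A ∪ B) / |D|."""
    return Fraction(support_count(a | b, transactions), len(transactions))

def confidence(a, b, transactions):
    """Confidence of A => B: support_count(A ∪ B) / support_count(A)."""
    return Fraction(support_count(a | b, transactions),
                    support_count(a, transactions))

a, b = {"milk", "bread"}, {"butter"}
print(support(a, b, transactions))     # 1/4
print(confidence(a, b, transactions))  # 1/3
```

Only customer 3 bought milk, bread, and butter together, so the rule {milk, bread} => {butter} has support 1/4; of the three customers who bought milk and bread, one also bought butter, so its confidence is 1/3.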
  • 7. Frequent Itemsets in a Data Set (Association Rule Mining)
    • Association mining searches for frequent itemsets in a data set. Frequent pattern mining usually uncovers interesting associations and correlations between itemsets in transactional and relational databases. In short, it shows which items appear together in a transaction or relation.
    • Need for association mining: frequent pattern mining is used to generate association rules from a transactional data set. If two items X and Y are frequently purchased together, it is a good idea to place them together in stores, or to offer a discount on one item when the other is purchased; this can increase sales. For example, we may find that customers who buy milk and bread also buy butter, giving the association rule {'milk', 'bread'} => {'butter'}. The seller can then suggest butter to a customer who buys milk and bread.
  • 8. Important Definitions
    • Support: one of the measures of interestingness; it indicates the usefulness of a rule. A support of 5% means that 5% of the transactions in the database contain all the items in the rule.
      Support(A => B) = Support_count(A ∪ B) / |D|   (Support_count(A ∪ B) itself is the absolute support)
    • Confidence: indicates the certainty of a rule. A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.
      Confidence(A => B) = Support_count(A ∪ B) / Support_count(A)
    • If a rule satisfies both minimum support and minimum confidence, it is a strong rule.
  • 9. Important Definitions
    • Support_count(X): the number of transactions in which X appears. If X is A ∪ B, it is the number of transactions in which A and B are both present.
      1. Maximal itemset: a frequent itemset is maximal if none of its supersets is frequent.
      2. Closed itemset: an itemset is closed if none of its immediate supersets has the same support count as the itemset.
      3. k-itemset: an itemset that contains k items.
    • An itemset is frequent if its support count is greater than or equal to the minimum support count.
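  • 10. Example on Finding Frequent Itemsets
    • Consider the given data set of transactions.
    • Let the minimum support count be 3.
    • The following relation holds: maximal frequent => closed frequent => frequent (every maximal frequent itemset is closed, and every closed frequent itemset is frequent).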
  • 11.
    • 1-frequent:
      {A} = 3   // not closed due to {A, C}; not maximal
      {B} = 4   // not closed due to {B, D}; not maximal
      {C} = 4   // not closed due to {C, D}; not maximal
      {D} = 5   // closed, since no immediate superset has the same count; not maximal
    • 2-frequent:
      {A, B} = 2   // not frequent: support count < minimum support count, so ignore
      {A, C} = 3   // not closed due to {A, C, D}
      {A, D} = 3   // not closed due to {A, C, D}
      {B, C} = 3   // not closed due to {B, C, D}
      {B, D} = 4   // closed, but not maximal due to {B, C, D}
      {C, D} = 4   // closed, but not maximal due to {B, C, D}
    • 3-frequent:
      {A, B, C} = 2   // not frequent: support count < minimum support count, so ignore
      {A, B, D} = 2   // not frequent: support count < minimum support count, so ignore
      {A, C, D} = 3   // maximal frequent (and therefore closed)
      {B, C, D} = 3   // maximal frequent (and therefore closed)
    • 4-frequent:
      {A, B, C, D} = 2   // not frequent, so ignore
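The closed and maximal labels above can be re-derived mechanically from the support counts listed on this slide; the underlying transactions are not reproduced here, so the sketch works from the counts alone. It assumes the table lists every itemset in the lattice, and `classify` and `names` are hypothetical helpers. Only immediate supersets are checked for maximality, which is enough because, by the Apriori property, if any superset of an itemset is frequent then some immediate superset is frequent too.

```python
# Support counts copied from the example above; minimum support count = 3.
# Items are single letters, so frozenset("AC") == frozenset({"A", "C"}).
support = {
    frozenset("A"): 3, frozenset("B"): 4, frozenset("C"): 4, frozenset("D"): 5,
    frozenset("AB"): 2, frozenset("AC"): 3, frozenset("AD"): 3,
    frozenset("BC"): 3, frozenset("BD"): 4, frozenset("CD"): 4,
    frozenset("ABC"): 2, frozenset("ABD"): 2, frozenset("ACD"): 3,
    frozenset("BCD"): 3, frozenset("ABCD"): 2,
}
MIN_SUP = 3

def classify(support, min_sup):
    """Map each frequent itemset to (is_closed, is_maximal)."""
    frequent = {s for s, n in support.items() if n >= min_sup}
    labels = {}
    for s in frequent:
        # Immediate supersets: exactly one item more than s.
        supersets = [t for t in support if len(t) == len(s) + 1 and s < t]
        is_closed = all(support[t] != support[s] for t in supersets)
        is_maximal = all(t not in frequent for t in supersets)
        labels[s] = (is_closed, is_maximal)
    return labels

def names(itemsets):
    """Render itemsets as sorted strings such as 'ACD' for printing."""
    return sorted("".join(sorted(s)) for s in itemsets)

labels = classify(support, MIN_SUP)
print(names(s for s, (c, _) in labels.items() if c))  # ['ACD', 'BCD', 'BD', 'CD', 'D']
print(names(s for s, (_, m) in labels.items() if m))  # ['ACD', 'BCD']
```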
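  • 12. Association Rule Mining as a Two-Step Process
    • Find all frequent itemsets.
    • Generate strong association rules from the frequent itemsets.
    • Challenge in mining frequent itemsets: a long frequent itemset has a huge number of frequent sub-itemsets (an itemset of length 100 has 2^100 - 1 nonempty subsets, all of which are frequent), so the output can be enormous. Closed and maximal frequent itemsets address this:
      Closed frequent itemset: an itemset X is closed in a data set D if there exists no proper super-itemset Y such that Y has the same support count as X in D. X is a closed frequent itemset if it is both closed and frequent.
      Maximal frequent itemset: an itemset X is a maximal frequent itemset in a data set D if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in D.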
  • 13. Example: Closed and Maximal Frequent Itemsets
    • A transaction database has only two transactions: {⟨a1, a2, ..., a100⟩; ⟨a1, a2, ..., a50⟩}, with min_sup = 1.
    • There are two closed frequent itemsets, with support counts C = {{a1, a2, ..., a100}: 1; {a1, a2, ..., a50}: 2}.
    • There is only one maximal frequent itemset: M = {{a1, a2, ..., a100}: 1}.
    • {a1, a2, ..., a50} cannot be a maximal frequent itemset because it has a frequent superset, {a1, a2, ..., a100}.
    • C: the set of closed frequent itemsets; M: the set of maximal frequent itemsets.
  • 14. Example: Closed and Maximal Frequent Itemsets
    • The set of closed frequent itemsets contains complete information about the frequent itemsets, including their support counts.
    • From C, we can derive:
      (i) {a2, a45}: 2, since {a2, a45} is a sub-itemset of the itemset {a1, a2, ..., a50}: 2
      (ii) {a8, a55}: 1, since {a8, a55} is not a sub-itemset of the previous itemset but is a sub-itemset of {a1, a2, ..., a100}: 1
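The derivation used above (the support count of a frequent itemset equals the largest support count among the closed itemsets that contain it) can be sketched as follows. `derive_support` is a hypothetical helper, and itemsets are represented as Python frozensets of item names.

```python
# Closed frequent itemsets C from the example, with their support counts.
a = [f"a{i}" for i in range(1, 101)]   # a1, a2, ..., a100
closed = {
    frozenset(a): 1,        # {a1, ..., a100}: 1
    frozenset(a[:50]): 2,   # {a1, ..., a50}: 2
}

def derive_support(itemset, closed):
    """Support count of an itemset, recovered from the closed itemsets.

    It is the largest support among the closed itemsets containing it.
    Returns None if no closed itemset contains it (the itemset is infrequent).
    """
    counts = [n for c, n in closed.items() if set(itemset) <= c]
    return max(counts) if counts else None

print(derive_support({"a2", "a45"}, closed))  # 2
print(derive_support({"a8", "a55"}, closed))  # 1
```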
  • 15. Frequent Itemset Mining Methods: Apriori and FP-Growth
    • Apriori algorithm: finding frequent itemsets by confined candidate generation
      A seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets.
      The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties.
      Apriori property: all nonempty subsets of a frequent itemset must also be frequent.
      Each level k uses a join step (join L(k-1) with itself to form the candidate set Ck) and a prune step (remove every candidate that has an infrequent (k-1)-subset).
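A compact Python sketch of the level-wise search, showing the join and prune steps explicitly. It is an unoptimized illustration under simple assumptions (transactions are sets of hashable, sortable items and `min_sup` is an absolute support count), not the authors' original implementation; the function name `apriori` is hypothetical. It is run here on the slide 3 baskets.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # First scan: count 1-itemsets and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_sup}
    frequent = dict(current)
    k = 2
    while current:
        prev = sorted(tuple(sorted(s)) for s in current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # Join step: combine two (k-1)-itemsets sharing their first k-2 items.
                if prev[i][:-1] != prev[j][:-1]:
                    continue
                cand = frozenset(prev[i] + prev[j][-1:])
                # Prune step: drop cand if any (k-1)-subset is infrequent.
                if all(frozenset(sub) in current for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        # One scan of the database counts all surviving candidates (Lk).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
result = apriori(baskets, min_sup=2)
for itemset, n in sorted(result.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), n)
```

With a minimum support count of 2 it finds six frequent itemsets: {bread}, {milk}, {eggs}, {sugar}, {bread, milk}, and {eggs, sugar}. In the third pass the join step produces no candidates, because {bread, milk} and {eggs, sugar} do not share a first item.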
  • 19. Generating Association Rules from Frequent Itemsets
    • Once the frequent itemsets from the transactions have been found, it is straightforward to generate strong association rules from them.
    • Strong association rules satisfy both minimum support and minimum confidence.
    • Confidence(A => B) = P(B | A) = support_count(A ∪ B) / support_count(A)
  • 20. Generating Association Rules from Frequent Itemsets
    • Association rules are generated as follows:
      For each frequent itemset l, generate all nonempty subsets of l.
      For every nonempty proper subset s of l, output the rule "s => (l - s)" if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
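The rule-generation procedure can be sketched as below. `generate_rules` is a hypothetical helper; its input is a dictionary of frequent itemsets with their support counts, here the frequent itemsets of the slide 3 baskets at a minimum support count of 2 (counted by hand).

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """Return strong rules (s, l - s, confidence) from {itemset: support_count}.

    `frequent` must contain every frequent itemset; by the Apriori property each
    subset s of a frequent itemset l is itself frequent, so frequent[s] exists.
    """
    rules = []
    for l, l_count in frequent.items():
        for r in range(1, len(l)):                 # nonempty proper subsets only
            for s in map(frozenset, combinations(l, r)):
                conf = l_count / frequent[s]       # support_count(l) / support_count(s)
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules

# Frequent itemsets of the slide 3 baskets at a minimum support count of 2.
frequent = {
    frozenset({"milk"}): 3, frozenset({"bread"}): 3,
    frozenset({"sugar"}): 2, frozenset({"eggs"}): 2,
    frozenset({"bread", "milk"}): 3, frozenset({"eggs", "sugar"}): 2,
}
for s, rest, conf in generate_rules(frequent, min_conf=0.8):
    print(sorted(s), "=>", sorted(rest), f"confidence={conf:.0%}")
```

With min_conf = 0.8 it prints four strong rules, bread => milk, milk => bread, eggs => sugar, and sugar => eggs, each with 100% confidence. Because confidence needs only the support counts already collected, no further database scan is required.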
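  • 22. Improving the Efficiency of Apriori
    • Hash-based technique: a hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1. For example, while scanning each transaction to find the frequent 1-itemsets, all 2-itemsets of the transaction can be hashed into the buckets of a hash table and the bucket counts increased; a 2-itemset whose bucket count is below the minimum support count cannot be frequent, so it is removed from the candidate set.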
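A sketch of a hash-based filter for candidate 2-itemsets, run on the slide 3 baskets. The hash function h(x, y) = (10 × order(x) + order(y)) mod 7 and the 7-bucket table are illustrative assumptions, where order() is an item's position in alphabetical order; `hash_filtered_c2` is a hypothetical name. Item counting and bucket hashing happen in the same scan.

```python
from collections import Counter
from itertools import combinations

def hash_filtered_c2(transactions, min_sup, num_buckets=7):
    """Return (L1, filtered C2, bucket counts) after one scan of the data."""
    # Hypothetical hash: based on each item's position in alphabetical order.
    order = {item: i for i, item in enumerate(sorted(set().union(*transactions)))}
    def h(x, y):
        return (order[x] * 10 + order[y]) % num_buckets

    item_counts = Counter()
    buckets = [0] * num_buckets
    for t in transactions:          # single scan: count 1-itemsets AND hash 2-itemsets
        item_counts.update(t)
        for x, y in combinations(sorted(t), 2):
            buckets[h(x, y)] += 1
    l1 = sorted(i for i, n in item_counts.items() if n >= min_sup)
    # A bucket total is an upper bound on each pair's support, so a pair whose
    # bucket total is below min_sup cannot be frequent and is pruned.
    c2 = [(x, y) for x, y in combinations(l1, 2) if buckets[h(x, y)] >= min_sup]
    return l1, c2, buckets

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
l1, c2, buckets = hash_filtered_c2(baskets, min_sup=2)
print(buckets)  # [3, 1, 1, 3, 3, 1, 1]
print(c2)
```

With a minimum support count of 2, L1 = {bread, eggs, milk, sugar} would give six candidate pairs; the bucket counts eliminate {bread, sugar} and {eggs, milk}. {bread, eggs} and {milk, sugar} survive only because they share buckets with frequent pairs, so the filter is conservative: it never removes a frequent pair.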
  • 23. Improving the Efficiency of Apriori
    • Transaction reduction: reduce the number of transactions scanned in future iterations.
    • A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets.
    • Such a transaction can be marked or removed from further consideration.
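Transaction reduction can be sketched as a filter applied between passes. `reduce_transactions` is a hypothetical helper; itemsets are frozensets and transactions are sets.

```python
def reduce_transactions(transactions, frequent_k):
    """Drop transactions that contain no frequent k-itemset.

    Such a transaction cannot contain any frequent (k+1)-itemset, so the
    scan for (k+1)-itemsets can skip it.
    """
    return [t for t in transactions if any(s <= t for s in frequent_k)]

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
# With a minimum support count of 3, the only frequent 2-itemset is {bread, milk}.
frequent_2 = [frozenset({"bread", "milk"})]
remaining = reduce_transactions(baskets, frequent_2)
print(len(remaining))  # 3: customer 4's {sugar, eggs} is no longer scanned
```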
  • 24. Improving the Efficiency of Apriori
    • Partitioning (two database scans): partitioning the data to find candidate itemsets requires only two database scans to mine the frequent itemsets.
    • Phase I:
      Divide the transactions of D into n nonoverlapping partitions.
      Find the local frequent itemsets for each partition, using a local minimum support count of min_sup × the number of transactions in that partition.
      Any itemset that is frequent in D must occur as a frequent itemset in at least one of the partitions.
      Therefore, all local frequent itemsets are candidate itemsets for D.
  • 25. Improving the Efficiency of Apriori
    • Phase II: a second scan of D counts the candidates to determine the global frequent itemsets; D is scanned only once in each phase.
    • Sampling
    • Dynamic itemset counting
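The two-phase partitioning scheme can be sketched as follows. Assumptions: `min_sup` is a relative threshold given as a `Fraction` (so comparisons are exact), each partition is mined by brute-force subset enumeration (practical only for tiny transactions; a real implementation would run Apriori or similar on each partition), and `partition_mine` and `local_frequent` are hypothetical names.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def local_frequent(partition, min_sup):
    """Frequent itemsets of one partition, using the relative threshold min_sup."""
    counts = Counter()
    for t in partition:
        for r in range(1, len(t) + 1):              # brute force: every subset
            for c in combinations(sorted(t), r):
                counts[frozenset(c)] += 1
    return {s for s, n in counts.items() if Fraction(n, len(partition)) >= min_sup}

def partition_mine(transactions, min_sup, n_parts):
    size = -(-len(transactions) // n_parts)          # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase I (first scan of D): every local frequent itemset is a global candidate.
    candidates = set().union(*(local_frequent(p, min_sup) for p in parts))
    # Phase II (second scan of D): count the candidates over the whole database.
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: n for c, n in counts.items()
            if Fraction(n, len(transactions)) >= min_sup}

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
result = partition_mine(baskets, min_sup=Fraction(1, 2), n_parts=2)
print(sorted(sorted(s) for s in result))
```

On the slide 3 baskets split into two partitions of two, the local threshold is one transaction, so Phase I yields every subset of every basket as a candidate; Phase II keeps only the six itemsets that occur in at least two of the four baskets.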
  • 26. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
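  • 28. A Pattern-Growth Approach for Mining Frequent Itemsets
    • Disadvantages of the Apriori algorithm:
      Its candidate generate-and-test method reduces the size of the candidate sets, which leads to a good performance gain, but it can still suffer from two nontrivial costs:
      It may need to generate a huge number of candidate sets.
      It may need to repeatedly scan the whole database and check a large set of candidates by pattern matching.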
  • 29. Frequent Pattern Growth (FP-Growth): Divide and Conquer
    • Mines the complete set of frequent itemsets without costly candidate generation.
    • First, it compresses the database representing frequent items into an FP-tree, which retains the itemset association information:
      Scan D once to find the frequent items and their support counts, and sort them in descending order of support count to obtain the list L.
      Create the root of the tree, labelled "null".
      Scan D a second time. The items in each transaction are processed in L order, and a branch is created for each transaction; transactions that share a prefix share the same branch, whose node counts are incremented.
  • 30. Mining the FP-Tree
    • Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its conditional pattern base, the set of prefix paths in the FP-tree that co-occur with the suffix pattern.
    • Then construct its conditional FP-tree and perform mining recursively on that tree.
    • Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from a conditional FP-tree.
    • This method reduces the search cost.
    • Algorithm: FP-growth
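A compact recursive sketch of FP-tree construction and FP-growth mining. Assumptions: the input is a list of (items, count) pairs, so the same builder can be reused for conditional pattern bases; ties in the L order are broken alphabetically; `FPNode`, `build_fp_tree`, and `fp_growth` are hypothetical names, and this is not a reference implementation.

```python
from collections import Counter, defaultdict

class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_fp_tree(weighted_transactions, min_sup):
    """Build an FP-tree from (items, count) pairs; return (header table, L ranks)."""
    counts = Counter()
    for items, n in weighted_transactions:        # scan 1: item support counts
        for item in items:
            counts[item] += n
    frequent = sorted((i for i in counts if counts[i] >= min_sup),
                      key=lambda i: (-counts[i], i))   # list L: descending support
    rank = {item: r for r, item in enumerate(frequent)}
    root = FPNode(None, None)                      # root labelled "null"
    header = defaultdict(list)                     # item -> nodes carrying that item
    for items, n in weighted_transactions:        # scan 2: insert items in L order
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:                      # start a new branch
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)
            child.count += n                       # shared prefix: increment count
            node = child
    return header, rank

def fp_growth(weighted_transactions, min_sup, suffix=frozenset(), result=None):
    """Return {frozenset(pattern): support_count} for all frequent patterns."""
    result = {} if result is None else result
    header, rank = build_fp_tree(weighted_transactions, min_sup)
    for item in sorted(rank, key=rank.get, reverse=True):  # least frequent first
        pattern = suffix | {item}                  # grow the suffix pattern
        result[pattern] = sum(node.count for node in header[item])
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        fp_growth(base, min_sup, pattern, result)  # mine the conditional FP-tree
    return result

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
patterns = fp_growth([(t, 1) for t in baskets], min_sup=2)
print(sorted((sorted(p), n) for p, n in patterns.items()))
```

On the slide 3 baskets with a minimum support count of 2, L = ⟨bread, milk, eggs, sugar⟩, and the mined patterns are {bread}, {milk}, {eggs}, {sugar}, {bread, milk}, and {eggs, sugar}, obtained without generating any candidate sets.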
  • 34. Mining frequent item sets using the vertical data format
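In the vertical data format, each item is mapped to the set of IDs of the transactions that contain it (its TID set), and the support count of an itemset is the size of its TID set. The sketch below, in the style of the Eclat algorithm, converts the slide 3 baskets to this format and grows itemsets by intersecting TID sets, so the database is not rescanned after the conversion. `eclat` is a hypothetical function name.

```python
def eclat(transactions, min_sup):
    """Mine frequent itemsets from the vertical format {itemset: TID set}."""
    # Horizontal -> vertical: map each item to the set of TIDs that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions, start=1):
        for item in t:
            tidsets.setdefault(frozenset([item]), set()).add(tid)
    result = {}

    def extend(klass):
        # `klass`: frequent (itemset, TID set) pairs sharing the same prefix.
        for i, (x, tx) in enumerate(klass):
            result[x] = len(tx)                  # support count = |TID set|
            next_klass = []
            for y, ty in klass[i + 1:]:
                txy = tx & ty                    # TID set of x ∪ y by intersection
                if len(txy) >= min_sup:
                    next_klass.append((x | y, txy))
            extend(next_klass)

    extend(sorted(((s, t) for s, t in tidsets.items() if len(t) >= min_sup),
                  key=lambda pair: sorted(pair[0])))
    return result

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
found = eclat(baskets, min_sup=2)
print(sorted((sorted(s), n) for s, n in found.items()))
```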
  • 35. Mining Closed and Maximal Patterns
    • How can we mine closed frequent itemsets?
    • Strategies include: item merging, sub-itemset pruning, and item skipping.
    • When a new frequent itemset is derived, it is necessary to perform two kinds of closure checking: superset checking and subset checking.
  • 36. Pattern Evaluation Methods • Strong rules are not necessarily interesting:
  • 37. Pattern Evaluation Methods • From association analysis to correlation analysis: • Correlation rule: • Correlation measure:
  • 38. Pattern Evaluation Methods: chi-square measure
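The slides' correlation formulas are not reproduced here. As a sketch, the snippet below computes lift, a common correlation measure defined as lift(A, B) = P(A ∪ B) / (P(A) P(B)), and the Pearson chi-square statistic for the 2 × 2 contingency table of A and B, from support counts alone. The function names are hypothetical, and the code assumes 0 < n_a < n and 0 < n_b < n so that no expected cell is zero.

```python
def lift(n_ab, n_a, n_b, n):
    """lift(A, B) = P(A ∪ B) / (P(A) P(B)); > 1 positive, < 1 negative correlation."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def chi_square(n_ab, n_a, n_b, n):
    """Pearson chi-square for the 2x2 table: rows A / not A, columns B / not B."""
    observed = [[n_ab, n_a - n_ab],
                [n_b - n_ab, n - n_a - n_b + n_ab]]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# milk and bread in the slide 3 baskets: both in 3 of 4 baskets, each in 3.
print(lift(3, 3, 3, 4))        # about 1.33
print(chi_square(3, 3, 3, 4))  # 4.0
```

For milk and bread, lift is 4/3 > 1, indicating a positive correlation, and chi-square is 4.0. With only four transactions this is an arithmetic illustration, not a statistically meaningful test.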
  • 39. Comparison of Pattern Evaluation Measures
    • All-confidence
    • Max-confidence
    • Kulczynski (Kulc)
    • Cosine
    • Null transactions
    • Null-invariant measures
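The four measures listed above can be computed from the support counts of A, B, and A ∪ B alone, which is why null transactions (those containing neither A nor B) cannot affect them. The sketch uses the standard definitions: all_conf = sup(A ∪ B) / max(sup(A), sup(B)), max_conf = max(P(A | B), P(B | A)), Kulc = (P(A | B) + P(B | A)) / 2, and cosine = sup(A ∪ B) / sqrt(sup(A) × sup(B)). For contrast, it also computes lift, which depends on the total number of transactions. The function names are hypothetical.

```python
from math import sqrt

def null_invariant_measures(n_ab, n_a, n_b):
    """Null-invariant measures from the support counts of A ∪ B, A, and B."""
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        "all_conf": n_ab / max(n_a, n_b),
        "max_conf": max(p_b_given_a, p_a_given_b),
        "kulc": (p_b_given_a + p_a_given_b) / 2,
        "cosine": n_ab / sqrt(n_a * n_b),
    }

def lift(n_ab, n_a, n_b, n):
    """Lift needs n, the total number of transactions, so it is not null-invariant."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))

# milk and sugar in the slide 3 baskets: 1 basket has both; milk 3, sugar 2.
print(null_invariant_measures(1, 3, 2))
print(lift(1, 3, 2, 4), lift(1, 3, 2, 4 + 100))  # 100 null transactions added
```

For milk and sugar, the four measures are unchanged when 100 null transactions are added, while lift jumps from about 0.67 to about 17.3.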
  • 41. Advanced Pattern Mining
    • What is pattern mining?
    • Pattern mining: a roadmap
      Basic patterns: frequent patterns, closed patterns, max-patterns, infrequent or rare patterns, and negative patterns
      Based on the abstraction levels involved in a pattern: single-level association rules, multilevel association rules
  • 42. Pattern Mining: A Roadmap
    • Based on the number of dimensions involved in the rule or pattern: single-dimensional association rules/patterns, multidimensional association rules/patterns
  • 43. Pattern Mining: A Roadmap
    • Based on the types of values handled in the rule or pattern: Boolean association rules, quantitative association rules
  • 44. Pattern Mining: A Roadmap
    • Based on the constraints or criteria used to mine selective patterns: constraint-based, approximate, compressed, near-match, top-k, and redundancy-aware top-k patterns
    • Based on the kinds of data and features to be mined: sequential patterns, structural patterns
    • Based on application domain-specific semantics
    • Based on data analysis usages: pattern-based classification, pattern-based clustering
  • 46. Pattern mining in multilevel, multidimensional space • Mining multilevel associations
  • 47. Pattern mining in multilevel, multidimensional space • Using uniform minimum support for all levels • Using reduced minimum support at lower levels
  • 48. Pattern mining in multilevel, multidimensional space • Using item or group-based minimum support
  • 49. Pattern Mining in Multilevel, Multidimensional Space
    • Mining multidimensional associations:
      Single-dimensional (intradimensional) association rules
      Multidimensional (interdimensional) association rules
  • 50. Pattern Mining in Multilevel, Multidimensional Space
    • Mining quantitative association rules:
      A data cube method
      A clustering-based method
      A statistical analysis method to uncover exceptional behaviours
  • 52. Pattern mining in multilevel, multidimensional space • Mining rare patterns and negative patterns
  • 53. Constraint-Based Frequent Pattern Mining
    • Constraints include: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
    • Metarule-guided mining of association rules
    • Constraint-based pattern generation
    • An efficient frequent pattern mining processor can prune its search space during mining in two ways: pruning the pattern search space and pruning the data search space.
  • 54. Constraint-Based Frequent Pattern Mining
    • There are five categories of pattern mining constraints: antimonotonic, monotonic, succinct, convertible, and inconvertible.
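As an illustration of an antimonotonic constraint, the sketch below pushes C: sum(price) <= max_total into a level-wise search over the slide 3 baskets. The prices are hypothetical (they are not in the slides); because they are non-negative, every superset of an itemset that violates C also violates C, so such an itemset can be dropped before its support is counted. `constrained_levelwise` and `PRICE` are hypothetical names.

```python
from itertools import combinations

# Hypothetical unit prices (not from the slides); all are non-negative.
PRICE = {"bread": 2, "butter": 4, "cereal": 5, "eggs": 3, "milk": 1, "sugar": 2}

def satisfies(itemset, max_total):
    """C: sum(price) <= max_total, antimonotonic for non-negative prices."""
    return sum(PRICE[i] for i in itemset) <= max_total

def constrained_levelwise(transactions, min_sup, max_total):
    """Frequent itemsets that satisfy C, pruning violators before counting."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while level:
        # Antimonotonic pruning: violators and all their supersets are skipped.
        level = [c for c in level if satisfies(c, max_total)]
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = [c for c, n in counts.items() if n >= min_sup]
        result.update({c: counts[c] for c in frequent})
        k += 1
        # Next level: unions of two surviving itemsets that have exactly k items.
        level = list({a | b for a, b in combinations(frequent, 2) if len(a | b) == k})
    return result

baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]
kept = constrained_levelwise(baskets, min_sup=2, max_total=4)
print(sorted((sorted(s), n) for s, n in kept.items()))
```

With a minimum support count of 2 and max_total = 4, {cereal} is pruned immediately and {eggs, sugar} (total price 5) is pruned without being counted, even though it would otherwise be frequent; the result is {bread}, {eggs}, {milk}, {sugar}, and {bread, milk}.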
  • 55. Constraint-Based Frequent Pattern Mining
    • Pruning the data space with data pruning constraints: data succinctness and data antimonotonicity.