Poster Paper
Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication 2013

Modified Bit-Apriori Algorithm for Frequent Item-Sets in Data Mining
J Karthikeyan¹ and Dr. Udaykumar²
¹ Research Scholar, Hindustan University, Chennai, India
Email: karthikeyan_world@hotmail.com
² ACOE, Hindustan University, Chennai, India
Email: aukumar71@gmail.com

Abstract - Mining frequent item-sets is one of the most
important concepts in data mining, and a fundamental, initial
task in it. Apriori [2] is the most popular and frequently used
algorithm for finding frequent item-sets. Other algorithms,
such as Eclat [3] and FP-growth [4], are also used for this
purpose. To improve the time efficiency of Apriori, Jiemin
Zheng et al. introduced the Bit-Apriori [1] algorithm, which
modifies Apriori [2] in two ways:
1) Support counting is implemented by performing a bitwise
"AND" operation on binary strings.
2) Special equal-support pruning is applied.
In this paper, to improve the time efficiency of Bit-Apriori [1],
a novel algorithm that deletes infrequent items during trie2
and subsequent tries is proposed and demonstrated with an
example.

unimportant patterns in the item-sets mining.
II. RELATED WORK
A. Apriori algorithm
In computer science and data mining, Apriori is a classic
algorithm for learning association rules [8]. Apriori is designed
to operate on databases containing transactions and is
commonly used in association rule mining [3]. Apriori uses a
"bottom-up" approach, in which frequent subsets are extended
one item at a time (a step known as candidate generation)
and groups of candidates are tested against the data [9][10].
The algorithm terminates when no further successful
extensions are found. Apriori [2] uses breadth-first [3] search
and a tree structure to count [6][12][13] candidate item-sets
efficiently. It generates candidate item-sets of length k from
item-sets of length k-1. Then it prunes the candidates that
have an infrequent sub-pattern [11]. According to the
downward closure lemma, the candidate set contains all
frequent item-sets of length k. After that, it scans the
transaction database to determine the frequent item-sets
among the candidates.
Apriori [2], though historically significant, suffers from a
number of inefficiencies or trade-offs, which have spawned
other algorithms. Candidate generation produces large
numbers of subsets (the algorithm attempts to load the
candidate set with as many as possible before each scan).
Bottom-up subset exploration (essentially a breadth-first
traversal of the subset lattice) finds any maximal subset S
only after all $2^{|S|} - 1$ of its proper subsets. The pseudo
code for Apriori is shown in Table I.
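
The level-wise procedure described above can be sketched in Python as follows. This is a minimal illustration and not the authors' Table I: the function name `apriori`, the `min_count` parameter (an absolute support count) and the use of plain Python sets instead of a tree structure are assumptions made for brevity. The sample database is the ten-transaction example used in this paper.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent item-sets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-item-sets into k-item-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Scan the database to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

# Example database of the paper (one string of items per transaction).
db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
      "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
frequent_sets = apriori(db, min_count=4)
```

Each pass rescans the whole database to count candidates, which is the cost that bit-string based support counting tries to avoid.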

Index Terms - Data mining; frequent item-sets; Apriori; Bit-Apriori; trie2.

I. INTRODUCTION
In recent years the size of databases has increased rapidly.
This has led to a growing interest in the development of
tools capable of automatically extracting knowledge from data.
The term data mining, or knowledge discovery in databases,
has been adopted for a field of research dealing with the
automatic discovery of implicit information or knowledge
within databases. The implicit information within
databases, mainly the interesting association relationships [5]
among sets of objects that lead to association rules, may
disclose useful patterns for decision support, financial
forecasting, marketing policies, medical diagnosis and
many other applications [7].
In frequent pattern mining, the challenge is the large number
of result patterns. As the minimum support threshold becomes
lower, an exponentially large number of item-sets is generated.
Pruning [1] unimportant patterns effectively during the mining
process has therefore become one of the main topics in
frequent pattern mining. Hence, the main aim is to optimize
the process of finding frequent patterns so that it is efficient
and scalable and detects the important patterns that can be
used in various ways to extract knowledge from data.
The study of frequent item-set mining is therefore well
acknowledged in frequent pattern mining because of its broad
applications in association rules and other data mining
tasks. An attempt is made in the present work to prune
© 2013 ACEEE
DOI: 03.LSCS.2013.2.66

B. Bit-Apriori Algorithm
Bit-Apriori uses the data structure and techniques of the
Apriori [2] algorithm. The main difference between Apriori
and Bit-Apriori lies in candidate item-set generation and
the support counting approach. These two steps account for
much of the time and memory consumed by the Apriori [2]
algorithm. Given a set of item-sets, the algorithm attempts to
find subsets which are common to at least a minimum number C
of the item-sets. The time required for mining [14][15]
frequent k-item-sets grows significantly as k increases in
Apriori. Bit-Apriori [1] performs much better because it has
no candidate generation and needs to traverse the trie only
once. The pseudocode for Bit-Apriori is shown in Table II.
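
The bitwise support counting that distinguishes Bit-Apriori can be illustrated with a short Python sketch. It is not the pseudocode of Table II; the helper names `item_bitmaps` and `support`, the use of Python integers as binary strings, and the four-transaction toy database are assumptions chosen for illustration.

```python
def item_bitmaps(transactions):
    """Map each item to an int whose bit i is set if transaction i has it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(itemset, bitmaps):
    """Support count of a non-empty item-set: popcount of the AND
    of the bitmaps of its items."""
    items = iter(itemset)
    acc = bitmaps.get(next(items), 0)
    for item in items:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")

toy_db = ["AB", "ABC", "BC", "AC"]   # hypothetical transactions
bitmaps = item_bitmaps(toy_db)
support_ab = support("AB", bitmaps)    # A and B occur together twice
support_abc = support("ABC", bitmaps)  # only the second transaction
```

After the bitmaps are built in one scan, the support of any item-set is obtained from AND operations alone, without rescanning the database.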
there exists a node with a child. If so, the node is
traversed; otherwise the node is treated as infrequent and
ignored. Such nodes are not considered in further iterations
of the proposed algorithm. This reduces the running time as
the number of infrequent items in the given dataset
increases.
The pseudo code for the proposed algorithm is shown in
Table III.
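
One possible reading of this step is sketched below in Python. The paper does not give enough detail to reproduce Table III exactly, so everything here is an assumption: the trie is represented as a dictionary from item tuples to bit strings (Python integers), and an item that appears in no frequent item-set at the current level is dropped from the list `alive` and never used for extension again. By the downward closure property, dropping such items cannot remove any frequent item-set.

```python
def popcount(x):
    return bin(x).count("1")

def item_bitmaps(db):
    """Map each item to an int whose bit i is set if transaction i has it."""
    bitmaps = {}
    for tid, t in enumerate(db):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def mine_with_pruning(bitmaps, order, min_count):
    """Level-wise mining; items unused at a level are pruned for later levels."""
    pos = {item: p for p, item in enumerate(order)}
    # Level 1: frequent single items, kept in the given order.
    level = {(i,): bitmaps[i] for i in order
             if popcount(bitmaps[i]) >= min_count}
    frequent = dict(level)
    alive = [i for i in order if (i,) in level]
    while level:
        next_level = {}
        for prefix, bits in level.items():
            for item in alive:
                if pos[item] <= pos[prefix[-1]]:
                    continue  # extend only with items later in the order
                child_bits = bits & bitmaps[item]
                if popcount(child_bits) >= min_count:
                    next_level[prefix + (item,)] = child_bits
        # Pruning: keep only items that occur in some frequent item-set
        # of the level just built (assumed interpretation of the paper).
        used = {i for itemset in next_level for i in itemset}
        alive = [i for i in alive if i in used]
        frequent.update(next_level)
        level = next_level
    return {k: popcount(v) for k, v in frequent.items()}

db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
      "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
result = mine_with_pruning(item_bitmaps(db), order="GCAE", min_count=4)
```

The benefit of the `alive` list grows with the number of items that are frequent on their own but take part in no longer frequent item-set, since each such item is skipped in every later iteration.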

TABLE I. THE PSEUDOCODE FOR FINDING FREQUENT ITEM-SETS USING THE APRIORI ALGORITHM

TABLE II. THE PSEUDOCODE FOR BIT-APRIORI

TABLE III. THE PSEUDOCODE FOR THE PROPOSED ALGORITHM

To demonstrate the proposed algorithm, an example is given
below. As shown in Table IV, the example database is in the
second column. The database contains ten transactions.
TABLE IV. THE EXAMPLE DATABASE

III. PROBLEM STATEMENT
To find frequent item-sets, both the Apriori [2] and
Bit-Apriori [1] algorithms search the elements of all
item-sets, from 1 to N. When the total support count of an
item is zero or less than the minimum support count, the item
is not required in subsequent iterations. Nevertheless,
Apriori and Bit-Apriori still consider such items while
forming tries.
Hence there is scope for improvement by eliminating
such items during trie formation. A new algorithm is proposed
to improve performance, resource utilization and time
efficiency.
IV. PROPOSED ALGORITHM
A new algorithm has been developed which deletes the
infrequent items during trie2 and subsequent iterations.
Removing infrequent items improves computation time. The
Apriori and Bit-Apriori algorithms do not remove infrequent
items during trie2 and subsequent iterations. In the graph,
the proposed algorithm checks if

TID   Items     Ordered frequent items
1     ABDEFL    AE
2     AGO       GA
3     CEI       CE
4     ACDEG     GCAE
5     ABCEGK    GCAE
6     EH        E
7     ABCEFJ    CAE
8     ACD       CA
9     ACEGM     GCAE
10    ACEGN     GCAE

Suppose the support threshold min_sup is 40%. During the
first scan of the database, the support of each item is
counted and infrequent items are deleted. The support of each
item is as follows.
A:8, B:3, C:7, D:3, E:8, F:2, G:5, H:1, I:1, J:1, K:1, L:1,
M:1, N:1, O:1
Since the database has ten transactions, the minimum support
count is 4, so the frequent items are A, C, E and G. The
frequent items are sorted into a non-decreasing list according
to their supports; if two items have the same support, they
are sorted in lexicographic order, which gives G, C, A, E. In
Step 2 of Bit-Apriori, all frequent 2-item-sets are found, as
shown in Table V.
The trie, with a binary string shown in each leaf, is then
established, as shown in Fig. 1.
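
The first scan described above can be reproduced with a few lines of Python. This is only a sketch; the variable names (`db`, `min_sup_percent`, `order`, `ordered_db`) are illustrative and not taken from the paper.

```python
import math
from collections import Counter

db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
      "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
min_sup_percent = 40
min_count = math.ceil(min_sup_percent * len(db) / 100)  # 40% of 10 rows

# First scan: support count of every item.
counts = Counter(item for t in db for item in set(t))

# Keep the frequent items, sorted by support and then lexicographically.
frequent = [i for i in counts if counts[i] >= min_count]
order = sorted(frequent, key=lambda i: (counts[i], i))
rank = {item: p for p, item in enumerate(order)}

# Rewrite each transaction with only its frequent items, in that order.
ordered_db = ["".join(sorted((i for i in t if i in rank), key=rank.get))
              for t in db]
```

The list `ordered_db` corresponds to the "Ordered frequent items" column of the example database.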
TABLE V. FREQUENT 2-ITEM-SETS
TID   Ordered items   {G,C}   {G,A}   {G,E}   {C,A}   {C,E}
1     AE              0       0       0       0       0
2     GA              0       1       0       0       0
3     CE              0       0       0       0       1
4     GCAE            1       1       1       1       1
5     GCAE            1       1       1       1       1
6     E               0       0       0       0       0
7     CAE             0       0       0       1       1
8     CA              0       0       0       1       0
9     GCAE            1       1       1       1       1
10    GCAE            1       1       1       1       1

An interesting finding is that when the occurrence of
non-frequent item-sets is higher, the execution time is
reduced drastically. The experimental results show that the
proposed algorithm decreases not only the computation time
but also the resources used. The execution times are given in
Table VI.

TABLE V (continued). Column {A,E}, TID 1 to 10: 1, 0, 0, 1, 1, 0, 1, 0, 1, 1
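
The columns of Table V, and the way a bitwise AND of two columns yields the column of a longer item-set, can be checked with the following Python sketch. The variable names are illustrative, and the binary strings are written with transaction 1 as the leftmost character, which is an assumed convention.

```python
from itertools import combinations

ordered_db = ["AE", "GA", "CE", "GCAE", "GCAE",
              "E", "CAE", "CA", "GCAE", "GCAE"]
order = "GCAE"      # frequent items in non-decreasing support order
min_count = 4

# One binary string per 2-item-set: character i is "1" when
# transaction i+1 contains both items.
columns = {}
for a, b in combinations(order, 2):
    bits = "".join("1" if a in t and b in t else "0" for t in ordered_db)
    if bits.count("1") >= min_count:
        columns[a + b] = bits

# The column of {G,C,A} is the bitwise AND of the columns {G,C} and {G,A}.
gca = format(int(columns["GC"], 2) & int(columns["GA"], 2), "010b")
gca_support = gca.count("1")
```

All six pairs of frequent items meet the minimum support count of 4 here, so none is removed at this level.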

During subsequent iterations, element 'E' can be ignored by
treating it as a non-frequent item. The computation time can
be reduced considerably when elements like 'E' occur more
often among the frequent items. After all iterations are
completed, the final output of the binary strings is shown in
Fig. 2.

Fig. 3. Execution Time of Algorithms

TABLE VI. COMPARISON OF EXECUTION TIME BETWEEN APRIORI, BIT-APRIORI AND MODIFIED BIT-APRIORI
(Execution Time in Seconds)
Dataset   Apriori   Bit-Apriori   Modified Bit-Apriori
pusmsb    4.5       1.32          0.98
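
For a rough sense of scale, the ratios implied by Table VI can be computed directly. The snippet below is plain arithmetic on the three reported times; the variable names are illustrative.

```python
# Execution times in seconds, as reported in Table VI.
times = {"Apriori": 4.5, "Bit-Apriori": 1.32, "Modified Bit-Apriori": 0.98}

base = times["Modified Bit-Apriori"]
# How many times longer each algorithm runs than the modified one.
speedup = {name: round(t / base, 2) for name, t in times.items()}
```

On this single dataset the modified algorithm is about 4.59 times faster than Apriori and about 1.35 times faster than Bit-Apriori.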

VII. CONCLUSIONS
In this paper, the modified Bit-Apriori technique improves
the performance of Bit-Apriori by eliminating the search of
infrequent item-sets. It also improves computational
efficiency significantly. Experimental results show that the
modified Bit-Apriori algorithm outperforms the fast
Bit-Apriori, especially when non-frequent item-sets occur
more often.
When the database is large, Bit-Apriori may suffer
from memory scarcity due to the large number of
bitwise operations. Future work can be directed at
replacing the bitwise operations.

Fig. 1. Trie After Generation(2)

Fig. 2. Trie After Completion

V. EXPERIMENTAL RESULTS
The proposed algorithm was tested on different data sets,
and the experimental results are shown in Fig. 3.
The proposed algorithm consumes considerably less time
than Bit-Apriori and Apriori.

REFERENCES
[1] Zheng J., D. Zhang, S.C.H. Leung, X. Zhou, "An efficient
algorithm for frequent itemsets in data mining", in 2010 7th
International Conference on Service Systems and Service
Management (ICSSSM), 28-30 June 2010.
[2] Agrawal R., R. Srikant, "Fast algorithms for mining
association rules", in Proceedings of the International
Conference on Very Large Databases, pp. 487-499, 1994.
[3] Zaki M.J., S. Parthasarathy, M. Ogihara, W. Li, "New
algorithms for fast discovery of association rules", in
Proceedings of the 3rd International Conference on Knowledge
Discovery and Data Mining, pp. 283-296, 1997.
[4] Han J., J. Pei, Y. Yin, "Mining frequent patterns without
candidate generation", in Proceedings of the 2000 ACM
SIGMOD International Conference on Management of Data,
ACM Press, pp. 1-12, 2000.
[5] Park J.S., M.S. Chen, P.S. Yu, "An effective hash based
algorithm for mining association rules", ACM SIGMOD, pp.
175-186, 1995.
[6] Brin S., R. Motwani, J.D. Ullman, S. Tsur, "Dynamic itemset
counting and implication rules for market basket data", in
Proceedings of the ACM SIGMOD International Conference
on Management of Data, pp. 255-264, 1997.
[7] Brin S., R. Motwani, C. Silverstein, "Beyond market baskets:
generalizing association rules to correlations", in Proceedings
of the ACM SIGMOD International Conference on
Management of Data, Tucson, Arizona, pp. 265-276, 1997.
[8] Toivonen H., "Sampling large databases for association rules",
in Proceedings of the 22nd VLDB Conference, Mumbai, India,
pp. 134-145, 1996.
[9] Savasere A., E. Omiecinski, S.B. Navathe, "An efficient
algorithm for mining association rules in large databases", in
Proceedings of the 21st International Conference on Very Large
Data Bases (VLDB'95), Zurich, pp. 432-444, 1995.
[10] Tsay Y.J., J.Y. Chiang, "CBAR: an efficient method for mining
association rules", Knowledge Based Systems, 18 (2-3), pp.
99-105, 2005.
[11] Liu G., H. Lu, W. Lou, Y. Xu, J.X. Yu, "Efficient mining of
frequent patterns using ascending frequency ordered
prefix-tree", Data Mining and Knowledge Discovery, 9 (3),
pp. 249-274, 2004.
[12] Grahne G., J. Zhu, "Fast algorithms for frequent itemset mining
using FP-Trees", IEEE Transactions on Knowledge and Data
Engineering, 17 (10), pp. 1347-1362, 2005.
[13] Zaki M.J., "Scalable algorithms for association mining", IEEE
Transactions on Knowledge and Data Engineering, 12 (3), pp.
372-390, 2000.
[14] Zaki M.J., K. Gouda, "Fast Vertical Mining Using Diffsets",
in Proceedings of the ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 326-335, 2003.
[15] Dong J., M. Han, "BitTableFI: an efficient mining frequent
itemsets algorithm", Knowledge Based Systems, 20 (4), pp.
329-335, 2007.

57

Weitere ähnliche Inhalte

Was ist angesagt?

CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...
CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...
CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...
ijcsit
 
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduceFiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
IJCSIS Research Publications
 

Was ist angesagt? (20)

Ad03301810188
Ad03301810188Ad03301810188
Ad03301810188
 
Association Rule Mining using RHadoop
Association Rule Mining using RHadoopAssociation Rule Mining using RHadoop
Association Rule Mining using RHadoop
 
J0945761
J0945761J0945761
J0945761
 
Ijariie1129
Ijariie1129Ijariie1129
Ijariie1129
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
Optimized High-Utility Itemsets Mining for Effective Association Mining Paper
Optimized High-Utility Itemsets Mining for Effective Association Mining Paper  Optimized High-Utility Itemsets Mining for Effective Association Mining Paper
Optimized High-Utility Itemsets Mining for Effective Association Mining Paper
 
CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...
CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...
CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED + HIGH UTILITY ITEMSETS FROM...
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithm
 
IRJET- Deep Learning Framework Analysis
IRJET- Deep Learning Framework AnalysisIRJET- Deep Learning Framework Analysis
IRJET- Deep Learning Framework Analysis
 
B0950814
B0950814B0950814
B0950814
 
B03606010
B03606010B03606010
B03606010
 
Ay4201347349
Ay4201347349Ay4201347349
Ay4201347349
 
Mining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalMining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactional
 
Automated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform DesignsAutomated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform Designs
 
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
 
Hyperparameter Optimization with Hyperband Algorithm
Hyperparameter Optimization with Hyperband AlgorithmHyperparameter Optimization with Hyperband Algorithm
Hyperparameter Optimization with Hyperband Algorithm
 
Ej36829834
Ej36829834Ej36829834
Ej36829834
 
Associative Learning
Associative LearningAssociative Learning
Associative Learning
 
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduceFiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce
 

Ähnlich wie Modifed Bit-Apriori Algorithm for Frequent Item- Sets in Data Mining

Association Rule Hiding using Hash Tree
Association Rule Hiding using Hash TreeAssociation Rule Hiding using Hash Tree
Association Rule Hiding using Hash Tree
ijtsrd
 

Ähnlich wie Modifed Bit-Apriori Algorithm for Frequent Item- Sets in Data Mining (20)

Ijcatr04051008
Ijcatr04051008Ijcatr04051008
Ijcatr04051008
 
B017550814
B017550814B017550814
B017550814
 
Efficient Temporal Association Rule Mining
Efficient Temporal Association Rule MiningEfficient Temporal Association Rule Mining
Efficient Temporal Association Rule Mining
 
Efficient Temporal Association Rule Mining
Efficient Temporal Association Rule MiningEfficient Temporal Association Rule Mining
Efficient Temporal Association Rule Mining
 
A04010105
A04010105A04010105
A04010105
 
Generation of Potential High Utility Itemsets from Transactional Databases
Generation of Potential High Utility Itemsets from Transactional DatabasesGeneration of Potential High Utility Itemsets from Transactional Databases
Generation of Potential High Utility Itemsets from Transactional Databases
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rules
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
An Improved Frequent Itemset Generation Algorithm Based On Correspondence An Improved Frequent Itemset Generation Algorithm Based On Correspondence
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
 
IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULESIMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
 
Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining ProcedureDiscovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
 
Association Rule Hiding using Hash Tree
Association Rule Hiding using Hash TreeAssociation Rule Hiding using Hash Tree
Association Rule Hiding using Hash Tree
 
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
 
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
 
A1030105
A1030105A1030105
A1030105
 
J017114852
J017114852J017114852
J017114852
 
A classification of methods for frequent pattern mining
A classification of methods for frequent pattern miningA classification of methods for frequent pattern mining
A classification of methods for frequent pattern mining
 
Frequent Item Set Mining - A Review
Frequent Item Set Mining - A ReviewFrequent Item Set Mining - A Review
Frequent Item Set Mining - A Review
 
A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
A Survey Report on High Utility Itemset Mining for Frequent Pattern MiningA Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
 
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
 

Mehr von idescitation

65 113-121
65 113-12165 113-121
65 113-121
idescitation
 
74 136-143
74 136-14374 136-143
74 136-143
idescitation
 
84 11-21
84 11-2184 11-21
84 11-21
idescitation
 
29 88-96
29 88-9629 88-96
29 88-96
idescitation
 

Mehr von idescitation (20)

65 113-121
65 113-12165 113-121
65 113-121
 
69 122-128
69 122-12869 122-128
69 122-128
 
71 338-347
71 338-34771 338-347
71 338-347
 
72 129-135
72 129-13572 129-135
72 129-135
 
74 136-143
74 136-14374 136-143
74 136-143
 
80 152-157
80 152-15780 152-157
80 152-157
 
82 348-355
82 348-35582 348-355
82 348-355
 
84 11-21
84 11-2184 11-21
84 11-21
 
62 328-337
62 328-33762 328-337
62 328-337
 
46 102-112
46 102-11246 102-112
46 102-112
 
47 292-298
47 292-29847 292-298
47 292-298
 
49 299-305
49 299-30549 299-305
49 299-305
 
57 306-311
57 306-31157 306-311
57 306-311
 
60 312-318
60 312-31860 312-318
60 312-318
 
5 1-10
5 1-105 1-10
5 1-10
 
11 69-81
11 69-8111 69-81
11 69-81
 
14 284-291
14 284-29114 284-291
14 284-291
 
15 82-87
15 82-8715 82-87
15 82-87
 
29 88-96
29 88-9629 88-96
29 88-96
 
43 97-101
43 97-10143 97-101
43 97-101
 

Kürzlich hochgeladen

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Kürzlich hochgeladen (20)

Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

Modifed Bit-Apriori Algorithm for Frequent Item- Sets in Data Mining

  • 1. Poster Paper Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication 2013 Modifed Bit-Apriori Algorithm for Frequent ItemSets in Data Mining J Karthikeyan1 and Dr. Udaykumar2 1 Research Scholar, Hindustan University, Chennai, India Email: karthikeyan_world@hotmail.com 2 ACOE, Hindustan University, Chennai, India Email: aukumar71@gmail.com Abstract -Mining frequent item-sets is one of the most important concepts in data mining. It is a fundamental and initial task of data mining. Apriori[3] is the most popular and frequently used algorithm for finding frequent item-sets. There are other algorithms viz, Eclat[4], FP-growth[5] which are used to find out frequent item-sets. In order to improve the time efficiency of Apriori algorithms, Jiemin Zheng introduced Bit-Apriori[1] algorithm with the following corrections with respect to Apriori[3] algorithm. 1) Support count is implemented by performing bitwise “And” operation on binary strings 2) Special equal-support pruning In this paper, to improve the time efficiency of Bit-Apriori[1] algorithm, a novel algorithm that deletes infrequent items during trie2 and subsequent tire’s are proposed and demonstrated with an example. unimportant patterns in the item-sets mining. II. RELATED WORK A. Apriori algorithm In computer science and data mining, Apriori is a classic algorithm for learning association rules[8]. Apriori is designed to operate on databases containing transactions. Apriori is commonly used in association rule mining [3]. Apriori uses a “bottom up” approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data[9][10]. The algorithm terminates when no further successful extensions are found. Apriori [2] uses breadth-first [3] search and a tree structure to count[6][12[13] candidate item sets efficiently. It generates candidate item sets of length K from item sets of length k-1. 
Then it prunes the candidates which have an infrequent sub pattern[11]. According to the downward closure lemma, the candidate set contains all frequent k- length item sets. After that, it scans the transaction database to determine frequent item-sets among the candidates. Apriori [2], though historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all -1of its proper subsets. The pseudo code for Apriori is shown in Table I. Index Terms - Data mining; frequent item-sets; Apriori; BitApriori, trie2. I. INTRODUCTION In recent years the size of database has increased rapidly. This has led to a growing interest in the development of tools capable of automatic extraction of knowledge from data. The term data mining or knowledge discovery in database has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within the databases. The implicit information within databases, mainly the interesting association relationships[5] among sets of objects that lead to association rules may disclose useful patterns for decision support, financial forecast, marketing policies, even medical diagnosis and many other applications[7]. In frequent patterns, the challenge is large number of result patterns. As the minimum threshold becomes lower, an exponentially large number of item-sets are generated. Therefore, pruning[1] unimportant patterns can be done effectively in mining process and that becomes one of the main topics in frequent pattern mining. 
Hence, the main aim is to optimize the process of finding frequent patterns which should be efficient, scalable and can detect the important patterns that can be used in various ways of extraction of knowledge from data. Therefore, the study of frequent item-sets mining is well acknowledged in frequent pattern mining because of its broad applications on association rules and for other data mining tasks. An attempt is made in the present work to prune © 2013 ACEEE DOI: 03.LSCS.2013.2.66 B. Bit-Apriori Algorithm Bit-Apriori used the datastructure and techniques of Apriori [1] algorithm. The main difference between Apriori and Bit-Apriori lies in candidate item-sets generation and support count approach. These two steps consume more time and memory in the Apriori [2] algorithm. Given a set of item-sets, the algorithm attempts to find subsets which are common to at least a minimum number C of the item-sets. The time required for mining [14][15]frequent k-item-sets grows significantly when k increases in Apriori. But Bit-Apriori [1] performs much better because it has no candidate generation and needs to traverse the trie only once. The pseudocode for Bit-Apriori is shown in Table II. 54
III. PROBLEM STATEMENT
To find frequent item-sets, both the Apriori [3] and Bit-Apriori [1] algorithms search the elements of the entire item-sets, from 1 to N. When the total support count for an item is zero or less than the minimum support count, that element is not required in the subsequent iterations. However, the Apriori and Bit-Apriori algorithms still consider these elements while forming the tries. Hence there is scope for improvement by eliminating such items during trie formation. A new algorithm is proposed to improve performance, resource utilization and execution time.

IV. PROPOSED ALGORITHM
A new algorithm has been developed which deletes the infrequent items during trie2 and the subsequent iterations. Removing the infrequent items improves the computation time. The Apriori and Bit-Apriori algorithms do not remove the infrequent items during trie2 and the subsequent iterations. In the trie, the proposed algorithm checks whether a node has a child: if it does, the node is traversed; otherwise the node is treated as infrequent and ignored. Such nodes are not considered in the further iterations of the proposed algorithm. This reduces the running time as the number of infrequent items in the given dataset increases. The pseudocode for the proposed algorithm is shown in Table III.

TABLE I. THE PSEUDOCODE FOR FINDING FREQUENT ITEM-SETS USING APRIORI ALGORITHM

TABLE II. THE PSEUDOCODE FOR BIT-APRIORI

TABLE III. THE PSEUDOCODE FOR THE PROPOSED ALGORITHM

To demonstrate the process of the proposed algorithm, an example is given below. As shown in Table IV, the example database is in the second column. The database contains ten transactions.

TABLE IV. THE EXAMPLE DATABASE

TID   Items    Ordered frequent items
1     ABDEFL   AE
2     AGO      GA
3     CEI      CE
4     ACDEG    GCAE
5     ABCEGK   GCAE
6     EH       E
7     ABCEFJ   CAE
8     ACD      CA
9     ACEGM    GCAE
10    ACEGN    GCAE

Suppose the support threshold min_sup is 40%. During the first scan of the database, the support of each item is counted and the infrequent items are deleted. The support of each item is as follows.

A:8, B:3, C:7, D:3, E:8, F:2, G:5, H:1, I:1, J:1, K:1, L:1, M:1, N:1, O:1

With ten transactions, a min_sup of 40% corresponds to a minimum support count of 4, so the frequent items are A, C, E and G. The frequent items are sorted into a non-decreasing list according to their respective supports; if two items have the same support, they are sorted in lexicographic order. This gives the order G, C, A, E used in the third column of Table IV. In Step 2 of Bit-Apriori, all frequent 2-item-sets are found, as shown in Table V. The trie with the binary string in each leaf is then established, as shown in Fig. 1.
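The first scan, the ordering of the frequent items and the bitwise construction of the 2-item-sets for the example database can be reproduced with a short sketch. It is a minimal illustration rather than the authors' implementation: the function and variable names are hypothetical, and the binary strings are kept as Python lists of 0/1 values for readability.

```python
from itertools import combinations

# Example database from Table IV (TID 1 to 10).
DATABASE = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
            "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
MIN_COUNT = 4  # min_sup of 40% applied to 10 transactions


def item_supports(database):
    """First scan: count in how many transactions each item occurs."""
    counts = {}
    for transaction in database:
        for item in set(transaction):
            counts[item] = counts.get(item, 0) + 1
    return counts


def ordered_frequent_items(counts, min_count):
    """Keep frequent items, sorted by support and then lexicographically."""
    frequent = [item for item, count in counts.items() if count >= min_count]
    return sorted(frequent, key=lambda item: (counts[item], item))


def item_bits(item, database):
    """Binary string of an item: one bit per transaction."""
    return [1 if item in transaction else 0 for transaction in database]


def and_bits(left, right):
    """Bitwise AND of two binary strings of equal length."""
    return [a & b for a, b in zip(left, right)]


counts = item_supports(DATABASE)
order = ordered_frequent_items(counts, MIN_COUNT)
ordered_db = ["".join(i for i in order if i in t) for t in DATABASE]
bits = {item: item_bits(item, DATABASE) for item in order}
pair_bits = {(x, y): and_bits(bits[x], bits[y]) for x, y in combinations(order, 2)}
frequent_pairs = {pair: b for pair, b in pair_bits.items() if sum(b) >= MIN_COUNT}

print(order)       # frequent items in sorted order
print(ordered_db)  # third column of Table IV
for pair, b in frequent_pairs.items():
    print(pair, "".join(map(str, b)), sum(b))
```

Each printed binary string corresponds to one column of Table V, and its number of ones is the support count of that 2-item-set.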
TABLE V. FREQUENT 2-ITEM-SETS

TID   Ordered Items   {G, C}   {G, A}   {G, E}   {C, A}   {C, E}   {A, E}
1     AE              0        0        0        0        0        1
2     GA              0        1        0        0        0        0
3     CE              0        0        0        0        1        0
4     GCAE            1        1        1        1        1        1
5     GCAE            1        1        1        1        1        1
6     E               0        0        0        0        0        0
7     CAE             0        0        0        1        1        1
8     CA              0        0        0        1        0        0
9     GCAE            1        1        1        1        1        1
10    GCAE            1        1        1        1        1        1

Fig. 1. Trie After Generation(2)

During the subsequent iterations, element 'E' can be ignored by treating it as a non-frequent item-set. The computation time can be reduced considerably when elements like 'E' occur more often among the frequent items. After all iterations are completed, the final binary strings are as shown in Fig. 2.

Fig. 2. Trie After Completion

V. EXPERIMENTAL RESULTS
The proposed algorithm was tested on different data sets, and the experimental results are shown in Fig. 3. The proposed algorithm takes considerably less time than the Bit-Apriori and Apriori algorithms. An interesting finding is that the execution time drops sharply when non-frequent item-sets occur more often. The experimental results show that the proposed algorithm reduces not only the computation time but also the resources used. The execution times are listed in Table VI.

Fig. 3. Execution Time Of Algorithms

TABLE VI. COMPARISON OF EXECUTION TIME BETWEEN APRIORI/BIT-APRIORI/MODIFIED BIT-APRIORI (Execution Time in Seconds)

Dataset   Apriori   Bit-Apriori   Modified Bit-Apriori
pusmsb    4.5       1.32          0.98

VII. CONCLUSIONS
In this paper, the modified Bit-Apriori technique improves the performance of Bit-Apriori by eliminating the search of infrequent item-sets. It also improves the computational efficiency significantly. Experimental results have shown that the modified Bit-Apriori algorithm outperforms the fast Bit-Apriori algorithm, especially when non-frequent item-sets occur more often. When the database is large, Bit-Apriori may suffer from memory scarcity due to the large number of bitwise operations. Future work can be directed at replacing the bitwise operations.

REFERENCES
[1] Jiemin Zheng, Defu Zhang, Stephen C.H. Leung, Xiyue Zhou, "An efficient algorithm for frequent itemsets in data mining", Service Systems and Service Management (ICSSSM), 2010 7th International Conference on, 28-30 June 2010.
[2] Agrawal R., R. Srikant, "Fast algorithms for mining association rules", The International Conference on Very Large Data Bases, pp. 487-499, 1994.
[3] Zaki M.J., S. Parthasarathy, M. Ogihara, W. Li, "New algorithms for fast discovery of association rules", in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283-296, 1997.
[4] Han J., J. Pei, Y. Yin, "Mining frequent patterns without candidate generation", in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 1-12, 2000.
[5] Park J.S., M.S. Chen, P.S. Yu, "An effective hash based algorithm for mining association rules", ACM SIGMOD, pp. 175-186, 1995.
[6] Brin S., R. Motwani, J.D. Ullman, S. Tsur, "Dynamic itemset counting and implication rules for market basket data", in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 255-264, 1997.
[7] Brin S., R. Motwani, C. Silverstein, "Beyond market baskets: generalizing association rules to correlations", in Proceedings of the ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, pp. 265-276, 1997.
[8] Toivonen H., "Sampling large databases for association rules", in Proceedings of 22nd VLDB Conference, Mumbai, India, pp. 134-145, 1996.
[9] Savasere A., E. Omiecinski, S.B. Navathe, "An efficient algorithm for mining association rules in large databases", in Proceedings of 21st International Conference on Very Large Data Bases (VLDB'95), Zurich, pp. 432-444, 1995.
[10] Tsay Y.J., J.Y. Chiang, "CBAR: an efficient method for mining association rules", Knowledge Based Systems, 18 (2-3), pp. 99-105, 2005.
[11] Liu G., H. Lu, W. Lou, Y. Xu, J.X. Yu, "Efficient mining of frequent patterns using ascending frequency ordered prefix-tree", Data Mining and Knowledge Discovery, 9 (3), pp. 249-274, 2004.
[12] Grahne G., J. Zhu, "Fast algorithms for frequent itemset mining using FP-Trees", IEEE Transactions on Knowledge and Data Engineering, 17 (10), pp. 1347-1362, 2005.
[13] Zaki M.J., "Scalable algorithms for association mining", IEEE Transactions on Knowledge and Data Engineering, 12 (3), pp. 372-390, 2000.
[14] Zaki M.J., K. Gouda, "Fast Vertical Mining Using Diffsets", in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326-335, 2003.
[15] Dong J., M. Han, "BitTableFI: an efficient mining frequent itemsets algorithm", Knowledge Based Systems, 20 (4), pp. 329-335, 2007.