Poster Paper
Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication 2013
Modified Bit-Apriori Algorithm for Frequent Item-Sets in Data Mining
J Karthikeyan¹ and Dr. Udaykumar²
¹Research Scholar, Hindustan University, Chennai, India
Email: karthikeyan_world@hotmail.com
²ACOE, Hindustan University, Chennai, India
Email: aukumar71@gmail.com
Abstract - Mining frequent item-sets is one of the most important concepts in data mining; it is a fundamental and initial data mining task. Apriori[3] is the most popular and frequently used algorithm for finding frequent item-sets. Other algorithms, viz. Eclat[4] and FP-growth[5], are also used to find frequent item-sets. To improve the time efficiency of the Apriori algorithm, Jiemin Zheng introduced the Bit-Apriori[1] algorithm, with the following modifications with respect to the Apriori[3] algorithm:
1) Support count is implemented by performing a bitwise "And" operation on binary strings.
2) Special equal-support pruning.
In this paper, to improve the time efficiency of the Bit-Apriori[1] algorithm, a novel algorithm that deletes infrequent items during trie2 and subsequent tries is proposed and demonstrated with an example.
Index Terms - Data mining; frequent item-sets; Apriori; Bit-Apriori; trie2.
I. INTRODUCTION
In recent years the size of databases has increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term data mining, or knowledge discovery in databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases. The implicit information within databases, mainly the interesting association relationships[5] among sets of objects that lead to association rules, may disclose useful patterns for decision support, financial forecasting, marketing policies, even medical diagnosis and many other applications[7].
In frequent pattern mining, the challenge is the large number of resulting patterns. As the minimum support threshold becomes lower, an exponentially large number of item-sets is generated. Therefore, pruning[1] unimportant patterns effectively during the mining process has become one of the main topics in frequent pattern mining. Hence, the main aim is to optimize the process of finding frequent patterns so that it is efficient and scalable and can detect the important patterns that can be used in various ways to extract knowledge from data.
The study of frequent item-set mining is therefore well acknowledged in frequent pattern mining because of its broad applications to association rules and other data mining tasks. An attempt is made in the present work to prune unimportant patterns in item-set mining.
II. RELATED WORK
A. Apriori algorithm
In computer science and data mining, Apriori is a classic algorithm for learning association rules[8]. Apriori is designed to operate on databases containing transactions and is commonly used in association rule mining [3]. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data[9][10]. The algorithm terminates when no further successful extensions are found. Apriori [2] uses breadth-first [3] search and a tree structure to count[6][12][13] candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k-1. Then it prunes the candidates which have an infrequent sub-pattern[11]. According to the downward closure lemma, the candidate set contains all frequent k-length item sets. After that, it scans the transaction database to determine the frequent item-sets among the candidates.
Apriori [2], though historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all $2^{|S|}-1$ of its proper subsets. The pseudocode for Apriori is shown in Table I.
© 2013 ACEEE
DOI: 03.LSCS.2013.2.66
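The level-wise loop described for Apriori in Section II-A (join frequent (k-1)-item-sets into k-candidates, prune any candidate that has an infrequent subset, then count the surviving candidates in one database scan) can be sketched in Python as below. This is a minimal illustration, not the pseudocode of Table I: it assumes transactions are iterables of hashable items and a minimum support given as an absolute count, and it counts candidates with plain subset tests instead of the tree structure used by Apriori. The function name `apriori` is our own.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch; returns {frozenset(itemset): support_count}."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: one scan counts every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join two frequent (k-1)-item-sets whose union has k items.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            # Downward closure: keep the candidate only if every (k-1)-subset is frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One database scan counts all k-candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

On the ten transactions of the example database (Table IV) with a minimum support count of 4, this sketch returns 15 frequent item-sets, the largest being {A, C, E, G} with support 4.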
B. Bit-Apriori Algorithm
Bit-Apriori uses the data structure and techniques of the Apriori [1] algorithm. The main difference between Apriori and Bit-Apriori lies in candidate item-set generation and the support-counting approach; these two steps account for much of the time and memory consumed by the Apriori [2] algorithm. Given a set of item-sets, the algorithm attempts to find subsets which are common to at least a minimum number C of the item-sets. In Apriori, the time required for mining [14][15] frequent k-item-sets grows significantly as k increases. Bit-Apriori [1] performs much better because it has no candidate generation and needs to traverse the trie only once. The pseudocode for Bit-Apriori is shown in Table II.
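The bitwise support count that distinguishes Bit-Apriori can be illustrated with a small sketch. It assumes each item's occurrences are stored as a Python integer used as a bit string (bit i is set when transaction i contains the item), so the support of an item-set is the number of 1-bits left after AND-ing its items' bit strings. The helper names `item_bitmaps` and `support` are hypothetical, and the sketch shows only the counting idea, not Bit-Apriori's trie or its equal-support pruning. The sample data is the ten transactions of the example database (Table IV).

```python
def item_bitmaps(transactions):
    """Map each item to an int whose bit i is 1 iff transaction i contains the item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in set(t):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Support count of an item-set: number of 1-bits in the AND of its items' bitmaps."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # an unknown item contributes an empty bitmap
    return bin(mask).count("1")


TRANSACTIONS = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
                "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
BITMAPS = item_bitmaps(TRANSACTIONS)
```

For example, `support("GC", BITMAPS, 10)` gives 4 and `support("CAE", BITMAPS, 10)` gives 5; no candidate list or extra database scan is needed, only AND operations on the stored bit strings.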
III. PROBLEM STATEMENT
To find frequent item-sets, both the Apriori[3] and Bit-Apriori[1] algorithms search for elements across the entire item-set, from 1 to N. When the total support count of an item is zero or less than the minimum support count, that item is not required in subsequent iterations. However, Apriori and Bit-Apriori still consider such elements while forming tries.
Hence there is scope for improvement by eliminating such items during trie formation. A new algorithm is proposed to improve performance, resource utilization and time efficiency.
IV. PROPOSED ALGORITHM
A new algorithm has been developed which deletes the infrequent items during trie2 and subsequent iterations. Removing infrequent items reduces computation time. The Apriori and Bit-Apriori algorithms do not remove the infrequent items during trie2 and subsequent iterations. In the graph, the proposed algorithm checks whether a node has a child: if it does, the node is traversed; otherwise the node is ignored and treated as infrequent. Such nodes are not considered in further iterations of the proposed algorithm. This reduces the computation time, particularly when infrequent items occur more often in the given dataset.
The pseudocode for the proposed algorithm is shown in Table III.
TABLE I. THE PSEUDOCODE FOR FINDING FREQUENT ITEM-SETS USING APRIORI ALGORITHM
TABLE II. THE PSEUDOCODE FOR BIT-APRIORI
TABLE III. THE PSEUDOCODE FOR THE PROPOSED ALGORITHM
To demonstrate the process of the proposed algorithm, an example is given below. As shown in Table IV, the example database is in the second column. The database contains ten transactions.
TABLE IV. THE EXAMPLE DATABASE
TID  Items    Ordered frequent items
1    ABDEFL   AE
2    AGO      GA
3    CEI      CE
4    ACDEG    GCAE
5    ABCEGK   GCAE
6    EH       E
7    ABCEFJ   CAE
8    ACD      CA
9    ACEGM    GCAE
10   ACEGN    GCAE
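Because the pseudocode of Table III is not reproduced here, the following is only one possible reading of the rule in Section IV, written as a hypothetical Python sketch: infrequent items are deleted after the first scan, each trie node carries a bit string, a node is extended only with items that come later in the support order and whose AND-ed bit string is still frequent, and a node that ends up with no child is ignored in all further iterations. The function name `mine_with_pruning` and the (support, item) ordering key are assumptions.

```python
def mine_with_pruning(transactions, min_support):
    """Hypothetical sketch of trie growth that drops childless nodes.

    Returns {itemset_tuple: support_count}; tuples follow the support order.
    """
    def popcount(mask):
        return bin(mask).count("1")

    # First scan: one bitmap per item (bit i = transaction i contains it).
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in set(t):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)

    # Delete infrequent items; sort the rest by (support, item).
    order = sorted(
        (it for it, m in bitmaps.items() if popcount(m) >= min_support),
        key=lambda it: (popcount(bitmaps[it]), it),
    )
    rank = {it: r for r, it in enumerate(order)}

    frontier = [((it,), bitmaps[it]) for it in order]  # level-1 trie nodes
    frequent = {itemset: popcount(mask) for itemset, mask in frontier}
    while frontier:
        next_frontier = []
        for itemset, mask in frontier:
            # Candidate children: items later in the order, with AND-ed bit strings.
            children = []
            for it in order[rank[itemset[-1]] + 1:]:
                child_mask = mask & bitmaps[it]
                if popcount(child_mask) >= min_support:  # infrequent children are deleted
                    children.append((itemset + (it,), child_mask))
            if not children:
                # No child: the node is ignored and never traversed again.
                continue
            next_frontier.extend(children)
        frequent.update({s: popcount(m) for s, m in next_frontier})
        frontier = next_frontier
    return frequent
```

On Table IV with a minimum support count of 4, the item order is G, C, A, E; every node ending in E has no child and is dropped as soon as it is reached, and the sketch returns 15 frequent item-sets, the largest being (G, C, A, E) with support 4.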
Suppose the support threshold min_sup is 40%. The
support of each item is counted, and infrequent items
are deleted, during the first scan of the database. The
support of each item is given as follows.
A:8, B:3, C:7, D:3, E:8, F:2, G:5, H:1, I:1, J:1, K:1, L:1,
M:1, N:1, O:1
Since the minimum support count is 4 (40% of the ten transactions), only G, C, A and E are frequent. The frequent items are sorted into a non-decreasing list according to their supports; items with the same support are sorted in lexicographic order, so A precedes E and the resulting order is G, C, A, E. In Step 2 of Bit-Apriori, all frequent 2-item-sets are found, as shown in Table V.
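The first scan described above (count supports, delete items below 40% of the ten transactions, order the remaining items by non-decreasing support with ties broken lexicographically, and rewrite each transaction in that order) can be checked with a short Python sketch. The function name `order_transactions` is hypothetical; the data is the Items column of Table IV.

```python
import math
from collections import Counter


def order_transactions(transactions, min_sup_percent):
    """First scan: support counts, frequent-item order, and ordered transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    # Absolute threshold, e.g. 40% of 10 transactions -> 4.
    min_count = math.ceil(min_sup_percent * len(transactions) / 100)
    frequent = {it for it, c in counts.items() if c >= min_count}
    # Non-decreasing support, ties broken by lexicographic order.
    order = sorted(frequent, key=lambda it: (counts[it], it))
    rank = {it: r for r, it in enumerate(order)}
    # Keep only frequent items in each transaction, written in the global order.
    ordered = [
        "".join(sorted((it for it in set(t) if it in frequent), key=rank.get))
        for t in transactions
    ]
    return counts, order, ordered


DATABASE = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
            "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
counts, order, ordered = order_transactions(DATABASE, 40)
```

Here `order` is `['G', 'C', 'A', 'E']` and `ordered` reproduces the "Ordered frequent items" column of Table IV, from `'AE'` for TID 1 to `'GCAE'` for TID 10.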
The trie, with a binary string shown at each leaf, is then established as shown in Fig. 1.
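Table V can be regenerated from the ordered transactions: each frequent item gets a 0/1 string over TIDs 1 to 10, and each 2-item-set's string is the bitwise "And" of its two items' strings, kept if it contains at least 4 ones. A minimal sketch, using the hypothetical helper name `pair_bitstrings`, follows.

```python
from itertools import combinations


def pair_bitstrings(ordered_transactions, order, min_support):
    """Return {(a, b): bit string} for frequent 2-item-sets; character i = TID i+1."""
    # Per-item bit lists, one entry per transaction.
    item_bits = {
        it: [1 if it in t else 0 for t in ordered_transactions] for it in order
    }
    table = {}
    for a, b in combinations(order, 2):  # pairs follow the item order G, C, A, E
        bits = [x & y for x, y in zip(item_bits[a], item_bits[b])]  # bitwise "And"
        if sum(bits) >= min_support:
            table[(a, b)] = "".join(str(bit) for bit in bits)
    return table


ORDERED = ["AE", "GA", "CE", "GCAE", "GCAE", "E", "CAE", "CA", "GCAE", "GCAE"]
TABLE_V = pair_bitstrings(ORDERED, ["G", "C", "A", "E"], 4)
```

All six pairs pass the threshold; for example `TABLE_V[("G", "C")]` is `'0001100011'` and `TABLE_V[("A", "E")]` is `'1001101011'`, matching the {G,C} and {A,E} columns.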
TABLE V. FREQUENT 2-ITEM-SETS
TID  Ordered items  {G,C}  {G,A}  {G,E}  {C,A}  {C,E}  {A,E}
1    AE             0      0      0      0      0      1
2    GA             0      1      0      0      0      0
3    CE             0      0      0      0      1      0
4    GCAE           1      1      1      1      1      1
5    GCAE           1      1      1      1      1      1
6    E              0      0      0      0      0      0
7    CAE            0      0      0      1      1      1
8    CA             0      0      0      1      0      0
9    GCAE           1      1      1      1      1      1
10   GCAE           1      1      1      1      1      1
Fig. 1. Trie After Generation (2)
During subsequent iterations, element 'E' can be ignored by treating it as a non-frequent item-set (as the last item in the order G, C, A, E, it has no child node in the trie). The computation time can be considerably reduced when elements like 'E' occur more often among the frequent items. After all iterations are completed, the final output of the binary strings is shown in Fig. 2.
Fig. 2. Trie After Completion
V. EXPERIMENTAL RESULTS
The proposed algorithm was tested on different datasets, and the experimental results are shown in Fig. 3. The proposed algorithm consumes considerably less time than the Bit-Apriori and Apriori algorithms. An interesting finding is that when non-frequent item-sets occur more often, the execution time is reduced drastically. The experimental results show that the proposed algorithm decreases not only the computation time but also the resources used; the execution times are given in Table VI.
Fig. 3. Execution Time of Algorithms
TABLE VI. COMPARISON OF EXECUTION TIME BETWEEN APRIORI / BIT-APRIORI / MODIFIED BIT-APRIORI
(Execution time in seconds)
Dataset  Apriori  Bit-Apriori  Modified Bit-Apriori
pumsb    4.5      1.32         0.98
VII. CONCLUSIONS
In this paper, the modified Bit-Apriori technique improves the performance of Bit-Apriori by eliminating the search of infrequent item-sets, which also significantly improves computational efficiency. Experimental results show that the modified Bit-Apriori algorithm outperforms the fast Bit-Apriori, especially when non-frequent item-sets occur more often.
When the database is large, Bit-Apriori may suffer from memory scarcity due to the large number of bitwise operations. Future work can be done in the direction of replacing the bitwise operations.
REFERENCES
[1] Jiemin Zheng, Defu Zhang, Stephen C.H. Leung, Xiyue Zhou, "An efficient algorithm for frequent itemsets in data mining", in 2010 7th International Conference on Service Systems and Service Management (ICSSSM), 28-30 June 2010.
[2] Agrawal R., R. Srikant, "Fast algorithms for mining association rules", The International Conference on Very Large Databases, pp. 487-499, 1994.
[3] Zaki M.J., S. Parthasarathy, M. Ogihara, W. Li, "New algorithms for fast discovery of association rules", in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283-296, 1997.
[4] Han J., J. Pei, Y. Yin, "Mining frequent patterns without candidate generation", in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 1-12, 2000.
[5] Park J.S., M.S. Chen, P.S. Yu, "An effective hash based algorithm for mining association rules", ACM SIGMOD, pp. 175-186, 1995.
[6] Brin S., R. Motwani, J.D. Ullman, S. Tsur, "Dynamic itemset counting and implication rules for market basket data", in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 255-264, 1997.
[7] Brin S., R. Motwani, C. Silverstein, "Beyond market baskets: generalizing association rules to correlations", in Proceedings of the ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, pp. 265-276, 1997.
[8] Toivonen H., "Sampling large databases for association rules", in Proceedings of the 22nd VLDB Conference, Mumbai, India, pp. 134-145, 1996.
[9] Savasere A., E. Omiecinski, S.B. Navathe, "An efficient algorithm for mining association rules in large databases", in Proceedings of the 21st International Conference on Very Large Data Bases (VLDB'95), Zurich, pp. 432-444, 1995.
[10] Tsay Y.J., J.Y. Chiang, "CBAR: an efficient method for mining association rules", Knowledge-Based Systems, 18 (2-3), pp. 99-105, 2005.
[11] Liu G., H. Lu, W. Lou, Y. Xu, J.X. Yu, "Efficient mining of frequent patterns using ascending frequency ordered prefix-tree", Data Mining and Knowledge Discovery, 9 (3), pp. 249-274, 2004.
[12] Grahne G., J. Zhu, "Fast algorithms for frequent itemset mining using FP-Trees", IEEE Transactions on Knowledge and Data Engineering, 17 (10), pp. 1347-1362, 2005.
[13] Zaki M.J., "Scalable algorithms for association mining", IEEE Transactions on Knowledge and Data Engineering, 12 (3), pp. 372-390, 2000.
[14] Zaki M.J., K. Gouda, "Fast vertical mining using diffsets", in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326-335, 2003.
[15] Dong J., M. Han, "BitTableFI: an efficient mining frequent itemsets algorithm", Knowledge-Based Systems, 20 (4), pp. 329-335, 2007.