SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Downloaden Sie, um offline zu lesen
ISSN: 2278 – 1323
                                        International Journal of Advanced Research in Computer Engineering & Technology
                                                                                             Volume 1, Issue 5, July 2012



          ESW-FI: An Improved Analysis of Frequent
                      Itemsets Mining
                                       K Jothimani, Dr Antony SelvadossThanamani


                                                                     quickly as time goes by[5]. Therefore, catching the recent
   Abstract—Frequent itemsets mining play an essential role in        trend of data is an important issue when mining frequent
many datamining tasks. The frequent itemset mining over data          itemsets from data streams. Although the sliding window
streams is to find an approximate set of frequent itemsets in         model proposed a good solution for this problem, the
transaction with respect to a given support and threshold. It
should support the flexible trade-off between processing time
                                                                      appearing information of the patterns within the sliding
and mining accuracy. It should be time efficient even when the        window has to be maintained completely in the traditional
user-specified minimum support threshold is small. The                approach[11].
objective was to propose an effective algorithm which generates          We classify the stream-mining techniques into two
frequent patterns in a very less time. Our approach has been          categories based on the window model that they adopt in
developed based on improvement and analysis of MFI                    order to provide insights into how and why the techniques are
algorithm. In this paper, we introduce a new algorithm
ESW-FI, to maintain a dynamically selected set of item sets over
                                                                      useful. First, each element in the datastream can be examined
a sliding window. We keep some advantages of the previous             only once or twice, making traditional multiple-scan
approach and resolve the drawbacks, and produce the                   approaches infeasible. Second, the consumption of memory
improved runtime and memory consumption. The proposed                 space should be confined in a range, despite that data
algorithm gave a guarantee of the output quality and also a           elements are continuously streaming into the local site. Third,
bound on the memory usage.                                            notwithstanding the data characteristics of incoming stream
   .                                                                  may be unpredictable; the mining task should proceed
                                                                      normally and offer acceptable quality of results. Fourth, the
Index Terms—Data stream, data-stream mining, Efficient
Window, frequent itemset and sliding window.                          latest analysis result of the data stream should be available as
  .                                                                   soon as possible when the user invokes a query.
                                                                         In this paper we consider mining recent frequent itemsets
                                                                      in sliding windowsover data streams and estimate their true
                       I. INTRODUCTION                                frequencies, while making only one passover the data. In our
   Data stream is an ordered sequence of elements that arrives        design, we actively maintain potentially frequent itemsets ina
in timely order. In many application domains, data is                 compact data structure [8][9]. Compared with existing
presented in the form of data streams which originate at some         algorithms, our algorithm hastwo contributions as follows:
endpoint and are transmitted through the communication                   1. It is a real one-pass algorithm. The obsolete
channel to the central server. Different from data in                         transactions are not required whenthey are removed
traditional static datasets, data streams are continuous,                     from the sliding window.
unbounded, usually come with high speed and have a data                  2. Flexible queries based on continuous transactions in
distribution that often changes with time [1][2][3]. It is often              the sliding window can be answered with an error
refer to as streaming data..                                                  bound guarantee.
   Some well-known examples include market basket, traffic
signals, web-click packets, ATM transactions, and sensor                 In this paper, we propose a remarkable approximating
networks. In these applications, it is desirable that we obtain       method for discovering frequent itemsets in a transactional
some useful information, like patterns occurred frequently,           data stream under the sliding window model. In our design,
from the streaming data, to help us make some advanced                we actively maintain potentially frequent itemsets ina
decision[4]. Data-stream mining is such a technique that can          compact data structure. It is a real one-pass algorithm. The
find valuable information or knowledge from a great deal of           obsolete transactions are not required whenthey are removed
primitive data.                                                       from the sliding window. Flexible queries based on
   Recently, the data stream, which is an unbounded                   continuous transactions in the sliding window can
sequence of data elements generated at a rapid rate, provides         beanswered with an error bound guarantee.
a dynamic environment for collecting data sources. It is likely
that the embedded knowledge in a data stream will change
                                                                                           II. PRELIMINARIES
   
      K. Jothimani is with the Department of Computer Science, NGM      A. Related Work
College, Pollachi,Tamilnadu,India. Email: jothi1083@yahoo.co.in         Frequent-pattern mining has been studied extensively
   Dr. Antony SelvadossThanamani is with the Department of Computer
                                                                      indata mining, with many algorithms Proposed and
Science, NGM College, Pollachi. Tamilnadu, India.
     Email: selvdoss@yahoo.com                                        implemented. Frequent pattern mining and its
   .                                                                  associatedmethods have been popularly used in association
                                                All Rights Reserved © 2012 IJARCET
                                                                                                                                  386
ISSN: 2278 – 1323
                                       International Journal of Advanced Research in Computer Engineering & Technology
                                                                                            Volume 1, Issue 5, July 2012

rule mining, sequential pattern mining, structured pattern          elements are inserted without a deletion. Variable-sized
mining, iceberg cube computation, cube gradient analysis,           windows have no constraint[18].
associative classification, frequent pattern-basedclustering
                                                                      Fixed-sized and variable-sized windows model several
[26], and so on. There are a number of research works which
                                                                    variants of sliding windows. For example, tuple-based windows
study the problem of data-stream mining in the first decade of      correspond to fixed-sized windows, andtime-based windows
21stcentury. Among these studies, Lossy Counting [3] is the most    correspond to variable-sized windows.
famous method of mining frequent itemsets (FIs) through data
streams under the landmark window model. Besides the                   In recent two years, a new kind of data-stream mining
user-specified minimum support threshold (ms), Lossy Counting       method named DSCA has been proposed [12]. DSCA is an
also utilizes an error parameter, ε, to maintain those infrequent   approximate approach based on the application of the
itemsets having the potential to become frequent in the future.     Principle of Inclusion and Exclusion in Combinatorial
With the use of ε , when an itemset is newly found, Lossy           Mathematics [7]. One of the most notable features of DSCA
Counting knows the upper bound of counts that itemsets may          is that it would approximate the count of an arbitrary itemset,
have (in the previous stream data) before it has been monitored     through an equation (i.e., Equation (4) in [12]), by only the
by the algorithm.                                                   sum of counts of the first few orders of its subsets over the
 B. Previous Algorithms                                             data stream. There are also two techniques named counts
                                                                    bounding and correction, respectively, integrated within
   Based on the ε mechanism of Lossy Counting, [4]                  DSCA. The concept of Inclusion and Exclusion Principle [7] is
proposed the sliding window method, which can find out              valuable that it may also be applied in mining FIs under different
frequent itemsets in a data stream under the sliding window         window models other than the landmark window. Based on the
model with high accuracy. The sliding window method                 theory of Approximate Inclusion–Exclusion [8], we devise and
processes the incoming stream data transaction by                   propose a new algorithm, called ESW-FI, to discover FIs over
transaction.[11][17] Each time when a new transaction is            the sliding window in a transactional data stream.
inserted into the window, the itemsets contained in that
transaction are updated into the data structure incrementally.                        III. PROBLEM DESCRIPTION
Next, the oldest transaction in the original window is dropped         Let I = {x1, x2, …,xz} be a set of items (or attributes). An
out, and the effect of those itemsets contained in it is also       itemset (or a pattern) X is a subset of I and written as X =xi xj…xm.
deleted. The sliding window method also has a periodical            The length (i.e., number of items) of an itemset X is denoted by
operation to prune away unpromising itemsets from its data          |X|. A transaction, T, is an itemset and T supports an itemset, X, if
structure, and the frequent itemsets are output as mining
                                                                    X⊆T. A transactional data stream is a sequence of continuously
result whenever a user requests.
                                                                    incoming transactions. A segment, S, is a sequence of fixed
   In the sliding window model, there are two typical mining        number of transactions, and the size of S is indicated by s. A
methods: Moment [5] and CFI-Stream [6]. Both the two methods        window, W, in the stream is a set of successive w transactions,
aimed at mining Closed Frequent Itemsets (CFIs), a complete         where w ≥s. A sliding window in the stream is a window of a
and non-redundant representation of the set of FIs. Moment uses     fixed number of most recent w transactions which slides forward
a data structure called CET to maintain a dynamically selected      for every transaction or every segment of transactions. We adopt
set of itemsets, which includes CFIs and itemsets that form a       the notation Ilto denote all the itemsets of length l together with
boundary between CFIs and the rest of itemsets. The CFI-Stream      their respective counts in a set of transactions (e.g., over W or S).
algorithm, on the other hand, uses a data structure called DIU      In addition, we use Tnand Snto denote the latest transaction and
tree to maintain nothing other than all Closed Itemsets over the    segment in the current window, respectively. Thus, the current
sliding window. The current CFIs can be output anytime based        window is either W = <Tn-w+1, …,Tn> or W = <Sn-m+1, …, Sn>,
on any msspecified by the user.                                     where w and m denote the size of W and the number of segments
                                                                    in W, respectively.
   Besides, there are still some interesting research works [9]
[10] [11] on the sliding window model. In [9] a false-negative         In this research, we employ a prefix tree which is organized
approach named ESWCA was proposed. By employing a                   under the lexicographic order as our data structure, and also
progressively increasing function of ms, ESWCA greatly              processes the growth of itemsets in a lexicographic-ordered
reduces the number of potential itemsets and would approximate      away. As a result, an itemset is treated a little bit like a sequence
the set of FIs over a sliding window. In [10] a data structure      (while it is indeed an itemset). A superset of an itemset X is the
called DSTreewas proposed to capture information from the           one whose length is above |X| and has X as its prefix. We define
streams. This tree captures the contents of transactions in a       Growthl(X) as the set of supersets of an itemset X whose length
window, and arranges tree nodes according to some canonical         are l more than that of X, where l ≥0. The number of itemsets in
order.According to the overview, one crucial problem is to          Growthl(X) is denoted by |Growthl(X)|.
efficiently compute the inclusion between two itemsets. This
                                                                       We adopt the symbol cnt(X) to represent the count-value (or
costly operation could easily be performed when considering a
                                                                    just count) of an itemset X. The count of X over W, denoted as
new representation for items in transactions. From now, each
                                                                    cntW(X), is the number of transactions in W that support X. So
item is represented by a unique prime number.
                                                                    cntS(X) represents the count of X over a segment S. Given a
   A sliding window over a data stream is a bag of last N           user-specified minimum support threshold (ms), where 0 <ms≤1,
elements ofthe stream. There are two variants of sliding            we say that X is a frequent itemset (FI) over W if cntW(X) ms×w,
windows based on whether N is fixed(fixed-sized sliding             otherwise X is an infrequent itemset (IFI). The FI and IFI over S
windows) or variable (variable-sized sliding windows).              are defined similarly to those for W.
Fixed-sized windows are constrained to perform the insertions          Given a data stream in which every incoming transaction has
and deletions in pairs, exceptin the beginning when exactly N       its items arranged in order, and a changeable value of

                                                                                                                                    387
                                               All Rights Reserved © 2012 IJARCET
ISSN: 2278 – 1323
                                       International Journal of Advanced Research in Computer Engineering & Technology
                                                                                            Volume 1, Issue 5, July 2012

msspecified by the user, the problem of mining FIs over a sliding    both inserted into and dropped out from the window. The
window in the stream is to find out the set of frequent itemsets     transaction-by-transaction sliding of a window leads to
over the window at different slides.                                 excessively high frequency of processing. In addition, since the
   We remark that most of the existing stream mining                 transit of a data stream is usually at a high speed, and the impact
methods[13] [14] [15] [19] work with a basic hypothesis that         of one single transaction to the entire set of transactions (in the
they know the user-specified msin advance, and this                  current window) is very negligible, making it reasonable to
                                                                     handle the window sliding in a wider magnitude. Therefore, for
parameter will remain unchanged all the time before the
                                                                     an incoming transactional data stream to be mined, we propose
stream terminates. This hypothesis may be unreasonable,
                                                                     to process on a Segment-oriented window sliding.
since in general, a user may wish to tune the value of mseach
time he/she makes a query for the purpose of obtaining a more           We conceptually divide the sliding window further into
preferable mining result. An unchangeable msleads to a serious       several, say, m, segments, where the term segment is the one we
limitation and may be impractical in most real-life applications.    have defined earlier in Section 3. Each of the m segments
As a result, we relax this constraint in our problem that the user   contains a set of successive transactions and is of the same size s
is allowed to change the value of msat different times, while our    (i.e., contains the equal number of s transactions). Besides, in
method must still work normally.                                     each segment, the summary (which contains I1, I2, and the
                                                                     fair-cutters which we will introduced later) of transactions
  Using the original count-values of subsets to approximate the      belonging to that segment is stored in the data structure we use.
count of X may sometimes bring about considerable error. For a       We call the sliding of window “segment in-out,” which is
better approximation, there is a possible way, which is to           defined as follows.
boundthe range of counts of subsets for the itemset to be
approximated.                                                        Definition 1 (Segment in-out) LetScdenote the current segment
   Let Y be a 3-itemset (to be approximated) and y be a              which is going to be inserted into the window next (after it is full
1-subset of Y. To obtain a better approximation of Y, the            of s transactions). A segment in-out operation (of the window) is
count-value to each subset y with respect to Y has a particular      that we first insert Scinto and then extract Sn-m+1 from the original
                                                                     window, where n denotes the id of latest segment in the original
range, which can be determined by Y’s 2-subsets that have y
                                                                     window. Therefore, the windows before and after a sliding are W
as their common subset, respectively. This range of y’s count is
                                                                     = <Sn-m+1, …,Sn> and W = <Sn-m+2, …, Sn, Sc>, respectively.
bounded by an upper bound (i.e., the maximum) and a lower
bound (i.e., the minimum), and count-values within this range            By taking this segment-based manner of sliding, each time
are the set of portions of y’s original count which is more          when a segment in-out operation occurs, we delete (or drop out)
relevant for y with respect to Y[21]. We define Scby(Y) as the set   the earliest segment, which contains the summary of transactions
of count-values of y with respect to Y within the range obtained     of that segment, from the current window at each sliding. As a
through the aforesaid manner. Besides, the upper bound and the       result, we need not to maintain the whole transactions within the
lower bounds of count-values of y are denoted by uby(Y) and          current window in memory all along to support window
lby(Y), respectively.                                                sliding[24][25]. In addition, we remark that the parameter m
                                                                     directly affects the consumption of memory. A larger value of m
Lemma 1 Let cntub(Y) and cntlb(Y) respectively be the
                                                                     means the window will slide (update) more frequently, while the
approximate counts of Y obtained by using uby(Y) and lby(Y)of
                                                                     increasing overhead of memory space is also considerable. In
count-values to every 1-subset y ∈Y during the approximating         our opinion, an adequate size of m that falls in the range between
process. Then cntub(Y) ≤cntlb(Y).                                    5 and 20 may be suitable for general data streams.[20]

Proof: .Let Tub and Tlbbe the sums of counts of 1-subsets of Y
                                                                     Theorem 1 For a 2-itemset X and a threshold ms, let TPub(X)
obtained by choosing uby(Y) and lby(Y) for each y ∈ Y,
                                                                     and TPlb(X) be the true-positive rates of X’s 3-supersets in the
respectively. Then Tub ≥ Tlbsince uby(Y) ≥ lby(Y) for each y.        mining result resulting from adopting Ubc and Lbc to all the
According to (1) with the parameters setting m=3 and k=2, we         1-subsets of each superset, respectively. Then we have TPub(X) ≤
can eventually obtain the following simplified equation: cnt(Y)≒     TPlb(X).
(1-) c + (d, where the symboljk md denotes the
                                                                     Proof .Let P be the number of FIs of X’s 3-supersets with
coefficient of linearly transformed Chebyshev polynomial, and c      respect to ms. Also, let Pub and Plbrespectively be the numbers
and d represent the sum of counts of Y’s 2-subsets and that of
                                                                     of true FIs of X’s 3-supersets found by choosing Ubc and Lbc
counts of Y’s 1-subsets, respectively. Since the value of is
                                                                     to 1-subsets. According to Lemma 1, we have cntub(Y) ≤
less than 1, the coefficient
                                                                     cntlb(Y) for each 3-superset Y of X, which means that the
   ( for 1-subset term is then negative, which means        number of true-positive itemsets (i.e., FIs) obtained by
that the value of 1-subset term will be subtracted from the          choosing Lbc is at least equal to that of choosing Ubc, i.e., Pub ≤
other term. Thus, by substituting Tuband Tlbfor drespectively        Plb. Since TPub(X) = Pub/P and TPlb(X) = Plb/P, we then have
in the approximate equation and knowing that Tub ≥Tlb, we            TPub(X) ≤TPlb(X).
have cntub(Y) ≤ntlb(Y).
              c
                                                                     Theorem 2 For a 2-itemset X and a threshold ms, let TNub(X)
                                                                     and TNlb(X) be the true-negative rates of X’s 3-supersets in the
                                                                     mining result resulting from adopting Ubc and Lbc to all the
  IV. EFFICIENT SLIDING-WINDOW PROCESSING
                                                                     1-subsets of each superset, respectively. Then we have TNub(X) ≥
   In research works under the sliding window model [4] [5] [6],     TNlb(X).
the sliding of window is handled transaction by transaction;
however, we have a different opinion. Unlike the landmark            Proof .Let N be the number of IFIs of X’s 3-supersets with
window model, transactions in the sliding window model will be       respect to ms. Also, let Nub and Nlbrespectively be the numbers of

                                               All Rights Reserved © 2012 IJARCET
                                                                                                                                     388
ISSN: 2278 – 1323
                                       International Journal of Advanced Research in Computer Engineering & Technology
                                                                                            Volume 1, Issue 5, July 2012

true IFIs of X’s 3-supersets determined by choosing Ubc and Lbc     phase, which is activated when the window is full of generated
to 1-subsets. According to Lemma 1, we have cntub(Y) ≤cntlb(Y)      transactions.
for each 3-superset Y of X, which means that the number of
true-negative itemsets (i.e., IFIs) determined by choosing Ubc is
at least equal to that of choosing Lbc, i.e., Nub ≥Nlb. Since
TPub(X) = Nub/N and TPlb(X) = Nlb/N, we then have TPub(X) ≥
TPlb(X).

   Now we discuss the issue of selecting suitable
count-values in the bounded range of subsets for
approximating an itemset. According to Lemma 1 in Section
3, using the lower bound of counts (Lbc) for 1-subsets always
results in a higher approximate count for an itemset than that
of using the upper bound of counts (Ubc). It follows from
Theorem 1 and Theorem 2 that, adopting Lbc to the 1-subsets
to approximate the 3-supersets of an 2-itemset X will reach a
higher true-positive rate (i.e., recall ratio) in the result than
that of using Ubc, while choosing Ubc to the 1-subsets will
obtain a higher true-negative rate (which usually concerns a                   Fig 5.2: Space Requirement of ESW-FI with T20
higher precision ratio) in the result than that of using Lbc.
However, it is actually unknown whether to adopt Ubc or Lbc
to1-subsets for approximating an itemset Y.


                 V. EXPERIMENTAL RESULT
   We performed extensive experiments to evaluate the
performance of our algorithmand we present results in this
section. We compared our algorithm (ESW-FI)with the MFI
algorithm [7].Then we tested the adaptability of ESW-FI by
changing the data distribution ofthe dataset.
   The experiments were performed on a Pentium processor with
4G memory,running Windows 2007 (SP4). Our algorithm is
implemented in C# and compiledby using Microsoft Visual                    Fig 5.3: Execution time of MFI Vs ESW-FI for Updated
Studio 2008. In the implementation of three algorithms, weused                                 Transactions
the same data structures and subroutines in order to minimize the
performancedifferences caused by minor differences.
   We have usedT20 data set in the experiments. The T20 dataset
is generated by the IBM data generator [1]. For T20, the average
size of transactions, the average size of the maximal potential
frequent itemsets and the numberof items are 20, 4, and 1 000,
respectively.
     Fig 5.1: Execution time of MFI Vs ESW-FI for T20




                                                                       Fig 5.4: Space Requirement between MIF Vs ESW-FI with more
                                                                                          number of transactions

                                                                       Comparison between MFI and ESW-FI basedon different
                                                                    block size to observe the influence of different block sizes on
                                                                    algorithm, we take 2×104, 5×104and 10 × 104 as an example.
                                                                    Meanwhile, the size of the window is fixed 5 × 105.In this case,
                                                                    this window contains 25, 10 and 5 blocks, respectively.
   The data sets were broken into batches of10K size
transactions and provided to our program through standard              The experimental results of the execution time and the
input.For ESW-FI, the whole procedure can be divided into two       space requirement are plotted in Figure 5.1 to 5.4,
different phases namely, window initialization phase, which is      respectively. We collected the total number of seconds and
activated when the number of transactions generated so far is not
                                                                    the size of storage space required in KB per 10 × 5
more than the size of the sliding window and windowsliding

                                                                                                                                  389
                                               All Rights Reserved © 2012 IJARCET
ISSN: 2278 – 1323
                                               International Journal of Advanced Research in Computer Engineering & Technology
                                                                                                    Volume 1, Issue 5, July 2012

transactions. These results basically keep stable as the                        [18] Y. Chi, H. Wang, P.S. Yu, & R.R. Muntz, “Moment: maintaining
                                                                                     closed frequent itemsets over a stream sliding window”, Proc. 4th
window moves forward. The above experimental results                                 IEEE Conf. on Data Mining, Brighton, UK, 2004, pp. 59–66.
provide evidence that two algorithm can handle long data                        [19] N. Jiang & L. Gruenwald, “CFI-Stream: mining closed frequent
streams both. As ESW-FI may keep multiple triples for one                            Itemsets in data streams”, Proc. 12th ACM SIGKDD Conf. on
                                                                                     Knowledge Discovery and Data Mining, Philadelphia, PA,USA, 2006,
significant itemset, it needs more memory and time than MFI                          pp. 592–597.
does.                                                                           [20] Quest Data Mining Synthetic Data Generation Code. Available:
                                                                                     http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_minin
                           VI. CONCLUSION                                            g/datasets/syndata.html
                                                                                [21] P. Indyk, D. Woodruff, “Optimal approximations of the frequency
   We mainly discuss how to discover recent frequent                                 moments of data streams”, Proceedings of the thirty-seventh annual
itemsets in sliding windows over data streams. An efficient                          ACM symposium on Theory of computing, pp.202–208, 2005.
                                                                                [22] H.F Li, S.Y. Lee, M.K. Shan, “An Efficient Algorithm for Mining
algorithm is presented in detail. Compared with previous                             Frequent Itemsets over the Entire History of Data Streams”, In
algorithms, this algorithm doesn’t keep the data in the                              Proceedings of First International Workshop on Knowledge Discovery
window, which considerably increase the scalability of the                           in Data Streams 9IWKDDS, 2004.
                                                                                [23] H.F Li, S.Y. Lee, M.K. Shan, “Online Mining (Recently) Maximal
algorithm. Moreover, it can figure out answers with an error                         Frequent Itemsets over Data Streams”, In Proceedings of the 15th
bound guarantee for continuous transactions in the window.                           IEEE International Workshop on Research Issues on Data Engineering
The extensive experiment results demonstrate the                                     (RIDE), 2005.
                                                                                [24] Lin C.-H., Chiu D.-Y., Wu Y.-H. and Chen A.L.P.: Mining Frequent
effectiveness and efficiency of our approach. We can                                 Itemsets from Data Streams with a Time-Sensitive Sliding Window. In
implement this algorithm in traffic networks especially for                          Proc SIAM International Conference on Data Mining. (2005).
mining the itemsets in high speed streams.                                      [25] Ho, C. C., Li, H. F., Kuo, F. F., & Lee, S. Y. (2006). Incremental
                                                                                     mining of sequential patterns over a stream sliding window. In
                                                                                     Proceedings of IEEE international workshop on mining evolving and
       REFERENCES                                                                    streaming data.
                                                                                [26] K Jothimani, Dr Antony SelvadossThanmani, “MS: Multiple Segments
[1]    R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules             with Combinatorial Approach for Mining Frequent Itemsets Over Data
       between Sets of Items in Large Databases. In Proceedings of the 1993          Streams”, IJCES International Journal of Computer Engineering
       International Conference on Management of Data, pp. 207-216, 1993.            Science, Volume 2 Issue 2 ISSN : 2250:3439
[2]    R. Agrawal and R. Srikant. Fast Algorithms for Mining Association        [27] K Jothimani, Dr Antony SelvadossThanmani, “An Algorithm for
       Rules. In Proceedings of the 20th International Conference on Very            Mining Frequent Itemsets,” “International Journal for Computer
       Large Data Bases, pp. 487-499, 1994                                           Science Engineering and Technology IJCSET, March 2012,Vol 2,
[3]    J.H. Chang & W.S. Lee, “A sliding window method for finding recently          Issue 3,1012-1015.
       frequent itemsets over online data streams’, Journal of Information
       Science and Engineering, 20(4), 2004, pp. 753–762
[4]    Y. Zhu & D. Shasha, “Stat Stream: statistical monitoring of thousands
       of data streams in real time”, Proc. 28th Conf. on Very Large Data          K Jothimanireceived her Bachelor degree in Computer Science from
       Bases, Hong Kong, China, 2002, pp. 358–369.                              Bharathidasan University, Trichy in 2003. She received her Master degree in
[5]    G.S. Manku& R. Motwani, “Approximate frequency counts over data          Computer Applications in 2008. She pursued her Master of Philosophy in
       streams”, Proc. 28th Conf. on Very Large Data Bases, Hong Kong,          Computer Science in the year 2009 from Vinayaka Missions University,
       China, 2002, pp. 346–357.                                                Salem. Currently she is a research scholar of the Department of Computer
[6]    J.H. Chang & W.S. Lee, “A sliding window method for finding recently     Science, NGM College under Bharathiyar University, Coimbatore. She had
       frequent itemsets over online data streams”, Journal of Information      five years of experience in the computer field in technical as well as
       science and Engineering, 20(4), 2004, pp. 753–762.                       non-technical. She is a life member of Indian Society for Technical
[7]    M.N. Garofalakis, J. Gehrke, & R. Rastogi, Querying and mining data      Education from the year 2009. Also she is active member of Computer
       streams: you only get one look (A Tutorial), Proc. 2002 ACM              Society of India (CSI). She has published more than ten papers in
       SIGMOD Conf. on Management of Data, Madison, Wisconsin, 2002,            international and national conferences including international journals. Her
       p. 635.                                                                  area of interests includes Data mining, Knowledge Engineering and Image
[8]    Y. Zhu & D. Shasha, “Stat Stream: statistical monitoring of thousands    Processing.
       of data streams in real time”, Proc. 28th Conf. on Very Large Data                 Email: jothi1083@yahoo.co.in
       Bases, Hong Kong, China, 2002, pp. 358–369.                                 .
[9]    G.S. Manku& R. Motwani, “Approximate frequency counts over data             Dr Antony SelvadossThanamaniis presently working as professor and
       streams”, Proc. 28th Conf. on Very Large Data Bases, Hong Kong,          Head, Dept of Computer Science, NGM College, Coimbatore,
       China, 2002, pp. 346–357.                                                India(affiliated to Bharathiar University, Coimbatore). He has published
[10]   J.H. Chang & W.S. Lee, “A sliding window method for finding recently     many papers in international/national journals and written many books. His
       frequent itemsets over online data streams”, Journal of Information      areas of interest include E-Learning, Software Engineering, Data Mining,
       science and Engineering, 20(4), 2004, pp. 753–762.                       Networking, Parallel and Distributed Computing. He has to his credit 25
[11]   J. Cheng, Y. Ke, & W. Ng,” Maintaining frequent itemsets over            years of teaching and research experience. His current research interests
       high-speed data streams”, Proc. 10th Pacific-Asia Conf. on               include Grid Computing, Cloud Computing, Semantic Web. He is a life
       Knowledge Discovery and Data Mining, Singapore, 2006, pp.462–467.        member of Computer Society of India, Life member of Indian Society for
[12]   C.K.-S. Leung & Q.I. Khan, “DSTree: a tree structure for the mining      Technical Education, Life member of Indian Science Congress, Life member
       of frequent sets from data streams,” Proc. 6th IEEE Conf. on Data        of Computer Science, Teachers Associates, Newyork.
       Mining, Hong Kong, China, 2006, pp. 928–932.                             Email:-selvdoss@yahoo.com
[13]   B. Mozafari, H. Thakkar, & C. Zaniolo, “Verifying and mining
       frequent patterns from large windows over data streams”, Proc. 24th
       Conf. on Data Engineering, Mexico, 2008, pp. 179–188.
[14]   K.-F. Jea& C.-W. Li, “Discovering frequent itemsets over transactional
       data streams through an efficient and stable approximate approach,
       Expert Systems with Applications”, 36(10), 2009, pp. 12323–12331.
[15]   F. Bodon,        “A fast APRIORI implementation”, Proc. ICDM
       Workshop on Frequent Itemset Mining Implementations (FIMI’03),
       2003.
[16]   N. Jiang and L. Gruenwald, “Research Issues in Data Stream
       Association Rule Mining”. In SIGMOD Record, Vol. 35, No. 1, Mar.
       2006.
[17]   Frequent      Itemset      Mining      Implementations     Repository
       (FIMI).Available: http://fimi.cs.helsinki.fi/

                                                        All Rights Reserved © 2012 IJARCET
                                                                                                                                                       390

Weitere ähnliche Inhalte

Was ist angesagt?

IRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and TechniquesIRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and TechniquesIRJET Journal
 
Modelling, Conception and Simulation of a Digital Watermarking System based o...
Modelling, Conception and Simulation of a Digital Watermarking System based o...Modelling, Conception and Simulation of a Digital Watermarking System based o...
Modelling, Conception and Simulation of a Digital Watermarking System based o...sipij
 
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...ijcsit
 
Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection
Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection
Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection IJCNCJournal
 
A study of existing ontologies in the io t domain
A study of existing ontologies in the io t domainA study of existing ontologies in the io t domain
A study of existing ontologies in the io t domainSof Ouni
 
A Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining PresentationA Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining Presentationmillerca2
 
Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010Oladokun Sulaiman
 
Ncct Ieee Software Abstract Collection Volume 1 50+ Abst
Ncct   Ieee Software Abstract Collection Volume 1   50+ AbstNcct   Ieee Software Abstract Collection Volume 1   50+ Abst
Ncct Ieee Software Abstract Collection Volume 1 50+ Abstncct
 
Iaetsd decentralized coordinated cooperative cache
Iaetsd decentralized coordinated cooperative cacheIaetsd decentralized coordinated cooperative cache
Iaetsd decentralized coordinated cooperative cacheIaetsd Iaetsd
 
Iaetsd implementation of chaotic algorithm for secure image
Iaetsd implementation of chaotic algorithm for secure imageIaetsd implementation of chaotic algorithm for secure image
Iaetsd implementation of chaotic algorithm for secure imageIaetsd Iaetsd
 
Performance Analysis of Various Data Mining Techniques on Banknote Authentica...
Performance Analysis of Various Data Mining Techniques on Banknote Authentica...Performance Analysis of Various Data Mining Techniques on Banknote Authentica...
Performance Analysis of Various Data Mining Techniques on Banknote Authentica...inventionjournals
 
A new study of dss based on neural network and data mining
A new study of dss based on neural network and data miningA new study of dss based on neural network and data mining
A new study of dss based on neural network and data miningAttaporn Ninsuwan
 
New Research Articles 2020 June Issue International Journal on Cryptography a...
New Research Articles 2020 June Issue International Journal on Cryptography a...New Research Articles 2020 June Issue International Journal on Cryptography a...
New Research Articles 2020 June Issue International Journal on Cryptography a...ijcisjournal
 
Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008Mark Tabladillo
 
Survey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data MiningSurvey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data Miningijcsit
 
A comparative analysis of data mining tools for performance mapping of wlan data
A comparative analysis of data mining tools for performance mapping of wlan dataA comparative analysis of data mining tools for performance mapping of wlan data
A comparative analysis of data mining tools for performance mapping of wlan dataIAEME Publication
 

Was ist angesagt? (19)

Lecture 01 Data Mining
Lecture 01 Data MiningLecture 01 Data Mining
Lecture 01 Data Mining
 
IRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and TechniquesIRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and Techniques
 
Modelling, Conception and Simulation of a Digital Watermarking System based o...
Modelling, Conception and Simulation of a Digital Watermarking System based o...Modelling, Conception and Simulation of a Digital Watermarking System based o...
Modelling, Conception and Simulation of a Digital Watermarking System based o...
 
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
 
Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection
Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection
Ensemble of Probabilistic Learning Networks for IoT Edge Intrusion Detection
 
A study of existing ontologies in the io t domain
A study of existing ontologies in the io t domainA study of existing ontologies in the io t domain
A study of existing ontologies in the io t domain
 
A Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining PresentationA Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining Presentation
 
Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010Book of abstract volume 8 no 9 ijcsis december 2010
Book of abstract volume 8 no 9 ijcsis december 2010
 
Ncct Ieee Software Abstract Collection Volume 1 50+ Abst
Ncct   Ieee Software Abstract Collection Volume 1   50+ AbstNcct   Ieee Software Abstract Collection Volume 1   50+ Abst
Ncct Ieee Software Abstract Collection Volume 1 50+ Abst
 
P24120125
P24120125P24120125
P24120125
 
Iaetsd decentralized coordinated cooperative cache
Iaetsd decentralized coordinated cooperative cacheIaetsd decentralized coordinated cooperative cache
Iaetsd decentralized coordinated cooperative cache
 
Iaetsd implementation of chaotic algorithm for secure image
Iaetsd implementation of chaotic algorithm for secure imageIaetsd implementation of chaotic algorithm for secure image
Iaetsd implementation of chaotic algorithm for secure image
 
Performance Analysis of Various Data Mining Techniques on Banknote Authentica...
Performance Analysis of Various Data Mining Techniques on Banknote Authentica...Performance Analysis of Various Data Mining Techniques on Banknote Authentica...
Performance Analysis of Various Data Mining Techniques on Banknote Authentica...
 
A new study of dss based on neural network and data mining
A new study of dss based on neural network and data miningA new study of dss based on neural network and data mining
A new study of dss based on neural network and data mining
 
New Research Articles 2020 June Issue International Journal on Cryptography a...
New Research Articles 2020 June Issue International Journal on Cryptography a...New Research Articles 2020 June Issue International Journal on Cryptography a...
New Research Articles 2020 June Issue International Journal on Cryptography a...
 
Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008
 
Survey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data MiningSurvey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
A comparative analysis of data mining tools for performance mapping of wlan data
A comparative analysis of data mining tools for performance mapping of wlan dataA comparative analysis of data mining tools for performance mapping of wlan data
A comparative analysis of data mining tools for performance mapping of wlan data
 

Andere mochten auch

Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582Editor IJARCET
 
Ijarcet vol-2-issue-7-2333-2336
Ijarcet vol-2-issue-7-2333-2336Ijarcet vol-2-issue-7-2333-2336
Ijarcet vol-2-issue-7-2333-2336Editor IJARCET
 
Ijarcet vol-2-issue-4-1389-1392
Ijarcet vol-2-issue-4-1389-1392Ijarcet vol-2-issue-4-1389-1392
Ijarcet vol-2-issue-4-1389-1392Editor IJARCET
 
Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Editor IJARCET
 
Ijarcet vol-2-issue-7-2337-2340
Ijarcet vol-2-issue-7-2337-2340Ijarcet vol-2-issue-7-2337-2340
Ijarcet vol-2-issue-7-2337-2340Editor IJARCET
 
A Unified PID Control Methodology to Meet Plant Objectives
A Unified PID Control Methodology to Meet Plant ObjectivesA Unified PID Control Methodology to Meet Plant Objectives
A Unified PID Control Methodology to Meet Plant ObjectivesJim Cahill
 

Andere mochten auch (8)

Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582Ijarcet vol-2-issue-4-1579-1582
Ijarcet vol-2-issue-4-1579-1582
 
Ijarcet vol-2-issue-7-2333-2336
Ijarcet vol-2-issue-7-2333-2336Ijarcet vol-2-issue-7-2333-2336
Ijarcet vol-2-issue-7-2333-2336
 
Ijarcet vol-2-issue-4-1389-1392
Ijarcet vol-2-issue-4-1389-1392Ijarcet vol-2-issue-4-1389-1392
Ijarcet vol-2-issue-4-1389-1392
 
Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362
 
Ijarcet vol-2-issue-7-2337-2340
Ijarcet vol-2-issue-7-2337-2340Ijarcet vol-2-issue-7-2337-2340
Ijarcet vol-2-issue-7-2337-2340
 
A Unified PID Control Methodology to Meet Plant Objectives
A Unified PID Control Methodology to Meet Plant ObjectivesA Unified PID Control Methodology to Meet Plant Objectives
A Unified PID Control Methodology to Meet Plant Objectives
 
Class 19 pi & pd control modes
Class 19   pi & pd control modesClass 19   pi & pd control modes
Class 19 pi & pd control modes
 
Chapter 8-SAMPLE & SAMPLING TECHNIQUES
Chapter 8-SAMPLE & SAMPLING TECHNIQUESChapter 8-SAMPLE & SAMPLING TECHNIQUES
Chapter 8-SAMPLE & SAMPLING TECHNIQUES
 

Ähnlich wie 386 390

Mining frequent itemsets (mfi) over
Mining frequent itemsets (mfi) overMining frequent itemsets (mfi) over
Mining frequent itemsets (mfi) overIJDKP
 
Paper id 25201431
Paper id 25201431Paper id 25201431
Paper id 25201431IJRAT
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
 
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesA New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesVenu Madhav
 
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...ijitcs
 
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...IJDKP
 
Mining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering AlgorithmMining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering AlgorithmManishankar Medi
 
An Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringAn Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringIJECEIAES
 
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...Waqas Tariq
 
Integrating compression technique for data mining
Integrating compression technique for data  miningIntegrating compression technique for data  mining
Integrating compression technique for data miningDr.Manmohan Singh
 
Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach  Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach IJECEIAES
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 
Db2425082511
Db2425082511Db2425082511
Db2425082511IJMER
 
Database techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspectionDatabase techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspectionTELKOMNIKA JOURNAL
 

Ähnlich wie 386 390 (20)

Mining frequent itemsets (mfi) over
Mining frequent itemsets (mfi) overMining frequent itemsets (mfi) over
Mining frequent itemsets (mfi) over
 
Aa31163168
Aa31163168Aa31163168
Aa31163168
 
Paper id 25201431
Paper id 25201431Paper id 25201431
Paper id 25201431
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...
 
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesA New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
 
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
 
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
 
Fp3111131118
Fp3111131118Fp3111131118
Fp3111131118
 
Mining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering AlgorithmMining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering Algorithm
 
An Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringAn Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream Clustering
 
1105.1950
1105.19501105.1950
1105.1950
 
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over...
 
Integrating compression technique for data mining
Integrating compression technique for data  miningIntegrating compression technique for data  mining
Integrating compression technique for data mining
 
Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach  Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
Db2425082511
Db2425082511Db2425082511
Db2425082511
 
1699 1704
1699 17041699 1704
1699 1704
 
1699 1704
1699 17041699 1704
1699 1704
 
Database techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspectionDatabase techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspection
 

Mehr von Editor IJARCET

Electrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationElectrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationEditor IJARCET
 
Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Editor IJARCET
 
Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Editor IJARCET
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Editor IJARCET
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Editor IJARCET
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Editor IJARCET
 
Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Editor IJARCET
 
Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Editor IJARCET
 
Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Editor IJARCET
 
Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Editor IJARCET
 
Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Editor IJARCET
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Editor IJARCET
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Editor IJARCET
 
Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Editor IJARCET
 
Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Editor IJARCET
 
Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Editor IJARCET
 
Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Editor IJARCET
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Editor IJARCET
 
Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Editor IJARCET
 

Mehr von Editor IJARCET (20)

Electrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturizationElectrically small antennas: The art of miniaturization
Electrically small antennas: The art of miniaturization
 
Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207Volume 2-issue-6-2205-2207
Volume 2-issue-6-2205-2207
 
Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199Volume 2-issue-6-2195-2199
Volume 2-issue-6-2195-2199
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185Volume 2-issue-6-2177-2185
Volume 2-issue-6-2177-2185
 
Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176Volume 2-issue-6-2173-2176
Volume 2-issue-6-2173-2176
 
Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172Volume 2-issue-6-2165-2172
Volume 2-issue-6-2165-2172
 
Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164Volume 2-issue-6-2159-2164
Volume 2-issue-6-2159-2164
 
Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158Volume 2-issue-6-2155-2158
Volume 2-issue-6-2155-2158
 
Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154Volume 2-issue-6-2148-2154
Volume 2-issue-6-2148-2154
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124
 
Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142Volume 2-issue-6-2139-2142
Volume 2-issue-6-2139-2142
 
Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138Volume 2-issue-6-2130-2138
Volume 2-issue-6-2130-2138
 
Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129Volume 2-issue-6-2125-2129
Volume 2-issue-6-2125-2129
 
Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118Volume 2-issue-6-2114-2118
Volume 2-issue-6-2114-2118
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113
 
Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107Volume 2-issue-6-2102-2107
Volume 2-issue-6-2102-2107
 

Kürzlich hochgeladen

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Kürzlich hochgeladen (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

386 390

  • 1. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 ESW-FI: An Improved Analysis of Frequent Itemsets Mining K Jothimani, Dr Antony SelvadossThanamani  quickly as time goes by[5]. Therefore, catching the recent Abstract—Frequent itemsets mining play an essential role in trend of data is an important issue when mining frequent many datamining tasks. The frequent itemset mining over data itemsets from data streams. Although the sliding window streams is to find an approximate set of frequent itemsets in model proposed a good solution for this problem, the transaction with respect to a given support and threshold. It should support the flexible trade-off between processing time appearing information of the patterns within the sliding and mining accuracy. It should be time efficient even when the window has to be maintained completely in the traditional user-specified minimum support threshold is small. The approach[11]. objective was to propose an effective algorithm which generates We classify the stream-mining techniques into two frequent patterns in a very less time. Our approach has been categories based on the window model that they adopt in developed based on improvement and analysis of MFI order to provide insights into how and why the techniques are algorithm. In this paper, we introduce a new algorithm ESW-FI, to maintain a dynamically selected set of item sets over useful. First, each element in the datastream can be examined a sliding window. We keep some advantages of the previous only once or twice, making traditional multiple-scan approach and resolve the drawbacks, and produce the approaches infeasible. Second, the consumption of memory improved runtime and memory consumption. The proposed space should be confined in a range, despite that data algorithm gave a guarantee of the output quality and also a elements are continuously streaming into the local site. Third, bound on the memory usage. notwithstanding the data characteristics of incoming stream . may be unpredictable; the mining task should proceed normally and offer acceptable quality of results. Fourth, the Index Terms—Data stream, data-stream mining, Efficient Window, frequent itemset and sliding window. latest analysis result of the data stream should be available as . soon as possible when the user invokes a query. In this paper we consider mining recent frequent itemsets in sliding windowsover data streams and estimate their true I. INTRODUCTION frequencies, while making only one passover the data. In our Data stream is an ordered sequence of elements that arrives design, we actively maintain potentially frequent itemsets ina in timely order. In many application domains, data is compact data structure [8][9]. Compared with existing presented in the form of data streams which originate at some algorithms, our algorithm hastwo contributions as follows: endpoint and are transmitted through the communication 1. It is a real one-pass algorithm. The obsolete channel to the central server. Different from data in transactions are not required whenthey are removed traditional static datasets, data streams are continuous, from the sliding window. unbounded, usually come with high speed and have a data 2. Flexible queries based on continuous transactions in distribution that often changes with time [1][2][3]. It is often the sliding window can be answered with an error refer to as streaming data.. bound guarantee. Some well-known examples include market basket, traffic signals, web-click packets, ATM transactions, and sensor In this paper, we propose a remarkable approximating networks. In these applications, it is desirable that we obtain method for discovering frequent itemsets in a transactional some useful information, like patterns occurred frequently, data stream under the sliding window model. In our design, from the streaming data, to help us make some advanced we actively maintain potentially frequent itemsets ina decision[4]. Data-stream mining is such a technique that can compact data structure. It is a real one-pass algorithm. The find valuable information or knowledge from a great deal of obsolete transactions are not required whenthey are removed primitive data. from the sliding window. Flexible queries based on Recently, the data stream, which is an unbounded continuous transactions in the sliding window can sequence of data elements generated at a rapid rate, provides beanswered with an error bound guarantee. a dynamic environment for collecting data sources. It is likely that the embedded knowledge in a data stream will change II. PRELIMINARIES  K. Jothimani is with the Department of Computer Science, NGM A. Related Work College, Pollachi,Tamilnadu,India. Email: jothi1083@yahoo.co.in Frequent-pattern mining has been studied extensively Dr. Antony SelvadossThanamani is with the Department of Computer indata mining, with many algorithms Proposed and Science, NGM College, Pollachi. Tamilnadu, India. Email: selvdoss@yahoo.com implemented. Frequent pattern mining and its . associatedmethods have been popularly used in association All Rights Reserved © 2012 IJARCET 386
  • 2. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 rule mining, sequential pattern mining, structured pattern elements are inserted without a deletion. Variable-sized mining, iceberg cube computation, cube gradient analysis, windows have no constraint[18]. associative classification, frequent pattern-basedclustering Fixed-sized and variable-sized windows model several [26], and so on. There are a number of research works which variants of sliding windows. For example, tuple-based windows study the problem of data-stream mining in the first decade of correspond to fixed-sized windows, andtime-based windows 21stcentury. Among these studies, Lossy Counting [3] is the most correspond to variable-sized windows. famous method of mining frequent itemsets (FIs) through data streams under the landmark window model. Besides the In recent two years, a new kind of data-stream mining user-specified minimum support threshold (ms), Lossy Counting method named DSCA has been proposed [12]. DSCA is an also utilizes an error parameter, ε, to maintain those infrequent approximate approach based on the application of the itemsets having the potential to become frequent in the future. Principle of Inclusion and Exclusion in Combinatorial With the use of ε , when an itemset is newly found, Lossy Mathematics [7]. One of the most notable features of DSCA Counting knows the upper bound of counts that itemsets may is that it would approximate the count of an arbitrary itemset, have (in the previous stream data) before it has been monitored through an equation (i.e., Equation (4) in [12]), by only the by the algorithm. sum of counts of the first few orders of its subsets over the B. Previous Algorithms data stream. There are also two techniques named counts bounding and correction, respectively, integrated within Based on the ε mechanism of Lossy Counting, [4] DSCA. The concept of Inclusion and Exclusion Principle [7] is proposed the sliding window method, which can find out valuable that it may also be applied in mining FIs under different frequent itemsets in a data stream under the sliding window window models other than the landmark window. Based on the model with high accuracy. The sliding window method theory of Approximate Inclusion–Exclusion [8], we devise and processes the incoming stream data transaction by propose a new algorithm, called ESW-FI, to discover FIs over transaction.[11][17] Each time when a new transaction is the sliding window in a transactional data stream. inserted into the window, the itemsets contained in that transaction are updated into the data structure incrementally. III. PROBLEM DESCRIPTION Next, the oldest transaction in the original window is dropped Let I = {x1, x2, …,xz} be a set of items (or attributes). An out, and the effect of those itemsets contained in it is also itemset (or a pattern) X is a subset of I and written as X =xi xj…xm. deleted. The sliding window method also has a periodical The length (i.e., number of items) of an itemset X is denoted by operation to prune away unpromising itemsets from its data |X|. A transaction, T, is an itemset and T supports an itemset, X, if structure, and the frequent itemsets are output as mining X⊆T. A transactional data stream is a sequence of continuously result whenever a user requests. incoming transactions. A segment, S, is a sequence of fixed In the sliding window model, there are two typical mining number of transactions, and the size of S is indicated by s. A methods: Moment [5] and CFI-Stream [6]. Both the two methods window, W, in the stream is a set of successive w transactions, aimed at mining Closed Frequent Itemsets (CFIs), a complete where w ≥s. A sliding window in the stream is a window of a and non-redundant representation of the set of FIs. Moment uses fixed number of most recent w transactions which slides forward a data structure called CET to maintain a dynamically selected for every transaction or every segment of transactions. We adopt set of itemsets, which includes CFIs and itemsets that form a the notation Ilto denote all the itemsets of length l together with boundary between CFIs and the rest of itemsets. The CFI-Stream their respective counts in a set of transactions (e.g., over W or S). algorithm, on the other hand, uses a data structure called DIU In addition, we use Tnand Snto denote the latest transaction and tree to maintain nothing other than all Closed Itemsets over the segment in the current window, respectively. Thus, the current sliding window. The current CFIs can be output anytime based window is either W = <Tn-w+1, …,Tn> or W = <Sn-m+1, …, Sn>, on any msspecified by the user. where w and m denote the size of W and the number of segments in W, respectively. Besides, there are still some interesting research works [9] [10] [11] on the sliding window model. In [9] a false-negative In this research, we employ a prefix tree which is organized approach named ESWCA was proposed. By employing a under the lexicographic order as our data structure, and also progressively increasing function of ms, ESWCA greatly processes the growth of itemsets in a lexicographic-ordered reduces the number of potential itemsets and would approximate away. As a result, an itemset is treated a little bit like a sequence the set of FIs over a sliding window. In [10] a data structure (while it is indeed an itemset). A superset of an itemset X is the called DSTreewas proposed to capture information from the one whose length is above |X| and has X as its prefix. We define streams. This tree captures the contents of transactions in a Growthl(X) as the set of supersets of an itemset X whose length window, and arranges tree nodes according to some canonical are l more than that of X, where l ≥0. The number of itemsets in order.According to the overview, one crucial problem is to Growthl(X) is denoted by |Growthl(X)|. efficiently compute the inclusion between two itemsets. This We adopt the symbol cnt(X) to represent the count-value (or costly operation could easily be performed when considering a just count) of an itemset X. The count of X over W, denoted as new representation for items in transactions. From now, each cntW(X), is the number of transactions in W that support X. So item is represented by a unique prime number. cntS(X) represents the count of X over a segment S. Given a A sliding window over a data stream is a bag of last N user-specified minimum support threshold (ms), where 0 <ms≤1, elements ofthe stream. There are two variants of sliding we say that X is a frequent itemset (FI) over W if cntW(X) ms×w, windows based on whether N is fixed(fixed-sized sliding otherwise X is an infrequent itemset (IFI). The FI and IFI over S windows) or variable (variable-sized sliding windows). are defined similarly to those for W. Fixed-sized windows are constrained to perform the insertions Given a data stream in which every incoming transaction has and deletions in pairs, exceptin the beginning when exactly N its items arranged in order, and a changeable value of 387 All Rights Reserved © 2012 IJARCET
  • 3. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 msspecified by the user, the problem of mining FIs over a sliding both inserted into and dropped out from the window. The window in the stream is to find out the set of frequent itemsets transaction-by-transaction sliding of a window leads to over the window at different slides. excessively high frequency of processing. In addition, since the We remark that most of the existing stream mining transit of a data stream is usually at a high speed, and the impact methods[13] [14] [15] [19] work with a basic hypothesis that of one single transaction to the entire set of transactions (in the they know the user-specified msin advance, and this current window) is very negligible, making it reasonable to handle the window sliding in a wider magnitude. Therefore, for parameter will remain unchanged all the time before the an incoming transactional data stream to be mined, we propose stream terminates. This hypothesis may be unreasonable, to process on a Segment-oriented window sliding. since in general, a user may wish to tune the value of mseach time he/she makes a query for the purpose of obtaining a more We conceptually divide the sliding window further into preferable mining result. An unchangeable msleads to a serious several, say, m, segments, where the term segment is the one we limitation and may be impractical in most real-life applications. have defined earlier in Section 3. Each of the m segments As a result, we relax this constraint in our problem that the user contains a set of successive transactions and is of the same size s is allowed to change the value of msat different times, while our (i.e., contains the equal number of s transactions). Besides, in method must still work normally. each segment, the summary (which contains I1, I2, and the fair-cutters which we will introduced later) of transactions Using the original count-values of subsets to approximate the belonging to that segment is stored in the data structure we use. count of X may sometimes bring about considerable error. For a We call the sliding of window “segment in-out,” which is better approximation, there is a possible way, which is to defined as follows. boundthe range of counts of subsets for the itemset to be approximated. Definition 1 (Segment in-out) LetScdenote the current segment Let Y be a 3-itemset (to be approximated) and y be a which is going to be inserted into the window next (after it is full 1-subset of Y. To obtain a better approximation of Y, the of s transactions). A segment in-out operation (of the window) is count-value to each subset y with respect to Y has a particular that we first insert Scinto and then extract Sn-m+1 from the original window, where n denotes the id of latest segment in the original range, which can be determined by Y’s 2-subsets that have y window. Therefore, the windows before and after a sliding are W as their common subset, respectively. This range of y’s count is = <Sn-m+1, …,Sn> and W = <Sn-m+2, …, Sn, Sc>, respectively. bounded by an upper bound (i.e., the maximum) and a lower bound (i.e., the minimum), and count-values within this range By taking this segment-based manner of sliding, each time are the set of portions of y’s original count which is more when a segment in-out operation occurs, we delete (or drop out) relevant for y with respect to Y[21]. We define Scby(Y) as the set the earliest segment, which contains the summary of transactions of count-values of y with respect to Y within the range obtained of that segment, from the current window at each sliding. As a through the aforesaid manner. Besides, the upper bound and the result, we need not to maintain the whole transactions within the lower bounds of count-values of y are denoted by uby(Y) and current window in memory all along to support window lby(Y), respectively. sliding[24][25]. In addition, we remark that the parameter m directly affects the consumption of memory. A larger value of m Lemma 1 Let cntub(Y) and cntlb(Y) respectively be the means the window will slide (update) more frequently, while the approximate counts of Y obtained by using uby(Y) and lby(Y)of increasing overhead of memory space is also considerable. In count-values to every 1-subset y ∈Y during the approximating our opinion, an adequate size of m that falls in the range between process. Then cntub(Y) ≤cntlb(Y). 5 and 20 may be suitable for general data streams.[20] Proof: .Let Tub and Tlbbe the sums of counts of 1-subsets of Y Theorem 1 For a 2-itemset X and a threshold ms, let TPub(X) obtained by choosing uby(Y) and lby(Y) for each y ∈ Y, and TPlb(X) be the true-positive rates of X’s 3-supersets in the respectively. Then Tub ≥ Tlbsince uby(Y) ≥ lby(Y) for each y. mining result resulting from adopting Ubc and Lbc to all the According to (1) with the parameters setting m=3 and k=2, we 1-subsets of each superset, respectively. Then we have TPub(X) ≤ can eventually obtain the following simplified equation: cnt(Y)≒ TPlb(X). (1-) c + (d, where the symboljk md denotes the Proof .Let P be the number of FIs of X’s 3-supersets with coefficient of linearly transformed Chebyshev polynomial, and c respect to ms. Also, let Pub and Plbrespectively be the numbers and d represent the sum of counts of Y’s 2-subsets and that of of true FIs of X’s 3-supersets found by choosing Ubc and Lbc counts of Y’s 1-subsets, respectively. Since the value of is to 1-subsets. According to Lemma 1, we have cntub(Y) ≤ less than 1, the coefficient cntlb(Y) for each 3-superset Y of X, which means that the ( for 1-subset term is then negative, which means number of true-positive itemsets (i.e., FIs) obtained by that the value of 1-subset term will be subtracted from the choosing Lbc is at least equal to that of choosing Ubc, i.e., Pub ≤ other term. Thus, by substituting Tuband Tlbfor drespectively Plb. Since TPub(X) = Pub/P and TPlb(X) = Plb/P, we then have in the approximate equation and knowing that Tub ≥Tlb, we TPub(X) ≤TPlb(X). have cntub(Y) ≤ntlb(Y). c Theorem 2 For a 2-itemset X and a threshold ms, let TNub(X) and TNlb(X) be the true-negative rates of X’s 3-supersets in the mining result resulting from adopting Ubc and Lbc to all the IV. EFFICIENT SLIDING-WINDOW PROCESSING 1-subsets of each superset, respectively. Then we have TNub(X) ≥ In research works under the sliding window model [4] [5] [6], TNlb(X). the sliding of window is handled transaction by transaction; however, we have a different opinion. Unlike the landmark Proof .Let N be the number of IFIs of X’s 3-supersets with window model, transactions in the sliding window model will be respect to ms. Also, let Nub and Nlbrespectively be the numbers of All Rights Reserved © 2012 IJARCET 388
  • 4. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 true IFIs of X’s 3-supersets determined by choosing Ubc and Lbc phase, which is activated when the window is full of generated to 1-subsets. According to Lemma 1, we have cntub(Y) ≤cntlb(Y) transactions. for each 3-superset Y of X, which means that the number of true-negative itemsets (i.e., IFIs) determined by choosing Ubc is at least equal to that of choosing Lbc, i.e., Nub ≥Nlb. Since TPub(X) = Nub/N and TPlb(X) = Nlb/N, we then have TPub(X) ≥ TPlb(X). Now we discuss the issue of selecting suitable count-values in the bounded range of subsets for approximating an itemset. According to Lemma 1 in Section 3, using the lower bound of counts (Lbc) for 1-subsets always results in a higher approximate count for an itemset than that of using the upper bound of counts (Ubc). It follows from Theorem 1 and Theorem 2 that, adopting Lbc to the 1-subsets to approximate the 3-supersets of an 2-itemset X will reach a higher true-positive rate (i.e., recall ratio) in the result than that of using Ubc, while choosing Ubc to the 1-subsets will obtain a higher true-negative rate (which usually concerns a Fig 5.2: Space Requirement of ESW-FI with T20 higher precision ratio) in the result than that of using Lbc. However, it is actually unknown whether to adopt Ubc or Lbc to1-subsets for approximating an itemset Y. V. EXPERIMENTAL RESULT We performed extensive experiments to evaluate the performance of our algorithmand we present results in this section. We compared our algorithm (ESW-FI)with the MFI algorithm [7].Then we tested the adaptability of ESW-FI by changing the data distribution ofthe dataset. The experiments were performed on a Pentium processor with 4G memory,running Windows 2007 (SP4). Our algorithm is implemented in C# and compiledby using Microsoft Visual Fig 5.3: Execution time of MFI Vs ESW-FI for Updated Studio 2008. In the implementation of three algorithms, weused Transactions the same data structures and subroutines in order to minimize the performancedifferences caused by minor differences. We have usedT20 data set in the experiments. The T20 dataset is generated by the IBM data generator [1]. For T20, the average size of transactions, the average size of the maximal potential frequent itemsets and the numberof items are 20, 4, and 1 000, respectively. Fig 5.1: Execution time of MFI Vs ESW-FI for T20 Fig 5.4: Space Requirement between MIF Vs ESW-FI with more number of transactions Comparison between MFI and ESW-FI basedon different block size to observe the influence of different block sizes on algorithm, we take 2×104, 5×104and 10 × 104 as an example. Meanwhile, the size of the window is fixed 5 × 105.In this case, this window contains 25, 10 and 5 blocks, respectively. The data sets were broken into batches of10K size transactions and provided to our program through standard The experimental results of the execution time and the input.For ESW-FI, the whole procedure can be divided into two space requirement are plotted in Figure 5.1 to 5.4, different phases namely, window initialization phase, which is respectively. We collected the total number of seconds and activated when the number of transactions generated so far is not the size of storage space required in KB per 10 × 5 more than the size of the sliding window and windowsliding 389 All Rights Reserved © 2012 IJARCET
  • 5. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 transactions. These results basically keep stable as the [18] Y. Chi, H. Wang, P.S. Yu, & R.R. Muntz, “Moment: maintaining closed frequent itemsets over a stream sliding window”, Proc. 4th window moves forward. The above experimental results IEEE Conf. on Data Mining, Brighton, UK, 2004, pp. 59–66. provide evidence that two algorithm can handle long data [19] N. Jiang & L. Gruenwald, “CFI-Stream: mining closed frequent streams both. As ESW-FI may keep multiple triples for one Itemsets in data streams”, Proc. 12th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, Philadelphia, PA,USA, 2006, significant itemset, it needs more memory and time than MFI pp. 592–597. does. [20] Quest Data Mining Synthetic Data Generation Code. Available: http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_minin VI. CONCLUSION g/datasets/syndata.html [21] P. Indyk, D. Woodruff, “Optimal approximations of the frequency We mainly discuss how to discover recent frequent moments of data streams”, Proceedings of the thirty-seventh annual itemsets in sliding windows over data streams. An efficient ACM symposium on Theory of computing, pp.202–208, 2005. [22] H.F Li, S.Y. Lee, M.K. Shan, “An Efficient Algorithm for Mining algorithm is presented in detail. Compared with previous Frequent Itemsets over the Entire History of Data Streams”, In algorithms, this algorithm doesn’t keep the data in the Proceedings of First International Workshop on Knowledge Discovery window, which considerably increase the scalability of the in Data Streams 9IWKDDS, 2004. [23] H.F Li, S.Y. Lee, M.K. Shan, “Online Mining (Recently) Maximal algorithm. Moreover, it can figure out answers with an error Frequent Itemsets over Data Streams”, In Proceedings of the 15th bound guarantee for continuous transactions in the window. IEEE International Workshop on Research Issues on Data Engineering The extensive experiment results demonstrate the (RIDE), 2005. [24] Lin C.-H., Chiu D.-Y., Wu Y.-H. and Chen A.L.P.: Mining Frequent effectiveness and efficiency of our approach. We can Itemsets from Data Streams with a Time-Sensitive Sliding Window. In implement this algorithm in traffic networks especially for Proc SIAM International Conference on Data Mining. (2005). mining the itemsets in high speed streams. [25] Ho, C. C., Li, H. F., Kuo, F. F., & Lee, S. Y. (2006). Incremental mining of sequential patterns over a stream sliding window. In Proceedings of IEEE international workshop on mining evolving and REFERENCES streaming data. [26] K Jothimani, Dr Antony SelvadossThanmani, “MS: Multiple Segments [1] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules with Combinatorial Approach for Mining Frequent Itemsets Over Data between Sets of Items in Large Databases. In Proceedings of the 1993 Streams”, IJCES International Journal of Computer Engineering International Conference on Management of Data, pp. 207-216, 1993. Science, Volume 2 Issue 2 ISSN : 2250:3439 [2] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association [27] K Jothimani, Dr Antony SelvadossThanmani, “An Algorithm for Rules. In Proceedings of the 20th International Conference on Very Mining Frequent Itemsets,” “International Journal for Computer Large Data Bases, pp. 487-499, 1994 Science Engineering and Technology IJCSET, March 2012,Vol 2, [3] J.H. Chang & W.S. Lee, “A sliding window method for finding recently Issue 3,1012-1015. frequent itemsets over online data streams’, Journal of Information Science and Engineering, 20(4), 2004, pp. 753–762 [4] Y. Zhu & D. Shasha, “Stat Stream: statistical monitoring of thousands of data streams in real time”, Proc. 28th Conf. on Very Large Data K Jothimanireceived her Bachelor degree in Computer Science from Bases, Hong Kong, China, 2002, pp. 358–369. Bharathidasan University, Trichy in 2003. She received her Master degree in [5] G.S. Manku& R. Motwani, “Approximate frequency counts over data Computer Applications in 2008. She pursued her Master of Philosophy in streams”, Proc. 28th Conf. on Very Large Data Bases, Hong Kong, Computer Science in the year 2009 from Vinayaka Missions University, China, 2002, pp. 346–357. Salem. Currently she is a research scholar of the Department of Computer [6] J.H. Chang & W.S. Lee, “A sliding window method for finding recently Science, NGM College under Bharathiyar University, Coimbatore. She had frequent itemsets over online data streams”, Journal of Information five years of experience in the computer field in technical as well as science and Engineering, 20(4), 2004, pp. 753–762. non-technical. She is a life member of Indian Society for Technical [7] M.N. Garofalakis, J. Gehrke, & R. Rastogi, Querying and mining data Education from the year 2009. Also she is active member of Computer streams: you only get one look (A Tutorial), Proc. 2002 ACM Society of India (CSI). She has published more than ten papers in SIGMOD Conf. on Management of Data, Madison, Wisconsin, 2002, international and national conferences including international journals. Her p. 635. area of interests includes Data mining, Knowledge Engineering and Image [8] Y. Zhu & D. Shasha, “Stat Stream: statistical monitoring of thousands Processing. of data streams in real time”, Proc. 28th Conf. on Very Large Data Email: jothi1083@yahoo.co.in Bases, Hong Kong, China, 2002, pp. 358–369. . [9] G.S. Manku& R. Motwani, “Approximate frequency counts over data Dr Antony SelvadossThanamaniis presently working as professor and streams”, Proc. 28th Conf. on Very Large Data Bases, Hong Kong, Head, Dept of Computer Science, NGM College, Coimbatore, China, 2002, pp. 346–357. India(affiliated to Bharathiar University, Coimbatore). He has published [10] J.H. Chang & W.S. Lee, “A sliding window method for finding recently many papers in international/national journals and written many books. His frequent itemsets over online data streams”, Journal of Information areas of interest include E-Learning, Software Engineering, Data Mining, science and Engineering, 20(4), 2004, pp. 753–762. Networking, Parallel and Distributed Computing. He has to his credit 25 [11] J. Cheng, Y. Ke, & W. Ng,” Maintaining frequent itemsets over years of teaching and research experience. His current research interests high-speed data streams”, Proc. 10th Pacific-Asia Conf. on include Grid Computing, Cloud Computing, Semantic Web. He is a life Knowledge Discovery and Data Mining, Singapore, 2006, pp.462–467. member of Computer Society of India, Life member of Indian Society for [12] C.K.-S. Leung & Q.I. Khan, “DSTree: a tree structure for the mining Technical Education, Life member of Indian Science Congress, Life member of frequent sets from data streams,” Proc. 6th IEEE Conf. on Data of Computer Science, Teachers Associates, Newyork. Mining, Hong Kong, China, 2006, pp. 928–932. Email:-selvdoss@yahoo.com [13] B. Mozafari, H. Thakkar, & C. Zaniolo, “Verifying and mining frequent patterns from large windows over data streams”, Proc. 24th Conf. on Data Engineering, Mexico, 2008, pp. 179–188. [14] K.-F. Jea& C.-W. Li, “Discovering frequent itemsets over transactional data streams through an efficient and stable approximate approach, Expert Systems with Applications”, 36(10), 2009, pp. 12323–12331. [15] F. Bodon, “A fast APRIORI implementation”, Proc. ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), 2003. [16] N. Jiang and L. Gruenwald, “Research Issues in Data Stream Association Rule Mining”. In SIGMOD Record, Vol. 35, No. 1, Mar. 2006. [17] Frequent Itemset Mining Implementations Repository (FIMI).Available: http://fimi.cs.helsinki.fi/ All Rights Reserved © 2012 IJARCET 390