ISSN: 2277 – 9043
International Journal of Advanced Research in Computer Science and Electronics Engineering
Volume 1, Issue 5, July 2012



          Concise Representation of Multi-level Association Rules using MinMaxExact Rules

                                        R. Vijaya Prakash, A. Govardhan, SSVN. Sarma


   Abstract— Association Rule mining plays an important role in the discovery of knowledge and information. Association Rule mining discovers a huge number of rules for any dataset under different support and confidence values, and many of them are redundant, especially in the case of multi-level datasets. Mining non-redundant Association Rules from multi-level datasets is a big concern in the field of Data mining. In this paper, we present a definition for redundancy and a concise representation called the Reliable Exact basis for representing non-redundant Association Rules from multi-level datasets. The resulting non-redundant Association Rules are a lossless representation for any dataset.

   Index Terms—Association Rules, Itemsets, Non-Redundant Reliable Rules.

   Manuscript received June 15, 2012.
   R. Vijaya Prakash, Department of Informatics, Kakatiya University, Warangal, India (e-mail: vijprak@hotmail.com), 9951332996.
   Dr. A. Govardhan, Department of Computer Science and Engineering, JNT University, Hyderabad, India (e-mail: givardhan_cse@yahoo.co.in).
   Prof. SSVN. Sarma, Department of Computer Science and Engineering, Vaagdevi College of Engineering, Warangal, India (e-mail: ssvn.sarma@gmail.com).

                     I. INTRODUCTION
   The huge number of extracted rules is a big problem for Association Rule mining. In particular, many of the extracted rules are considered redundant since they convey the same meaning to the user or can be replaced by other rules. Many efforts have been made to reduce the size of the extracted rule set. A number of representations of frequent patterns have been proposed; one of them, the closed itemsets, is of particular interest as it can be applied to generate non-redundant rules [10, 12, 18]. The use of frequent closed itemsets shows clear promise for reducing the number of extracted rules [13, 17, 19].
   However, these approaches have only dealt with redundancy in single level datasets. Multi-level datasets, in which the items are not all at the same concept level, contain information at different abstraction levels. The approaches used to find frequent itemsets in single level datasets miss information, as they only look at one level of the dataset. Thus techniques that consider all the levels are needed [6, 7, 8, 9].
   Moreover, rules derived from multi-level datasets can have the same issues with redundancy as those from a single level dataset. While approaches used to remove redundancy in single level datasets [13, 17, 19] can be adapted for use in multi-level datasets, they do not address the hierarchical redundancy that arises when one rule at a given level gives the same information as another rule at a different level.
   In this paper, we present a Reliable basis representation of non-redundant Association Rules. We then look into this hierarchical redundancy and propose an approach from which more non-redundant rules can be derived. We use the same definition of non-redundant rules as in single level datasets, but add to this definition a requirement that considers the different levels of the item(s) when determining redundancy. By doing so, more redundant Association Rules can be eliminated. We also show that it is possible to derive all of the Association Rules without loss of information.
   The paper is organized as follows. Section 2 briefly discusses some related work. In Section 3, we first discuss the algorithms for generating frequent patterns in multi-level datasets, and then we present the definition of hierarchical redundancy and introduce our approach for deriving multi-level non-redundant exact basis rules. In Section 4, we discuss the redundancy in Association Rules and present a definition of redundant rules. Experiments and results are presented in Section 5. Finally, Section 6 concludes the paper.

                     II. RELATED WORK
   The approaches proposed in [13, 18] make use of the closure of the Galois connection [4] to extract non-redundant rules from frequent closed itemsets instead of from frequent itemsets. One difference between the two approaches is the definition of redundancy. Among rules which have the same confidence, the approach proposed in [18] extracts the rules with shorter antecedents and shorter consequents, while the method proposed in [13] defines the non-redundant rules as those which have minimal antecedents and maximal consequents. The definition proposed in [17] is similar to that of [13]. However, the requirement for redundancy is relaxed, and the weaker requirement causes more rules to be considered redundant and thus eliminated. Most importantly, [17] proved that the elimination of such redundant rules increases the reliability of the extracted rules and the capacity of the extracted non-redundant rules for solving problems. However, the work mentioned above has only focused on datasets where all items are at the same concept level. Thus it does not need to consider the redundancy that can occur when there is a hierarchy among the items.
   A multi-level dataset is one which has an implicit taxonomy or concept tree, as shown in Figure 1. The items in the dataset exist at the lowest concept level but are part of a hierarchical structure and organization.

   Figure 1. A simple example of a product taxonomy: Food is divided into Milk, Bread and Fruit; Milk into 2 percent and Skimmed; Bread into White and Wheat; Fruit into Apple and Banana.
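To make the taxonomy concrete, the following short Python sketch (an illustration added here, not code from the paper) represents the Figure 1 concept tree with parent pointers and tests the ancestor relationship that the later sections rely on; the item names are taken directly from Figure 1.

```python
# Minimal sketch: the Figure 1 taxonomy as parent pointers, plus a helper that
# tests whether one item is an ancestor of another.
PARENT = {
    "Milk": "Food", "Bread": "Food", "Fruit": "Food",
    "2 percent": "Milk", "Skimmed": "Milk",
    "White": "Bread", "Wheat": "Bread",
    "Apple": "Fruit", "Banana": "Fruit",
}

def is_ancestor(ancestor, item):
    """Return True if `ancestor` lies on the path from `item` up to the root."""
    while item in PARENT:
        item = PARENT[item]
        if item == ancestor:
            return True
    return False

print(is_ancestor("Bread", "White"))   # True
print(is_ancestor("Milk", "Banana"))   # False
```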


   Thus, for example, 'Old Mills' is an item at the lowest level of the taxonomy, but it also belongs to the higher concept category 'bread' and to the more refined category 'white bread'.
   Because of the hierarchical nature of a multi-level dataset, a new approach to finding frequent itemsets for multi-level datasets has to be considered. Work has been done in adapting approaches originally made for single level datasets into techniques usable on multi-level datasets. The paper in [5][7] shows one of the earliest approaches proposed to find frequent itemsets in multi-level datasets, and it was later revisited in [6]. This work primarily focused on finding frequent itemsets at each of the levels in the dataset and did not focus heavily on cross-level itemsets (those itemsets that are composed of items from two or more different levels). Referring to Figure 1 for an example, the frequent itemset {'Dairyland-2%-milk', 'white-bread'} is a cross-level itemset, as the first item is from the lowest level while the second item is from a different concept level. In fact the cross-level idea was an addition to the work being proposed. Further work proposed an approach which included finding cross-level frequent itemsets [15]. This later work also performs more pruning of the dataset to make finding the frequent itemsets more efficient. However, even with all this work the focus has been on finding the frequent itemsets as efficiently as possible, not on the issues of quality and/or redundancy that have been studied for single level datasets. Some brief work presented in [6] discusses removing rules which are hierarchically redundant, but it relies on the user giving an expected confidence variation margin to determine redundancy. There appears to be a void in dealing with hierarchical redundancy in Association Rules derived from multi-level datasets. This work attempts to fill that void and show an approach that deals with hierarchical redundancy without losing any information.

                     III. MINING FREQUENT PATTERNS
   From the beginning of Association Rule mining in [1][3], the first step has always been to find the frequent patterns or itemsets. The simplest way to do this is through the use of the Apriori algorithm [2]. However, Apriori is not designed to extract frequent itemsets at multiple levels in a multi-level dataset; it is designed for use on single level datasets. It has, however, been adapted for multi-level datasets.
   One adaptation of Apriori to multi-level datasets is the ML_T2L1 algorithm [5, 6]. The ML_T2L1 algorithm uses a transaction table that has the hierarchy information encoded into it. Each level in the dataset is processed individually. Firstly, level 1 (the highest level in the hierarchy) is analyzed for large 1-itemsets using Apriori. The list of level 1 large 1-itemsets is then used to filter and prune from the transaction dataset any item that does not have an ancestor in the level 1 large 1-itemset list, and to remove any transaction which has no frequent items (that is, it contains only infrequent items when assessed using the level 1 large 1-itemset list). From the level 1 large 1-itemset list, level 1 large 2-itemsets are derived (using the filtered dataset). Then level 1 large 3-itemsets are derived, and so on, until there are no more frequent itemsets to discover at level 1. Since ML_T2L1 defines that only the items that are descendants of frequent items at level 1 (essentially, they must descend from level 1 large 1-itemsets) can be frequent themselves, the level 2 itemsets are derived from the filtered transaction table. For level 2, the large 1-itemsets are discovered, from which the large 2-itemsets are derived, and then the large 3-itemsets, etc. After all the frequent itemsets are discovered at level 2, the level 3 large 1-itemsets are discovered (from the same filtered dataset) and so on. ML_T2L1 repeats until either all levels are searched using Apriori or no large 1-itemsets are found at a level.
   As the original work shows [5, 6], ML_T2L1 does not find cross-level frequent itemsets. We have added the ability for it to do this. At each level below 1 (so starting at level 2), when large 2-itemsets or later are derived, the Apriori algorithm is not restricted to using just the large (n-1)-itemsets at the current level, but can generate combinations using the large itemsets from higher levels. The only restrictions are that a derived frequent itemset cannot contain an item that has an ancestor-descendant relationship with another item within the same itemset, and that the minimum support threshold used is that of the current level being processed (which is actually the lowest level in the itemset).
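The level 1 filtering step of ML_T2L1 can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation; it assumes the transaction table encodes the hierarchy in level-wise item IDs such as 1-2-1 (the encoding used for the example dataset in Section IV), so the level 1 ancestor of an item is obtained by keeping its first component. The transactions and the threshold below are chosen only for the demonstration.

```python
from collections import Counter

def level1_ancestor(item):
    # "1-2-1" -> "1-*-*": keep the level 1 component, wildcard the lower levels.
    return item.split("-")[0] + "-*-*"

def filter_transactions(transactions, min_sup_level1):
    # Support counts of level 1 ancestors (counted once per transaction).
    counts = Counter()
    for t in transactions:
        counts.update({level1_ancestor(i) for i in t})
    large1 = {a for a, n in counts.items() if n >= min_sup_level1}
    # Prune items whose level 1 ancestor is not large, then drop empty transactions.
    filtered = [[i for i in t if level1_ancestor(i) in large1] for t in transactions]
    return large1, [t for t in filtered if t]

# A few transactions from Table 1, with a demonstration threshold of 2.
transactions = [["1-1-1", "1-2-1", "2-1-1", "2-2-1"],
                ["1-1-1", "2-1-1", "2-2-2", "3-2-3"],
                ["3-2-3", "4-1-1", "5-2-4", "7-1-3"]]
large1, filtered = filter_transactions(transactions, min_sup_level1=2)
print(sorted(large1))   # ['1-*-*', '2-*-*', '3-*-*']
print(filtered)         # pruned transactions used when mining the deeper levels
```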


   A second, more recent adaptation of Apriori for use in multi-level datasets is a top-down progressive deepening method by [15]. This approach was developed to find level-crossing Association Rules by extending existing multi-level mining techniques, and it uses reduced support and refinement of the transaction table at every hierarchy level. This algorithm works very similarly to ML_T2L1 in that it uses a transaction table which has the hierarchy encoded into it, and each level is processed individually, one at a time. Initially, level 1 is processed, followed by levels 2, 3 and so on, until the lowest level is reached and processed or a level generates no large 1-itemsets. At each level, the large 1-itemsets are first derived and are then used to filter / prune [11] the transaction table (as described for ML_T2L1). This filtering happens at every level, not just at level 1 as in ML_T2L1. Then large 2-itemsets, 3-itemsets, etc. are derived from the filtered table. When it comes to level 2 and lower, the itemsets are not restricted to just the current level, but can include itemsets from the large itemset lists of higher levels. This is how level-crossing Association Rules are found. For the itemsets that span multiple levels, the minimum support threshold of the lowest level in the itemset is used as the threshold to determine whether the itemset is frequent / large.

          IV. GENERATION OF NON-REDUNDANT MULTI-LEVEL ASSOCIATION RULES
   The use of frequent itemsets as the basis for Association Rule mining often results in the generation of a large number of rules. This is a widely recognized problem. More recent work has demonstrated that the use of closed itemsets and generators can reduce the number of rules generated [14, 16, 17, 18, 20]. This has helped to greatly reduce redundancy in the rules derived from single level datasets. Despite this, redundancy still exists in the rules generated from multi-level datasets, even when using some of the methods designed to remove redundancy. We call this redundancy hierarchical redundancy. In this section we first introduce hierarchical redundancy in multi-level datasets and then detail our work to remove this redundancy without losing information.

   A. Hierarchical Redundancy
   Whether a rule is interesting and/or useful is usually determined through the support and confidence values that it has. However, this does not guarantee that all of the rules that have a high enough support and confidence actually convey new information. Following is an example transaction table for a multi-level dataset (Table 1).

          Table 1. Simple multi-level transaction dataset.
          Transaction ID   Items
          1                [1-1-1, 1-2-1, 2-1-1, 2-2-1]
          2                [1-1-1, 2-1-1, 2-2-2, 3-2-3]
          3                [1-1-2, 1-2-2, 2-2-1, 4-1-1]
          4                [1-1-1, 1-2-1]
          5                [1-1-1, 1-2-2, 2-1-1, 2-2-1, 4-1-3]
          6                [1-1-3, 3-2-3, 5-2-4]
          7                [1-3-1, 2-3-1]
          8                [3-2-3, 4-1-1, 5-2-4, 7-1-3]

   This simple multi-level transactional dataset has 3 levels, with each item belonging to the lowest level. The item ID in the table holds the hierarchy information for each item. Thus the item 1-2-1 belongs to the first category at level 1; at level 2 it belongs to the second sub-category of the first level 1 category; and at level 3 it belongs to the first sub-category of that level 2 parent category. From this transaction set we use the ML_T2L1 algorithm with the cross-level add-on and a minimum support value of 4 for level 1 and 3 for levels 2 and 3 to discover the frequent itemsets. From these frequent itemsets, the closed itemsets and generators are derived (Table 3). The itemsets, closed itemsets and generators come from all three levels.
   Finally, from the closed itemsets and generators the Association Rules can be generated. In this example we use the ReliableExactRule approach presented in [17, 18] to generate the exact basis rules. The discovered rules are from multiple levels and include cross-level rules (due to cross-level frequent itemsets). The ReliableExactRule approach can remove redundant rules but, as we will show, it does not remove hierarchical redundancy. The rules given in Table 4 are derived from the closed itemsets and generators in Table 3 when the minimum confidence threshold is set to 0.5 or 50%.

      Table 3. Frequent closed itemsets and generators derived from the frequent itemsets in Table 2.
      Closed Itemsets            Generators
      [1-*-*]                    [1-*-*]
      [1-1-*]                    [1-1-*]
      [1-1-1]                    [1-1-1]
      [1-*-*, 2-2-*]             [2-2-*]
      [2-*-*, 1-1-*]             [2-*-*, 1-1-*]
      [1-1-*, 1-2-*]             [1-2-*]
      [1-1-*, 2-2-*]             [2-2-*]
      [1-*-*, 2-2-1]             [2-2-1]
      [2-*-*, 1-1-1]             [2-*-*, 1-1-1]
      [1-2-*, 1-1-1]             [1-2-*, 1-1-1]
      [1-*-*, 2-1-*, 2-2-*]      [2-1-*]
      [2-*-*, 1-1-*, 1-2-*]      [2-*-*, 1-2-*]
      [1-1-*, 1-2-*, 2-2-*]      [1-2-*, 2-2-*]
      [1-1-*, 2-1-*, 2-2-*]      [2-1-*]
      [1-*-*, 2-1-1, 2-2-*]      [2-1-1]
      [1-1-*, 2-1-1, 2-2-*]      [2-1-1]
      [1-1-*, 2-2-1, 1-2-*]      [2-2-1]
      [2-1-*, 1-1-1, 2-2-*]      [2-1-*], [2-2-*, 1-1-1]
      [2-2-*, 1-1-1, 2-1-1]      [2-1-1], [2-2-*, 1-1-1]

      Table 4. Exact basis Association Rules derived from closed itemsets and generators in Table 3.
      No.   Rule                          Supp
      1     [2-2-*] ==> [1-*-*]           0.571
      2     [1-2-*] ==> [1-1-*]           0.571
      3     [2-2-*] ==> [1-1-*]           0.571
      4     [2-2-1] ==> [1-*-*]           0.428
      5     [2-1-*] ==> [1-*-*, 2-2-*]    0.428
      6     [2-1-*] ==> [1-1-*, 2-2-*]    0.428
      7     [2-1-1] ==> [1-*-*, 2-2-*]    0.428
      8     [2-1-1] ==> [1-1-*, 2-2-*]    0.428
      9     [2-2-1] ==> [1-1-*, 1-2-*]    0.428
      10    [2-1-*] ==> [1-1-1, 2-2-*]    0.428
      11    [2-2-*, 1-1-1] ==> [2-1-*]    0.428
      12    [2-1-1] ==> [2-2-*, 1-1-1]    0.428
      13    [2-2-*, 1-1-1] ==> [2-1-1]    0.428

   The ReliableExactRule algorithm lists all of the rules in Table 4 as important and non-redundant. However, we argue that there are still redundant rules; this type of redundancy is beyond what the ReliableExactRule algorithm was designed for. Looking at the rules in Table 4, we claim that rule 4 is redundant to rule 1, rule 7 is redundant to rule 5, rule 8 is redundant to rule 6, and rule 12 is redundant to rule 10. For example, the item 2-2-1 (from rule 4) is a child of the more general/abstract item 2-2-* (from rule 1), so rule 4 is in fact a more specific version of rule 1.
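As a small illustration of this point (ours, using the level-encoded item IDs of Table 1, where '*' marks unspecified lower levels), the following sketch checks that rule 4's antecedent item 2-2-1 descends from rule 1's antecedent item 2-2-*, while both rules share the same consequent:

```python
# Illustration only: an item such as "2-2-*" is an ancestor of "2-2-1" because all
# of its non-wildcard components match the corresponding components of the child.
def is_ancestor(general, specific):
    g, s = general.split("-"), specific.split("-")
    return g != s and all(gp == "*" or gp == sp for gp, sp in zip(g, s))

rule1 = (["2-2-*"], ["1-*-*"])   # rule 1 of Table 4: antecedent, consequent
rule4 = (["2-2-1"], ["1-*-*"])   # rule 4 of Table 4: antecedent, consequent

same_consequent = rule1[1] == rule4[1]
specialised = is_ancestor(rule1[0][0], rule4[0][0])
print(same_consequent and specialised)   # True: rule 4 only specialises rule 1
```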


   Because we know that, for rule 1, 2-2-* is enough to fire the rule with consequent C, whereas rule 4 requires 2-2-1 to fire with consequent C, any item that is a descendant of 2-2-* will cause a rule to fire with consequent C; it does not have to be 2-2-1. Thus rule 4 is more restrictive. Because 2-2-1 is part of 2-2-*, having rule 4 does not actually bring any new information to the user, as the information contained in it is already part of the information contained in rule 1. Thus rule 4 is redundant. We define hierarchical redundancy in exact Association Rules through the following definition.
   Definition 1: Let R1 = X1 => Y and R2 = X2 => Y be two exact Association Rules with exactly the same itemset Y as the consequent. Rule R1 is redundant to rule R2 if (1) at least one item in X1 is a descendant of an item in X2, (2) at least one item in X2 is an ancestor of an item in X1, and (3) the other, non-ancestor items in X2 are all present in itemset X1.
   From this definition, if for an exact Association Rule X1 => Y1 there does not exist any other rule X2 => Y2 such that at least one item in X1 shares an ancestor-descendant relationship with X2 containing the ancestor(s), and all other items of X2 are present in X1, then X1 => Y1 is a non-redundant rule. To test for redundancy, we take this definition and add another condition for a rule to be considered valid. A rule X => Y is valid if there is no ancestor-descendant relationship between any items in itemsets X and Y. Thus, for example, 1-2-1 => 1-2-* is not a valid rule, but 1-2-1 => 1-1-3 is a valid rule. If this condition is not met by a rule X2 => Y2 when testing whether X1 => Y1 is redundant to X2 => Y2, then X1 => Y1 is a non-redundant rule, as X2 => Y2 is not a valid rule.

   B. Generating Exact Basis Rules
   As previous work has shown [14, 17, 18], using frequent closed itemsets in the generation of Association Rules can reduce the quantity of discovered rules. Because we wish to remove redundancy on top of the redundancy already being removed, our approach uses the closed itemsets and generators to discover the non-redundant rules. [14, 17, 18] have proposed condensed, more concise bases to represent non-redundant exact rules. Exact rules refer to rules whose confidence is 1. The proposed approach will be extended to other rules (i.e., so called approximate rules). The following definitions outline the two bases:
   Definition 2: For the Min-MaxExact (MME) basis, let C be the set of discovered frequent closed itemsets and, for each closed itemset c in C, let Gc be the set of generators of c. From this, the exact basis for Min-Max is
      MinMaxExact = { g => (c \ g) | c ∈ C, g ∈ Gc, g ≠ c, and there exists no rule g′ => (c′ \ g′) with c′ ∈ C, g′ ∈ Gc′, c′ ⊆ c, g′ ⊆ c such that g is a descendant set of g′ and g′ has no ancestor or descendant of c or g }.
   Definition 3: (Reliable Exact Basis without Hierarchy Redundancy) Let C be the set of frequent closed itemsets and, for each frequent closed itemset c, let Gc be the set of minimal generators of c. The Reliable exact basis is
      ReliableExact = { g => ((c′ or c) \ g) | c ∈ C, c′ ⊂ c, g ∈ Gc, and there exists no rule g′ => (c′ \ g′) with c′ ⊂ c, g′ ⊂ c such that g is a descendant set of g′ and g′ has no ancestor or descendant of c or g }.

   Thus the algorithms to extract non-redundant multi-level rules using either MinmaxExactHR or ReliableExactHR are given as follows:

   Algorithm 1: MinmaxExactHR()
   Input: C: a set of frequent closed itemsets; G: a set of minimal generators. For g ∈ G, g.closure is the closed itemset of g (below, c denotes g.closure and c′ denotes g′.closure).
   Output: A set of non-redundant multi-level rules.
   1.  MinMaxExact := ∅
   2.  for each k = 1 to v
   3.    for each k-generator g ∈ Gk
   4.      nonRedundant := true
   5.      if g ≠ g.closure
   6.        for all g′ ∈ G
   7.          if (g′ ≠ g)
   8.            if (g′ is an ancestor set of g) and ((c′ = c) or (g = g′)) and (g′ is not an ancestor set of c′)
   9.              then nonRedundant := false
   10.             break
   11.           end if
   12.         end if
   13.       end for
   14.       if nonRedundant = true
   15.         insert { g => (c \ g), g.supp } in MinMaxExact
   16.       end if
   17.     end if
   18.   end for
   19. end for
   20. return MinMaxExact

   Algorithm 2: ReliableExactHR()
   Input: C: a set of frequent closed itemsets; G: a set of minimal generators. For g ∈ G, g.closure is the closed itemset of g (below, c′ denotes g′.closure).
   Output: A set of non-redundant multi-level rules.
   1.  ReliableExact := ∅
   2.  for all c ∈ C
   3.    for all g ∈ Gc
   4.      nonRedundant := false
   5.      if there is no c′ ∈ C such that c′ ⊂ c and g ∈ Gc′ with ((c′ or c) \ g) ⊆ (c \ g)
   6.        then nonRedundant := true
   7.      else
   8.        nonRedundant := false
   9.        break
   10.     end if
   11.     for all g′ ∈ G
   12.       if g′ ≠ g
   13.         if (g′ is an ancestor set of g) and (c′ = c or g′ = g) and (g′ is not an ancestor set of (c or g)) and (g′ is not a descendant set of (c or g′))
   14.           then nonRedundant := false
   15.           break
   16.         end if
   17.       end if
   18.     end for
   19.     if nonRedundant = true
   20.       insert { g => ((c′ or c) \ g), g.supp } in ReliableExact
   21.     end if
   22.   end for
   23. end for
   24. return ReliableExact


   C. Deriving Exact Rules from the Exact Basis Rules
   The Min-MaxExact and ReliableExact approaches have proven that they can deduce all of the exact rules from their basis sets [17]. Compared with the Min-MaxExact and ReliableExact approaches, our work results in a smaller exact basis set by removing not only the redundant rules that those approaches remove, but also the hierarchically redundant rules. If we can recover all of the hierarchically redundant rules, then we can derive all the exact rules by using the Min-MaxExact or ReliableExact recovery algorithm. This ensures that all the exact rules can still be derived and, by achieving this, our approach is a lossless representation of the exact Association Rules. The following algorithm is designed to recover the hierarchically redundant rules from the exact basis. By adding it to the algorithms used by Min-MaxExact and ReliableExact to derive the exact rules, the existing ReliableExact recovery algorithm is able to derive all of the exact rules, because our algorithm gives it a basis set that includes the hierarchically redundant rules (which the ReliableExact approach would not have removed in the first place). The basic idea is, for each exact basis rule, to first construct from the generators all possible exact basis rules whose antecedent is a descendant of the exact basis rule's antecedent (steps 4 to 7 in Algorithm 3). These rules are potential exact basis rules that might have been eliminated due to the ancestor-descendant relationship. Then these potential rules are checked for validity (steps 8 to 12). Finally, from the potential exact rules, the exact basis rules that were eliminated due to the ancestor-descendant relationship are found (steps 13 to 18).

   Algorithm 3: DeriveExactHR()
   Input: Set of exact basis rules denoted ExactBasis, set of frequent closed itemsets C and generators G.
   Output: Set of rules that covers the exact basis and the hierarchically redundant rules.
   1.  Recovered := ∅
   2.  for all r ∈ ExactBasis
   3.    CandidateBasis := ∅
   4.    for all generators g in G
   5.      if any item x in the antecedent X of rule r: X => Y is an ancestor of an item of g
   6.        then add all of the possible subsets of g into S
   7.    end for
   8.    for all s in S, check every x ∈ X; if x does not have a descendant in s, add x to s to make s a descendant set of X
   9.      if s has no ancestors in Y, s has no descendants in Y, for all items i ∈ s there are no ancestor-descendant relations with another item i′ ∈ s, and for all items i ∈ Y there are no ancestor-descendant relations with another item i′ ∈ Y
   10.       then insert s => Y in CandidateBasis
   11.     end if
   12.   end for
   13.   for all B => D ∈ CandidateBasis
   14.     if B ∪ D = itemset i ∈ C and B = g ∈ Gi
   15.       insert { B => D, g.supp } in Recovered
   16.     end if
   17.   end for
   18. end for
   19. return ExactBasis ∪ Recovered
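A simplified sketch of the recovery idea behind Algorithm 3 is given below (ours, not a transcription of the algorithm; rule supports are omitted for brevity, and items are assumed to be level-encoded IDs as in Tables 1, 3 and 4). For each exact basis rule X => Y, it rebuilds candidate rules whose antecedent descends from X, keeps only valid candidates, and accepts those that correspond to a generator of a closed itemset.

```python
# Simplified sketch of the recovery step: rebuild the specialised rules that the
# hierarchical-redundancy filter removed from the exact basis.
def is_ancestor(general, specific):
    g, s = general.split("-"), specific.split("-")
    return g != s and all(gp == "*" or gp == sp for gp, sp in zip(g, s))

def recover(exact_basis, closed_itemsets):
    # closed_itemsets: list of (closed_itemset, generators) pairs, as in Table 3.
    recovered = []
    for (X, Y, supp) in exact_basis:
        for closed, generators in closed_itemsets:
            for g in generators:
                # the candidate antecedent must specialise X item by item ...
                descends = (len(g) == len(X) and
                            all(any(is_ancestor(x, i) for x in X) for i in g))
                # ... be valid (no ancestor/descendant link with the consequent) ...
                valid = not any(is_ancestor(i, y) or is_ancestor(y, i)
                                for i in g for y in Y)
                # ... and together with Y form exactly that closed itemset.
                if descends and valid and sorted(set(g) | set(Y)) == sorted(closed):
                    recovered.append((g, Y))
    return recovered

# Rule 1 of Table 4 recovers rule 4, via the row ([1-*-*, 2-2-1], [2-2-1]) of Table 3.
exact_basis = [(["2-2-*"], ["1-*-*"], 0.571)]
closed = [(["1-*-*", "2-2-1"], [["2-2-1"]])]
print(recover(exact_basis, closed))   # [(['2-2-1'], ['1-*-*'])]
```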

                     V. EVALUATION
   Experiments were conducted to test and evaluate the effectiveness of the proposed hierarchically non-redundant exact basis and to confirm that it is also a lossless basis set. This section presents and details the experiments and their results.

   A. Datasets
   We used 6 datasets to test whether our approach reduced the size of the exact basis rule set and to verify that the basis set was lossless, meaning all the rules could be recovered. These datasets were composed of 10, 20, 50, 200 and 500 transactions and are named A to F respectively. The key statistics for these datasets are detailed in Table 5.

      Table 5. Obtained frequent itemsets using the exact bases (MME = Min-MaxExact, MMEHR = Min-MaxExact with hierarchical redundancy removed, RE = ReliableExact, REHR = ReliableExact with hierarchical redundancy removed).
      Dataset   MME   MMEHR   RE    REHR
      A          15      10    13      9
      B         106      68    80     58
      C         174     134   113     89
      D         577     429   383    305
      E         450     405   315    287
      F         725     602    91     80

   Fig. 2. Multi Level Association Rules.

                     VI. CONCLUSION
   Redundancy in Association Rules affects the quality of the information presented, and this reduces the usefulness of the rule set. The goal of redundancy elimination is to improve the quality of the rules, thus allowing them to better solve the problems being faced. Our work aims to remove hierarchical redundancy in multi-level datasets, thus reducing the size of the rule set to improve its quality and usefulness, without causing the loss of any information. We have proposed an approach which removes hierarchical redundancy through the use of frequent closed itemsets and generators. This allows it to be added to other approaches which also remove redundant rules, thereby allowing a user to remove as much redundancy as possible.
   The next step in our work is to apply this approach to the approximate basis rule set to remove redundancy there. We will also review our work to see whether there are other hierarchical redundancies in the basis rule sets that should be removed, and we will investigate what should and can be done to further improve the quality of multi-level Association Rules.
                                 REFERENCES
[1]  Bayardo, R.J., Agrawal, R. & Gunopulos, D. (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4: 217-240.
[2]  Berry, M.J.A. & Linoff, G.S. (1997). Data Mining Techniques for Marketing, Sales and Customer Support. John Wiley and Sons.
[3]  Brin, S., Motwani, R., Ullman, J.D. & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM SIGMOD Conference, 255-264.
[4]  Ganter, B. & Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer-Verlag.
[5]  Han, J. & Fu, Y. (1995). Discovery of multiple-level Association Rules from large databases. In: Proceedings of the 21st International Conference on Very Large Databases, 420-431.
[6]  Han, J. & Fu, Y. (1999). Mining multiple-level Association Rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11 (5): 798-805.
[7]  Han, J. & Fu, Y. (2000). Mining multiple-level Association Rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11: 798-804.
[8]  Hong, T.P., Lin, K.Y. & Chien, B.C. (2003). Mining fuzzy multiple-level Association Rules from quantitative data. Applied Intelligence, 18 (1): 79-90.
[9]  Kaya, M. & Alhajj, R. (2004). Mining multi-cross-level fuzzy weighted Association Rules. In: the 2nd International IEEE Conference on Intelligent Systems, 225-230.
[10] Kryszkiewicz, M., Rybinski, H. & Gajek, M. (2004). Dataless transitions between concise representations of frequent patterns. Journal of Intelligent Information Systems, 22 (1): 41-70.
[11] Ng, R.T., Lakshmanan, V., Han, J. & Pang, A. (1998). Exploratory mining and pruning optimizations of constrained Association Rules. In: Proceedings of the SIGMOD Conference, 13-24.
[12] Pasquier, N., Bastide, Y., Taouil, R. & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Information Systems, 24 (1): 25-46.
[13] Pasquier, N., Taouil, R., Bastide, Y., Stumme, G. & Lakhal, L. (2005). Generating a condensed representation for Association Rules. Journal of Intelligent Information Systems, 24 (1): 29-60.
[14] Srikant, R., Vu, Q. & Agrawal, R. (1997). Mining Association Rules with item constraints. In: Proceedings of the KDD Conference, 67-73.
[15] Thakur, R.S., Jain, R.C. & Pardasani, K.P. (2006). Mining level-crossing Association Rules from large databases. Journal of Computer Science, 76-81.
[16] Wille, R. (1982). Restructuring lattice theory: An approach based on hierarchies of concepts. In: Rival, I. (ed.), Ordered Sets. Dordrecht-Boston.
[17] Xu, Y. & Li, Y. (2007). Generating concise Association Rules. In: Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007), 781-790.
[18] Zaki, M.J. (2000). Generating non-redundant Association Rules. In: Proceedings of the KDD Conference, 34-43.
[19] Zaki, M.J. (2004). Mining non-redundant Association Rules. Data Mining and Knowledge Discovery, 9: 223-248.
[20] Ziegler, C.N., McNee, S.M., Konstan, J.A. & Lausen, G. (2005). Improving recommendation lists through topic diversification. In: Proceedings of the 14th International World Wide Web Conference, 22-32.


   R. Vijaya Prakash is presently working as Associate Professor in the Department of Computer Science & Engineering, SR Engineering College, and is pursuing a Ph.D from Kakatiya University, Warangal. He completed his MCA from Kakatiya University and his M.Tech (IT) from Punjabi University, Patiala.

   Dr. A. Govardhan is working as Professor in the Department of Computer Science & Engineering, JNT University, Hyderabad. He received his B.Tech in 1992 from Osmania University, his M.Tech in 1994 from Jawaharlal Nehru University, Delhi, and his Ph.D from JNT University in 2003. He is a very eminent person in the field of teaching and has guided 4 Ph.D scholars.

   Prof. SSVN. Sarma is an eminent professor in Mathematics. He worked in the Department of Mathematics and the Department of Informatics, Kakatiya University, over a period of 30 years. He is a well known person in the field of Mathematics and Computer Science. He has guided many Ph.D scholars.

Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Kürzlich hochgeladen (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Of the proposed representations, the frequent closed itemsets are of particular interest because they can be used to generate non-redundant rules [10, 12, 18]. The use of frequent closed itemsets offers a clear promise of reducing the number of extracted rules [13, 17, 19]. However, these approaches have only dealt with redundancy in single-level datasets. Multi-level datasets, in which the items are not all at the same concept level, contain information at different abstraction levels. Approaches that find frequent itemsets in single-level datasets miss part of this information, as they only look at one level of the dataset, so techniques that consider all the levels are needed [6, 7, 8, 9]. At the same time, rules derived from multi-level datasets can suffer from the same redundancy issues as rules from a single-level dataset. Even when the approaches used to remove redundancy in single-level datasets [13, 17, 19] are adapted to multi-level data, they do not deal with the situation where a rule at one level gives the same information as a rule at a different level.

The approaches proposed in [13, 18] make use of the closure of the Galois connection [4] to extract non-redundant rules from frequent closed itemsets instead of from frequent itemsets. One difference between the two approaches is the definition of redundancy. The approach in [18] extracts, among rules with the same confidence, those with the shorter antecedent and shorter consequent, while the method in [13] defines non-redundant rules as those with minimal antecedents and maximal consequents. The definition proposed in [17] is similar to that of [13], but the redundancy requirement is relaxed; this weaker requirement causes more rules to be considered redundant and thus eliminated. Most importantly, [17] proved that eliminating such redundant rules increases the belief in the extracted rules and the capacity of the extracted non-redundant rules for solving problems. However, the work mentioned above has focused only on datasets where all items are at the same concept level, so it does not need to consider the redundancy that can occur when there is a hierarchy among the items.

A multi-level dataset is one that has an implicit taxonomy or concept tree, as shown in Figure 1. The items in the dataset exist at the lowest concept level but are part of a hierarchical structure and organization.

(Manuscript received June 15, 2012. R. Vijaya Prakash, Department of Informatics, Kakatiya University, Warangal, India, e-mail: vijprak@hotmail.com. Dr. A. Govardhan, Department of Computer Science and Engineering, JNT University, Hyderabad, India, e-mail: givardhan_cse@yahoo.co.in. Prof. SSVN. Sarma, Department of Computer Science and Engineering, Vaagdevi College of Engineering, Warangal, India, e-mail: ssvn.sarma@gmail.com.)
Figure 1. A simple example of a product taxonomy: Food is divided into Milk, Bread and Fruit; Milk into 2 percent and Skimmed; Bread into White and Wheat; Fruit into Apple and Banana.

Thus, for example, 'Old Mills' is an item at the lowest level of the taxonomy, but it also belongs to the higher concept category 'bread' and to the more refined category 'white bread'.

Because of the hierarchical nature of a multi-level dataset, a different approach to finding frequent itemsets has to be considered. Work has been done on adapting approaches originally designed for single-level datasets into techniques usable on multi-level datasets. One of the earliest approaches to finding frequent itemsets in multi-level datasets was proposed in [5][7] and later revisited in [6]. This work primarily focused on finding frequent itemsets at each of the levels in the dataset and did not focus heavily on cross-level itemsets, that is, itemsets composed of items from two or more different levels. Referring to Figure 1, the frequent itemset {'Dairyland-2%-milk', 'white-bread'} is a cross-level itemset, as the first item is from the lowest level while the second item is from a different concept level. In fact, the cross-level idea was an addition to the work being proposed. Further work proposed an approach that includes finding cross-level frequent itemsets [15]; it also performs more pruning of the dataset to make finding the frequent itemsets more efficient. Even with all this work, however, the focus has been on finding the frequent itemsets as efficiently as possible, while the issue of rule quality and redundancy has been addressed only for single-level datasets. Some brief work presented in [6] discusses removing rules that are hierarchically redundant, but it relies on the user supplying an expected confidence variation margin to determine redundancy. There appears to be a void in dealing with hierarchical redundancy in Association Rules derived from multi-level datasets. This work attempts to fill that void and shows an approach that deals with hierarchical redundancy without losing any information.

III. MINING FREQUENT PATTERNS

From the beginning of Association Rule mining [1][3], the first step has always been to find the frequent patterns or itemsets. The simplest way to do this is the Apriori algorithm [2]. However, Apriori is designed for single-level datasets and cannot directly extract frequent itemsets at multiple levels of a multi-level dataset; it has therefore been adapted for multi-level datasets. One adaptation of Apriori to multi-level datasets is the ML_T2L1 algorithm [5, 6]. ML_T2L1 uses a transaction table that has the hierarchy information encoded into it, and each level in the dataset is processed individually. First, level 1 (the highest level in the hierarchy) is analyzed for large 1-itemsets using Apriori. The list of level 1 large 1-itemsets is then used to filter and prune the transaction dataset: any item that does not have an ancestor in the level 1 large 1-itemset list is removed, as is any transaction that has no frequent items (that is, it contains only items that are infrequent when assessed against the level 1 large 1-itemset list). From the level 1 large 1-itemsets, the level 1 large 2-itemsets are derived (using the filtered dataset), then the level 1 large 3-itemsets, and so on, until there are no more frequent itemsets to discover at level 1. Since ML_T2L1 requires that only items descending from frequent items at level 1 (essentially, items that descend from level 1 large 1-itemsets) can be frequent themselves, the level 2 itemsets are derived from the filtered transaction table. For level 2, the large 1-itemsets are discovered, from which the large 2-itemsets are derived, then the large 3-itemsets, and so on. After all the frequent itemsets at level 2 have been discovered, the level 3 large 1-itemsets are discovered (from the same filtered dataset), and so on. ML_T2L1 repeats until either all levels have been searched using Apriori or no large 1-itemsets are found at a level.

As the original work shows [5, 6], ML_T2L1 does not find cross-level frequent itemsets. We have added the ability for it to do so. At each level below level 1 (so starting at level 2), when large 2-itemsets or larger are derived, the Apriori algorithm is not restricted to the large (n-1)-itemsets of the current level but can generate combinations using the large itemsets from higher levels. The only restrictions are that a derived frequent itemset cannot contain an item that has an ancestor-descendant relationship with another item in the same itemset, and that the minimum support threshold used is that of the current level being processed (which is the lowest level in the itemset).

A second, more recent adaptation of Apriori for multi-level datasets is the top-down progressive deepening method of [15]. This approach was developed to find level-crossing Association Rules by extending existing multi-level mining techniques, and it uses reduced support and refinement of the transaction table at every hierarchy level. The algorithm works very similarly to ML_T2L1 in that it uses a transaction table with the hierarchy encoded into it and processes each level individually, one at a time. Initially level 1 is processed, followed by levels 2, 3 and so on, until the lowest level has been reached and processed or a level generates no large 1-itemsets. At each level, the large 1-itemsets are first derived and are then used to filter and prune [11] the transaction table (as described for ML_T2L1); this filtering happens at every level, not just at level 1 as in ML_T2L1. Large 2-itemsets, 3-itemsets and so on are then derived from the filtered table. From level 2 downwards, the itemsets are not restricted to the current level but can include itemsets from the large itemset lists of higher levels; this is how level-crossing Association Rules are found. For itemsets that span multiple levels, the minimum support threshold of the lowest level in the itemset is used to determine whether the itemset is frequent (large).
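To make the level-wise processing described above concrete, the sketch below gives a minimal Python illustration of an ML_T2L1-style pass over a transaction table whose items carry encoded hierarchy IDs such as '1-2-1' (a '*' marks an unspecified lower level, so '1-*-*' is the level-1 ancestor of '1-2-1'). The function and parameter names (mine_level_wise, min_sup_per_level) are invented for the illustration, cross-level combinations are omitted, and only 1- and 2-itemsets are generated; it is a sketch of the general flow, not the authors' implementation.

from collections import Counter
from itertools import combinations

def level_of(item):
    # Number of specified components in an encoded item, e.g. level_of('1-2-*') == 2.
    return sum(1 for part in item.split("-") if part != "*")

def ancestor_at(item, level):
    # Level-n ancestor of an encoded item, e.g. ancestor_at('1-2-1', 1) == '1-*-*'.
    parts = item.split("-")
    return "-".join(parts[:level] + ["*"] * (len(parts) - level))

def mine_level_wise(transactions, min_sup_per_level, depth=3):
    """Level-by-level mining in the spirit of ML_T2L1: find the level-1 large
    1-itemsets, filter the table with them, then run a naive Apriori-style pass
    per lower level on the filtered table."""
    frequent = {}                      # frozenset itemset -> absolute support

    counts = Counter()
    for t in transactions:             # level 1 large 1-itemsets
        counts.update({ancestor_at(i, 1) for i in t})
    level1 = {i for i, c in counts.items() if c >= min_sup_per_level[1]}
    frequent.update({frozenset([i]): counts[i] for i in level1})

    # Prune items without a frequent level-1 ancestor, drop emptied transactions.
    filtered = [[i for i in t if ancestor_at(i, 1) in level1] for t in transactions]
    filtered = [t for t in filtered if t]

    for lvl in range(2, depth + 1):
        rows = [{ancestor_at(i, lvl) for i in t if level_of(i) >= lvl} for t in filtered]
        counts = Counter()
        for r in rows:
            counts.update(r)
        cur = {i for i, c in counts.items() if c >= min_sup_per_level[lvl]}
        frequent.update({frozenset([i]): counts[i] for i in cur})
        pair_counts = Counter()
        for r in rows:
            for a, b in combinations(sorted(r & cur), 2):
                pair_counts[frozenset([a, b])] += 1
        frequent.update({p: c for p, c in pair_counts.items()
                         if c >= min_sup_per_level[lvl]})
    return frequent

With the thresholds used in the example of the next section (an absolute support of 4 at level 1 and 3 at levels 2 and 3), a call would look like mine_level_wise(transactions, {1: 4, 2: 3, 3: 3}).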
IV. GENERATION OF NON-REDUNDANT MULTI-LEVEL ASSOCIATION RULES

The use of frequent itemsets as the basis for Association Rule mining often results in the generation of a large number of rules; this is a widely recognized problem. More recent work has demonstrated that the use of closed itemsets and generators can reduce the number of rules generated [14, 16, 17, 18, 20], which has helped to greatly reduce redundancy in the rules derived from single-level datasets. Despite this, redundancy still exists in the rules generated from multi-level datasets, even when some of the methods designed to remove redundancy are used. We call this redundancy hierarchical redundancy. In this section we first introduce hierarchical redundancy in multi-level datasets and then detail our work to remove it without losing information.

A. Hierarchical Redundancy

Whether a rule is interesting and/or useful is usually determined through its support and confidence values. However, this does not guarantee that all of the rules with high enough support and confidence actually convey new information. Table 1 shows an example transaction table for a multi-level dataset.

Table 1. Simple multi-level transaction dataset.
Transaction ID   Items
1                [1-1-1, 1-2-1, 2-1-1, 2-2-1]
2                [1-1-1, 2-1-1, 2-2-2, 3-2-3]
3                [1-1-2, 1-2-2, 2-2-1, 4-1-1]
4                [1-1-1, 1-2-1]
5                [1-1-1, 1-2-2, 2-1-1, 2-2-1, 4-1-3]
6                [1-1-3, 3-2-3, 5-2-4]
7                [1-3-1, 2-3-1]
8                [3-2-3, 4-1-1, 5-2-4, 7-1-3]

This simple multi-level transactional dataset has 3 levels, with each item belonging to the lowest level. The item ID in the table holds the hierarchy information for each item. Thus the item 1-2-1 belongs to the first category at level 1; at level 2 it belongs to the second sub-category of that first level 1 category; and at level 3 it belongs to the first sub-category of its level 2 parent category. From this transaction set we used the ML_T2L1 algorithm with the cross-level add-on, with a minimum support of 4 for level 1 and 3 for levels 2 and 3, to discover the frequent itemsets. From these frequent itemsets, the closed itemsets and generators are derived (Table 3); the itemsets, closed itemsets and generators come from all three levels.

Finally, the Association Rules can be generated from the closed itemsets and generators. In this example we use the ReliableExactRule approach presented in [17, 18] to generate the exact basis rules. The discovered rules come from multiple levels and include cross-level rules (due to the cross-level frequent itemsets). The ReliableExactRule approach can remove redundant rules but, as we will show, it does not remove hierarchical redundancy. The rules given in Table 4 are derived from the closed itemsets and generators in Table 3 with the minimum confidence threshold set to 0.5 (50%).

Table 3. Frequent closed itemsets and generators derived from the frequent itemsets in Table 2.
Closed Itemsets                 Generators
[1-*-*]                         [1-*-*]
[1-1-*]                         [1-1-*]
[1-1-1]                         [1-1-1]
[1-*-*, 2-2-*]                  [2-2-*]
[2-*-*, 1-1-*]                  [2-*-*, 1-1-*]
[1-1-*, 1-2-*]                  [1-2-*]
[1-1-*, 2-2-*]                  [2-2-*]
[1-*-*, 2-2-1]                  [2-2-1]
[2-*-*, 1-1-1]                  [2-*-*, 1-1-1]
[1-2-*, 1-1-1]                  [1-2-*, 1-1-1]
[1-*-*, 2-1-*, 2-2-*]           [2-1-*]
[2-*-*, 1-1-*, 1-2-*]           [2-*-*, 1-2-*]
[1-1-*, 1-2-*, 2-2-*]           [1-2-*, 2-2-*]
[1-1-*, 2-1-*, 2-2-*]           [2-1-*]
[1-*-*, 2-1-1, 2-2-*]           [2-1-1]
[1-1-*, 2-1-1, 2-2-*]           [2-1-1]
[1-1-*, 2-2-1, 1-2-*]           [2-2-1]
[2-1-*, 1-1-1, 2-2-*]           [2-1-*], [2-2-*, 1-1-1]
[2-2-*, 1-1-1, 2-1-1]           [2-1-1], [2-2-*, 1-1-1]

Table 4. Exact basis Association Rules derived from the closed itemsets and generators in Table 3.
No.  Rule                              Supp
1    [2-2-*] ==> [1-*-*]               0.571
2    [1-2-*] ==> [1-1-*]               0.571
3    [2-2-*] ==> [1-1-*]               0.571
4    [2-2-1] ==> [1-*-*]               0.428
5    [2-1-*] ==> [1-*-*, 2-2-*]        0.428
6    [2-1-*] ==> [1-1-*, 2-2-*]        0.428
7    [2-1-1] ==> [1-*-*, 2-2-*]        0.428
8    [2-1-1] ==> [1-1-*, 2-2-*]        0.428
9    [2-2-1] ==> [1-1-*, 1-2-*]        0.428
10   [2-1-*] ==> [1-1-1, 2-2-*]        0.428
11   [2-2-*, 1-1-1] ==> [2-1-*]        0.428
12   [2-1-1] ==> [2-2-*, 1-1-1]        0.428
13   [2-2-*, 1-1-1] ==> [2-1-1]        0.428

The ReliableExactRule algorithm lists all of the rules in Table 4 as important and non-redundant. We argue, however, that there are still redundant rules among them; this type of redundancy is beyond what the ReliableExactRule algorithm was designed for. Looking at Table 4, we claim that rule 4 is redundant to rule 1, rule 7 is redundant to rule 5, rule 8 is redundant to rule 6, and rule 12 is redundant to rule 10. For example, the item 2-2-1 (in rule 4) is a child of the more general, abstract item 2-2-* (in rule 1), so rule 4 is in fact a more specific version of rule 1. Rule 1 says that 2-2-* is enough to fire a rule with consequent C, whereas rule 4 requires 2-2-1 to fire with the same consequent C; any item that is a descendant of 2-2-* will cause a rule with consequent C to fire, and it does not have to be 2-2-1. Rule 4 is therefore more restrictive. Because 2-2-1 is part of 2-2-*, having rule 4 does not bring any new information to the user, as the information it contains is already part of the information contained in rule 1. Thus rule 4 is redundant.
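The redundancy claims above rest entirely on the ancestor-descendant relation carried by the encoded item IDs. The following small Python sketch (the helper name is_ancestor is hypothetical; '*' again marks an unspecified lower level) shows that relation and why rule 4 merely specialises rule 1's antecedent:

def is_ancestor(a, b):
    # True if encoded item a (e.g. '2-2-*') is a strict ancestor of b (e.g. '2-2-1').
    pa, pb = a.split("-"), b.split("-")
    return (a != b and len(pa) == len(pb)
            and all(x == y or x == "*" for x, y in zip(pa, pb)))

# Rule 1: [2-2-*] ==> [1-*-*]    Rule 4: [2-2-1] ==> [1-*-*]
rule1_antecedent = {"2-2-*"}
rule4_antecedent = {"2-2-1"}

# Every item of rule 4's antecedent descends from an item of rule 1's antecedent,
# and the consequents are identical, so rule 4 only specialises rule 1.
specialised = all(any(is_ancestor(a, x) for a in rule1_antecedent)
                  for x in rule4_antecedent)
print(is_ancestor("2-2-*", "2-2-1"), specialised)   # prints: True True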
We define hierarchical redundancy in exact Association Rules through the following definition.

Definition 1: Let R1: X1 ==> Y and R2: X2 ==> Y be two exact Association Rules with exactly the same itemset Y as the consequent. Rule R1 is redundant to rule R2 if (1) the itemset X1 is made up of items of which at least one is a descendant of an item in X2, (2) the itemset X2 is made up of items of which at least one is an ancestor of an item in X1, and (3) the other, non-ancestor items of X2 are all present in itemset X1.

From this definition, if for an exact Association Rule X1 ==> Y1 there exists no other rule X2 ==> Y2 such that at least one item in X1 shares an ancestor-descendant relationship with X2 (X2 containing the ancestor(s)) and all other items of X2 are present in X1, then X1 ==> Y1 is a non-redundant rule. To test for redundancy we take this definition and add another condition that a rule must meet to be considered valid: a rule X ==> Y is valid if there is no ancestor-descendant relationship between any item in X and any item in Y. For example, 1-2-1 ==> 1-2-* is not a valid rule, but 1-2-1 ==> 1-1-3 is a valid rule. If this condition is not met by a rule X2 ==> Y2 when testing whether X1 ==> Y1 is redundant to X2 ==> Y2, then X1 ==> Y1 is a non-redundant rule, since X2 ==> Y2 is not a valid rule.

B. Generating Exact Basis Rules

As previous work has shown [14, 17, 18], using frequent closed itemsets in the generation of Association Rules can reduce the quantity of discovered rules. Because we wish to remove redundancy on top of the redundancy that is already being removed, our approach uses the closed itemsets and generators to discover the non-redundant rules. Condensed, more concise bases for representing non-redundant exact rules have been proposed in [14, 17, 18]; exact rules are rules whose confidence is 1. (The proposed approach will later be extended to the other rules, the so-called approximate rules.) The following definitions outline the two bases, extended with the hierarchy condition.

Definition 2: For the Min-MaxExact (MME) basis, let C be the set of discovered frequent closed itemsets and, for each closed itemset c in C, let Gc be the set of generators of c. The exact basis for Min-Max is:
{ g ==> (c \ g) | c ∈ C, g ∈ Gc, g ≠ c, and there exists no rule g' ==> (c' \ g') with c' ∈ C, g' ∈ Gc', c' ⊆ c, g' ⊆ c such that g is a descendant set of g' and g' contains no ancestor or descendant of any item of c' \ g' }.

Definition 8 (Reliable Exact Basis without Hierarchical Redundancy): Let C be the set of frequent closed itemsets and, for each frequent closed itemset c, let Gc be the set of minimal generators of c. The Reliable exact basis is:
{ g ==> (c \ g) | c ∈ C, g ∈ Gc, g ≠ c, there exists no c' ∈ C with c' ⊂ c and g' ∈ Gc' such that (c' \ g') ⊇ (c \ g), and there exists no rule g' ==> (c' \ g') such that g is a descendant set of g' and g' contains no ancestor or descendant of any item of c' \ g' }.

Thus the algorithms to extract non-redundant multi-level rules using either MinMaxExactHR or ReliableExactHR are given as follows.

Algorithm 1: MinMaxExactHR()
Input: C, a set of frequent closed itemsets; G, a set of minimal generators. For g ∈ G, g.closure is the closed itemset of g.
Output: a set of non-redundant multi-level rules.
1.  MinMaxExact := ∅
2.  for each k = 1 to v
3.    for each k-generator g ∈ Gk, with c = g.closure
4.      nonRedundant := true
5.      if g ≠ g.closure
6.        for all g' ∈ G, with c' = g'.closure
7.          if g' ≠ g
8.            if (g' is an ancestor set of g) and (c' \ g' = c \ g) and (g' is not an ancestor set of c' \ g')
9.              nonRedundant := false
10.             break
11.           end if
12.         end if
13.       end for
14.       if nonRedundant = true
15.         insert {g ==> (c \ g), g.supp} into MinMaxExact
16.       end if
17.     end if
18.   end for
19. end for
20. return MinMaxExact

Algorithm 2: ReliableExactHR()
Input: C, a set of frequent closed itemsets; G, a set of minimal generators. For g ∈ G, g.closure is the closed itemset of g.
Output: a set of non-redundant multi-level rules.
1.  ReliableExact := ∅
2.  for all c ∈ C
3.    for all g ∈ Gc
4.      nonRedundant := false
5.      if there exists no c' ∈ C such that c' ⊂ c and a g' ∈ Gc' with (c' \ g') ⊇ (c \ g)
6.        nonRedundant := true
7.      else
8.        nonRedundant := false
9.        break
10.     end if
11.     for all g' ∈ G, with c' = g'.closure
12.       if g' ≠ g
13.         if (g' is an ancestor set of g) and (c' \ g' = c \ g) and (g' is not an ancestor set of c' \ g') and (g' is not a descendant set of c' \ g')
14.           nonRedundant := false
15.           break
16.         end if
17.       end if
18.     end for
19.     if nonRedundant = true
20.       insert {g ==> (c \ g), g.supp} into ReliableExact
21.     end if
22.   end for
23. end for
24. return ReliableExact
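Both algorithms reduce to the same core test: keep g ==> (c \ g) only if it is a valid rule and no other generator that forms an ancestor set of g yields the same consequent through a valid rule. The following self-contained Python sketch applies that test to a list of (generator, closure, support) triples such as the rows of Table 3. The helper names and the triple format are assumptions made for the illustration, and the reliability condition of Algorithm 2 (step 5) is not included; this is a sketch of the hierarchical filter, not the published implementation.

def is_ancestor(a, b):
    # Encoded items such as '2-2-*'; a is a strict ancestor of b (e.g. of '2-2-1').
    pa, pb = a.split("-"), b.split("-")
    return (a != b and len(pa) == len(pb)
            and all(x == y or x == "*" for x, y in zip(pa, pb)))

def is_valid(antecedent, consequent):
    # A rule is valid if no antecedent item has an ancestor or descendant in the consequent.
    return not any(is_ancestor(x, y) or is_ancestor(y, x)
                   for x in antecedent for y in consequent)

def ancestor_set_of(other, g):
    # 'other' plays the role of X2 in Definition 1: at least one of its items is an
    # ancestor of an item of g, and every item is an ancestor of, or present in, g.
    some = any(is_ancestor(a, x) for a in other for x in g)
    covered = all(a in g or any(is_ancestor(a, x) for x in g) for a in other)
    return some and covered

def hierarchy_reduced_exact_basis(triples):
    """triples: iterable of (generator, closure, support), e.g. rows of Table 3.
    Returns [(antecedent, consequent, support)] without hierarchically redundant rules."""
    rules = [(frozenset(g), frozenset(c) - frozenset(g), supp)
             for g, c, supp in triples if frozenset(g) != frozenset(c)]
    kept = []
    for g, cons, supp in rules:
        if not is_valid(g, cons):
            continue
        redundant = any(g2 != g and cons2 == cons
                        and ancestor_set_of(g2, g) and is_valid(g2, cons2)
                        for g2, cons2, _ in rules)
        if not redundant:
            kept.append((g, cons, supp))
    return kept

For example, given the rows (['2-2-*'], ['1-*-*', '2-2-*'], 0.571) and (['2-2-1'], ['1-*-*', '2-2-1'], 0.428) from Table 3, the filter keeps [2-2-*] ==> [1-*-*] and drops [2-2-1] ==> [1-*-*], mirroring the elimination of rule 4 discussed earlier.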
C. Deriving Exact Rules from the Exact Basis Rules

The Min-MaxExact and ReliableExact approaches have been proven able to deduce all of the exact rules from their basis sets [17]. Compared with these approaches, our work results in a smaller exact basis set: it removes not only the redundant rules that the Min-MaxExact and ReliableExact approaches remove, but also the hierarchically redundant rules. If we can recover all of the hierarchically redundant rules, then all the exact rules can be derived by using the Min-MaxExact or ReliableExact recovery algorithm. This ensures that all the exact rules can still be derived and, by achieving this, our approach becomes a lossless representation of the exact Association Rules. The following algorithm is designed to recover the hierarchically redundant rules from the exact basis. By adding it to the algorithms used by Min-MaxExact and ReliableExact to derive the exact rules, the existing ReliableExact recovery algorithm is able to derive all of the exact rules, because our algorithm supplies it with a basis set that includes the hierarchically redundant rules (which the ReliableExact approach would not have removed in the first place). The basic idea is, for each exact basis rule, to first construct from the generators all possible exact basis rules whose antecedent is a descendant of that of the exact basis rule (steps 4 to 7 in Algorithm 3); these are potential exact basis rules that might have been eliminated due to the ancestor-descendant relationship. The potential rules are then checked for validity (steps 8 to 12) and, finally, the exact basis rules that were eliminated due to the ancestor-descendant relationship are identified among the valid candidates (steps 13 to 18).

Algorithm 3: DeriveExactHR()
Input: the set of exact basis rules, denoted ExactBasis; the set of frequent closed itemsets C and generators G.
Output: a set of rules that covers the exact basis and the hierarchically redundant rules.
1.  Recovered := ∅
2.  for all r: X ==> Y ∈ ExactBasis
3.    CandidateBasis := ∅
4.    for all generators g in G
5.      if any item x in the antecedent X of rule r is an ancestor of an item of g
6.        add all possible subsets of g into S
7.    end for
8.    for all s in S: for every x ∈ X, if x does not have a descendant in s, add x to s so that s becomes a descendant set of X
9.      if s has no ancestors in Y, s has no descendants in Y, no two items of s share an ancestor-descendant relation, and no two items of Y share an ancestor-descendant relation
10.       insert s ==> Y into CandidateBasis
11.     end if
12.   end for
13.   for all B ==> D ∈ CandidateBasis
14.     if B ∪ D equals an itemset i ∈ C and B = g for some g ∈ Gi
15.       insert {B ==> D, g.supp} into Recovered
16.     end if
17.   end for
18. end for
19. return ExactBasis ∪ Recovered
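Restated as code, the recovery idea is: for every basis rule, build candidate antecedents that specialise its antecedent to descendant items, keep only the valid candidates, and accept a candidate only when it is itself a generator whose closure is the candidate plus the consequent. The Python sketch below follows that flow under the same encoded-item convention; the names are hypothetical, it specialises with whole generators rather than with all of their subsets as step 6 of Algorithm 3 does, and it is an illustration rather than the published DeriveExactHR implementation.

def is_ancestor(a, b):
    # Encoded items such as '2-2-*'; a is a strict ancestor of b (e.g. of '2-2-1').
    pa, pb = a.split("-"), b.split("-")
    return (a != b and len(pa) == len(pb)
            and all(x == y or x == "*" for x, y in zip(pa, pb)))

def no_relation(items_a, items_b):
    return not any(is_ancestor(x, y) or is_ancestor(y, x)
                   for x in items_a for y in items_b)

def recover_hierarchical_rules(exact_basis, closed_with_generators):
    """exact_basis: list of ((X, Y), supp) rules kept in the reduced basis.
    closed_with_generators: list of (generator, closure, supp) triples.
    Returns the basis together with the recovered, previously eliminated rules."""
    recovered = []
    for (X, Y), _ in exact_basis:
        X, Y = frozenset(X), frozenset(Y)
        # Candidate antecedents: specialise X using generators that descend from it.
        candidates = set()
        for g, closure, supp in closed_with_generators:
            if any(is_ancestor(x, item) for x in X for item in g):
                s = frozenset(g) | {x for x in X
                                    if not any(is_ancestor(x, i) for i in g)}
                candidates.add(s)
        # Keep only valid candidates: no ancestor/descendant link with the consequent.
        candidates = {s for s in candidates if no_relation(s, Y)}
        # Accept a candidate only if it is a generator whose closure is s plus Y.
        for g, closure, supp in closed_with_generators:
            for s in candidates:
                if frozenset(g) == s and frozenset(closure) == s | Y:
                    rule = ((s, Y), supp)
                    if rule not in recovered:
                        recovered.append(rule)
    return list(exact_basis) + recovered

Given a reduced basis containing rule 1 and the generator/closure pairs of Table 3, this sketch recovers [2-2-1] ==> [1-*-*] with support 0.428, i.e. the eliminated rule 4.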
V. EVALUATION

Experiments were conducted to test and evaluate the effectiveness of the proposed hierarchically non-redundant exact basis and to confirm that it is also a lossless basis set. This section presents and details the experiments and their results.

A. Datasets

We used 6 datasets to test whether our approach reduces the size of the exact basis rule set and to test that the basis set is lossless, meaning that all the rules can be recovered. These datasets were composed of 10, 20, 50, 200 and 500 transactions and are named A to F respectively. The key statistics for these datasets are detailed in Table 5.

Table 5. Obtained frequent itemsets using the exact bases (MME = Min-MaxExact, MMEHR = MinMaxExactHR, RE = ReliableExact, REHR = ReliableExactHR).
Dataset   MME   MMEHR   RE    REHR
A         15    10      13    9
B         106   68      80    58
C         174   134     113   89
D         577   429     383   305
E         450   405     315   287
F         725   602     91    80

Fig. 2. Multi-level Association Rules.

VI. CONCLUSION

Redundancy in Association Rules affects the quality of the information presented, and this reduces the usefulness of the rule set. The goal of redundancy elimination is to improve the quality of the rules, allowing them to better solve the problems being faced. Our work aims to remove hierarchical redundancy in multi-level datasets, reducing the size of the rule set to improve its quality and usefulness without causing the loss of any information. We have proposed an approach that removes hierarchical redundancy through the use of frequent closed itemsets and generators. This allows it to be added on top of other approaches that also remove redundant rules, thereby allowing a user to remove as much redundancy as possible.
The next step in our work is to apply this approach to the approximate basis rule set to remove redundancy there. We will also review our work to see whether there are other hierarchical redundancies in the basis rule sets that should be removed, and we will investigate what can and should be done to further improve the quality of multi-level Association Rules.

REFERENCES
[1] Bayardo, R.J., Agrawal, R. & Gunopulos, D. (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4: 217-240.
[2] Berry, M.J.A. & Linoff, G.S. (1997). Data Mining Techniques for Marketing, Sales and Customer Support. John Wiley and Sons.
[3] Brin, S., Motwani, R., Ullman, J.D. & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM SIGMOD Conference, 255-264.
[4] Ganter, B. & Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer-Verlag.
[5] Han, J. & Fu, Y. (1995). Discovery of multiple-level Association Rules from large databases. In: Proceedings of the 21st International Conference on Very Large Databases, 420-431.
[6] Han, J. & Fu, Y. (1999). Mining multiple-level Association Rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11(5): 798-805.
[7] Han, J. & Fu, Y. (2000). Mining multiple-level Association Rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11: 798-804.
[8] Hong, T.P., Lin, K.Y. & Chien, B.C. (2003). Mining fuzzy multiple-level Association Rules from quantitative data. Applied Intelligence, 18(1): 79-90.
[9] Kaya, M. & Alhajj, R. (2004). Mining multi-cross-level fuzzy weighted Association Rules. In: Proceedings of the 2nd International IEEE Conference on Intelligent Systems, 225-230.
[10] Kryszkiewicz, M., Rybinski, H. & Gajek, M. (2004). Dataless transitions between concise representations of frequent patterns. Journal of Intelligent Information Systems, 22(1): 41-70.
[11] Ng, R.T., Lakshmanan, V., Han, J. & Pang, A. (1998). Exploratory mining and pruning optimizations of constrained Association Rules. In: Proceedings of the SIGMOD Conference, 13-24.
[12] Pasquier, N., Bastide, Y., Taouil, R. & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1): 25-46.
[13] Pasquier, N., Taouil, R., Bastide, Y., Stumme, G. & Lakhal, L. (2005). Generating a condensed representation for Association Rules. Journal of Intelligent Information Systems, 24(1): 29-60.
[14] Srikant, R., Vu, Q. & Agrawal, R. (1997). Mining Association Rules with item constraints. In: Proceedings of the KDD Conference, 67-73.
[15] Thakur, R.S., Jain, R.C. & Pardasani, K.P. (2006). Mining level-crossing Association Rules from large databases. Journal of Computer Science, 76-81.
[16] Wille, R. (1982). Restructuring lattice theory: An approach based on hierarchies of concepts. In: Rival, I. (ed.), Ordered Sets. Dordrecht-Boston.
[17] Xu, Y. & Li, Y. (2007). Generating concise Association Rules. In: Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007), 781-790.
[18] Zaki, M.J. (2000). Generating non-redundant Association Rules. In: Proceedings of the KDD Conference, 34-43.
[19] Zaki, M.J. (2004). Mining non-redundant Association Rules. Data Mining and Knowledge Discovery, 9: 223-248.
[20] Ziegler, C.N., McNee, S.M., Konstan, J.A. & Lausen, G. (2005). Improving recommendation lists through topic diversification. In: Proceedings of the 14th International World Wide Web Conference, 22-32.

R. Vijaya Prakash is presently working as Associate Professor in the Department of Computer Science & Engineering, SR Engineering College, and is pursuing a Ph.D. from Kakatiya University, Warangal. He completed his MCA from Kakatiya University and his M.Tech (IT) from Punjabi University, Patiala.

Dr. A. Govardhan is working as Professor in the Department of Computer Science & Engineering, JNT University, Hyderabad. He received his B.Tech in 1992 from Osmania University, his M.Tech in 1994 from Jawaharlal Nehru University, Delhi, and his Ph.D from JNT University in 2003. He is a very eminent person in the field of teaching and has guided 4 Ph.D scholars.

Prof. SSVN. Sarma is an eminent professor of Mathematics. He worked in the Department of Mathematics and the Department of Informatics, Kakatiya University, over a period of 30 years. He is a well-known person in the fields of Mathematics and Computer Science and has guided many Ph.D scholars.