SlideShare ist ein Scribd-Unternehmen logo
1 von 6
Downloaden Sie, um offline zu lesen
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN : 2278-800X, www.ijerd.com
Volume 5, Issue 9 (January 2013), PP. 30-35

  Data Mining By Parallelization of Fp-Growth Algorithm
                           Prof Vikram singh Tamra1, Robil varshney2
                                1,2
                                      Computer Science, Swit Ajmer, Rajasthan, India


    Abstract:- In this paper we present idea to make one main tree on master node and slave do
    processing with database rather than have multiple FP-trees, one for each processor Firstly, the dataset
    is divided equally among all participating processors P i. Before the tree construction initiates, every
    processor must sort the items according to their support count. The master processor gathers all local
    counts to be summed together and ultimately form a global support count to be broadcasted to all
    processors in the group. Items having support count less than the minimum threshold are removed
    from the process in slave node as well as master node. The next step is the FP-tree construction in
    which each node scans and sends the transaction according to threshold. At the same time master node,
    make its tree and also updated the tree as items come from slave node.

                                            I.       INTRODUCTION
         One of the currently fastest and most popular algorithms for frequent item set mining is the FP-growth
algorithm. It is based on a prefix tree representation of the given database of transactions (called an FP-tree),
which can save considerable amounts of memory for storing the transactions.
Frequent pattern mining plays an essential role in mining associations, correlations, causality, sequential
patterns , episodes , multidimensional patterns and many other important data mining tasks. Most of the
previous studies adopt an Apriori-algorithm may still suffer from the following two nontrivial costs:
   It is costly to handle a huge number of candidate sets. For example, if there are 10 4 frequent 1-itemsets,
       the Apriori algorithm will need to generate more than 107 length-2 candidates and accumulate and test
       their occurrence frequencies.
   It is tedious to repeatedly scan the database and check a large set of candidates by pattern
        matching, which is especially true for mining long patterns.

                                      II.        DEFINITION OF TERMS
         Let I ={ i1,i2,i3,…...in} be a set of items. An item set X is a non-empty subset of I . An item set with m
items is called an m-itemset. Tuple < tid , X > is called a transaction where tid is a transaction identifier and X is
an item set. A transaction database TDB is a set of transactions. Given a transaction database TDB , the support
of an item set X , denoted as sup(X ) , is the number of transactions including the item set X . A frequent pattern
is defined as the item set whose support is higher than the minimum support min_sup.

         III.          CONCEPT OF FREQUENT PATTERN MINING ALGORITHM
         In 2000, Han et al. proposed the FP-growth algorithm—the first pattern-growth concept algorithm. FP-
growth constructs an FP-tree structure and mines frequent patterns by traversing the constructed FP tree. The
FP-tree structure is an extended prefix-tree structure involving crucial condensed information of frequent
patterns
         FP-growth algorithm is an efficient method of mining all frequent item sets without candidate’s
generation. The algorithm mine the frequent item sets by using a divide-and-conquer strategy as follows: FP-
growth first compresses the database representing frequent item set into a frequent-pattern tree. The next step is
to divide a compressed database into set of conditional databases (a special kind of projected database), each
associated with one frequent item. Finally, mine each such database separately.

a) FP-tree structure
          The FP-tree structure has sufficient information to mine complete frequent patterns. It consists of a
prefix tree of frequent 1-itemset and a frequent-item header table. Each node in the prefix-tree has three fields:
item-name, count, and node-link.
    item-name is the name of the item.
    count is the number of transactions that consist of the frequent 1-items on the path from root to this node.
    node-link is the link to the next same item name node in the FP-tree.
       Each entry in the frequent-item header table has two
                                                          30
Data Mining By Parallelization of Fp-Growth Algorithm

  fields: item-name and head of node-link.
   item-name is the name of the item.
   head of node-link is the link to the first same item-name node in the prefix-tree.

b) Construction of FP-tree
         FP-growth has to scan the TDB twice to construct an FP-tree. The first scan of TDB retrieves a set of
frequent items from the TDB . Then, the retrieved frequent items are ordered by descending order of their
supports. The ordered list is called an F-list. In the second scan, a tree T whose root node R labeled with ―null‖
is created. Then, the following steps are applied to every transaction in the TDB . Here, let a transaction
represent [ p | P] where p is the first item of the transaction and P is the remaining items. In each transaction,
infrequent items are discarded.
         Then, only the frequent items are sorted by the same order of F-list.Call insert_tree ( p | P, R) to
construct an FP-tree. The function insert_tree ( p | P, R) appends a transaction [ p | P] to the root node R of the
tree T .Pseudo code of the function insert_tree ( p | P, R) is shown in Figure 1.An example of an FP-tree is
shown in Figure 2. This FP-tree is constructed from the TDB shown in Table 1 with min_sup = 3. In Figure 2,
every node is represented by (item     name : count) . Links to next same item name node are represented by
dotted arrows.
                                  TID Item                          Frequent
                                                                    Item
                                  1       s, z, x, w, r, p, l, i    s, x, z, l, i
                                  2       z, y, x, s, p, l, j       s, x, z, y, l
                                  3       y, s, k, o, j             s, y
                                  4       y, x, n, e, i             x, y, i
                                  5       z, s, x, t, m, i, l, k    s, x, z, l, i
                                                 Table1.Sample TDB




                                         Figure 2. Example of an FP-tree

                                          {(s:2, x:2, z:2, l:2),(x:1, y:1)}
                                           i’s conditional pattern base




                                         Figure 3.i ’s conditional FP-tree

c) FP-growth
         FP-growth mines frequent patterns from an FP-tree. To generate complete frequent patterns, FP-growth
traverses all the node-links from ―head of node-links‖ in the FP-tree’s header table. For any frequent item ai , all
possible frequent patterns including ai can be mined by following
         ai ’s node-link starting from ai ’s head in the FP-tree header table. In detail, ai’s prefix path from ai’s
node to root node is extracted at first. Then, the prefix path is transformed into a i’s conditional pattern base,
which is a list of items that occur before a i with the support values of all the items along the list. Then, FP-
growth constructs ai’s conditional FP-tree containing only the paths in ai’s conditional pattern base. It then
                                                         31
Data Mining By Parallelization of Fp-Growth Algorithm

mines all the frequent patterns including item ai from ai’s conditional FP-tree. For example, we describe how to
mine all the frequent patterns including item p from the FP-tree shown in Figure 2. For node i , FP-growth
mines a frequent pattern (i:3) by traversing i ’s node-links through node (i:2) to node (i:1). Then, it extracts i ’s
prefix paths; < s:4, x:3, z:3, l:2 > and <x:1,y:1>. To study which items appear together with i , the transformed
path < s:2, x:2, z:2, l:2 > is extracted from < s:4, x:3, z:3, l:2 > because the support value of i is 2. Similarly, we
have <x:1,y:1>. The set of these paths {(s:2,x:2,z:2,l:2),(x:1,y:1)}is called i’s conditional pattern base. FP-
growth then constructs i ’s conditional FP-tree containing only the paths in i ’s conditional pattern base as
shown in Figure 3. As only x is an item occurring more than min_sup appearing in i ’s conditional pattern base,
i ’s conditional FP-tree leads to only one branch (x:3). Hence, only one frequent pattern (xi:3) is mined. The
final frequent patterns including item i are (i:3) and (xi:3).

                        IV.          EXISTED AND PROPOSED APPROACH
           The proposed algorithm is based on one of the best known sequential techniques referred to as
Frequent Pattern (FP) Growth algorithm. Unlike most of the earlier parallel approaches based on different
variants of the Apriori Algorithm, the algorithm presented here does not explicitly result in having entire
counting data structure duplicated on each processor. Furthermore, the proposed algorithm introduces minimum
communication (and hence synchronization) overheads by efficiently partitioning the list of frequent elements
list over processors.

4.1.1. Existing Parallel Fp-Tree Constructing Algorithm
           It is obvious that Constructing FP-tree is the key step in FP-growth algorithm, FP-tree contains
compressed message of frequent item sets, it is easy to mining frequent item sets from FP-tree. To construct FP-
tree, we must scan data base twice, one for creating 1-itemset and one for build FP-tree. To mine useful
information from database effectively, we needs to solve efficiency problem of mining, when data quantity very
large, efficiency of mining algorithm become the key of mining problem, to increase efficiency of mining
algorithm, so we developed parallel algorithm in a effective manner.
           First we scan the database to identified the frequent 1-itemsets. In order to enumerate the frequent
items efficiently, then we divide the datasets among the available processors. Each processor is given an
approximately equal number of transactions to read and analyze. As a result, the dataset is split in n equal sizes.
Each processor enumerates the items appearing in the transactions at hand. After enumeration of sub
occurrences, a global count is necessary to identify the frequent items. This count is done in parallel where each
processor is allocated an equal number of items to sum their sub supports into global count. Finally, in a
sequential phase infrequent items with a support less than the support threshold are not consider in next steps
and the remaining frequent items are sorted by their frequency.
           On master node, These sorted transaction items are used in constructing the FP-Trees as follows: for
the first item on the sorted transactional dataset, check if it exists as one of the children of the root. If it exists
then increment the support for this node. Otherwise, add a new node for this item as a child for the root node
with 1 as support. Then, consider the current item node as the newly temporary root and repeat the same
procedure with the next item on the sorted transaction .As transaction comes from slave nodes, master add them
in it’s already build tree and make one global tree with header table link and count. After that master applied FP
Growth on this tree to find the conditional pattern base and make recursive call.

Existing Algorithm (PFPTC: Parallel FP-tree Constructing Algorithm)
Input: A transaction database DB and a minimum support threshold ξ.
Output: Its frequent pattern tree, FP-Tree
Method: The FP-tree is constructed in the following steps.
Suppose there are n processor p1, p2… pn;
Divide database DB into n part DB1… DBn with same size;
Processor pi scan DBj once. And create total set of frequent items F and their supports. Sort F in support
descending order as L.
for ( j = 1; j≤n; j++) do ( concurrently)
{
Processor pj parallel scan DBj, construct FP-treej, do not create head table and node link ;
}
While (n>1) do
for ( j = 1; j≤n; j=j+2)
{
Processor pj parallel call FP-merge (FP-treej, FP-treej+1, int (j/2)+1 );
n=n/2 ;

                                                          32
Data Mining By Parallelization of Fp-Growth Algorithm

}
Create head table and same nodes link for FP-tree1;
Return FP-tree1;
End;

4.1.2. Proposed Algorithm (PFPTC : Parallel FP-Tree Constructing Algorithm)
Input: A transaction database DB and a minimum support threshold ξ.
Output: Its frequent pattern tree, FP-Tree
Method: The FP-tree is constructed in the following steps.
Suppose there are n processor p1, p2,…,pn
(one master node p1, remaining slave nodes p2,p3…….pn);
Divide database DB into n part DB1,DB2… DBn with same size;
Processor pj scans DBj once. And create total set of items and their supports. Send it to master node which
merges them to generate the cumulative support.
The master node sends the generated results back to slaves individually.
for ( j = 2; j≤n; j++) do ( concurrently)
{
Processor pj parallel scan DBj,
For each transaction, find the frequent items of the transaction based on the global threshold.
Send the results to the master node for each transaction.
}
If (j=1)
{
For each transaction, find the frequent items of the transaction based on the global threshold and
   if ( tree is created)
      Add this data
Else
      Create tree with this data.
Receive the results from the slave nodes for each transaction for each slave node one by one.
Add this data to tree.
Call FP Growth on root.
Calculate execution time for FP Tree on each node (including master node)
Free memory
End;

                                V.     PERFORMANCE EVALUATION
         Number of experiments have been done to evaluate the scalability of parallel algorithm. Performance
evaluation would be incomplete if we exclude from discussion issues that do not occur in sequential
programming such as sources of parallel overhead and scalability.
The performance analysis results will indicate comparisons as follows:
Serial execution Vs. Parallel execution of given application. This results show how well cluster is performing
for given application.
Show the scalability of parallel algorithm by means of execution, speed up and efficiency.

5.1 Serial Execution Vs Parallel Execution:
5.1.1 Consideration Of Database (Size=5577)
          here we are taken a database of size 5577 transaction and perform mining of item serial as well as
parallel. The corresponding result is shown in table. We analyze it through graph.

                              Threshold   Serial Execution Parallel Execution
                                          (seconds)            (seconds)
                              .10         .104                 .048
                              .15         .060                 .040
                              .20         .044                 .038
                            Table 2. Execution time in serial and parallel algorithm

5.1.1.2 Graph Corresponding to Table:



                                                      33
Data Mining By Parallelization of Fp-Growth Algorithm

                                                   0.15




                                 Execution Time
                                                     0.1

                                                   0.05

                                                       0
                                                            0.1     0.15       0.2 Threshold
                                                    Serial Execution        parallel Execution

                              Figure 4.Graph between execution time and threshold

5.2 Scalability Evaluation:
Serial run-time is indicated using Ts
Parallel run-time using Tp.

5.2.1 Speed up:
         Speed-up, S, is defined as ―the ratio of the serial run-time of the best sequential algorithm for solving a
problem to the time taken by the parallel algorithm to solve the same problem on p processors:
S = Ts / Tp

5.2.2 Efficiency:
         Efficiency E, is ―a measure of the fraction of time for which a processor usefully employed‖; it is
defined as ―the ratio of speed-up to the number of processors‖.
E=S/P
Result corresponding to Speed up and Efficiency

                                                     Threshold Speed up Efficiency
                                                     .10          2.16        .72
                                                     .15          1.5         .50
                                                     .20          1.15        .38
                                                  Table 3. Table for speed up and efficiency

                                                     VI.           CONCLUSION
          Throughout the last decade, a lot of people have implemented and compared several algorithms that try
to solve the frequent item set mining problem as efficiently as possible. Moreover, we experienced that different
implementations of the same algorithms could still result in significantly different performance results. As a
consequence, several claims that were presented in some articles were later contradicted in other articles.
In this paper, we performed mining frequent item in transactional database for making difference between
sequential and parallel algorithm, we noted the time of making FP Tree for both algorithm on different
threshold. After compare the result of different metric with idle parallel system metric, we conclude that our
algorithm works good in parallel environment.

                                                   VII.           FUTURE SCOPE
“Change is the only constant thing.”
         Rightly said. There is always a scope for amelioration. The future extension to this project may include
some updations such as using optimization to make the database efficient. The database here comprised of
numeric values. Instead, alphabetical letters could also be taken and then the scanning would be word by word.
Also, most significantly, the processing time can be further compressed. Apart from that, we can also introduce
the stream data analysis, in which data comes regularly and we have to perform mining on time bases.
Efficiency of processors also matters in this approach, if some processors are more efficient than others than we
can perform the load balancing concept by mean of divide the task according to processor efficiency.
         The optimization method introduced still operates on static work generation. It puts down to the
appropriate level and each time, produces work generation only in the middle 50 percent of the tree. This was
applied in order to reduce the communication costs caused by the generation of jobs that are too small and thus
not worth sending them to a different unit for processing. It would be much better if a dynamic work generation


                                                                       34
Data Mining By Parallelization of Fp-Growth Algorithm

scheme is introduced. Such a scheme must be able produce a desirable number of nodes for processing
according to the number of nodes at each level.

                                             REFERENCES
[1].   T. Imielinski R. Agrawal and A.N. Swami. A tree projection algorithm for generation of frequent item
       sets. Parallel and Distributed Computing, pages 350–371, March 2001
[2].   J. Han J. Pei and Y. Yin. Minning frequent pattern without candidate generation. In Proc. 2000
       ACMSIGMOD Int. Conf. on Management of Data. ACM, 2000.
[3].   Agrawal, R., Imielinski, T. and Swami, A. (1993) Mining association rules between sets of items in
       large databases. In Proceedings of the 1993 International Conference on Management of Data
       (SIGMOD 93), pages 207-216.
[4].   Bayardo R. J. (1998). Efficiently mining long patterns from databases. Proceedings of the 1998 ACM
       SIGMOD international conference on Management of data, 85-93.
[5].   Christian Borgelt. An implementation of fp-growth algorithm.
[6].   Li J., Liu Y., and Choudhary A. (2006). Parallel Data Mining Algorithms for Association Rules and
       Clustering. Handbook of Parallel Computing: Models, Algorithms and Applications. Sanguthevar
       Rajasekaran and John Reif, ed., CRC Press.
[7].   Han, E. H., Karypis, G., and Kumar, V. (2000) Scalable parallel data mining for association rules.
       IEEE Transactions on Knowledge and Data Engineering, 12, 337- 352. 76
[8].   Agrawal, R., Shafer, J. C., Center, I., and San Jose, C. A. (1996) Parallel mining of association rules.
       Knowledge and Data Engineering, IEEE Transactions on, 8, 962-969.
[9].   Chung, S. M., and Luo, C. (2003) Parallel mining of maximal frequent itemsets from databases. Tools
       with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on, 134-139.




                                                     35

Weitere ähnliche Inhalte

Was ist angesagt?

Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...IJDKP
 
Paper id 42201608
Paper id 42201608Paper id 42201608
Paper id 42201608IJRAT
 
Associations1
Associations1Associations1
Associations1mancnilu
 
Mining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalMining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalramya marichamy
 
Trees (data structure)
Trees (data structure)Trees (data structure)
Trees (data structure)Trupti Agrawal
 
Tree and binary tree
Tree and binary treeTree and binary tree
Tree and binary treeZaid Shabbir
 
Trees - Data structures in C/Java
Trees - Data structures in C/JavaTrees - Data structures in C/Java
Trees - Data structures in C/Javageeksrik
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Salah Amean
 
Binary Search Tree and AVL
Binary Search Tree and AVLBinary Search Tree and AVL
Binary Search Tree and AVLKatang Isip
 
Stacks and queues
Stacks and queuesStacks and queues
Stacks and queuesAbbott
 

Was ist angesagt? (20)

B0950814
B0950814B0950814
B0950814
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
AN ENHANCED FREQUENT PATTERN GROWTH BASED ON MAPREDUCE FOR MINING ASSOCIATION...
 
Paper id 42201608
Paper id 42201608Paper id 42201608
Paper id 42201608
 
Associations1
Associations1Associations1
Associations1
 
Mining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalMining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactional
 
Trees (data structure)
Trees (data structure)Trees (data structure)
Trees (data structure)
 
Tree and binary tree
Tree and binary treeTree and binary tree
Tree and binary tree
 
arrays of structures
arrays of structuresarrays of structures
arrays of structures
 
Arrays in c
Arrays in cArrays in c
Arrays in c
 
Dm assignment
Dm assignmentDm assignment
Dm assignment
 
Trees - Data structures in C/Java
Trees - Data structures in C/JavaTrees - Data structures in C/Java
Trees - Data structures in C/Java
 
BINARY SEARCH TREE
BINARY SEARCH TREEBINARY SEARCH TREE
BINARY SEARCH TREE
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
 
Python for beginners
Python for beginnersPython for beginners
Python for beginners
 
Bst(Binary Search Tree)
Bst(Binary Search Tree)Bst(Binary Search Tree)
Bst(Binary Search Tree)
 
Binary Search Tree and AVL
Binary Search Tree and AVLBinary Search Tree and AVL
Binary Search Tree and AVL
 
Stacks and queues
Stacks and queuesStacks and queues
Stacks and queues
 
B03606010
B03606010B03606010
B03606010
 

Andere mochten auch

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Andere mochten auch (10)

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
F05093640
F05093640F05093640
F05093640
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 

Ähnlich wie Welcome to International Journal of Engineering Research and Development (IJERD)

Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...Dr. Amarjeet Singh
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...ijsrd.com
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithmPradip Kumar
 
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining iosrjce
 
An improvised frequent pattern tree
An improvised frequent pattern treeAn improvised frequent pattern tree
An improvised frequent pattern treeIJDKP
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...eSAT Publishing House
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...eSAT Journals
 
1.10.association mining 2
1.10.association mining 21.10.association mining 2
1.10.association mining 2Krish_ver2
 
Pattern Discovery Using Apriori and Ch-Search Algorithm
 Pattern Discovery Using Apriori and Ch-Search Algorithm Pattern Discovery Using Apriori and Ch-Search Algorithm
Pattern Discovery Using Apriori and Ch-Search Algorithmijceronline
 
3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac treeAlexander Decker
 
3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac treeAlexander Decker
 
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...iosrjce
 
Data Structure Question Bank(2 marks)
Data Structure Question Bank(2 marks)Data Structure Question Bank(2 marks)
Data Structure Question Bank(2 marks)pushpalathakrishnan
 
Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining ProcedureDiscovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining ProcedureIOSR Journals
 
Efficient Mining of Association Rules in Oscillatory-based Data
Efficient Mining of Association Rules in Oscillatory-based DataEfficient Mining of Association Rules in Oscillatory-based Data
Efficient Mining of Association Rules in Oscillatory-based DataWaqas Tariq
 
Survey Performance Improvement Construct FP-Growth Tree
Survey Performance Improvement Construct FP-Growth TreeSurvey Performance Improvement Construct FP-Growth Tree
Survey Performance Improvement Construct FP-Growth Treeijsrd.com
 
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...AshishDPatel1
 

Ähnlich wie Welcome to International Journal of Engineering Research and Development (IJERD) (20)

Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
Mining Algorithm for Weighted FP-Growth Frequent Item Sets based on Ordered F...
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
G017633943
G017633943G017633943
G017633943
 
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
 
An improvised frequent pattern tree
An improvised frequent pattern treeAn improvised frequent pattern tree
An improvised frequent pattern tree
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...
 
FP-growth.pptx
FP-growth.pptxFP-growth.pptx
FP-growth.pptx
 
1.10.association mining 2
1.10.association mining 21.10.association mining 2
1.10.association mining 2
 
Pattern Discovery Using Apriori and Ch-Search Algorithm
 Pattern Discovery Using Apriori and Ch-Search Algorithm Pattern Discovery Using Apriori and Ch-Search Algorithm
Pattern Discovery Using Apriori and Ch-Search Algorithm
 
3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree
 
3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree
 
R01732105109
R01732105109R01732105109
R01732105109
 
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...
Analytical Study and Newer Approach towards Frequent Pattern Mining using Boo...
 
Data Structure Question Bank(2 marks)
Data Structure Question Bank(2 marks)Data Structure Question Bank(2 marks)
Data Structure Question Bank(2 marks)
 
Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining ProcedureDiscovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
 
Efficient Mining of Association Rules in Oscillatory-based Data
Efficient Mining of Association Rules in Oscillatory-based DataEfficient Mining of Association Rules in Oscillatory-based Data
Efficient Mining of Association Rules in Oscillatory-based Data
 
Survey Performance Improvement Construct FP-Growth Tree
Survey Performance Improvement Construct FP-Growth TreeSurvey Performance Improvement Construct FP-Growth Tree
Survey Performance Improvement Construct FP-Growth Tree
 
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
Graph based Approach and Clustering of Patterns (GACP) for Sequential Pattern...
 

Mehr von IJERD Editor

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksIJERD Editor
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEIJERD Editor
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...IJERD Editor
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’IJERD Editor
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignIJERD Editor
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationIJERD Editor
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...IJERD Editor
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRIJERD Editor
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingIJERD Editor
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string valueIJERD Editor
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...IJERD Editor
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...IJERD Editor
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...IJERD Editor
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingIJERD Editor
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...IJERD Editor
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart GridIJERD Editor
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderIJERD Editor
 

Mehr von IJERD Editor (20)

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACE
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding Design
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and Verification
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive Manufacturing
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string value
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF Dxing
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart Grid
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powder
 

Welcome to International Journal of Engineering Research and Development (IJERD)

  • 1. International Journal of Engineering Research and Development e-ISSN: 2278-067X, p-ISSN : 2278-800X, www.ijerd.com Volume 5, Issue 9 (January 2013), PP. 30-35 Data Mining By Parallelization of Fp-Growth Algorithm Prof Vikram singh Tamra1, Robil varshney2 1,2 Computer Science, Swit Ajmer, Rajasthan, India Abstract:- In this paper we present idea to make one main tree on master node and slave do processing with database rather than have multiple FP-trees, one for each processor Firstly, the dataset is divided equally among all participating processors P i. Before the tree construction initiates, every processor must sort the items according to their support count. The master processor gathers all local counts to be summed together and ultimately form a global support count to be broadcasted to all processors in the group. Items having support count less than the minimum threshold are removed from the process in slave node as well as master node. The next step is the FP-tree construction in which each node scans and sends the transaction according to threshold. At the same time master node, make its tree and also updated the tree as items come from slave node. I. INTRODUCTION One of the currently fastest and most popular algorithms for frequent item set mining is the FP-growth algorithm. It is based on a prefix tree representation of the given database of transactions (called an FP-tree), which can save considerable amounts of memory for storing the transactions. Frequent pattern mining plays an essential role in mining associations, correlations, causality, sequential patterns , episodes , multidimensional patterns and many other important data mining tasks. Most of the previous studies adopt an Apriori-algorithm may still suffer from the following two nontrivial costs:  It is costly to handle a huge number of candidate sets. For example, if there are 10 4 frequent 1-itemsets, the Apriori algorithm will need to generate more than 107 length-2 candidates and accumulate and test their occurrence frequencies.  It is tedious to repeatedly scan the database and check a large set of candidates by pattern matching, which is especially true for mining long patterns. II. DEFINITION OF TERMS Let I ={ i1,i2,i3,…...in} be a set of items. An item set X is a non-empty subset of I . An item set with m items is called an m-itemset. Tuple < tid , X > is called a transaction where tid is a transaction identifier and X is an item set. A transaction database TDB is a set of transactions. Given a transaction database TDB , the support of an item set X , denoted as sup(X ) , is the number of transactions including the item set X . A frequent pattern is defined as the item set whose support is higher than the minimum support min_sup. III. CONCEPT OF FREQUENT PATTERN MINING ALGORITHM In 2000, Han et al. proposed the FP-growth algorithm—the first pattern-growth concept algorithm. FP- growth constructs an FP-tree structure and mines frequent patterns by traversing the constructed FP tree. The FP-tree structure is an extended prefix-tree structure involving crucial condensed information of frequent patterns FP-growth algorithm is an efficient method of mining all frequent item sets without candidate’s generation. The algorithm mine the frequent item sets by using a divide-and-conquer strategy as follows: FP- growth first compresses the database representing frequent item set into a frequent-pattern tree. The next step is to divide a compressed database into set of conditional databases (a special kind of projected database), each associated with one frequent item. Finally, mine each such database separately. a) FP-tree structure The FP-tree structure has sufficient information to mine complete frequent patterns. It consists of a prefix tree of frequent 1-itemset and a frequent-item header table. Each node in the prefix-tree has three fields: item-name, count, and node-link.  item-name is the name of the item.  count is the number of transactions that consist of the frequent 1-items on the path from root to this node.  node-link is the link to the next same item name node in the FP-tree. Each entry in the frequent-item header table has two 30
  • 2. Data Mining By Parallelization of Fp-Growth Algorithm fields: item-name and head of node-link.  item-name is the name of the item.  head of node-link is the link to the first same item-name node in the prefix-tree. b) Construction of FP-tree FP-growth has to scan the TDB twice to construct an FP-tree. The first scan of TDB retrieves a set of frequent items from the TDB . Then, the retrieved frequent items are ordered by descending order of their supports. The ordered list is called an F-list. In the second scan, a tree T whose root node R labeled with ―null‖ is created. Then, the following steps are applied to every transaction in the TDB . Here, let a transaction represent [ p | P] where p is the first item of the transaction and P is the remaining items. In each transaction, infrequent items are discarded. Then, only the frequent items are sorted by the same order of F-list.Call insert_tree ( p | P, R) to construct an FP-tree. The function insert_tree ( p | P, R) appends a transaction [ p | P] to the root node R of the tree T .Pseudo code of the function insert_tree ( p | P, R) is shown in Figure 1.An example of an FP-tree is shown in Figure 2. This FP-tree is constructed from the TDB shown in Table 1 with min_sup = 3. In Figure 2, every node is represented by (item  name : count) . Links to next same item name node are represented by dotted arrows. TID Item Frequent Item 1 s, z, x, w, r, p, l, i s, x, z, l, i 2 z, y, x, s, p, l, j s, x, z, y, l 3 y, s, k, o, j s, y 4 y, x, n, e, i x, y, i 5 z, s, x, t, m, i, l, k s, x, z, l, i Table1.Sample TDB Figure 2. Example of an FP-tree {(s:2, x:2, z:2, l:2),(x:1, y:1)} i’s conditional pattern base Figure 3.i ’s conditional FP-tree c) FP-growth FP-growth mines frequent patterns from an FP-tree. To generate complete frequent patterns, FP-growth traverses all the node-links from ―head of node-links‖ in the FP-tree’s header table. For any frequent item ai , all possible frequent patterns including ai can be mined by following ai ’s node-link starting from ai ’s head in the FP-tree header table. In detail, ai’s prefix path from ai’s node to root node is extracted at first. Then, the prefix path is transformed into a i’s conditional pattern base, which is a list of items that occur before a i with the support values of all the items along the list. Then, FP- growth constructs ai’s conditional FP-tree containing only the paths in ai’s conditional pattern base. It then 31
  • 3. Data Mining By Parallelization of Fp-Growth Algorithm mines all the frequent patterns including item ai from ai’s conditional FP-tree. For example, we describe how to mine all the frequent patterns including item p from the FP-tree shown in Figure 2. For node i , FP-growth mines a frequent pattern (i:3) by traversing i ’s node-links through node (i:2) to node (i:1). Then, it extracts i ’s prefix paths; < s:4, x:3, z:3, l:2 > and <x:1,y:1>. To study which items appear together with i , the transformed path < s:2, x:2, z:2, l:2 > is extracted from < s:4, x:3, z:3, l:2 > because the support value of i is 2. Similarly, we have <x:1,y:1>. The set of these paths {(s:2,x:2,z:2,l:2),(x:1,y:1)}is called i’s conditional pattern base. FP- growth then constructs i ’s conditional FP-tree containing only the paths in i ’s conditional pattern base as shown in Figure 3. As only x is an item occurring more than min_sup appearing in i ’s conditional pattern base, i ’s conditional FP-tree leads to only one branch (x:3). Hence, only one frequent pattern (xi:3) is mined. The final frequent patterns including item i are (i:3) and (xi:3). IV. EXISTED AND PROPOSED APPROACH The proposed algorithm is based on one of the best known sequential techniques referred to as Frequent Pattern (FP) Growth algorithm. Unlike most of the earlier parallel approaches based on different variants of the Apriori Algorithm, the algorithm presented here does not explicitly result in having entire counting data structure duplicated on each processor. Furthermore, the proposed algorithm introduces minimum communication (and hence synchronization) overheads by efficiently partitioning the list of frequent elements list over processors. 4.1.1. Existing Parallel Fp-Tree Constructing Algorithm It is obvious that Constructing FP-tree is the key step in FP-growth algorithm, FP-tree contains compressed message of frequent item sets, it is easy to mining frequent item sets from FP-tree. To construct FP- tree, we must scan data base twice, one for creating 1-itemset and one for build FP-tree. To mine useful information from database effectively, we needs to solve efficiency problem of mining, when data quantity very large, efficiency of mining algorithm become the key of mining problem, to increase efficiency of mining algorithm, so we developed parallel algorithm in a effective manner. First we scan the database to identified the frequent 1-itemsets. In order to enumerate the frequent items efficiently, then we divide the datasets among the available processors. Each processor is given an approximately equal number of transactions to read and analyze. As a result, the dataset is split in n equal sizes. Each processor enumerates the items appearing in the transactions at hand. After enumeration of sub occurrences, a global count is necessary to identify the frequent items. This count is done in parallel where each processor is allocated an equal number of items to sum their sub supports into global count. Finally, in a sequential phase infrequent items with a support less than the support threshold are not consider in next steps and the remaining frequent items are sorted by their frequency. On master node, These sorted transaction items are used in constructing the FP-Trees as follows: for the first item on the sorted transactional dataset, check if it exists as one of the children of the root. If it exists then increment the support for this node. Otherwise, add a new node for this item as a child for the root node with 1 as support. Then, consider the current item node as the newly temporary root and repeat the same procedure with the next item on the sorted transaction .As transaction comes from slave nodes, master add them in it’s already build tree and make one global tree with header table link and count. After that master applied FP Growth on this tree to find the conditional pattern base and make recursive call. Existing Algorithm (PFPTC: Parallel FP-tree Constructing Algorithm) Input: A transaction database DB and a minimum support threshold ξ. Output: Its frequent pattern tree, FP-Tree Method: The FP-tree is constructed in the following steps. Suppose there are n processor p1, p2… pn; Divide database DB into n part DB1… DBn with same size; Processor pi scan DBj once. And create total set of frequent items F and their supports. Sort F in support descending order as L. for ( j = 1; j≤n; j++) do ( concurrently) { Processor pj parallel scan DBj, construct FP-treej, do not create head table and node link ; } While (n>1) do for ( j = 1; j≤n; j=j+2) { Processor pj parallel call FP-merge (FP-treej, FP-treej+1, int (j/2)+1 ); n=n/2 ; 32
  • 4. Data Mining By Parallelization of Fp-Growth Algorithm } Create head table and same nodes link for FP-tree1; Return FP-tree1; End; 4.1.2. Proposed Algorithm (PFPTC : Parallel FP-Tree Constructing Algorithm) Input: A transaction database DB and a minimum support threshold ξ. Output: Its frequent pattern tree, FP-Tree Method: The FP-tree is constructed in the following steps. Suppose there are n processor p1, p2,…,pn (one master node p1, remaining slave nodes p2,p3…….pn); Divide database DB into n part DB1,DB2… DBn with same size; Processor pj scans DBj once. And create total set of items and their supports. Send it to master node which merges them to generate the cumulative support. The master node sends the generated results back to slaves individually. for ( j = 2; j≤n; j++) do ( concurrently) { Processor pj parallel scan DBj, For each transaction, find the frequent items of the transaction based on the global threshold. Send the results to the master node for each transaction. } If (j=1) { For each transaction, find the frequent items of the transaction based on the global threshold and if ( tree is created) Add this data Else Create tree with this data. Receive the results from the slave nodes for each transaction for each slave node one by one. Add this data to tree. Call FP Growth on root. Calculate execution time for FP Tree on each node (including master node) Free memory End; V. PERFORMANCE EVALUATION Number of experiments have been done to evaluate the scalability of parallel algorithm. Performance evaluation would be incomplete if we exclude from discussion issues that do not occur in sequential programming such as sources of parallel overhead and scalability. The performance analysis results will indicate comparisons as follows: Serial execution Vs. Parallel execution of given application. This results show how well cluster is performing for given application. Show the scalability of parallel algorithm by means of execution, speed up and efficiency. 5.1 Serial Execution Vs Parallel Execution: 5.1.1 Consideration Of Database (Size=5577) here we are taken a database of size 5577 transaction and perform mining of item serial as well as parallel. The corresponding result is shown in table. We analyze it through graph. Threshold Serial Execution Parallel Execution (seconds) (seconds) .10 .104 .048 .15 .060 .040 .20 .044 .038 Table 2. Execution time in serial and parallel algorithm 5.1.1.2 Graph Corresponding to Table: 33
  • 5. Data Mining By Parallelization of Fp-Growth Algorithm 0.15 Execution Time 0.1 0.05 0 0.1 0.15 0.2 Threshold Serial Execution parallel Execution Figure 4.Graph between execution time and threshold 5.2 Scalability Evaluation: Serial run-time is indicated using Ts Parallel run-time using Tp. 5.2.1 Speed up: Speed-up, S, is defined as ―the ratio of the serial run-time of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processors: S = Ts / Tp 5.2.2 Efficiency: Efficiency E, is ―a measure of the fraction of time for which a processor usefully employed‖; it is defined as ―the ratio of speed-up to the number of processors‖. E=S/P Result corresponding to Speed up and Efficiency Threshold Speed up Efficiency .10 2.16 .72 .15 1.5 .50 .20 1.15 .38 Table 3. Table for speed up and efficiency VI. CONCLUSION Throughout the last decade, a lot of people have implemented and compared several algorithms that try to solve the frequent item set mining problem as efficiently as possible. Moreover, we experienced that different implementations of the same algorithms could still result in significantly different performance results. As a consequence, several claims that were presented in some articles were later contradicted in other articles. In this paper, we performed mining frequent item in transactional database for making difference between sequential and parallel algorithm, we noted the time of making FP Tree for both algorithm on different threshold. After compare the result of different metric with idle parallel system metric, we conclude that our algorithm works good in parallel environment. VII. FUTURE SCOPE “Change is the only constant thing.” Rightly said. There is always a scope for amelioration. The future extension to this project may include some updations such as using optimization to make the database efficient. The database here comprised of numeric values. Instead, alphabetical letters could also be taken and then the scanning would be word by word. Also, most significantly, the processing time can be further compressed. Apart from that, we can also introduce the stream data analysis, in which data comes regularly and we have to perform mining on time bases. Efficiency of processors also matters in this approach, if some processors are more efficient than others than we can perform the load balancing concept by mean of divide the task according to processor efficiency. The optimization method introduced still operates on static work generation. It puts down to the appropriate level and each time, produces work generation only in the middle 50 percent of the tree. This was applied in order to reduce the communication costs caused by the generation of jobs that are too small and thus not worth sending them to a different unit for processing. It would be much better if a dynamic work generation 34
  • 6. Data Mining By Parallelization of Fp-Growth Algorithm scheme is introduced. Such a scheme must be able produce a desirable number of nodes for processing according to the number of nodes at each level. REFERENCES [1]. T. Imielinski R. Agrawal and A.N. Swami. A tree projection algorithm for generation of frequent item sets. Parallel and Distributed Computing, pages 350–371, March 2001 [2]. J. Han J. Pei and Y. Yin. Minning frequent pattern without candidate generation. In Proc. 2000 ACMSIGMOD Int. Conf. on Management of Data. ACM, 2000. [3]. Agrawal, R., Imielinski, T. and Swami, A. (1993) Mining association rules between sets of items in large databases. In Proceedings of the 1993 International Conference on Management of Data (SIGMOD 93), pages 207-216. [4]. Bayardo R. J. (1998). Efficiently mining long patterns from databases. Proceedings of the 1998 ACM SIGMOD international conference on Management of data, 85-93. [5]. Christian Borgelt. An implementation of fp-growth algorithm. [6]. Li J., Liu Y., and Choudhary A. (2006). Parallel Data Mining Algorithms for Association Rules and Clustering. Handbook of Parallel Computing: Models, Algorithms and Applications. Sanguthevar Rajasekaran and John Reif, ed., CRC Press. [7]. Han, E. H., Karypis, G., and Kumar, V. (2000) Scalable parallel data mining for association rules. IEEE Transactions on Knowledge and Data Engineering, 12, 337- 352. 76 [8]. Agrawal, R., Shafer, J. C., Center, I., and San Jose, C. A. (1996) Parallel mining of association rules. Knowledge and Data Engineering, IEEE Transactions on, 8, 962-969. [9]. Chung, S. M., and Luo, C. (2003) Parallel mining of maximal frequent itemsets from databases. Tools with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on, 134-139. 35