SlideShare ist ein Scribd-Unternehmen logo
1 von 9
Downloaden Sie, um offline zu lesen
International Journal of Computer Engineering
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
and Technology (IJCET), 1, Number 2, Sep 6367(Print)© IAEME
ISSN 0976 – 6375(Online) Volume ISSN 0976 – - Oct (2010),
ISSN 0976 – 6375(Online) Volume 1
                                                                                IJCET
Number 2, Sep - Oct (2010), pp. 192-200                                     ©IAEME
© IAEME, http://www.iaeme.com/ijcet.html

  PREDICTION OF CUSTOMER BEHAVIOR USING CMA
                                         F.Fenita
                      Department of Computer Science and Engineering
                           Karunya University, Coimbatore, India
                              Email id: fenita87@gmail.com
                                       Sumitha C.H
                      Department of Computer Science and Engineering
                           Karunya University, Coimbatore, India
                             Email id: sumithach@karunya.edu

ABSTRACT:
        The process of extracting certain sequential pattern that exceeds the minimum
support threshold is called Sequential Pattern Mining. It has been used to predict various
aspects of customer buying behavior for a long time. The sequence that is discovered
using sequential pattern mining algorithm reveals the chronological relation between the
items and provides valuable information which will help in developing the marketing
strategies. There are various approaches proposed for sequential pattern mining. But in
the existing sequential pattern mining approaches, the cyclic behavior of the buying
pattern and the interval between the two consecutive items is hardly known. The
proposed algorithms can be implemented and supplied to test against teal time customer
data from the consumer goods company. The experimental results illustrate how the
model can be used to predict likely purchases within a certain time limit.
Keywords: Apriori, FreeSpan, MEMSIP, PrefixSpan, SPIRIT, Trend Modeling, Trend
distribution function.
1 INTRODUCTION
        A pattern is a type that predicts the manner in which the set of events occur. The
process of extracting the information about such pattern is called as Pattern Mining.
Sequential pattern is a sequence of item sets that occurs frequently in a specific order. All
the customer transactions viewed together as a sequence is called as customer-sequence
where each transaction is represented as an item sets in that sequence. Given a set of
sequences, a complete set of Frequent subsequences can be mined using Sequential



                                                 192
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

Pattern Mining. The applications of Sequential Pattern Mining are customer shopping
sequences, Telephone calling patterns, natural disasters like earth quakes etc. The process
of knowledge discovery in database consists of three steps. The first one is called
preprocessing which includes data cleaning, integration, selection and transformation.
Secondly, data mining which is the main process where different algorithms are applied
to produce the knowledge which are hidden. Thirdly, the post processing process which
evaluates the mining results according to the user’s requirements. The knowledge can be
presented only if the results are satisfactory. Different knowledge comes out as a result
when various data mining techniques are applied to the data sources. Depending on the
evaluation results obtained, mining can be redone or the user can modify the
requirements.
2 SEQUENTIAL PATTERN MINING
        The process of finding out the inter-transaction pattern in a given set of sequence
is called as sequential pattern mining. The sequential pattern mining algorithms are used
in finding out the complete set of frequent sub- sequences in the given set of sequences.
These algorithms are efficient in the case of sequential patterns which are too long. The
discovery of sequential pattern in transaction data is Very important in many application
domains especially in customer analyses where certain buying patterns such as follow-up
purchases can be determined. Thus, discovered sequential patterns from the super
markets are used to develop the marketing strategies. Sequential pattern is a sequence of
item sets that occurs frequently in a specific order. All the customer transactions viewed
together as a sequence is called as customer-sequence where each transaction is
represented as an itemset in that sequence. The customer support a sequence s, if s is
contained in the customer sequence, then the support of the sequence s is defined as the
fraction of customers who support the sequence.


                                                                            (1)
        The problem of discovering the sequence was first introduced by Agrawal and
Sirkant [2]. Many algorithms were developed later and successfully improved the
efficiency of the task of mining sequential patterns. The most basic and earlier algorithms
are based upon Apriori algorithm [1]. The core of the Apriori property is that any sub


                                                193
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

pattern of a frequent pattern must be frequent. Based on this heuristic, a series of Apriori-
like algorithms such as AprioriAll, AprioriSome, DynamicSome [8], and GSP [5] were
developed.
        Later on, different kinds of algorithms are proposed by different researchers. For
example, FreeSpan [3] and PrefixSpan [7] were developed by the data projection
approach. Han et al. [6] proposed two algorithms for mining partial periodic pattern
single period and multiple periods. Yanget al. [11] proposed distance- based pruning to
find the periodic patterns, which may contain a disturbance of length up to a certain
threshold. The mining of frequent partial periodic sequential patterns in a time series is to
find possibly with some restriction or disturbance. Ozden et al. [9] proposed the
sequential algorithm and interleaved algorithm to determine cyclic association rules.
Associate rules capture interrelationships between various items.
        The approaches presented above primarily target the orders of items purchased, or
cyclic patterns occurring within a particular time frame defined by users. However, by
these works, the cyclic behavior and the interval of the items purchased is hardly known.
Hence the Data mining skills and the fundamentals of statistic are combined to introduce
an algorithm Cyclic Model Analysis (CMA) to find out the model of recurring
purchasing. The modeling process starts with the discovery of sequential patterns from
the transactional database. From the discovered sequence, the trend distribution function
(TDF) is calculated. Then the existence of periodicity and the interval of successive
events are identified by the Generalized Periodicity Detection (GPD) and Trend
Modeling (TM) computed, which will be explained in more detail later. Next, the CMA
algorithm is used to obtain the period and trends of quantities of purchasing.
Consequently, marketing people can recommend the right products to the right customers
at the right time.
3 TREND DISTRIBUTION FUNCTION
        The distribution function is a time series representation of the transactions
associated with the sequence discovered. The traditional sequential formulation reveals
the chronological order of purchase only. If the plot of the function is sketched, the
movement along the curve shows the tendency of the purchase. For any given subset of
the domain of the function, a simple linear regression is used to construct the regression


                                                194
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

line of the function within the sub-domain. If the slope of the obtained straight line is a
positive number, the purchase of the item increases in a certain rate. If the slope is a
negative number, the sales volume of the item decreases after a certain point is reached.
If f(x) is linearly decreasing distribution function and reaches zero within the domain, the
point x is called the degeneration point of the function. The degeneration point signifies
the fact that the customers tend to stop purchasing. Hence, the business owner must be
alerted before the degeneration point is reached.
        Once the frequent itemsets are generated the trend distribution function is
calculated. First the trend distribution function for one itemset that is itemset containing
only one item is calculated. The entire database is divided into four tables. The first item
from the gene rated frequent item set is taken and the count of occurrence of that
particular item is checked in the tables. The difference in the count value is calculated
and it is incremented by one. The distribution calculated can be of three types. If the
obtained value of the distribution function is positive then it is called as ascending type of
distribution. If the obtained value of the distribution function is negative then it is called
as descending type of distribution. If the obtained distribution function is zero, then the
trend is constant that occurs more frequently in real world scenarios. Similarly the trend
distribution function is calculated for two itemsets and three itemsets and so on. Once the
trend distribution is calculated, these values can be given as inputs for further procedures.
4 GENERALIZED PERIODICITY DETECTION
        The scheme presented in this paper takes a two-phase approach to cope with all
periodicity-related problems, which occur in the analysis process of sequential pattern
mined from transactions. It is well known that a host of algorithms have been developed
for efficient mining of sequential patterns. To solve the periodicity problem, a
mathematical model constructed to portray the sequential pattern mined from the
database.
        The scheme comprises the sequential pattern mining technique and the algorithms
presented in this section. Once the result of sequential pattern mining is given, the
primary concern is to know where there are regularities that can be found. Thus, the value
of trend distribution function is computed and then the GPD is introduced to detect the
periodicity of the function. If the periodicity can be identified, the analysts can have a


                                                195
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

better understanding of the patterns and decide on the next appropriate action. After the
value of trend distribution function is determined the GPD is introduced to detect the
periodicity of the function.
        The trend distribution function values, the number of elements, the minimal
period which is the smallest possible value of the period and the maximum error which is
the error threshold used to judge if the investigated value is subject to the generalized
periodicity are given as input to the GPD algorithm. The output generated from this
algorithm will be the list of periods and empty if the periodicity does not exist. The first
step is to find the linear model y=ax+b of the given function f. If the error (xi) is less
than the maximum error threshold predefined by the user, then the value xi is the period
prospect. The list of periods and the trend distribution function values are given as the
input to the Trend Modeling (TM) algorithm.




5 TREND MODELING
        Once the existence of periodicity has been identified, the characteristic of the
sequence can be captured more precisely. Hence, a mathematical structure, which
symbolizes and describes the transactions occurring in the real world, is needed. The
results of the modeling process must map the relationships between relevant attributes in
the transaction databases to those relationships between relevant attributes of the function
obtained by computing rules. Broad movements evolve more gradually than other
movements which are evident. These gradual changes are called trends. The changes
which are transitory in nature are described as fluctuations.
        Since regression analysis is frequently used for fitting equation to data, regression
techniques are applied to construct Trend Modeling procedure. TM algorithm is straight
forward and begins with the establishment of simple linear regression model y=ax+b.
Combined with the result of GPD, the complete model characterizing the given input
distribution function is obtained. The polynomial gained by the TM process is an aid to
identify the nature of the sequence mined. To illustrate how Trend Modeling is used to
find the approximating model of a given input, two typical linearly increasing/decreasing
periodic functions are used as sample input.


                                                196
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

6 CYCLIC MODEL ANALYSIS
        The purpose of the establishment of the mathematical model is to help analysts
obtain a better understanding of the whole picture of what happened and predict what is
likely to happen. With GPD and TM it can be determined if customers tend to repeat
buying at regular period and an equation can be formulated to approximate the
distribution obtained from the sequence. In short, the mathematical model established by
GPD/TM is used to describe characteristics of the sequential pattern mined from
designated time frame. Next, CMA is proposed to analyze and describe the characteristics
of the sequential pattern mined from the transactional databases. The user must determine
the value of parameters minimum period, maximum error, degree and terminating
condition.
        The value of the terminating condition. is the terminating condition of the process.
If the length of the domain is too short, the process should be stopped since its
meaningless to investigate the characteristic of repeated patterns. Then the type of the
function is determined by finding the local maximum of the function. If the local
maximum exists at the end of the domain of the function, the function belongs to
ascending type.




                           Figure 6.1 The process flow of CMA
        The descending type can be determined if the local maximum exists at the
beginning of the domain. If the distribution function is ascending or descending type,


                                                197
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

then apply GPD/TM directly to get the polynomial approximating the patterns and find
the period of the distribution function. If the distribution is neither descending nor
ascending type, the whole time frame is partitioned into two sub frames and invokes
CMA recursively until the distribution function of the sub frame is simplified. If the
length of the inspected sub frame is smaller than the predefined terminating condition
trcd, the process will be stopped. CMA performs well in exploring the trends of repeat-
buying behaviors and provides practical model for predicting the customer buying
behavior.
        The process of applying CMA to real-world is displayed in Fig 6.1. Once the
mining process is completed, the sequential patterns to be investigated were selected. The
distribution function of the designated pattern is computed. Then, the CMA is applied to
capture the model of the mathematical model. CMA divides the domain into smaller
segments such that the distribution function defined on the segment is of simply
increasing or decreasing type.
        Then GPD and TM were invoked to obtain the polynomial approximating to the
pattern defined on the segment. The result of divide-and-conquer process will be
collected and synthesized. Then this synthesized knowledge will be used to help analyze
past and likely future behaviors. From the collected knowledge, marketers can predict the
customer buying behavior. The consistency of the repeat-buying behavior over time and
the item-based feature of the CMA algorithm suggested that a hybrid recommendation
system can be formed to provide better prediction. Thus, marketing professionals will
have a better tool with which they retain their customers. Consequently, the marketers
can take appropriate action to favorably impact customers’ behavior.
7 ENHANCEMENTS
        In the cyclic model analysis algorithm, when mining the frequent itemsets, so
many frequent itemsets will be pruned. Hence this is the disadvantage in this technique.
This can be improved by using fuzzy techniques. Multi-valued logic derived from fuzzy
set theory to deal with reasoning that is approximate rather than accurate is called fuzzy
logic. In "crisp logic", where binary sets have binary logic, fuzzy logic variables may
have a truth value that ranges between 0 and 1. Though fuzzy logic has been applied to
many fields, from control theory to artificial intelligence, it still remains controversial


                                                198
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

among most statisticians. Fuzzy logics take on a range of values High, medium and low
rather than always taking true or false values. This prevents in loss of itemsets using
pruning. Fuzzy values are not probabilities. Fuzzy logic and probability are different
ways of expressing uncertainty. But both fuzzy logic and probability theory can be used
to represent subjective belief, fuzzy set theory uses the concept of fuzzy set membership
(i.e., number of variable in a set), probability theory uses the concept of subjective
probability (i.e., how probable is a variable in a set). While this distinction is mostly
philosophical, the fuzzy-logic-derived possibility measure is inherently different from the
probability measure, hence they are not directly equivalent. Hence the fuzzy logics can be
used to improve the existing techniques.
8 CONCLUSION
        In this paper some of the existing sequential Pattern Mining techniques are
compared and analyzed. The cyclic model analysis on sequential pattern mining
technique is more advantageous than the other approaches since this the only method
which helps in revealing the cyclic property of the sequential patterns. CMA performs
well in describing the buying behavior of the customer. However, there are some
disadvantages in this technique. First, the periodicity detecting procedure can be
improved by applying fuzzy techniques and other statistics tools. Fuzzy logics are used to
prevent the loss of frequent item sets during the pruning phase thereby improves the
marketing strategies.
REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of
    Items in Large Databases,” Proc. ACM SIGMOD ’93, pp. 207-216, 1993.
[2] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc.1995 IEEE 11th Int’l
    Conf. Data Eng. (ICDE ’95), pp. 3-14, 1995.
[3] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal and M. Hsu, “FreeSpan: Frequent
    Pattern- Projected Sequential Pattern Mining,” Proc.ACM SIGKDD Int’l Conf.
    Knowledge Discovery and Data, Data Mining (SIGKDD ’00), pp. 355-359, 2000.
[4] M. Lin and S. Lee, “Fast Discovery of Sequential Patterns by Memory Indexing,”
    Proc. Fourth Int’l Conf. Data Warehousing Knowledge Discovery (DaWak, 02), pp
    150-160, 2002.


                                                199
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print),
ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME

[5] Yu Hirate and Hayato Yamana,” Generalized sequential Pattern Mining with Item
    Intervals”, Journal of Computers, Vol 1, No 3, pp.51-60, 2006.
[6] J.Han, G.Dong, Y.Yin, "Efficient Mining of Partial Periodic Patterns in Time Series
Database,”Proc.Int’l Conf. Data Eng. (ICDE ‘99) p.106-115, 1999.
[7] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu,
    “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern
    Growth,” Proc. 17th Int’l Conf. Data Eng. (ICDE ’01), pp. 215-224, 2001.
[8] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large
    Databases,” Proc. 20th Int’l Conf. Very Large Data Bases (VLDB ’94), pp. 487-499,
    1994.
[9] B. Ozden, S. Ramaswamy, and A. Silberschatz, “Cyclic Association Rules,” Proc.
    14th Int’l Conf. Data Eng. (ICDE ’98), pp. 412-421,1998.
[10] J. Yang, W. Wang, P.S. Yu, and J. Han, “Mining Long Sequential Patterns in a
    Noisy Environment,” Proc. ACM SIGMOD ’02, pp. 406-417, 2002.
[11] Ding-An Chiang, Cheng-Tzu Wang, Shao-Ping Chen, & Chun-Chi Chen , “The
    Cyclic Model Analysis on Sequential Patterns” IEEE Transactions on Knowledge and
    Data Engineering, Vol. 21 No.11, Nov 2009 Pa.No.1617-1627 .




                                                200

Weitere ähnliche Inhalte

Was ist angesagt?

A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...IAEME Publication
 
A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...
A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...
A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...ijmvsc
 
4317mlaij02
4317mlaij024317mlaij02
4317mlaij02mlaij
 
IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...
IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...
IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...IRJET Journal
 
Trading outlier detection machine learning approach
Trading outlier detection  machine learning approachTrading outlier detection  machine learning approach
Trading outlier detection machine learning approachEditorIJAERD
 
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...IOSR Journals
 
Study of effectiveness of time series modeling (arima) in forecasting stock p...
Study of effectiveness of time series modeling (arima) in forecasting stock p...Study of effectiveness of time series modeling (arima) in forecasting stock p...
Study of effectiveness of time series modeling (arima) in forecasting stock p...IJCSEA Journal
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET Journal
 
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURIA Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURIidescitation
 
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexRegression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexIAEME Publication
 
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexRegression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexiaemedu
 
Stock market prediction technique:
Stock market prediction technique:Stock market prediction technique:
Stock market prediction technique:Paladion Networks
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
IRJET- Future Stock Price Prediction using LSTM Machine Learning Algorithm
IRJET-  	  Future Stock Price Prediction using LSTM Machine Learning AlgorithmIRJET-  	  Future Stock Price Prediction using LSTM Machine Learning Algorithm
IRJET- Future Stock Price Prediction using LSTM Machine Learning AlgorithmIRJET Journal
 
Performance analysis and prediction of stock market for investment decision u...
Performance analysis and prediction of stock market for investment decision u...Performance analysis and prediction of stock market for investment decision u...
Performance analysis and prediction of stock market for investment decision u...Hari KC
 
A Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets GenerationA Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets GenerationWaqas Tariq
 

Was ist angesagt? (20)

A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
 
A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...
A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...
A LINEAR REGRESSION APPROACH TO PREDICTION OF STOCK MARKET TRADING VOLUME: A ...
 
4317mlaij02
4317mlaij024317mlaij02
4317mlaij02
 
IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...
IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...
IRJET- Stock Price Prediction using combination of LSTM Neural Networks, ARIM...
 
Ijmet 10 01_176
Ijmet 10 01_176Ijmet 10 01_176
Ijmet 10 01_176
 
Trading outlier detection machine learning approach
Trading outlier detection  machine learning approachTrading outlier detection  machine learning approach
Trading outlier detection machine learning approach
 
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
A Novel Algorithm for Mining Fuzzy High Utility Itemset from Fuzzy Transactio...
 
Study of effectiveness of time series modeling (arima) in forecasting stock p...
Study of effectiveness of time series modeling (arima) in forecasting stock p...Study of effectiveness of time series modeling (arima) in forecasting stock p...
Study of effectiveness of time series modeling (arima) in forecasting stock p...
 
Stock prediction system using ann
Stock prediction system using annStock prediction system using ann
Stock prediction system using ann
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
 
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURIA Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
 
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexRegression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock index
 
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexRegression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock index
 
Q04602106117
Q04602106117Q04602106117
Q04602106117
 
Stock market prediction technique:
Stock market prediction technique:Stock market prediction technique:
Stock market prediction technique:
 
Stock analysis report
Stock analysis reportStock analysis report
Stock analysis report
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
IRJET- Future Stock Price Prediction using LSTM Machine Learning Algorithm
IRJET-  	  Future Stock Price Prediction using LSTM Machine Learning AlgorithmIRJET-  	  Future Stock Price Prediction using LSTM Machine Learning Algorithm
IRJET- Future Stock Price Prediction using LSTM Machine Learning Algorithm
 
Performance analysis and prediction of stock market for investment decision u...
Performance analysis and prediction of stock market for investment decision u...Performance analysis and prediction of stock market for investment decision u...
Performance analysis and prediction of stock market for investment decision u...
 
A Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets GenerationA Performance Based Transposition algorithm for Frequent Itemsets Generation
A Performance Based Transposition algorithm for Frequent Itemsets Generation
 

Andere mochten auch

Optimization of cutting parameters in dry turning operation of mild steel
Optimization of cutting parameters in dry turning operation of mild steelOptimization of cutting parameters in dry turning operation of mild steel
Optimization of cutting parameters in dry turning operation of mild steeliaemedu
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reorderingiaemedu
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationiaemedu
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challengesiaemedu
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanismiaemedu
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingiaemedu
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationiaemedu
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridiaemedu
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 

Andere mochten auch (9)

Optimization of cutting parameters in dry turning operation of mild steel
Optimization of cutting parameters in dry turning operation of mild steelOptimization of cutting parameters in dry turning operation of mild steel
Optimization of cutting parameters in dry turning operation of mild steel
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Revisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modificationRevisiting the experiment on detecting of replay and message modification
Revisiting the experiment on detecting of replay and message modification
 
Semantic web services and its challenges
Semantic web services and its challengesSemantic web services and its challenges
Semantic web services and its challenges
 
Website based patent information searching mechanism
Website based patent information searching mechanismWebsite based patent information searching mechanism
Website based patent information searching mechanism
 
Effect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routingEffect of scenario environment on the performance of mane ts routing
Effect of scenario environment on the performance of mane ts routing
 
Adaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow applicationAdaptive job scheduling with load balancing for workflow application
Adaptive job scheduling with load balancing for workflow application
 
Effective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using gridEffective broadcasting in mobile ad hoc networks using grid
Effective broadcasting in mobile ad hoc networks using grid
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 

Ähnlich wie Prediction of customer behavior using cma

A Brief Overview On Frequent Pattern Mining Algorithms
A Brief Overview On Frequent Pattern Mining AlgorithmsA Brief Overview On Frequent Pattern Mining Algorithms
A Brief Overview On Frequent Pattern Mining AlgorithmsSara Alvarez
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039ijsrd.com
 
Intelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIntelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIRJET Journal
 
Adaptive and Fast Predictions by Minimal Itemsets Creation
Adaptive and Fast Predictions by Minimal Itemsets CreationAdaptive and Fast Predictions by Minimal Itemsets Creation
Adaptive and Fast Predictions by Minimal Itemsets CreationIJERA Editor
 
Review Over Sequential Rule Mining
Review Over Sequential Rule MiningReview Over Sequential Rule Mining
Review Over Sequential Rule Miningijsrd.com
 
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET Journal
 
Sequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap ShotSequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap ShotIOSR Journals
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESIRJET Journal
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESIRJET Journal
 
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study Turnover Prediction of Shares Using Data Mining Techniques : A Case Study
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study csandit
 
Fast Sequential Rule Mining
Fast Sequential Rule MiningFast Sequential Rule Mining
Fast Sequential Rule Miningijsrd.com
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesEditor IJMTER
 
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...ITIIIndustries
 
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...Editor IJCATR
 
Parametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithmParametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithmIAEME Publication
 
Stock Market Prediction using Long Short-Term Memory
Stock Market Prediction using Long Short-Term MemoryStock Market Prediction using Long Short-Term Memory
Stock Market Prediction using Long Short-Term MemoryIRJET Journal
 

Ähnlich wie Prediction of customer behavior using cma (20)

A Brief Overview On Frequent Pattern Mining Algorithms
A Brief Overview On Frequent Pattern Mining AlgorithmsA Brief Overview On Frequent Pattern Mining Algorithms
A Brief Overview On Frequent Pattern Mining Algorithms
 
Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Ap...
Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Ap...Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Ap...
Study on Positive and Negative Rule Based Mining Techniques for E-Commerce Ap...
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039
 
Intelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIntelligent Supermarket using Apriori
Intelligent Supermarket using Apriori
 
Adaptive and Fast Predictions by Minimal Itemsets Creation
Adaptive and Fast Predictions by Minimal Itemsets CreationAdaptive and Fast Predictions by Minimal Itemsets Creation
Adaptive and Fast Predictions by Minimal Itemsets Creation
 
Review Over Sequential Rule Mining
Review Over Sequential Rule MiningReview Over Sequential Rule Mining
Review Over Sequential Rule Mining
 
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
 
Sequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap ShotSequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap Shot
 
50120140503013
5012014050301350120140503013
50120140503013
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIES
 
STOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIESSTOCK PRICE PREDICTION USING TIME SERIES
STOCK PRICE PREDICTION USING TIME SERIES
 
50120130405016 2
50120130405016 250120130405016 2
50120130405016 2
 
50120140503005
5012014050300550120140503005
50120140503005
 
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study Turnover Prediction of Shares Using Data Mining Techniques : A Case Study
Turnover Prediction of Shares Using Data Mining Techniques : A Case Study
 
Fast Sequential Rule Mining
Fast Sequential Rule MiningFast Sequential Rule Mining
Fast Sequential Rule Mining
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
 
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
 
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
 
Parametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithmParametric comparison based on split criterion on classification algorithm
Parametric comparison based on split criterion on classification algorithm
 
Stock Market Prediction using Long Short-Term Memory
Stock Market Prediction using Long Short-Term MemoryStock Market Prediction using Long Short-Term Memory
Stock Market Prediction using Long Short-Term Memory
 

Mehr von iaemedu

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech iniaemedu
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presenceiaemedu
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineeringiaemedu
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobilesiaemedu
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacementiaemedu
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approachiaemedu
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentiaemedu
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationiaemedu
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksiaemedu
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyiaemedu
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryiaemedu
 
A comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueA comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueiaemedu
 
A comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’sA comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’siaemedu
 
The detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networksThe detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networksiaemedu
 
Visual cryptography scheme for color images
Visual cryptography scheme for color imagesVisual cryptography scheme for color images
Visual cryptography scheme for color imagesiaemedu
 
Software process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various modelsSoftware process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various modelsiaemedu
 
Software metric analysis methods for product development
Software metric analysis methods for product developmentSoftware metric analysis methods for product development
Software metric analysis methods for product developmentiaemedu
 
Software process and product quality assurance in it organizations
Software process and product quality assurance in it organizationsSoftware process and product quality assurance in it organizations
Software process and product quality assurance in it organizationsiaemedu
 
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexRegression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexiaemedu
 
Shared bandwidth reservation of backup paths of multiple
Shared bandwidth reservation of backup paths of multipleShared bandwidth reservation of backup paths of multiple
Shared bandwidth reservation of backup paths of multipleiaemedu
 

Mehr von iaemedu (20)

Tech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech inTech transfer making it as a risk free approach in pharmaceutical and biotech in
Tech transfer making it as a risk free approach in pharmaceutical and biotech in
 
Performance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presencePerformance analysis of manet routing protocol in presence
Performance analysis of manet routing protocol in presence
 
Performance measurement of different requirements engineering
Performance measurement of different requirements engineeringPerformance measurement of different requirements engineering
Performance measurement of different requirements engineering
 
Mobile safety systems for automobiles
Mobile safety systems for automobilesMobile safety systems for automobiles
Mobile safety systems for automobiles
 
Efficient text compression using special character replacement
Efficient text compression using special character replacementEfficient text compression using special character replacement
Efficient text compression using special character replacement
 
Agile programming a new approach
Agile programming a new approachAgile programming a new approach
Agile programming a new approach
 
Adaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environmentAdaptive load balancing techniques in global scale grid environment
Adaptive load balancing techniques in global scale grid environment
 
A survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow applicationA survey on the performance of job scheduling in workflow application
A survey on the performance of job scheduling in workflow application
 
A survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networksA survey of mitigating routing misbehavior in mobile ad hoc networks
A survey of mitigating routing misbehavior in mobile ad hoc networks
 
A novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classifyA novel approach for satellite imagery storage by classify
A novel approach for satellite imagery storage by classify
 
A self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imageryA self recovery approach using halftone images for medical imagery
A self recovery approach using halftone images for medical imagery
 
A comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining techniqueA comprehensive study of non blocking joining technique
A comprehensive study of non blocking joining technique
 
A comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’sA comparative study on multicast routing using dijkstra’s
A comparative study on multicast routing using dijkstra’s
 
The detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networksThe detection of routing misbehavior in mobile ad hoc networks
The detection of routing misbehavior in mobile ad hoc networks
 
Visual cryptography scheme for color images
Visual cryptography scheme for color imagesVisual cryptography scheme for color images
Visual cryptography scheme for color images
 
Software process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various modelsSoftware process methodologies and a comparative study of various models
Software process methodologies and a comparative study of various models
 
Software metric analysis methods for product development
Software metric analysis methods for product developmentSoftware metric analysis methods for product development
Software metric analysis methods for product development
 
Software process and product quality assurance in it organizations
Software process and product quality assurance in it organizationsSoftware process and product quality assurance in it organizations
Software process and product quality assurance in it organizations
 
Regression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock indexRegression, theil’s and mlp forecasting models of stock index
Regression, theil’s and mlp forecasting models of stock index
 
Shared bandwidth reservation of backup paths of multiple
Shared bandwidth reservation of backup paths of multipleShared bandwidth reservation of backup paths of multiple
Shared bandwidth reservation of backup paths of multiple
 

Prediction of customer behavior using cma

  • 1. International Journal of Computer Engineering International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), and Technology (IJCET), 1, Number 2, Sep 6367(Print)© IAEME ISSN 0976 – 6375(Online) Volume ISSN 0976 – - Oct (2010), ISSN 0976 – 6375(Online) Volume 1 IJCET Number 2, Sep - Oct (2010), pp. 192-200 ©IAEME © IAEME, http://www.iaeme.com/ijcet.html PREDICTION OF CUSTOMER BEHAVIOR USING CMA F.Fenita Department of Computer Science and Engineering Karunya University, Coimbatore, India Email id: fenita87@gmail.com Sumitha C.H Department of Computer Science and Engineering Karunya University, Coimbatore, India Email id: sumithach@karunya.edu ABSTRACT: The process of extracting certain sequential pattern that exceeds the minimum support threshold is called Sequential Pattern Mining. It has been used to predict various aspects of customer buying behavior for a long time. The sequence that is discovered using sequential pattern mining algorithm reveals the chronological relation between the items and provides valuable information which will help in developing the marketing strategies. There are various approaches proposed for sequential pattern mining. But in the existing sequential pattern mining approaches, the cyclic behavior of the buying pattern and the interval between the two consecutive items is hardly known. The proposed algorithms can be implemented and supplied to test against teal time customer data from the consumer goods company. The experimental results illustrate how the model can be used to predict likely purchases within a certain time limit. Keywords: Apriori, FreeSpan, MEMSIP, PrefixSpan, SPIRIT, Trend Modeling, Trend distribution function. 1 INTRODUCTION A pattern is a type that predicts the manner in which the set of events occur. The process of extracting the information about such pattern is called as Pattern Mining. Sequential pattern is a sequence of item sets that occurs frequently in a specific order. All the customer transactions viewed together as a sequence is called as customer-sequence where each transaction is represented as an item sets in that sequence. Given a set of sequences, a complete set of Frequent subsequences can be mined using Sequential 192
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME Pattern Mining. The applications of Sequential Pattern Mining are customer shopping sequences, Telephone calling patterns, natural disasters like earth quakes etc. The process of knowledge discovery in database consists of three steps. The first one is called preprocessing which includes data cleaning, integration, selection and transformation. Secondly, data mining which is the main process where different algorithms are applied to produce the knowledge which are hidden. Thirdly, the post processing process which evaluates the mining results according to the user’s requirements. The knowledge can be presented only if the results are satisfactory. Different knowledge comes out as a result when various data mining techniques are applied to the data sources. Depending on the evaluation results obtained, mining can be redone or the user can modify the requirements. 2 SEQUENTIAL PATTERN MINING The process of finding out the inter-transaction pattern in a given set of sequence is called as sequential pattern mining. The sequential pattern mining algorithms are used in finding out the complete set of frequent sub- sequences in the given set of sequences. These algorithms are efficient in the case of sequential patterns which are too long. The discovery of sequential pattern in transaction data is Very important in many application domains especially in customer analyses where certain buying patterns such as follow-up purchases can be determined. Thus, discovered sequential patterns from the super markets are used to develop the marketing strategies. Sequential pattern is a sequence of item sets that occurs frequently in a specific order. All the customer transactions viewed together as a sequence is called as customer-sequence where each transaction is represented as an itemset in that sequence. The customer support a sequence s, if s is contained in the customer sequence, then the support of the sequence s is defined as the fraction of customers who support the sequence. (1) The problem of discovering the sequence was first introduced by Agrawal and Sirkant [2]. Many algorithms were developed later and successfully improved the efficiency of the task of mining sequential patterns. The most basic and earlier algorithms are based upon Apriori algorithm [1]. The core of the Apriori property is that any sub 193
  • 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME pattern of a frequent pattern must be frequent. Based on this heuristic, a series of Apriori- like algorithms such as AprioriAll, AprioriSome, DynamicSome [8], and GSP [5] were developed. Later on, different kinds of algorithms are proposed by different researchers. For example, FreeSpan [3] and PrefixSpan [7] were developed by the data projection approach. Han et al. [6] proposed two algorithms for mining partial periodic pattern single period and multiple periods. Yanget al. [11] proposed distance- based pruning to find the periodic patterns, which may contain a disturbance of length up to a certain threshold. The mining of frequent partial periodic sequential patterns in a time series is to find possibly with some restriction or disturbance. Ozden et al. [9] proposed the sequential algorithm and interleaved algorithm to determine cyclic association rules. Associate rules capture interrelationships between various items. The approaches presented above primarily target the orders of items purchased, or cyclic patterns occurring within a particular time frame defined by users. However, by these works, the cyclic behavior and the interval of the items purchased is hardly known. Hence the Data mining skills and the fundamentals of statistic are combined to introduce an algorithm Cyclic Model Analysis (CMA) to find out the model of recurring purchasing. The modeling process starts with the discovery of sequential patterns from the transactional database. From the discovered sequence, the trend distribution function (TDF) is calculated. Then the existence of periodicity and the interval of successive events are identified by the Generalized Periodicity Detection (GPD) and Trend Modeling (TM) computed, which will be explained in more detail later. Next, the CMA algorithm is used to obtain the period and trends of quantities of purchasing. Consequently, marketing people can recommend the right products to the right customers at the right time. 3 TREND DISTRIBUTION FUNCTION The distribution function is a time series representation of the transactions associated with the sequence discovered. The traditional sequential formulation reveals the chronological order of purchase only. If the plot of the function is sketched, the movement along the curve shows the tendency of the purchase. For any given subset of the domain of the function, a simple linear regression is used to construct the regression 194
  • 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME line of the function within the sub-domain. If the slope of the obtained straight line is a positive number, the purchase of the item increases in a certain rate. If the slope is a negative number, the sales volume of the item decreases after a certain point is reached. If f(x) is linearly decreasing distribution function and reaches zero within the domain, the point x is called the degeneration point of the function. The degeneration point signifies the fact that the customers tend to stop purchasing. Hence, the business owner must be alerted before the degeneration point is reached. Once the frequent itemsets are generated the trend distribution function is calculated. First the trend distribution function for one itemset that is itemset containing only one item is calculated. The entire database is divided into four tables. The first item from the gene rated frequent item set is taken and the count of occurrence of that particular item is checked in the tables. The difference in the count value is calculated and it is incremented by one. The distribution calculated can be of three types. If the obtained value of the distribution function is positive then it is called as ascending type of distribution. If the obtained value of the distribution function is negative then it is called as descending type of distribution. If the obtained distribution function is zero, then the trend is constant that occurs more frequently in real world scenarios. Similarly the trend distribution function is calculated for two itemsets and three itemsets and so on. Once the trend distribution is calculated, these values can be given as inputs for further procedures. 4 GENERALIZED PERIODICITY DETECTION The scheme presented in this paper takes a two-phase approach to cope with all periodicity-related problems, which occur in the analysis process of sequential pattern mined from transactions. It is well known that a host of algorithms have been developed for efficient mining of sequential patterns. To solve the periodicity problem, a mathematical model constructed to portray the sequential pattern mined from the database. The scheme comprises the sequential pattern mining technique and the algorithms presented in this section. Once the result of sequential pattern mining is given, the primary concern is to know where there are regularities that can be found. Thus, the value of trend distribution function is computed and then the GPD is introduced to detect the periodicity of the function. If the periodicity can be identified, the analysts can have a 195
  • 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME better understanding of the patterns and decide on the next appropriate action. After the value of trend distribution function is determined the GPD is introduced to detect the periodicity of the function. The trend distribution function values, the number of elements, the minimal period which is the smallest possible value of the period and the maximum error which is the error threshold used to judge if the investigated value is subject to the generalized periodicity are given as input to the GPD algorithm. The output generated from this algorithm will be the list of periods and empty if the periodicity does not exist. The first step is to find the linear model y=ax+b of the given function f. If the error (xi) is less than the maximum error threshold predefined by the user, then the value xi is the period prospect. The list of periods and the trend distribution function values are given as the input to the Trend Modeling (TM) algorithm. 5 TREND MODELING Once the existence of periodicity has been identified, the characteristic of the sequence can be captured more precisely. Hence, a mathematical structure, which symbolizes and describes the transactions occurring in the real world, is needed. The results of the modeling process must map the relationships between relevant attributes in the transaction databases to those relationships between relevant attributes of the function obtained by computing rules. Broad movements evolve more gradually than other movements which are evident. These gradual changes are called trends. The changes which are transitory in nature are described as fluctuations. Since regression analysis is frequently used for fitting equation to data, regression techniques are applied to construct Trend Modeling procedure. TM algorithm is straight forward and begins with the establishment of simple linear regression model y=ax+b. Combined with the result of GPD, the complete model characterizing the given input distribution function is obtained. The polynomial gained by the TM process is an aid to identify the nature of the sequence mined. To illustrate how Trend Modeling is used to find the approximating model of a given input, two typical linearly increasing/decreasing periodic functions are used as sample input. 196
  • 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME 6 CYCLIC MODEL ANALYSIS The purpose of the establishment of the mathematical model is to help analysts obtain a better understanding of the whole picture of what happened and predict what is likely to happen. With GPD and TM it can be determined if customers tend to repeat buying at regular period and an equation can be formulated to approximate the distribution obtained from the sequence. In short, the mathematical model established by GPD/TM is used to describe characteristics of the sequential pattern mined from designated time frame. Next, CMA is proposed to analyze and describe the characteristics of the sequential pattern mined from the transactional databases. The user must determine the value of parameters minimum period, maximum error, degree and terminating condition. The value of the terminating condition. is the terminating condition of the process. If the length of the domain is too short, the process should be stopped since its meaningless to investigate the characteristic of repeated patterns. Then the type of the function is determined by finding the local maximum of the function. If the local maximum exists at the end of the domain of the function, the function belongs to ascending type. Figure 6.1 The process flow of CMA The descending type can be determined if the local maximum exists at the beginning of the domain. If the distribution function is ascending or descending type, 197
  • 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME then apply GPD/TM directly to get the polynomial approximating the patterns and find the period of the distribution function. If the distribution is neither descending nor ascending type, the whole time frame is partitioned into two sub frames and invokes CMA recursively until the distribution function of the sub frame is simplified. If the length of the inspected sub frame is smaller than the predefined terminating condition trcd, the process will be stopped. CMA performs well in exploring the trends of repeat- buying behaviors and provides practical model for predicting the customer buying behavior. The process of applying CMA to real-world is displayed in Fig 6.1. Once the mining process is completed, the sequential patterns to be investigated were selected. The distribution function of the designated pattern is computed. Then, the CMA is applied to capture the model of the mathematical model. CMA divides the domain into smaller segments such that the distribution function defined on the segment is of simply increasing or decreasing type. Then GPD and TM were invoked to obtain the polynomial approximating to the pattern defined on the segment. The result of divide-and-conquer process will be collected and synthesized. Then this synthesized knowledge will be used to help analyze past and likely future behaviors. From the collected knowledge, marketers can predict the customer buying behavior. The consistency of the repeat-buying behavior over time and the item-based feature of the CMA algorithm suggested that a hybrid recommendation system can be formed to provide better prediction. Thus, marketing professionals will have a better tool with which they retain their customers. Consequently, the marketers can take appropriate action to favorably impact customers’ behavior. 7 ENHANCEMENTS In the cyclic model analysis algorithm, when mining the frequent itemsets, so many frequent itemsets will be pruned. Hence this is the disadvantage in this technique. This can be improved by using fuzzy techniques. Multi-valued logic derived from fuzzy set theory to deal with reasoning that is approximate rather than accurate is called fuzzy logic. In "crisp logic", where binary sets have binary logic, fuzzy logic variables may have a truth value that ranges between 0 and 1. Though fuzzy logic has been applied to many fields, from control theory to artificial intelligence, it still remains controversial 198
  • 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME among most statisticians. Fuzzy logics take on a range of values High, medium and low rather than always taking true or false values. This prevents in loss of itemsets using pruning. Fuzzy values are not probabilities. Fuzzy logic and probability are different ways of expressing uncertainty. But both fuzzy logic and probability theory can be used to represent subjective belief, fuzzy set theory uses the concept of fuzzy set membership (i.e., number of variable in a set), probability theory uses the concept of subjective probability (i.e., how probable is a variable in a set). While this distinction is mostly philosophical, the fuzzy-logic-derived possibility measure is inherently different from the probability measure, hence they are not directly equivalent. Hence the fuzzy logics can be used to improve the existing techniques. 8 CONCLUSION In this paper some of the existing sequential Pattern Mining techniques are compared and analyzed. The cyclic model analysis on sequential pattern mining technique is more advantageous than the other approaches since this the only method which helps in revealing the cyclic property of the sequential patterns. CMA performs well in describing the buying behavior of the customer. However, there are some disadvantages in this technique. First, the periodicity detecting procedure can be improved by applying fuzzy techniques and other statistics tools. Fuzzy logics are used to prevent the loss of frequent item sets during the pruning phase thereby improves the marketing strategies. REFERENCES [1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD ’93, pp. 207-216, 1993. [2] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc.1995 IEEE 11th Int’l Conf. Data Eng. (ICDE ’95), pp. 3-14, 1995. [3] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal and M. Hsu, “FreeSpan: Frequent Pattern- Projected Sequential Pattern Mining,” Proc.ACM SIGKDD Int’l Conf. Knowledge Discovery and Data, Data Mining (SIGKDD ’00), pp. 355-359, 2000. [4] M. Lin and S. Lee, “Fast Discovery of Sequential Patterns by Memory Indexing,” Proc. Fourth Int’l Conf. Data Warehousing Knowledge Discovery (DaWak, 02), pp 150-160, 2002. 199
  • 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367(Print), ISSN 0976 – 6375(Online) Volume 1, Number 2, Sep - Oct (2010), © IAEME [5] Yu Hirate and Hayato Yamana,” Generalized sequential Pattern Mining with Item Intervals”, Journal of Computers, Vol 1, No 3, pp.51-60, 2006. [6] J.Han, G.Dong, Y.Yin, "Efficient Mining of Partial Periodic Patterns in Time Series Database,”Proc.Int’l Conf. Data Eng. (ICDE ‘99) p.106-115, 1999. [7] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. 17th Int’l Conf. Data Eng. (ICDE ’01), pp. 215-224, 2001. [8] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. 20th Int’l Conf. Very Large Data Bases (VLDB ’94), pp. 487-499, 1994. [9] B. Ozden, S. Ramaswamy, and A. Silberschatz, “Cyclic Association Rules,” Proc. 14th Int’l Conf. Data Eng. (ICDE ’98), pp. 412-421,1998. [10] J. Yang, W. Wang, P.S. Yu, and J. Han, “Mining Long Sequential Patterns in a Noisy Environment,” Proc. ACM SIGMOD ’02, pp. 406-417, 2002. [11] Ding-An Chiang, Cheng-Tzu Wang, Shao-Ping Chen, & Chun-Chi Chen , “The Cyclic Model Analysis on Sequential Patterns” IEEE Transactions on Knowledge and Data Engineering, Vol. 21 No.11, Nov 2009 Pa.No.1617-1627 . 200