International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print),
ISSN 0976-6375 (Online), Volume 5, Issue 3, March (2014), pp. 115-121 © IAEME
FREQUENT NEGATIVE SEQUENTIAL PATTERNS – A SURVEY
Sujatha Kamepalli (1), Dr. Raja Sekhara Rao Kurra (2)
(1) Research Scholar, CSE Department, Krishna University, Machilipatnam, Andhra Pradesh, India
(2) Dean Administration, CSE Department, K.L. University, Guntur, Andhra Pradesh, India
ABSTRACT
Data mining, the extraction of hidden predictive information from large databases, is a
powerful new technology with great potential to help companies focus on the most important
information in their data warehouses. A frequent pattern is a pattern (a set of items, with or
without an order) that occurs in a database frequently enough to satisfy a given minimum
threshold. Sequential pattern mining is an important data mining task that provides an effective
way to extract patterns from sequence data. Unlike traditional positive sequential patterns,
negative sequential patterns focus on negative relationships between item sets, in which absent
items are also taken into consideration. This paper surveys the algorithms that have been
proposed for mining negative sequential patterns.
Keywords: Data Mining, Frequent Pattern, Sequential Pattern Mining, Sequence Data,
Positive Sequential Patterns, Negative Sequential Patterns.
I. INTRODUCTION
Data mining, the extraction of hidden predictive information from large databases, is a
powerful new technology with great potential to help companies focus on the most important
information in their data warehouses. Apriori-based algorithms find frequent item sets with an
iterative bottom-up approach that generates candidate item sets. Since the first proposal of
association rule mining by R. Agrawal [3, 4], and with the rapid development of information
technology, especially web service-based applications, service-oriented architecture and cloud
computing, continually expanding data have been integrated to generate useful information. Many
techniques have been used for data mining. Association rule mining (ARM) is one of the most
useful techniques. The challenges associated with ARM, especially for parallel and distributed data
mining, include minimizing I/O, increasing processing speed and reducing communication cost [5].
A major concern in ARM today is to continue to improve algorithm performance.
A frequent pattern is a pattern (a set of items, with or without an order) that occurs in a
database frequently enough to satisfy a given minimum threshold [2] (Han, Cheng, Xin & Yan 2007).
Frequent pattern mining is the key step in finding interesting patterns in databases, for example
in association rule mining and sequential pattern mining, and is vital
in data mining tasks.
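The bottom-up candidate generation mentioned above can be illustrated with a minimal Apriori-style sketch. The transactions, the item names and the `min_count` threshold are invented for illustration, and the function is a simplified teaching version rather than the algorithm of any particular cited paper.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {item set: count} for item sets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate item set.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-item sets into k-item candidates and keep only those
        # whose (k-1)-subsets are all frequent (the Apriori pruning step).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"desktop", "laptop", "router"},
    {"desktop", "laptop"},
    {"desktop", "router"},
    {"laptop"},
]
frequent = apriori(baskets, min_count=2)
```

With these baskets, the pair {laptop, router} occurs only once, so the three-item candidate is pruned without being counted.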
1.1 Sequence and Sequence Dataset
A sequence is an ordered list of elements < e1 e2 e3 ... en >, where each ei is an element that
can be either a single item or a set of items. The elements can be ordered by time, position or
any other criterion, while the items inside an element have no order among themselves. The
length of a sequence is usually not fixed. Sequence data is an important type of data that is
common in scientific, medical, business, bioinformatics and other applications. An
example of transaction data is shown in Table 1.1.
Table 1.1: A Transactional Data Table
In this data, customer 002 has three transactions. Ordered by transaction time, they form the
sequence < (30; 31; 32) 28 (22; 32) >.
Another example comes from bioinformatics. The following is a gene sequence ordered by
position [1].
ACTGCTGCCAATC
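As a rough illustration of how transactions become a sequence, the sketch below groups rows by customer and orders them by time. Only the three item sets of customer 002 come from the text; the timestamps, the row for customer 001 and the function name `build_sequences` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, transaction_time, items).
# The timestamps and customer 001 are invented; only customer 002's item sets follow the text.
transactions = [
    ("002", "2014-01-20", (28,)),
    ("001", "2014-01-05", (10, 22)),
    ("002", "2014-01-12", (30, 31, 32)),
    ("002", "2014-02-03", (22, 32)),
]

def build_sequences(rows):
    """Group transactions by customer and order each group by transaction time."""
    by_customer = defaultdict(list)
    for customer_id, time, items in rows:
        by_customer[customer_id].append((time, frozenset(items)))
    return {
        customer_id: [items for _, items in sorted(entries, key=lambda e: e[0])]
        for customer_id, entries in by_customer.items()
    }

sequences = build_sequences(transactions)
# sequences["002"] holds the elements (30, 31, 32), 28 and (22, 32), in that order.
```

Each element is stored as a `frozenset` because the items inside one element are unordered, while the list preserves the order of the elements.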
1.2 Sequential pattern mining
Sequential pattern mining is an important task in data mining. It provides an effective way to
extract patterns from sequence data. A sequential pattern considers the order of item sets, whereas
an association rule does not. For example, if the sequence of buying a desktop first, then a
laptop, and then a router occurs frequently in customers’ shopping histories in this particular
order, it is a (frequent) sequential pattern. When a frequent pattern contains only item sets
without any order, it becomes a classical association rule problem; for example, the same
customer buys a desktop, a laptop and a router, regardless of the order. Finding sequential
patterns is widely recognized as an active area in data mining and machine learning. It has proven
to be very useful, or even essential, in handling critical business problems such as customer
behavior analysis, event detection and bioinformatics. For example, it is widely employed in DNA,
protein and medicine identification, where it helps scientists to find identical and different
structures and functions of molecular or DNA sequences [1].
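The difference between an ordered pattern and a plain item set can be made concrete with a small support computation. The sketch below is a minimal illustration: the shopping histories and the helper names `is_subsequence` and `support` are assumptions for this example, not part of any surveyed algorithm.

```python
def is_subsequence(pattern, sequence):
    """True if each pattern element is a subset of some sequence element, in the same order."""
    position = 0
    for element in pattern:
        # Advance to the first remaining sequence element that contains this pattern element.
        while position < len(sequence) and not element <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True

def support(pattern, database):
    """Fraction of data sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

# Hypothetical shopping histories, one sequence per customer.
database = [
    [{"desktop"}, {"laptop"}, {"router"}],
    [{"desktop"}, {"mouse"}, {"laptop", "bag"}, {"router"}],
    [{"laptop"}, {"desktop"}, {"router"}],
    [{"router"}, {"desktop"}],
]
pattern = [{"desktop"}, {"laptop"}, {"router"}]
pattern_support = support(pattern, database)  # 0.5: only the first two customers match
```

The third customer bought all three items, but in a different order, so that history counts for the item set {desktop, laptop, router} and not for the sequential pattern.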
1.3 Positive sequential patterns and negative sequential patterns
Unlike traditional positive sequential patterns, negative sequential patterns focus on
negative relationships between item sets, in which absent items are also taken into consideration.
A simple example illustrates the difference. Suppose p1=<a b c d f> is a positive
sequential pattern and p2=<a b ¬c e f> is a negative sequential pattern, where each item a, b, c,
etc. stands for a medical item code in the customer claim database of a private health care
insurance company. From pattern p1 we can tell that an insurant usually claims a, b, c, d and f in
a row; pattern p2 additionally tells us that if an insurant claims medical items a and b and does
NOT claim c, he/she would later claim item e instead of d. A number of methods
have been proposed to discover sequential patterns. Most conventional methods for sequential
pattern mining were developed to discover positive sequential patterns from databases [6, 7, 8, 9,
10, 11]. Positive sequential pattern mining considers only the occurrences of item sets in
sequences. In practice, however, the absence of item sets in sequences may also carry valuable
information. For example, web pages A, B, C and D are accessed frequently by users, but D is seldom
accessed after the sequence A, B and C. This web page access sequence can be denoted as
< A, B, C, ¬D > and is called a negative sequence. Such a sequence could give valuable information
for improving the company’s website structure. For example, a new link between C and D could make
it more convenient for users to reach web page D from C [12].
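To make the role of the absent item concrete, the following sketch checks whether a single data sequence matches a negative pattern. It assumes single-item elements and one particular reading of the ¬ symbol: the positive items must occur in order, and a negated item must not occur in the gap where it is written. The surveyed algorithms define negative containment in differing ways, so this is an illustration under stated assumptions only, and the function name `contains_negative` is hypothetical.

```python
def contains_negative(sequence, pattern):
    """pattern is a list of (item, negated) pairs; sequence is a list of single items."""
    positives = []
    gaps = [set()]  # gaps[i]: items forbidden just before positives[i]; last entry: after the end
    for item, negated in pattern:
        if negated:
            gaps[-1].add(item)
        else:
            positives.append(item)
            gaps.append(set())

    def search(k, start):
        # k: index of the next positive item to match; start: first index of the current gap.
        if k == len(positives):
            return not (gaps[k] & set(sequence[start:]))
        for pos in range(start, len(sequence)):
            gap_is_clean = not (gaps[k] & set(sequence[start:pos]))
            if sequence[pos] == positives[k] and gap_is_clean and search(k + 1, pos + 1):
                return True
        return False

    return search(0, 0)

# p2 = <a b ¬c e f> from the insurance example.
p2 = [("a", False), ("b", False), ("c", True), ("e", False), ("f", False)]
# <A, B, C, ¬D> from the web access example.
web = [("A", False), ("B", False), ("C", False), ("D", True)]

claims_without_c = contains_negative(list("abef"), p2)   # True
claims_with_c = contains_negative(list("abcef"), p2)     # False: c occurs between b and e
visit_stops_at_c = contains_negative(list("ABC"), web)   # True
visit_reaches_d = contains_negative(list("ABCD"), web)   # False: D follows C
```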
1.4 Applications of sequential pattern mining
• Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera,
within 3 months.
• Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes,
stocks and markets, etc.
• Telephone calling patterns and weblog click streams.
• DNA sequences and gene structures [16].
II. SURVEY ON NEGATIVE SEQUENTIAL PATTERNS
1. Nancy P. Lin, Hung-Jen Chen, Wei-Hua Hao, Hao-En Chueh and Chung-I Chang, in “Mining Strong
Positive and Negative Sequential Patterns”, proposed a method for mining strong positive and
negative sequential patterns, called PNSPM. In this method, the absences of item sets in sequences
are also considered [12].
2. K.M.V. Madan Kumar, P.V.S. Srinivas and C. Raghavendra Rao, in “Sequential Pattern Mining
With Multiple Minimum Supports in Progressive Databases”, proposed a new approach that can be
applied to any algorithm, regardless of whether that algorithm generates candidate sets to
identify the frequent item sets. The proposed algorithm uses the concept of “percentage of
participation” instead of occurrence frequency for every possible combination of items or item
sets. The percentage of participation is calculated based on the minimum support threshold for
each item set [13].
3. Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, in “Negative-GSP: An Efficient Method
for Mining Negative Sequential Patterns”, propose a new method for mining negative sequential
patterns, called Negative-GSP, which can find negative sequential patterns effectively and
efficiently [14]. They also designed an effective pruning method to reduce the number of candidates.
4. Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang and Hao-En Chueh, in “An Algorithm
for Mining Strong Negative Fuzzy Sequential Patterns”, proposed a method for mining negative
fuzzy sequential patterns, called NFSPM. In this method, the absences of fuzzy item sets are also
considered. In addition, only sequences with a high degree of interestingness can be selected as
negative fuzzy sequential patterns [15].
5. Xiangjun Dong, Zhigang Zheng, Longbing Cao and Yanchang Zhao, in “e-NSP: Efficient Negative
Sequential Pattern Mining Based on Identified Positive Patterns Without Database Rescanning”,
propose an efficient algorithm for mining negative sequential patterns (NSP), called e-NSP, which
mines NSP using only the already identified positive sequential patterns (PSP), without
re-scanning the database. First, negative containment is defined to determine whether or not a
data sequence contains a negative sequence. Second, an efficient approach is proposed to convert
the negative containment problem into a positive containment problem. The supports of negative
sequential candidates (NSC) are then calculated based only on the corresponding PSP. Finally, a
simple but efficient approach is proposed to generate NSC [17].
6. Vedant Rastogi and Vinay Kumar Khare, in “Apriori Based: Mining Positive and Negative Frequent
Sequential Patterns”, proposed an algorithm for mining exception rules [18].
7. Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang and Hans Bohlscheid, in
“Efficient Mining of Event-Oriented Negative Sequential Rules”, analyze three types of
negative sequential rules and present a new technique to find event-oriented negative sequential
rules [19].
8. Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, in “An Efficient GA-Based
Algorithm for Mining Negative Sequential Patterns”, propose a Genetic Algorithm (GA) based
algorithm to find negative sequential patterns with novel crossover and mutation operations,
which are efficient at passing good genes on to the next generations without generating
candidates. An effective dynamic fitness function and a pruning method are also provided to
improve performance [20].
9. Vinay Kumar Khare and Vedant Rastogi, in “Mining Positive and Negative Sequential Pattern in
Incremental Transaction Databases”, proposed a model for mining positive and negative sequential
patterns in incremental transaction databases. In this approach, the existing transaction database
is updated with the appended transaction database; the merge is performed using the updated
compact pattern tree approach. The updated transaction database table is then maintained
according to support, and the merged database is mined for positive and negative sequential
patterns with the help of the CPNFSP algorithm proposed by Weimin Ouyang and Qinhua Huang
[21][22].
10. Y. Li, A. Algarni and N. Zhong, in “Mining Positive and Negative Patterns for Relevance
Feature Discovery”, proposed an innovative approach to evaluate the weights of terms according to
both their specificity and their distributions in the higher-level features, where the
higher-level features include both positive and negative patterns [23].
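The idea in item 5 of computing negative supports from positive supports alone can be sketched for the simplest case, a pattern with exactly one negated item. The sketch assumes that a data sequence is taken to contain <a ¬b c> when it contains <a c> but not <a b c>; under that assumption the support is a difference of two positive supports. The counts and the function name `negative_support` are invented for illustration, and the algorithm in [17] covers more general patterns than this.

```python
# Hypothetical absolute supports of positive sequential patterns that were already mined.
positive_support = {
    ("a", "c"): 6,
    ("a", "b", "c"): 2,
}

def negative_support(pattern, positive_support):
    """Support of a pattern with exactly one negated item, given as (item, negated) pairs."""
    negatives = [item for item, negated in pattern if negated]
    if len(negatives) != 1:
        raise ValueError("this sketch handles exactly one negated item")
    # Positive part of the pattern, e.g. <a c> for <a ¬b c>.
    without_negative = tuple(item for item, negated in pattern if not negated)
    # Pattern with the negation sign dropped, e.g. <a b c> for <a ¬b c>.
    all_positive = tuple(item for item, _ in pattern)
    # Sequences containing <a b c> also contain <a c>, so a plain subtraction is enough.
    return positive_support[without_negative] - positive_support[all_positive]

a_not_b_c = [("a", False), ("b", True), ("c", False)]
support_count = negative_support(a_not_b_c, positive_support)  # 6 - 2 = 4
```

No data sequence is read in this computation, which is the point of reusing the positive supports.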
S.NO. | TITLE OF WORK | AUTHORS | PROPOSED WORK | YEAR
1 | Mining Strong Positive and Negative Sequential Patterns | Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang, Hao-En Chueh | Proposed a method for mining strong positive and negative sequential patterns, called PNSPM. | 2008
2 | Sequential Pattern Mining With Multiple Minimum Supports in Progressive Databases | K.M.V. Madan Kumar, P.V.S. Srinivas and C. Raghavendra Rao | Proposed a new approach which can be applied to any algorithm, regardless of whether that algorithm generates candidate sets to identify the frequent item sets. | 2012
3 | Negative-GSP: An Efficient Method for Mining Negative Sequential Patterns | Zhigang Zheng, Yanchang Zhao, Ziye Zuo, Longbing Cao | Proposes a new method for mining negative sequential patterns, called Negative-GSP. Negative-GSP can find negative sequential patterns effectively and efficiently. | 2009
4 | An Algorithm for Mining Strong Negative Fuzzy Sequential Patterns | Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang, Hao-En Chueh | Proposed a method for mining negative fuzzy sequential patterns, called NFSPM. | 2007
5 | e-NSP: Efficient Negative Sequential Pattern Mining Based on Identified Positive Patterns Without Database Rescanning | Xiangjun Dong, Zhigang Zheng, Longbing Cao, Yanchang Zhao | Proposed an efficient algorithm for mining NSP, called e-NSP, which mines for NSP by only involving the identified PSP, without re-scanning databases. | 2011
6 | Apriori Based: Mining Positive and Negative Frequent Sequential Patterns | Vedant Rastogi, Vinay Kumar Khare | Algorithm for mining exception rules. | 2012
7 | Efficient Mining of Event-Oriented Negative Sequential Rules | Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang and Hans Bohlscheid | This paper analyzes three types of negative sequential rules and presents a new technique to find event-oriented negative sequential rules. | -----
8 | An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns | Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao | This paper proposes a Genetic Algorithm (GA) based algorithm to find negative sequential patterns with novel crossover and mutation operations, which are efficient at passing good genes on to next generations without generating candidates. | 2010
9 | Mining Positive and Negative Sequential Pattern in Incremental Transaction Databases | Vinay Kumar Khare, Vedant Rastogi | In this approach the existing transaction database is updated with the appended transaction database. The merged (updated) transaction database is mined to get the positive and negative sequential patterns. Merging of the existing and appended databases is performed using the updated compact pattern tree approach. | 2013
10 | Mining Positive and Negative Patterns for Relevance Feature Discovery | Y. Li, A. Algarni and N. Zhong | Proposed an innovative approach to evaluate weights of terms according to both their specificity and their distributions in the higher-level features, where the higher-level features include both positive and negative patterns. | 2010
III. CONCLUSION
This paper provides definitions of sequence data, sequential pattern mining, positive
sequential patterns and negative sequential patterns. It also explains the importance of negative
sequential patterns and the applications of sequential patterns. It can serve as a starting point
for researchers who want to work on negative sequential patterns.
REFERENCES
[1]. Zhigang Zheng, “Negative Sequential Pattern Mining”, thesis submitted in partial fulfillment
of the requirements for the degree of Doctor of Philosophy, January 2012.
[2]. J. Han, H. Cheng, D. Xin and X. Yan, “Frequent Pattern Mining: Current Status and Future
Directions”, Data Mining and Knowledge Discovery, Vol. 15, 2007, pp. 55-86.
[3]. http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm.
[4]. http://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data.htm.
[5]. R. Agrawal and R. Srikant, “Mining Sequential Patterns”, Proceedings of the 1995 International
Conference on Data Engineering, Taipei, Taiwan, March 1995.
[6]. R. Agrawal and R. Srikant, “Mining Sequential Patterns”, Proceedings of the Eleventh
International Conference on Data Engineering, Taipei, Taiwan, March 1995, pp. 3-14.
[7]. M. J. Zaki, “Efficient Enumeration of Frequent Sequences”, Proceedings of the Seventh CIKM,
1998.
[8]. J. Ayres, J. E. Gehrke, T. Yiu and J. Flannick, “Sequential Pattern Mining Using Bitmaps”,
Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, Edmonton, Alberta, Canada, July 2002.
[9]. X. Yan, J. Han and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets”,
Proceedings of the 2003 SIAM International Conference on Data Mining (SDM’03), 2003,
pp. 166-177.
[10]. M. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences”, Machine Learning,
Vol. 40, 2001, pp. 31-60.
[11]. M. Zaki, “Efficient Enumeration of Frequent Sequences”, Proceedings of the Seventh
International Conference on Information and Knowledge Management (CIKM’98), 1998, pp. 68-75.
[12]. Nancy P. Lin, Hung-Jen Chen, Wei-Hua Hao, Hao-En Chueh and Chung-I Chang, “Mining Strong
Positive and Negative Sequential Patterns”, WSEAS Transactions on Computers, Issue 3,
Volume 7, March 2008, ISSN: 1109-2750.
[13]. K.M.V. Madan Kumar, P.V.S. Srinivas and C. Raghavendra Rao, “Sequential Pattern Mining
With Multiple Minimum Supports in Progressive Databases”, International Journal of Database
Management Systems (IJDMS), Vol. 4, No. 4, August 2012.
[14]. Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, “Negative-GSP: An Efficient Method
for Mining Negative Sequential Patterns”, Australian Computer Society, Inc. (AusDM 2009),
Melbourne, Australia. Conferences in Research and Practice in Information Technology
(CRPIT), Vol. 101.
[15]. Nancy P. Lin, Wei-Hua Hao, Hung-Jen Chen, Chung-I Chang and Hao-En Chueh, “An Algorithm
for Mining Strong Negative Fuzzy Sequential Patterns”, International Journal of
Computers, Issue 3, Volume 1, 2007.
[16]. Sequential Pattern Mining ppt.
www.is.informatik.uni-duisburg.de/.../im.../MiningSequentialPatterns.ppt.
[17]. Xiangjun Dong, Zhigang Zheng, Longbing Cao and Yanchang Zhao, “e-NSP: Efficient Negative
Sequential Pattern Mining Based on Identified Positive Patterns Without Database Rescanning”,
CIKM’11, October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM
978-1-4503-0717-8/11/10.
[18]. Vedant Rastogi and Vinay Kumar Khare, “Apriori Based: Mining Positive and Negative Frequent
Sequential Patterns”, International Journal of Latest Trends in Engineering and Technology
(IJLTET), Vol. 1, Issue 3, September 2012, ISSN: 2278-621X.
[19]. Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang and Hans Bohlscheid,
“Efficient Mining of Event-Oriented Negative Sequential Rules”.
[20]. Zhigang Zheng, Yanchang Zhao, Ziye Zuo and Longbing Cao, “An Efficient GA-Based
Algorithm for Mining Negative Sequential Patterns”, M.J. Zaki et al. (Eds.): PAKDD 2010, Part
I, LNAI 6118, pp. 262–273, 2010. © Springer-Verlag Berlin Heidelberg 2010.
[21]. Vinay Kumar Khare and Vedant Rastogi, “Mining Positive and Negative Sequential Pattern in
Incremental Transaction Databases”, International Journal of Computer Applications
(0975 – 8887), Volume 71, No. 1, June 2013.
[22]. Weimin Ouyang and Qinhua Huang, “Mining Positive and Negative Sequential Patterns with
Multiple Minimum Supports in Large Transaction Databases”, IEEE Second WRI Global
Congress on Intelligent Systems, 2010.
[23]. Y. Li, A. Algarni and N. Zhong, “Mining Positive and Negative Patterns for Relevance Feature
Discovery”, 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,
Washington, DC (KDD 2010), pp. 753-762.
[24]. A. K. Payra and S. Saha, “Generic Approach of Pattern Matching of Amino Acid Sequences
using Matching Policy & Pattern Policy”, International Journal of Computer Engineering &
Technology (IJCET), Volume 5, Issue 2, 2014, pp. 130-139, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
AUTHORS’ DETAILS
K. Sujatha is pursuing her Ph.D. at Krishna University, Machilipatnam, A.P. Her research
interest is data mining. She has three international journal publications in data mining
and two national journal publications. She has a total of 10 years of teaching experience
and is working as an associate professor in the CSE Department, Malineni Lakshmaiah
Engineering College, Singaraya konda, Prakasam District, A.P.
Prof. K. Rajasekhara Rao is a Professor of Computer Science & Engineering at
K.L. University and presently holds several key positions at K.L. University, as
Dean (Administration) & Principal, K L College of Engineering (Autonomous).
With more than 26 years of teaching and research experience, Prof. Rao is
actively engaged in research related to Embedded Systems, Software
Engineering and Knowledge Management. He obtained his Ph.D. in Computer
Science & Engineering from Acharya Nagarjuna University (ANU), Guntur,
Andhra Pradesh, and has produced 58 publications in various international and national
journals and conferences. Prof. KRR was awarded the “Patron Award” for his outstanding
contribution by India’s prestigious professional society, the Computer Society of India (CSI),
for the year 2011 in Ahmedabad. He has been adjudged best teacher and has been honored with the
“Best Teacher Award” seven times.
Dr. Rajasekhar is a Fellow of IETE and a Life Member of IE, ISTE, ISCA and CSI
(Computer Society of India). He was nominated as a sectional committee member for
Engineering Sciences of the 100th Annual Convention of the Indian Science Congress Association.
He is a past Chairman of the Koneru Chapter of CSI.