SlideShare a Scribd company logo
1 of 5
Download to read offline
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 14, Issue 4 (Sep. - Oct. 2013), PP 01-05
www.iosrjournals.org
www.iosrjournals.org 1 | Page
An improved Item-based Maxcover Algorithm to protect
Sensitive Patterns in Large Databases
Mrs. P.Cynthia Selvi1
, Dr.A.R.Mohamed Shanavas2
1. Associate Professor, Dept. of Computer Science, KNGA College(W), Thanjavur 613007 /Affiliated to
Bharathidasan University, Tiruchirapalli, TamilNadu, India.
2. Associate Professor, Dept. of Comuter Science, Jamal Mohamed College, Tiruchirapalli 620 020/ Affiliated
to Bharathidasan University, Tiruchirapalli, TamilNadu, India.
Abstract: Privacy Preserving Data Mining(PPDM) is a rising field of research in Data Mining and various
approaches are being introduced by the researchers. One of the approaches is a sanitization process, that
transforms the source database into a modified one by removing selective items so that the counterparts or
adversaries cannot extract the hidden patterns from. This study address this concept and proposes a revised
Item-based Maxcover Algorithm(IMA) which is aimed at less information loss in the large databases with
minimal removal of items.
Keywords: Privacy Preserving Data Mining, Restrictive Patterns, Sensitive Transactions, Maxcover, Sanitized
database.
I. Introduction
PPDM is a novel research direction in DM with the main objective to develop algorithms for
modifying the original data in some way, so that the private data and private knowledge remain private even
after the mining process[1]. This modification is done by deleting one or more items from source database or
even adding noise to the data by turning some items from 0 to 1 in some transactions. The procedure of
transforming the source database into a new database that hides some sensitive patterns or rules is called the
sanitization process[2] and the released database is called the sanitized database. The approach of modifying
some data is perfectly acceptable in some real applications[3, 4] provided the impact on the source database is
kept minimal.
This study mainly focus on the task of minimizing the impact on the source database by reducing the
number of removed items from the source database with only one scan. Section-2 briefly summarizes the
previous work done by various researchers; In Section-3 preliminaries and basic definitions are discussed. In
Section-4 the proposed algorithm is presented and Section-5 shows the experimental results.
II. Related work
The idea behind data sanitization to reduce the support values of restrictive patterns was first
introduced by Atallah et.al[1] and they have proved that the optimal sanitization process is NP-hard problem. In
[4], the authors generalized the problem and proposed algorithms that ensure privacy preservation; but they
require multiple scans over a transactional database. In the same direction, Saygin [5] introduced a method for
selectively removing individual values from a database and proposed some algorithms to obscure a given set of
sensitive rules by replacing known values with unknowns which also require various scans to sanitize a database
depending on the number of association rules to be hidden.
Oliveira introduced algorithms, IGA[6] & SWA[7] that aims at multiple rule hiding in which IGA has
low misses cost; It groups restrictive patterns and assigns a victim item to each group but the clustering is not
performed optimally and it can be improved further by reducing the number of deleted items. Whereas, SWA
improves the balance between protection of sensitive knowledge and pattern discovery but some rules are
removed inadvertently.
In [8] heuristic-based approach is proposed which perform the sanitization process with minimum
number of removal of restricted items. However, when the sensitive patterns to be hidden are mutually
exclusive, this approach has more hiding failure. Hence, in this work the algorithm proposed in [8] is revised
and it is tested with large database which ensure no hiding failure and very low information loss.
III. Preliminaries & Definitions
Transactional Database: A transactional database is a relation consisting of transactions in which each
transaction t is characterized by an ordered pair, defined as t = ˂Tid, list-of-elements˃, where Tid is a unique
transaction identifier number and list-of-elements represents a list of items making up the transactions.
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases
www.iosrjournals.org 2 | Page
Basics of Association Rules: One of the most studied problems in data mining is the process of discovering
association rules from large databases. Most of the existing algorithms for association rules rely on the support-
confidence framework introduced in [9].
Formally, association rules are defined as follows: Let I = {i1,...,in} be a set of literals, called items. Let D
be a database of transactions, where each transaction t is an itemset such that 𝑡 ⊆ 𝐼. A transaction t supports X, a
set of items in I, if 𝑋 ⊂ 𝑡. An association rule is an implication of the form 𝑋 ⟹ Y, where 𝑋 ⊂ 𝐼, 𝑌 ⊂ 𝐼 and
𝑋 ∩ 𝑌 = 𝜙. Thus, we say that a rule 𝑋 ⟹ Y holds in the database D with support (𝜎) if
|X∪Y|
N
≥ 𝜎, where N is
the number of transactions in D. Similarly, we say that a rule 𝑋 ⟹ Y holds in the database D with confidence
(𝜑) if
|X∪Y|
|X|
≥ 𝜑, where |𝐴| is the number of occurrences of the set of items A in the set of transactions D.
While the support is a measure of the frequency of a rule, the confidence is a measure of the strength of the
relation between sets of items. Association rule mining algorithms rely on the two attributes, minimum
Support(minSup 𝜎) and minimum Confidence(minConf 𝜑).
Frequent Pattern: A pattern X is called a frequent pattern if Sup(X) ≥ minSup or if the absolute support of X
satisfies the corresponding minimum support count threshold.
Privacy Preservation in Frequent Patterns: The task of privacy preserving data processing deals with
removing/modifying sensitive entries in the data which are basically decided by the user, who may either be the
owner or the contributor of the data.
Definitions :
Definition 1: Let D be a source database, containing a set of all transactions. T denotes a set of transactions,
each transaction containing itemset 𝑋 ∈ 𝐷. In addition, each k-itemset 𝑋 ⊆ 𝐼 has an associated set of
transactions 𝑇 ⊆ 𝐷, where 𝑋 ⊆ 𝑡 and 𝑡 ∈ 𝑇.
Definition 2: Restrictive Patterns : Let D be a source database, P be a set of all frequent patterns that can be
mined from D, and RulesH be a set of decision support rules that need to be hidden according to some security
policies. A set of patterns, denoted by RP is said to be restrictive, if RP ⊂ P and if and only if RP would derive
the set RulesH. RP is the set of non-restrictive patterns such that RP RP = P.
Definition 3 : Sensitive Transactions : Let T be a set of all transactions in a source database D, and RP be a set
of restrictive patterns mined from D. A set of transactions is said to be sensitive, denoted by ST, if every t ST
contain atleast one restrictive pattern, ie ST ={ 𝑡 ∈ T | X ∈ RP, X ⊆ t }. Moreover, if ST T then all restrictive
patterns can be mined one and only from ST.
Definition 4 : (i) Transaction Size : The number of items which make up a transaction is the size of the
transaction.
(ii) Transaction Degree : Let D be a source database and ST be a set of all sensitive transactions in D. The
degree of a sensitive transaction t, denoted as deg(t), such that t ST is defined as the number of restrictive
patterns that t contains.
Definition 5: Cover : The Cover[8] of an item Ak can be defined as, CAk = { rpi | Ak ∈ rpi ⊂ RP, 1 i |RP|}
i.e., set of all restrictive patterns(rpi) which contain Ak. The item that is included in a maximum number of rpi is
the one with maximal cover or maxCover; i.e., maxCover = max( |CA1|, |CA2| , … |CAn| ) such that Ak ∈ rpi ⊂ RP.
IV. Sanitization Algorithm
Given the source database (D), and the restrictive patterns(RP), the goal of the sanitization process is to
protect RP against the mining techniques used to disclose them. The sanitization process decreases the support
values of restrictive patterns by removing items from sensitive transactions using some heuristics. The heuristics
used in this work is given below:
Heuristic : Sensitive Transactions(ST) in source database(D) are identified and sorted in decreasing order of
(deg + size), which enable multiple patterns to be sanitized in a single iteration. Then the victim item is the one
which is selected based on maxcover (definition-5) value of the items in the restrictive patterns.
Algorithm
Input : (i) D – Source Database (ii) RP – Set of all Restrictive Patterns
Output : D’ – Sanitized Database
Item-based Maxcover Algorithm(IMA): // based on Heuristics 1 & 2 //
Step 1 : (i) find supCount(Ak) ∀ Ak rpi RP
(ii) find supCount(rpi), ∀ rpi RP and sort in decreasing order ;
Step 2 : (i) find Sensitive Transactions(ST) w.r.t. RP ;
(ii) obtain deg(t), size(t) ∀ t ST ;
(iii) sort t ST in decreasing order of deg & size ;
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases
www.iosrjournals.org 3 | Page
Step 3 : find ST D ST ; // ST - non sensitive transactions //
Step 4 : // Find ST’ //
find cover for every item Ak ∈ RP and sort in decreasing order of cover;
for each item Ak ∈ RP do
{
find T = 𝑡
|𝑟𝑝 𝑖 −𝑙𝑖𝑠𝑡 |
𝑖=1
for each t ∈ T do
{
delete item Ak in non VictimTransactions such that Ak rpi rpi-list ; // Ak – victimItem //
// initially all t are nonvictim //
decrease supCount of rpi’s for which t is nonVictim;
mark t as victimTransaction in each t-list( rpi ) rpi _list(Ak ) ;
}
}
for each rpi ∈ RP do
{
if (supCount < > 0)
{
for every t ∈ nonVictimTransactions do
{
delete item Ak with minimum supCount (round robin in case of tie);
decrease supCount of every rpi ; // ie, Ak rpi t //
}
}
}
Step 5 : D’ ← ST ∪ ST’
V. Implementation
The test run was performed using AMD Turion II N550 Dual core processor with 2.6 GHz speed and
2GB RAM operating on 32 bit OS; The implementation of the proposed algorithm was done with windows 7 -
Netbeans 6.9.1 - SQL 2005 for real dataset T10I4D100K[10] with characteristics given in table-I. The results
are obtained by varying the number of rules and also size of the source database in terms of number of
transactions (refer graphs). The restrictive patterns are chosen under various categories like overlapping,
mutually exclusive, random, high support, low support with their support ranging between 0.6 and 5, confidence
between 32.5 and 85.7 and length between 2 and 6.
The frequent patterns were obtained using Matrix Apriori[11] which uses simpler data structures for
implementation. The proposed algorithm makes use of preprocessed lookup(hashed) tables which links the
restrictive items and rules with their associated transactions; so that the source database is scanned not more
than once. Our algorithm is evaluated based on the performance measures suggested in [6,7].
Performance Analysis
Hiding Failure(HF) : It is measured by the ratio of the number of restrictive patterns in the released sanitized
database(D’) to the ones in the given source database; HF =
|RP (D′)|
|RP(D)|
. In other words, if a hidden restrictive
pattern cannot be extracted from the released database D’ with an arbitrary minSup, then it has no hiding failure
occurrence. Our approach has 0% HF for all the strategies under which the rules to be hidden were chosen.
Misses Cost(MC) : This measure deals with the legitimate patterns(non restrictive patterns) that were
accidently missed. MC =
~RP D − |~RP (D′)|
~RP D
. Our approach has very minimum MC which ranges between 0% and
2.43%. It is also observed that MC gets reduced linearly when the size of database is increased.
Artifactual Pattern(AP) : AP occurs when D’ is released with some artificially generated patterns after
applying the privacy preservation approach and it is given by, AP =
𝑷′ − |𝑷 ∩ 𝑷′|
𝑷′ . As our approach does not
introduce any false drops, the AP is 0%.
Sanitization Rate(SR) : It is defined as the ratio of the number of selectively deleted items(victim items) to the
total support count of restrictive patterns(rpi) in the source database D and is given by, SR =
victim items
total supCount rpi
.
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases
www.iosrjournals.org 4 | Page
Here the SR is found to be in the range from 40.32% to 68.82%, which shows that the number of restrictive
items deleted from the source database is kept minimal.
Dissimilarity(dif) : The dissimilarity between the original(D) and sanitized(D’) databases is measured in terms
of their contents which can be measured by the formula, dif(D,D’)=
1
fD(i)n
i=1
× [fD i − fD′
i ]n
i=1 , where fx(i)
represents the ith
item in the dataset X. Our approach has very low percentage of dissimilarity that ranges
between 0.24% and 1.1% and this gets reduced when size of the source database is increased. Hence it is
observed that information loss is very low and so the utility of the data is well preserved.
CPU Time : The range of time requirement for this algorithm with the rules chosen under various categories is
between 1.66 and 17.22 sec. This time requirement can still be reduced when parallelism is adapted with high
speed processor. Moreover, time is not a significant criterion as sanitization is performed offline.
Graphs :
0
20
40
60
5 10 15 20 25
0 0 0 0 0
HF(%)
No. of rules to sanitize
0
20
40
60
2k 4k 6k 8k
0 0 0 0
HF(%)
No. of Transactions
0
20
40
60
5 10 15 20 25
0.25 0.37 0.31 0.8 1.14
MC(%)
No. of rules to sanitize
0
20
40
60
2k 4k 6k 8k
2.43 0.42 0.18 0.14
MC(%)
No. of Transactions
52.75
40.37 40.78 43.54 45.68
20
40
60
80
100
5 10 15 20 25
SR(%)
No. of rules to sanitize
67.35 68.1 66.82 68.82
20
40
60
80
100
2k 4k 6k 8k
SR(%)
No. of Transactions
0
20
40
60
5 10 15 20 25
0.32 0.44 0.72 0.96 1.1
Diff(%)
No. of rules to sanitize
0
20
40
60
2k 4k 6k 8k
0.32 0.27 0.25 0.24
Diff(%)
No. of Transactions
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases
www.iosrjournals.org 5 | Page
Table – I :Characteristics of Dataset (D & D’)
VI. Conclusion
The proposed Item-based Maxcover Algorithm(IMA) is based on the strategy to simultaneously decrease
the support count of maximum number of sensitive patterns, with possibly minimal removal of items that results
in reduced impact on the source database with no hiding failure. IMA has reduced sanitization rate possibly with
very low misses cost. Moreover this algorithm scans the original database only once. It is important to note that
the proposed algorithm is robust in the sense that reconstruction of the source database is not at all possible;
because the alterations to the original database are not saved anywhere and also encryption techniques are not
involved in this approach. The dissimilarity between the source and sanitized database is very minimum that
minimizes the information loss and preserves improved data utility. The execution time is also significantly low
which can still be reduced to a minimum by parallelism techniques adapted with high speed processors.
References
[1] Verykios,V.S, Bertino.E, Fovino.I.N, ProvenzaL.P, Saygin.Y and Theodoridis.Y, “State-of-the-art in Privacy Preservation Data
Mining”, New York,ACM SIGMOD Record, vol.33, no.2, pp.50-57,2004.
[2] Atallah.M, Bertino.E, Elmagarmid.A, Ibrahim.M and Verykios.V.S, “Disclosure Limitation of Sensitive Rules”, In Proc. of IEEE
Knowledge and Data Engineering Workshop, pages 45–52, Chicago, Illinois, November 1999.
[3] Clifton.C and Marks.D, “ Security and Privacy Implications of Data Mining”, In Workshop on Data Mining and Knowledge
Discovery, pages 15–19, Montreal, Canada, February 1996.
[4] Dasseni.E, Verykios.V.S, Elmagarmid.A.K and Bertino.E, “ Hiding Association Rules by Using Confidence and Support”, In Proc.
of the 4th Information Hiding Workshop, pages 369– 383, Pittsburg, PA, April 2001.
[5] Saygin.Y, Verykios.V.S, and Clifton.C, “Using Unknowns to Prevent Discovery of Association Rules”, SIGMOD Record,
30(4):45–54, December 2001.
[6] Oliveira.S.R.M and Zaiane.O.R, “Privacy preserving Frequent Itemset Mining”, in the Proc. of the IEEE ICDM Workshop on
Privacy, Security, and Data Mining, Pages 43-54, Maebashi City, Japan, December 2002.
[7] Oliveira.S.R.M and Zaiane.O.R, “An Efficient One-Scan Sanitization for Improving the Balance between Privacy and Knowledge
Discovery”, Technical Report TR 03-15, June 2003.
[8] Cynthia Selvi P, Mohamed Shanavas A.R, “An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases”, IOSR-
Journal on Computer Engineering, Volume 5, Issue 1(Sep-Oct, 2012), PP 06-11, DOI. 10.9790/0661-0510611.
[9] Agrawal R, Imielinski T, and Swami.A, “Mining association rules between sets of items in large databases”, Proceedings of 1993
ACM SIGMOD international conference on management of data, Washington, DC; 1993. p. 207-16.
[10] http://fimi.cs.helsinki.fi/data/
[11] Pavon.J, Viana.S, and Gomez.S, “Matrix Apriori: speeding up the search for frequent patterns,” Proc. 24th IASTED International
Conference on Databases and Applications, 2006, pp. 75-82.
1.66
6.32 7.49 9.91 12.54
0
20
40
60
5 10 15 20 25
cputime(secs)
No.of rules to sanitize
6.89 8.22
14.69 17.22
0
20
40
60
2k 4k 6k 8k
cputime(secs)
No. of Transactions
Dataset name
T10I4D100K
No. of
transactions
No. of distinct
Items
Min.
Length
Max.
Length
Dataset Size (MB)
Source (D) 1K – 8K 795 - 862 1 26 97KB – 650 KB
Sanitized (D’) 1K – 8K 795 - 862 1 26 95KB – 648KB

More Related Content

What's hot

Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...ijsrd.com
 
Review Over Sequential Rule Mining
Review Over Sequential Rule MiningReview Over Sequential Rule Mining
Review Over Sequential Rule Miningijsrd.com
 
Mining frequent patterns association
Mining frequent patterns associationMining frequent patterns association
Mining frequent patterns associationDeepaR42
 
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1Krish_ver2
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039ijsrd.com
 
Frequent Item Set Mining - A Review
Frequent Item Set Mining - A ReviewFrequent Item Set Mining - A Review
Frequent Item Set Mining - A Reviewijsrd.com
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDataminingTools Inc
 
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...IOSR Journals
 
Comparative study of frequent item set in data mining
Comparative study of frequent item set in data miningComparative study of frequent item set in data mining
Comparative study of frequent item set in data miningijpla
 
IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363SHIVA REDDY
 
Efficient Utility Based Infrequent Weighted Item-Set Mining
Efficient Utility Based Infrequent Weighted Item-Set MiningEfficient Utility Based Infrequent Weighted Item-Set Mining
Efficient Utility Based Infrequent Weighted Item-Set MiningIJTET Journal
 
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
An improvised tree algorithm for association rule mining using transaction re...
An improvised tree algorithm for association rule mining using transaction re...An improvised tree algorithm for association rule mining using transaction re...
An improvised tree algorithm for association rule mining using transaction re...Editor IJCATR
 
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...IJECEIAES
 
An efficient algorithm for mining frequent inter transaction patterns
An efficient algorithm for mining frequent inter transaction patternsAn efficient algorithm for mining frequent inter transaction patterns
An efficient algorithm for mining frequent inter transaction patternsHLV
 

What's hot (20)

Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
 
Review Over Sequential Rule Mining
Review Over Sequential Rule MiningReview Over Sequential Rule Mining
Review Over Sequential Rule Mining
 
Mining frequent patterns association
Mining frequent patterns associationMining frequent patterns association
Mining frequent patterns association
 
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039
 
Frequent Item Set Mining - A Review
Frequent Item Set Mining - A ReviewFrequent Item Set Mining - A Review
Frequent Item Set Mining - A Review
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
 
Comparative study of frequent item set in data mining
Comparative study of frequent item set in data miningComparative study of frequent item set in data mining
Comparative study of frequent item set in data mining
 
IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363
 
Efficient Utility Based Infrequent Weighted Item-Set Mining
Efficient Utility Based Infrequent Weighted Item-Set MiningEfficient Utility Based Infrequent Weighted Item-Set Mining
Efficient Utility Based Infrequent Weighted Item-Set Mining
 
Ijcatr04051004
Ijcatr04051004Ijcatr04051004
Ijcatr04051004
 
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
An improvised tree algorithm for association rule mining using transaction re...
An improvised tree algorithm for association rule mining using transaction re...An improvised tree algorithm for association rule mining using transaction re...
An improvised tree algorithm for association rule mining using transaction re...
 
W26142147
W26142147W26142147
W26142147
 
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
Postdiffset Algorithm in Rare Pattern: An Implementation via Benchmark Case S...
 
05
0505
05
 
An efficient algorithm for mining frequent inter transaction patterns
An efficient algorithm for mining frequent inter transaction patternsAn efficient algorithm for mining frequent inter transaction patterns
An efficient algorithm for mining frequent inter transaction patterns
 

Viewers also liked

The Design and Implementation of Intelligent Campus Security Tracking System
The Design and Implementation of Intelligent Campus Security Tracking SystemThe Design and Implementation of Intelligent Campus Security Tracking System
The Design and Implementation of Intelligent Campus Security Tracking SystemIOSR Journals
 
SQl Injection Protector for Authentication in Distributed Applications
SQl Injection Protector for Authentication in Distributed ApplicationsSQl Injection Protector for Authentication in Distributed Applications
SQl Injection Protector for Authentication in Distributed ApplicationsIOSR Journals
 
Optimization of FCFS Based Resource Provisioning Algorithm for Cloud Computing
Optimization of FCFS Based Resource Provisioning Algorithm for Cloud ComputingOptimization of FCFS Based Resource Provisioning Algorithm for Cloud Computing
Optimization of FCFS Based Resource Provisioning Algorithm for Cloud ComputingIOSR Journals
 
Intensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image SegmentationIntensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image SegmentationIOSR Journals
 
Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.IOSR Journals
 
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...IOSR Journals
 
Effect of sowing date and crop spacing on growth, yield attributes and qualit...
Effect of sowing date and crop spacing on growth, yield attributes and qualit...Effect of sowing date and crop spacing on growth, yield attributes and qualit...
Effect of sowing date and crop spacing on growth, yield attributes and qualit...IOSR Journals
 
Effect of Shading on Photovoltaic Cell
Effect of Shading on Photovoltaic CellEffect of Shading on Photovoltaic Cell
Effect of Shading on Photovoltaic CellIOSR Journals
 
Approximating Source Accuracy Using Dublicate Records in Da-ta Integration
Approximating Source Accuracy Using Dublicate Records in Da-ta IntegrationApproximating Source Accuracy Using Dublicate Records in Da-ta Integration
Approximating Source Accuracy Using Dublicate Records in Da-ta IntegrationIOSR Journals
 
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...IOSR Journals
 
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...IOSR Journals
 
Resource Allocation for Antivirus Cloud Appliances
Resource Allocation for Antivirus Cloud AppliancesResource Allocation for Antivirus Cloud Appliances
Resource Allocation for Antivirus Cloud AppliancesIOSR Journals
 
Risk Assessment for Identifying Intrusion in Manet
Risk Assessment for Identifying Intrusion in ManetRisk Assessment for Identifying Intrusion in Manet
Risk Assessment for Identifying Intrusion in ManetIOSR Journals
 
Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...
Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...
Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...IOSR Journals
 
Improvement of limited Storage Placement in Wireless Sensor Network
Improvement of limited Storage Placement in Wireless Sensor NetworkImprovement of limited Storage Placement in Wireless Sensor Network
Improvement of limited Storage Placement in Wireless Sensor NetworkIOSR Journals
 

Viewers also liked (20)

The Design and Implementation of Intelligent Campus Security Tracking System
The Design and Implementation of Intelligent Campus Security Tracking SystemThe Design and Implementation of Intelligent Campus Security Tracking System
The Design and Implementation of Intelligent Campus Security Tracking System
 
SQl Injection Protector for Authentication in Distributed Applications
SQl Injection Protector for Authentication in Distributed ApplicationsSQl Injection Protector for Authentication in Distributed Applications
SQl Injection Protector for Authentication in Distributed Applications
 
S180304116124
S180304116124S180304116124
S180304116124
 
Optimization of FCFS Based Resource Provisioning Algorithm for Cloud Computing
Optimization of FCFS Based Resource Provisioning Algorithm for Cloud ComputingOptimization of FCFS Based Resource Provisioning Algorithm for Cloud Computing
Optimization of FCFS Based Resource Provisioning Algorithm for Cloud Computing
 
Intensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image SegmentationIntensity Non-uniformity Correction for Image Segmentation
Intensity Non-uniformity Correction for Image Segmentation
 
Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.Vegetables detection from the glossary shop for the blind.
Vegetables detection from the glossary shop for the blind.
 
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
Design and Performance of a Bidirectional Isolated Dc-Dc Converter for Renewa...
 
Effect of sowing date and crop spacing on growth, yield attributes and qualit...
Effect of sowing date and crop spacing on growth, yield attributes and qualit...Effect of sowing date and crop spacing on growth, yield attributes and qualit...
Effect of sowing date and crop spacing on growth, yield attributes and qualit...
 
Effect of Shading on Photovoltaic Cell
Effect of Shading on Photovoltaic CellEffect of Shading on Photovoltaic Cell
Effect of Shading on Photovoltaic Cell
 
Approximating Source Accuracy Using Dublicate Records in Da-ta Integration
Approximating Source Accuracy Using Dublicate Records in Da-ta IntegrationApproximating Source Accuracy Using Dublicate Records in Da-ta Integration
Approximating Source Accuracy Using Dublicate Records in Da-ta Integration
 
C013121320
C013121320C013121320
C013121320
 
C0161018
C0161018C0161018
C0161018
 
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
 
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
Economic Load Dispatch Optimization of Six Interconnected Generating Units Us...
 
Resource Allocation for Antivirus Cloud Appliances
Resource Allocation for Antivirus Cloud AppliancesResource Allocation for Antivirus Cloud Appliances
Resource Allocation for Antivirus Cloud Appliances
 
F012654552
F012654552F012654552
F012654552
 
I0426778
I0426778I0426778
I0426778
 
Risk Assessment for Identifying Intrusion in Manet
Risk Assessment for Identifying Intrusion in ManetRisk Assessment for Identifying Intrusion in Manet
Risk Assessment for Identifying Intrusion in Manet
 
Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...
Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...
Design, Modelling& Simulation of Double Sided Linear Segmented Switched Reluc...
 
Improvement of limited Storage Placement in Wireless Sensor Network
Improvement of limited Storage Placement in Wireless Sensor NetworkImprovement of limited Storage Placement in Wireless Sensor Network
Improvement of limited Storage Placement in Wireless Sensor Network
 

Similar to An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases

An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in  DatabasesAn Effective Heuristic Approach for Hiding Sensitive Patterns in  Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in DatabasesIOSR Journals
 
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...IOSR Journals
 
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining IJDKP
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.pptSowmyaJyothi3
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.pptSowmyaJyothi3
 
Comparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streamsComparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streamsIJCI JOURNAL
 
An apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distanceAn apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distanceIJDKP
 
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset CreationTop Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset Creationcscpconf
 
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUE
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUENEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUE
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUEcscpconf
 
Pattern Discovery Using Apriori and Ch-Search Algorithm
 Pattern Discovery Using Apriori and Ch-Search Algorithm Pattern Discovery Using Apriori and Ch-Search Algorithm
Pattern Discovery Using Apriori and Ch-Search Algorithmijceronline
 

Similar to An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases (20)

An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in  DatabasesAn Effective Heuristic Approach for Hiding Sensitive Patterns in  Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
 
K017167076
K017167076K017167076
K017167076
 
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
 
F0423038041
F0423038041F0423038041
F0423038041
 
Hiding slides
Hiding slidesHiding slides
Hiding slides
 
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
 
Comparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streamsComparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streams
 
A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASE...
A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASE...A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASE...
A FLEXIBLE APPROACH TO MINE HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASE...
 
An apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distanceAn apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distance
 
I43055257
I43055257I43055257
I43055257
 
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset CreationTop Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
 
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUE
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUENEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUE
NEW ALGORITHM FOR SENSITIVE RULE HIDING USING DATA DISTORTION TECHNIQUE
 
F04713641
F04713641F04713641
F04713641
 
F04713641
F04713641F04713641
F04713641
 
J0945761
J0945761J0945761
J0945761
 
An Approach of Improvisation in Efficiency of Apriori Algorithm
An Approach of Improvisation in Efficiency of Apriori AlgorithmAn Approach of Improvisation in Efficiency of Apriori Algorithm
An Approach of Improvisation in Efficiency of Apriori Algorithm
 
Pattern Discovery Using Apriori and Ch-Search Algorithm
 Pattern Discovery Using Apriori and Ch-Search Algorithm Pattern Discovery Using Apriori and Ch-Search Algorithm
Pattern Discovery Using Apriori and Ch-Search Algorithm
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectssuserb6619e
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxNiranjanYadav41
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Crystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxCrystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxachiever3003
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptbibisarnayak0
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentBharaniDharan195623
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solidnamansinghjarodiya
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 

Recently uploaded (20)

DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptx
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Crystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxCrystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptx
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.ppt
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managament
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solid
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 

An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 14, Issue 4 (Sep. - Oct. 2013), PP 01-05 www.iosrjournals.org www.iosrjournals.org 1 | Page An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases Mrs. P.Cynthia Selvi1 , Dr.A.R.Mohamed Shanavas2 1. Associate Professor, Dept. of Computer Science, KNGA College(W), Thanjavur 613007 /Affiliated to Bharathidasan University, Tiruchirapalli, TamilNadu, India. 2. Associate Professor, Dept. of Comuter Science, Jamal Mohamed College, Tiruchirapalli 620 020/ Affiliated to Bharathidasan University, Tiruchirapalli, TamilNadu, India. Abstract: Privacy Preserving Data Mining(PPDM) is a rising field of research in Data Mining and various approaches are being introduced by the researchers. One of the approaches is a sanitization process, that transforms the source database into a modified one by removing selective items so that the counterparts or adversaries cannot extract the hidden patterns from. This study address this concept and proposes a revised Item-based Maxcover Algorithm(IMA) which is aimed at less information loss in the large databases with minimal removal of items. Keywords: Privacy Preserving Data Mining, Restrictive Patterns, Sensitive Transactions, Maxcover, Sanitized database. I. Introduction PPDM is a novel research direction in DM with the main objective to develop algorithms for modifying the original data in some way, so that the private data and private knowledge remain private even after the mining process[1]. This modification is done by deleting one or more items from source database or even adding noise to the data by turning some items from 0 to 1 in some transactions. The procedure of transforming the source database into a new database that hides some sensitive patterns or rules is called the sanitization process[2] and the released database is called the sanitized database. The approach of modifying some data is perfectly acceptable in some real applications[3, 4] provided the impact on the source database is kept minimal. This study mainly focus on the task of minimizing the impact on the source database by reducing the number of removed items from the source database with only one scan. Section-2 briefly summarizes the previous work done by various researchers; In Section-3 preliminaries and basic definitions are discussed. In Section-4 the proposed algorithm is presented and Section-5 shows the experimental results. II. Related work The idea behind data sanitization to reduce the support values of restrictive patterns was first introduced by Atallah et.al[1] and they have proved that the optimal sanitization process is NP-hard problem. In [4], the authors generalized the problem and proposed algorithms that ensure privacy preservation; but they require multiple scans over a transactional database. In the same direction, Saygin [5] introduced a method for selectively removing individual values from a database and proposed some algorithms to obscure a given set of sensitive rules by replacing known values with unknowns which also require various scans to sanitize a database depending on the number of association rules to be hidden. Oliveira introduced algorithms, IGA[6] & SWA[7] that aims at multiple rule hiding in which IGA has low misses cost; It groups restrictive patterns and assigns a victim item to each group but the clustering is not performed optimally and it can be improved further by reducing the number of deleted items. Whereas, SWA improves the balance between protection of sensitive knowledge and pattern discovery but some rules are removed inadvertently. In [8] heuristic-based approach is proposed which perform the sanitization process with minimum number of removal of restricted items. However, when the sensitive patterns to be hidden are mutually exclusive, this approach has more hiding failure. Hence, in this work the algorithm proposed in [8] is revised and it is tested with large database which ensure no hiding failure and very low information loss. III. Preliminaries & Definitions Transactional Database: A transactional database is a relation consisting of transactions in which each transaction t is characterized by an ordered pair, defined as t = ˂Tid, list-of-elements˃, where Tid is a unique transaction identifier number and list-of-elements represents a list of items making up the transactions.
  • 2. An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases www.iosrjournals.org 2 | Page Basics of Association Rules: One of the most studied problems in data mining is the process of discovering association rules from large databases. Most of the existing algorithms for association rules rely on the support- confidence framework introduced in [9]. Formally, association rules are defined as follows: Let I = {i1,...,in} be a set of literals, called items. Let D be a database of transactions, where each transaction t is an itemset such that 𝑡 ⊆ 𝐼. A transaction t supports X, a set of items in I, if 𝑋 ⊂ 𝑡. An association rule is an implication of the form 𝑋 ⟹ Y, where 𝑋 ⊂ 𝐼, 𝑌 ⊂ 𝐼 and 𝑋 ∩ 𝑌 = 𝜙. Thus, we say that a rule 𝑋 ⟹ Y holds in the database D with support (𝜎) if |X∪Y| N ≥ 𝜎, where N is the number of transactions in D. Similarly, we say that a rule 𝑋 ⟹ Y holds in the database D with confidence (𝜑) if |X∪Y| |X| ≥ 𝜑, where |𝐴| is the number of occurrences of the set of items A in the set of transactions D. While the support is a measure of the frequency of a rule, the confidence is a measure of the strength of the relation between sets of items. Association rule mining algorithms rely on the two attributes, minimum Support(minSup 𝜎) and minimum Confidence(minConf 𝜑). Frequent Pattern: A pattern X is called a frequent pattern if Sup(X) ≥ minSup or if the absolute support of X satisfies the corresponding minimum support count threshold. Privacy Preservation in Frequent Patterns: The task of privacy preserving data processing deals with removing/modifying sensitive entries in the data which are basically decided by the user, who may either be the owner or the contributor of the data. Definitions : Definition 1: Let D be a source database, containing a set of all transactions. T denotes a set of transactions, each transaction containing itemset 𝑋 ∈ 𝐷. In addition, each k-itemset 𝑋 ⊆ 𝐼 has an associated set of transactions 𝑇 ⊆ 𝐷, where 𝑋 ⊆ 𝑡 and 𝑡 ∈ 𝑇. Definition 2: Restrictive Patterns : Let D be a source database, P be a set of all frequent patterns that can be mined from D, and RulesH be a set of decision support rules that need to be hidden according to some security policies. A set of patterns, denoted by RP is said to be restrictive, if RP ⊂ P and if and only if RP would derive the set RulesH. RP is the set of non-restrictive patterns such that RP RP = P. Definition 3 : Sensitive Transactions : Let T be a set of all transactions in a source database D, and RP be a set of restrictive patterns mined from D. A set of transactions is said to be sensitive, denoted by ST, if every t ST contain atleast one restrictive pattern, ie ST ={ 𝑡 ∈ T | X ∈ RP, X ⊆ t }. Moreover, if ST T then all restrictive patterns can be mined one and only from ST. Definition 4 : (i) Transaction Size : The number of items which make up a transaction is the size of the transaction. (ii) Transaction Degree : Let D be a source database and ST be a set of all sensitive transactions in D. The degree of a sensitive transaction t, denoted as deg(t), such that t ST is defined as the number of restrictive patterns that t contains. Definition 5: Cover : The Cover[8] of an item Ak can be defined as, CAk = { rpi | Ak ∈ rpi ⊂ RP, 1 i |RP|} i.e., set of all restrictive patterns(rpi) which contain Ak. The item that is included in a maximum number of rpi is the one with maximal cover or maxCover; i.e., maxCover = max( |CA1|, |CA2| , … |CAn| ) such that Ak ∈ rpi ⊂ RP. IV. Sanitization Algorithm Given the source database (D), and the restrictive patterns(RP), the goal of the sanitization process is to protect RP against the mining techniques used to disclose them. The sanitization process decreases the support values of restrictive patterns by removing items from sensitive transactions using some heuristics. The heuristics used in this work is given below: Heuristic : Sensitive Transactions(ST) in source database(D) are identified and sorted in decreasing order of (deg + size), which enable multiple patterns to be sanitized in a single iteration. Then the victim item is the one which is selected based on maxcover (definition-5) value of the items in the restrictive patterns. Algorithm Input : (i) D – Source Database (ii) RP – Set of all Restrictive Patterns Output : D’ – Sanitized Database Item-based Maxcover Algorithm(IMA): // based on Heuristics 1 & 2 // Step 1 : (i) find supCount(Ak) ∀ Ak rpi RP (ii) find supCount(rpi), ∀ rpi RP and sort in decreasing order ; Step 2 : (i) find Sensitive Transactions(ST) w.r.t. RP ; (ii) obtain deg(t), size(t) ∀ t ST ; (iii) sort t ST in decreasing order of deg & size ;
  • 3. An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases www.iosrjournals.org 3 | Page Step 3 : find ST D ST ; // ST - non sensitive transactions // Step 4 : // Find ST’ // find cover for every item Ak ∈ RP and sort in decreasing order of cover; for each item Ak ∈ RP do { find T = 𝑡 |𝑟𝑝 𝑖 −𝑙𝑖𝑠𝑡 | 𝑖=1 for each t ∈ T do { delete item Ak in non VictimTransactions such that Ak rpi rpi-list ; // Ak – victimItem // // initially all t are nonvictim // decrease supCount of rpi’s for which t is nonVictim; mark t as victimTransaction in each t-list( rpi ) rpi _list(Ak ) ; } } for each rpi ∈ RP do { if (supCount < > 0) { for every t ∈ nonVictimTransactions do { delete item Ak with minimum supCount (round robin in case of tie); decrease supCount of every rpi ; // ie, Ak rpi t // } } } Step 5 : D’ ← ST ∪ ST’ V. Implementation The test run was performed using AMD Turion II N550 Dual core processor with 2.6 GHz speed and 2GB RAM operating on 32 bit OS; The implementation of the proposed algorithm was done with windows 7 - Netbeans 6.9.1 - SQL 2005 for real dataset T10I4D100K[10] with characteristics given in table-I. The results are obtained by varying the number of rules and also size of the source database in terms of number of transactions (refer graphs). The restrictive patterns are chosen under various categories like overlapping, mutually exclusive, random, high support, low support with their support ranging between 0.6 and 5, confidence between 32.5 and 85.7 and length between 2 and 6. The frequent patterns were obtained using Matrix Apriori[11] which uses simpler data structures for implementation. The proposed algorithm makes use of preprocessed lookup(hashed) tables which links the restrictive items and rules with their associated transactions; so that the source database is scanned not more than once. Our algorithm is evaluated based on the performance measures suggested in [6,7]. Performance Analysis Hiding Failure(HF) : It is measured by the ratio of the number of restrictive patterns in the released sanitized database(D’) to the ones in the given source database; HF = |RP (D′)| |RP(D)| . In other words, if a hidden restrictive pattern cannot be extracted from the released database D’ with an arbitrary minSup, then it has no hiding failure occurrence. Our approach has 0% HF for all the strategies under which the rules to be hidden were chosen. Misses Cost(MC) : This measure deals with the legitimate patterns(non restrictive patterns) that were accidently missed. MC = ~RP D − |~RP (D′)| ~RP D . Our approach has very minimum MC which ranges between 0% and 2.43%. It is also observed that MC gets reduced linearly when the size of database is increased. Artifactual Pattern(AP) : AP occurs when D’ is released with some artificially generated patterns after applying the privacy preservation approach and it is given by, AP = 𝑷′ − |𝑷 ∩ 𝑷′| 𝑷′ . As our approach does not introduce any false drops, the AP is 0%. Sanitization Rate(SR) : It is defined as the ratio of the number of selectively deleted items(victim items) to the total support count of restrictive patterns(rpi) in the source database D and is given by, SR = victim items total supCount rpi .
  • 4. An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases www.iosrjournals.org 4 | Page Here the SR is found to be in the range from 40.32% to 68.82%, which shows that the number of restrictive items deleted from the source database is kept minimal. Dissimilarity(dif) : The dissimilarity between the original(D) and sanitized(D’) databases is measured in terms of their contents which can be measured by the formula, dif(D,D’)= 1 fD(i)n i=1 × [fD i − fD′ i ]n i=1 , where fx(i) represents the ith item in the dataset X. Our approach has very low percentage of dissimilarity that ranges between 0.24% and 1.1% and this gets reduced when size of the source database is increased. Hence it is observed that information loss is very low and so the utility of the data is well preserved. CPU Time : The range of time requirement for this algorithm with the rules chosen under various categories is between 1.66 and 17.22 sec. This time requirement can still be reduced when parallelism is adapted with high speed processor. Moreover, time is not a significant criterion as sanitization is performed offline. Graphs : 0 20 40 60 5 10 15 20 25 0 0 0 0 0 HF(%) No. of rules to sanitize 0 20 40 60 2k 4k 6k 8k 0 0 0 0 HF(%) No. of Transactions 0 20 40 60 5 10 15 20 25 0.25 0.37 0.31 0.8 1.14 MC(%) No. of rules to sanitize 0 20 40 60 2k 4k 6k 8k 2.43 0.42 0.18 0.14 MC(%) No. of Transactions 52.75 40.37 40.78 43.54 45.68 20 40 60 80 100 5 10 15 20 25 SR(%) No. of rules to sanitize 67.35 68.1 66.82 68.82 20 40 60 80 100 2k 4k 6k 8k SR(%) No. of Transactions 0 20 40 60 5 10 15 20 25 0.32 0.44 0.72 0.96 1.1 Diff(%) No. of rules to sanitize 0 20 40 60 2k 4k 6k 8k 0.32 0.27 0.25 0.24 Diff(%) No. of Transactions
  • 5. An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in Large Databases www.iosrjournals.org 5 | Page Table – I :Characteristics of Dataset (D & D’) VI. Conclusion The proposed Item-based Maxcover Algorithm(IMA) is based on the strategy to simultaneously decrease the support count of maximum number of sensitive patterns, with possibly minimal removal of items that results in reduced impact on the source database with no hiding failure. IMA has reduced sanitization rate possibly with very low misses cost. Moreover this algorithm scans the original database only once. It is important to note that the proposed algorithm is robust in the sense that reconstruction of the source database is not at all possible; because the alterations to the original database are not saved anywhere and also encryption techniques are not involved in this approach. The dissimilarity between the source and sanitized database is very minimum that minimizes the information loss and preserves improved data utility. The execution time is also significantly low which can still be reduced to a minimum by parallelism techniques adapted with high speed processors. References [1] Verykios,V.S, Bertino.E, Fovino.I.N, ProvenzaL.P, Saygin.Y and Theodoridis.Y, “State-of-the-art in Privacy Preservation Data Mining”, New York,ACM SIGMOD Record, vol.33, no.2, pp.50-57,2004. [2] Atallah.M, Bertino.E, Elmagarmid.A, Ibrahim.M and Verykios.V.S, “Disclosure Limitation of Sensitive Rules”, In Proc. of IEEE Knowledge and Data Engineering Workshop, pages 45–52, Chicago, Illinois, November 1999. [3] Clifton.C and Marks.D, “ Security and Privacy Implications of Data Mining”, In Workshop on Data Mining and Knowledge Discovery, pages 15–19, Montreal, Canada, February 1996. [4] Dasseni.E, Verykios.V.S, Elmagarmid.A.K and Bertino.E, “ Hiding Association Rules by Using Confidence and Support”, In Proc. of the 4th Information Hiding Workshop, pages 369– 383, Pittsburg, PA, April 2001. [5] Saygin.Y, Verykios.V.S, and Clifton.C, “Using Unknowns to Prevent Discovery of Association Rules”, SIGMOD Record, 30(4):45–54, December 2001. [6] Oliveira.S.R.M and Zaiane.O.R, “Privacy preserving Frequent Itemset Mining”, in the Proc. of the IEEE ICDM Workshop on Privacy, Security, and Data Mining, Pages 43-54, Maebashi City, Japan, December 2002. [7] Oliveira.S.R.M and Zaiane.O.R, “An Efficient One-Scan Sanitization for Improving the Balance between Privacy and Knowledge Discovery”, Technical Report TR 03-15, June 2003. [8] Cynthia Selvi P, Mohamed Shanavas A.R, “An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases”, IOSR- Journal on Computer Engineering, Volume 5, Issue 1(Sep-Oct, 2012), PP 06-11, DOI. 10.9790/0661-0510611. [9] Agrawal R, Imielinski T, and Swami.A, “Mining association rules between sets of items in large databases”, Proceedings of 1993 ACM SIGMOD international conference on management of data, Washington, DC; 1993. p. 207-16. [10] http://fimi.cs.helsinki.fi/data/ [11] Pavon.J, Viana.S, and Gomez.S, “Matrix Apriori: speeding up the search for frequent patterns,” Proc. 24th IASTED International Conference on Databases and Applications, 2006, pp. 75-82. 1.66 6.32 7.49 9.91 12.54 0 20 40 60 5 10 15 20 25 cputime(secs) No.of rules to sanitize 6.89 8.22 14.69 17.22 0 20 40 60 2k 4k 6k 8k cputime(secs) No. of Transactions Dataset name T10I4D100K No. of transactions No. of distinct Items Min. Length Max. Length Dataset Size (MB) Source (D) 1K – 8K 795 - 862 1 26 97KB – 650 KB Sanitized (D’) 1K – 8K 795 - 862 1 26 95KB – 648KB