SlideShare ist ein Scribd-Unternehmen logo
1 von 4
Downloaden Sie, um offline zu lesen
International Journal of Computational Engineering Research||Vol, 04||Issue, 2||

Introducing Parallelism in Privacy Preserving Data Mining
Algorithms
1,

Mrs.P.Cynthia Selvi, 2, Dr.A.R.Mohamed Shanavas
1

Associate Professor, Dept. of Comp. Sci.
K.N.Govt.Arts College for Women, Thanjavur, Tamilnadu, India
2
Associate Professor, Dept. of Comp. Sci.
Jamal Mohamed College Trichirapalli, Tamilnadu, India

ABSTRACT
The development of parallel and distributed data mining algorithms in various functionalities have been
motivated by the huge size and wide distribution of the databases and also by the computational
complexity of the data mining methods. Such algorithms make partitions of the huge database that is
being used into segments that are processed in parallel. The results obtained from the processed
segments of database are then merged; This reduces the computational complexity and improves the
speed. This article aims at introducing parallelism in data sanitization technique in order to improve the
performance and throughput.

KEYWORDS: Cover, Parallelism, Privacy Preserving Data Mining, Restrictive Patterns, Sanitization,
Sensitive Transactions, Transaction-based Maxcover Algorithm.

I.

INTRODUCTION

Data mining is an emerging technology that enable the discovery of interesting patterns from large
collections of data. As the amount of data being collected continues to increase very rapidly, scalable algorithms
for data mining becomes essential; Moreover, it is a challenging task for data mining approaches to handle large
amount of data effectively and efficiently. Scaling up the data mining algorithms to be run in high-performance
parallel and distributed computing environments offers an alternative solution for effective data mining.
Parallelization is a process that consist of breaking up a large single process into multiple smaller tasks which
can run in parallel and the results of those tasks are combined to obtain an overall improvement in performance.
With a lot of information accessible in electronic forms and available on the web, and with increasingly
powerful data mining tools being developed and put into use, there are increasing concerns that data mining
pose a threat to privacy and data security. This motivated the area of Privacy Preserving Data Mining(PPDM)
and its main objective is to develop algorithms to transform the original data to protect the private data and
knowledge without much utility loss. There are many approaches for preserving privacy in data mining; to name
a few are perturbation, encryption, swapping, distortion, blocking, sanitization. The task of transforming the
source database into a new database that hides some sensitive knowledge is called sanitization process[1].This
article introduce the concept of parallelism in PPDM algorithms. Section-2 narrates the previous work on
PPDM. Section-3 introduces the basic terminologies and the proposed algorithm is presented in section-4.
Section-5 gives implementation details and the observed results.

II.

LITERATURE SURVEY

The idea behind data sanitization was introduced in [2], which considered the problem of modifying a
given database so that the support of a given set of sensitive rules decreases below the minimum support value.
The authors focused on the theoretical approach and showed that the optimal sanitization is an NP-hard
problem. In [3], the authors investigated confidentiality issues of a broad category of association rules and
proposed some algorithms to preserve privacy of such rules above a given privacy threshold.In the same
direction, Saygin[4] introduced some algorithms to obscure a given set of sensitive rules by replacing known
values with unknowns, while minimizing the side-effects on non-sensitive rules. Like the algorithms proposed
in [3], these algorithms are CPU-sensitive and require various scans depending on the no. of association rules to
be hidden. In [5,6], heuristic-based sanitization algorithms have been proposed. All these algorithms
concentrated on data hiding principle to be implemented on the source database as a whole and parallelism is
not dealt with. Hence this work makes an attempt to introduce parallelism in Transaction-based Maxcover
Algorithm(TMA) proposed in [6] and improved performance is obtained from the observed results.
||Issn 2250-3005 ||

||Febrauary||2014||

Page 38
Introducing Parallelism in Privacy Preserving …
III.

BASIC CONCEPTS OF PPDM ALGORITHM

Transactional Database : A transactional database consists of a file where each record represents a transaction
that typically includes a unique identity number (trans_id) and a list of items that make up the transaction. Let D
be a source database which is a transactional database containing a set of transactions T, where each transaction
t contain an itemset
. Also, every
has an associated set of transactions
, where
and
.
Association Rule : It is an expression of the form
, where X and Y contain one or more
itemsets(categorical values) without common elements (
).
Frequent Pattern : An itemset or pattern that forms an association rule is said to be frequent if it satisfies a
prespecified minimum support threshold(min_sup).
Restrictive Patterns : Let P be a set of significant patterns that can be mined from transactional source database
D, and RH be a set of rules to be hidden according to some privacy policies. A set of all patterns rpi denoted by
RP is said to be restrictive, if RP ⊂ P and if and only if RP would derive the set RH. RP is the set of nonrestrictive patterns such that RP RP = P [4].
Sensitive Transactions : A set of transactions is said to be sensitive, denoted by ST, if every t ST contain atleast
one restrictive pattern rpi . ie ST ={
T | rpi RP, rpi ⊆ t }.
Cover : The Cover[5] of an item Ak can be defined as,
CAk = { rpi | Ak

rpi

RP, 1

i

|RP|}

i.e., set of all restrictive patterns which contain Ak. The item that is included in a maximum number of rpi’s is
the one with maximal cover or maxCover;
i.e., maxCover = max( |CA1|, |CA2| , … |CAn| ) such that Ak

rpi

RP.

Principle of Transaction-based Maxcover Algorithm(TMA) [6]: Initially, identify the transaction-list of each
rpi RP. Starting with rpi having larger supCount, for every transaction t in t-list(rpi), find the cover(Ak) within
t such that Ak rpi
t. Delete item Ak with maxCover in t, and decrease the supCount of all rpi’s which are
included in t. Also mark this t as victim transaction in the t-list of the corresponding rpi’s. Repeat this process
until the supCount of all rpi’s are reduced to 0.

IV.

ALGORITHM

The sanitization task is distributed among the Server (a server is an entity that has some resource that
can be shared) and the Clients (a client is simply any other entity which wants to gain access to a particular
server) and the task is implemented as two modules namely Server Module and Client Module.
Procedure(Server module):
Input : Source transactional database(D)
Output : Sanitized database(D’)
Start;
Get Source Database(D);
Get N; //N- no. of data segments;
Horizontally partition the D into N segments;
Initialize Lookup Tables;
Allocate the segment to client; // one each
Merge the sanitized segments received from N clients;
Display Sanitized Database(D’);
End
Procedure(Client module):
Start;
Get data segment from Server;
Access the Lookup Tables;
Run TMA algorithm to sanitize the data segment;
Return Sanitized segment to Server;
Stop.
||Issn 2250-3005 ||

||Febrauary||2014||

Page 39
Introducing Parallelism in Privacy Preserving …
V.

IMPLEMENTATION

The proposed algorithm was tested on real databases RETAIL & T10I4D100K[7] with samples of
transactions between 1000 and 10000. The restrictive patterns were chosen in a random manner with their
support ranging between 0.6 and 5, confidence between 32.5 and 85.7 and length between 2 and 6. The test run
was made on AMD Turion II N550 Dual core processor with 2.6 GHz speed and 2GB RAM operating on 32 bit
OS; The implementation of the proposed algorithm was done with windows 7 - Netbeans 6.9.1 - SQL 2005.
The coding part is done with JDK 1.7; because Java’s clean and type-safe object-oriented programming model
together with its support for parallel and distributed computing make it an attractive environment for writing
reliable and parallel programs. The characteristics of Source and Sanitized databases are given in table-I & II
respectively; the improved performance in terms of execution time is shown in Table-III & IV and the graphs.
Table-I. Characteristics of source databases
Database Name

No. of
Transactions

No. of
Distinct
Items

Min.
Length

Max.
Length

Min. no. of
Sensitive
Transactions

Max. no. of
Sensitive
Transactions

Database Size (MB)

Retail

1K – 10K

3176 –
8126

1

58

22

2706

90.2KB–928 KB

T10I4D100K

1K – 10K

795 - 862

1

26

7

78

94.8KB–786 KB

Table-II. Characteristics of sanitized databases
Database
Name

No. of
Transactions

No. of Distinct
Items

Min. Length

Max.
Length

Database Size(MB)

Retail

1K – 10K

3176 – 8126

1

58

90.0KB – 916KB

T10I4D100K

1K – 10K

795 - 862

1

26

94.0KB – 778KB

5.1. Results

||Issn 2250-3005 ||

||Febrauary||2014||

Page 40
Introducing Parallelism in Privacy Preserving …

Fig.1. Execution time(Retail Database)

VI.

Fig.2. Execution time(T10I4D100K)

CONCLUSION

The main objective of this study is to improve the performance of the Privacy Preserving Data Mining
Algorithm, TMA which was proposed and proved its performance in the earlier work[6]. In this article we
implemented a parallel processing on TMA and we have observed very good improvement in the processing
speed. Hence this empirical study showed that improved performance can be achieved but performance would
degrade as the number of processors increased.

REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]

A.Evfimievski, R.Srikant, R.Agarwal, Gehrke. Privacy Preserving mining of association rules. Proceedings of 8th ACM SIGKDD
international conference on Knowledge discovery and data mining, Alberta, Canada. p.217-28. 2002.
M.Atallah, E.Bertino, A.Elamagarmid, M.Ibrahim and V.Verykios. Disclosure Limitation of Sensitive Rules. In Proc. of IEEE
knowledge and Data Engg.workshop. p.45-52. Chicogo, Illinois. Nov.1999.
E.Dessani, V.S.Verykios, A.K.Elamagarmid and E.Bertino. Hiding association rules by using confidence and support. In 4th
information hiding workshop. p.369-383. Pitsburg, PA. Apr 2001.
Y.Saygin, V.S.Verikios and C.Clifton. Using unknowns to Prevent Discovery of Association Rules. SIGMOD Record. 30(4):45-54.
Dec,2001.
P.Cynthia Selvi, A.R.Mohamed Shanavas. An Improved Item-based Maxcover Algorithm to Protect Sensitive Patterns in Large
Databases. IOSR-Journal on Computer Engineering. Volume 14, Issue 4 (Sep-Oct, 2013). PP 01-05. DOI. 10.9790/0661-1440105.
P.Cynthia Selvi, A.R.Mohamed Shanavas. Towards Information Privacy using Transaction-based Maxcover Algorithm. Presented
on International Conference on Data Mining and SoftComputing Techniques. SASTRA University, India. 25th & 26th Oct’2013.
The Dataset used in this work for experimental analysis was generated using the generator from IBM Almaden Quest research
group and is publicly available from http://fimi.ua.ac.be/data/

||Issn 2250-3005 ||

||Febrauary||2014||

Page 41

Weitere ähnliche Inhalte

Was ist angesagt?

Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeSumant Tambe
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoopdbpublications
 
Block-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplicationBlock-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplicationIRJET Journal
 
IoT Protocols Integration with Vortex Gateway
IoT Protocols Integration with Vortex GatewayIoT Protocols Integration with Vortex Gateway
IoT Protocols Integration with Vortex GatewayAngelo Corsaro
 
Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...IOSR Journals
 
IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363SHIVA REDDY
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rulesijnlc
 
Ieeepro techno solutions 2013 ieee java project -building confidential and ...
Ieeepro techno solutions   2013 ieee java project -building confidential and ...Ieeepro techno solutions   2013 ieee java project -building confidential and ...
Ieeepro techno solutions 2013 ieee java project -building confidential and ...hemanthbbc
 
IRJET- Improving Data Spillage in Multi-Cloud Capacity Administration
IRJET- Improving Data Spillage in Multi-Cloud Capacity AdministrationIRJET- Improving Data Spillage in Multi-Cloud Capacity Administration
IRJET- Improving Data Spillage in Multi-Cloud Capacity AdministrationIRJET Journal
 
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...dbpublications
 
Cryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop ImplementationCryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop ImplementationIOSR Journals
 
An efficient, secure deduplication data storing in cloud storage environment
An efficient, secure deduplication data storing in cloud storage environmentAn efficient, secure deduplication data storing in cloud storage environment
An efficient, secure deduplication data storing in cloud storage environmenteSAT Journals
 
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud StorageIRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud StorageIRJET Journal
 
IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...
IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...
IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...IRJET Journal
 
A location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsA location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsIAEME Publication
 
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...Editor IJLRES
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...IEEEGLOBALSOFTSTUDENTPROJECTS
 
NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...
NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...
NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...ijiert bestjournal
 
Architecting IoT Systems with Vortex
Architecting IoT Systems with VortexArchitecting IoT Systems with Vortex
Architecting IoT Systems with VortexAngelo Corsaro
 

Was ist angesagt? (19)

Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/Subscribe
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
Block-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplicationBlock-Level Message-Locked Encryption for Secure Large File De-duplication
Block-Level Message-Locked Encryption for Secure Large File De-duplication
 
IoT Protocols Integration with Vortex Gateway
IoT Protocols Integration with Vortex GatewayIoT Protocols Integration with Vortex Gateway
IoT Protocols Integration with Vortex Gateway
 
Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...
 
IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363IJSETR-VOL-3-ISSUE-12-3358-3363
IJSETR-VOL-3-ISSUE-12-3358-3363
 
An improved apriori algorithm for association rules
An improved apriori algorithm for association rulesAn improved apriori algorithm for association rules
An improved apriori algorithm for association rules
 
Ieeepro techno solutions 2013 ieee java project -building confidential and ...
Ieeepro techno solutions   2013 ieee java project -building confidential and ...Ieeepro techno solutions   2013 ieee java project -building confidential and ...
Ieeepro techno solutions 2013 ieee java project -building confidential and ...
 
IRJET- Improving Data Spillage in Multi-Cloud Capacity Administration
IRJET- Improving Data Spillage in Multi-Cloud Capacity AdministrationIRJET- Improving Data Spillage in Multi-Cloud Capacity Administration
IRJET- Improving Data Spillage in Multi-Cloud Capacity Administration
 
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)...
 
Cryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop ImplementationCryptographic Cloud Storage with Hadoop Implementation
Cryptographic Cloud Storage with Hadoop Implementation
 
An efficient, secure deduplication data storing in cloud storage environment
An efficient, secure deduplication data storing in cloud storage environmentAn efficient, secure deduplication data storing in cloud storage environment
An efficient, secure deduplication data storing in cloud storage environment
 
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud StorageIRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
 
IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...
IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...
IRJET- An Integrity Auditing &Data Dedupe withEffective Bandwidth in Cloud St...
 
A location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsA location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applications
 
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Performance and cost evaluation of an...
 
NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...
NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...
NEW SECURE CONCURRECY MANEGMENT APPROACH FOR DISTRIBUTED AND CONCURRENT ACCES...
 
Architecting IoT Systems with Vortex
Architecting IoT Systems with VortexArchitecting IoT Systems with Vortex
Architecting IoT Systems with Vortex
 

Andere mochten auch

2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...
2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...
2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...IEEEFINALYEARSTUDENTPROJECT
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining14894
 
A Review Study on the Privacy Preserving Data Mining Techniques and Approaches
A Review Study on the Privacy Preserving Data Mining Techniques and ApproachesA Review Study on the Privacy Preserving Data Mining Techniques and Approaches
A Review Study on the Privacy Preserving Data Mining Techniques and Approaches14894
 
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data MiningPerformance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Miningidescitation
 
Privacy preserving in data mining with hybrid approach
Privacy preserving in data mining with hybrid approachPrivacy preserving in data mining with hybrid approach
Privacy preserving in data mining with hybrid approachNarendra Dhadhal
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data miningNeeda Multani
 
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your BusinessBarry Feldman
 

Andere mochten auch (7)

2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...
2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...
2014 IEEE JAVA DATA MINING PROJECT Privacy preserving and content-protecting ...
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
 
A Review Study on the Privacy Preserving Data Mining Techniques and Approaches
A Review Study on the Privacy Preserving Data Mining Techniques and ApproachesA Review Study on the Privacy Preserving Data Mining Techniques and Approaches
A Review Study on the Privacy Preserving Data Mining Techniques and Approaches
 
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data MiningPerformance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
 
Privacy preserving in data mining with hybrid approach
Privacy preserving in data mining with hybrid approachPrivacy preserving in data mining with hybrid approach
Privacy preserving in data mining with hybrid approach
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data mining
 
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
 

Ähnlich wie F0423038041

Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...IOSR Journals
 
Output Privacy Protection With Pattern-Based Heuristic Algorithm
Output Privacy Protection With Pattern-Based Heuristic AlgorithmOutput Privacy Protection With Pattern-Based Heuristic Algorithm
Output Privacy Protection With Pattern-Based Heuristic Algorithmijcsit
 
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in  DatabasesAn Effective Heuristic Approach for Hiding Sensitive Patterns in  Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in DatabasesIOSR Journals
 
Iaetsd secured and efficient data scheduling of intermediate data sets
Iaetsd secured and efficient data scheduling of intermediate data setsIaetsd secured and efficient data scheduling of intermediate data sets
Iaetsd secured and efficient data scheduling of intermediate data setsIaetsd Iaetsd
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
The Architecture of Cloud Storage Model Based On Confusion Theory
The Architecture of Cloud Storage Model Based On Confusion TheoryThe Architecture of Cloud Storage Model Based On Confusion Theory
The Architecture of Cloud Storage Model Based On Confusion Theoryinventionjournals
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
A Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data MiningA Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data MiningIRJET Journal
 
IRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and TechniquesIRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and TechniquesIRJET Journal
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageIJMER
 
JPJ1452 Privacy-Enhanced Web Service Composition
JPJ1452  Privacy-Enhanced Web Service CompositionJPJ1452  Privacy-Enhanced Web Service Composition
JPJ1452 Privacy-Enhanced Web Service Compositionchennaijp
 
IRJET-Auditing and Resisting Key Exposure on Cloud Storage
IRJET-Auditing and Resisting Key Exposure on Cloud StorageIRJET-Auditing and Resisting Key Exposure on Cloud Storage
IRJET-Auditing and Resisting Key Exposure on Cloud StorageIRJET Journal
 
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...IOSR Journals
 

Ähnlich wie F0423038041 (20)

K017167076
K017167076K017167076
K017167076
 
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
 
Output Privacy Protection With Pattern-Based Heuristic Algorithm
Output Privacy Protection With Pattern-Based Heuristic AlgorithmOutput Privacy Protection With Pattern-Based Heuristic Algorithm
Output Privacy Protection With Pattern-Based Heuristic Algorithm
 
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in  DatabasesAn Effective Heuristic Approach for Hiding Sensitive Patterns in  Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
 
Iaetsd secured and efficient data scheduling of intermediate data sets
Iaetsd secured and efficient data scheduling of intermediate data setsIaetsd secured and efficient data scheduling of intermediate data sets
Iaetsd secured and efficient data scheduling of intermediate data sets
 
1670 1673
1670 16731670 1673
1670 1673
 
1670 1673
1670 16731670 1673
1670 1673
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
The Architecture of Cloud Storage Model Based On Confusion Theory
The Architecture of Cloud Storage Model Based On Confusion TheoryThe Architecture of Cloud Storage Model Based On Confusion Theory
The Architecture of Cloud Storage Model Based On Confusion Theory
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
A Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data MiningA Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data Mining
 
IRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and TechniquesIRJET- A Study of Privacy Preserving Data Mining and Techniques
IRJET- A Study of Privacy Preserving Data Mining and Techniques
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud Storage
 
JPJ1452 Privacy-Enhanced Web Service Composition
JPJ1452  Privacy-Enhanced Web Service CompositionJPJ1452  Privacy-Enhanced Web Service Composition
JPJ1452 Privacy-Enhanced Web Service Composition
 
Cr4201623627
Cr4201623627Cr4201623627
Cr4201623627
 
IRJET-Auditing and Resisting Key Exposure on Cloud Storage
IRJET-Auditing and Resisting Key Exposure on Cloud StorageIRJET-Auditing and Resisting Key Exposure on Cloud Storage
IRJET-Auditing and Resisting Key Exposure on Cloud Storage
 
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
 
Bi4201403406
Bi4201403406Bi4201403406
Bi4201403406
 
Cloud java titles adrit solutions
Cloud java titles adrit solutionsCloud java titles adrit solutions
Cloud java titles adrit solutions
 
dbms ppt .pptx
dbms ppt .pptxdbms ppt .pptx
dbms ppt .pptx
 

F0423038041

  • 1. International Journal of Computational Engineering Research||Vol, 04||Issue, 2|| Introducing Parallelism in Privacy Preserving Data Mining Algorithms 1, Mrs.P.Cynthia Selvi, 2, Dr.A.R.Mohamed Shanavas 1 Associate Professor, Dept. of Comp. Sci. K.N.Govt.Arts College for Women, Thanjavur, Tamilnadu, India 2 Associate Professor, Dept. of Comp. Sci. Jamal Mohamed College Trichirapalli, Tamilnadu, India ABSTRACT The development of parallel and distributed data mining algorithms in various functionalities have been motivated by the huge size and wide distribution of the databases and also by the computational complexity of the data mining methods. Such algorithms make partitions of the huge database that is being used into segments that are processed in parallel. The results obtained from the processed segments of database are then merged; This reduces the computational complexity and improves the speed. This article aims at introducing parallelism in data sanitization technique in order to improve the performance and throughput. KEYWORDS: Cover, Parallelism, Privacy Preserving Data Mining, Restrictive Patterns, Sanitization, Sensitive Transactions, Transaction-based Maxcover Algorithm. I. INTRODUCTION Data mining is an emerging technology that enable the discovery of interesting patterns from large collections of data. As the amount of data being collected continues to increase very rapidly, scalable algorithms for data mining becomes essential; Moreover, it is a challenging task for data mining approaches to handle large amount of data effectively and efficiently. Scaling up the data mining algorithms to be run in high-performance parallel and distributed computing environments offers an alternative solution for effective data mining. Parallelization is a process that consist of breaking up a large single process into multiple smaller tasks which can run in parallel and the results of those tasks are combined to obtain an overall improvement in performance. With a lot of information accessible in electronic forms and available on the web, and with increasingly powerful data mining tools being developed and put into use, there are increasing concerns that data mining pose a threat to privacy and data security. This motivated the area of Privacy Preserving Data Mining(PPDM) and its main objective is to develop algorithms to transform the original data to protect the private data and knowledge without much utility loss. There are many approaches for preserving privacy in data mining; to name a few are perturbation, encryption, swapping, distortion, blocking, sanitization. The task of transforming the source database into a new database that hides some sensitive knowledge is called sanitization process[1].This article introduce the concept of parallelism in PPDM algorithms. Section-2 narrates the previous work on PPDM. Section-3 introduces the basic terminologies and the proposed algorithm is presented in section-4. Section-5 gives implementation details and the observed results. II. LITERATURE SURVEY The idea behind data sanitization was introduced in [2], which considered the problem of modifying a given database so that the support of a given set of sensitive rules decreases below the minimum support value. The authors focused on the theoretical approach and showed that the optimal sanitization is an NP-hard problem. In [3], the authors investigated confidentiality issues of a broad category of association rules and proposed some algorithms to preserve privacy of such rules above a given privacy threshold.In the same direction, Saygin[4] introduced some algorithms to obscure a given set of sensitive rules by replacing known values with unknowns, while minimizing the side-effects on non-sensitive rules. Like the algorithms proposed in [3], these algorithms are CPU-sensitive and require various scans depending on the no. of association rules to be hidden. In [5,6], heuristic-based sanitization algorithms have been proposed. All these algorithms concentrated on data hiding principle to be implemented on the source database as a whole and parallelism is not dealt with. Hence this work makes an attempt to introduce parallelism in Transaction-based Maxcover Algorithm(TMA) proposed in [6] and improved performance is obtained from the observed results. ||Issn 2250-3005 || ||Febrauary||2014|| Page 38
  • 2. Introducing Parallelism in Privacy Preserving … III. BASIC CONCEPTS OF PPDM ALGORITHM Transactional Database : A transactional database consists of a file where each record represents a transaction that typically includes a unique identity number (trans_id) and a list of items that make up the transaction. Let D be a source database which is a transactional database containing a set of transactions T, where each transaction t contain an itemset . Also, every has an associated set of transactions , where and . Association Rule : It is an expression of the form , where X and Y contain one or more itemsets(categorical values) without common elements ( ). Frequent Pattern : An itemset or pattern that forms an association rule is said to be frequent if it satisfies a prespecified minimum support threshold(min_sup). Restrictive Patterns : Let P be a set of significant patterns that can be mined from transactional source database D, and RH be a set of rules to be hidden according to some privacy policies. A set of all patterns rpi denoted by RP is said to be restrictive, if RP ⊂ P and if and only if RP would derive the set RH. RP is the set of nonrestrictive patterns such that RP RP = P [4]. Sensitive Transactions : A set of transactions is said to be sensitive, denoted by ST, if every t ST contain atleast one restrictive pattern rpi . ie ST ={ T | rpi RP, rpi ⊆ t }. Cover : The Cover[5] of an item Ak can be defined as, CAk = { rpi | Ak rpi RP, 1 i |RP|} i.e., set of all restrictive patterns which contain Ak. The item that is included in a maximum number of rpi’s is the one with maximal cover or maxCover; i.e., maxCover = max( |CA1|, |CA2| , … |CAn| ) such that Ak rpi RP. Principle of Transaction-based Maxcover Algorithm(TMA) [6]: Initially, identify the transaction-list of each rpi RP. Starting with rpi having larger supCount, for every transaction t in t-list(rpi), find the cover(Ak) within t such that Ak rpi t. Delete item Ak with maxCover in t, and decrease the supCount of all rpi’s which are included in t. Also mark this t as victim transaction in the t-list of the corresponding rpi’s. Repeat this process until the supCount of all rpi’s are reduced to 0. IV. ALGORITHM The sanitization task is distributed among the Server (a server is an entity that has some resource that can be shared) and the Clients (a client is simply any other entity which wants to gain access to a particular server) and the task is implemented as two modules namely Server Module and Client Module. Procedure(Server module): Input : Source transactional database(D) Output : Sanitized database(D’) Start; Get Source Database(D); Get N; //N- no. of data segments; Horizontally partition the D into N segments; Initialize Lookup Tables; Allocate the segment to client; // one each Merge the sanitized segments received from N clients; Display Sanitized Database(D’); End Procedure(Client module): Start; Get data segment from Server; Access the Lookup Tables; Run TMA algorithm to sanitize the data segment; Return Sanitized segment to Server; Stop. ||Issn 2250-3005 || ||Febrauary||2014|| Page 39
  • 3. Introducing Parallelism in Privacy Preserving … V. IMPLEMENTATION The proposed algorithm was tested on real databases RETAIL & T10I4D100K[7] with samples of transactions between 1000 and 10000. The restrictive patterns were chosen in a random manner with their support ranging between 0.6 and 5, confidence between 32.5 and 85.7 and length between 2 and 6. The test run was made on AMD Turion II N550 Dual core processor with 2.6 GHz speed and 2GB RAM operating on 32 bit OS; The implementation of the proposed algorithm was done with windows 7 - Netbeans 6.9.1 - SQL 2005. The coding part is done with JDK 1.7; because Java’s clean and type-safe object-oriented programming model together with its support for parallel and distributed computing make it an attractive environment for writing reliable and parallel programs. The characteristics of Source and Sanitized databases are given in table-I & II respectively; the improved performance in terms of execution time is shown in Table-III & IV and the graphs. Table-I. Characteristics of source databases Database Name No. of Transactions No. of Distinct Items Min. Length Max. Length Min. no. of Sensitive Transactions Max. no. of Sensitive Transactions Database Size (MB) Retail 1K – 10K 3176 – 8126 1 58 22 2706 90.2KB–928 KB T10I4D100K 1K – 10K 795 - 862 1 26 7 78 94.8KB–786 KB Table-II. Characteristics of sanitized databases Database Name No. of Transactions No. of Distinct Items Min. Length Max. Length Database Size(MB) Retail 1K – 10K 3176 – 8126 1 58 90.0KB – 916KB T10I4D100K 1K – 10K 795 - 862 1 26 94.0KB – 778KB 5.1. Results ||Issn 2250-3005 || ||Febrauary||2014|| Page 40
  • 4. Introducing Parallelism in Privacy Preserving … Fig.1. Execution time(Retail Database) VI. Fig.2. Execution time(T10I4D100K) CONCLUSION The main objective of this study is to improve the performance of the Privacy Preserving Data Mining Algorithm, TMA which was proposed and proved its performance in the earlier work[6]. In this article we implemented a parallel processing on TMA and we have observed very good improvement in the processing speed. Hence this empirical study showed that improved performance can be achieved but performance would degrade as the number of processors increased. REFERENCES [1] [2] [3] [4] [5] [6] [7] A.Evfimievski, R.Srikant, R.Agarwal, Gehrke. Privacy Preserving mining of association rules. Proceedings of 8th ACM SIGKDD international conference on Knowledge discovery and data mining, Alberta, Canada. p.217-28. 2002. M.Atallah, E.Bertino, A.Elamagarmid, M.Ibrahim and V.Verykios. Disclosure Limitation of Sensitive Rules. In Proc. of IEEE knowledge and Data Engg.workshop. p.45-52. Chicogo, Illinois. Nov.1999. E.Dessani, V.S.Verykios, A.K.Elamagarmid and E.Bertino. Hiding association rules by using confidence and support. In 4th information hiding workshop. p.369-383. Pitsburg, PA. Apr 2001. Y.Saygin, V.S.Verikios and C.Clifton. Using unknowns to Prevent Discovery of Association Rules. SIGMOD Record. 30(4):45-54. Dec,2001. P.Cynthia Selvi, A.R.Mohamed Shanavas. An Improved Item-based Maxcover Algorithm to Protect Sensitive Patterns in Large Databases. IOSR-Journal on Computer Engineering. Volume 14, Issue 4 (Sep-Oct, 2013). PP 01-05. DOI. 10.9790/0661-1440105. P.Cynthia Selvi, A.R.Mohamed Shanavas. Towards Information Privacy using Transaction-based Maxcover Algorithm. Presented on International Conference on Data Mining and SoftComputing Techniques. SASTRA University, India. 25th & 26th Oct’2013. The Dataset used in this work for experimental analysis was generated using the generator from IBM Almaden Quest research group and is publicly available from http://fimi.ua.ac.be/data/ ||Issn 2250-3005 || ||Febrauary||2014|| Page 41