
A MODEL FOR PRESERVING PRIVACY OF SENSITIVE DATA

International Journal of Technical Research and Applications e-ISSN: 2320-8163, 
www.ijtra.com Volume 1, Issue 3 (july-August 2013), PP. 07-11 
Ms Shalini Lamba #1, Dr S. Qamar Abbas #2
#1 Research Scholar, Shri Venkateshwara University, Meerut, U.P., India
#2 Visiting Guide, Shri Venkateshwara University, Meerut, U.P., India
Abstract: Advances in information technology have led to tremendous growth in data collection, and extracting unknown patterns from this huge pool of data has become the primary objective of data mining algorithms. Although effective, mining algorithms can also reveal sensitive information. The extracted knowledge may be highly confidential and needs refinement before it is given to data mining researchers and the public, so that privacy concerns are addressed. Researchers have therefore developed data mining techniques that can be applied to databases without violating the privacy of individuals, and many techniques for privacy preserving data mining have appeared over the last decade. In this paper we propose a new approach to preserving sensitive information using fuzzy logic. Clustering is done on the original data set, and noise is then added to the numeric data using a fuzzy membership function, which results in distorted data. The set of clusters generated from the fuzzified data is equivalent to the original set of clusters, so privacy is achieved without changing the clustering result. Our experiments also indicate that the processing time of the data is reduced compared with the other methods used for this purpose.
Index Terms: K-Means, S-Shaped Fuzzy Membership Function, Privacy, Clustering
I. INTRODUCTION
Data mining is the process of analyzing large quantities of data to gather useful information. It extracts hidden information from large heterogeneous databases along many different dimensions and summarizes it into categories and relations of data [1]. To study a system in detail, we must be able to reduce its complexity and increase our understanding of it. For any application in which the available information is imprecise, fuzzy reasoning provides a better solution [13].
The primary goal of privacy preservation is to hide sensitive data before it is published. For example, a hospital may release patients' records to enable researchers to study the characteristics of various diseases. The raw data contains sensitive information about individuals, which is not published in order to protect their privacy. However, personal identities can still be retrieved by combining other published attributes with external data. Table 1 shows sample data published by a hospital after hiding sensitive attributes (e.g. the patient's name).

Table 1 - Patient Medical Data
ID | Age | Sex | Pin code | Disease
1  | 22  | M   | 789001   | Dengue
2  | 33  | F   | 789002   | Dengue
3  | 44  | F   | 789003   | Swine flu
4  | 55  | M   | 789004   | Malaria

Table 2 - Patients Registration List
ID | Name  | Age | Sex
1  | Anup  | 22  | M
2  | Rima  | 33  | F
3  | Asha  | 44  | F
4  | Sagar | 55  | M

Table 2 shows a sample patient registration list. If an adversary has access to this table, he can easily identify all the patients by comparing the two tables on attributes such as pin code, age and sex. One can then clearly see who has dengue, swine flu, malaria and so on, which is the sensitive attribute. Such microdata is a valuable source of information for the allocation of public funds, trend analysis and medical research. Our work concerns publishing this data without revealing sensitive information about the individuals, as shown in Table 3.
Table 3 - A 2-Anonymous Table
(Age, Sex and Pin code are non-sensitive attributes; Disease is the sensitive attribute.)
ID | Age | Sex | Pin code | Disease
1  | 2*  | *   | 7890*    | Dengue
2  | 3*  | F   | 7890*    | Dengue
3  | 4*  | F   | 7890*    | Swine flu
4  | 5*  | *   | 7890*    | Malaria
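The linkage between the published medical data (Table 1) and the registration list (Table 2) can be sketched in a few lines of Python. This is only an illustration: the function name `link` and the dictionary layout are hypothetical, and the join uses age and sex alone because those are the quasi-identifiers that appear in both sample tables.

```python
# Published medical data (Table 1): names removed, other attributes kept.
medical = [
    {"age": 22, "sex": "M", "pin": "789001", "disease": "Dengue"},
    {"age": 33, "sex": "F", "pin": "789002", "disease": "Dengue"},
    {"age": 44, "sex": "F", "pin": "789003", "disease": "Swine flu"},
    {"age": 55, "sex": "M", "pin": "789004", "disease": "Malaria"},
]

# External registration list (Table 2).
registration = [
    {"name": "Anup", "age": 22, "sex": "M"},
    {"name": "Rima", "age": 33, "sex": "F"},
    {"name": "Asha", "age": 44, "sex": "F"},
    {"name": "Sagar", "age": 55, "sex": "M"},
]


def link(published, external, keys):
    """Join two tables on shared attributes (hypothetical helper)."""
    index = {}
    for row in external:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = {}
    for row in published:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the patient
            matches[names[0]] = row["disease"]
    return matches


reidentified = link(medical, registration, keys=("age", "sex"))
print(reidentified)
```

Every patient in the sample is matched uniquely, which is why the quasi-identifiers have to be generalized or distorted before release.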
Fuzzy logic is applied here to preserve individual information while the details are released to the public. This paper focuses on converting the sensitive data into modified data using an S-shaped fuzzy membership function. The k-means clustering algorithm is applied to the modified data, and it is found that the relativity of the data is maintained. Several methods are used to preserve the privacy of data during clustering, including cryptographic algorithms, noise addition and data swapping. All of these methods add complexity to the algorithm and increase the processing time. Our main aim is to reduce this processing time while still providing a good solution to the problem of privacy preservation. For this purpose we use a fuzzy approach.
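The S-shaped fuzzy membership function is given by

$$
f(x; a, b) =
\begin{cases}
0, & x \le a \\
2\left(\dfrac{x-a}{b-a}\right)^{2}, & a \le x \le \dfrac{a+b}{2} \\
1 - 2\left(\dfrac{x-b}{b-a}\right)^{2}, & \dfrac{a+b}{2} \le x \le b \\
1, & x \ge b
\end{cases}
$$

where x is the value of the sensitive attribute, and a and b are the minimum and maximum values in the sensitive attribute list.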
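A minimal Python sketch of an S-shaped membership function follows, assuming the commonly used piecewise-quadratic form (zero at or below a, one at or above b, two parabolic halves joined at the midpoint). The function name `s_membership` is an illustrative choice; the input values are the example data set used in Section IV.

```python
def s_membership(x, a, b):
    """S-shaped membership of x, with a and b the minimum and maximum values."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2
    if x <= mid:
        # Lower half: rising parabola anchored at a.
        return 2 * ((x - a) / (b - a)) ** 2
    # Upper half: mirrored parabola anchored at b.
    return 1 - 2 * ((x - b) / (b - a)) ** 2


values = [2, 4, 10, 12, 3, 20, 30, 11, 25]
a, b = min(values), max(values)
fuzzified = [round(s_membership(x, a, b), 4) for x in values]
print(fuzzified)
```

Because the function never decreases as x grows, larger original values always map to larger (or equal) fuzzified values, which is what keeps the relative ordering of the data intact.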
The rest of the paper is organized as follows. Section 2 describes the methods that can be used for privacy preservation in data mining. Section 3 explains the proposed system, which is based on a fuzzy membership function, and how it can be used for privacy preservation. Section 4 presents the experimental results of the proposed method and a comparison with the k-means algorithm.
II. LITERATURE SURVEY 
In recent years a lot of research has been carried out on preserving data privacy before data is released for research purposes, using techniques such as data auditing, data modification, cryptographic methods and k-anonymity.
A. Modification-Based Techniques 
A number of techniques have been developed for data mining tasks such as classification, association rule discovery and clustering, based on the premise that selective data modification or sanitization is an NP-hard problem and that, for this reason, data modification can be used to address the complexity issues. These techniques include:
• Swapping values between records (e.g. [10])
• Replacing the original database by a sample from the same distribution (e.g. [18] [21] [23])
• Adding noise to the values in the database (e.g. [22] [11])
• Adding noise to the results of a query (e.g. [20])
• Sampling the result of a query (e.g. [25])
B. Cryptography-based techniques 
In cryptographic methods [2], data is encrypted using protocols such as secure multiparty computation (SMC). Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication and data origin authentication, and it shapes the way information is transmitted safely and securely over the Internet. The range of sensitive information is quite large and includes credit card information, social security numbers, private correspondence, military statements and bank account information.
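As an illustration of the kind of protocol meant by secure multiparty computation, the sketch below computes a sum over private inputs using additive secret sharing. It is not taken from this paper or from [2]; the modulus, the function names and the simulation of all parties inside one process are assumptions made only for the example.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus, larger than any possible sum


def make_shares(value, n_parties):
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate a secure sum: no single share reveals any party's input."""
    n = len(private_values)
    # Each party splits its input and hands one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


total = secure_sum([120, 75, 310])
print(total)
```

The random shares and the modular arithmetic performed by every party hint at why such protocols add processing cost compared with a simple data transformation.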
C. Refurbishing-based techniques 
Refurbishing-based (reconstruction-based) techniques are those in which the original distribution of the data is reconstructed from the randomized data.
Algorithm for Refurbishing
Step 1: Create a randomized replica of the data by perturbing the individual data records.
Step 2: Reconstruct distributions, not the values in individual records.
Step 3: Use the reconstructed distributions to rebuild the original data.
The problem of reconstructing the original distribution from a given distribution can be viewed in the general framework of inverse problems [24]. In [12], it was shown that for sufficiently smooth distributions (e.g. slowly varying time signals), the original distribution can be fully recovered from non-overlapping, contiguous partial sums. Such partial sums of true values are not available to us. We cannot make a priori assumptions about the original distribution; we only know the distribution used in randomizing the values of an attribute. There is a rich query optimization literature on estimating attribute distributions from partial information [17]. In the OLAP literature, there is work on approximating queries on sub-cubes from higher-level aggregations (e.g. [19]). However, these works did not have to cope with information that has been intentionally distorted.
Kargupta et al. used a data refurbishing approach to derive private information from a disguised data set. A new data set X* is reconstructed from the disguised data using certain algorithms, and the difference between X* and the actual original data set X indicates how much private information can be disclosed. The further apart X* is from X, the higher the level of privacy preservation achieved. The difference between X* and X can therefore be used as a measure of how much privacy is preserved.
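The idea of measuring privacy by the distance between the reconstructed and the original data can be sketched as follows. The text does not fix a particular distance, so the root-mean-square difference used here is an assumption, as are the function name and the sample values.

```python
import math


def reconstruction_error(original, reconstructed):
    """Root-mean-square difference between the original and reconstructed data."""
    if len(original) != len(reconstructed):
        raise ValueError("data sets must have the same length")
    squared = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return math.sqrt(squared / len(original))


X = [22, 33, 44, 55]          # original values (illustrative)
X_close = [23, 32, 45, 54]    # a reconstruction close to X: little privacy
X_far = [30, 25, 52, 47]      # a reconstruction far from X: more privacy

print(reconstruction_error(X, X_close))
print(reconstruction_error(X, X_far))
```

A larger value means the attacker's reconstruction is further from the truth, so more privacy is preserved under this measure.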
In noise addition methods [3], random noise is added to numerical attributes. The random number is usually drawn from a normal distribution with zero mean and a small standard deviation. Noise is added in a controlled way so as to maintain the means, variances and covariances of the attributes of the data set. However, adding noise to categorical attributes is not as straightforward as adding it to numerical attributes, because categorical values have no natural ordering.
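A small sketch of additive noise for a numeric attribute is given below. It draws zero-mean Gaussian noise and then re-centres the noise so that the sample mean of the attribute is unchanged; this re-centring is one simple, assumed way of "controlling" the noise and does not preserve variances or covariances. The function name, the standard deviation and the seed are illustrative.

```python
import random
import statistics


def add_noise(values, sigma, seed=None):
    """Add zero-mean Gaussian noise while keeping the sample mean unchanged."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in values]
    shift = statistics.fmean(noise)  # remove the accidental offset of the noise
    return [v + (n - shift) for v, n in zip(values, noise)]


ages = [22, 33, 44, 55]
noisy = add_noise(ages, sigma=1.0, seed=7)
print(noisy)
print(statistics.fmean(ages), statistics.fmean(noisy))
```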
Data swapping [4] [5] interchanges attribute values between different records. Similar attribute values are interchanged with higher probability. The distinctive feature of this approach is that all original values are retained within the data set and only their positions are swapped.
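The sketch below illustrates swapping on a single attribute. As a simplification, it always swaps each value with its neighbour in sorted order instead of choosing partners probabilistically, so it is a deterministic stand-in for the methods of [4] [5], not a reproduction of them. The function name is hypothetical.

```python
def rank_swap(values):
    """Swap each value with its neighbour in sorted order (adjacent ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    swapped = list(values)
    # Walk through the ranks in pairs: (1st, 2nd), (3rd, 4th), ...
    for k in range(0, len(order) - 1, 2):
        i, j = order[k], order[k + 1]
        swapped[i], swapped[j] = values[j], values[i]
    return swapped


ages = [22, 55, 33, 44, 61]
swapped = rank_swap(ages)
print(swapped)
```

The multiset of values is unchanged, so attribute-level statistics such as the mean are identical before and after swapping; only the link between a value and its record is disturbed.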
In aggregation [6] [7], individual values are replaced by a group representative. Aggregation refers either to combining a few attribute values into one, or to grouping a few records together and replacing them with a group representative.
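Grouping records and replacing them with a representative can be sketched as a simple one-dimensional microaggregation. The choice of the group mean as representative, the grouping of neighbours in sorted order and the group size k are assumptions for illustration only.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[s:s + k] for s in range(0, len(order), k)]
    # Fold a too-small final group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        last = groups.pop()
        groups[-1].extend(last)
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


ages = [22, 33, 44, 55, 60]
aggregated = microaggregate(ages, k=2)
print(aggregated)
```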
In [15] the clustering operation is performed after applying 2-dimensional transformations to the data. A different approach to privacy preservation in data mining is given in [16]. It introduces the concept of fuzzy sets, which is an extension of classical set theory. Fuzzy sets allow a gradual assessment of the membership of elements in the given data set, and this is done using a fuzzy membership function. Each linguistic term can be represented as a fuzzy set with its own membership function.
Fuzzy c-means (FCM) can be used for clustering (Fig. 1); in FCM, any element of the set may have membership in more than one category [14].
Fig.1 For Refurbishing 
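To make the idea of membership in more than one category concrete, the sketch below evaluates the standard fuzzy c-means membership formula for one point and a fixed set of cluster centres. The centres 7 and 25 are simply the means of the two groups in the example data of Section IV, the fuzzifier m = 2 is a conventional choice, and the function name is hypothetical; none of these are prescribed by the text.

```python
def fcm_memberships(x, centers, m=2.0):
    """Membership of point x in each cluster for fixed centres (fuzzifier m > 1)."""
    distances = [abs(x - c) for c in centers]
    if 0.0 in distances:
        # The point coincides with a centre: it belongs fully to that cluster.
        hits = [d == 0.0 for d in distances]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]


u = fcm_memberships(12, centers=[7, 25])
print(u)
```

The memberships sum to one, and the point belongs mostly, but not exclusively, to the nearer cluster.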
III. PROPOSED SYSTEM 
A. Objective of the Problem 
The sensitive attribute is selected from the numerical data and modified by a data modification technique. The modified data can then be released to data mining researchers or to any agency or firm. When they apply data mining techniques such as clustering or classification for data analysis, the modification should not affect the result. In this work, we apply the k-means clustering algorithm to the modified data and verify the result. The steps involved in this work are:
1. Select the sensitive attribute.
2. Transform the data: apply a perturbative masking technique to modify the sensitive attribute.
3. Apply the k-means algorithm to the original and the fuzzified data.
4. Compare the processing time for the original and the fuzzified data.
B. Proposed Algorithm 
Algorithm for Data Transformation 
1. Consider a database D consisting of n tuples, D = {t1, t2, …, tn}. Each tuple ti ∈ D consists of a set of attributes {A1, A2, …, Ap}.
2. Identify the sensitive or confidential numeric attribute AR.
3. Modify the values of the identified sensitive attribute using the S-shaped fuzzy membership function and send the fuzzified data back to the user.
4. Group the received data into clusters using the k-means algorithm. The objective of a clustering algorithm is to group similar data together according to the characteristics they possess [1]. K-means clusters similar data using the mean value and the squared error criterion [1]. Apply k-means clustering to both the original and the modified data set to obtain the clusters.
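Steps 1 to 3 of the transformation can be sketched on a small record set as follows. The records, the attribute name "salary" and the function names are hypothetical, and the membership function is written in the commonly used piecewise-quadratic form, which is an assumption about the exact formula.

```python
def s_membership(x, a, b):
    """S-shaped membership of x between minimum a and maximum b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2


def transform(records, sensitive):
    """Return copies of the records with the sensitive attribute fuzzified."""
    column = [r[sensitive] for r in records]
    a, b = min(column), max(column)
    return [
        {**r, sensitive: round(s_membership(r[sensitive], a, b), 4)}
        for r in records
    ]


# Step 1: a database of tuples with attributes (illustrative values).
records = [
    {"id": 1, "sex": "M", "salary": 20},
    {"id": 2, "sex": "F", "salary": 30},
    {"id": 3, "sex": "F", "salary": 40},
    {"id": 4, "sex": "M", "salary": 60},
]

# Steps 2 and 3: pick the sensitive numeric attribute and fuzzify it.
released = transform(records, sensitive="salary")
print(released)
```

Only the sensitive column changes; the other attributes are copied unchanged, and the original records are left untouched so that both versions can be clustered and compared in step 4.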
C. Proposed system architecture
Fig.2 Proposed system architecture 
IV. RESULT ANALYSIS 
For our experiments we used data sets such as census details and air distances. For a given set of original data, the equivalent fuzzified data is given in Table 4. Table 5 shows the result of clustering the original data and the result of clustering the fuzzy data. In both cases the resulting clusters contain the same set of elements. This shows that the fuzzy conversion does not hamper the relativity of the clustered data.
Table 4 - Example Data Set
Original Data | Fuzzified Data
2             | 0
4             | 0.0102
10            | 0.1632
12            | 0.2551
3             | 0.0026
20            | 0.7449
30            | 1
11            | 0.2066
25            | 0.9362
Consider the original data set {2, 4, 10, 12, 3, 20, 30, 11, 25}. Its mean is 13, so we form two clusters, one with the elements less than 13 and the other with the elements greater than 13, as shown in Table 5.
We now calculate the fuzzified data using the S-shaped fuzzy membership function, where x ranges over the original data set {2, 4, 10, 12, 3, 20, 30, 11, 25}, a = 2 and b = 30.
Case 1: when x = 2, the fuzzified value is 0, since x ≤ a.
Case 2: when x = 4, the fuzzified value is 2((4-2)/(30-2))^2 = 2(2/28)^2 = 1/98 = 0.0102.
Case 3: when x = 10, the fuzzified value is 2((10-2)/(30-2))^2 = 2(8/28)^2 = 8/49 = 0.1632, and so on for the other values. The fuzzified data form the same clusters as described above for the original data.
The k-means algorithm repeats the following three steps until convergence, that is, until it is stable (no object moves to another group):
Step 1: Determine the centroid coordinates.
Step 2: Determine the distance of each item to the centroids.
Step 3: Group the items based on minimum distance.
Fig.3 Flowchart 
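The three steps can be sketched for one-dimensional data as below, and run on both the original and the fuzzified values of the example. The starting centroids (the minimum and maximum of each data set) and the function name are assumptions; the fuzzified value used for 3 is 2(1/28)^2 ≈ 0.0026, computed from the membership function.

```python
def kmeans_1d(values, initial_centroids, max_passes=100):
    """Plain k-means on scalars; returns the cluster index of each value."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_passes):
        # Steps 2 and 3: assign each item to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        if new_assignment == assignment:  # no object moved: stable
            break
        assignment = new_assignment
        # Step 1 (next pass): recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, g in zip(values, assignment) if g == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return assignment


original = [2, 4, 10, 12, 3, 20, 30, 11, 25]
fuzzified = [0, 0.0102, 0.1632, 0.2551, 0.0026, 0.7449, 1, 0.2066, 0.9362]

clusters_original = kmeans_1d(original, [min(original), max(original)])
clusters_fuzzified = kmeans_1d(fuzzified, [min(fuzzified), max(fuzzified)])
print(clusters_original)
print(clusters_fuzzified)
```

With these starting points both runs place the same records in the same two groups, which is the property the comparison of original and fuzzified clusters relies on.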
Table 5 - Clustering Output
Data      | Cluster 1                                   | Cluster 2
Original  | {2, 4, 10, 12, 3, 11}                       | {20, 30, 25}
Fuzzified | {0, 0.0102, 0.1632, 0.2551, 0.0026, 0.2066} | {0.7449, 1, 0.9362}
Finally, we analyze data utility using cluster analysis. The original and perturbed data are clustered separately using the k-means algorithm, and the two outputs are compared in terms of their statistical efficiency.
In our experiment, the fuzzy approach reduced the processing time of the data compared with the other methods used for this purpose. In the particular example considered, we used an S-shaped fuzzy membership function to transform the original data into fuzzy data.
Table 6 - Comparison Table
Method Used | No. of Passes | No. of Clusters
K-Means     | 5             | 2
Fuzzy       | 4             | 2
V. CONCLUSION 
Protecting sensitive data while still extracting knowledge is a complicated problem. This paper presents an approach that preserves an individual's details by transforming the original data into fuzzy data using an S-shaped fuzzy membership function. The main advantage of this method is that it maintains privacy and at the same time preserves the relativity between the data values. The results of our experiments (Table 6) indicate that running the k-means algorithm on fuzzy data increases the efficiency of the process by decreasing the number of passes required to perform the clustering. We used numerical data in our experiments; the method could similarly be extended to categorical data. The nature of the fuzzy membership function also affects the processing time of the algorithm, so the process may be improved by applying a different fuzzy membership function. In future, this work can be extended to data classification, and we intend to develop new masking techniques for protecting categorical attributes.
REFERENCES 
[1] Sairam et al., "Performance Analysis of Clustering Algorithms in Detecting Outliers", International Journal of Computer Science and Information Technologies, Vol. 2 (1), Jan-Feb 2011, 486-488.
[2] Pinkas, "Cryptographic Techniques for Privacy-Preserving Data Mining", ACM SIGKDD Explorations, 4(2), 2002.
[3] Agrawal D., Aggarwal C.C., "On the Design and Quantification of Privacy Preserving Data Mining Algorithms", ACM PODS Conference, 2002.
[4] Fienberg S.E. and McIntyre J., "Data Swapping: Variations on a Theme by Dalenius and Reiss", Journal of Official Statistics, 21:309-323, 2005.
[5] Muralidhar K. and Sarathy R., "Data Shuffling: A New Masking Approach for Numerical Data", Management Science, forthcoming, 2006.
[6] Y. Li, S. Zhu, L. Wang, and S. Jajodia, "A Privacy-Enhanced Microaggregation Method", In Proc. of 2nd International Symposium on Foundations of Information and Knowledge Systems, pp. 148-159, 2002.
[7] V.S. Iyengar, "Transforming Data to Satisfy Privacy Constraints", In Proc. of SIGKDD'02, Edmonton, Alberta, Canada, 2002.
[8] Shuting Xu, Shuhua Lai, "Fast Fourier Transform Based Data Perturbation Method for Privacy Protection", In Proc. of IEEE Conference on Intelligence and Security Informatics, New Brunswick, New Jersey, May 2007.
[9] Shibanth Mukharjee, Zhiyuan Chen, Arya Gangopadhyay, "A Privacy Preserving Technique for Euclidean Distance-Based Mining Algorithms Using Fourier-Related Transforms", The VLDB Journal, 2006.
[10] D. Barbara, W. DuMouchel, C. Faloutsos, P.J. Haas, J.M. Hellerstein, Y. Ioannidis, H.V. Jagadish, T. Johnson, R. Ng, V. Poosala, and K. Sevcik. The New Jersey Data Reduction Report. Data Engrg. Bull., 20:3-45, Dec. 1997.
[11] D. Dobkin, A.K. Jones, and R.J. Lipton. Secure databases: Protection against user influence. ACM TODS, 4(1):97-106, March 1979.
[12] D. Meng, K. Sivakumar, and H. Kargupta. Privacy sensitive Bayesian network parameter learning. In The Fourth IEEE International Conference on Data Mining (ICDM), Brighton, UK, November 2004.
[13] Zadeh L., "Fuzzy Sets", Inf. Control, Vol. 8, pp. 338-353, 1965.
[14] Timothy J. Ross, "Fuzzy Logic with Engineering Applications", McGraw Hill International Editions, 1997.
[15] R.R. Rajalaxmi, A.M. Natarajan, "An Effective Data Transformation Approach for Privacy Preserving Clustering", Journal of Computer Science 4(4): 320-326, 2008.
[16] V. Vallikumari, S. Srinivasa Rao, KVSVN Raju, KV Ramana, BVS Avadhani, "Fuzzy Based Approach for Privacy Preserving Publication of Data", IJCSNS, Vol. 8 No. 1, January 2008.
[17] B. Pinkas. Cryptographic techniques for privacy-preserving data mining. SIGKDD Explorations, 4(2), December 2002.
[18] E. Lefons, A. Silvestri, and F. Tangorra. An analytic approach to statistical databases. In 9th Int. Conf. Very Large Data Bases, pages 260-274. Morgan Kaufmann, Oct-Nov 1983.
[19] D.E. Denning. Cryptography and Data Security. Addison-Wesley, 1982.
[20] H.W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer, 1996.
[21] Dakshi Agrawal and Charu C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the 20th ACM Symposium on Principles of Database Systems (2001), 247–255.
[22] J. Vaidya and C. Clifton. Privacy-preserving k-means clustering over vertically partitioned data. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
[23] D.E. Denning. Secure statistical databases with random sample queries. ACM TODS, 5(3):291-315, Sept. 1980.
[24] R. Agrawal and R. Srikant. Privacy Preserving Data Mining. In Proceedings of the ACM SIGMOD, pages 439–450, 2000.
[25] R. Conway and D. Strip. Selective partial access to a database. In Proc. ACM Annual Conf., pages 85-89, 1976.

An Overview of Data Mining Concepts Applied in Mathematical Techniques
MangaiK45 views
An Overview Of Data Mining Techniques And Applications.Pdf von Michele Thomas
An Overview Of Data Mining Techniques And Applications.PdfAn Overview Of Data Mining Techniques And Applications.Pdf
An Overview Of Data Mining Techniques And Applications.Pdf
Michele Thomas31 views

Más de International Journal of Technical Research & Application

EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED... von
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...International Journal of Technical Research & Application
294 views7 Folien
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T... von
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...International Journal of Technical Research & Application
312 views6 Folien
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL... von
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...International Journal of Technical Research & Application
222 views4 Folien

Más de International Journal of Technical Research & Application(20)

Último

Structure and Functions of Cell.pdf von
Structure and Functions of Cell.pdfStructure and Functions of Cell.pdf
Structure and Functions of Cell.pdfNithya Murugan
719 views10 Folien
Psychology KS5 von
Psychology KS5Psychology KS5
Psychology KS5WestHatch
119 views5 Folien
unidad 3.pdf von
unidad 3.pdfunidad 3.pdf
unidad 3.pdfMarcosRodriguezUcedo
117 views38 Folien
The Accursed House by Émile Gaboriau von
The Accursed House  by Émile GaboriauThe Accursed House  by Émile Gaboriau
The Accursed House by Émile GaboriauDivyaSheta
223 views15 Folien
S1_SD_Resources Walkthrough.pptx von
S1_SD_Resources Walkthrough.pptxS1_SD_Resources Walkthrough.pptx
S1_SD_Resources Walkthrough.pptxLAZAROAREVALO1
64 views57 Folien
UNIDAD 3 6º C.MEDIO.pptx von
UNIDAD 3 6º C.MEDIO.pptxUNIDAD 3 6º C.MEDIO.pptx
UNIDAD 3 6º C.MEDIO.pptxMarcosRodriguezUcedo
134 views32 Folien

Último(20)

Structure and Functions of Cell.pdf von Nithya Murugan
Structure and Functions of Cell.pdfStructure and Functions of Cell.pdf
Structure and Functions of Cell.pdf
Nithya Murugan719 views
Psychology KS5 von WestHatch
Psychology KS5Psychology KS5
Psychology KS5
WestHatch119 views
The Accursed House by Émile Gaboriau von DivyaSheta
The Accursed House  by Émile GaboriauThe Accursed House  by Émile Gaboriau
The Accursed House by Émile Gaboriau
DivyaSheta223 views
Use of Probiotics in Aquaculture.pptx von AKSHAY MANDAL
Use of Probiotics in Aquaculture.pptxUse of Probiotics in Aquaculture.pptx
Use of Probiotics in Aquaculture.pptx
AKSHAY MANDAL119 views
The basics - information, data, technology and systems.pdf von JonathanCovena1
The basics - information, data, technology and systems.pdfThe basics - information, data, technology and systems.pdf
The basics - information, data, technology and systems.pdf
JonathanCovena1146 views
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively von PECB
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
PECB 623 views
When Sex Gets Complicated: Porn, Affairs, & Cybersex von Marlene Maheu
When Sex Gets Complicated: Porn, Affairs, & CybersexWhen Sex Gets Complicated: Porn, Affairs, & Cybersex
When Sex Gets Complicated: Porn, Affairs, & Cybersex
Marlene Maheu85 views
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx von Ms. Pooja Bhandare
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptxPharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Ms. Pooja Bhandare113 views
11.28.23 Social Capital and Social Exclusion.pptx von mary850239
11.28.23 Social Capital and Social Exclusion.pptx11.28.23 Social Capital and Social Exclusion.pptx
11.28.23 Social Capital and Social Exclusion.pptx
mary850239312 views
How to empty an One2many field in Odoo von Celine George
How to empty an One2many field in OdooHow to empty an One2many field in Odoo
How to empty an One2many field in Odoo
Celine George87 views
Relationship of psychology with other subjects. von palswagata2003
Relationship of psychology with other subjects.Relationship of psychology with other subjects.
Relationship of psychology with other subjects.
palswagata200352 views
Classification of crude drugs.pptx von GayatriPatra14
Classification of crude drugs.pptxClassification of crude drugs.pptx
Classification of crude drugs.pptx
GayatriPatra14101 views

130509

International Journal of Technical Research and Applications, e-ISSN: 2320-8163, www.ijtra.com, Volume 1, Issue 3 (July-August 2013), PP. 07-11

A MODEL FOR PRESERVING PRIVACY OF SENSITIVE DATA

Ms Shalini Lamba #1, Dr S. Qamar Abbas #2
#1 Research Scholar, Shri Venkateshwara University, Meerut, U.P., India
#2 Visiting Guide, Shri Venkateshwara University, Meerut, U.P., India

Abstract: Advances in information technology have resulted in tremendous growth in data collection, and extracting unknown patterns from this huge pool of data has become the primary objective of any data mining algorithm. Effective as they are, mining algorithms can also reveal sensitive information. The extracted knowledge is highly confidential and needs refinement before it is given to data mining researchers and the public, in order to address privacy concerns. Researchers have therefore developed data mining techniques that can be applied to databases without violating the privacy of individuals, and many techniques for privacy preserving data mining have appeared over the last decade. In this paper we propose a new approach to preserving sensitive information using fuzzy logic. Clustering is done on the original data set, and noise is then added to the numeric data using a fuzzy membership function, which results in distorted data. The set of clusters generated from the fuzzified data is equivalent to the original set of clusters, and privacy is also achieved. Our experiments also indicate that the processing time of the data is considerably reduced compared with other methods used for this purpose.

Index Terms: K-Means, S-Shaped Fuzzy Membership Function, Privacy, Clustering

I. INTRODUCTION
Data mining is the process used to analyze large quantities of data and gather useful information from them. It extracts hidden information from large heterogeneous databases along many different dimensions and finally summarizes it into categories and relations of data [1]. To learn a system in detail, we should be able to decrease the system complexity and increase our understanding of the system. For any application in which the available information is imprecise, fuzzy reasoning provides a better solution [13].

The primary goal of privacy preservation is to hide sensitive data before it is published. For example, a hospital may release patients' records to enable researchers to study the characteristics of various diseases. The raw data contains sensitive information about individuals, which is not published in order to protect individual privacy. However, using other published attributes together with some external data, personal identities can still be retrieved. Table 1 shows sample data published by a hospital after hiding identifying attributes (e.g. the patient's name).

Table 1 - Patient Medical Data

ID  Age  Sex  Pin code  Disease
1   22   M    789001    Dengue
2   33   F    789002    Dengue
3   44   F    789003    Swine Flu
4   55   M    789004    Malaria

Table 2 - Patients Registration List

ID  Name   Age  Sex
1   Anup   22   M
2   Rima   33   F
3   Asha   44   F
4   Sagar  55   M

Table 2 shows a sample patients' registration list. An adversary who has access to this table can easily identify all the patients by comparing the two tables on attributes such as pin code, age and sex. One can then clearly identify who has Dengue, Swine Flu, Malaria and so on, which is the sensitive attribute. Such microdata is a valuable source of information for research, the allocation of public funds, trend analysis and medical research. Our aim is to publish this data without revealing sensitive information about the individuals, as shown in Table 3.

Table 3 - A 2-Anonymous Table
ID  Age  Sex  Pin code  Disease
1   2*   *    7890*     Dengue
2   3*   F    7890*     Dengue
3   4*   F    7890*     Swine Flu
4   5*   *    7890*     Malaria

Fuzzy logic is applied here to preserve individual information while the details are revealed in public. This paper mainly focuses on converting the sensitive data into modified data by using an S-shaped fuzzy membership function. The K-means clustering algorithm is then applied to the modified data, and it is found that the relativity of the data is maintained.

A number of methods are used for preserving the privacy of data while clustering, among them cryptographic algorithms, noise addition and data swapping. All of these methods add complexity to the algorithm and increase the processing time. Our main aim is to reduce this processing time and at the same time provide an optimum solution to the problem of privacy preservation. For this purpose we use a fuzzy approach. The S-shaped fuzzy membership function is given by

\[
S(x; a, b) =
\begin{cases}
0, & x \le a \\
2\left(\dfrac{x-a}{b-a}\right)^{2}, & a < x \le \dfrac{a+b}{2} \\
1 - 2\left(\dfrac{x-b}{b-a}\right)^{2}, & \dfrac{a+b}{2} < x < b \\
1, & x \ge b
\end{cases}
\]

where x is the value of the sensitive attribute, and a and b are the minimum and maximum values in the sensitive attribute list.

The rest of the paper is organized as follows: Section 2 describes the various methods that can be used for privacy preservation in data mining. Section 3 explains the proposed system, which is built around the fuzzy membership function approach, and how it can be used for privacy preservation. Section 4 presents the experimental results of the proposed method and a comparison with the K-means algorithm.

II. LITERATURE SURVEY

In recent years a lot of research has been carried out on preserving data privacy before data is released for research purposes. This work adopts various techniques such as data auditing, data modification, cryptographic methods and k-anonymity.

A. Modification-Based Techniques

A number of techniques have been developed for data mining tasks such as classification, association rule discovery and clustering, based on the premise that selective data modification or sanitization is an NP-hard problem; for this reason, data alteration is used to address the complexity issues. Examples are:
• Swapping values between records (e.g. [10])
• Replacing the original database by a sample from the same distribution (e.g. [18] [21] [23])
• Adding noise to the values in the database (e.g. [22] [11])
• Adding noise to the results of a query (e.g. [20])
• Sampling the result of a query (e.g. [25])

B. Cryptography-Based Techniques

In cryptographic methods [2], data is encrypted using protocols such as secure multiparty computation (SMC). Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication and data origin authentication, and it shapes the way information is transmitted safely and securely over the Internet. The range of sensitive information is quite large and includes credit card information, social security numbers, private correspondence, military statements and bank account information.

C. Refurbishing-Based Techniques

Refurbishing-based (i.e. reconstruction-based) techniques are techniques in which the original distribution of the data is reconstructed from the randomized data.

Algorithm for Refurbishing
Step 1: Create a randomized replica of the data by perturbing the individual data records.
Step 2: Reconstruct distributions, not values in individual records.
Step 3: Using the reconstructed distributions, estimate the original data.

The problem of reconstructing an original distribution from a given distribution can be viewed in the general framework of inverse problems [24]. In [12], it was shown that for sufficiently smooth distributions (e.g. slowly varying time signals), it is possible to fully recover the original distribution from non-overlapping, contiguous partial sums. Such partial sums of true values are not available to us. We cannot make a priori assumptions about the original distribution; we know only the distribution used to randomize the values of an attribute. There is a rich query optimization literature on estimating attribute distributions from partial information [17]. In the OLAP literature, there is work on approximating queries on sub-cubes from higher-level aggregations (e.g. [19]). However, these works did not have to cope with information that has been intentionally distorted.

Kargupta et al. used a data refurbishing approach to derive private information from a disguised data set. A new data set X* is reconstructed from the disguised data using certain algorithms, and the difference between X* and the actual original data set X indicates how much private information can be disclosed. The further X* is from X, the higher the level of privacy preservation achieved. The difference between X* and X can therefore be used as a measure of how much privacy is preserved.

In noise addition methods [3], random noise (a random number) is added to numerical attributes. This random number is usually drawn from a normal distribution with zero mean and a small standard deviation. Noise is added in a controlled way so as to maintain the means, variances and covariances of the attributes of a data set. However, adding noise to categorical attributes is not as straightforward as adding it to numerical attributes, because categorical values have no natural ordering.

Data swapping [4] [5] interchanges attribute values between different records. Similar attribute values are interchanged with higher probability. The distinctive feature of this approach is that all original values are retained within the data set and only their positions are swapped.

In aggregation [6] [7], individual values are replaced by a group representative. Aggregation refers both to combining a few attribute values into one and to grouping a few records together and replacing them with a group representative.

In [15] the clustering operation is performed after applying 2-dimensional transformations to the data. A different approach to privacy preservation in data mining is given in [16], which introduces the concept of fuzzy sets, an extension of classical set theory. Fuzzy sets allow a gradual assessment of the given data set by means of a fuzzy membership function. Each linguistic term can be represented as a fuzzy set with its own membership function. Fuzzy c-Means (FCM) can be used for clustering (Fig. 1), but any element in the set may have membership in more than one category [14].

Fig. 1 For Refurbishing

III. PROPOSED SYSTEM

A. Objective of the Problem

The sensitive attribute can be selected from the numerical data and modified by any data modification technique. After modification, the modified data can be released to data mining researchers or to any agency or firm.
If they apply data mining techniques such as clustering or classification for data analysis, the modification should not affect the result. In this work, we applied the k-means clustering algorithm to the modified data and verified the result. The steps involved in this work are:
1. Select the sensitive attribute.
2. Data transformation: apply a perturbative masking technique to modify the sensitive attribute.
3. Apply the k-means algorithm to the original and the fuzzified data.
4. Compare the processing time for the original and the fuzzified data.

B. Proposed Algorithm

Algorithm for Data Transformation
1. Consider a database D consisting of n tuples, D = {t1, t2, …, tn}. Each tuple ti in D consists of a set of attributes T = {A1, A2, …, Ap}, where Ai ∈ T and ti ∈ D.
2. Identify the sensitive or confidential numeric attribute AR.
3. Modify the values of the identified sensitive attribute using the S-shaped fuzzy membership function, and send the fuzzified data back to the user.
4. Group the received data into clusters using the K-means algorithm. The objective of clustering algorithms is to group similar data together according to the characteristics they possess [1]. K-means clustering groups similar data with the help of the mean value and the squared error criterion [1]. Apply k-means clustering to both the original and the modified data set to obtain the clusters.

C. Proposed system architecture
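The data transformation stage of the algorithm above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the standard S-shaped membership function with a and b taken as the minimum and maximum of the attribute, the record layout and the names `s_shaped` and `fuzzify_attribute` are hypothetical, and the age column of Table 1 is used purely as an example of a numeric attribute to mask.

```python
from typing import Dict, List


def s_shaped(x: float, a: float, b: float) -> float:
    """S-shaped membership value of x on [a, b] (standard form, assumed here)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2.0
    if x <= mid:
        return 2.0 * ((x - a) / (b - a)) ** 2
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2


def fuzzify_attribute(records: List[Dict], attribute: str) -> List[Dict]:
    """Return copies of the records with one numeric attribute fuzzified."""
    values = [r[attribute] for r in records]
    a, b = min(values), max(values)
    if a == b:
        raise ValueError("attribute must take at least two distinct values")
    # Copy each record so the original database D is left untouched.
    return [{**r, attribute: round(s_shaped(r[attribute], a, b), 4)}
            for r in records]


# Rows of Table 1; "age" is the attribute masked in this illustration.
patients = [
    {"id": 1, "age": 22, "sex": "M", "disease": "Dengue"},
    {"id": 2, "age": 33, "sex": "F", "disease": "Dengue"},
    {"id": 3, "age": 44, "sex": "F", "disease": "Swine Flu"},
    {"id": 4, "age": 55, "sex": "M", "disease": "Malaria"},
]

released = fuzzify_attribute(patients, "age")
for row in released:
    print(row)
```

Because the function is monotonic in x, the ordering of the masked values is the same as that of the originals, which is what allows a distance-based method such as k-means to be run on the released data afterwards.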
Fig. 2 Proposed system architecture

IV. RESULT ANALYSIS

For our experiments we used data sets such as census details and air distances. For a given original data set, the equivalent fuzzified data is given in Table 4. Table 5 shows the result of clustering the original data and the result of clustering the fuzzy data. In both cases the resulting clusters contain the same set of elements. This shows that the fuzzy conversion does not hamper the relativity of the clustered data.

Table 4 - Example Data Set

Original Data   Fuzzified Data
2               0
4               0.0102
10              0.1632
12              0.2551
3               0.0026
20              0.7449
30              1
11              0.2066
25              0.9362

For the original data set {2, 4, 10, 12, 3, 20, 30, 11, 25} we calculate the mean, which is 13, and form two clusters, one with the elements less than 13 and the other with the elements greater than 13, as shown in Table 5. We then calculate the fuzzified data using the S-shaped fuzzy membership function, where x ranges over the original data set {2, 4, 10, 12, 3, 20, 30, 11, 25}, a = 2 and b = 30.

Case 1: when x = 2, the fuzzified value is 0, since (x = 2) <= (a = 2).
Case 2: when x = 4, the fuzzified value is 2((4-2)/(30-2))^2 = 2(2/28)^2 = 1/98 = 0.0102.
Case 3: when x = 10, the fuzzified value is 2((10-2)/(30-2))^2 = 2(8/28)^2 = 8/49 = 0.1632.

The remaining values are computed in the same way. For values above the midpoint (a+b)/2 = 16 the upper branch of the function applies; for example, x = 20 gives 1 - 2((20-30)/(30-2))^2 = 0.7449. The fuzzified data form clusters in the same way as described above for the original data.

The K-means algorithm repeats the following three steps until convergence, that is, until no object changes group:
Step 1: Determine the centroid coordinates.
Step 2: Determine the distance of each item to the centroids.
Step 3: Group the items based on minimum distance.
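The worked example can be checked with a short script. The sketch below is illustrative rather than the implementation used in the experiments: it assumes the standard S-shaped membership function with a = 2 and b = 30, and a plain one-dimensional k-means with k = 2 whose initial centroids are spread over the sorted values (an initialisation chosen here, not stated in the paper). The names `s_shaped` and `kmeans_1d` are hypothetical.

```python
from typing import List


def s_shaped(x: float, a: float, b: float) -> float:
    """S-shaped membership value of x on [a, b] (standard form, assumed here)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2.0
    if x <= mid:
        return 2.0 * ((x - a) / (b - a)) ** 2
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2


def kmeans_1d(values: List[float], k: int = 2, max_iter: int = 100) -> List[int]:
    """Cluster 1-D values with k-means (k >= 2); return a label per value."""
    ordered = sorted(values)
    # Assumed initialisation: centroids spread evenly over the sorted values.
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)])
                 for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Steps 2 and 3: assign every item to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                      for v in values]
        if new_labels == labels:  # stable: no object moved group
            break
        labels = new_labels
        # Step 1: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels


original = [2, 4, 10, 12, 3, 20, 30, 11, 25]
a, b = min(original), max(original)
fuzzified = [s_shaped(x, a, b) for x in original]

labels_original = kmeans_1d(original)
labels_fuzzy = kmeans_1d(fuzzified)

print([round(v, 4) for v in fuzzified])
print(labels_original)
print(labels_fuzzy)
```

With this initialisation both runs place the items 2, 4, 10, 12, 3 and 11 in one cluster and 20, 30 and 25 in the other, so the cluster membership is unchanged by the transformation. The number of passes depends on the initial centroids, so this sketch makes no attempt to reproduce pass counts.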
Fig. 3 Flowchart

Table 5 - Clustering Output

Data        Cluster 1                                     Cluster 2
Original    {2, 4, 10, 12, 3, 11}                         {20, 30, 25}
Fuzzified   {0, 0.0102, 0.1632, 0.2551, 0.0026, 0.2066}   {0.7449, 1, 0.9362}

Finally, we analyze the data utility using cluster analysis. The original and the perturbed data are clustered separately using the K-means clustering algorithm, and the two outputs are compared in terms of their statistical efficiency. From our experiment we found that with the fuzzy approach the processing time of the data is considerably reduced compared with the other methods used for this purpose. In the particular example that we took, we used an S-shaped fuzzy membership function to transform the original data into fuzzy data.

Table 6 - Comparison Table

Method Used   No of Passes   No of Clusters
K-Means       5              2
Fuzzy         4              2

V. CONCLUSION

Protecting sensitive data while still extracting knowledge is a very complicated problem. This paper presents an approach that preserves individuals' details by transforming the original data into fuzzy data using an S-shaped fuzzy membership function. The main advantage of this method is that it maintains privacy and at the same time preserves the relativity between the data values. The results of our experiments (Table 6) indicate that performing the k-means algorithm on fuzzy data increases the efficiency of the process by decreasing the number of passes required to perform the clustering (4 passes instead of 5 in our example). We used numerical data for our experiments; the method can similarly be extended to categorical data. In our process, the nature of the fuzzy membership function also affects the processing time of the algorithm, so the process may be improved by applying a different fuzzy membership function. In future this work can also be extended to data classification, and we intend to develop new masking techniques for protecting categorical attributes.

REFERENCES

[1] Sairam et al., "Performance Analysis of Clustering Algorithms in Detecting Outliers", International Journal of Computer Science and Information Technologies, Vol. 2 (1), Jan-Feb 2011, 486-488.
[2] Pinkas, "Cryptographic Techniques for Privacy-Preserving Data Mining", ACM SIGKDD Explorations, 4(2), 2002.
[3] Agrawal D., Aggarwal C.C., "On the Design and Quantification of Privacy Preserving Data Mining Algorithms", ACM PODS Conference, 2002.
[4] Fienberg S.E. and McIntyre J., "Data Swapping: Variations on a Theme by Dalenius and Reiss", Journal of Official Statistics, 21:309-323, 2005.
[5] Muralidhar K. and Sarathy R., "Data Shuffling: a new masking approach for numerical data", Management Science, forthcoming, 2006.
[6] Y. Li, S. Zhu, L. Wang, and S. Jajodia, "A privacy-enhanced micro-aggregation method", In Proc. of 2nd International Symposium on Foundations of Information and Knowledge Systems, pp. 148-159, 2002.
[7] V.S. Iyengar, "Transforming data to satisfy privacy constraints", In Proc. of SIGKDD'02, Edmonton, Alberta, Canada, 2002.
[8] Shuting Xu, Shuhua Lai, "Fast Fourier Transform based data perturbation method for privacy protection", In Proc. of IEEE Conference on Intelligence and Security Informatics, New Brunswick, New Jersey, May 2007.
[9] Shibanth Mukharjee, Zhiyuan Chen, Arya Gangopadhyay, "A privacy preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms", The VLDB Journal, 2006.
[10] D. Barbara, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, and K. Sevcik, "The New Jersey Data Reduction Report", Data Engrg. Bull., 20:3-45, Dec. 1997.
[11] D. Dobkin, A.K. Jones, and R.J. Lipton, "Secure databases: Protection against user influence", ACM TODS, 4(1):97-106, March 1979.
[12] D. Meng, K. Sivakumar, and H. Kargupta, "Privacy sensitive Bayesian network parameter learning", In The Fourth IEEE International Conference on Data Mining (ICDM), Brighton, UK, November 2004.
[13] Zadeh L., "Fuzzy sets", Inf. Control, Vol. 8, pp. 338-353, 1965.
[14] Timothy J. Ross, "Fuzzy Logic with Engineering Applications", McGraw Hill International Editions, 1997.
[15] R.R. Rajalaxmi, A.M. Natarajan, "An Effective Data Transformation Approach for Privacy Preserving Clustering", Journal of Computer Science 4(4): 320-326, 2008.
[16] V. Vallikumari, S. Srinivasa Rao, KVSVN Raju, KV Ramana, BVS Avadhani, "Fuzzy based approach for privacy preserving publication of data", IJCSNS, Vol. 8 No. 1, January 2008.
[17] B. Pinkas, "Cryptographic techniques for privacy-preserving data mining", SIGKDD Explorations, 4(2), December 2002.
[18] E. Lefons, A. Silvestri, and F. Tangorra, "An analytic approach to statistical databases", In 9th Int. Conf. Very Large Data Bases, pages 260-274, Morgan Kaufmann, Oct-Nov 1983.
[19] D.E. Denning, "Cryptography and Data Security", Addison-Wesley, 1982.
[20] H.W. Engl, M. Hanke, and A. Neubauer, "Regularization of Inverse Problems", Kluwer, 1996.
[21] Dakshi Agrawal and Charu C. Aggarwal, "On the design and quantification of Privacy Preserving Data Mining algorithms", In Proceedings of the 20th ACM Symposium on Principles of Database Systems (2001), 247-255.
[22] J. Vaidya and C. Clifton, "Privacy-preserving k-means clustering over vertically partitioned data", In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
[23] D.E. Denning, "Secure statistical databases with random sample queries", ACM TODS, 5(3):291-315, Sept. 1980.
[24] R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", In Proceedings of the ACM SIGMOD, pages 439-450, 2000.
[25] R. Conway and D. Strip, "Selective partial access to a database", In Proc. ACM Annual Conf., pages 85-89, 1976.