Complex Adaptive Systems 2012 – Washington DC USA, November 14-16




Towards A Differential Privacy and Utility Preserving
           Machine Learning Classifier

       Kato Mivule, Claude Turner, and Soo-Yeon Ji

             Computer Science Department
                Bowie State University

Outline




     Introduction
     Related work
     Essential Terms
     Methodology
     Results
     Conclusion




Introduction

                 Entities transact in ‘big data’ containing personally identifiable
                  information (PII).

                 Organizations are bound by federal and state law to ensure data privacy.

                 In the process of achieving privacy, the utility of privatized datasets
                  diminishes.

                 Achieving a balance between privacy and utility is an ongoing problem.

                 Therefore, we investigate a differential privacy preserving machine
                  learning classification approach that seeks an acceptable level of
                  utility.


Related Work
   There is growing interest in privacy-preserving data mining solutions that
   provide a balance between data privacy and utility.

            Kifer and Gehrke (2006) conducted a broad study of enhancing data utility
             in privacy-preserving data publishing using statistical approaches.

            Wong (2007) described how achieving globally optimal privacy while
             maintaining utility is an NP-hard problem.

            Krause and Horvitz (2010) noted that finding trade-offs between privacy
             and utility remains an NP-hard problem.

            Muralidhar and Sarathy (2011) showed that differential privacy provides
             strong privacy guarantees, but that utility remains a problem due to noise levels.

            Finding the optimal balance between privacy and utility remains a
             challenge, even with differential privacy.
Data Utility versus Privacy

          Data utility is the extent to which a published dataset is useful to the
           consumers of that dataset.

          In the course of a data privacy process, the original data loses statistical
           value even as privacy guarantees are gained.




                                                 Image Source: Kenneth Corbin/Internet News.




Objective

                  Achieving an optimal balance between data privacy and utility
                   remains an ongoing challenge.

                  Such an optimum is highly desirable and is the goal of our investigation.




                                                 Image Source: Wikipedia, on Confidentiality.




Ensemble classification
          A machine learning process in which several independently trained
           classifiers are combined to achieve better prediction.




          An example is a set of individually trained decision trees combined to
           make more accurate predictions.
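A minimal sketch of the idea above, assuming one-level decision trees (decision stumps) on a single numeric feature and a simple majority vote. The helpers `train_stump`, `train_ensemble` and `predict_ensemble`, and the deterministic subset scheme that stands in for bootstrap resampling, are illustrative assumptions, not the authors' MATLAB implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A decision stump is a one-level decision tree:
# (threshold, label if x <= threshold, label otherwise).
Stump = Tuple[float, int, int]


def train_stump(xs: Sequence[float], ys: Sequence[int]) -> Stump:
    """Exhaustively pick the threshold and leaf labels with the fewest training errors."""
    labels = sorted(set(ys))
    best: Stump = (xs[0], labels[0], labels[0])
    best_err = len(xs) + 1
    for t in sorted(set(xs)):
        for lo in labels:
            for hi in labels:
                err = sum((lo if x <= t else hi) != y for x, y in zip(xs, ys))
                if err < best_err:
                    best, best_err = (t, lo, hi), err
    return best


def predict_stump(stump: Stump, x: float) -> int:
    t, lo, hi = stump
    return lo if x <= t else hi


def train_ensemble(xs: Sequence[float], ys: Sequence[int], n_members: int = 3) -> List[Stump]:
    """Train each member independently on a different subset of the data.

    Member k drops every n_members-th record starting at index k
    (a deterministic stand-in for bootstrap resampling).
    """
    members = []
    for k in range(n_members):
        keep = [i for i in range(len(xs)) if i % n_members != k]
        members.append(train_stump([xs[i] for i in keep], [ys[i] for i in keep]))
    return members


def predict_ensemble(members: Sequence[Stump], x: float) -> int:
    """Merge the members' predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in members)
    return votes.most_common(1)[0][0]
```

With `xs = [5, 10, 20, 40, 60, 70, 90, 95, 99]` and labels `[0, 0, 0, 0, 1, 1, 1, 1, 1]`, the first member alone misclassifies `40` (it never saw that record), but the other two members outvote it, so the merged ensemble labels all nine records correctly.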
AdaBoost Ensemble – Adaptive Boosting
          Proposed by Freund and Schapire (1995), AdaBoost runs several iterations, adding a
           weak learner at each one to build a strong learner, and adjusts the example weights
           to concentrate on data misclassified in earlier iterations.

          The classification error in the AdaBoost ensemble is computed as follows:
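For reference, a standard way to write this quantity, using assumed notation rather than the slide's own: at iteration t, the weak learner h_t is scored by its weighted error over the N training examples (x_i, y_i), where w_t(i) are the current, normalized example weights and 1[·] is the indicator function.

```latex
\epsilon_t \;=\; \sum_{i=1}^{N} w_t(i)\,\mathbf{1}\!\left[h_t(x_i) \neq y_i\right],
\qquad \sum_{i=1}^{N} w_t(i) = 1
```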




AdaBoost Ensemble (Cont’d)
          The AdaBoost ensemble is computed as follows:
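A minimal two-class sketch of the standard AdaBoost loop with decision stumps, assuming labels in {-1, +1}. The study's three-class MATLAB setup is not reproduced here, and all function names are hypothetical. Each round picks the stump with the lowest weighted error ε_t, gives it the weight α_t = (1/2) ln((1 − ε_t)/ε_t), up-weights the examples it misclassified, and the final prediction is the sign of the α-weighted vote.

```python
import math
from typing import List, Sequence, Tuple

# (threshold, polarity): predict `polarity` if x > threshold, else -polarity.
Stump = Tuple[float, int]


def stump_predict(stump: Stump, x: float) -> int:
    t, s = stump
    return s if x > t else -s


def best_stump(xs: Sequence[float], ys: Sequence[int],
               w: Sequence[float]) -> Tuple[Stump, float]:
    """Return the stump minimizing the weighted error eps = sum_i w_i * [h(x_i) != y_i]."""
    best: Stump = (xs[0], 1)
    best_err = float("inf")
    # Candidate thresholds: every data value, plus one below the minimum (constant stump).
    for t in sorted(set(xs)) + [min(xs) - 1]:
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if stump_predict((t, s), x) != y)
            if err < best_err:
                best, best_err = (t, s), err
    return best, best_err


def adaboost(xs: Sequence[float], ys: Sequence[int],
             rounds: int = 10) -> List[Tuple[float, Stump]]:
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform example weights
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, eps = best_stump(xs, ys, w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)    # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)  # weight of this weak learner
        model.append((alpha, stump))
        # Misclassified points (y * h(x) = -1) grow, correct ones shrink; then renormalize.
        w = [wi * math.exp(-alpha * y * stump_predict(stump, x))
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def adaboost_predict(model: Sequence[Tuple[float, Stump]], x: float) -> int:
    """Strong learner H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

Stumps are used because they are the simplest weak learner with an error below 1/2 on any weighting (flipping the polarity turns an error above 1/2 into one below it), which is the condition AdaBoost relies on.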




Differential Privacy
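The standard definition, stated here for reference with assumed notation: a randomized mechanism M is ε-differentially private if, for any two datasets D_1 and D_2 that differ in a single record, the probability of any set of outputs S changes by at most a factor of e^ε. The Laplace mechanism, whose noise parameter ε the Conclusion refers to, achieves this by adding noise scaled to the sensitivity Δf of the query f.

```latex
\Pr[\mathcal{M}(D_1) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D_2) \in S]
\quad \text{for all } D_1, D_2 \text{ differing in one record, all } S \subseteq \operatorname{Range}(\mathcal{M})

\mathcal{M}(D) = f(D) + \operatorname{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad \Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_1
```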




Differential Privacy (Cont’d)
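A minimal sketch of Laplace noise addition, assuming each record is perturbed independently (input perturbation), which is one reading of how a privatized copy of the dataset could be produced. The names `laplace_noise` and `laplace_mechanism` and the sensitivity argument are hypothetical; the study's actual sensitivity and ε values are not stated here. Python's standard library has no Laplace sampler, so the sketch uses the fact that the difference of two independent exponential variables with mean b is Laplace(0, b).

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale): difference of two Exp variables with mean `scale`."""
    # random.expovariate takes the rate (1 / mean), not the mean.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values: Sequence[float], sensitivity: float,
                      epsilon: float, seed: int = 0) -> List[float]:
    """Perturb each value with Laplace noise of scale b = sensitivity / epsilon.

    Smaller epsilon means larger b, i.e. more noise and stronger privacy.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = random.Random(seed)
    b = sensitivity / epsilon
    return [v + laplace_noise(b, rng) for v in values]
```

Because the expected absolute value of Laplace(0, b) noise is b, averaging the absolute noise over many records gives a quick sanity check on the chosen scale.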




Methodology (Cont’d)

          We used a publicly available dataset of donations to the Barack Obama 2008 campaign.

          The dataset contained 17,695 records of original, unperturbed data.

          Two attributes, donation amount and income status, are used to classify the data
           into three classes.

          The three classes are low income, middle income, and high income, for donations of
           $1 to $49, $50 to $80, and $81 and above, respectively.
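A minimal sketch of this labeling rule, assuming donation amounts in US dollars. The function name is hypothetical, and the handling of non-integer amounts between the stated boundaries (for example $49.50 or $80.50) is an assumption, since the slide lists whole-dollar ranges only.

```python
def income_class(donation: float) -> str:
    """Map a donation amount (USD) to the income class defined on this slide.

    Assumption: amounts below $50 are 'low', $50 through $80 are 'middle',
    and anything above $80 is 'high'.
    """
    if donation < 50:
        return "low"
    if donation <= 80:
        return "middle"
    return "high"
```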

          To validate our approach, 50 percent of each dataset (original and privatized) was
           used for training and the remainder for testing.
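A minimal sketch of a 50/50 train/test split that keeps the original and privatized datasets aligned record by record. Whether the study shuffled the records before splitting is not stated; the shuffle, the seed, and the name `split_half` are assumptions. Reusing the same seed on two equal-length datasets yields the same permutation, so record i lands in the same partition in both.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_half(records: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed and cut them in half: (train, test)."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) // 2
    return [records[i] for i in idx[:cut]], [records[i] for i in idx[cut:]]
```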

          An Oracle database is queried via the MATLAB ODBC connector. MATLAB is used for
           both differential privacy and machine learning classification.


Results

          The essential statistical traits of the original dataset are preserved in the
           differentially private dataset, a necessary requirement for publishing privatized data.

          As depicted, the mean, standard deviation, and variance of the original
           and differentially private datasets remained the same.
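A minimal sketch of this comparison using Python's `statistics` module; the helper names are hypothetical. It uses the sample (n − 1) forms of the standard deviation and variance, which is also MATLAB's default for `std` and `var`. For reference, adding independent zero-mean Laplace noise of scale b raises the expected variance by 2b², so in practice the comparison checks that the statistics agree to the reported precision.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summary(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation and sample variance (n - 1 denominator)."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
    }


def compare(original: Sequence[float],
            privatized: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Pair each statistic of the original data with the privatized one."""
    a, b = summary(original), summary(privatized)
    return {k: (a[k], b[k]) for k in a}
```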




Results (Cont’d)
          There is a positive covariance of 1060.8 between the two datasets, which
           means that they tend to grow together, as illustrated below:




Results (Cont’d)
          There is almost no correlation (0.0054) between the original and
           differentially private datasets.

          This indicates some privacy assurance: an attacker with access only to the
           privatized dataset would find it difficult to correctly infer the alterations.
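A minimal sketch of the two measures reported on the last two slides, with hypothetical helper names. Covariance is scale-dependent, while Pearson correlation divides it by both standard deviations, so a covariance of 1060.8 can coexist with a correlation of 0.0054 when the two datasets have large variances. Both use the sample (n − 1) normalization.

```python
import statistics
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance, normalized by n - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance rescaled to [-1, 1] by both standard deviations."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))
```

Multiplying one dataset by 10 multiplies the covariance by 10 but leaves the correlation unchanged, which is why correlation is the better gauge of how closely the privatized values track the originals.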




Results (Cont’d)
          After differential privacy is applied, the AdaBoost ensemble classifier is
           run.

          The classification outcome for the donors’ dataset was Low, Middle, and High income
           for donations of 0 to 50, 51 to 80, and 81 to 100, respectively.

          The same class labels are then used for the perturbed dataset to investigate
           whether the classifier categorizes the perturbed records correctly.




Results (Cont’d)

          On the original training data, the classification error dropped from 0.25 to 0
           as the number of weak decision tree learners increased.

          On the differentially private training data, however, the classification error
           only dropped from 0.588 to 0.58.




Results (Cont’d)
   When the same procedure was applied to the original test data, the
    classification error dropped from 0.03 to 0.

   However, when the procedure was applied to the differentially private test data, the
    error rate did not change, even as the number of weak decision tree learners increased.




Conclusion
   In this study, we found that while differential privacy can guarantee strong
    confidentiality, providing data utility remains a challenge.

   However, this study is instructive in a variety of ways:

               The level of Laplace noise does affect the classification error.

               Increasing the number of weak learners has little effect.

               Adjusting the Laplace noise parameter, ε, is essential for further study.

               However, more accurate classification comes at the cost of privacy.

               Trade-offs must be made between privacy and utility.

               We plan to investigate optimization approaches for such trade-offs.
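A minimal sketch of how the Laplace parameter ε trades privacy against utility, using a simple proxy rather than the study's AdaBoost experiment: the fraction of donations whose income class changes once noise is added. The bins follow the Methodology slide; the sensitivity default of 1.0, the synthetic donation values, and all names are hypothetical assumptions.

```python
import random
from typing import Sequence


def income_class(donation: float) -> int:
    """0 = low (< $50), 1 = middle ($50 to $80), 2 = high (above $80).

    Boundary handling for non-integer amounts is an assumption.
    """
    if donation < 50:
        return 0
    return 1 if donation <= 80 else 2


def flip_rate(donations: Sequence[float], epsilon: float,
              sensitivity: float = 1.0, seed: int = 0) -> float:
    """Fraction of records whose income class changes after Laplace noise.

    `sensitivity` defaults to a hypothetical 1.0; the study's value is not given.
    """
    rng = random.Random(seed)
    b = sensitivity / epsilon  # Laplace scale: smaller epsilon means more noise
    flips = 0
    for d in donations:
        # The difference of two exponential draws with mean b is Laplace(0, b).
        noisy = d + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        flips += income_class(noisy) != income_class(d)
    return flips / len(donations)


if __name__ == "__main__":
    sample = [float(d) for d in range(1, 201)]  # synthetic donations, $1 to $200
    for eps in (0.01, 0.1, 1.0, 10.0):
        print(f"epsilon={eps}: flip rate {flip_rate(sample, eps):.3f}")
```

A lower ε gives stronger privacy but flips more labels, which mirrors the Conclusion's point that accurate classification and strong privacy pull in opposite directions.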
  Questions?




Contact:
Kato Mivule: kmivule@gmail.com



                            Thank You.




Implementation of Data Privacy and Security in an Online Student Health Recor...
 
Kato Mivule - Towards Agent-based Data Privacy Engineering
Kato Mivule - Towards Agent-based Data Privacy EngineeringKato Mivule - Towards Agent-based Data Privacy Engineering
Kato Mivule - Towards Agent-based Data Privacy Engineering
 
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyA Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a GaugeAn Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
 
Lit Review Talk by Kato Mivule: A Review of Genetic Algorithms
Lit Review Talk by Kato Mivule: A Review of Genetic AlgorithmsLit Review Talk by Kato Mivule: A Review of Genetic Algorithms
Lit Review Talk by Kato Mivule: A Review of Genetic Algorithms
 
Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...
Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...
Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Genera...
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a GaugeAn Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a GaugeAn Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge
 
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
 
Kato Mivule: An Overview of Adaptive Boosting – AdaBoost
Kato Mivule: An Overview of  Adaptive Boosting – AdaBoostKato Mivule: An Overview of  Adaptive Boosting – AdaBoost
Kato Mivule: An Overview of Adaptive Boosting – AdaBoost
 
Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...
Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...
Kato Mivule: COGNITIVE 2013 - An Overview of Data Privacy in Multi-Agent Lear...
 
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
 
A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...
A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...
A Robust Layered Control System for a Mobile Robot, Rodney A. Brooks; A Softw...
 

Kürzlich hochgeladen

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Towards A Differential Privacy Preserving Utility Machine Learning Classifier

  • 1. Towards A Differential Privacy and Utility Preserving Machine Learning Classifier. Kato Mivule, Claude Turner, and Soo-Yeon Ji, Computer Science Department, Bowie State University. Complex Adaptive Systems 2012, Washington DC, USA, November 14-16.
  • 2. Outline: Introduction, Related Work, Essential Terms, Methodology, Results, Conclusion.
  • 3. Introduction: Entities transact in ‘big data’ containing personally identifiable information (PII). Organizations are bound by federal and state law to ensure data privacy. In the process of achieving privacy, the utility of privatized datasets diminishes. Achieving a balance between privacy and utility is an ongoing problem. Therefore, we investigate a differential privacy preserving machine learning classification approach that seeks an acceptable level of utility.
  • 4. Related Work: There is growing interest in privacy preserving data mining solutions that balance data privacy and utility. Kifer and Gehrke (2006) conducted a broad study of enhancing data utility in privacy preserving data publishing using statistical approaches. Wong (2007) described how achieving globally optimal privacy while maintaining utility is an NP-hard problem. Krause and Horvitz (2010) noted that finding trade-offs between privacy and utility is still an NP-hard problem. Muralidhar and Sarathy (2011) showed that differential privacy provides strong privacy guarantees, but utility remains a problem due to noise levels. Finding the optimal balance between privacy and utility remains a challenge, even with differential privacy.
  • 5. Data Utility versus Privacy: Data utility is the extent to which a published dataset is useful to the consumers of that dataset. In the course of a data privacy process, the original data will lose statistical value, even as privacy is guaranteed. Image Source: Kenneth Corbin/Internet News.
  • 6. Objective: Achieving an optimal balance between data privacy and utility remains an ongoing challenge. Such optimality is highly desirable and remains the goal of our investigation. Image Source: Wikipedia, on Confidentiality.
  • 7. Ensemble Classification: Ensemble classification is a machine learning process in which several independently trained classifiers are combined to achieve better predictions. For example, individually trained decision trees can be joined to make more accurate predictions.
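A minimal sketch of the combining step, assuming plurality voting over hypothetical one-split decision stumps. The thresholds are hard-coded for illustration rather than learned from data, and `make_stump` and `majority_vote` are illustrative names, not part of the study's MATLAB code.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier is modeled as any function from a feature value to a class label.
Classifier = Callable[[float], str]


def make_stump(threshold: float) -> Classifier:
    """Hypothetical weak learner: a one-split decision tree ("stump")."""
    return lambda x: "high" if x >= threshold else "low"


def majority_vote(members: Sequence[Classifier], x: float) -> str:
    """Combine the members' predictions for x by plurality vote."""
    votes = Counter(member(x) for member in members)
    # most_common(1) -> [(label, count)]; on a tie, the label counted first wins.
    return votes.most_common(1)[0][0]


# Three stumps whose (hypothetical) split points came from different data samples.
ensemble = [make_stump(t) for t in (45.0, 50.0, 55.0)]
print(majority_vote(ensemble, 48.0))  # low  (votes: high, low, low)
print(majority_vote(ensemble, 52.0))  # high (votes: high, high, low)
```

Each stump alone misclassifies some values near its split point; the vote smooths out individual mistakes, which is the intuition behind ensembles.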
  • 8. AdaBoost Ensemble (Adaptive Boosting): Proposed by Freund and Schapire (1995), AdaBoost runs several iterations, adding a weak learner in each to build a strong learner, and adjusts sample weights to focus on data misclassified in earlier iterations. The classification error of the AdaBoost ensemble is computed as below:
  • 9. AdaBoost Ensemble (Cont’d): The AdaBoost ensemble is computed as follows:
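The iterative reweighting described on these two slides can be sketched as textbook discrete AdaBoost for two classes, with one-split decision stumps as weak learners. This is a hedged illustration, not necessarily the exact formulation on the slides; the study itself classifies three income classes, which would need a multi-class variant. Classification error here means the fraction of misclassified samples, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# A decision stump (one-split tree): predicts `polarity` when x >= threshold, else -polarity.
Stump = Tuple[float, int]
Model = List[Tuple[float, Stump]]  # (alpha, stump) pairs


def stump_predict(stump: Stump, x: float) -> int:
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def best_stump(xs: List[float], ys: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Weak learner: the stump with the lowest weighted error under weights w."""
    best: Stump = (xs[0], 1)
    best_err = float("inf")
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict((threshold, polarity), xi) != yi)
            if err < best_err:
                best, best_err = (threshold, polarity), err
    return best, best_err


def adaboost(xs: List[float], ys: List[int], rounds: int) -> Model:
    """Discrete two-class AdaBoost; labels must be -1 or +1."""
    n = len(xs)
    w = [1.0 / n] * n  # uniform sample weights to start
    model: Model = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)  # lower error -> larger vote
        model.append((alpha, stump))
        # Increase weights of misclassified points, decrease the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: Model, x: float) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


def error_rate(model: Model, xs: List[float], ys: List[int]) -> float:
    """Fraction of misclassified samples (the classification error)."""
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(xs)


# Toy data no single stump can separate: +1 at both ends, -1 in the middle.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]
for rounds in (1, 2, 3):
    print(rounds, error_rate(adaboost(xs, ys, rounds), xs, ys))
# Training error: 1/3 after 1 and 2 rounds, 0.0 after 3 rounds.
```

Each learner's weight, alpha = ½ ln((1 - err) / err), grows as its weighted error shrinks, so more accurate stumps get a larger say in the final vote, while the sample reweighting forces later stumps to concentrate on the points earlier ones got wrong.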
  • 10. Differential Privacy
  • 11. Differential Privacy (Cont’d)
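For reference, the standard definition of ε-differential privacy, together with the Laplace mechanism named in the conclusion, can be written as follows. The notation is conventional and may differ from the slides.

```latex
% epsilon-differential privacy: a randomized mechanism K satisfies it if, for all
% datasets D_1, D_2 that differ in at most one record and all output sets S,
\Pr[K(D_1) \in S] \le e^{\varepsilon} \, \Pr[K(D_2) \in S]

% Laplace mechanism for a numeric query f with global (L1) sensitivity \Delta f:
\Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_1,
\qquad
K(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
```

A smaller ε forces a larger noise scale Δf/ε, which is the privacy/utility tension that the results measure.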
  • 12. Methodology (Cont’d): We used a publicly available dataset of Barack Obama 2008 campaign donations. The dataset contained 17,695 records of original, unperturbed data. Two attributes, the donation amount and income status, are used to classify the data into three classes: low income, middle income, and high income, for donations of $1 to $49, $50 to $80, and $81 and above, respectively. To validate our approach, 50 percent of the dataset was used for training and the remainder for testing, on both the original and privatized datasets. The Oracle database is queried via the MATLAB ODBC connector, and MATLAB is used for differential privacy and machine learning classification.
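The class labelling and the 50/50 split described above can be sketched as follows. The study used MATLAB; this Python version is only illustrative, the helper names are hypothetical, and whether the original split was randomized is not stated (shuffling here is an assumption).

```python
import random
from typing import List, Tuple


def income_class(donation: float) -> str:
    """Income class from donation amount, per the methodology slide.

    $1-$49 -> low, $50-$80 -> middle, $81 and above -> high. Fractional amounts
    are assigned by cutoffs at 50 and 80 (an assumption; the slide lists whole dollars).
    """
    if donation < 50:
        return "low"
    if donation <= 80:
        return "middle"
    return "high"


def split_half(records: List[float], seed: int = 0) -> Tuple[List[float], List[float]]:
    """50% for training, the remaining 50% for testing (after an assumed shuffle)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical donation amounts (not the campaign data).
donations = [5.0, 25.0, 49.0, 50.0, 75.0, 80.0, 81.0, 250.0]
print([income_class(d) for d in donations])
# ['low', 'low', 'low', 'middle', 'middle', 'middle', 'high', 'high']
train, test = split_half(donations)
print(len(train), len(test))  # 4 4
```

With the study's 17,695 records, an integer halving like this yields 8,847 training and 8,848 test records.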
  • 13. Results: The essential statistical traits of the original data, a necessary requirement for publishing privatized datasets, are preserved in the differentially private dataset. As depicted, the mean, standard deviation, and variance of the original and differentially private datasets remained the same.
  • 14. Results (Cont’d): There is a strong positive covariance of 1060.8 between the two datasets, which means that they tend to increase together, as illustrated below:
  • 15. Results (Cont’d): There is almost no correlation (0.0054) between the original and differentially private datasets. This indicates some privacy assurance: an attacker who has only the privatized dataset would find it difficult to correctly infer the alterations.
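The statistics reported in these results can be computed with a short sketch. It uses small hypothetical series rather than the campaign data, and sample (n - 1) estimators, which is an assumption. It also shows why the covariance and correlation results should be read together: covariance carries the units of both variables, so its magnitude grows with the data's scale, whereas correlation divides by both standard deviations and is bounded by [-1, 1]. A large covariance can therefore coexist with a near-zero correlation when the variances are large.

```python
from statistics import mean, stdev, variance
from typing import Dict, Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by both sample standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))


def summarize(data: Sequence[float]) -> Dict[str, float]:
    """Mean, standard deviation, and variance, as compared on the results slides."""
    return {"mean": mean(data), "std": stdev(data), "var": variance(data)}


# Hypothetical values, not the campaign data.
original = [10.0, 20.0, 30.0, 40.0, 50.0]
privatized = [12.0, 18.0, 33.0, 39.0, 52.0]
print(summarize(original), summarize(privatized))
print("cov:", covariance(original, privatized), "corr:", correlation(original, privatized))

# Rescaling one series by 10 multiplies the covariance by 10 but leaves correlation unchanged.
rescaled = [10 * v for v in privatized]
print("cov:", covariance(original, rescaled), "corr:", correlation(original, rescaled))
```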
  • 16. Results (Cont’d): After differential privacy is applied, the AdaBoost ensemble classifier is run. The classification outcome for the original donors’ dataset was Low, Middle, and High income, for donations of 0 to 50, 51 to 80, and 81 to 100, respectively. The same class labels are used for the perturbed dataset to investigate whether the classifier categorizes the perturbed data correctly.
  • 17. Results (Cont’d): On the training set from the original data, the classification error dropped from 0.25 to 0 as the number of weak decision tree learners increased. The results differed on the training set from the differentially private data, where the classification error dropped only from 0.588 to 0.58.
  • 18. Results (Cont’d): When the same procedure was applied to the test set from the original data, the classification error dropped from 0.03 to 0. However, when it was performed on the differentially private data, the error rate did not change, even with an increased number of weak decision tree learners.
  • 19. Conclusion: In this study, we found that while differential privacy might guarantee strong confidentiality, providing data utility remains a challenge. However, this study is instructive in several ways: the level of Laplace noise does affect the classification error; increasing the number of weak learners has little effect; and adjusting the Laplace noise parameter, ε, is essential for further study. However, accurate classification means a loss of privacy, so trade-offs must be made between privacy and utility. We plan to investigate optimization approaches for such trade-offs.
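The conclusion's points about Laplace noise and ε can be illustrated with a small sketch: per-record Laplace noise with scale sensitivity/ε, and the fraction of records whose income class changes after perturbation. Everything here is an assumption for illustration: the `privatize` and `label_flip_rate` helpers are hypothetical, the sensitivity of 1.0 is arbitrary (it shows the noise scale, not a calibrated privacy guarantee), the class cutoffs reuse the methodology slide's ranges, and the slides do not state how the study calibrated its noise.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values: List[float], sensitivity: float, epsilon: float, seed: int = 0) -> List[float]:
    """Hypothetical helper: add Laplace(sensitivity / epsilon) noise to each value."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon -> larger noise -> stronger privacy
    return [v + laplace_noise(scale, rng) for v in values]


def income_class(amount: float) -> str:
    """Cutoffs from the methodology slide ($1-$49 low, $50-$80 middle, $81+ high)."""
    if amount < 50:
        return "low"
    return "middle" if amount <= 80 else "high"


def label_flip_rate(values: List[float], sensitivity: float, epsilon: float) -> float:
    """Fraction of records whose income class changes after perturbation."""
    noisy = privatize(values, sensitivity, epsilon)
    flips = sum(income_class(a) != income_class(b) for a, b in zip(values, noisy))
    return flips / len(values)


# Hypothetical donation amounts $1..$200 and an arbitrary sensitivity of 1.0.
donations = [float(d) for d in range(1, 201)]
for eps in (0.1, 1.0, 10.0):
    rate = label_flip_rate(donations, 1.0, eps)
    print(f"epsilon={eps}: class changed for {rate:.1%} of records")
```

Lowering ε widens the noise distribution, so more records cross a class boundary; a classifier trained on labels from the original data then faces more mismatches, which is one plausible mechanism behind the higher error on the privatized data.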
  • 20. Questions? Thank you. Contact: Kato Mivule, kmivule@gmail.com