Technische Universität München
Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn
Chair for Biomedical Informatics
Institute for Medical Statistics and Epidemiology
Klinikum rechts der Isar der TU München
An Experimental Comparison of
Globally-Optimal Data
De-Identification Algorithms
Optimal de-identification algorithms
• Generalization hierarchies
• Pruning: predictive tagging
• Optimization: roll-up
• Privacy models, e.g.: k-anonymity, l-diversity, t-closeness, δ-presence
F. Prasser, F. Kohlmayer et al.: A Benchmark of Globally-Optimal Anonymization
Methods for Biomedical Data, CBMS 2014
12/19/16 2
• Generalization lattice
Example for k = 2
Original data:
Age    Gender  Zipcode
34     male    81667
45     female  81667
66     male    81925
70     female  81925
70     male    81925
Transformed data:
Age    Gender  Zipcode
20-60  *       81667
20-60  *       81667
≥ 61   *       81925
≥ 61   *       81925
≥ 61   *       81925
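The example above can be checked mechanically. The following is a minimal sketch, assuming the age bands and the full suppression of gender that can be read off the transformed table; the function names `generalize` and `is_k_anonymous` are illustrative and not taken from ARX.

```python
from collections import Counter

# Records from the slide: (age, gender, zipcode).
RECORDS = [
    (34, "male", "81667"),
    (45, "female", "81667"),
    (66, "male", "81925"),
    (70, "female", "81925"),
    (70, "male", "81925"),
]


def generalize(record):
    """Apply the transformation shown in the transformed table to one record."""
    age, _gender, zipcode = record
    # Assumption: the two age bands are read off the slide; other ages are rejected.
    if 20 <= age <= 60:
        band = "20-60"
    elif age >= 61:
        band = ">=61"
    else:
        raise ValueError(f"no age band defined for {age}")
    return (band, "*", zipcode)  # gender is suppressed, zipcode is kept


def is_k_anonymous(rows, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


generalized = [generalize(r) for r in RECORDS]
print(is_k_anonymous(RECORDS, 2))      # False: every original record is unique
print(is_k_anonymous(generalized, 2))  # True: groups of size 2 and 3
```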
Algorithms – Incognito
• LeFevre et al.
– SIGMOD 2005
• Dynamic programming
– Breadth-first search on lattices for powerset of quasi-identifiers
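To make the lattice search concrete, here is a simplified sketch of a bottom-up breadth-first search with predictive tagging on the five example records. It is not Incognito itself, which additionally iterates over subsets of the quasi-identifiers. The hierarchies (`HIERARCHIES`) and all function names are assumptions made for illustration; the only property relied on is that k-anonymity is monotonic, so every generalization of an anonymous transformation is anonymous as well.

```python
from collections import Counter, deque

RECORDS = [
    (34, "male", "81667"),
    (45, "female", "81667"),
    (66, "male", "81925"),
    (70, "female", "81925"),
    (70, "male", "81925"),
]


def identity(value):
    return value


def age_band(age):
    return "20-60" if age <= 60 else ">=61"


def suppress(_value):
    return "*"


# Hypothetical hierarchies: HIERARCHIES[i][level] generalizes attribute i.
HIERARCHIES = [
    [identity, age_band, suppress],  # age: exact value, band, suppressed
    [identity, suppress],            # gender
    [identity, suppress],            # zipcode
]
TOP = tuple(len(levels) - 1 for levels in HIERARCHIES)


def is_anonymous(node, k=2):
    """Check k-anonymity of the transformation given by one lattice node."""
    rows = Counter(
        tuple(HIERARCHIES[i][level](record[i]) for i, level in enumerate(node))
        for record in RECORDS
    )
    return all(count >= k for count in rows.values())


def successors(node):
    """Direct generalizations: raise exactly one attribute by one level."""
    for i, level in enumerate(node):
        if level < TOP[i]:
            yield node[:i] + (level + 1,) + node[i + 1:]


def bfs_with_tagging(k=2):
    bottom = (0,) * len(HIERARCHIES)
    queue, seen = deque([bottom]), {bottom}
    anonymous, checks = set(), 0
    while queue:
        node = queue.popleft()
        if node not in anonymous:      # untagged nodes need a real check
            checks += 1
            if is_anonymous(node, k):
                anonymous.add(node)
        for succ in successors(node):
            if node in anonymous:
                anonymous.add(succ)    # predictive tagging, no check needed
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return anonymous, checks, len(seen)


anonymous, checks, total = bfs_with_tagging(k=2)
print(f"{checks} checks for {total} transformations")
```

On this toy input the search performs fewer checks than there are transformations, because tagged nodes are never checked.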
Algorithms – OLA & Flash
OLA (Optimal Lattice Anonymization)
• El Emam et al.
– JAMIA 2009
• Divide & conquer
– Binary search on sublattices
Flash
• Kohlmayer & Prasser et al.
– PASSAT 2012
• Greedy search
– Binary depth-first search
– Total order & priority queue
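Both algorithms rely on some form of binary search. The sketch below shows only that shared idea on a single path through the lattice, not OLA or Flash themselves: if anonymity is monotonic along a path from the least to the most generalized transformation, the first anonymous node can be located with a logarithmic number of checks. The path, the set `ASSUMED_ANONYMOUS`, and the function name are hypothetical.

```python
def first_anonymous(path, is_anonymous):
    """Binary search for the first anonymous node on a bottom-up path.

    Assumes monotonicity: once a node on the path is anonymous,
    all later (more generalized) nodes are anonymous too.
    Returns (node or None, number of checks performed).
    """
    lo, hi = 0, len(path)  # the answer index lies in [lo, hi]
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_anonymous(path[mid]):
            hi = mid          # answer is at mid or below
        else:
            lo = mid + 1      # answer is strictly above mid
    return (path[lo] if lo < len(path) else None), checks


# Hypothetical path through a lattice of (age, gender, zipcode) levels.
PATH = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (2, 1, 1)]
# Assumed outcome of the anonymity checks, for illustration only.
ASSUMED_ANONYMOUS = {(1, 1, 0), (2, 1, 0), (2, 1, 1)}

node, checks = first_anonymous(PATH, lambda n: n in ASSUMED_ANONYMOUS)
print(node, checks)
```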
Algorithms – BFS, DFS & Questions
• Generic search methods
– Breadth-first search (BFS)
– Depth-first search (DFS)
→ Extended to use predictive tagging
• Research questions
– How do the algorithms compare in terms of performance?
– Are there further differences between them?
– Are the algorithms' properties influenced by the privacy models used?
– How do problem-specific methods compare to generic search algorithms?
Benchmark – Method
• Use all reasonable combinations of common privacy models with typical parameters
– (k)-anonymity, (l)-diversity, (t)-closeness, (δ)-presence
• Properties of the search space are influenced by combining privacy models:
– (k), (l), (t), (δ)
– (k, l), (k, t), (k, δ), (l, δ), (t, δ)
– (k, l, δ), (k, t, δ)
• Report three basic performance measures
– Pruning power: number of anonymity checks
– Optimizability: number of roll-ups
– Execution times in a highly efficient runtime environment (ARX)
• Five well-known benchmark datasets
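The eleven configurations listed above can be reproduced with a short enumeration. This is a sketch under one assumption that is inferred from the list rather than stated on the slide: l-diversity and t-closeness are never combined with each other. The function name is illustrative.

```python
from itertools import combinations

# Shorthand for k-anonymity, l-diversity, t-closeness and delta-presence.
MODELS = ["k", "l", "t", "delta"]


def reasonable_combinations(models=MODELS):
    """Enumerate all non-empty model subsets that do not mix l and t."""
    combos = []
    for size in range(1, len(models) + 1):
        for combo in combinations(models, size):
            # Assumption inferred from the slide: l and t are not combined.
            if "l" in combo and "t" in combo:
                continue
            combos.append(combo)
    return combos


for combo in reasonable_combinations():
    print("(" + ", ".join(combo) + ")")
```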
Results – Averaged over datasets
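[Figure: execution time in seconds (lower is better), number of roll-ups (higher is better), and number of checks (lower is better) for each algorithm, per combination of privacy models]
• Allows analyzing variations in results for different sets of privacy models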
Results – Averaged over datasets
• Repeating patterns
→ Consistent results for different configurations
→ Differences between algorithms not influenced by privacy models used
Results – Averaged over datasets
• Breadth-first search is a worst-case strategy
→ No pruning power, no optimizability
→ Incognito suffers from similar performance problems
Results – Averaged over datasets
• Depth-first search is quite efficient
→ Can outperform domain-specific methods (OLA)
→ Because of its optimizability (best method in terms of #roll-ups)
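Roll-up is understood here as building the equivalence classes of a transformation from the classes of an already checked, more specialized transformation instead of scanning all records again. A depth-first traversal moves from a transformation to one of its direct generalizations, which is presumably why it can apply this optimization so often. The sketch below illustrates the idea on the five example records; it is not ARX's implementation, and the helper names and generalization rules are assumptions.

```python
from collections import Counter

RECORDS = [
    (34, "male", "81667"),
    (45, "female", "81667"),
    (66, "male", "81925"),
    (70, "female", "81925"),
    (70, "male", "81925"),
]


def to_bands(record):
    """Specialized transformation: generalize age only (hypothetical bands)."""
    age, gender, zipcode = record
    return ("20-60" if age <= 60 else ">=61", gender, zipcode)


def suppress_gender(key):
    """Further generalization applied to an already generalized key."""
    return (key[0], "*", key[2])


def group(records, generalize):
    """Build equivalence classes (key -> size) by scanning every record."""
    return Counter(generalize(r) for r in records)


def roll_up(classes, generalize_further):
    """Merge existing equivalence classes instead of rescanning the records."""
    merged = Counter()
    for key, size in classes.items():
        merged[generalize_further(key)] += size
    return merged


classes = group(RECORDS, to_bands)          # full scan over the records
rolled = roll_up(classes, suppress_gender)  # touches only the existing classes
direct = group(RECORDS, lambda r: suppress_gender(to_bands(r)))
print(rolled == direct)
```

The rolled-up result matches a fresh scan while iterating over the existing classes only, which pays off when the number of classes is much smaller than the number of records.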
Results – Averaged over datasets
• Number of checks: OLA < Flash < DFS < Incognito < BFS
• Number of roll-ups: DFS > Flash > Incognito > OLA > BFS
• Execution times: Flash < OLA < DFS < Incognito < BFS
Results – Averaged over privacy models
[Figure: number of checks (lower is better), number of roll-ups (higher is better), and execution time in seconds (lower is better) for each algorithm, per dataset]
• Shows variations in results for different datasets
• Algorithms exhibit similar properties
• Flash provides the best overall performance
• Differences are mostly independent of datasets
• But
– OLA provides performance comparable to Flash for smaller datasets
– DFS provides performance comparable to Flash for larger datasets
Lessons learned
• In general, domain-specific algorithms outperform generic methods
→ Up to several orders of magnitude (BFS)
→ OLA and Flash only check between 0.2% and 1.1% of all transformations in the solution space
→ Not necessarily true for large datasets (DFS)
• Flash effectively balances optimizability with pruning power
→ Should be used if optimized runtime environments are available
• OLA provides best pruning power
→ Should be used in general-purpose environments
• DFS outperforms OLA for large datasets
→ In these cases, optimizability is more important than pruning power
→ Optimized runtime environments required
Thank you for your attention!
• ARX is free software
– Download – Use – Contribute
– Repository: https://github.com/arx-deidentifier/arx
• Further information
– Website: http://arx.deidentifier.org
– Contact: Fabian Prasser (prasser@in.tum.de), Florian Kohlmayer (florian.kohlmayer@tum.de)

Vision and reflection on Mining Software Repositories research in 2024
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 

An experimental comparison of globally-optimal data de-identification algorithms

  • 1. Technische Universität München
       Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn
       Chair for Biomedical Informatics, Institute for Medical Statistics and Epidemiology, Klinikum rechts der Isar der TU München
       An Experimental Comparison of Globally-Optimal Data De-Identification Algorithms
  • 2. Optimal de-identification algorithms
       • Generalization hierarchies
       • Pruning: predictive tagging
       • Optimization: roll-up
       • Privacy models, e.g.: k-anonymity, l-diversity, t-closeness, δ-presence
       • Generalization lattice
       Example (k = 2), original data:
         Age    Gender  Zipcode
         34     male    81667
         45     female  81667
         66     male    81925
         70     female  81925
         70     male    81925
       Example (k = 2), generalized data:
         Age    Gender  Zipcode
         20-60  *       81667
         20-60  *       81667
         ≥ 61   *       81925
         ≥ 61   *       81925
         ≥ 61   *       81925
       F. Prasser, F. Kohlmayer et al.: A Benchmark of Globally-Optimal Anonymization Methods for Biomedical Data, CBMS 2014. 12/19/16
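The k = 2 example on slide 2 can be checked mechanically. The following is a minimal sketch, not the ARX implementation: the function names are hypothetical, and the age ranges cover only the hierarchy level visible on the slide.

```python
from collections import Counter

# Example records from slide 2: (age, gender, zipcode)
RECORDS = [
    (34, "male", "81667"),
    (45, "female", "81667"),
    (66, "male", "81925"),
    (70, "female", "81925"),
    (70, "male", "81925"),
]

def generalize(record):
    """Apply the transformation shown on the slide (hypothetical helper):
    age becomes a range, gender is suppressed, zipcode stays unchanged."""
    age, _gender, zipcode = record
    if 20 <= age <= 60:
        age_range = "20-60"
    elif age >= 61:
        age_range = "≥ 61"
    else:
        # The slide shows no range for ages below 20, so none is invented here.
        raise ValueError("no generalization defined for ages below 20")
    return (age_range, "*", zipcode)

def is_k_anonymous(rows, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

generalized = [generalize(r) for r in RECORDS]
print(is_k_anonymous(RECORDS, 2))      # original data
print(is_k_anonymous(generalized, 2))  # generalized data
```

The original records are all distinct, so they fail the check for k = 2, while the generalized table forms two groups of sizes 2 and 3.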
  • 3. Algorithms – Incognito
       • LeFevre et al.
         – SIGMOD 2005
       • Dynamic programming
         – Breadth-first search on lattices for powerset of quasi-identifiers
  • 4. Algorithms – OLA & Flash
       OLA:
       • Emam et al.
         – JAMIA 2009
       • Divide & conquer
         – Optimal Lattice Anonymization
         – Binary search on sublattices
       Flash:
       • Kohlmayer & Prasser et al.
         – PASSAT 2012
       • Greedy search
         – Binary depth-first search
         – Total order & priority queue
  • 5. Algorithms – BFS, DFS & Questions
       • Generic search methods
         – Breadth-first search (BFS)
         – Depth-first search (DFS)
         → Extended to use predictive tagging
       • Research questions
         – How do the algorithms compare in terms of performance?
         – Are there further differences between them?
         – Are the algorithms' properties influenced by the privacy models used?
         – How do problem-specific methods compare to generic search algorithms?
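To illustrate how a generic search can be extended with predictive tagging, here is a small sketch over a generalization lattice for the slide 2 example. It is not the benchmarked implementation: the hierarchies, the `search` function and the naive scan over all nodes are assumptions made for illustration. It relies on the monotonicity of k-anonymity under full-domain generalization: if a transformation is anonymous, so are all of its generalizations; if it is not, neither are its specializations.

```python
from collections import Counter, deque
from itertools import product

RECORDS = [(34, "male", "81667"), (45, "female", "81667"),
           (66, "male", "81925"), (70, "female", "81925"), (70, "male", "81925")]

# Hypothetical hierarchies: one function per generalization level of each attribute.
HIERARCHIES = [
    [lambda a: a, lambda a: "20-60" if a <= 60 else "≥ 61", lambda a: "*"],  # age
    [lambda g: g, lambda g: "*"],                                             # gender
    [lambda z: z, lambda z: "*"],                                             # zipcode
]
MAX_LEVELS = tuple(len(h) - 1 for h in HIERARCHIES)

def is_anonymous(node, k=2):
    """Check k-anonymity of the transformation given by one level per attribute."""
    rows = [tuple(HIERARCHIES[i][lvl](rec[i]) for i, lvl in enumerate(node))
            for rec in RECORDS]
    return all(count >= k for count in Counter(rows).values())

def successors(node):
    """Direct generalizations: raise the level of exactly one attribute by one."""
    for i, lvl in enumerate(node):
        if lvl < MAX_LEVELS[i]:
            yield node[:i] + (lvl + 1,) + node[i + 1:]

def search(depth_first, use_tagging):
    """Traverse the lattice bottom-up; return (anonymous nodes, number of checks)."""
    all_nodes = list(product(*(range(m + 1) for m in MAX_LEVELS)))
    tags, checks = {}, 0
    frontier = deque([(0,) * len(MAX_LEVELS)])
    seen = set(frontier)
    while frontier:
        node = frontier.pop() if depth_first else frontier.popleft()
        if node not in tags:
            checks += 1
            result = tags[node] = is_anonymous(node)
            if use_tagging:
                for other in all_nodes:
                    if result and all(o >= n for o, n in zip(other, node)):
                        tags.setdefault(other, True)    # generalizations are anonymous too
                    elif not result and all(o <= n for o, n in zip(other, node)):
                        tags.setdefault(other, False)   # specializations are not anonymous
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {n for n, ok in tags.items() if ok}, checks

for name, dfs in (("BFS", False), ("DFS", True)):
    print(name, "checks without tagging:", search(dfs, False)[1],
          "with tagging:", search(dfs, True)[1])
```

The lattice has 3 × 2 × 2 = 12 transformations, so the untagged traversals perform 12 checks each. With tagging, both traversals classify the same set of anonymous transformations with fewer checks.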
  • 6. Benchmark – Method
       • Use all reasonable combinations of common privacy models with typical parameters
         – (k)-anonymity, (l)-diversity, (t)-closeness, (δ)-presence
       • Properties of the search space are influenced by combining privacy models:
         – (k), (l), (t), (δ)
         – (k, l), (k, t), (k, δ), (l, δ), (t, δ)
         – (k, l, δ), (k, t, δ)
       • Report three basic performance measures
         – Pruning power: number of anonymity checks
         – Optimizability: number of roll-ups
         – Execution times in a highly efficient runtime environment (ARX)
       • Five well-known benchmark datasets
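The slides do not define a roll-up. Assuming the term refers to deriving the equivalence classes of a more general transformation from those of an already processed, more specialized one instead of from the raw data, the idea can be sketched as follows. The function names are hypothetical and the rows are the generalized example from slide 2.

```python
from collections import Counter

# Generalized example rows from slide 2: (age range, gender, zipcode)
ROWS = [
    ("20-60", "*", "81667"),
    ("20-60", "*", "81667"),
    ("≥ 61", "*", "81925"),
    ("≥ 61", "*", "81925"),
    ("≥ 61", "*", "81925"),
]

def suppress_zipcode(key):
    """One further generalization step (hypothetical): replace the zipcode by '*'."""
    age_range, gender, _zipcode = key
    return (age_range, gender, "*")

def roll_up(classes, generalize_key):
    """Merge the equivalence classes of a specialized transformation
    into the classes of a more general one, without touching the rows."""
    merged = Counter()
    for key, size in classes.items():
        merged[generalize_key(key)] += size
    return merged

classes = Counter(ROWS)                          # 2 classes, built from 5 rows
rolled_up = roll_up(classes, suppress_zipcode)   # built from the 2 classes only
from_scratch = Counter(suppress_zipcode(r) for r in ROWS)
print(rolled_up == from_scratch)
```

Under this assumption, the benefit grows with the ratio of rows to equivalence classes, which would fit the slides' use of the number of roll-ups as a measure of optimizability.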
  • 7. Results – Averaged over datasets
       [Figure: #Checks (lower is better), #Roll-ups (higher is better), Exec. time [s] (lower is better)]
       ● Allows analyzing variations in results for different sets of privacy models
  • 8. Results – Averaged over datasets
       [Figure: #Checks (lower is better), #Roll-ups (higher is better), Exec. time [s] (lower is better)]
       ● Repeating patterns
         → Consistent results for different configurations
         → Differences between algorithms not influenced by privacy models used
  • 9. Results – Averaged over datasets
       [Figure: #Checks (lower is better), #Roll-ups (higher is better), Exec. time [s] (lower is better)]
       ● Breadth-first search is a worst-case strategy
         → No pruning-power, no optimizability
         → Incognito suffers from similar performance problems
  • 10. Results – Averaged over datasets
        [Figure: #Checks (lower is better), #Roll-ups (higher is better), Exec. time [s] (lower is better)]
        ● Depth-first search is pretty efficient
          → Can outperform domain-specific methods (OLA)
          → Because of its optimizability (best method in terms of #roll-ups)
  • 11. Results – Averaged over datasets
        [Figure: #Checks (lower is better), #Roll-ups (higher is better), Exec. time [s] (lower is better)]
        ● Number of checks: OLA < Flash < DFS < Incognito < BFS
        ● Number of roll-ups: DFS > Flash > Incognito > OLA > BFS
        ● Execution times: Flash < OLA < DFS < Incognito < BFS
  • 12. Results – Averaged over privacy models
        [Figure: #Checks (lower is better), #Roll-ups (higher is better), Exec. time [s] (lower is better)]
        ● Shows variations in results for different datasets
        ● Algorithms exhibit similar properties
        ● Flash provides the best overall performance
        ● Differences are mostly independent of datasets
        ● But
          – OLA provides performance comparable to Flash for smaller datasets
          – DFS provides performance comparable to Flash for larger datasets
  • 13. Lessons learned
        • In general, domain-specific algorithms outperform generic methods
          → Up to several orders of magnitude (BFS)
          → OLA and Flash only check between 0.2% and 1.1% of all transformations in the solution space
          → Not necessarily true for large datasets (DFS)
        • Flash effectively balances optimizability with pruning power
          → Should be used if optimized runtime environments are available
        • OLA provides best pruning power
          → Should be used in general-purpose environments
        • DFS outperforms OLA for large datasets
          → In these cases, optimizability is more important than pruning power
          → Optimized runtime environments required
  • 14. Thank you for your attention!
        • ARX is free software
          – Download
          – Use
          – Contribute
          – Repository: https://github.com/arx-deidentifier/arx
        • Further information
          – Website: http://arx.deidentifier.org
          – Contact
            ● Fabian Prasser (prasser@in.tum.de)
            ● Florian Kohlmayer (florian.kohlmayer@tum.de)