SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Downloaden Sie, um offline zu lesen
Technische Universität München
Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn
Chair for Biomedical Informatics
Institute for Medical Statistics and Epidemiologie
University of Technology Munich (TUM)
Engineering data privacy -
The ARX data anonymization tool
Technische Universität München
What is ARX?
●
= +
●
A tool for analyzing and reducing the uniqueness of records
in a (relational) dataset
●
Variety of methods
●
Highly scalable
●
Up to 50 dimensions (i.e. attributes)
●
Millions of records
●
(Semi-)automatically and/or manually
●
Comprehensive graphical user interface
ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 2
Images: https://commons.wikimedia.org/ users: Ysangkok, Scarce2
statistics computer science
Methods from
Technische Universität München
Example
ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 3
Generalization
Suppression
Microaggregation
Reduce uniqueness
Technische Universität München
Overview of methods implemented by ARX
Sample-based methods
• Fraction of sample uniques
• Average sample uniqueness
• k-anonymity
Population-based methods
• Model by Zayatz [1]
• Model by Hoshino [2]
• Model by Chen et al. [3] / Rinott [4]
• Model by Dankar et al. [5]
ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 4
[1] Zayatz, L.V.: Estimation of the percent of unique population
elements on a microdata file using the sample. Statistical
Research Division Report Number: Census/SRD/RR-91/08 (1991)
[2] Hoshino, N.: Applying pitmans sampling formula to microdata
disclosure risk assessment. J Off Stat 17(4), 499520 (2001)
[3] Chen, G., Keller-McNulty, S.: Estimation of identification disclosure
risk in microdata. J Off Stat 14, 7995 (1998)
[4] Rinott, Y.: On models for statistical disclosure risk estimation. In:
Proc ECE/Eurostat Work Session Stat Data Confid, p. 275285 (2003)
[5] Dankar, F., Emam, K.E., Neisa, A., Roffey, T.:
Estimating the re-identification risk of clinical
data sets. BMC Med Inform Decis Mak 12(1), 66 (2012)
Global and local recoding
• Can be weighted
Methods
• Categorization
• Generalization
• Cell suppression
• Record suppression
• Micro-aggregation
• Top/bottom coding
Weighted and parameterized
• Ability to control the application
of different coding models
Methods
• AECS, Discernibility, Precision
• (Normalized) Mean squared error
• (Normalized) Non-uniform entropy
• KL divergence
• Loss
Measures for utility Coding models Measures for uniqueness
Transform
Visualize
Analyze
Adapt
Technische Universität München
Screenshots
ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 5
Technische Universität München
Screenshots (cont'd)
ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 6
Technische Universität München
Further features offered by ARX
●
Syntactic privacy models
●
ℓ-diversity, t-closeness, δ-disclosure privacy, δ-presence
●
Risk-based anonymization
●
Differential privacy
●
Truthful (e,δ)-differentially private data release
●
Using random sampling
●
Detection of HIPAA identifiers
●
Based on heuristics
●
Import from multiple sources
●
RDBMS, Excel, CSV
●
Software library
●
Open source, cross-platform
ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 7
Technische Universität München
http://arx.deidentifier.org

Weitere ähnliche Inhalte

Was ist angesagt?

Data mining in telecommunications industry
Data mining in telecommunications industryData mining in telecommunications industry
Data mining in telecommunications industryIssa Memari
 
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014Amanda Makulec
 
Lec13: Clustering Based Medical Image Segmentation Methods
Lec13: Clustering Based Medical Image Segmentation MethodsLec13: Clustering Based Medical Image Segmentation Methods
Lec13: Clustering Based Medical Image Segmentation MethodsUlaş Bağcı
 
Data Governance and Data Science to Improve Data Quality
Data Governance and Data Science to Improve Data QualityData Governance and Data Science to Improve Data Quality
Data Governance and Data Science to Improve Data QualityDATAVERSITY
 
Ethics in Data Management.pptx
Ethics in Data Management.pptxEthics in Data Management.pptx
Ethics in Data Management.pptxRavindra Babu
 
Alignment: Office of the Chief Data Officer & BCBS 239
Alignment: Office of the Chief Data Officer & BCBS 239Alignment: Office of the Chief Data Officer & BCBS 239
Alignment: Office of the Chief Data Officer & BCBS 239Craig Milroy
 
Data science for business leaders executive program
Data science for business leaders executive programData science for business leaders executive program
Data science for business leaders executive programmjitu309
 
nnU-Net: a self-configuring method for deep learning-based biomedical image s...
nnU-Net: a self-configuring method for deep learning-based biomedical image s...nnU-Net: a self-configuring method for deep learning-based biomedical image s...
nnU-Net: a self-configuring method for deep learning-based biomedical image s...ivaderivader
 
Big data in fintech ecosystem
Big data in fintech ecosystemBig data in fintech ecosystem
Big data in fintech ecosystemBBVA API Market
 
BI and Data Analytics
BI and Data Analytics BI and Data Analytics
BI and Data Analytics Incorta
 
Big Data, Big Deal: For Future Big Data Scientists
Big Data, Big Deal: For Future Big Data ScientistsBig Data, Big Deal: For Future Big Data Scientists
Big Data, Big Deal: For Future Big Data ScientistsWay-Yen Lin
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big dataSeta Wicaksana
 
sas-data-governance-framework-107325
sas-data-governance-framework-107325sas-data-governance-framework-107325
sas-data-governance-framework-107325hurt3303
 

Was ist angesagt? (20)

Data mining in telecommunications industry
Data mining in telecommunications industryData mining in telecommunications industry
Data mining in telecommunications industry
 
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
Data Visualization: Impact, Intrigue, Value Add for APLIC 2014
 
Lec13: Clustering Based Medical Image Segmentation Methods
Lec13: Clustering Based Medical Image Segmentation MethodsLec13: Clustering Based Medical Image Segmentation Methods
Lec13: Clustering Based Medical Image Segmentation Methods
 
Data Governance and Data Science to Improve Data Quality
Data Governance and Data Science to Improve Data QualityData Governance and Data Science to Improve Data Quality
Data Governance and Data Science to Improve Data Quality
 
Ethics in Data Management.pptx
Ethics in Data Management.pptxEthics in Data Management.pptx
Ethics in Data Management.pptx
 
02 Data Mining
02 Data Mining02 Data Mining
02 Data Mining
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Alignment: Office of the Chief Data Officer & BCBS 239
Alignment: Office of the Chief Data Officer & BCBS 239Alignment: Office of the Chief Data Officer & BCBS 239
Alignment: Office of the Chief Data Officer & BCBS 239
 
Data science for business leaders executive program
Data science for business leaders executive programData science for business leaders executive program
Data science for business leaders executive program
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Management information system
Management information systemManagement information system
Management information system
 
nnU-Net: a self-configuring method for deep learning-based biomedical image s...
nnU-Net: a self-configuring method for deep learning-based biomedical image s...nnU-Net: a self-configuring method for deep learning-based biomedical image s...
nnU-Net: a self-configuring method for deep learning-based biomedical image s...
 
Big data in fintech ecosystem
Big data in fintech ecosystemBig data in fintech ecosystem
Big data in fintech ecosystem
 
BI and Data Analytics
BI and Data Analytics BI and Data Analytics
BI and Data Analytics
 
Big Data, Big Deal: For Future Big Data Scientists
Big Data, Big Deal: For Future Big Data ScientistsBig Data, Big Deal: For Future Big Data Scientists
Big Data, Big Deal: For Future Big Data Scientists
 
Comparative analysis
Comparative analysisComparative analysis
Comparative analysis
 
Data Monetization
Data MonetizationData Monetization
Data Monetization
 
Business Intelligence Banking
Business Intelligence   BankingBusiness Intelligence   Banking
Business Intelligence Banking
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
 
sas-data-governance-framework-107325
sas-data-governance-framework-107325sas-data-governance-framework-107325
sas-data-governance-framework-107325
 

Ähnlich wie Engineering data privacy - The ARX data anonymization tool

Challenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchChallenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchFranciscoJAzuajeG
 
June 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational IntelligenceJune 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational Intelligenceaciijournal
 
An introduction to machine learning in biomedical research: Key concepts, pr...
An introduction to machine learning in biomedical research:  Key concepts, pr...An introduction to machine learning in biomedical research:  Key concepts, pr...
An introduction to machine learning in biomedical research: Key concepts, pr...FranciscoJAzuajeG
 
A review on early hospital mortality prediction using vital signals
A review on early hospital mortality prediction using vital signalsA review on early hospital mortality prediction using vital signals
A review on early hospital mortality prediction using vital signalsReza Sadeghi
 
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...theijes
 
Development of Computational Tool for Lung Cancer Prediction Using Data Mining
Development of Computational Tool for Lung Cancer Prediction Using Data MiningDevelopment of Computational Tool for Lung Cancer Prediction Using Data Mining
Development of Computational Tool for Lung Cancer Prediction Using Data MiningEditor IJCATR
 
Pattern recognition using context dependent memory model (cdmm) in multimodal...
Pattern recognition using context dependent memory model (cdmm) in multimodal...Pattern recognition using context dependent memory model (cdmm) in multimodal...
Pattern recognition using context dependent memory model (cdmm) in multimodal...ijfcstjournal
 
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...arx-deidentifier
 
algorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyalgorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyPaolo Missier
 
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...JaresJournal
 
Prof. Mark Coles (Oxford University) - Data-driven systems medicine
Prof. Mark Coles (Oxford University) - Data-driven systems medicineProf. Mark Coles (Oxford University) - Data-driven systems medicine
Prof. Mark Coles (Oxford University) - Data-driven systems medicinemntbs1
 
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISSEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISIRJET Journal
 
Feature selection and microarray data
Feature selection and microarray dataFeature selection and microarray data
Feature selection and microarray dataGianluca Bontempi
 
An Overview on Gene Expression Analysis
An Overview on Gene Expression AnalysisAn Overview on Gene Expression Analysis
An Overview on Gene Expression AnalysisIOSR Journals
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
Intelligent generator of big data medical
Intelligent generator of big data medicalIntelligent generator of big data medical
Intelligent generator of big data medicalNexgen Technology
 
Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...
Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...
Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...cscpconf
 
FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...
FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...
FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...csandit
 

Ähnlich wie Engineering data privacy - The ARX data anonymization tool (20)

Challenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchChallenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical research
 
June 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational IntelligenceJune 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational Intelligence
 
An introduction to machine learning in biomedical research: Key concepts, pr...
An introduction to machine learning in biomedical research:  Key concepts, pr...An introduction to machine learning in biomedical research:  Key concepts, pr...
An introduction to machine learning in biomedical research: Key concepts, pr...
 
A review on early hospital mortality prediction using vital signals
A review on early hospital mortality prediction using vital signalsA review on early hospital mortality prediction using vital signals
A review on early hospital mortality prediction using vital signals
 
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...
 
Hanaa phd presentation 14-4-2017
Hanaa phd  presentation  14-4-2017Hanaa phd  presentation  14-4-2017
Hanaa phd presentation 14-4-2017
 
Development of Computational Tool for Lung Cancer Prediction Using Data Mining
Development of Computational Tool for Lung Cancer Prediction Using Data MiningDevelopment of Computational Tool for Lung Cancer Prediction Using Data Mining
Development of Computational Tool for Lung Cancer Prediction Using Data Mining
 
Pattern recognition using context dependent memory model (cdmm) in multimodal...
Pattern recognition using context dependent memory model (cdmm) in multimodal...Pattern recognition using context dependent memory model (cdmm) in multimodal...
Pattern recognition using context dependent memory model (cdmm) in multimodal...
 
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
 
algorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyalgorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparency
 
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...
FAULT DIAGNOSIS USING CLUSTERING. WHAT STATISTICAL TEST TO USE FOR HYPOTHESIS...
 
Prof. Mark Coles (Oxford University) - Data-driven systems medicine
Prof. Mark Coles (Oxford University) - Data-driven systems medicineProf. Mark Coles (Oxford University) - Data-driven systems medicine
Prof. Mark Coles (Oxford University) - Data-driven systems medicine
 
TBerger_FinalReport
TBerger_FinalReportTBerger_FinalReport
TBerger_FinalReport
 
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISSEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
 
Feature selection and microarray data
Feature selection and microarray dataFeature selection and microarray data
Feature selection and microarray data
 
An Overview on Gene Expression Analysis
An Overview on Gene Expression AnalysisAn Overview on Gene Expression Analysis
An Overview on Gene Expression Analysis
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
Intelligent generator of big data medical
Intelligent generator of big data medicalIntelligent generator of big data medical
Intelligent generator of big data medical
 
Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...
Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...
Fractal Parameters of Tumour Microscopic Images as Prognostic Indicators of C...
 
FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...
FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...
FRACTAL PARAMETERS OF TUMOUR MICROSCOPIC IMAGES AS PROGNOSTIC INDICATORS OF C...
 

Mehr von arx-deidentifier

An Open Source Tool for Game Theoretic Health Data De-Identification
An Open Source Tool for Game Theoretic Health Data De-IdentificationAn Open Source Tool for Game Theoretic Health Data De-Identification
An Open Source Tool for Game Theoretic Health Data De-Identificationarx-deidentifier
 
Anonymisierung und Risikomanagement mit ARX
Anonymisierung und Risikomanagement mit ARXAnonymisierung und Risikomanagement mit ARX
Anonymisierung und Risikomanagement mit ARXarx-deidentifier
 
An experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsAn experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsarx-deidentifier
 
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health DataARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health Dataarx-deidentifier
 
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataarx-deidentifier
 
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataarx-deidentifier
 

Mehr von arx-deidentifier (6)

An Open Source Tool for Game Theoretic Health Data De-Identification
An Open Source Tool for Game Theoretic Health Data De-IdentificationAn Open Source Tool for Game Theoretic Health Data De-Identification
An Open Source Tool for Game Theoretic Health Data De-Identification
 
Anonymisierung und Risikomanagement mit ARX
Anonymisierung und Risikomanagement mit ARXAnonymisierung und Risikomanagement mit ARX
Anonymisierung und Risikomanagement mit ARX
 
An experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsAn experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithms
 
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health DataARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
 
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
 
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
 

Kürzlich hochgeladen

How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 

Kürzlich hochgeladen (20)

How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

Engineering data privacy - The ARX data anonymization tool

  • 1. Technische Universität München Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn Chair for Biomedical Informatics Institute for Medical Statistics and Epidemiologie University of Technology Munich (TUM) Engineering data privacy - The ARX data anonymization tool
  • 2. Technische Universität München What is ARX? ● = + ● A tool for analyzing and reducing the uniqueness of records in a (relational) dataset ● Variety of methods ● Highly scalable ● Up to 50 dimensions (i.e. attributes) ● Millions of records ● (Semi-)automatically and/or manually ● Comprehensive graphical user interface ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 2 Images: https://commons.wikimedia.org/ users: Ysangkok, Scarce2 statistics computer science Methods from
  • 3. Technische Universität München Example ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 3 Generalization Suppression Microaggregation Reduce uniqueness
  • 4. Technische Universität München Overview of methods implemented by ARX Sample-based methods • Fraction of sample uniques • Average sample uniqueness • k-anonymity Population-based methods • Model by Zayatz [1] • Model by Hoshino [2] • Model by Chen et al. [3] / Rinott [4] • Model by Dankar et al. [5] ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 4 [1] Zayatz, L.V.: Estimation of the percent of unique population elements on a microdata file using the sample. Statistical Research Division Report Number: Census/SRD/RR-91/08 (1991) [2] Hoshino, N.: Applying pitmans sampling formula to microdata disclosure risk assessment. J Off Stat 17(4), 499520 (2001) [3] Chen, G., Keller-McNulty, S.: Estimation of identification disclosure risk in microdata. J Off Stat 14, 7995 (1998) [4] Rinott, Y.: On models for statistical disclosure risk estimation. In: Proc ECE/Eurostat Work Session Stat Data Confid, p. 275285 (2003) [5] Dankar, F., Emam, K.E., Neisa, A., Roffey, T.: Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak 12(1), 66 (2012) Global and local recoding • Can be weighted Methods • Categorization • Generalization • Cell suppression • Record suppression • Micro-aggregation • Top/bottom coding Weighted and parameterized • Ability to control the application of different coding models Methods • AECS, Discernibility, Precision • (Normalized) Mean squared error • (Normalized) Non-uniform entropy • KL divergence • Loss Measures for utility Coding models Measures for uniqueness Transform Visualize Analyze Adapt
  • 5. Technische Universität München Screenshots ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 5
  • 6. Technische Universität München Screenshots (cont'd) ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 6
  • 7. Technische Universität München Further features offered by ARX ● Syntactic privacy models ● ℓ-diversity, t-closeness, δ-disclosure privacy, δ-presence ● Risk-based anonymization ● Differential privacy ● Truthful (e,δ)-differentially private data release ● Using random sampling ● Detection of HIPAA identifiers ● Based on heuristics ● Import from multiple sources ● RDBMS, Excel, CSV ● Software library ● Open source, cross-platform ARX | Dagtuhl Genomic Privacy Workshop 201522.10.15 7