SlideShare a Scribd company logo
1 of 30
Download to read offline
Technische Universität München
Fabian Prasser, Florian Kohlmayer
Lehrstuhl für Medizinische Informatik
Institut für Medizinische Statistik und Epidemiologie
Klinikum rechts der Isar der TU München
ARX 3.0.0
A Comprehensive Tool for
Anonymizing Biomedical Data
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Towards useful data anonymization
• Usability has many dimensions
• Ability to balance data utility with privacy requirements
• Need to support a broad spectrum of methods
• Privacy models
• Transformation models
• Methods for analyzing data utility
• Methods for analyzing risks
• Further “non-functional” requirements
• Integrated and harmonized: ARX is not a “tool box”
• Compatibility (syntactic and semantic)
• Performance and scalability
• Intuitive visualization and parameterization
• Provide methods to end-users as well as programmers
19.03.2015 2
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Compatibility
• Built-in data import facilities
• Relational databases (MS SQL, DB2, SQLite, MySQL)
• MS Excel
• CSV (all common formats, auto-detection)
• Support for different data types and scales of measure
• Strings (with nominal and ordinal scale)
• Dates (interval scale)
• Numbers (ratio scale)
• Automatic detection of data types and formats
• Methods for handling and cleaning low-quality data
• Handles missing and invalid values correctly
• In privacy models, transformation methods, visualizations
• Manual removal of tuples, query interface, find & replace
19.03.2015 3
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Flexible transformation methods
• Global recoding
• Full-domain generalization
• Top- & bottom coding
• Local recoding
• Tuple suppression
• Fully integrated and parameterizable
• Importance of attributes, suitability of methods
• Functional representations of transformation rules
• Especially functional representations of hierarchies
• Support for categorical and continuous variables (categorization)
• Multiple methods for measuring data utility
• Parametrizable, e.g., with different aggregate functions
• Use functional representations of transformation rules
19.03.2015 4
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Multiple apriori privacy models
• k-Anonymity
• ℓ-Diversity (distinct, entropy, recursive-(c, ℓ))
• t-Closeness (equal and hierarchical ground distance)
• δ-Presence
• Multiple methods for risk-based anonymization
• Sample characteristics
• Average cell size
• Sample uniqueness
• Super-population models
• Decision rule by Dankar et al.
• Based on models by Pitman, Zayatz and the SNB model
• Support for arbitrary combinations of these models
• Optimal solution within our coding model
19.03.2015 5
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Scalability: ARX can handle large datasets (several million data
entries) on commodity hardware
• Efficient in-memory data management engine
• Works with compressed data representations
• Tight coupling between transformation operators and the „database
kernel”
• Provides a space-time trade-off
• Optimized search strategy: Based on multiple pruning strategies
• Efficient implementations of further complex tasks
• Evaluations of privacy criteria (e.g. t-closeness, δ-presence)
• Methods for solving non-linear equation systems
• Background jobs in the user interface
19.03.2015 6
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Highlights
• Comprehensive graphical interface
• Scalablity comparable to the ARX API
• Supports all methods provided by the ARX API
• Cross-platform (Windows, Linux/GTK, OSX) with native interfaces
• Available as binary distributions with installers
• Independent API
• User interface sits on top
of the API
• Java library
• All methods provided by ARX
are first-class citizens in both
worlds
19.03.2015 7
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
ARX: Anonymization workflow
• Iterative process to successively refine transformations
• Supported by the scalability of our framework
• Three (repeating) steps, mapped to four perspectives
19.03.2015 8
- Define transformation model
- Define privacy model
- Define coding model
- Filter and analyze the
solution space
- Organize transformations
- Compare and analyze
input and output
- Regarding risks and
utility
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 9
ARX: Wizard for data import
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 10
ARX: Configuration (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 11
ARX: Configuration (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 12
ARX: Configuration (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 13
ARX: Configuration (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 14
ARX: Wizard for transformation rules
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 15
ARX: Further dialogs
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 16
ARX: Exploration (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 17
ARX: Exploration (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 18
ARX: Exploration (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 19
ARX: Utility analysis (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 20
ARX: Utility analysis (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 21
ARX: Utility analysis (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 22
ARX: Utility analysis (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 23
ARX: Utility analysis (5)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 24
ARX: Risk analysis (1)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 25
ARX: Risk analysis (2)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 26
ARX: Risk analysis (3)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 27
ARX: Risk analysis (4)
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
19.03.2015 28
ARX: Context-sensitive help
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
●
Current projects
●
Non-interactive Differential Privacy
●
Support for high-dimensional data: heuristic algorithms
●
Further analyses and visualizations: utility and risks
●
Support for transactional attributes: (k, km
)-anonymity
●
Integrated data masking methods
●
More flexible definition of quasi-identifiers
●
More risk models
●
Planned projects
●
Auto-detection of HIPAA identifiers
●
Implement more flexible privacy criteria
19.03.2015 29
ARX: Further developments
Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data
• ARX is open source software
• Contribute: feature requests, code reviews, criticism,
enhancements, questions
• Repository: https://github.com/arx-deidentifier/arx
• Further information & download: http://arx.deidentifier.org
• Get in touch
• Fabian Prasser (prasser@in.tum.de)
• Florian Kohlmayer (florian.kohlmayer@tum.de)
19.03.2015 30
• Any questions?
Thank you for your attention

More Related Content

What's hot

ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
PECB
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
Splunk
 
There's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKE
There's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKEThere's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKE
There's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKE
Jeff Beley
 

What's hot (20)

NetSecTR - "Siem / Log Korelasyon Sunumu" Huzeyfe Önal
NetSecTR - "Siem / Log Korelasyon Sunumu" Huzeyfe ÖnalNetSecTR - "Siem / Log Korelasyon Sunumu" Huzeyfe Önal
NetSecTR - "Siem / Log Korelasyon Sunumu" Huzeyfe Önal
 
Splunk workshop-Threat Hunting
Splunk workshop-Threat HuntingSplunk workshop-Threat Hunting
Splunk workshop-Threat Hunting
 
Misp(malware information sharing platform)
Misp(malware information sharing platform)Misp(malware information sharing platform)
Misp(malware information sharing platform)
 
Leveraging MITRE ATT&CK - Speaking the Common Language
Leveraging MITRE ATT&CK - Speaking the Common LanguageLeveraging MITRE ATT&CK - Speaking the Common Language
Leveraging MITRE ATT&CK - Speaking the Common Language
 
Telesoft Cyber Threat Hunting Infographic
Telesoft Cyber Threat Hunting InfographicTelesoft Cyber Threat Hunting Infographic
Telesoft Cyber Threat Hunting Infographic
 
Cylance Information Security: Compromise Assessment Datasheet
Cylance Information Security: Compromise Assessment DatasheetCylance Information Security: Compromise Assessment Datasheet
Cylance Information Security: Compromise Assessment Datasheet
 
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
 
SIEM - Varolan Verilerin Anlamı
SIEM - Varolan Verilerin AnlamıSIEM - Varolan Verilerin Anlamı
SIEM - Varolan Verilerin Anlamı
 
Crowdstrike .pptx
Crowdstrike .pptxCrowdstrike .pptx
Crowdstrike .pptx
 
Defence in Depth Architectural Decisions
Defence in Depth Architectural DecisionsDefence in Depth Architectural Decisions
Defence in Depth Architectural Decisions
 
Microsoft threat modeling tool 2016
Microsoft threat modeling tool 2016Microsoft threat modeling tool 2016
Microsoft threat modeling tool 2016
 
Building the Security Operations and SIEM Use CAse
Building the Security Operations and SIEM Use CAseBuilding the Security Operations and SIEM Use CAse
Building the Security Operations and SIEM Use CAse
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
 
Blackhat USA 2016 - What's the DFIRence for ICS?
Blackhat USA 2016 - What's the DFIRence for ICS?Blackhat USA 2016 - What's the DFIRence for ICS?
Blackhat USA 2016 - What's the DFIRence for ICS?
 
Briefing the board lessons learned from cisos and directors
Briefing the board lessons learned from cisos and directorsBriefing the board lessons learned from cisos and directors
Briefing the board lessons learned from cisos and directors
 
02 teknik penyerangan
02 teknik penyerangan02 teknik penyerangan
02 teknik penyerangan
 
ATT&CK is the Best Defense - Emulating Sophisticated Adversary Malware to Bol...
ATT&CK is the Best Defense - Emulating Sophisticated Adversary Malware to Bol...ATT&CK is the Best Defense - Emulating Sophisticated Adversary Malware to Bol...
ATT&CK is the Best Defense - Emulating Sophisticated Adversary Malware to Bol...
 
There's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKE
There's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKEThere's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKE
There's always money in the banana stand: A BLUE TEAMER’S GUIDE TO COBALT STRIKE
 
Understanding Fileless (or Non-Malware) Attacks and How to Stop Them
Understanding Fileless (or Non-Malware) Attacks and How to Stop ThemUnderstanding Fileless (or Non-Malware) Attacks and How to Stop Them
Understanding Fileless (or Non-Malware) Attacks and How to Stop Them
 

Similar to ARX - a comprehensive tool for anonymizing / de-identifying biomedical data

ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
arx-deidentifier
 
Chemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistryChemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistry
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
ibemam
 
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Similar to ARX - a comprehensive tool for anonymizing / de-identifying biomedical data (20)

ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
 
Chemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistryChemistry data delivery from the US-EPA to support environmental chemistry
Chemistry data delivery from the US-EPA to support environmental chemistry
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
Principles and risk assessment of managing distributed ontologies hosted by e...
Principles and risk assessment of managing distributed ontologies hosted by e...Principles and risk assessment of managing distributed ontologies hosted by e...
Principles and risk assessment of managing distributed ontologies hosted by e...
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
 
eTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service PlatformeTRIKS Data Harmonization Service Platform
eTRIKS Data Harmonization Service Platform
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscape
 
TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015TranSMART Roadmap Presentation Amsterdam 2015
TranSMART Roadmap Presentation Amsterdam 2015
 
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS)
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Towards batch one size with industrial semantics email
Towards batch one size with industrial semantics emailTowards batch one size with industrial semantics email
Towards batch one size with industrial semantics email
 
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
 
CIKB poster_2015
CIKB poster_2015CIKB poster_2015
CIKB poster_2015
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
1 3 introduction to open_ehr
1 3 introduction to open_ehr1 3 introduction to open_ehr
1 3 introduction to open_ehr
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
 

More from arx-deidentifier

A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
arx-deidentifier
 
An experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsAn experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithms
arx-deidentifier
 
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health DataARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
arx-deidentifier
 

More from arx-deidentifier (6)

An Open Source Tool for Game Theoretic Health Data De-Identification
An Open Source Tool for Game Theoretic Health Data De-IdentificationAn Open Source Tool for Game Theoretic Health Data De-Identification
An Open Source Tool for Game Theoretic Health Data De-Identification
 
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
 
Anonymisierung und Risikomanagement mit ARX
Anonymisierung und Risikomanagement mit ARXAnonymisierung und Risikomanagement mit ARX
Anonymisierung und Risikomanagement mit ARX
 
An experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsAn experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithms
 
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health DataARX - A Generic Method for Assessing the Quality of De-Identified Health Data
ARX - A Generic Method for Assessing the Quality of De-Identified Health Data
 
An overview of methods for data anonymization
An overview of methods for data anonymizationAn overview of methods for data anonymization
An overview of methods for data anonymization
 

Recently uploaded

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 

ARX - a comprehensive tool for anonymizing / de-identifying biomedical data

  • 1. Technische Universität München Fabian Prasser, Florian Kohlmayer Lehrstuhl für Medizinische Informatik Institut für Medizinische Statistik und Epidemiologie Klinikum rechts der Isar der TU München ARX 3.0.0 A Comprehensive Tool for Anonymizing Biomedical Data
  • 2. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Towards useful data anonymization • Usability has many dimensions • Ability to balance data utility with privacy requirements • Need to support a broad spectrum of methods • Privacy models • Transformation models • Methods for analyzing data utility • Methods for analyzing risks • Further “non-functional” requirements • Integrated and harmonized: ARX is not a “tool box” • Compatibility (syntactic and semantic) • Performance and scalability • Intuitive visualization and parameterization • Provide methods to end-users as well as programmers 19.03.2015 2
  • 3. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Compatibility • Built-in data import facilities • Relational databases (MS SQL, DB2, SQLite, MySQL) • MS Excel • CSV (all common formats, auto-detection) • Support for different data types and scales of measure • Strings (with nominal and ordinal scale) • Dates (interval scale) • Numbers (ratio scale) • Automatic detection of data types and formats • Methods for handling and cleaning low-quality data • Handles missing and invalid values correctly • In privacy models, transformation methods, visualizations • Manual removal of tuples, query interface, find & replace 19.03.2015 3
  • 4. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Flexible transformation methods • Global recoding • Full-domain generalization • Top- & bottom coding • Local recoding • Tuple suppression • Fully integrated and parameterizable • Importance of attributes, suitability of methods • Functional representations of transformation rules • Especially functional representations of hierarchies • Support for categorical and continuous variables (categorization) • Multiple methods for measuring data utility • Parametrizable, e.g., with different aggregate functions • Use functional representations of transformation rules 19.03.2015 4
  • 5. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Multiple apriori privacy models • k-Anonymity • ℓ-Diversity (distinct, entropy, recursive-(c, ℓ)) • t-Closeness (equal and hierarchical ground distance) • δ-Presence • Multiple methods for risk-based anonymization • Sample characteristics • Average cell size • Sample uniqueness • Super-population models • Decision rule by Dankar et al. • Based on models by Pitman, Zayatz and the SNB model • Support for arbitrary combinations of these models • Optimal solution within our coding model 19.03.2015 5
  • 6. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Scalability: ARX can handle large datasets (several million data entries) on commodity hardware • Efficient in-memory data management engine • Works with compressed data representations • Tight coupling between transformation operators and the „database kernel” • Provides a space-time trade-off • Optimized search strategy: Based on multiple pruning strategies • Efficient implementations of further complex tasks • Evaluations of privacy criteria (e.g. t-closeness, δ-presence) • Methods for solving non-linear equation systems • Background jobs in the user interface 19.03.2015 6
  • 7. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Highlights • Comprehensive graphical interface • Scalablity comparable to the ARX API • Supports all methods provided by the ARX API • Cross-platform (Windows, Linux/GTK, OSX) with native interfaces • Available as binary distributions with installers • Independent API • User interface sits on top of the API • Java library • All methods provided by ARX are first-class citizens in both worlds 19.03.2015 7
  • 8. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ARX: Anonymization workflow • Iterative process to successively refine transformations • Supported by the scalability of our framework • Three (repeating) steps, mapped to four perspectives 19.03.2015 8 - Define transformation model - Define privacy model - Define coding model - Filter and analyze the solution space - Organize transformations - Compare and analyze input and output - Regarding risks and utility
  • 9. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 9 ARX: Wizard for data import
  • 10. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 10 ARX: Configuration (1)
  • 11. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 11 ARX: Configuration (2)
  • 12. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 12 ARX: Configuration (3)
  • 13. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 13 ARX: Configuration (4)
  • 14. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 14 ARX: Wizard for transformation rules
  • 15. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 15 ARX: Further dialogs
  • 16. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 16 ARX: Exploration (1)
  • 17. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 17 ARX: Exploration (2)
  • 18. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 18 ARX: Exploration (3)
  • 19. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 19 ARX: Utility analysis (1)
  • 20. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 20 ARX: Utility analysis (2)
  • 21. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 21 ARX: Utility analysis (3)
  • 22. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 22 ARX: Utility analysis (4)
  • 23. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 23 ARX: Utility analysis (5)
  • 24. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 24 ARX: Risk analysis (1)
  • 25. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 25 ARX: Risk analysis (2)
  • 26. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 26 ARX: Risk analysis (3)
  • 27. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 27 ARX: Risk analysis (4)
  • 28. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data 19.03.2015 28 ARX: Context-sensitive help
  • 29. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data ● Current projects ● Non-interactive Differential Privacy ● Support for high-dimensional data: heuristic algorithms ● Further analyses and visualizations: utility and risks ● Support for transactional attributes: (k, km )-anonymity ● Integrated data masking methods ● More flexible definition of quasi-identifiers ● More risk models ● Planned projects ● Auto-detection of HIPAA identifiers ● Implement more flexible privacy criteria 19.03.2015 29 ARX: Further developments
  • 30. Technische Universität MünchenARX - A Comprehensive Tool for Anonymizing Biomedical Data • ARX is open source software • Contribute: feature requests, code reviews, criticism, enhancements, questions • Repository: https://github.com/arx-deidentifier/arx • Further information & download: http://arx.deidentifier.org • Get in touch • Fabian Prasser (prasser@in.tum.de) • Florian Kohlmayer (florian.kohlmayer@tum.de) 19.03.2015 30 • Any questions? Thank you for your attention