Privacy-Preserving Schema Reuse 
Nguyen Quoc Viet Hung, Do Son Thanh, Nguyen Thanh Tam, and Karl Aberer 
EPFL, Switzerland
Schema Reuse 
[Figure: users Query a schema repository (e.g. schema.org, factual.com), receive Output, and Contribute schemas.]
Traditional approach: shows all original schemas
Our approach: shows an anonymized (unified) schema
DASFAA Security, privacy & trust DASFAA | 04.2014 2
Motivation 
• Schema reuse offers many benefits:
– Reduces development complexity:
• New schemas require only small modifications
→ copy and adapt existing schemas
• Large repositories exist: schema.org, freebase.com, factual.com, niem.gov
– Increases interoperability:
• Shared common standards
• But privacy needs to be considered:
– Leaked schema information
→ potential attacks (e.g. SQL injection)
– Maintaining competitiveness: some parts of schemas are a source of
revenue and business strategy.
Challenges 
• How to define privacy constraints? 
• How to define an anonymized schema 
from multiple schemas? 
• How to define a utility function for a 
certain anonymized schema? 
• How to find an anonymized schema 
that satisfies privacy constraints and 
maximizes the utility function? 
[Figure labels: Query, Anonymized Schema, Privacy constraints, Contributors]
Challenge 1 – Define privacy constraints 
• Need to identify two elements 
– Sensitive information 
• Attributes 
– Privacy requirement 
• Prevent leaking provenance of sensitive attributes 
• Use presence constraint:
A presence constraint $\gamma$ is a triple $\langle s, D, \theta \rangle$, where $s$ is a schema, $D$ is a
set of attributes, and $\theta$ is a specified threshold. An anonymized schema $\hat{S}$
satisfies the presence constraint $\gamma$ if $\Pr(D \in s \mid \hat{S}) \le \theta$.
Challenge 2 – Define anonymized schema 
• How to define “anonymized 
schema” given a set of schemas 
– Enough information to understand 
but not overwhelming 
• Anonymized schema contains a 
set of “abstract” attributes 
– An abstract attribute is a set of similar
attributes
[Figure: original schemas with the attributes Name, Num, CC, Holder are combined into an anonymized schema with the abstract attributes {Name, Holder} and {CC, Num}.]
Challenge 3 – Define utility function 
• How to define utility function for a 
certain “anonymized schema” 
– Importance: sum of popularity of 
attributes 
• A schema that contains more popular 
attributes is better 
• An attribute that appears in more schemas is 
more popular 
– Completeness: number of abstract 
attributes 
• The more abstract attributes, the better 
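Let $\Sigma$ be the set of all possible anonymized schemas. The utility
function $u: \Sigma \to \mathbb{R}$ measures the amount of information in each
anonymized schema.
Utility function:
$u(\hat{S}) = \mathit{importance}(\hat{S}) + \mathit{weight} \cdot \mathit{completeness}(\hat{S})$
[Figure: candidate anonymized schemas S1, S2, S3 built from abstract attributes such as {Holder}, {CC}, {Name, Holder}, {CC, Num}, compared by importance and completeness.]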
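The utility definition on this slide can be illustrated with a minimal Python sketch. It assumes that the popularity of an attribute is simply the number of original schemas that contain it, and that an anonymized schema is a list of sets (one set per abstract attribute). The function names, the default `weight=1.0`, and the three example schemas are hypothetical and are not taken from the slides.

```python
from collections import Counter


def popularity(schemas):
    """Count, for each attribute name, how many original schemas contain it."""
    counts = Counter()
    for schema in schemas:
        counts.update(set(schema))
    return counts


def utility(anonymized, schemas, weight=1.0):
    """u(S) = importance(S) + weight * completeness(S).

    importance: sum of the popularity of every attribute kept in S (assumed definition).
    completeness: number of abstract attributes in S.
    """
    pop = popularity(schemas)
    importance = sum(pop[a] for abstract in anonymized for a in abstract)
    completeness = len(anonymized)
    return importance + weight * completeness


# Hypothetical original schemas, loosely modeled on the credit-card example.
schemas = [{"Name", "CC"}, {"Holder", "CC"}, {"Name", "Num"}]
full = [{"Name", "Holder"}, {"CC", "Num"}]
reduced = [{"Holder"}, {"CC"}]

print(utility(full, schemas))     # importance 6 + completeness 2
print(utility(reduced, schemas))  # importance 3 + completeness 2
```

Under these assumptions, dropping attributes from the abstract attributes lowers importance while completeness stays the same, which is the trade-off the utility function is meant to capture.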
Challenge 4 – Optimization problem (1) 
Maximizing Anonymized Schema
Given a schema group $S$ and a set of privacy constraints $\Gamma$, construct
an anonymized schema $S^*$ such that $S^*$ satisfies all constraints in $\Gamma$ and
has the highest utility value.
• NP‐Hard problem 
… 
Challenge 4 – Optimization problem (2) 
• Problem modeling 
– Schema group: Affinity matrix 
– Anonymized schema: Affinity instance 
• Affinity instance is an affinity matrix with some empty cells
[Figure: schemas $s_1$, $s_2$, $s_3$ with attributes (a1, a2), (b1, b2), (c1, c2). Affinity matrix rows: (a1, b1, c1) and (a2, b2, c2). Affinity instance with rows (a1, b1, empty) and (a2, b2, c2) = anonymized schema {a1, b1}, {a2, b2, c2}. Affinity instance with rows (a1, b1, c1) and (empty, b2, empty) = anonymized schema {a1, b1, c1}, {b2}.]
→ Need to find an affinity instance that satisfies the privacy constraints and has the
highest utility value
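The modeling step above can be sketched in Python as follows. The sketch assumes that an affinity matrix is a list of rows (one row per abstract attribute, one column per schema) and that an empty cell is represented by `None`; both choices, and the function names, are illustrative assumptions rather than the authors' representation.

```python
def to_anonymized_schema(instance):
    """Turn an affinity instance into an anonymized schema.

    Each row that still has at least one non-empty cell becomes one
    abstract attribute (a set of the remaining original attributes).
    """
    schema = []
    for row in instance:
        abstract = frozenset(cell for cell in row if cell is not None)
        if abstract:
            schema.append(abstract)
    return schema


def is_instance_of(instance, matrix):
    """Check that the instance only blanks out cells of the affinity matrix."""
    return len(instance) == len(matrix) and all(
        len(r) == len(m) and all(c is None or c == o for c, o in zip(r, m))
        for r, m in zip(instance, matrix)
    )


# Example values taken from the slide's figure.
matrix = [["a1", "b1", "c1"], ["a2", "b2", "c2"]]
inst1 = [["a1", "b1", None], ["a2", "b2", "c2"]]
inst2 = [["a1", "b1", "c1"], [None, "b2", None]]

print(to_anonymized_schema(inst1))
print(to_anonymized_schema(inst2))
```

With this representation, the search space of the optimization problem is the set of all ways to blank out cells of the matrix.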
Challenge 4 – Optimization problem (4) 
• Overall solution: 
– Meta‐heuristic with 2 steps 
• Greedy algorithm: find a possible solution 
• Randomized local search: find optimal solution 
– Improve performance 
• Divide and conquer: partition the set of constraints into independent sets
→ satisfy each set independently
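The divide-and-conquer step can be sketched as a grouping of constraints into connected components. The slides do not say when two constraints count as independent, so this sketch assumes a simple rule: two presence constraints are dependent if their attribute sets share at least one attribute. The tuple layout `(schema, attributes, threshold)` and the function name are hypothetical, and the greedy and randomized local search steps are not shown.

```python
def partition_constraints(constraints):
    """Group constraints into independent sets.

    Assumption: constraints that share an attribute (directly or through a
    chain of other constraints) must be handled together. Uses union-find.
    """
    parent = list(range(len(constraints)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # attribute -> index of the first constraint that mentioned it
    for i, (_schema, attrs, _theta) in enumerate(constraints):
        for attr in attrs:
            if attr in owner:
                parent[find(i)] = find(owner[attr])
            else:
                owner[attr] = i

    groups = {}
    for i, constraint in enumerate(constraints):
        groups.setdefault(find(i), []).append(constraint)
    return list(groups.values())


constraints = [
    ("s1", {"a1", "a2"}, 0.5),
    ("s1", {"a2"}, 0.3),
    ("s3", {"c1"}, 0.4),
]
for group in partition_constraints(constraints):
    print(group)
```

Each resulting group could then be passed to the solver separately, which is where the performance gain described on the slide would come from.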
Experiments - Setting 
Datasets: 
• Real data: 117 schemas 
• Synthetic data: vary the number of schemas and the number of attributes 
Evaluation Metrics: 
– Utility loss: measures the reduction in utility caused by the
privacy constraints
• $\Delta u = \frac{u_\emptyset - u_\Gamma}{u_\emptyset}$, where $u_\emptyset$ is the utility without constraints and $u_\Gamma$ is the utility with a
set of constraints $\Gamma$
– Privacy loss: measures the disagreement between actual
privacy $P = \{p_i\}$ and expected privacy $\Theta = \{\theta_i\}$.
• $\Delta p = KL(P \parallel \Theta) = \sum_i p_i \log \frac{p_i}{\theta_i}$
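Both evaluation metrics can be computed directly from their definitions. In this sketch, the natural logarithm and the convention that terms with $p_i = 0$ contribute zero are assumptions, since the slides fix neither. The function names are illustrative.

```python
import math


def utility_loss(u_unconstrained, u_constrained):
    """Relative utility reduction: (u_0 - u_Gamma) / u_0."""
    return (u_unconstrained - u_constrained) / u_unconstrained


def privacy_loss(p, theta):
    """KL(P || Theta) = sum_i p_i * log(p_i / theta_i).

    p[i] is the actual presence probability for constraint i and theta[i]
    is its threshold. Terms with p[i] == 0 are skipped (treated as 0).
    """
    return sum(pi * math.log(pi / ti) for pi, ti in zip(p, theta) if pi > 0)


print(utility_loss(8.0, 5.0))
print(privacy_loss([0.5, 0.5], [0.5, 0.5]))
```

When the actual probabilities equal the thresholds, the privacy loss is zero, which matches its reading as a measure of disagreement.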
Experiments – Computation Time 
• 100 schemas, 50 attributes, 1500 constraints
→ running time is about 6 s
[Figure: computation time, in log2 of msec.]
Experiment – Privacy & Utility 
• Validate the trade‐off between privacy and utility 
• Evaluation procedure 
– Relax constraint: increase privacy threshold $\theta$ to $(1 + r)\theta$, where $r$ is the relaxing ratio
• Observation 
– The stricter the privacy constraints, the greater the utility loss.
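Both utility loss and privacy loss
are normalized to [0,1]:
$\Delta u = \frac{\Delta u - \min_{\Delta u}}{\max_{\Delta u} - \min_{\Delta u}}$
$\Delta p = \frac{\Delta p - \min_{\Delta p}}{\max_{\Delta p} - \min_{\Delta p}}$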
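The evaluation procedure combines two small computations: relaxing a threshold by a ratio $r$ and min-max normalizing the measured losses. The sketch below shows both. The function names are illustrative, and returning zeros when all values are equal is an assumption made to avoid division by zero, not something stated on the slide.

```python
def relax(theta, r):
    """Relax a privacy threshold: theta -> (1 + r) * theta."""
    return (1 + r) * theta


def min_max_normalize(values):
    """Rescale a list of losses to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All measurements are equal; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(relax(0.4, 0.5))
print(min_max_normalize([2.0, 4.0, 6.0]))
```

Normalizing both losses to the same range is what allows them to be plotted against each other across relaxing ratios.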
Conclusion
• Introduced schema reuse with privacy constraints
• Defined privacy constraints
• Defined an anonymized schema from multiple schemas
• Defined a utility function for a certain anonymized schema
• Constructed an anonymized schema that satisfies privacy
constraints and maximizes the utility function
Thank you! 
Questions
