Association Rule Mining with Privacy Preservation
In Horizontally Distributed Databases
Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
Introduction
Look before you leap
The Flow
Association Rule Mining → Privacy Preservation → Horizontally Distributed Datasets
Before we start mining!
Data Mining
• Finding trends or patterns in large datasets
• Extracting useful information
• Gaining useful and unexpected insights
• Analyzing and predicting system behavior
• Draws on: Artificial Engineering, Machine Learning, Statistics, Database Systems
• Open question: Scalability?
Association Rule Learning
Introduced by Rakesh Agrawal, IBM Almaden Research Center
What is an Association Rule?
• 80% of people who buy bread and butter also buy milk
• {Bread, Butter} → {Milk}, where {Bread, Butter} is the antecedent and {Milk} is the consequent
Definitions
For the rule {Bread, Butter} → {Milk} ("80% of people who buy bread and butter also buy milk"):
• Antecedent: the prerequisite for the rule to be applied ({Bread, Butter})
• Consequent: the outcome ({Milk})
• Support: the percentage of transactions containing the itemset
• Confidence: the fraction of transactions containing the antecedent that also contain the consequent (here, 80%)
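A minimal sketch of how support and confidence are computed, assuming each transaction is a set of item names. The ten transactions are hypothetical, chosen so that the rule {Bread, Butter} → {Milk} has 80% confidence.

```python
# Minimal sketch: support and confidence of a rule X -> Y.
# Each transaction is a set of items; the ten transactions are hypothetical.

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also contain the consequent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base else 0.0


transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread"},
    {"butter", "eggs"},
    {"bread", "milk"},
    {"eggs"},
]

rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here 4 of the 10 transactions contain all three items (support 40%), and 4 of the 5 transactions containing bread and butter also contain milk (confidence 80%).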
Constraints
• Two different forms of constraints are used to generate the required association rules:
• Syntactic constraints: restrict the items (attributes) that may appear in a rule.
• Support constraints: require that a rule be supported by a minimum number of transactions in the set of transactions.
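A small sketch of applying both kinds of constraints to candidate rules. The rule representation, the allowed-item set and the support counts are hypothetical; a syntactic constraint could also take other forms (for example, restricting only the consequent).

```python
# Minimal sketch: filtering candidate rules with the two kinds of constraints.
# Rules are (antecedent, consequent) pairs of frozensets; support counts map an
# itemset to the number of transactions containing it. All values are hypothetical.

def passes_syntactic(rule, allowed_items):
    """Syntactic constraint: every item in the rule must be an allowed attribute."""
    antecedent, consequent = rule
    return (antecedent | consequent) <= allowed_items


def passes_support(rule, support_counts, min_count):
    """Support constraint: enough transactions must contain all items of the rule."""
    antecedent, consequent = rule
    return support_counts.get(antecedent | consequent, 0) >= min_count


def filter_rules(rules, allowed_items, support_counts, min_count):
    return [r for r in rules
            if passes_syntactic(r, allowed_items)
            and passes_support(r, support_counts, min_count)]


support_counts = {
    frozenset({"bread", "butter", "milk"}): 4,
    frozenset({"bread", "eggs"}): 1,
    frozenset({"butter", "beer"}): 5,
}
rules = [
    (frozenset({"bread", "butter"}), frozenset({"milk"})),
    (frozenset({"bread"}), frozenset({"eggs"})),
    (frozenset({"butter"}), frozenset({"beer"})),
]
allowed = {"bread", "butter", "milk", "eggs"}

kept = filter_rules(rules, allowed, support_counts, min_count=2)
for antecedent, consequent in kept:
    print(sorted(antecedent), "->", sorted(consequent))
```

The second rule fails the support constraint (1 < 2), and the third fails the syntactic constraint because "beer" is not an allowed item.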
Association Rule Learning in Large Datasets
Finding association rules in large datasets is split into two steps:
• Generating large itemsets: find all combinations of items whose support is above a minimum support threshold
• Generating association rules: from each large itemset, derive all rules that satisfy the minimum confidence threshold
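The two steps can be sketched with a simple level-wise (Apriori-style) search. The slide does not name a specific algorithm, so this choice, the transactions and both thresholds are assumptions made for illustration.

```python
from itertools import combinations

# Minimal level-wise (Apriori-style) sketch of the two steps; the transactions
# and thresholds are hypothetical.

def large_itemsets(transactions, min_support):
    """Step 1: map every itemset with support >= min_support to its support."""
    n = len(transactions)
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    large = {}
    size = 1
    while level:
        frequent = {}
        for candidate in level:
            supp = sum(1 for t in transactions if candidate <= t) / n
            if supp >= min_support:
                frequent[candidate] = supp
        large.update(frequent)
        size += 1
        # Join step: unions of two frequent itemsets that are one item larger,
        # kept only if every subset one item smaller is also frequent.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return large


def generate_rules(large, min_confidence):
    """Step 2: rules X -> Y where X u Y is large and confidence >= min_confidence."""
    rules = []
    for itemset, supp in large.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                conf = supp / large[ante]   # every subset of a large itemset is large
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, supp, conf))
    return rules


transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
]
large = large_itemsets(transactions, min_support=0.4)
rules = generate_rules(large, min_confidence=0.6)
for ante, cons, supp, conf in rules:
    print(sorted(ante), "->", sorted(cons), f"support={supp:.2f} confidence={conf:.2f}")
```

Pruning candidates whose smaller subsets are not frequent is what keeps the search tractable on large datasets: an itemset can only be large if all of its subsets are.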
Association Rule Learning in Distributed Datasets and Privacy Preservation
Preview
• Most tools for mining association rules assume that the data to be analyzed can be collected at one central site.
• Privacy concerns, however, can prevent this collection of data.
• Alternative mining methods must therefore be devised for distributed datasets to make the mining process feasible while preserving privacy.
Privacy Preservation
• Dataset: combined data of Twitter and Facebook
• Rule: What percentage of people log in to a social networking site and post within the next 2 minutes?
Privacy Preservation
• Horizontally partitioned data (example: insurance companies)
• Companies A, B and C each hold records with the same attributes (Patient ID, Disease, Prescription, Effect), but for different patients.
• Rule being mined: Does a procedure have an unusual rate of complications?
• Implications:
• A company with a high rate of failures for the procedure may change its policies in response.
• At the same time, if this rule is exposed, it may be a huge problem for the company.
• The risks outweigh the gains.
Privacy Preservation
• Vertically partitioned data: each site holds different attributes for the same customers.
Credit Card No.     Bought tablet
2365987545623526    1
3639871526589414    1
4365845698742563    1
5962845632561200    1
6621563289657412    1

Credit Card No.     Bought TCover
2365987545623526    0
7639871526589414    1
4365845698742563    1
9962845632561200    0
6621563289657412    1
• The common property linking the two tables is the credit card number, which is not one we can exploit.
Mining of Association Rules in Horizontally Partitioned Databases
Fundamental Steps
What we want
• To compute association rules without revealing private information, obtaining:
• The global support
• The global confidence
What we have
• Only the following information is available at each site:
• Local support
• Local confidence
• Size of the local database
Even this information may not be shared freely between sites, but we'll get to that.
Calculating Required Values
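A minimal sketch of how the global values can be combined from per-site values, assuming each site reveals its local support count (local support × database size) and its database size. Exact fractions are used to avoid rounding issues; the function names are illustrative.

```python
from fractions import Fraction

# Minimal sketch, assuming each site reveals its local support count for an
# itemset and its database size. Fractions keep the arithmetic exact.

def global_support(local_counts, db_sizes):
    """Global support = total count of the itemset / total number of transactions."""
    return Fraction(sum(local_counts), sum(db_sizes))


def is_globally_large(local_counts, db_sizes, min_support):
    """Equivalent test: sum over sites of (count_i - s * size_i) >= 0."""
    excess = sum(c - min_support * n for c, n in zip(local_counts, db_sizes))
    return excess >= 0


def global_confidence(rule_counts, antecedent_counts):
    """Global confidence of X -> Y = sum of count(X u Y) / sum of count(X)."""
    return Fraction(sum(rule_counts), sum(antecedent_counts))


counts = [5, 6, 20]        # local counts of itemset ABC at sites A, B, C
sizes = [100, 200, 300]    # local database sizes
s = Fraction(5, 100)       # 5% minimum support

print(float(global_support(counts, sizes)), is_globally_large(counts, sizes, s))
```

The second form, summing (count_i - s × size_i), matters because it only needs each site's excess over the threshold; that is exactly the quantity the data distortion phase later hides behind a random number.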
Problems with the approach
• It protects individual privacy, but each site still has to disclose information.
• It reveals the local support and confidence of a rule at each site.
• If revealed, this information can be harmful to an organization.
Introducing the two Algorithms
• We will explore two algorithms that have been used for this problem.
• The first algorithm combines encryption with data distortion when sharing data between sites.
• The second algorithm uses a particular secure sum technique, the CK Secure Sum, instead of encryption.
Algorithm Uno
Some people are honest
Two-phase algorithm
• Phase 1: Uses encryption to mine the large itemsets
• Phase 2: Uses a random number to preserve the privacy of each site (assumes a system of three or more parties)
Phase 1: Commutative Encryption
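A sketch of the commutative encryption idea, assuming a Pohlig-Hellman-style exponentiation cipher E_e(x) = x^e mod P; the slide does not specify the cipher. Because encrypting with key a and then b gives the same result as b then a, identical itemsets from different sites become identical ciphertexts and can be merged without being read. The prime, the toy encoding and the site data are assumptions; this is not a secure configuration.

```python
import math
import random

# Toy parameters for illustration only: a Mersenne prime modulus and
# exponentiation keys (Pohlig-Hellman style). Not a vetted secure setup.
P = 2**127 - 1


def make_key(rng):
    """Return (e, d) with gcd(e, P - 1) = 1 and e * d = 1 mod (P - 1)."""
    while True:
        e = rng.randrange(3, P - 1)
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)


def encode(itemset):
    """Reversibly map a short itemset like {'A', 'B'} to an integer in [1, P)."""
    raw = ",".join(sorted(itemset)).encode("ascii")
    assert 0 < len(raw) <= 15, "toy encoding only supports short itemsets"
    return int.from_bytes(raw, "big")


def decode(value):
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return frozenset(raw.decode("ascii").split(","))


def encrypt(x, e):
    return pow(x, e, P)


def decrypt(y, d):
    return pow(y, d, P)


def secure_union(local_large_itemsets, rng):
    """Union of the sites' locally large itemsets via commutative encryption."""
    keys = [make_key(rng) for _ in local_large_itemsets]
    pooled = set()
    for itemsets in local_large_itemsets:
        values = [encode(s) for s in itemsets]
        # Every site adds its own encryption layer. Since (x^a)^b = (x^b)^a mod P,
        # the same itemset from two sites yields the same ciphertext, so
        # duplicates collapse without anyone seeing the plaintext.
        for e, _ in keys:
            values = [encrypt(v, e) for v in values]
        pooled.update(values)
    # Each site then removes its own layer; the order again does not matter.
    for _, d in keys:
        pooled = {decrypt(v, d) for v in pooled}
    return {decode(v) for v in pooled}


sites = [
    [{"A", "B"}, {"A", "B", "C"}],  # site A (hypothetical)
    [{"A", "B", "C"}],              # site B
    [{"B", "D"}],                   # site C
]
print(sorted(sorted(s) for s in secure_union(sites, random.Random(7))))
```

This single-process simulation only demonstrates the commutativity that the protocol relies on; in a real run each site holds only its own key and the ciphertexts travel between sites.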
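Phase 2: Data Distortion
Example: is itemset ABC globally large at a minimum support of 5%?
• Site A: count(ABC) = 5, DB size = 100
• Site B: count(ABC) = 6, DB size = 200
• Site C: count(ABC) = 20, DB size = 300
Site A picks a random number R = 17. Each site adds (count - 5% × size) to the value it receives and passes the result on:
• Site A: 17 + 5 - 5% × 100 = 17, sent to Site B
• Site B: 17 + 6 - 5% × 200 = 13, sent to Site C
• Site C: 13 + 20 - 5% × 300 = 18, sent back to Site A
Site A checks whether 18 >= R = 17. It is, so ABC is globally large, and no site has learned another site's count.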
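The ring pass above can be sketched as follows, assuming the sites are ordered A, B, C and site A chooses the mask R. Plain integers and fractions are used for readability; a real deployment would typically do this arithmetic modulo a large number so that intermediate values reveal nothing.

```python
import random
from fractions import Fraction

# Minimal sketch of the ring pass, assuming sites are ordered A -> B -> C and
# site A picks the random mask R.

def ring_pass(counts, sizes, min_support, r):
    """Return (is_large, messages) where messages are the values sent along the ring."""
    value = r
    messages = []
    for count, size in zip(counts, sizes):
        # Each site adds its excess support (count - s * size) to what it received.
        value = value + count - min_support * size
        messages.append(value)
    # The last message returns to the first site, which compares it with R.
    return messages[-1] >= r, messages


counts = [5, 6, 20]
sizes = [100, 200, 300]
s = Fraction(5, 100)

is_large, messages = ring_pass(counts, sizes, s, r=17)
print(is_large, [int(m) for m in messages])   # True [17, 13, 18]
```

The decision does not depend on the value of R; R only hides the running total, so any site other than A sees a number that looks random to it.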
Problems with the Algorithm
• Doesn’t work for a two-party system: the first site can subtract R and its own contribution from the final value to learn the other site's value
• Assumes honest parties
• Assumes Boolean responses (an item is present or absent) when counting support for rules, rather than a subjective or weighted approach
• The encryption overhead increases as the number of candidate itemsets increases
• The encryption overhead also grows in direct proportion to the number of sites or partitions
I got
……
Algorithm Dua
Don’t trust anyone
CK Secure Sum
• Primarily used to tackle semi-honest sites.
• The data of each site is broken down into segments.
• The two neighbors on either side of a site could collude to learn the value of the site between them.
• The neighbors are changed in each round, so colluding neighbors can obtain at most one such segment.
Changing Neighbors
Arrangement of parties P1 to P4 around the ring in each round:
Round 1: P1, P2, P3, P4
Round 2: P1, P2, P4, P3
Round 3: P1, P4, P2, P3
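A minimal sketch of a segmented secure sum with changing neighbors, in the semi-honest model: each party splits its value into one random segment per round, and each round runs an ordinary masked ring sum over a different arrangement. The segment bounds, the mask range and the party values are assumptions; the three ring orders follow the rounds shown above.

```python
import random

# Minimal sketch of a changing-neighbors, segmented secure sum (semi-honest
# parties). Segment bounds, the mask range and the sample values are assumptions.

def split_into_segments(value, k, rng, bound=10**6):
    """Split `value` into k random integers that add up to `value`."""
    parts = [rng.randrange(-bound, bound) for _ in range(k - 1)]
    parts.append(value - sum(parts))
    return parts


def ring_sum(order, contribution, rng, mask_bound=10**9):
    """One secure-sum pass around the ring given by `order`."""
    r = rng.randrange(mask_bound)        # chosen by the initiator order[0]
    running = r
    for party in order:
        running += contribution[party]   # each party sees only a masked running total
    return running - r                   # the initiator removes its mask


def ck_secure_sum(values, rounds, rng):
    """Each round sums one segment per party over a differently arranged ring."""
    segments = {p: split_into_segments(v, len(rounds), rng) for p, v in values.items()}
    total = 0
    for i, order in enumerate(rounds):
        total += ring_sum(order, {p: segments[p][i] for p in values}, rng)
    return total


rounds = [
    ["P1", "P2", "P3", "P4"],   # Round 1
    ["P1", "P2", "P4", "P3"],   # Round 2
    ["P1", "P4", "P2", "P3"],   # Round 3
]
values = {"P1": 12, "P2": 7, "P3": 30, "P4": 5}   # hypothetical local counts
print(ck_secure_sum(values, rounds, random.Random(3)))   # 54
```

With these arrangements, P1's neighbors are {P2, P4}, then {P2, P3}, then {P4, P3}, so no fixed pair of neighbors sees P1's contribution in more than one round.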
Conclusion
The moral of the story...
Before you leave
• Association rules play a vital role in data mining.
• Through careful analysis, items that appear unrelated can turn out to have a logical connection.
• This makes association rule mining very useful for predicting patterns and foreseeing trends in consumer behavior, choices and preferences.
• Association rules are one of the most practical ways for a business to reap the harvest of data mining.
There are no dumb questions
(No questions please shhhh…)


Speaker Notes

  1. Replace arrows :P
  2. Support: gives an idea of the feasibility of a rule; sometimes applied to the antecedent only
  3. Replace arrow