Fostering an Ecosystem for
Smartphone Privacy
Jason Hong
jasonh@cs.cmu.edu
New Kinds of Guidelines and Regulations
US Federal Trade Commission guidelines
California Attorney General recommendations
European Union General Data Protection Regulation (GDPR)
My Research Focused on Smartphones
and Privacy
• Over 1B smartphones
sold every year
– Perhaps most widely
deployed platform
• Well over 100B apps
downloaded on each of
Android and iOS
• Incredibly intimate devices
Smartphones are Intimate
Fun Facts about Millennials
• 83% sleep with phones
Smartphones are Intimate
Fun Facts about Millennials
• 83% sleep with phones
• 90% check first thing in morning
Smartphones are Intimate
Fun Facts about Millennials
• 83% sleep with phones
• 90% check first thing in morning
• 1 in 3 use in bathroom
Smartphone Data is Intimate
Who we know
(contacts + call log)
Sensors
(accel, sound, light)
Where we go
(gps, photos)
The Opportunity and the Risk
• There are all these
amazing things we
could do
– Healthcare
– Urban analytics
– Sustainability
• But only if we can
legitimately address
privacy concerns
– Spam, misuse, breaches
http://www.flickr.com/photos/robby_van_moor/478725670/
My Main Point: We Need to Foster a
Better Ecosystem of Privacy
• Today, too much burden is on end-users
– Should I install this app?
– What are all the settings I need to know?
– What are all the terms and conditions?
– Trackers, cookies, VPNs, anonymizers, etc
• We need a better ecosystem for privacy
– Push burden from end-users onto rest of ecosystem
– Analogy: Spam email
– Other players: OS, app stores, developers, services,
crowds, policy makers, journalists
Today’s Talk
• Why is privacy hard?
• Our research in smartphone privacy
– PrivacyGrade.org for grading app privacy
– Studies on what developers know about privacy
– Helping developers
– Helping app stores
• What you can do to help with privacy
Why is Privacy Hard?
#1 Privacy is a broad and fuzzy term
• Privacy is a broad umbrella term that captures
concerns about our relationships with others
– The right to be left alone
– Control and feedback over one’s data
– Anonymity (popular among researchers)
– Presentation of self (impression management)
– Right to be forgotten
– Contextual integrity (take social norms into account)
• Each leads to a different way of handling privacy
– Right to be left alone -> do not call list, blocking
– Right to be forgotten -> delete from search engines
Today, Will Focus on One Form of Privacy
Data Privacy
• Data privacy is primarily about how orgs collect,
use, and protect sensitive data
– Focuses on Personally Identifiable Information (PII)
• Ex. Name, street address, unique IDs, pictures
– Rules about data use, privacy notices
• Led to the Fair Information Practices
– Notice / Awareness
– Choice / Consent
– Access / Participation
– Integrity / Security
– Enforcement / Redress
Some Comments on Data Privacy
• Data privacy tends to be procedurally-oriented
– Did you follow this set of rules?
– Did you check off all of the boxes?
– Somewhat hard to measure too (Better? Worse?)
– This is in contrast to outcome-oriented
• Many laws embody the Fair Information Practices
– GDPR, HIPAA, Financial Privacy Act, COPPA, FERPA
– But, enforcement is a weakness here
• If an org violates, can be hard to detect
• In practice, limited resources for enforcement
Why is Privacy Hard?
#2 No Common Set of Best Practices for Privacy
• Security has lots of best practices + tools for devs
– Use TLS/SSL
– Devices should not have common default passwords
– Use firewalls to block unauthorized traffic
• For privacy, not so much
– Choice / Consent: Best way of offering choice?
– Access / Participation: Best way of offering access?
– Notice / Awareness: Typically privacy policies, useful?
• New York Times privacy policy
• Still state of the art for privacy notices
• But no one reads these
Why is Privacy Hard?
#3 Technological Capabilities Rapidly Growing
• Data gathering easier and more pervasive
– Everything on the web (Google + FB)
– Sensors (smartphones, IoT)
• Data storage and querying bigger and faster
• Inferences more powerful
– Some examples shortly
• Data sharing more widespread
– Social media
– Lots of companies collecting and sharing with each
other, hard to explain to end-users (next slide)
• 2010 diagram of ad tech ecosystem
• Most of these are collecting and using
data about you
Inferences about people more powerful
Built a logistic regression to predict sexuality based on what your
friends on Facebook disclosed, even if you didn’t disclose
“[An analyst at Target] was able to identify about
25 products that… allowed him to assign each
shopper a ‘pregnancy prediction’ score. [H]e
could also estimate her due date to within a small
window, so Target could send coupons timed to
very specific stages of her pregnancy.” (NYTimes)
Why is Privacy Hard?
#4 Multiple Use of the Same Data
• The same data can help as well as harm (or
creep people out) depending on use and re-use
Recap of Why Privacy is Hard
• Privacy is a broad and fuzzy term
• No common set of best practices
• Technological capabilities rapidly growing
• Same data can be used for good and for bad
• Note that these are just a few reasons,
there are many, many more
– But enough so that we have common ground
Some Smartphone Apps Use Your Data in
Unexpected Ways
Shared your location, gender, unique
phone ID, and phone number with advertisers
Uploaded your entire
contact list to their server
(including phone #s)
More Unexpected Uses of Your Data
Three example apps (named only in the slide images) and the data each accesses:
location data + unique device ID; location data + network access + unique
device ID; location data + microphone + unique device ID
PrivacyGrade.org
• Improve transparency
• Assign privacy grades to all
1M+ Android apps
• Does not help devs directly
Expectations vs Reality
Privacy as Expectations
Use crowdsourcing to compare what people expect
an app to do vs what an app actually does
App Behavior
(What an app
actually does)
User Expectations
(What people think
the app does)
How PrivacyGrade Works
• We crowdsourced people’s expectations of
a core set of 837 apps
– Ex. “How comfortable are you with
Drag Racing using your location for ads?”
• We inferred each data use’s purpose by examining
which third-party libraries the app uses
• Created a model to predict people’s likely
privacy concerns and applied to 1M Android apps
How PrivacyGrade Works
How PrivacyGrade Works
• Long tail distribution of libraries
• We focused on the top 400 libraries, which cover the
vast majority of cases (simplified grading sketch below)
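Here is a minimal sketch of the expectation-gap idea in code. It is only an illustration: the names (PrivacyGrader, comfortScores, gradeFor) and the letter-grade cutoffs are hypothetical, and the actual PrivacyGrade model is more sophisticated than a simple average of comfort ratings.

import java.util.List;
import java.util.Map;

// Illustrative sketch (not the actual PrivacyGrade model): average crowdsourced
// comfort scores (-2 = very uncomfortable .. +2 = very comfortable) over the
// sensitive (permission, purpose) pairs an app actually uses, then map the
// average to a letter grade.
public class PrivacyGrader {
    // Hypothetical scores learned from crowd responses, e.g. "LOCATION|ADVERTISING" -> -1.3
    private final Map<String, Double> comfortScores;

    public PrivacyGrader(Map<String, Double> comfortScores) {
        this.comfortScores = comfortScores;
    }

    public char gradeFor(List<String> observedUses) {
        if (observedUses.isEmpty()) return 'A';  // no sensitive data used
        double total = 0.0;
        for (String use : observedUses) {
            total += comfortScores.getOrDefault(use, 0.0);
        }
        double average = total / observedUses.size();
        if (average >= 1.0) return 'A';
        if (average >= 0.0) return 'B';
        if (average >= -1.0) return 'C';
        return 'D';
    }
}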
Impact of PrivacyGrade
• Popular Press
– NYTimes, CNN, BBC, CBS, more
• Government
– Earlier work helped lead to FTC fines
• Google
– Google has something like PrivacyGrade internally
• Developers
Market Failure for Privacy
• Let’s say you want to purchase a web cam
– Go into store, can compare price, color, features
– But can’t easily compare security (hidden feature)
– So, security does not influence customer purchases
– So, devs not incentivized to improve
• Same is true for privacy
– This is where things like PrivacyGrade can help
– Improve transparency, address market failures
– More broadly, what other ways to incentivize?
Study 1
What Do Developers Know about Privacy?
• A lot of privacy research is about end-users
– Very little about developers
• Interviewed 13 app developers
• Surveyed 228 app developers
– Got a good mix of experience levels and org sizes
• What knowledge? What tools used? Incentives?
• Are there potential points of leverage?
Balebako et al, The Privacy and Security Behaviors
of Smartphone App Developers. USEC 2014.
Study 1 Summary of Findings
Third-party Libraries Problematic
• Use ads and analytics to monetize
• Hard to understand their behaviors
– A few didn’t know they were using libraries
(based on inconsistent answers)
– Some didn’t know the libraries collected data
– “If either Facebook or Flurry had a privacy policy that
was short and concise and condensed into real
English rather than legalese, we definitely would
have read it.”
– In a later study we did on apps, we found that 40% of apps
used sensitive data only because of libraries [Chitkara 2017]
Study 1 Summary of Findings
Devs Don’t Know What to Do
• Low awareness of existing privacy guidelines
– Fair Information Practices, FTC guidelines, Google
– Often just ask others around them
• Low perceived value of privacy policies
– Mostly protection from lawsuits
– “I haven’t even read [our privacy policy]. I mean, it’s
just legal stuff that’s required, so I just put in there.”
Study 2
How do developers address privacy when coding?
• Interviewed 9 Android developers
• Semi-structured interview probing about their
three most recent apps
– Their understanding of privacy
– Any privacy training they received
– What data collected in app and how used
• Libraries used?
• Was data sent to cloud server?
• How and where data stored?
– We also checked their answers against the app itself when it was on an app store
Study 2 Findings
Inaccurate Understanding of Their Own Apps
• Some data practices they claimed didn’t match
app behaviors
• Lacked knowledge of library behaviors
• Fast iterations led to changes in data collection
and data use
• Team dynamics
– Division of labor, don’t know what other devs doing
– Turnover, use of sensitive data not documented
Study 2 Findings
Lack of Knowledge of Alternatives
• Many alternatives exist, but devs often went with the
first solution they found (e.g. on StackOverflow)
• Example: many apps use some kind of identifier,
and different identifiers have different tradeoffs (see the sketch after this list)
– Hardware identifiers (riskiest since persistent)
– Application identifier (email, hashcode)
– Advertising identifier
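To make these tradeoffs concrete, here is a minimal sketch that prefers the user-resettable advertising ID over persistent hardware identifiers. It assumes Google Play Services is available and simplifies error handling; the class name IdentifierChoice is just for this example, and getAdvertisingIdInfo must be called off the main thread.

import android.content.Context;
import com.google.android.gms.ads.identifier.AdvertisingIdClient;

// Minimal sketch: prefer the user-resettable advertising ID over persistent
// hardware identifiers (IMEI, serial number), which users cannot reset.
// Requires Google Play Services; must be called off the main thread.
public class IdentifierChoice {
    public static String getAdvertisingId(Context context) throws Exception {
        AdvertisingIdClient.Info info = AdvertisingIdClient.getAdvertisingIdInfo(context);
        if (info.isLimitAdTrackingEnabled()) {
            return null;  // respect the user's "limit ad tracking" setting
        }
        return info.getId();  // resettable ID intended for advertising use cases
    }
}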
Study 2 Findings
Lack of Motivation to Address Privacy Issues
• Might ignore privacy issues if not required
– Ex. Get location permission for one reason (maps),
but also use for other reasons (ads)
– Ex. Get name and email address, only need email
– Ex. Get device ID because no permission needed
• Android permissions and Play Store requirements
useful in forcing devs to improve
– In Android, devs have to declare use of most sensitive data (sketch after this list)
– Google Play has requirements too (ex. privacy policy)
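For illustration, the sketch below shows the standard AndroidX runtime check and request for location; the permission must also be declared in the app manifest. The helper class name and request code are just for this example.

import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

// Dangerous permissions like location must be declared in the manifest
// (<uses-permission>) and granted by the user at runtime.
public class LocationPermissionHelper {
    private static final int REQUEST_LOCATION = 1;

    public static void ensureLocationPermission(Activity activity) {
        if (ContextCompat.checkSelfPermission(activity,
                Manifest.permission.ACCESS_FINE_LOCATION) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(activity,
                    new String[] { Manifest.permission.ACCESS_FINE_LOCATION }, REQUEST_LOCATION);
        }
    }
}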
How to Get People to Change Behaviors?
Security Sensitivity Stack
Awareness: Does the person know of the existing threat? Can the person identify an attack / problem?
Knowledge: Does the person know tools, behaviors, and strategies to protect themselves? Can the person use those tools, behaviors, and strategies?
Motivation: Does the person care?
Security Sensitivity Stack Adapted for
Developers and Privacy
Awareness: Are devs aware of the privacy problem? (Ex. identifier tradeoffs, library behavior)
Knowledge: Do devs know how to address it? (Ex. might not know the right API call)
Motivation: Do devs care? (Ex. sometimes devs ignore issues if not required)
Privacy-Enhanced Android
• A large DARPA project to improve privacy
• Key idea: have devs declare in their apps the purpose
for which sensitive data is being used
– Devs select from a small set of defined purposes
• Today: “Uses location”
• Tomorrow: “Uses location for advertising”
– Use these purposes to help developers
• Managing data better, generate privacy policies, etc
– … to check app behaviors throughout ecosystem
– … for new kinds of GUIs explaining app behaviors
Helping Developers
PrivacyStreams Programming Model
• Observations
– Most apps don’t need raw data (e.g. exact GPS coordinates vs city-level location)
– Many ancillary issues (threads, format, different APIs)
• PrivacyStreams works like Unix pipes on streams
– Easier for developers (threading, uniform API + format)
– Devs never see raw data, only final outputs
– Also easier to analyze, since one line of code
• “This app uses your microphone only to get loudness”
UQI.getData(Audio.recordPeriodic(DURATION, INTERVAL),
Purpose.HEALTH("monitor sleep"))
.setField("loudness", calcLoudness(Audio.AUDIO_DATA))
.forEach("loudness", callback);
Helping Developers
Coconut IDE Plugin
• Developers add some Java annotations for each
use of sensitive data (illustrative example below)
• Can offer alternatives
• Can aggregate all data use in one place
• (Future) Can auto-generate privacy policies
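To give a flavor of annotation-based privacy documentation, here is a hypothetical example. The annotation name and its fields are assumptions made for this transcript, not necessarily Coconut's actual API.

// Hypothetical annotation, for illustration only (not necessarily Coconut's real API).
// An IDE plugin could read such metadata to suggest safer alternatives and to
// aggregate all sensitive data use in one place.
@interface SensitiveDataUse {
    String dataType();
    String purpose();
    String retention();
}

public class NearbySearch {
    @SensitiveDataUse(dataType = "location", purpose = "find nearby stores", retention = "session only")
    void searchNearby(double latitude, double longitude) {
        // ... coarse location is enough here; nothing is written to disk ...
    }
}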
Helping App Stores
• Ways of checking the behavior of apps
– Ex. When devs upload to app store
• Decompile the app and examine the text
– If the app uses location data and we see strings like
exif, photo, or tag, it is probably geotagging (see the sketch after this list)
• Check network data of apps
– Similar to above, except for network traffic
• Add safety checks to apps
– If app has well-defined policy, can add extra checks
to app to make sure it does the right thing
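Here is a minimal sketch of the string-based check described above, assuming the app's code has already been decompiled to plain text; the keyword list and class name are illustrative.

import java.util.Arrays;
import java.util.List;

// Minimal sketch: if an app requests location data and its decompiled code
// contains photo/EXIF-related strings, flag it as probably geotagging photos.
public class GeotagHeuristic {
    private static final List<String> KEYWORDS = Arrays.asList("exif", "photo", "tag");

    public static boolean probablyGeotagging(boolean usesLocation, String decompiledText) {
        if (!usesLocation) return false;
        String text = decompiledText.toLowerCase();
        for (String keyword : KEYWORDS) {
            if (text.contains(keyword)) return true;
        }
        return false;
    }
}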
Addressing Market Failure (Work in Progress)
Who Knows What About Us and Why
Addressing Market Failure (Work in Progress)
Who Knows What About Us and Why
ProtectMyPrivacy (PMP) for Making
Decisions
• For jailbroken iOS and Android
– Intercept calls to sensitive data
– Over 200k people using iOS PMP
– Over 6M decisions (+ stack traces)
– 20 data types protected
• Recommender system too (simple sketch after this list)
– Have recs for 97% of top 10k apps
• User study with 1321 people
– 1 year with old model (by app)
– 1 month with new model (by library)
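As a rough illustration of how such recommendations could be generated, the sketch below simply recommends the majority crowd decision for a given app or library and data type; the actual PMP recommender may work differently.

import java.util.List;

// Illustrative assumption only (the actual PMP recommender may differ):
// recommend the majority crowd decision for a given app/library and data type.
public class CrowdRecommender {
    public static String recommend(List<Boolean> userDecisions) {  // true = allow
        if (userDecisions.isEmpty()) return "NO_RECOMMENDATION";
        long allows = userDecisions.stream().filter(allowed -> allowed).count();
        return (allows * 2 >= userDecisions.size()) ? "ALLOW" : "DENY";
    }
}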
Long Tail of Third-party Libraries
Chitkara, S. et al. Why does this app need my Location? Context aware
Privacy Management on Android. In IMWUT 1(3). 2017.
Most Popular 30 Libraries Account for
Over Half of all Sensitive Data Access
About 40% Apps Use Sensitive Data Only
because of Third-party Libraries
Allowing or Denying Access by Library
(vs by App) Reduces #Decisions Made
How You Can Help with Privacy
Some Opportunities
• Imagine a gigantic blob of privacy work
– This is the amount of work needed for “good” privacy
– Right now, most of this blob is managed by end-users
– What are useful ways of slicing up this blob so that
other parts of the ecosystem can manage it better?
• Better decision making by crowds or by experts
– How good are decisions? Ways of making better ones?
– How to easily share these decisions?
How You Can Help with Privacy
Some Opportunities
• Economics of privacy
– GDPR; other ways of addressing market failures?
• Ex. Consumer Reports really interested in this area
– Third-party services and libraries a major problem
• Incentives for privacy
– Improving awareness, knowledge, motivation for devs
– User attention for privacy
• Special cases of privacy law
– Privacy for children, healthcare, finances
How can we create
a connected world we
would all want to live in?
Thanks!
More info at cmuchimps.org
or email jasonh@cs.cmu.edu
Special thanks to:
• DARPA Brandeis
• Google
• Yuvraj Agarwal
• Shah Amini
• Rebecca Balebako
• Mike Czapik
• Matt Fredrikson
• Shawn Hanna
• Haojian Jin
• Tianshi Li
• Yuanchun Li
• Jialiu Lin
• Song Luan
• Swarup Sahoo
• Mike Villena
• Jason Wiese
• Alex Yu
• And many more…
• CMU Cylab
• NQ Mobile


Speaker notes

  1. Every week, there are headline news articles like these, capturing people’s growing concerns about technology and privacy.
  2. There are also a growing number of guidelines and regulations about how these technologies should be designed and be operated. So even if you don’t personally believe privacy is an issue, it’s still something that has to be addressed in the design and operation of systems we build. https://www.ftc.gov/sites/default/files/documents/reports/mobile-privacy-disclosures-building-trust-through-transparency-federal-trade-commission-staff-report/130201mobileprivacyreport.pdf https://oag.ca.gov/sites/all/files/agweb/pdfs/privacy/privacy_on_the_go.pdf
  3. Will just focus on smartphones for now, since they are the most pervasive devices we have today Representative of many of the problems and opportunities we will be grappling with in the future Smartphones are everywhere http://marketingland.com/report-us-smartphone-penetration-now-75-percent-117746 http://www.pewinternet.org/fact-sheets/mobile-technology-fact-sheet/ http://www.androidauthority.com/google-play-store-vs-the-apple-app-store-601836/
  4. These devices are also incredibly intimate, perhaps the most intimate computing devices we’ve ever created. From Pew Internet and Cisco 2012 study Main stats on this page are from: http://www.cisco.com/c/en/us/solutions/enterprise/connected-world-technology-report/index.html#~2012 Additional stats about mobile phones: http://www.pewinternet.org/fact-sheets/mobile-technology-fact-sheet/ ----------------------- What’s also interesting are trends in how people use these smartphones http://blog.sciencecreative.com/2011/03/16/the-authentic-online-marketer/ http://www.generationalinsights.com/millennials-addicted-to-their-smartphones-some-suffer-nomophobia/ In fact, Millennials don’t just sleep with their smartphones. 75% use them in bed before going to sleep and 90% check them again first thing in the morning.  Half use them while eating and third use them in the bathroom. A third check them every half hour. Another fifth check them every ten minutes. A quarter of them check them so frequently that they lose count. http://www.androidtapp.com/how-simple-is-your-smartphone-to-use-funny-videos/ Pew Research Center Around 83 percent of those 18- to 29-year-olds sleep with their cell phones within reach.  http://persquaremile.com/category/suburbia/
  5. From Cisco report
  6. Also from Cisco report
  7. But it’s not just the devices that are intimate, the data is also intimate. Location, call logs, SMS, pics, more
  8. A grand challenge for computer science http://www.flickr.com/photos/robby_van_moor/478725670/
  9. Data privacy and personal privacy
  10. In contrast to personal privacy, which is mostly about what you do to manage your persona Lots of forms of FIPs, here are the ones from FTC
  11. In contrast to personal privacy, which is mostly about what you do to manage your persona
  12. Hash user passwords
  13. Grade 12.5 About 10 min to read So based on Lorrie and Aleecia’s work, it will take 25 full days to read all privacy policies of all web sites But this assumes people read it Rationale behavior not to read privacy policies: we want to use the service, painful to read, clear cost but unclear benefit
  14. https://adexchanger.com/venture-capital/luma-partners-ad-tech-ecosystem-map-the-december-2010-update/ 2010 diagram
  15. http://firstmonday.org/article/view/2611/2302
  16. http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score.  Later in the article, talks about how one father accidentally discovered his daughter was pregnant b/c of these ads
  17. But, Many Smartphone Apps Access this Sensitive Data in Surprising Ways
  18. Moto Racing / https://play.google.com/store/apps/details?id=com.motogames.supermoto
  19. On the left is Nissan Maxima gear shift. It turns out my brother was driving in 3rd gear for over a year before I pointed out to him that 3 and D are separate. The older Nissan Maxima gear shift on the right makes it hard to make this mistake.
  20. Lin et al, Modeling Users’ Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings. SOUPS 2014. INTERNET, READ_PHONE_STATES, ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, CAMERA, GET_ACCOUNTS, SEND_SMS, READ_SMS, RECORD_AUDIO, BLUE_TOOTH and READ_CONTACT
  21. INTERNET, READ_PHONE_STATES, ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, CAMERA, GET_ACCOUNTS, SEND_SMS, READ_SMS, RECORD_AUDIO, BLUE_TOOTH and READ_CONTACT
  22. http://www.cmuchimps.org/publications/the_privacy_and_security_behaviors_of_smartphone_app_developers_2014/pub_download
  23. Separate study is Chitkara, S., N. Gothoskar, S. Harish, J.I. Hong, Y. Agarwal. Does this App Really Need My Location? Context aware Privacy Management on Android. PACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT) 1(3). 2017. http://www.cmuchimps.org/publications/does_this_app_really_need_my_location_context-aware_privacy_management_for_smartphones_2017
  24. Agarwal, Y., and M. Hall. ProtectMyPrivacy: Detecting and Mitigating Privacy Leaks on iOS Devices Using Crowdsourcing. Mobisys 2013.
  25. 26% decisions reduced in the far right
  26. https://www.flickr.com/photos/johnivara/536856713 https://creativecommons.org/licenses/by-nc-nd/2.0/ Today, we are at a crossroads. There is only one time in human history when a global network of computers is created, and that time is now. And there is only one time in human history when computation, communication, and sensing is woven into our everyday world, and that time is now. Now, I’ve avoided using the term Internet of Things because as you may remember from yesterday, I don’t really like the term. But regardless of what it’s called, it’s coming, and coming soon. And it will offer tremendous benefits to society in terms of safety, sustainability, transportation, health care, and more, but only if we can address the real privacy problems that these same technologies pose. So I’ll end with a question for you to consider:
  27. https://www.flickr.com/photos/johnivara/536856713 https://creativecommons.org/licenses/by-nc-nd/2.0/ Today, we are at a crossroads. There is only one time in human history when a global network of computers is created, and that time is now. And there is only one time in human history when computation, communication, and sensing is woven into our everyday world, and that time is now. Now, I’ve avoided using the term Internet of Things because as you may remember from yesterday, I don’t really like the term. But regardless of what it’s called, it’s coming, and coming soon. And it will offer tremendous benefits to society in terms of safety, sustainability, transportation, health care, and more, but only if we can address the real privacy problems that these same technologies pose. So I’ll end with a question for you to consider:
  28. DARPA Google CMU CyLab