This document discusses fostering an ecosystem for smartphone privacy. It notes that over 1 billion smartphones are sold each year, containing intimate personal data. The researcher's work focused on studying app privacy and building tools like PrivacyGrade.org to grade apps. Studies showed developers have low awareness of privacy issues and tools. The document calls for a better ecosystem in which the burden of privacy is shared, and highlights opportunities for others to help, such as improving incentives for developers and addressing economic issues.
3. New Kinds of Guidelines and Regulations
• US Federal Trade Commission guidelines
• California Attorney General recommendations
• European Union General Data Protection Regulation
4. My Research Focused on Smartphones
and Privacy
• Over 1B smartphones sold every year
– Perhaps the most widely deployed platform
• Well over 100B apps downloaded on each of Android and iOS
• Incredibly intimate devices
7. Smartphones are Intimate
Fun Facts about Millennials
• 83% sleep with phones
• 90% check first thing in morning
• 1 in 3 use in bathroom
8. Smartphone Data is Intimate
• Who we know (contacts + call log)
• Where we go (GPS, photos)
• Sensors (accelerometer, sound, light)
9. The Opportunity and the Risk
• There are all these amazing things we could do
– Healthcare
– Urban analytics
– Sustainability
• But only if we can legitimately address privacy concerns
– Spam, misuse, breaches
http://www.flickr.com/photos/robby_van_moor/478725670/
10. My Main Point: We Need to Foster a Better Ecosystem of Privacy
• Today, too much burden is on end-users
– Should I install this app?
– What are all the settings I need to know?
– What are all the terms and conditions?
– Trackers, cookies, VPNs, anonymizers, etc.
• We need a better ecosystem for privacy
– Push burden from end-users onto rest of ecosystem
– Analogy: spam email
– Other players: OS, app stores, developers, services, crowds, policy makers, journalists
11. Today’s Talk
• Why is privacy hard?
• Our research in smartphone privacy
– PrivacyGrade.org for grading app privacy
– Studies on what developers know about privacy
– Helping developers
– Helping app stores
• What you can do to help with privacy
12. Why is Privacy Hard?
#1 Privacy is a broad and fuzzy term
• Privacy is a broad umbrella term that captures concerns about our relationships with others
– The right to be left alone
– Control and feedback over one’s data
– Anonymity (popular among researchers)
– Presentation of self (impression management)
– Right to be forgotten
– Contextual integrity (take social norms into account)
• Each leads to a different way of handling privacy
– Right to be left alone -> do-not-call list, blocking
– Right to be forgotten -> delete from search engines
13. Today, Will Focus on One Form of Privacy
Data Privacy
• Data privacy is primarily about how orgs collect, use, and protect sensitive data
– Focuses on Personally Identifiable Information (PII)
• Ex. Name, street address, unique IDs, pictures
– Rules about data use, privacy notices
• Led to the Fair Information Practices
– Notice / Awareness
– Choice / Consent
– Access / Participation
– Integrity / Security
– Enforcement / Redress
14. Some Comments on Data Privacy
• Data privacy tends to be procedurally-oriented
– Did you follow this set of rules?
– Did you check off all of the boxes?
– Somewhat hard to measure too (Better? Worse?)
– This is in contrast to outcome-oriented
• Many laws embody the Fair Information Practices
– GDPR, HIPAA, Financial Privacy Act, COPPA, FERPA
– But, enforcement is a weakness here
• If an org violates, can be hard to detect
• In practice, limited resources for enforcement
15. Why is Privacy Hard?
#2 No Common Set of Best Practices for Privacy
• Security has lots of best practices + tools for devs
– Use TLS/SSL
– Devices should not have common default passwords
– Use firewalls to block unauthorized traffic
• For privacy, not so much
– Choice / Consent: Best way of offering choice?
– Access / Participation: Best way of offering access?
– Notice / Awareness: Typically privacy policies, useful?
16. • New York Times privacy policy
• Still state of the art for privacy notices
• But no one reads these
17. Why is Privacy Hard?
#3 Technological Capabilities Rapidly Growing
• Data gathering easier and more pervasive
– Everything on the web (Google + FB)
– Sensors (smartphones, IoT)
• Data storage and querying bigger and faster
• Inferences more powerful
– Some examples shortly
• Data sharing more widespread
– Social media
– Lots of companies collecting and sharing with each other, hard to explain to end-users (next slide)
18. • 2010 diagram of ad tech ecosystem
• Most of these are collecting and using data about you
19. Inferences about People More Powerful
• Built a logistic regression to predict sexuality based on what your friends on Facebook disclosed, even if you didn’t disclose
20. “[An analyst at Target] was able to identify about 25 products that… allowed him to assign each shopper a ‘pregnancy prediction’ score. [H]e could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.” (NYTimes)
21. Why is Privacy Hard?
#4 Multiple Use of the Same Data
• The same data can help as well as harm (or creep people out) depending on use and re-use
22. Recap of Why Privacy is Hard
• Privacy is a broad and fuzzy term
• No common set of best practices
• Technological capabilities rapidly growing
• Same data can be used for good and for bad
• Note that these are just a few reasons,
there are many, many more
– But enough so that we have common ground
23. Some Smartphone Apps Use Your Data in Unexpected Ways
• Shared your location, gender, unique phone ID, phone # with advertisers
• Uploaded your entire contact list to their server (including phone #s)
24. More Unexpected Uses of Your Data
• (Examples of apps accessing location data, unique device IDs, network access, and the microphone)
31. Privacy as Expectations
Use crowdsourcing to compare what people expect an app to do vs. what an app actually does
• App Behavior: what an app actually does
• User Expectations: what people think the app does
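The comparison above can be sketched in a few lines. The app names, behaviors, and expectation scores below are hypothetical stand-ins, not PrivacyGrade's actual data model: behaviors where few users expected the data use are flagged as privacy surprises.

```python
# What apps actually do: app -> set of (data, purpose) pairs (hypothetical data)
app_behavior = {
    "drag_racing": {("location", "advertising"), ("device_id", "analytics")},
    "maps_app": {("location", "navigation")},
}

# Crowdsourced expectations: fraction of users who expect each behavior
user_expectation = {
    ("drag_racing", ("location", "advertising")): 0.12,
    ("drag_racing", ("device_id", "analytics")): 0.30,
    ("maps_app", ("location", "navigation")): 0.95,
}

def surprising_behaviors(app, threshold=0.5):
    """Return behaviors that fewer than `threshold` of users expected."""
    return sorted(
        behavior
        for behavior in app_behavior[app]
        if user_expectation[(app, behavior)] < threshold
    )

print(surprising_behaviors("drag_racing"))
# → [('device_id', 'analytics'), ('location', 'advertising')]
print(surprising_behaviors("maps_app"))  # → []
```

An app whose behavior matches expectations (the maps app using location for navigation) produces no surprises; the game using location for advertising does.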
32. How PrivacyGrade Works
• We crowdsourced people’s expectations of a core set of 837 apps
– Ex. “How comfortable are you with Drag Racing using your location for ads?”
• We generated purposes by examining what third-party libraries were used by each app
• Created a model to predict people’s likely privacy concerns and applied it to 1M Android apps
34. How PrivacyGrade Works
• Long tail distribution of libraries
• We focused on the top 400 libraries, which cover the vast majority of cases
35. Impact of PrivacyGrade
• Popular Press
– NYTimes, CNN, BBC, CBS, more
• Government
– Earlier work helped lead to FTC fines
• Google
– Google has something like PrivacyGrade internally
• Developers
36. Market Failure for Privacy
• Let’s say you want to purchase a web cam
– Go into store, can compare price, color, features
– But can’t easily compare security (hidden feature)
– So, security does not influence customer purchases
– So, devs not incentivized to improve
• Same is true for privacy
– This is where things like PrivacyGrade can help
– Improve transparency, address market failures
– More broadly, what other ways to incentivize?
37. Study 1
What Do Developers Know about Privacy?
• A lot of privacy research is about end-users
– Very little about developers
• Interviewed 13 app developers
• Surveyed 228 app developers
– Got a good mix of experiences and size of orgs
• What knowledge? What tools used? Incentives?
• Are there potential points of leverage?
Balebako et al., The Privacy and Security Behaviors of Smartphone App Developers. USEC 2014.
38. Study 1 Summary of Findings
Third-party Libraries Problematic
• Use ads and analytics to monetize
• Hard to understand their behaviors
– A few didn’t know they were using libraries (based on inconsistent answers)
– Some didn’t know the libraries collected data
– “If either Facebook or Flurry had a privacy policy that was short and concise and condensed into real English rather than legalese, we definitely would have read it.”
– In a later study we did on apps, we found 40% of apps used sensitive data only because of libraries [Chitkara 2017]
39. Study 1 Summary of Findings
Devs Don’t Know What to Do
• Low awareness of existing privacy guidelines
– Fair Information Practices, FTC guidelines, Google
– Often just ask others around them
• Low perceived value of privacy policies
– Mostly protection from lawsuits
– “I haven’t even read [our privacy policy]. I mean, it’s
just legal stuff that’s required, so I just put in there.”
40. Study 2
How do developers address privacy when coding?
• Interviewed 9 Android developers
• Semi-structured interview probing about their
three most recent apps
– Their understanding of privacy
– Any privacy training they received
– What data collected in app and how used
• Libraries used?
• Was data sent to cloud server?
• How and where data stored?
– We also checked against their app if on app store
41. Study 2 Findings
Inaccurate Understanding of Their Own Apps
• Some data practices they claimed didn’t match app behaviors
• Lacked knowledge of library behaviors
• Fast iterations led to changes in data collection and data use
• Team dynamics
– Division of labor, don’t know what other devs doing
– Turnover, use of sensitive data not documented
42. Study 2 Findings
Lack of Knowledge of Alternatives
• Many alternatives exist, but often went with first solution found (e.g. StackOverflow)
• Example: Many apps use some kind of identifier, and different identifiers have tradeoffs
– Hardware identifiers (riskiest since persistent)
– Application identifier (email, hashcode)
– Advertising identifier
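One safer point in that tradeoff space can be sketched as follows. This is an illustrative example, not code from the study: instead of a persistent hardware ID, an app derives an app-scoped identifier (here, a salted hash of the user's email), so the same user cannot be linked across apps, and the identifier can be rotated by changing the salt.

```python
import hashlib

def app_scoped_id(email: str, app_salt: str) -> str:
    """App-specific identifier: the same email yields a different ID per app.

    Unlike a hardware identifier, this is not globally persistent: changing
    `app_salt` effectively resets every user's identifier for that app.
    """
    digest = hashlib.sha256((app_salt + email.lower()).encode("utf-8"))
    return digest.hexdigest()[:16]

id_app_a = app_scoped_id("user@example.com", "salt-for-app-a")
id_app_b = app_scoped_id("user@example.com", "salt-for-app-b")

print(id_app_a != id_app_b)  # → True: cannot be linked across apps
print(id_app_a == app_scoped_id("User@Example.com", "salt-for-app-a"))  # → True: stable within one app
```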
43. Study 2 Findings
Lack of Motivation to Address Privacy Issues
• Might ignore privacy issues if not required
– Ex. Get location permission for one reason (maps), but also use for other reasons (ads)
– Ex. Get name and email address, only need email
– Ex. Get device ID because no permission needed
• Android permissions and Play Store requirements useful in forcing devs to improve
– In Android, have to declare use of most sensitive data
– Google Play has requirements too (ex. privacy policy)
44. How to Get People to Change Behaviors?
Security Sensitivity Stack
• Awareness: Does person know of existing threat? Can person identify attack / problem?
• Knowledge: Does person know tools, behaviors, strategies to protect? Can person use tools, behaviors, strategies?
• Motivation: Does person care?
45. Security Sensitivity Stack Adapted for Developers and Privacy
• Awareness: Are devs aware of privacy problem? Ex. Identifier tradeoffs, library behavior
• Knowledge: Do devs know how to address? Ex. Might not know right API call
• Motivation: Do devs care? Ex. Sometimes ignore issues if not required
46. Privacy-Enhanced Android
• A large DARPA project to improve privacy
• Key idea: have devs declare in apps why sensitive data is being used
– Devs select from a small set of defined purposes
• Today: “Uses location”
• Tomorrow: “Uses location for advertising”
– Use these purposes to help developers
• Manage data better, generate privacy policies, etc.
– … to check app behaviors throughout ecosystem
– … for new kinds of GUIs explaining app behaviors
47. Helping Developers
PrivacyStreams Programming Model
• Observations
– Most apps don’t need raw data (GPS vs City location)
– Many ancillary issues (threads, format, different APIs)
• PrivacyStreams works like Unix pipes on streams
– Easier for developers (threading, uniform API + format)
– Devs never see raw data, only final outputs
– Also easier to analyze, since one line of code
• “This app uses your microphone only to get loudness”
UQI.getData(Audio.recordPeriodic(DURATION, INTERVAL),
Purpose.HEALTH("monitor sleep"))
.setField("loudness", calcLoudness(Audio.AUDIO_DATA))
.forEach("loudness", callback);
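The pipe-like shape of that snippet can be mimicked in Python. This is a rough analogy of the PrivacyStreams idea, not the real Java API: the pipeline only ever emits derived loudness values, and raw audio buffers never escape it.

```python
import math

def record_periodic(chunks):
    """Stand-in for a microphone source: yields raw sample buffers."""
    yield from chunks

def loudness(samples):
    """Root-mean-square of one sample buffer, reduced to a single scalar."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudness_stream(source):
    """The pipeline's only output: one loudness value per chunk.

    Callers never see the raw samples, mirroring how PrivacyStreams
    exposes only the final derived value to the app developer.
    """
    for chunk in source:
        yield round(loudness(chunk), 2)

raw = [[0.0, 0.0, 0.0, 0.0], [0.3, -0.3, 0.3, -0.3]]
print(list(loudness_stream(record_periodic(raw))))  # → [0.0, 0.3]
```

Because the whole data flow is one expression over a stream, it is also easy to analyze mechanically, which is what makes statements like "this app uses your microphone only to get loudness" checkable.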
48. Helping Developers
Coconut IDE Plugin
• Developers add some Java annotations for each
use of sensitive data
• Can offer alternatives
• Can aggregate all data use in one place
• (Future) Can auto-generate privacy policies
49. Helping App Stores
• Ways of checking the behavior of apps
– Ex. When devs upload to app store
• Decompile the app and examine the text
– If app uses location data, and if we see strings from app like exif, photo, tag, probably geotagging
• Check network data of apps
– Similar to above, except for network traffic
• Add safety checks to apps
– If app has well-defined policy, can add extra checks to app to make sure it does the right thing
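The string-scanning heuristic above can be sketched as follows. The keyword list and function are illustrative assumptions, not the actual app-store tooling: if an app requests location and its decompiled text contains photo/EXIF-related strings, guess that the location use is geotagging.

```python
# Illustrative keyword hints suggesting geotagging (hypothetical list)
GEOTAG_HINTS = ("exif", "photo", "tag")

def likely_purpose(permissions, decompiled_strings):
    """Guess why an app uses location, from strings in its decompiled code."""
    perms = {p.lower() for p in permissions}
    text = " ".join(decompiled_strings).lower()
    if "location" not in perms:
        return "no location use"
    if any(hint in text for hint in GEOTAG_HINTS):
        return "geotagging"
    return "unknown location use"

print(likely_purpose(["LOCATION", "CAMERA"], ["ExifInterface", "setGpsTag"]))  # → geotagging
print(likely_purpose(["LOCATION"], ["AdRequestBuilder"]))  # → unknown location use
```

A real checker would of course combine many such signals (API calls, library fingerprints, network traffic) rather than raw substrings, but the shape is the same: infer purpose from evidence in the binary.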
52. ProtectMyPrivacy (PMP) for Making Decisions
• For jailbroken iOS and Android
– Intercept calls to sensitive data
– Over 200k people using iOS PMP
– Over 6M decisions (+ stack traces)
– 20 data types protected
• Recommender system too
– Have recs for 97% of top 10k apps
• User study with 1321 people
– 1 year with old model (by app)
– 1 month with new model (by library)
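A crowd-based recommender like PMP's can be sketched with majority voting. The decision log and the 60% consensus threshold below are hypothetical, and the recommendations are keyed by library rather than by app, echoing the newer model mentioned above.

```python
from collections import Counter

# Hypothetical crowd decision log: (library, data type, user's choice)
decisions = [
    ("admob_lib", "location", "deny"),
    ("admob_lib", "location", "deny"),
    ("admob_lib", "location", "allow"),
    ("maps_lib", "location", "allow"),
    ("maps_lib", "location", "allow"),
]

def recommend(library, data_type, consensus=0.6):
    """Recommend allow/deny by majority vote; punt to the user if no consensus."""
    votes = Counter(
        choice for lib, dt, choice in decisions
        if lib == library and dt == data_type
    )
    choice, count = votes.most_common(1)[0]
    return choice if count / sum(votes.values()) >= consensus else "ask user"

print(recommend("admob_lib", "location"))  # → deny (2 of 3 votes ≥ 60%)
print(recommend("maps_lib", "location"))   # → allow
```

Keying recommendations by library rather than by app is what lets a modest number of crowd decisions cover recommendations for the vast majority of apps.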
53. Long Tail of Third-party Libraries
Chitkara, S. et al. Why does this app need my Location? Context aware Privacy Management on Android. In IMWUT 1(3). 2017.
54. Most Popular 30 Libraries Account for
Over Half of all Sensitive Data Access
55. About 40% Apps Use Sensitive Data Only
because of Third-party Libraries
57. How You Can Help with Privacy
Some Opportunities
• Imagine a gigantic blob of privacy work
– This is the amount of work needed for “good” privacy
– Right now, most of this blob is managed by end-users
– What are useful ways of slicing up this blob so that other parts of the ecosystem can manage it better?
• Better decision making by crowds or by experts
– How good are decisions? Ways of making better ones?
– How to easily share these decisions?
58. How You Can Help with Privacy
Some Opportunities
• Economics of privacy
– GDPR; other ways of addressing market failures?
• Ex. Consumer Reports really interested in this area
– Third-party services and libraries a major problem
• Incentives for privacy
– Improving awareness, knowledge, motivation for devs
– User attention for privacy
• Special cases of privacy law
– Privacy for children, healthcare, finances
60. How can we create
a connected world we
would all want to live in?
61. Thanks!
More info at cmuchimps.org
or email jasonh@cs.cmu.edu
Special thanks to:
• DARPA Brandeis
• Google
• Yuvraj Agarwal
• Shah Amini
• Rebecca Balebako
• Mike Czapik
• Matt Fredrikson
• Shawn Hanna
• Haojian Jin
• Tianshi Li
• Yuanchun Li
• Jialiu Lin
• Song Luan
• Swarup Sahoo
• Mike Villena
• Jason Wiese
• Alex Yu
• And many more…
• CMU Cylab
• NQ Mobile
Speaker Notes
Every week, there are headline news articles like these, capturing people’s growing concerns about technology and privacy.
There are also a growing number of guidelines and regulations about how these technologies should be designed and be operated. So even if you don’t personally believe privacy is an issue, it’s still something that has to be addressed in the design and operation of systems we build.
https://www.ftc.gov/sites/default/files/documents/reports/mobile-privacy-disclosures-building-trust-through-transparency-federal-trade-commission-staff-report/130201mobileprivacyreport.pdf
https://oag.ca.gov/sites/all/files/agweb/pdfs/privacy/privacy_on_the_go.pdf
Will just focus on smartphones for now, since they are the most pervasive devices we have today
Representative of many of the problems and opportunities we will be grappling with in the future
Smartphones are everywhere
http://marketingland.com/report-us-smartphone-penetration-now-75-percent-117746
http://www.pewinternet.org/fact-sheets/mobile-technology-fact-sheet/
http://www.androidauthority.com/google-play-store-vs-the-apple-app-store-601836/
These devices are also incredibly intimate, perhaps the most intimate computing devices we’ve ever created.
From Pew Internet and Cisco 2012 study
Main stats on this page are from:
http://www.cisco.com/c/en/us/solutions/enterprise/connected-world-technology-report/index.html#~2012
Additional stats about mobile phones:
http://www.pewinternet.org/fact-sheets/mobile-technology-fact-sheet/
-----------------------
What’s also interesting are trends in how people use these smartphones
http://blog.sciencecreative.com/2011/03/16/the-authentic-online-marketer/
http://www.generationalinsights.com/millennials-addicted-to-their-smartphones-some-suffer-nomophobia/
In fact, Millennials don’t just sleep with their smartphones. 75% use them in bed before going to sleep and 90% check them again first thing in the morning. Half use them while eating and third use them in the bathroom. A third check them every half hour. Another fifth check them every ten minutes. A quarter of them check them so frequently that they lose count.
http://www.androidtapp.com/how-simple-is-your-smartphone-to-use-funny-videos/
Pew Research Center
Around 83 percent of those 18- to 29-year-olds sleep with their cell phones within reach.
http://persquaremile.com/category/suburbia/
From Cisco report
Also from Cisco report
But it’s not just the devices that are intimate, the data is also intimate.
Location, call logs, SMS, pics, more
A grand challenge for computer science
http://www.flickr.com/photos/robby_van_moor/478725670/
Data privacy and personal privacy
In contrast to personal privacy, which is mostly about what you do to manage your persona
Lots of forms of FIPs, here are the ones from FTC
Hash user passwords
Reading grade level: 12.5
About 10 minutes to read
So based on Lorrie and Aleecia’s work, it would take 25 full days to read all privacy policies of all web sites
But this assumes people read it
Rational behavior not to read privacy policies: we want to use the service, painful to read, clear cost but unclear benefit
http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score.
Later in the article, talks about how one father accidentally discovered his daughter was pregnant b/c of these ads
But, Many Smartphone Apps Access this Sensitive Data in Surprising Ways
Moto Racing / https://play.google.com/store/apps/details?id=com.motogames.supermoto
On the left is Nissan Maxima gear shift. It turns out my brother was driving in 3rd gear for over a year before I pointed out to him that 3 and D are separate. The older Nissan Maxima gear shift on the right makes it hard to make this mistake.
Lin et al, Modeling Users’ Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings. SOUPS 2014.
INTERNET, READ_PHONE_STATE, ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, CAMERA, GET_ACCOUNTS, SEND_SMS, READ_SMS, RECORD_AUDIO, BLUETOOTH and READ_CONTACTS
The separate study is:
Chitkara, S., N. Gothoskar, S. Harish, J.I. Hong, Y. Agarwal. Does this App Really Need My Location? Context aware Privacy Management on Android. PACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT) 1(3). 2017. http://www.cmuchimps.org/publications/does_this_app_really_need_my_location_context-aware_privacy_management_for_smartphones_2017
Agarwal, Y., and M. Hall. ProtectMyPrivacy: Detecting and Mitigating Privacy Leaks on iOS Devices Using Crowdsourcing. Mobisys 2013.
26% decisions reduced in the far right
https://www.flickr.com/photos/johnivara/536856713
https://creativecommons.org/licenses/by-nc-nd/2.0/
Today, we are at a crossroads. There is only one time in human history when a global network of computers is created, and that time is now. And there is only one time in human history when computation, communication, and sensing is woven into our everyday world, and that time is now. Now, I’ve avoided using the term Internet of Things because as you may remember from yesterday, I don’t really like the term. But regardless of what it’s called, it’s coming, and coming soon. And it will offer tremendous benefits to society in terms of safety, sustainability, transportation, health care, and more, but only if we can address the real privacy problems that these same technologies pose. So I’ll end with a question for you to consider: